Parallel heap vacuum
Hi all,
The parallel vacuum we have today supports only index vacuuming.
Therefore, while multiple workers can work on different indexes in
parallel, the heap table is always processed by a single process.
I'd like to propose $subject, which enables us to have multiple
workers running on a single heap table. This would be helpful to
speed up vacuuming for tables without indexes or tables with
INDEX_CLEANUP = off.
I've attached a PoC patch for this feature. It implements only
parallel heap scans in lazy vacuum. We can extend this feature to
support parallel heap vacuum as well, either in the future or in the
same patch.
# Overall idea (for parallel heap scan in lazy vacuum)
At the beginning of vacuum, we determine how many workers to launch
based on the table size, like other parallel query operations. The
number of workers is capped by max_parallel_maintenance_workers. Once
we decide to use parallel heap scan, we prepare a DSM segment to share
data among the parallel workers and the leader. The shared information
includes at least the vacuum options such as aggressive, the counters
collected during lazy vacuum such as scanned_pages, vacuum cutoffs such
as VacuumCutoffs and GlobalVisState, and the parallel scan description.
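To make this concrete, here is an abridged version of the shared
struct as the PoC patch defines it (PHVShared in vacuumlazy.c; a few
fields are omitted here):

typedef struct PHVShared
{
    bool        aggressive;
    bool        skipwithvm;

    /* VACUUM operation's cutoffs for freezing and pruning */
    struct VacuumCutoffs cutoffs;
    GlobalVisState vistest;

    /* per-worker counters such as scanned_pages, tuples_deleted, etc. */
    LVRelCounters worker_relcnts[FLEXIBLE_ARRAY_MEMBER];
} PHVShared;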
Before starting the heap scan in lazy vacuum, we launch parallel
workers, and then each worker (and the leader) processes different
blocks. Each worker does HOT-pruning on pages and collects dead tuple
TIDs. When adding dead tuple TIDs, workers need to hold an exclusive
lock on the TidStore. At the end of the heap scan phase, workers exit
and the leader waits for all workers to exit. After that, the leader
process gathers the counters collected by the parallel workers and
computes the oldest relfrozenxid (and relminmxid). Then, if parallel
index vacuum is also enabled, we launch other parallel workers for
parallel index vacuuming.
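In rough pseudocode, the leader's flow in the patch looks like this
(see do_parallel_lazy_scan_heap(); the memory-limit loop and error
handling are omitted):

/* launch workers; the leader participates in the scan as well */
vacrel->phvstate->nworkers_launched =
    parallel_vacuum_table_scan_begin(vacrel->pvs);
(void) do_lazy_scan_heap(vacrel);

/*
 * Wait for all workers to exit, then gather their counters and
 * compute the oldest relfrozenxid/relminmxid.
 */
parallel_vacuum_table_scan_end(vacrel->pvs);
parallel_heap_vacuum_gather_scan_stats(vacrel);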
When it comes to the parallel heap scan in lazy vacuum, I think we can
use the table_block_parallelscan_XXX() family. One tricky thing we need
to deal with is that if the TidStore memory usage reaches the limit, we
stop the parallel scan, do index vacuum and table vacuum, and then
resume the parallel scan from the previous state. In order to do that,
in the patch, we store ParallelBlockTableScanWorker, the per-worker
parallel scan state, in DSM so that different parallel workers can
resume the scan using the same parallel scan state.
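Concretely, the patch wraps the per-worker scan state in a small struct
stored in DSM, and a process initializes its start block only on the
first round; later rounds simply continue from the saved state via
table_block_parallelscan_nextpage():

/* Per-worker scan state, kept in DSM so the scan can be resumed */
typedef struct PHVScanWorkerState
{
    ParallelBlockTableScanWorkerData state;
    bool        maybe_have_blocks;  /* chunk may have unscanned blocks */
} PHVScanWorkerState;

/* in the worker entry point: initialize the start block only once */
if (!phvstate->myscanstate->maybe_have_blocks)
    table_block_parallelscan_startblock_init(rel,
                                             &(phvstate->myscanstate->state),
                                             phvstate->pscandesc);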
In addition to that, since we could end up launching fewer workers
than requested, it could happen that some ParallelBlockTableScanWorker
data is used once and then never used again even though unprocessed
blocks remain. To handle this case, in the patch, the leader process
checks at the end of the parallel scan whether there is an incomplete
parallel scan. If so, the leader process does the scan using the
workers' ParallelBlockTableScanWorker data on their behalf.
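The leader-side loop for this in the patch
(parallel_heap_complete_unfinished_scan()) simply adopts each worker's
saved scan state and drives the remaining blocks itself:

for (int i = 0; i < nworkers; i++)
{
    PHVScanWorkerState *wstate = &(vacrel->phvstate->scanstates[i]);

    if (!wstate->maybe_have_blocks)
        continue;               /* this chunk was fully scanned */

    /* adopt the worker's scan state and resume where it stopped */
    vacrel->phvstate->myscanstate = wstate;
    (void) do_lazy_scan_heap(vacrel);
}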
# Discussions
I'm fairly confident in the overall design of this feature, but there
are some points regarding the implementation that we need to discuss.
In the patch, I extended vacuumparallel.c to support parallel table
scan (and vacuum in the future). To do so, I had to add some table AM
callbacks for DSM size estimation, DSM initialization, the actual table
scan, etc. We need to verify that these APIs are appropriate.
Specifically, if we want to support both parallel heap scan and
parallel heap vacuum, do we want to add separate callbacks for them?
That could be overkill, since such a two-pass vacuum strategy is
specific to the heap AM.
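For reference, these are the callback signatures the PoC patch adds
and wires into heapam_methods (their exact shape is one of the things
to discuss):

int     heap_parallel_vacuum_compute_workers(Relation rel, int nrequested);
void    heap_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt,
                                      int nworkers, void *state);
void    heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
                                        int nworkers, void *state);
void    heap_parallel_vacuum_scan_worker(Relation rel, ParallelVacuumState *pvs,
                                         ParallelWorkerContext *pwcxt);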
As another implementation idea, we might want to implement parallel
heap scan/vacuum in vacuumlazy.c while minimizing the changes to
vacuumparallel.c. That way, we would not need to add table AM
callbacks. However, we would end up with duplicated code related to
parallel operation in vacuum, such as vacuum delays.
Also, we might need to add some functions to share GlobalVisState
among parallel workers, since GlobalVisState is a private struct.
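In the PoC this is done by embedding a plain copy of the struct in the
shared area, which is why the struct definition has to be exposed:

/* leader, while initializing the DSM */
shared->vistest = *vacrel->vistest;

/* worker, after attaching to the DSM */
vacrel.vistest = &(shared->vistest);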
Other points that I'm somewhat uncomfortable with, or that need to be
discussed, remain in the code as XXX comments.
# Benchmark results
* Test-1: parallel heap scan on the table without indexes
I created a 20GB table, made garbage on it, and ran vacuum while
changing the parallel degree:
create unlogged table test (a int) with (autovacuum_enabled = off);
insert into test select generate_series(1, 600000000); --- 20GB table
delete from test where a % 5 = 0;
vacuum (verbose, parallel 0) test;
Here are the results (total time; with no indexes, the heap scan
accounts for almost all of it):
PARALLEL 0: 21.99 s (single process)
PARALLEL 1: 11.39 s
PARALLEL 2: 8.36 s
PARALLEL 3: 6.14 s
PARALLEL 4: 5.08 s
* Test-2: parallel heap scan on the table with one index
I used a table similar to the one in test 1, but created one btree index on it:
create unlogged table test (a int) with (autovacuum_enabled = off);
insert into test select generate_series(1, 600000000); --- 20GB table
create index on test (a);
delete from test where a % 5 = 0;
vacuum (verbose, parallel 0) test;
I've measured the total execution time as well as the time of each
vacuum phase (from left: heap scan time, index vacuum time, and heap
vacuum time):
PARALLEL 0: 45.11 s (21.89, 16.74, 6.48)
PARALLEL 1: 42.13 s (12.75, 22.04, 7.23)
PARALLEL 2: 39.27 s (8.93, 22.78, 7.45)
PARALLEL 3: 36.53 s (6.76, 22.00, 7.65)
PARALLEL 4: 35.84 s (5.85, 22.04, 7.83)
Overall, I can see that the parallel heap scan in lazy vacuum has
decent scalability: in both test-1 and test-2, the heap scan got ~4x
faster with 4 parallel workers. On the other hand, when it comes to the
total vacuum execution time, I could not see much performance
improvement in test-2 (45.11 s vs. 35.84 s). Comparing PARALLEL 0 and
PARALLEL 1 in test-2, the heap scan got faster (21.89 vs. 12.75)
whereas the index vacuum got slower (16.74 vs. 22.04), and the heap
scan in test-2 was not as fast as in test-1 with 1 parallel worker
(12.75 vs. 11.39).
I think the reason is that the shared TidStore is not very scalable,
since we have a single lock on it. In test-1 we never use the shared
TidStore, because all dead tuples are removed during heap pruning, so
the overall scalability was better than in test-2. In the PARALLEL 0
case in test-2 we use the local TidStore, and from parallel degree 1
onward in test-2 we use the shared TidStore, which the parallel workers
update concurrently. Also, I guess that the lookup performance of the
local TidStore is better than the shared TidStore's because of the
differences between a bump context and a DSA area. I think this
difference contributed to index vacuuming getting slower (16.74 vs.
22.04).
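To make the contention point concrete: in the patch, every dead-TID
insertion is serialized through a single exclusive lock on the shared
TidStore (see dead_items_add()):

if (ParallelHeapVacuumIsActive(vacrel))
    TidStoreLockExclusive(dead_items);

TidStoreSetBlockOffsets(dead_items, blkno, offsets, num_offsets);
vacrel->dead_items_info->num_items += num_offsets;

if (ParallelHeapVacuumIsActive(vacrel))
    TidStoreUnlock(dead_items);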
There are two obvious ideas for improving the overall vacuum execution
time: (1) improve the shared TidStore's scalability and (2) support
parallel heap vacuum. For (1), several ideas are proposed by the ART
authors[1]. I've not tried these ideas, but they might be applicable to
our ART implementation. But I prefer to start with (2) since it would
be easier. Feedback is very welcome.
Regards,
[1]: https://db.in.tum.de/~leis/papers/artsync.pdf
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
parallel_heap_vacuum_scan.patch (application/x-patch)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6f8b1b7929..cf8c6614cd 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2630,6 +2630,12 @@ static const TableAmRoutine heapam_methods = {
.relation_copy_data = heapam_relation_copy_data,
.relation_copy_for_cluster = heapam_relation_copy_for_cluster,
.relation_vacuum = heap_vacuum_rel,
+
+ .parallel_vacuum_compute_workers = heap_parallel_vacuum_compute_workers,
+ .parallel_vacuum_estimate = heap_parallel_vacuum_estimate,
+ .parallel_vacuum_initialize = heap_parallel_vacuum_initialize,
+ .parallel_vacuum_scan_worker = heap_parallel_vacuum_scan_worker,
+
.scan_analyze_next_block = heapam_scan_analyze_next_block,
.scan_analyze_next_tuple = heapam_scan_analyze_next_tuple,
.index_build_range_scan = heapam_index_build_range_scan,
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3f88cf1e8e..4ccf15ffe3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -49,6 +49,7 @@
#include "common/int.h"
#include "executor/instrument.h"
#include "miscadmin.h"
+#include "optimizer/paths.h"
#include "pgstat.h"
#include "portability/instr_time.h"
#include "postmaster/autovacuum.h"
@@ -117,10 +118,22 @@
#define PREFETCH_SIZE ((BlockNumber) 32)
/*
- * Macro to check if we are in a parallel vacuum. If true, we are in the
- * parallel mode and the DSM segment is initialized.
+ * DSM keys for heap parallel vacuum scan. Unlike other parallel execution code,
+ * we don't need to worry about DSM keys conflicting with plan_node_id, but we
+ * need to avoid conflicting with DSM keys used in vacuumparallel.c.
+ */
+#define LV_PARALLEL_SCAN_SHARED 0xFFFF0001
+#define LV_PARALLEL_SCAN_DESC 0xFFFF0002
+#define LV_PARALLEL_SCAN_DESC_WORKER 0xFFFF0003
+
+/*
+ * Macro to check if we are in a parallel vacuum. If ParallelVacuumIsActive() is
+ * true, we are in the parallel mode, meaning that we do either parallel index
+ * vacuuming or parallel table vacuuming, or both. If ParallelHeapVacuumIsActive()
+ * is true, we do at least parallel table vacuuming.
*/
#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
+#define ParallelHeapVacuumIsActive(vacrel) ((vacrel)->phvstate != NULL)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -133,6 +146,80 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE,
} VacErrPhase;
+/*
+ * Relation statistics that are collected during heap scanning and need to be
+ * shared among parallel vacuum workers.
+ */
+typedef struct LVRelCounters
+{
+ BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */
+ BlockNumber removed_pages; /* # pages removed by relation truncation */
+ BlockNumber frozen_pages; /* # pages with newly frozen tuples */
+ BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
+ BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
+ BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+
+ /* Counters that follow are only for scanned_pages */
+ int64 tuples_deleted; /* # deleted from table */
+ int64 tuples_frozen; /* # newly frozen */
+ int64 lpdead_items; /* # deleted from indexes */
+ int64 live_tuples; /* # live tuples remaining */
+ int64 recently_dead_tuples; /* # dead, but not yet removable */
+ int64 missed_dead_tuples; /* # removable, but not removed */
+
+ /* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid. */
+ TransactionId NewRelfrozenXid;
+ MultiXactId NewRelminMxid;
+ bool skippedallvis;
+} LVRelCounters;
+
+/*
+ * Struct for information that needs to be shared among parallel vacuum workers
+ */
+typedef struct PHVShared
+{
+ bool aggressive;
+ bool skipwithvm;
+
+ /* The initial values shared by the leader process */
+ TransactionId NewRelfrozenXid;
+ MultiXactId NewRelminMxid;
+ bool skippedallvis;
+
+ /* VACUUM operation's cutoffs for freezing and pruning */
+ struct VacuumCutoffs cutoffs;
+ GlobalVisState vistest;
+
+ LVRelCounters worker_relcnts[FLEXIBLE_ARRAY_MEMBER];
+} PHVShared;
+#define SizeOfPHVShared (offsetof(PHVShared, worker_relcnts))
+
+/* Per-worker scan state */
+typedef struct PHVScanWorkerState
+{
+ ParallelBlockTableScanWorkerData state;
+ bool maybe_have_blocks;
+} PHVScanWorkerState;
+
+/* Struct for parallel heap vacuum */
+typedef struct PHVState
+{
+ /* Parallel scan description shared among parallel workers */
+ ParallelBlockTableScanDesc pscandesc;
+
+ /* Shared information */
+ PHVShared *shared;
+
+ /* Per-worker scan state */
+ PHVScanWorkerState *myscanstate;
+
+ /* Points to all per-worker scan state array */
+ PHVScanWorkerState *scanstates;
+
+ /* The number of workers launched for parallel heap vacuum */
+ int nworkers_launched;
+} PHVState;
+
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -144,6 +231,12 @@ typedef struct LVRelState
BufferAccessStrategy bstrategy;
ParallelVacuumState *pvs;
+ /* Parallel heap vacuum state and sizes for each struct */
+ PHVState *phvstate;
+ Size pscan_len;
+ Size shared_len;
+ Size pscanwork_len;
+
/* Aggressive VACUUM? (must set relfrozenxid >= FreezeLimit) */
bool aggressive;
/* Use visibility map to skip? (disabled by DISABLE_PAGE_SKIPPING) */
@@ -159,10 +252,6 @@ typedef struct LVRelState
/* VACUUM operation's cutoffs for freezing and pruning */
struct VacuumCutoffs cutoffs;
GlobalVisState *vistest;
- /* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
- TransactionId NewRelfrozenXid;
- MultiXactId NewRelminMxid;
- bool skippedallvis;
/* Error reporting state */
char *dbname;
@@ -188,12 +277,10 @@ typedef struct LVRelState
VacDeadItemsInfo *dead_items_info;
BlockNumber rel_pages; /* total number of pages */
- BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */
- BlockNumber removed_pages; /* # pages removed by relation truncation */
- BlockNumber frozen_pages; /* # pages with newly frozen tuples */
- BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
- BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
- BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+ BlockNumber next_fsm_block_to_vacuum;
+
+ /* Block and tuple counters for the relation */
+ LVRelCounters *counters;
/* Statistics output by us, for table */
double new_rel_tuples; /* new estimated total # of tuples */
@@ -203,13 +290,6 @@ typedef struct LVRelState
/* Instrumentation counters */
int num_index_scans;
- /* Counters that follow are only for scanned_pages */
- int64 tuples_deleted; /* # deleted from table */
- int64 tuples_frozen; /* # newly frozen */
- int64 lpdead_items; /* # deleted from indexes */
- int64 live_tuples; /* # live tuples remaining */
- int64 recently_dead_tuples; /* # dead, but not yet removable */
- int64 missed_dead_tuples; /* # removable, but not removed */
/* State maintained by heap_vac_scan_next_block() */
BlockNumber current_block; /* last block returned */
@@ -229,6 +309,7 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
+static bool do_lazy_scan_heap(LVRelState *vacrel);
static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
bool *all_visible_according_to_vm);
static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
@@ -271,6 +352,12 @@ static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
static void update_relstats_all_indexes(LVRelState *vacrel);
+
+
+static void do_parallel_lazy_scan_heap(LVRelState *vacrel);
+static void parallel_heap_vacuum_gather_scan_stats(LVRelState *vacrel);
+static void parallel_heap_complete_unfinished_scan(LVRelState *vacrel);
+
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -296,6 +383,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
BufferAccessStrategy bstrategy)
{
LVRelState *vacrel;
+ LVRelCounters *counters;
bool verbose,
instrument,
skipwithvm,
@@ -406,14 +494,28 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
Assert(params->index_cleanup == VACOPTVALUE_AUTO);
}
+ vacrel->next_fsm_block_to_vacuum = 0;
+
/* Initialize page counters explicitly (be tidy) */
- vacrel->scanned_pages = 0;
- vacrel->removed_pages = 0;
- vacrel->frozen_pages = 0;
- vacrel->lpdead_item_pages = 0;
- vacrel->missed_dead_pages = 0;
- vacrel->nonempty_pages = 0;
- /* dead_items_alloc allocates vacrel->dead_items later on */
+ counters = palloc(sizeof(LVRelCounters));
+ counters->scanned_pages = 0;
+ counters->removed_pages = 0;
+ counters->frozen_pages = 0;
+ counters->lpdead_item_pages = 0;
+ counters->missed_dead_pages = 0;
+ counters->nonempty_pages = 0;
+
+ /* Initialize remaining counters (be tidy) */
+ counters->tuples_deleted = 0;
+ counters->tuples_frozen = 0;
+ counters->lpdead_items = 0;
+ counters->live_tuples = 0;
+ counters->recently_dead_tuples = 0;
+ counters->missed_dead_tuples = 0;
+
+ vacrel->counters = counters;
+
+ vacrel->num_index_scans = 0;
/* Allocate/initialize output statistics state */
vacrel->new_rel_tuples = 0;
@@ -421,14 +523,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->indstats = (IndexBulkDeleteResult **)
palloc0(vacrel->nindexes * sizeof(IndexBulkDeleteResult *));
- /* Initialize remaining counters (be tidy) */
- vacrel->num_index_scans = 0;
- vacrel->tuples_deleted = 0;
- vacrel->tuples_frozen = 0;
- vacrel->lpdead_items = 0;
- vacrel->live_tuples = 0;
- vacrel->recently_dead_tuples = 0;
- vacrel->missed_dead_tuples = 0;
+ /* dead_items_alloc allocates vacrel->dead_items later on */
/*
* Get cutoffs that determine which deleted tuples are considered DEAD,
@@ -450,9 +545,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
vacrel->vistest = GlobalVisTestFor(rel);
/* Initialize state used to track oldest extant XID/MXID */
- vacrel->NewRelfrozenXid = vacrel->cutoffs.OldestXmin;
- vacrel->NewRelminMxid = vacrel->cutoffs.OldestMxact;
- vacrel->skippedallvis = false;
+ vacrel->counters->NewRelfrozenXid = vacrel->cutoffs.OldestXmin;
+ vacrel->counters->NewRelminMxid = vacrel->cutoffs.OldestMxact;
+ vacrel->counters->skippedallvis = false;
skipwithvm = true;
if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
{
@@ -533,15 +628,15 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* value >= FreezeLimit, and relminmxid to a value >= MultiXactCutoff.
* Non-aggressive VACUUMs may advance them by any amount, or not at all.
*/
- Assert(vacrel->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
+ Assert(vacrel->counters->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
TransactionIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.FreezeLimit :
vacrel->cutoffs.relfrozenxid,
- vacrel->NewRelfrozenXid));
- Assert(vacrel->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
+ vacrel->counters->NewRelfrozenXid));
+ Assert(vacrel->counters->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
MultiXactIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.MultiXactCutoff :
vacrel->cutoffs.relminmxid,
- vacrel->NewRelminMxid));
- if (vacrel->skippedallvis)
+ vacrel->counters->NewRelminMxid));
+ if (vacrel->counters->skippedallvis)
{
/*
* Must keep original relfrozenxid in a non-aggressive VACUUM that
@@ -549,8 +644,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* values will have missed unfrozen XIDs from the pages we skipped.
*/
Assert(!vacrel->aggressive);
- vacrel->NewRelfrozenXid = InvalidTransactionId;
- vacrel->NewRelminMxid = InvalidMultiXactId;
+ vacrel->counters->NewRelfrozenXid = InvalidTransactionId;
+ vacrel->counters->NewRelminMxid = InvalidMultiXactId;
}
/*
@@ -571,7 +666,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
*/
vac_update_relstats(rel, new_rel_pages, vacrel->new_live_tuples,
new_rel_allvisible, vacrel->nindexes > 0,
- vacrel->NewRelfrozenXid, vacrel->NewRelminMxid,
+ vacrel->counters->NewRelfrozenXid, vacrel->counters->NewRelminMxid,
&frozenxid_updated, &minmulti_updated, false);
/*
@@ -587,8 +682,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
pgstat_report_vacuum(RelationGetRelid(rel),
rel->rd_rel->relisshared,
Max(vacrel->new_live_tuples, 0),
- vacrel->recently_dead_tuples +
- vacrel->missed_dead_tuples);
+ vacrel->counters->recently_dead_tuples +
+ vacrel->counters->missed_dead_tuples);
pgstat_progress_end_command();
if (instrument)
@@ -651,21 +746,21 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->relname,
vacrel->num_index_scans);
appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total)\n"),
- vacrel->removed_pages,
+ vacrel->counters->removed_pages,
new_rel_pages,
- vacrel->scanned_pages,
+ vacrel->counters->scanned_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->scanned_pages / orig_rel_pages);
+ 100.0 * vacrel->counters->scanned_pages / orig_rel_pages);
appendStringInfo(&buf,
_("tuples: %lld removed, %lld remain, %lld are dead but not yet removable\n"),
- (long long) vacrel->tuples_deleted,
+ (long long) vacrel->counters->tuples_deleted,
(long long) vacrel->new_rel_tuples,
- (long long) vacrel->recently_dead_tuples);
- if (vacrel->missed_dead_tuples > 0)
+ (long long) vacrel->counters->recently_dead_tuples);
+ if (vacrel->counters->missed_dead_tuples > 0)
appendStringInfo(&buf,
_("tuples missed: %lld dead from %u pages not removed due to cleanup lock contention\n"),
- (long long) vacrel->missed_dead_tuples,
- vacrel->missed_dead_pages);
+ (long long) vacrel->counters->missed_dead_tuples,
+ vacrel->counters->missed_dead_pages);
diff = (int32) (ReadNextTransactionId() -
vacrel->cutoffs.OldestXmin);
appendStringInfo(&buf,
@@ -673,25 +768,25 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->cutoffs.OldestXmin, diff);
if (frozenxid_updated)
{
- diff = (int32) (vacrel->NewRelfrozenXid -
+ diff = (int32) (vacrel->counters->NewRelfrozenXid -
vacrel->cutoffs.relfrozenxid);
appendStringInfo(&buf,
_("new relfrozenxid: %u, which is %d XIDs ahead of previous value\n"),
- vacrel->NewRelfrozenXid, diff);
+ vacrel->counters->NewRelfrozenXid, diff);
}
if (minmulti_updated)
{
- diff = (int32) (vacrel->NewRelminMxid -
+ diff = (int32) (vacrel->counters->NewRelminMxid -
vacrel->cutoffs.relminmxid);
appendStringInfo(&buf,
_("new relminmxid: %u, which is %d MXIDs ahead of previous value\n"),
- vacrel->NewRelminMxid, diff);
+ vacrel->counters->NewRelminMxid, diff);
}
appendStringInfo(&buf, _("frozen: %u pages from table (%.2f%% of total) had %lld tuples frozen\n"),
- vacrel->frozen_pages,
+ vacrel->counters->frozen_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->frozen_pages / orig_rel_pages,
- (long long) vacrel->tuples_frozen);
+ 100.0 * vacrel->counters->frozen_pages / orig_rel_pages,
+ (long long) vacrel->counters->tuples_frozen);
if (vacrel->do_index_vacuuming)
{
if (vacrel->nindexes == 0 || vacrel->num_index_scans == 0)
@@ -711,10 +806,10 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
msgfmt = _("%u pages from table (%.2f%% of total) have %lld dead item identifiers\n");
}
appendStringInfo(&buf, msgfmt,
- vacrel->lpdead_item_pages,
+ vacrel->counters->lpdead_item_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->lpdead_item_pages / orig_rel_pages,
- (long long) vacrel->lpdead_items);
+ 100.0 * vacrel->counters->lpdead_item_pages / orig_rel_pages,
+ (long long) vacrel->counters->lpdead_items);
for (int i = 0; i < vacrel->nindexes; i++)
{
IndexBulkDeleteResult *istat = vacrel->indstats[i];
@@ -815,14 +910,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel)
{
- BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
- next_fsm_block_to_vacuum = 0;
- bool all_visible_according_to_vm;
-
- TidStore *dead_items = vacrel->dead_items;
+ BlockNumber rel_pages = vacrel->rel_pages;
VacDeadItemsInfo *dead_items_info = vacrel->dead_items_info;
- Buffer vmbuffer = InvalidBuffer;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -842,14 +931,78 @@ lazy_scan_heap(LVRelState *vacrel)
vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
+ if (ParallelHeapVacuumIsActive(vacrel))
+ do_parallel_lazy_scan_heap(vacrel);
+ else
+ do_lazy_scan_heap(vacrel);
+
+ vacrel->blkno = InvalidBlockNumber;
+
+ /* report that everything is now scanned */
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, rel_pages);
+
+ /* now we can compute the new value for pg_class.reltuples */
+ vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
+ vacrel->counters->scanned_pages,
+ vacrel->counters->live_tuples);
+
+ /*
+ * Also compute the total number of surviving heap entries. In the
+ * (unlikely) scenario that new_live_tuples is -1, take it as zero.
+ */
+ vacrel->new_rel_tuples =
+ Max(vacrel->new_live_tuples, 0) + vacrel->counters->recently_dead_tuples +
+ vacrel->counters->missed_dead_tuples;
+
+ /*
+ * Do index vacuuming (call each index's ambulkdelete routine), then do
+ * related heap vacuuming
+ */
+ if (dead_items_info->num_items > 0)
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the remainder of the Free Space Map. We must do this whether or
+ * not there were indexes, and whether or not we bypassed index vacuuming.
+ */
+ if (rel_pages > vacrel->next_fsm_block_to_vacuum)
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ rel_pages);
+
+ /* report all blocks vacuumed */
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, rel_pages);
+
+ /* Do final index cleanup (call each index's amvacuumcleanup routine) */
+ if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
+ lazy_cleanup_all_indexes(vacrel);
+}
+
+/*
+ * Workhorse for lazy_scan_heap().
+ *
+ * Return true if we processed all blocks, or false if we exited before
+ * completing the heap scan because the space for dead item TIDs is full. In the
+ * serial heap scan case, this function always returns true. In the parallel heap
+ * scan case, this function is called by both worker processes and the leader
+ * process, and could return false.
+ */
+static bool
+do_lazy_scan_heap(LVRelState *vacrel)
+{
+ bool all_visible_according_to_vm;
+ TidStore *dead_items = vacrel->dead_items;
+ VacDeadItemsInfo *dead_items_info = vacrel->dead_items_info;
+ BlockNumber blkno;
+ Buffer vmbuffer = InvalidBuffer;
+ bool scan_done = true;
+
while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
{
- Buffer buf;
- Page page;
- bool has_lpdead_items;
- bool got_cleanup_lock = false;
+ Buffer buf;
+ Page page;
+ bool has_lpdead_items;
+ bool got_cleanup_lock = false;
- vacrel->scanned_pages++;
+ vacrel->counters->scanned_pages++;
/* Report as block scanned, update error traceback information */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
@@ -867,46 +1020,10 @@ lazy_scan_heap(LVRelState *vacrel)
* one-pass strategy, and the two-pass strategy with the index_cleanup
* param set to 'off'.
*/
- if (vacrel->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
+ if (!IsParallelWorker() &&
+ vacrel->counters->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
lazy_check_wraparound_failsafe(vacrel);
- /*
- * Consider if we definitely have enough space to process TIDs on page
- * already. If we are close to overrunning the available space for
- * dead_items TIDs, pause and do a cycle of vacuuming before we tackle
- * this page.
- */
- if (TidStoreMemoryUsage(dead_items) > dead_items_info->max_bytes)
- {
- /*
- * Before beginning index vacuuming, we release any pin we may
- * hold on the visibility map page. This isn't necessary for
- * correctness, but we do it anyway to avoid holding the pin
- * across a lengthy, unrelated operation.
- */
- if (BufferIsValid(vmbuffer))
- {
- ReleaseBuffer(vmbuffer);
- vmbuffer = InvalidBuffer;
- }
-
- /* Perform a round of index and heap vacuuming */
- vacrel->consider_bypass_optimization = false;
- lazy_vacuum(vacrel);
-
- /*
- * Vacuum the Free Space Map to make newly-freed space visible on
- * upper-level FSM pages. Note we have not yet processed blkno.
- */
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
- blkno);
- next_fsm_block_to_vacuum = blkno;
-
- /* Report that we are once again scanning the heap */
- pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_SCAN_HEAP);
- }
-
/*
* Pin the visibility map page in case we need to mark the page
* all-visible. In most cases this will be very cheap, because we'll
@@ -994,10 +1111,14 @@ lazy_scan_heap(LVRelState *vacrel)
* also be no opportunity to update the FSM later, because we'll never
* revisit this page. Since updating the FSM is desirable but not
* absolutely required, that's OK.
+ *
+ * XXX: in parallel heap scan, some blocks before blkno might not have
+ * been processed yet. Is it worth vacuuming the FSM?
*/
- if (vacrel->nindexes == 0
- || !vacrel->do_index_vacuuming
- || !has_lpdead_items)
+ if (!IsParallelWorker() &&
+ (vacrel->nindexes == 0
+ || !vacrel->do_index_vacuuming
+ || !has_lpdead_items))
{
Size freespace = PageGetHeapFreeSpace(page);
@@ -1011,57 +1132,144 @@ lazy_scan_heap(LVRelState *vacrel)
* held the cleanup lock and lazy_scan_prune() was called.
*/
if (got_cleanup_lock && vacrel->nindexes == 0 && has_lpdead_items &&
- blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+ blkno - vacrel->next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
{
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
blkno);
- next_fsm_block_to_vacuum = blkno;
+ vacrel->next_fsm_block_to_vacuum = blkno;
}
}
else
UnlockReleaseBuffer(buf);
+
+ /*
+ * Consider if we definitely have enough space to process TIDs on page
+ * already. If we are close to overrunning the available space for
+ * dead_items TIDs, pause and do a cycle of vacuuming before we tackle
+ * this page.
+ */
+ if (TidStoreMemoryUsage(dead_items) > dead_items_info->max_bytes)
+ {
+ /*
+ * Before beginning index vacuuming, we release any pin we may
+ * hold on the visibility map page. This isn't necessary for
+ * correctness, but we do it anyway to avoid holding the pin
+ * across a lengthy, unrelated operation.
+ */
+ if (BufferIsValid(vmbuffer))
+ {
+ ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
+ }
+
+ if (ParallelHeapVacuumIsActive(vacrel))
+ {
+ /*
+ * In parallel heap vacuum case, both the leader process and the
+ * worker processes have to exit without invoking index and heap
+ * vacuuming. The leader process will wait for all workers to
+ * finish and perform index and heap vacuuming.
+ */
+ scan_done = false;
+ break;
+ }
+
+ /* Perform a round of index and heap vacuuming */
+ vacrel->consider_bypass_optimization = false;
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the Free Space Map to make newly-freed space visible on
+ * upper-level FSM pages.
+ *
+ * XXX: in parallel heap scan, some blocks before blkno might not have
+ * been processed yet. Is it worth vacuuming the FSM?
+ */
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ blkno + 1);
+ vacrel->next_fsm_block_to_vacuum = blkno;
+
+ /* Report that we are once again scanning the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+ continue;
+ }
}
- vacrel->blkno = InvalidBlockNumber;
if (BufferIsValid(vmbuffer))
ReleaseBuffer(vmbuffer);
- /* report that everything is now scanned */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ return scan_done;
+}
- /* now we can compute the new value for pg_class.reltuples */
- vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
- vacrel->scanned_pages,
- vacrel->live_tuples);
+/*
+ * A parallel scan variant of heap_vac_scan_next_block.
+ *
+ * In parallel vacuum scan, we don't use the SKIP_PAGES_THRESHOLD optimization.
+ */
+static bool
+heap_vac_scan_next_block_parallel(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm)
+{
+ PHVState *phvstate = vacrel->phvstate;
+ BlockNumber next_block;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 mapbits = 0;
- /*
- * Also compute the total number of surviving heap entries. In the
- * (unlikely) scenario that new_live_tuples is -1, take it as zero.
- */
- vacrel->new_rel_tuples =
- Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
- vacrel->missed_dead_tuples;
+ Assert(ParallelHeapVacuumIsActive(vacrel));
- /*
- * Do index vacuuming (call each index's ambulkdelete routine), then do
- * related heap vacuuming
- */
- if (dead_items_info->num_items > 0)
- lazy_vacuum(vacrel);
+ for (;;)
+ {
+ next_block = table_block_parallelscan_nextpage(vacrel->rel,
+ &(phvstate->myscanstate->state),
+ phvstate->pscandesc);
- /*
- * Vacuum the remainder of the Free Space Map. We must do this whether or
- * not there were indexes, and whether or not we bypassed index vacuuming.
- */
- if (blkno > next_fsm_block_to_vacuum)
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno);
+ /* Have we reached the end of the table? */
+ if (!BlockNumberIsValid(next_block) || next_block >= vacrel->rel_pages)
+ {
+ if (BufferIsValid(vmbuffer))
+ ReleaseBuffer(vmbuffer);
- /* report all blocks vacuumed */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
+ *blkno = vacrel->rel_pages;
+ return false;
+ }
- /* Do final index cleanup (call each index's amvacuumcleanup routine) */
- if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
- lazy_cleanup_all_indexes(vacrel);
+ /* We always treat the last block as unsafe to skip */
+ if (next_block == vacrel->rel_pages - 1)
+ break;
+
+ mapbits = visibilitymap_get_status(vacrel->rel, next_block, &vmbuffer);
+
+ /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
+ if (!vacrel->skipwithvm)
+ break;
+
+ /*
+ * Aggressive VACUUM caller can't skip pages just because they are
+ * all-visible.
+ */
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+
+ if (vacrel->aggressive)
+ break;
+
+ /*
+ * All-visible block is safe to skip in non-aggressive case. But
+ * remember that the final range contains such a block for later.
+ */
+ vacrel->counters->skippedallvis = true;
+ }
+ }
+
+ if (BufferIsValid(vmbuffer))
+ ReleaseBuffer(vmbuffer);
+
+ *blkno = next_block;
+ *all_visible_according_to_vm = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
+
+ return true;
}
/*
@@ -1088,6 +1296,9 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
{
BlockNumber next_block;
+ if (ParallelHeapVacuumIsActive(vacrel))
+ return heap_vac_scan_next_block_parallel(vacrel, blkno, all_visible_according_to_vm);
+
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1137,7 +1348,7 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
{
next_block = vacrel->next_unskippable_block;
if (skipsallvis)
- vacrel->skippedallvis = true;
+ vacrel->counters->skippedallvis = true;
}
}
@@ -1210,7 +1421,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
/*
* Caller must scan the last page to determine whether it has tuples
- * (caller must have the opportunity to set vacrel->nonempty_pages).
+ * (caller must have the opportunity to set vacrel->counters->nonempty_pages).
* This rule avoids having lazy_truncate_heap() take access-exclusive
* lock on rel to attempt a truncation that fails anyway, just because
* there are tuples on the last page (it is likely that there will be
@@ -1439,10 +1650,10 @@ lazy_scan_prune(LVRelState *vacrel,
heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
- &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
+ &vacrel->counters->NewRelfrozenXid, &vacrel->counters->NewRelminMxid);
Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
- Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
+ Assert(TransactionIdIsValid(vacrel->counters->NewRelfrozenXid));
if (presult.nfrozen > 0)
{
@@ -1451,7 +1662,7 @@ lazy_scan_prune(LVRelState *vacrel,
* nfrozen == 0, since it only counts pages with newly frozen tuples
* (don't confuse that with pages newly set all-frozen in VM).
*/
- vacrel->frozen_pages++;
+ vacrel->counters->frozen_pages++;
}
/*
@@ -1486,7 +1697,7 @@ lazy_scan_prune(LVRelState *vacrel,
*/
if (presult.lpdead_items > 0)
{
- vacrel->lpdead_item_pages++;
+ vacrel->counters->lpdead_item_pages++;
/*
* deadoffsets are collected incrementally in
@@ -1501,15 +1712,15 @@ lazy_scan_prune(LVRelState *vacrel,
}
/* Finally, add page-local counts to whole-VACUUM counts */
- vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += presult.nfrozen;
- vacrel->lpdead_items += presult.lpdead_items;
- vacrel->live_tuples += presult.live_tuples;
- vacrel->recently_dead_tuples += presult.recently_dead_tuples;
+ vacrel->counters->tuples_deleted += presult.ndeleted;
+ vacrel->counters->tuples_frozen += presult.nfrozen;
+ vacrel->counters->lpdead_items += presult.lpdead_items;
+ vacrel->counters->live_tuples += presult.live_tuples;
+ vacrel->counters->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
if (presult.hastup)
- vacrel->nonempty_pages = blkno + 1;
+ vacrel->counters->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
@@ -1659,8 +1870,8 @@ lazy_scan_noprune(LVRelState *vacrel,
missed_dead_tuples;
bool hastup;
HeapTupleHeader tupleheader;
- TransactionId NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- MultiXactId NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ TransactionId NoFreezePageRelfrozenXid = vacrel->counters->NewRelfrozenXid;
+ MultiXactId NoFreezePageRelminMxid = vacrel->counters->NewRelminMxid;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1787,8 +1998,8 @@ lazy_scan_noprune(LVRelState *vacrel,
* this particular page until the next VACUUM. Remember its details now.
* (lazy_scan_prune expects a clean slate, so we have to do this last.)
*/
- vacrel->NewRelfrozenXid = NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = NoFreezePageRelminMxid;
+ vacrel->counters->NewRelfrozenXid = NoFreezePageRelfrozenXid;
+ vacrel->counters->NewRelminMxid = NoFreezePageRelminMxid;
/* Save any LP_DEAD items found on the page in dead_items */
if (vacrel->nindexes == 0)
@@ -1815,25 +2026,25 @@ lazy_scan_noprune(LVRelState *vacrel,
* indexes will be deleted during index vacuuming (and then marked
* LP_UNUSED in the heap)
*/
- vacrel->lpdead_item_pages++;
+ vacrel->counters->lpdead_item_pages++;
dead_items_add(vacrel, blkno, deadoffsets, lpdead_items);
- vacrel->lpdead_items += lpdead_items;
+ vacrel->counters->lpdead_items += lpdead_items;
}
/*
* Finally, add relevant page-local counts to whole-VACUUM counts
*/
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
- vacrel->missed_dead_tuples += missed_dead_tuples;
+ vacrel->counters->live_tuples += live_tuples;
+ vacrel->counters->recently_dead_tuples += recently_dead_tuples;
+ vacrel->counters->missed_dead_tuples += missed_dead_tuples;
if (missed_dead_tuples > 0)
- vacrel->missed_dead_pages++;
+ vacrel->counters->missed_dead_pages++;
/* Can't truncate this page */
if (hastup)
- vacrel->nonempty_pages = blkno + 1;
+ vacrel->counters->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
*has_lpdead_items = (lpdead_items > 0);
@@ -1862,7 +2073,7 @@ lazy_vacuum(LVRelState *vacrel)
/* Should not end up here with no indexes */
Assert(vacrel->nindexes > 0);
- Assert(vacrel->lpdead_item_pages > 0);
+ Assert(vacrel->counters->lpdead_item_pages > 0);
if (!vacrel->do_index_vacuuming)
{
@@ -1896,7 +2107,7 @@ lazy_vacuum(LVRelState *vacrel)
BlockNumber threshold;
Assert(vacrel->num_index_scans == 0);
- Assert(vacrel->lpdead_items == vacrel->dead_items_info->num_items);
+ Assert(vacrel->counters->lpdead_items == vacrel->dead_items_info->num_items);
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
@@ -1923,7 +2134,7 @@ lazy_vacuum(LVRelState *vacrel)
* cases then this may need to be reconsidered.
*/
threshold = (double) vacrel->rel_pages * BYPASS_THRESHOLD_PAGES;
- bypass = (vacrel->lpdead_item_pages < threshold &&
+ bypass = (vacrel->counters->lpdead_item_pages < threshold &&
(TidStoreMemoryUsage(vacrel->dead_items) < (32L * 1024L * 1024L)));
}
@@ -2061,7 +2272,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
* place).
*/
Assert(vacrel->num_index_scans > 0 ||
- vacrel->dead_items_info->num_items == vacrel->lpdead_items);
+ vacrel->dead_items_info->num_items == vacrel->counters->lpdead_items);
Assert(allindexes || VacuumFailsafeActive);
/*
@@ -2165,8 +2376,8 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
* the second heap pass. No more, no less.
*/
Assert(vacrel->num_index_scans > 1 ||
- (vacrel->dead_items_info->num_items == vacrel->lpdead_items &&
- vacuumed_pages == vacrel->lpdead_item_pages));
+ (vacrel->dead_items_info->num_items == vacrel->counters->lpdead_items &&
+ vacuumed_pages == vacrel->counters->lpdead_item_pages));
ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
@@ -2347,7 +2558,7 @@ static void
lazy_cleanup_all_indexes(LVRelState *vacrel)
{
double reltuples = vacrel->new_rel_tuples;
- bool estimated_count = vacrel->scanned_pages < vacrel->rel_pages;
+ bool estimated_count = vacrel->counters->scanned_pages < vacrel->rel_pages;
const int progress_start_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_INDEXES_TOTAL
@@ -2528,7 +2739,7 @@ should_attempt_truncation(LVRelState *vacrel)
if (!vacrel->do_rel_truncate || VacuumFailsafeActive)
return false;
- possibly_freeable = vacrel->rel_pages - vacrel->nonempty_pages;
+ possibly_freeable = vacrel->rel_pages - vacrel->counters->nonempty_pages;
if (possibly_freeable > 0 &&
(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
possibly_freeable >= vacrel->rel_pages / REL_TRUNCATE_FRACTION))
@@ -2554,7 +2765,7 @@ lazy_truncate_heap(LVRelState *vacrel)
/* Update error traceback information one last time */
update_vacuum_error_info(vacrel, NULL, VACUUM_ERRCB_PHASE_TRUNCATE,
- vacrel->nonempty_pages, InvalidOffsetNumber);
+ vacrel->counters->nonempty_pages, InvalidOffsetNumber);
/*
* Loop until no more truncating can be done.
@@ -2655,7 +2866,7 @@ lazy_truncate_heap(LVRelState *vacrel)
* without also touching reltuples, since the tuple count wasn't
* changed by the truncation.
*/
- vacrel->removed_pages += orig_rel_pages - new_rel_pages;
+ vacrel->counters->removed_pages += orig_rel_pages - new_rel_pages;
vacrel->rel_pages = new_rel_pages;
ereport(vacrel->verbose ? INFO : DEBUG2,
@@ -2663,7 +2874,7 @@ lazy_truncate_heap(LVRelState *vacrel)
vacrel->relname,
orig_rel_pages, new_rel_pages)));
orig_rel_pages = new_rel_pages;
- } while (new_rel_pages > vacrel->nonempty_pages && lock_waiter_detected);
+ } while (new_rel_pages > vacrel->counters->nonempty_pages && lock_waiter_detected);
}
/*
@@ -2691,7 +2902,7 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
StaticAssertStmt((PREFETCH_SIZE & (PREFETCH_SIZE - 1)) == 0,
"prefetch size must be power of 2");
prefetchedUntil = InvalidBlockNumber;
- while (blkno > vacrel->nonempty_pages)
+ while (blkno > vacrel->counters->nonempty_pages)
{
Buffer buf;
Page page;
@@ -2803,7 +3014,7 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
* pages still are; we need not bother to look at the last known-nonempty
* page.
*/
- return vacrel->nonempty_pages;
+ return vacrel->counters->nonempty_pages;
}
/*
@@ -2821,12 +3032,8 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
autovacuum_work_mem != -1 ?
autovacuum_work_mem : maintenance_work_mem;
- /*
- * Initialize state for a parallel vacuum. As of now, only one worker can
- * be used for an index, so we invoke parallelism only if there are at
- * least two indexes on a table.
- */
- if (nworkers >= 0 && vacrel->nindexes > 1 && vacrel->do_index_vacuuming)
+ /* Initialize state for a parallel vacuum */
+ if (nworkers >= 0)
{
/*
* Since parallel workers cannot access data in temporary tables, we
@@ -2844,11 +3051,18 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
+ {
+ /*
+ * For parallel index vacuuming, only one worker can be used for an
+ * index, so we invoke parallelism only if there are at least two
+ * indexes on a table.
+ */
vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
vacrel->nindexes, nworkers,
vac_work_mem,
vacrel->verbose ? INFO : DEBUG2,
- vacrel->bstrategy);
+ vacrel->bstrategy, (void *) vacrel);
+ }
/*
* If parallel mode started, dead_items and dead_items_info spaces are
@@ -2889,9 +3103,19 @@ dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *offsets,
};
int64 prog_val[2];
+ /*
+ * Protect both dead_items and dead_items_info from concurrent updates
+ * in parallel heap scan cases.
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ TidStoreLockExclusive(dead_items);
+
TidStoreSetBlockOffsets(dead_items, blkno, offsets, num_offsets);
vacrel->dead_items_info->num_items += num_offsets;
+ if (ParallelHeapVacuumIsActive(vacrel))
+ TidStoreUnlock(dead_items);
+
/* update the progress information */
prog_val[0] = vacrel->dead_items_info->num_items;
prog_val[1] = TidStoreMemoryUsage(dead_items);
@@ -3093,6 +3317,359 @@ update_relstats_all_indexes(LVRelState *vacrel)
}
}
+/*
+ * Compute the number of parallel workers for parallel vacuum heap scan.
+ *
+ * The calculation logic is borrowed from compute_parallel_worker().
+ */
+int
+heap_parallel_vacuum_compute_workers(Relation rel, int nrequested)
+{
+ int parallel_workers = 0;
+ int heap_parallel_threshold;
+ int heap_pages;
+
+ if (nrequested == 0)
+ {
+ /*
+ * Select the number of workers based on the log of the size of
+ * the relation. This probably needs to be a good deal more
+ * sophisticated, but we need something here for now. Note that
+ * the upper limit of the min_parallel_table_scan_size GUC is
+ * chosen to prevent overflow here.
+ */
+ heap_parallel_threshold = Max(min_parallel_table_scan_size, 1);
+ heap_pages = RelationGetNumberOfBlocks(rel);
+ while (heap_pages >= (BlockNumber) (heap_parallel_threshold * 3))
+ {
+ parallel_workers++;
+ heap_parallel_threshold *= 3;
+ if (heap_parallel_threshold > INT_MAX / 3)
+ break;
+ }
+ }
+ else
+ parallel_workers = nrequested;
+
+ return parallel_workers;
+}
+
+/*
+ * Compute the amount of space we'll need in the parallel heap vacuum
+ * DSM, and inform pcxt->estimator about our needs.
+ *
+ * nworkers is the number of workers for the table vacuum. Note that it could
+ * differ from pcxt->nworkers, since pcxt->nworkers is the maximum of the numbers
+ * of workers for table vacuum and index vacuum.
+ */
+void
+heap_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state)
+{
+ Size size = 0;
+ LVRelState *vacrel = (LVRelState *) state;
+
+ /* space for PHVShared */
+ size = add_size(size, SizeOfPHVShared);
+ size = add_size(size, mul_size(sizeof(LVRelCounters), nworkers));
+ vacrel->shared_len = size;
+ shm_toc_estimate_chunk(&pcxt->estimator, size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* space for ParallelBlockTableScanDesc */
+ vacrel->pscan_len = table_block_parallelscan_estimate(rel);
+ shm_toc_estimate_chunk(&pcxt->estimator, vacrel->pscan_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* space for per-worker scan state, PHVScanWorkerState */
+ vacrel->pscanwork_len = mul_size(sizeof(PHVScanWorkerState), nworkers);
+ shm_toc_estimate_chunk(&pcxt->estimator, vacrel->pscanwork_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/*
+ * Set up shared memory for parallel heap vacuum.
+ */
+void
+heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state)
+{
+ LVRelState *vacrel = (LVRelState *) state;
+ ParallelBlockTableScanDesc pscan;
+ PHVScanWorkerState *pscanwork;
+ PHVShared *shared;
+ PHVState *phvstate;
+
+ phvstate = (PHVState *) palloc(sizeof(PHVState));
+
+ shared = shm_toc_allocate(pcxt->toc, vacrel->shared_len);
+
+ /* Prepare the shared information */
+
+ MemSet(shared, 0, vacrel->shared_len);
+ shared->aggressive = vacrel->aggressive;
+ shared->skipwithvm = vacrel->skipwithvm;
+ shared->cutoffs = vacrel->cutoffs;
+ shared->NewRelfrozenXid = vacrel->counters->NewRelfrozenXid;
+ shared->NewRelminMxid = vacrel->counters->NewRelminMxid;
+ shared->skippedallvis = vacrel->counters->skippedallvis;
+
+ /*
+ * XXX: we copy the contents of vistest to the shared area, but in order to do
+ * that, we need to either expose GlobalVisTest or provide functions to copy the
+ * contents of GlobalVisTest somewhere. Currently we do the former, but I'm not
+ * sure it's the best choice.
+ *
+ * An alternative idea is to have each worker determine the cutoff and have its
+ * own vistest. But we need to consider that carefully, since parallel workers
+ * would end up having different cutoffs and horizons.
+ */
+ shared->vistest = *vacrel->vistest;
+
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_SCAN_SHARED, shared);
+
+ phvstate->shared = shared;
+
+ /* prepare the parallel block table scan description */
+ pscan = shm_toc_allocate(pcxt->toc, vacrel->pscan_len);
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_SCAN_DESC, pscan);
+
+ /* initialize parallel scan description */
+ table_block_parallelscan_initialize(rel, (ParallelTableScanDesc) pscan);
+ phvstate->pscandesc = pscan;
+
+ /* prepare the workers' parallel block table scan state */
+ pscanwork = shm_toc_allocate(pcxt->toc, vacrel->pscanwork_len);
+ MemSet(pscanwork, 0, vacrel->pscanwork_len);
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_SCAN_DESC_WORKER, pscanwork);
+ phvstate->scanstates = pscanwork;
+
+ vacrel->phvstate = phvstate;
+}
+
+/*
+ * Main function for parallel heap vacuum workers.
+ */
+void
+heap_parallel_vacuum_scan_worker(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt)
+{
+ LVRelState vacrel = {0};
+ PHVState *phvstate;
+ PHVShared *shared;
+ ParallelBlockTableScanDesc pscandesc;
+ PHVScanWorkerState *scanstate;
+ LVRelCounters *counters;
+ bool scan_done;
+
+ phvstate = palloc(sizeof(PHVState));
+
+ pscandesc = (ParallelBlockTableScanDesc) shm_toc_lookup(pwcxt->toc,
+ LV_PARALLEL_SCAN_DESC,
+ false);
+ phvstate->pscandesc = pscandesc;
+
+ shared = (PHVShared *) shm_toc_lookup(pwcxt->toc, LV_PARALLEL_SCAN_SHARED,
+ false);
+ phvstate->shared = shared;
+
+ scanstate = (PHVScanWorkerState *) shm_toc_lookup(pwcxt->toc,
+ LV_PARALLEL_SCAN_DESC_WORKER,
+ false);
+
+ phvstate->myscanstate = &(scanstate[ParallelWorkerNumber]);
+ counters = &(shared->worker_relcnts[ParallelWorkerNumber]);
+
+ /* Prepare LVRelState */
+ vacrel.rel = rel;
+ vacrel.indrels = parallel_vacuum_get_table_indexes(pvs, &vacrel.nindexes);
+ vacrel.pvs = pvs;
+ vacrel.phvstate = phvstate;
+ vacrel.aggressive = shared->aggressive;
+ vacrel.skipwithvm = shared->skipwithvm;
+ vacrel.cutoffs = shared->cutoffs;
+ vacrel.vistest = &(shared->vistest);
+ vacrel.dead_items = parallel_vacuum_get_dead_items(pvs,
+ &vacrel.dead_items_info);
+ vacrel.rel_pages = RelationGetNumberOfBlocks(rel);
+ vacrel.counters = counters;
+
+ /* initialize per-worker relation statistics */
+ MemSet(counters, 0, sizeof(LVRelCounters));
+
+ vacrel.counters->NewRelfrozenXid = shared->NewRelfrozenXid;
+ vacrel.counters->NewRelminMxid = shared->NewRelminMxid;
+ vacrel.counters->skippedallvis = shared->skippedallvis;
+
+ /*
+ * XXX: the following fields are not set yet:
+ * - index vacuum related fields such as consider_bypass_optimization,
+ * do_index_vacuuming etc.
+ * - error reporting state.
+ * - statistics such as scanned_pages etc.
+ * - oldest extant XID/MXID.
+ * - states maintained by heap_vac_scan_next_block()
+ */
+
+ /* Initialize the start block, if not done yet */
+ if (!phvstate->myscanstate->maybe_have_blocks)
+ {
+ table_block_parallelscan_startblock_init(rel,
+ &(phvstate->myscanstate->state),
+ phvstate->pscandesc);
+
+ phvstate->myscanstate->maybe_have_blocks = false;
+ }
+
+ /*
+ * XXX: if we want to support parallel heap *vacuum*, we need to allow
+ * workers to call different function based on the shared information.
+ */
+ scan_done = do_lazy_scan_heap(&vacrel);
+
+ phvstate->myscanstate->maybe_have_blocks = !scan_done;
+}
+
+/*
+ * Complete parallel heap scans that have remaining blocks in their
+ * chunks.
+ */
+static void
+parallel_heap_complete_unfinished_scan(LVRelState *vacrel)
+{
+ int nworkers;
+
+ Assert(!IsParallelWorker());
+
+ nworkers = parallel_vacuum_get_nworkers_table(vacrel->pvs);
+
+ for (int i = 0; i < nworkers; i++)
+ {
+ PHVScanWorkerState *wstate = &(vacrel->phvstate->scanstates[i]);
+ bool scan_done PG_USED_FOR_ASSERTS_ONLY;
+
+ if (!wstate->maybe_have_blocks)
+ continue;
+
+ vacrel->phvstate->myscanstate = wstate;
+
+ scan_done = do_lazy_scan_heap(vacrel);
+
+ Assert(scan_done);
+ }
+}
+
+/*
+ * Accumulate relation counters that parallel workers collected into the
+ * leader's counters.
+ */
+static void
+parallel_heap_vacuum_gather_scan_stats(LVRelState *vacrel)
+{
+ PHVState *phvstate = vacrel->phvstate;
+
+ Assert(ParallelHeapVacuumIsActive(vacrel));
+
+ for (int i = 0; i < phvstate->nworkers_launched; i++)
+ {
+ LVRelCounters *counters = &(phvstate->shared->worker_relcnts[i]);
+
+#define LV_ACCUM_ITEM(item) (vacrel)->counters->item += (counters)->item
+
+ LV_ACCUM_ITEM(scanned_pages);
+ LV_ACCUM_ITEM(removed_pages);
+ LV_ACCUM_ITEM(frozen_pages);
+ LV_ACCUM_ITEM(lpdead_item_pages);
+ LV_ACCUM_ITEM(missed_dead_pages);
+ LV_ACCUM_ITEM(nonempty_pages);
+ LV_ACCUM_ITEM(tuples_deleted);
+ LV_ACCUM_ITEM(tuples_frozen);
+ LV_ACCUM_ITEM(lpdead_items);
+ LV_ACCUM_ITEM(live_tuples);
+ LV_ACCUM_ITEM(recently_dead_tuples);
+ LV_ACCUM_ITEM(missed_dead_tuples);
+
+#undef LV_ACCUM_ITEM
+
+ if (TransactionIdPrecedes(counters->NewRelfrozenXid, vacrel->counters->NewRelfrozenXid))
+ vacrel->counters->NewRelfrozenXid = counters->NewRelfrozenXid;
+
+ if (MultiXactIdPrecedesOrEquals(counters->NewRelminMxid, vacrel->counters->NewRelminMxid))
+ vacrel->counters->NewRelminMxid = counters->NewRelminMxid;
+
+ if (!vacrel->counters->skippedallvis && counters->skippedallvis)
+ vacrel->counters->skippedallvis = true;
+ }
+}
+
+/*
+ * A parallel variant of do_lazy_scan_heap(). The leader process launches parallel
+ * workers to scan the heap in parallel.
+ */
+static void
+do_parallel_lazy_scan_heap(LVRelState *vacrel)
+{
+ PHVScanWorkerState *scanstate;
+ TidStore *dead_items = vacrel->dead_items;
+ VacDeadItemsInfo *dead_items_info = vacrel->dead_items_info;
+
+ Assert(ParallelHeapVacuumIsActive(vacrel));
+ Assert(!IsParallelWorker());
+
+ /* launch parallel workers */
+ vacrel->phvstate->nworkers_launched = parallel_vacuum_table_scan_begin(vacrel->pvs);
+
+ /* initialize parallel scan description to join as a worker */
+ scanstate = palloc(sizeof(PHVScanWorkerState));
+ table_block_parallelscan_startblock_init(vacrel->rel, &(scanstate->state),
+ vacrel->phvstate->pscandesc);
+ vacrel->phvstate->myscanstate = scanstate;
+
+ for (;;)
+ {
+ bool scan_done PG_USED_FOR_ASSERTS_ONLY;
+
+ /*
+ * Scan the table until either we are close to overrunning the available
+ * space for dead_items TIDs or we reach the end of the table.
+ */
+ scan_done = do_lazy_scan_heap(vacrel);
+
+ /* stop parallel workers and gather the collected stats */
+ parallel_vacuum_table_scan_end(vacrel->pvs);
+ parallel_heap_vacuum_gather_scan_stats(vacrel);
+
+ /*
+ * If we are close to overrunning the available space for dead_items
+ * TIDs, do a cycle of index and heap vacuuming before relaunching the
+ * workers and resuming the scan.
+ */
+ if (TidStoreMemoryUsage(dead_items) > dead_items_info->max_bytes)
+ {
+ /* Perform a round of index and heap vacuuming */
+ vacrel->consider_bypass_optimization = false;
+ lazy_vacuum(vacrel);
+
+ /* Report that we are once again scanning the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+ /* re-launch parallel workers */
+ vacrel->phvstate->nworkers_launched =
+ parallel_vacuum_table_scan_begin(vacrel->pvs);
+
+ continue;
+ }
+
+ /* We reached the end of the table */
+ Assert(scan_done);
+ break;
+ }
+
+ parallel_heap_complete_unfinished_scan(vacrel);
+}
+
/*
* Error context callback for errors occurring during vacuum. The error
* context messages for index phases should match the messages set in parallel
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index f26070bff2..968addf94f 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -28,6 +28,7 @@
#include "access/amapi.h"
#include "access/table.h"
+#include "access/tableam.h"
#include "access/xact.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
@@ -64,6 +65,12 @@ typedef struct PVShared
Oid relid;
int elevel;
+ /*
+ * True if the caller wants parallel workers to invoke the vacuum table
+ * scan callback.
+ */
+ bool do_vacuum_table_scan;
+
/*
* Fields for both index vacuum and cleanup.
*
@@ -163,6 +170,9 @@ struct ParallelVacuumState
/* NULL for worker processes */
ParallelContext *pcxt;
+ /* Passed to parallel table scan workers. NULL for leader process */
+ ParallelWorkerContext *pwcxt;
+
/* Parent Heap Relation */
Relation heaprel;
@@ -192,6 +202,16 @@ struct ParallelVacuumState
/* Points to WAL usage area in DSM */
WalUsage *wal_usage;
+ /*
+ * The number of workers for parallel table scan/vacuuming and index vacuuming,
+ * respectively.
+ */
+ int nworkers_for_table;
+ int nworkers_for_index;
+
+ /* How many times has the parallel table vacuum scan been called? */
+ int num_table_scans;
+
/*
* False if the index is totally unsuitable target for all parallel
* processing. For example, the index could be <
@@ -220,8 +240,9 @@ struct ParallelVacuumState
PVIndVacStatus status;
};
-static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
- bool *will_parallel_vacuum);
+static void parallel_vacuum_compute_workers(Relation rel, Relation *indrels, int nindexes,
+ int nrequested, int *nworkers_table,
+ int *nworkers_index, bool *will_parallel_vacuum);
static void parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scans,
bool vacuum);
static void parallel_vacuum_process_safe_indexes(ParallelVacuumState *pvs);
@@ -241,7 +262,7 @@ static void parallel_vacuum_error_callback(void *arg);
ParallelVacuumState *
parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
int nrequested_workers, int vac_work_mem,
- int elevel, BufferAccessStrategy bstrategy)
+ int elevel, BufferAccessStrategy bstrategy, void *state)
{
ParallelVacuumState *pvs;
ParallelContext *pcxt;
@@ -255,6 +276,8 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
Size est_shared_len;
int nindexes_mwm = 0;
int parallel_workers = 0;
+ int nworkers_table;
+ int nworkers_index;
int querylen;
/*
@@ -262,15 +285,17 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
* relation
*/
Assert(nrequested_workers >= 0);
- Assert(nindexes > 0);
/*
* Compute the number of parallel vacuum workers to launch
*/
will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = parallel_vacuum_compute_workers(indrels, nindexes,
- nrequested_workers,
- will_parallel_vacuum);
+ parallel_vacuum_compute_workers(rel, indrels, nindexes, nrequested_workers,
+ &nworkers_table, &nworkers_index,
+ will_parallel_vacuum);
+
+ parallel_workers = Max(nworkers_table, nworkers_index);
+
if (parallel_workers <= 0)
{
/* Can't perform vacuum in parallel -- return NULL */
@@ -284,6 +309,8 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
pvs->will_parallel_vacuum = will_parallel_vacuum;
pvs->bstrategy = bstrategy;
pvs->heaprel = rel;
+ pvs->nworkers_for_table = nworkers_table;
+ pvs->nworkers_for_index = nworkers_index;
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
@@ -326,6 +353,10 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
else
querylen = 0; /* keep compiler quiet */
+ /* Estimate AM-specific space for parallel table vacuum */
+ if (nworkers_table > 0)
+ table_parallel_vacuum_estimate(rel, pcxt, nworkers_table, state);
+
InitializeParallelDSM(pcxt);
/* Prepare index vacuum stats */
@@ -417,6 +448,10 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
}
+ /* Prepare AM-specific DSM for parallel table vacuum */
+ if (nworkers_table > 0)
+ table_parallel_vacuum_initialize(rel, pcxt, nworkers_table, state);
+
/* Success -- return parallel vacuum state */
return pvs;
}
@@ -538,27 +573,41 @@ parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tup
* min_parallel_index_scan_size as invoking workers for very small indexes
* can hurt performance.
*
+ * XXX needs to mention the number of workers for the table.
+ *
* nrequested is the number of parallel workers that user requested. If
* nrequested is 0, we compute the parallel degree based on nindexes, that is
* the number of indexes that support parallel vacuum. This function also
* sets will_parallel_vacuum to remember indexes that participate in parallel
* vacuum.
*/
-static int
-parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
- bool *will_parallel_vacuum)
+static void
+parallel_vacuum_compute_workers(Relation rel, Relation *indrels, int nindexes,
+ int nrequested, int *nworkers_table,
+ int *nworkers_index, bool *will_parallel_vacuum)
{
int nindexes_parallel = 0;
int nindexes_parallel_bulkdel = 0;
int nindexes_parallel_cleanup = 0;
- int parallel_workers;
+ int parallel_workers_table = 0;
+ int parallel_workers_index = 0;
+
+ *nworkers_table = 0;
+ *nworkers_index = 0;
/*
* We don't allow performing parallel operation in standalone backend or
* when parallelism is disabled.
*/
if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
+ return;
+
+ /*
+ * Compute the number of workers for parallel table scan. Cap by
+ * max_parallel_maintenance_workers.
+ */
+ parallel_workers_table = Min(table_parallel_vacuum_compute_workers(rel, nrequested),
+ max_parallel_maintenance_workers);
/*
* Compute the number of indexes that can participate in parallel vacuum.
@@ -589,17 +638,18 @@ parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
nindexes_parallel--;
/* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
+ if (nindexes_parallel > 0)
+ {
+ /* Compute the parallel degree for parallel index vacuum */
+ parallel_workers_index = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers_index = Min(parallel_workers_index, max_parallel_maintenance_workers);
+ }
- return parallel_workers;
+ *nworkers_table = parallel_workers_table;
+ *nworkers_index = parallel_workers_index;
}
/*
@@ -669,7 +719,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
if (nworkers > 0)
{
/* Reinitialize parallel context to relaunch parallel workers */
- if (num_index_scans > 0)
+ if (num_index_scans > 0 || pvs->num_table_scans > 0)
ReinitializeParallelDSM(pvs->pcxt);
/*
@@ -978,6 +1028,120 @@ parallel_vacuum_index_is_parallel_safe(Relation indrel, int num_index_scans,
return true;
}
+/*
+ * A parallel worker invokes the table-AM-specific vacuum scan callback.
+ */
+static void
+parallel_vacuum_process_table(ParallelVacuumState *pvs)
+{
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ /* Do table vacuum scan */
+ table_parallel_vacuum_scan(pvs->heaprel, pvs, pvs->pwcxt);
+
+ /*
+ * We have completed the table vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Prepare DSM and vacuum delay, and launch parallel workers for parallel
+ * table vacuum scan.
+ */
+int
+parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ if (pvs->nworkers_for_table == 0)
+ return 0;
+
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ pvs->shared->do_vacuum_table_scan = true;
+
+ if (pvs->num_table_scans > 0)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ ReinitializeParallelWorkers(pvs->pcxt, pvs->nworkers_for_table);
+
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have
+ * already accumulated the remaining balance of heap.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+ }
+
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for table scanning (planned: %d)",
+ "launched %d parallel vacuum workers for table scanning (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, pvs->nworkers_for_table)));
+
+ return pvs->pcxt->nworkers_launched;
+}
+
+/*
+ * Wait for all parallel table vacuum scan workers to finish, and gather statistics.
+ */
+void
+parallel_vacuum_table_scan_end(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ if (pvs->nworkers_for_table == 0)
+ return;
+
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+
+ /*
+ * Carry the shared balance value to heap scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+
+ pvs->shared->do_vacuum_table_scan = false;
+ pvs->num_table_scans++;
+}
+
+Relation *
+parallel_vacuum_get_table_indexes(ParallelVacuumState *pvs, int *nindexes)
+{
+ *nindexes = pvs->nindexes;
+
+ return pvs->indrels;
+}
+
+int
+parallel_vacuum_get_nworkers_table(ParallelVacuumState *pvs)
+{
+ return pvs->nworkers_for_table;
+}
+
/*
* Perform work within a launched parallel process.
*
@@ -1026,7 +1190,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
* matched to the leader's one.
*/
vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
if (shared->maintenance_work_mem_worker > 0)
maintenance_work_mem = shared->maintenance_work_mem_worker;
@@ -1060,6 +1223,10 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
pvs.relname = pstrdup(RelationGetRelationName(rel));
pvs.heaprel = rel;
+ pvs.pwcxt = palloc(sizeof(ParallelWorkerContext));
+ pvs.pwcxt->toc = toc;
+ pvs.pwcxt->seg = seg;
+
/* These fields will be filled during index vacuum or cleanup */
pvs.indname = NULL;
pvs.status = PARALLEL_INDVAC_STATUS_INITIAL;
@@ -1077,8 +1244,15 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
/* Prepare to track buffer usage during parallel execution */
InstrStartParallelQuery();
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&pvs);
+ if (pvs.shared->do_vacuum_table_scan)
+ {
+ parallel_vacuum_process_table(&pvs);
+ }
+ else
+ {
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_process_safe_indexes(&pvs);
+ }
/* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index d5165aa0d9..37035cc186 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -164,15 +164,6 @@ typedef struct ProcArrayStruct
*
* The typedef is in the header.
*/
-struct GlobalVisState
-{
- /* XIDs >= are considered running by some backend */
- FullTransactionId definitely_needed;
-
- /* XIDs < are not considered to be running by any backend */
- FullTransactionId maybe_needed;
-};
-
/*
* Result of ComputeXidHorizons().
*/
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9e9aec88a6..6c5e48e478 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -20,6 +20,7 @@
#include "access/skey.h"
#include "access/table.h" /* for backward compatibility */
#include "access/tableam.h"
+#include "commands/vacuum.h"
#include "nodes/lockoptions.h"
#include "nodes/primnodes.h"
#include "storage/bufpage.h"
@@ -393,6 +394,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
+extern int heap_parallel_vacuum_compute_workers(Relation rel, int requested);
+extern void heap_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state);
+extern void heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state);
+extern void heap_parallel_vacuum_scan_worker(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8e583b45cd..b10b047ca1 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -20,6 +20,7 @@
#include "access/relscan.h"
#include "access/sdir.h"
#include "access/xact.h"
+#include "commands/vacuum.h"
#include "executor/tuptable.h"
#include "storage/read_stream.h"
#include "utils/rel.h"
@@ -655,6 +656,46 @@ typedef struct TableAmRoutine
struct VacuumParams *params,
BufferAccessStrategy bstrategy);
+ /* ------------------------------------------------------------------------
+ * Callbacks for parallel table vacuum.
+ * ------------------------------------------------------------------------
+ */
+
+ /*
+ * Compute the number of parallel workers for parallel table vacuum.
+ * The function must return 0 to disable parallel table vacuum.
+ */
+ int (*parallel_vacuum_compute_workers) (Relation rel, int requested);
+
+ /*
+ * Compute the amount of DSM space the AM needs for parallel table vacuum.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_estimate) (Relation rel,
+ ParallelContext *pcxt,
+ int nworkers,
+ void *state);
+
+ /*
+ * Initialize DSM space for parallel table vacuum.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_initialize) (Relation rel,
+ ParallelContext *pctx,
+ int nworkers,
+ void *state);
+
+ /*
+ * This callback is called for parallel table vacuum workers.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_scan_worker) (Relation rel,
+ ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt);
+
/*
* Prepare to analyze the next block in the read stream. Returns false if
* the stream is exhausted and true otherwise. The scan must have been
@@ -1720,6 +1761,33 @@ table_relation_vacuum(Relation rel, struct VacuumParams *params,
rel->rd_tableam->relation_vacuum(rel, params, bstrategy);
}
+static inline int
+table_parallel_vacuum_compute_workers(Relation rel, int requested)
+{
+ return rel->rd_tableam->parallel_vacuum_compute_workers(rel, requested);
+}
+
+static inline void
+table_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt, int nworkers,
+ void *state)
+{
+ rel->rd_tableam->parallel_vacuum_estimate(rel, pcxt, nworkers, state);
+}
+
+static inline void
+table_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt, int nworkers,
+ void *state)
+{
+ rel->rd_tableam->parallel_vacuum_initialize(rel, pcxt, nworkers, state);
+}
+
+static inline void
+table_parallel_vacuum_scan(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt)
+{
+ rel->rd_tableam->parallel_vacuum_scan_worker(rel, pvs, pwcxt);
+}
+
/*
* Prepare to analyze the next block in the read stream. The scan needs to
* have been started with table_beginscan_analyze(). Note that this routine
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 759f9a87d3..598bb5218f 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -360,7 +360,8 @@ extern void VacuumUpdateCosts(void);
extern ParallelVacuumState *parallel_vacuum_init(Relation rel, Relation *indrels,
int nindexes, int nrequested_workers,
int vac_work_mem, int elevel,
- BufferAccessStrategy bstrategy);
+ BufferAccessStrategy bstrategy,
+ void *state);
extern void parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats);
extern TidStore *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs,
VacDeadItemsInfo **dead_items_info_p);
@@ -372,6 +373,10 @@ extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
long num_table_tuples,
int num_index_scans,
bool estimated_count);
+extern int parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs);
+extern void parallel_vacuum_table_scan_end(ParallelVacuumState *pvs);
+extern int parallel_vacuum_get_nworkers_table(ParallelVacuumState *pvs);
+extern Relation *parallel_vacuum_get_table_indexes(ParallelVacuumState *pvs, int *nindexes);
extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in commands/analyze.c */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 9398a84051..6ccb19a29f 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -102,8 +102,20 @@ extern char *ExportSnapshot(Snapshot snapshot);
/*
* These live in procarray.c because they're intimately linked to the
* procarray contents, but thematically they better fit into snapmgr.h.
+ *
+ * XXX the struct definition is temporarily moved from procarray.c for
+ * parallel table vacuum development. We need to find a suitable way for
+ * parallel table vacuum workers to share the GlobalVisState.
*/
-typedef struct GlobalVisState GlobalVisState;
+typedef struct GlobalVisState
+{
+ /* XIDs >= are considered running by some backend */
+ FullTransactionId definitely_needed;
+
+ /* XIDs < are not considered to be running by any backend */
+ FullTransactionId maybe_needed;
+} GlobalVisState;
+
extern GlobalVisState *GlobalVisTestFor(Relation rel);
extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
On Fri, Jun 28, 2024 at 9:44 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
# Benchmark results
* Test-1: parallel heap scan on the table without indexes

I created a 20GB table, made garbage on the table, and ran vacuum while
changing the parallel degree:

create unlogged table test (a int) with (autovacuum_enabled = off);
insert into test select generate_series(1, 600000000); --- 20GB table
delete from test where a % 5 = 0;
vacuum (verbose, parallel 0) test;

Here are the results (total time and heap scan time):

PARALLEL 0: 21.99 s (single process)
PARALLEL 1: 11.39 s
PARALLEL 2: 8.36 s
PARALLEL 3: 6.14 s
PARALLEL 4: 5.08 s

* Test-2: parallel heap scan on the table with one index

I used a table similar to the one in test-1 but created one btree index on it:

create unlogged table test (a int) with (autovacuum_enabled = off);
insert into test select generate_series(1, 600000000); --- 20GB table
create index on test (a);
delete from test where a % 5 = 0;
vacuum (verbose, parallel 0) test;

I've measured the total execution time as well as the time of each
vacuum phase (from left: heap scan time, index vacuum time, and heap
vacuum time):

PARALLEL 0: 45.11 s (21.89, 16.74, 6.48)
PARALLEL 1: 42.13 s (12.75, 22.04, 7.23)
PARALLEL 2: 39.27 s (8.93, 22.78, 7.45)
PARALLEL 3: 36.53 s (6.76, 22.00, 7.65)
PARALLEL 4: 35.84 s (5.85, 22.04, 7.83)

Overall, I can see that the parallel heap scan in lazy vacuum has decent
scalability; in both test-1 and test-2, the execution time of the heap
scan got ~4x faster with 4 parallel workers. On the other hand, when
it comes to the total vacuum execution time, I could not see much
performance improvement in test-2 (45.11 vs. 35.84). Looking at the
results of PARALLEL 0 vs. PARALLEL 1 in test-2, the heap scan got faster
(21.89 vs. 12.75) whereas index vacuum got slower (16.74 vs. 22.04),
and the heap scan in test-2 was not as fast as in test-1 with 1 parallel
worker (12.75 vs. 11.39).

I think the reason is that the shared TidStore is not very scalable, since
we have a single lock on it. In all cases in test-1, we don't use the
shared TidStore since all dead tuples are removed during heap pruning,
so the scalability was better overall than in test-2. In the PARALLEL 0
case in test-2 we use the local TidStore, and from a parallel degree of
1 in test-2 we use the shared TidStore, which the parallel workers
update concurrently. Also, I guess that the lookup performance of the
local TidStore is better than the shared TidStore's because of the
differences between a bump context and a DSA area. I think this
difference contributed to index vacuuming getting slower (16.74 vs. 22.04).

There are two obvious ideas to improve the overall vacuum execution
time: (1) improve the shared TidStore's scalability and (2) support
parallel heap vacuum. For (1), several ideas are proposed by the ART
authors[1]. I've not tried these ideas, but they might be applicable to
our ART implementation. Still, I prefer to start with (2) since it
would be easier. Feedback is very welcome.
Starting with (2) sounds like a reasonable approach. We should study a
few more things: (a) the performance results when there are 3-4
indexes, and (b) the reason for the performance improvement seen with
heap scans alone. We normally get the benefits of parallelism from
using multiple CPUs, but parallelizing scans (I/O) shouldn't give much
benefit. Is it possible that you are seeing benefits because most of
the data is either in shared_buffers or in memory? We can probably try
vacuuming the tables after restarting the nodes, to ensure the data is
not in memory.
--
With Regards,
Amit Kapila.
Dear Sawada-san,
The parallel vacuum we have today supports only for index vacuuming.
Therefore, while multiple workers can work on different indexes in
parallel, the heap table is always processed by the single process.
I'd like to propose $subject, which enables us to have multiple
workers running on the single heap table. This would be helpful to
speedup vacuuming for tables without indexes or tables with
INDEX_CLENAUP = off.
Sounds great. IIUC, vacuuming is still one of the main weak points of postgres.
I've attached a PoC patch for this feature. It implements only
parallel heap scans in lazyvacum. We can extend this feature to
support parallel heap vacuum as well in the future or in the same
patch.
Before diving in deep, I tested your PoC but found an unclear point.
When vacuuming is requested with parallel > 0, with almost the same workload
as yours, only the first page was scanned and cleaned up.
When parallel was set to zero, I got:
```
INFO: vacuuming "postgres.public.test"
INFO: finished vacuuming "postgres.public.test": index scans: 0
pages: 0 removed, 2654868 remain, 2654868 scanned (100.00% of total)
tuples: 120000000 removed, 480000000 remain, 0 are dead but not yet removable
removable cutoff: 752, which was 0 XIDs old when operation ended
new relfrozenxid: 739, which is 1 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 344.639 MB/s, avg write rate: 344.650 MB/s
buffer usage: 2655045 hits, 2655527 misses, 2655606 dirtied
WAL usage: 1 records, 1 full page images, 937 bytes
system usage: CPU: user: 39.45 s, system: 20.74 s, elapsed: 60.19 s
```
This means that all pages were indeed scanned and the dead tuples were removed.
However, when parallel was set to one, I got another result:
```
INFO: vacuuming "postgres.public.test"
INFO: launched 1 parallel vacuum worker for table scanning (planned: 1)
INFO: finished vacuuming "postgres.public.test": index scans: 0
pages: 0 removed, 2654868 remain, 1 scanned (0.00% of total)
tuples: 12 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 752, which was 0 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 92.952 MB/s, avg write rate: 0.845 MB/s
buffer usage: 96 hits, 660 misses, 6 dirtied
WAL usage: 1 records, 1 full page images, 937 bytes
system usage: CPU: user: 0.05 s, system: 0.00 s, elapsed: 0.05 s
```
It looks like only one page was scanned and 12 tuples were removed,
which seems very strange to me...
The attached script emulates my test. IIUC it is almost the same as yours, but
the instance was restarted before vacuuming.
Can you reproduce this and see the reason? I can provide further
information on request.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/
Attachments:
On Fri, Jul 5, 2024 at 6:51 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Sawada-san,

Before diving in deep, I tested your PoC but found an unclear point.
When vacuuming is requested with parallel > 0, with almost the same workload
as yours, only the first page was scanned and cleaned up.
[...]
Can you reproduce this and see the reason? I can provide further
information on request.
Thank you for the test!
I could reproduce this issue and it's a bug; it skipped even
non-all-visible pages. I've attached a new version of the patch.
BTW, since we compute the number of parallel workers for the heap scan
based on the table size, it's possible that we launch multiple workers
even if most blocks are all-visible. It would be better to calculate it
based on (relpages - relallvisible), as sketched below.
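A compute-workers callback based on that idea might look roughly like the
following. This is only a sketch: the function name and the
pages-per-worker constant are placeholders, and relallvisible is just an
estimate maintained by previous VACUUM/ANALYZE runs.

```c
#define PAGES_PER_WORKER	((BlockNumber) 8192)	/* placeholder heuristic */

static int
heap_parallel_vacuum_compute_workers_idea(Relation rel, int requested)
{
	BlockNumber rel_pages = RelationGetNumberOfBlocks(rel);
	BlockNumber allvisible = (BlockNumber) rel->rd_rel->relallvisible;
	BlockNumber pages_to_scan;

	if (requested > 0)
		return requested;

	/* relallvisible can lag behind or exceed the current relation size */
	pages_to_scan = (rel_pages > allvisible) ? rel_pages - allvisible : 0;

	/* one worker per PAGES_PER_WORKER not-all-visible pages */
	return Min((int) (pages_to_scan / PAGES_PER_WORKER),
			   max_parallel_maintenance_workers);
}
```

Whatever heuristic we choose, the result is still capped by
max_parallel_maintenance_workers in parallel_vacuum_compute_workers(), as
in the current patch.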
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
parallel_heap_vacuum_scan_v2.patch
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6f8b1b7929..cf8c6614cd 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2630,6 +2630,12 @@ static const TableAmRoutine heapam_methods = {
.relation_copy_data = heapam_relation_copy_data,
.relation_copy_for_cluster = heapam_relation_copy_for_cluster,
.relation_vacuum = heap_vacuum_rel,
+
+ .parallel_vacuum_compute_workers = heap_parallel_vacuum_compute_workers,
+ .parallel_vacuum_estimate = heap_parallel_vacuum_estimate,
+ .parallel_vacuum_initialize = heap_parallel_vacuum_initialize,
+ .parallel_vacuum_scan_worker = heap_parallel_vacuum_scan_worker,
+
.scan_analyze_next_block = heapam_scan_analyze_next_block,
.scan_analyze_next_tuple = heapam_scan_analyze_next_tuple,
.index_build_range_scan = heapam_index_build_range_scan,
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3f88cf1e8e..ca44d04e66 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -49,6 +49,7 @@
#include "common/int.h"
#include "executor/instrument.h"
#include "miscadmin.h"
+#include "optimizer/paths.h"
#include "pgstat.h"
#include "portability/instr_time.h"
#include "postmaster/autovacuum.h"
@@ -117,10 +118,22 @@
#define PREFETCH_SIZE ((BlockNumber) 32)
/*
- * Macro to check if we are in a parallel vacuum. If true, we are in the
- * parallel mode and the DSM segment is initialized.
+ * DSM keys for heap parallel vacuum scan. Unlike other parallel execution code,
+ * we don't need to worry about DSM keys conflicting with plan_node_id, but need to
+ * avoid conflicting with DSM keys used in vacuumparallel.c.
+ */
+#define LV_PARALLEL_SCAN_SHARED 0xFFFF0001
+#define LV_PARALLEL_SCAN_DESC 0xFFFF0002
+#define LV_PARALLEL_SCAN_DESC_WORKER 0xFFFF0003
+
+/*
+ * Macro to check if we are in a parallel vacuum. If ParallelVacuumIsActive() is
+ * true, we are in the parallel mode, meaning that we do either parallel index
+ * vacuuming or parallel table vacuuming, or both. If ParallelHeapVacuumIsActive()
+ * is true, we do at least parallel table vacuuming.
*/
#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
+#define ParallelHeapVacuumIsActive(vacrel) ((vacrel)->phvstate != NULL)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -133,6 +146,80 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE,
} VacErrPhase;
+/*
+ * Relation statistics collected during heap scanning and need to be shared among
+ * parallel vacuum workers.
+ */
+typedef struct LVRelCounters
+{
+ BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */
+ BlockNumber removed_pages; /* # pages removed by relation truncation */
+ BlockNumber frozen_pages; /* # pages with newly frozen tuples */
+ BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
+ BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
+ BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+
+ /* Counters that follow are only for scanned_pages */
+ int64 tuples_deleted; /* # deleted from table */
+ int64 tuples_frozen; /* # newly frozen */
+ int64 lpdead_items; /* # deleted from indexes */
+ int64 live_tuples; /* # live tuples remaining */
+ int64 recently_dead_tuples; /* # dead, but not yet removable */
+ int64 missed_dead_tuples; /* # removable, but not removed */
+
+ /* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid. */
+ TransactionId NewRelfrozenXid;
+ MultiXactId NewRelminMxid;
+ bool skippedallvis;
+} LVRelCounters;
+
+/*
+ * Struct for information that needs to be shared among parallel vacuum workers
+ */
+typedef struct PHVShared
+{
+ bool aggressive;
+ bool skipwithvm;
+
+ /* The initial values shared by the leader process */
+ TransactionId NewRelfrozenXid;
+ MultiXactId NewRelminMxid;
+ bool skippedallvis;
+
+ /* VACUUM operation's cutoffs for freezing and pruning */
+ struct VacuumCutoffs cutoffs;
+ GlobalVisState vistest;
+
+ LVRelCounters worker_relcnts[FLEXIBLE_ARRAY_MEMBER];
+} PHVShared;
+#define SizeOfPHVShared (offsetof(PHVShared, worker_relcnts))
+
+/* Per-worker scan state */
+typedef struct PHVScanWorkerState
+{
+ ParallelBlockTableScanWorkerData state;
+ bool maybe_have_blocks;
+} PHVScanWorkerState;
+
+/* Struct for parallel heap vacuum */
+typedef struct PHVState
+{
+ /* Parallel scan description shared among parallel workers */
+ ParallelBlockTableScanDesc pscandesc;
+
+ /* Shared information */
+ PHVShared *shared;
+
+ /* Per-worker scan state */
+ PHVScanWorkerState *myscanstate;
+
+ /* Points to all per-worker scan state array */
+ PHVScanWorkerState *scanstates;
+
+ /* The number of workers launched for parallel heap vacuum */
+ int nworkers_launched;
+} PHVState;
+
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -144,6 +231,12 @@ typedef struct LVRelState
BufferAccessStrategy bstrategy;
ParallelVacuumState *pvs;
+ /* Parallel heap vacuum state and sizes for each struct */
+ PHVState *phvstate;
+ Size pscan_len;
+ Size shared_len;
+ Size pscanwork_len;
+
/* Aggressive VACUUM? (must set relfrozenxid >= FreezeLimit) */
bool aggressive;
/* Use visibility map to skip? (disabled by DISABLE_PAGE_SKIPPING) */
@@ -159,10 +252,6 @@ typedef struct LVRelState
/* VACUUM operation's cutoffs for freezing and pruning */
struct VacuumCutoffs cutoffs;
GlobalVisState *vistest;
- /* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
- TransactionId NewRelfrozenXid;
- MultiXactId NewRelminMxid;
- bool skippedallvis;
/* Error reporting state */
char *dbname;
@@ -188,12 +277,10 @@ typedef struct LVRelState
VacDeadItemsInfo *dead_items_info;
BlockNumber rel_pages; /* total number of pages */
- BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */
- BlockNumber removed_pages; /* # pages removed by relation truncation */
- BlockNumber frozen_pages; /* # pages with newly frozen tuples */
- BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
- BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
- BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+ BlockNumber next_fsm_block_to_vacuum;
+
+ /* Block and tuple counters for the relation */
+ LVRelCounters *counters;
/* Statistics output by us, for table */
double new_rel_tuples; /* new estimated total # of tuples */
@@ -203,13 +290,6 @@ typedef struct LVRelState
/* Instrumentation counters */
int num_index_scans;
- /* Counters that follow are only for scanned_pages */
- int64 tuples_deleted; /* # deleted from table */
- int64 tuples_frozen; /* # newly frozen */
- int64 lpdead_items; /* # deleted from indexes */
- int64 live_tuples; /* # live tuples remaining */
- int64 recently_dead_tuples; /* # dead, but not yet removable */
- int64 missed_dead_tuples; /* # removable, but not removed */
/* State maintained by heap_vac_scan_next_block() */
BlockNumber current_block; /* last block returned */
@@ -229,6 +309,7 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
+static bool do_lazy_scan_heap(LVRelState *vacrel);
static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
bool *all_visible_according_to_vm);
static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
@@ -271,6 +352,12 @@ static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
static void update_relstats_all_indexes(LVRelState *vacrel);
+
+
+static void do_parallel_lazy_scan_heap(LVRelState *vacrel);
+static void parallel_heap_vacuum_gather_scan_stats(LVRelState *vacrel);
+static void parallel_heap_complete_unfinished_scan(LVRelState *vacrel);
+
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -296,6 +383,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
BufferAccessStrategy bstrategy)
{
LVRelState *vacrel;
+ LVRelCounters *counters;
bool verbose,
instrument,
skipwithvm,
@@ -406,14 +494,28 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
Assert(params->index_cleanup == VACOPTVALUE_AUTO);
}
+ vacrel->next_fsm_block_to_vacuum = 0;
+
/* Initialize page counters explicitly (be tidy) */
- vacrel->scanned_pages = 0;
- vacrel->removed_pages = 0;
- vacrel->frozen_pages = 0;
- vacrel->lpdead_item_pages = 0;
- vacrel->missed_dead_pages = 0;
- vacrel->nonempty_pages = 0;
- /* dead_items_alloc allocates vacrel->dead_items later on */
+ counters = palloc(sizeof(LVRelCounters));
+ counters->scanned_pages = 0;
+ counters->removed_pages = 0;
+ counters->frozen_pages = 0;
+ counters->lpdead_item_pages = 0;
+ counters->missed_dead_pages = 0;
+ counters->nonempty_pages = 0;
+
+ /* Initialize remaining counters (be tidy) */
+ counters->tuples_deleted = 0;
+ counters->tuples_frozen = 0;
+ counters->lpdead_items = 0;
+ counters->live_tuples = 0;
+ counters->recently_dead_tuples = 0;
+ counters->missed_dead_tuples = 0;
+
+ vacrel->counters = counters;
+
+ vacrel->num_index_scans = 0;
/* Allocate/initialize output statistics state */
vacrel->new_rel_tuples = 0;
@@ -421,14 +523,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->indstats = (IndexBulkDeleteResult **)
palloc0(vacrel->nindexes * sizeof(IndexBulkDeleteResult *));
- /* Initialize remaining counters (be tidy) */
- vacrel->num_index_scans = 0;
- vacrel->tuples_deleted = 0;
- vacrel->tuples_frozen = 0;
- vacrel->lpdead_items = 0;
- vacrel->live_tuples = 0;
- vacrel->recently_dead_tuples = 0;
- vacrel->missed_dead_tuples = 0;
+ /* dead_items_alloc allocates vacrel->dead_items later on */
/*
* Get cutoffs that determine which deleted tuples are considered DEAD,
@@ -450,9 +545,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
vacrel->vistest = GlobalVisTestFor(rel);
/* Initialize state used to track oldest extant XID/MXID */
- vacrel->NewRelfrozenXid = vacrel->cutoffs.OldestXmin;
- vacrel->NewRelminMxid = vacrel->cutoffs.OldestMxact;
- vacrel->skippedallvis = false;
+ vacrel->counters->NewRelfrozenXid = vacrel->cutoffs.OldestXmin;
+ vacrel->counters->NewRelminMxid = vacrel->cutoffs.OldestMxact;
+ vacrel->counters->skippedallvis = false;
skipwithvm = true;
if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
{
@@ -533,15 +628,15 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* value >= FreezeLimit, and relminmxid to a value >= MultiXactCutoff.
* Non-aggressive VACUUMs may advance them by any amount, or not at all.
*/
- Assert(vacrel->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
+ Assert(vacrel->counters->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
TransactionIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.FreezeLimit :
vacrel->cutoffs.relfrozenxid,
- vacrel->NewRelfrozenXid));
- Assert(vacrel->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
+ vacrel->counters->NewRelfrozenXid));
+ Assert(vacrel->counters->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
MultiXactIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.MultiXactCutoff :
vacrel->cutoffs.relminmxid,
- vacrel->NewRelminMxid));
- if (vacrel->skippedallvis)
+ vacrel->counters->NewRelminMxid));
+ if (vacrel->counters->skippedallvis)
{
/*
* Must keep original relfrozenxid in a non-aggressive VACUUM that
@@ -549,8 +644,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* values will have missed unfrozen XIDs from the pages we skipped.
*/
Assert(!vacrel->aggressive);
- vacrel->NewRelfrozenXid = InvalidTransactionId;
- vacrel->NewRelminMxid = InvalidMultiXactId;
+ vacrel->counters->NewRelfrozenXid = InvalidTransactionId;
+ vacrel->counters->NewRelminMxid = InvalidMultiXactId;
}
/*
@@ -571,7 +666,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
*/
vac_update_relstats(rel, new_rel_pages, vacrel->new_live_tuples,
new_rel_allvisible, vacrel->nindexes > 0,
- vacrel->NewRelfrozenXid, vacrel->NewRelminMxid,
+ vacrel->counters->NewRelfrozenXid, vacrel->counters->NewRelminMxid,
&frozenxid_updated, &minmulti_updated, false);
/*
@@ -587,8 +682,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
pgstat_report_vacuum(RelationGetRelid(rel),
rel->rd_rel->relisshared,
Max(vacrel->new_live_tuples, 0),
- vacrel->recently_dead_tuples +
- vacrel->missed_dead_tuples);
+ vacrel->counters->recently_dead_tuples +
+ vacrel->counters->missed_dead_tuples);
pgstat_progress_end_command();
if (instrument)
@@ -651,21 +746,21 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->relname,
vacrel->num_index_scans);
appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total)\n"),
- vacrel->removed_pages,
+ vacrel->counters->removed_pages,
new_rel_pages,
- vacrel->scanned_pages,
+ vacrel->counters->scanned_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->scanned_pages / orig_rel_pages);
+ 100.0 * vacrel->counters->scanned_pages / orig_rel_pages);
appendStringInfo(&buf,
_("tuples: %lld removed, %lld remain, %lld are dead but not yet removable\n"),
- (long long) vacrel->tuples_deleted,
+ (long long) vacrel->counters->tuples_deleted,
(long long) vacrel->new_rel_tuples,
- (long long) vacrel->recently_dead_tuples);
- if (vacrel->missed_dead_tuples > 0)
+ (long long) vacrel->counters->recently_dead_tuples);
+ if (vacrel->counters->missed_dead_tuples > 0)
appendStringInfo(&buf,
_("tuples missed: %lld dead from %u pages not removed due to cleanup lock contention\n"),
- (long long) vacrel->missed_dead_tuples,
- vacrel->missed_dead_pages);
+ (long long) vacrel->counters->missed_dead_tuples,
+ vacrel->counters->missed_dead_pages);
diff = (int32) (ReadNextTransactionId() -
vacrel->cutoffs.OldestXmin);
appendStringInfo(&buf,
@@ -673,25 +768,25 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->cutoffs.OldestXmin, diff);
if (frozenxid_updated)
{
- diff = (int32) (vacrel->NewRelfrozenXid -
+ diff = (int32) (vacrel->counters->NewRelfrozenXid -
vacrel->cutoffs.relfrozenxid);
appendStringInfo(&buf,
_("new relfrozenxid: %u, which is %d XIDs ahead of previous value\n"),
- vacrel->NewRelfrozenXid, diff);
+ vacrel->counters->NewRelfrozenXid, diff);
}
if (minmulti_updated)
{
- diff = (int32) (vacrel->NewRelminMxid -
+ diff = (int32) (vacrel->counters->NewRelminMxid -
vacrel->cutoffs.relminmxid);
appendStringInfo(&buf,
_("new relminmxid: %u, which is %d MXIDs ahead of previous value\n"),
- vacrel->NewRelminMxid, diff);
+ vacrel->counters->NewRelminMxid, diff);
}
appendStringInfo(&buf, _("frozen: %u pages from table (%.2f%% of total) had %lld tuples frozen\n"),
- vacrel->frozen_pages,
+ vacrel->counters->frozen_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->frozen_pages / orig_rel_pages,
- (long long) vacrel->tuples_frozen);
+ 100.0 * vacrel->counters->frozen_pages / orig_rel_pages,
+ (long long) vacrel->counters->tuples_frozen);
if (vacrel->do_index_vacuuming)
{
if (vacrel->nindexes == 0 || vacrel->num_index_scans == 0)
@@ -711,10 +806,10 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
msgfmt = _("%u pages from table (%.2f%% of total) have %lld dead item identifiers\n");
}
appendStringInfo(&buf, msgfmt,
- vacrel->lpdead_item_pages,
+ vacrel->counters->lpdead_item_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->lpdead_item_pages / orig_rel_pages,
- (long long) vacrel->lpdead_items);
+ 100.0 * vacrel->counters->lpdead_item_pages / orig_rel_pages,
+ (long long) vacrel->counters->lpdead_items);
for (int i = 0; i < vacrel->nindexes; i++)
{
IndexBulkDeleteResult *istat = vacrel->indstats[i];
@@ -815,14 +910,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel)
{
- BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
- next_fsm_block_to_vacuum = 0;
- bool all_visible_according_to_vm;
-
- TidStore *dead_items = vacrel->dead_items;
+ BlockNumber rel_pages = vacrel->rel_pages;
VacDeadItemsInfo *dead_items_info = vacrel->dead_items_info;
- Buffer vmbuffer = InvalidBuffer;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -842,6 +931,70 @@ lazy_scan_heap(LVRelState *vacrel)
vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
+ if (ParallelHeapVacuumIsActive(vacrel))
+ do_parallel_lazy_scan_heap(vacrel);
+ else
+ do_lazy_scan_heap(vacrel);
+
+ vacrel->blkno = InvalidBlockNumber;
+
+ /* report that everything is now scanned */
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, rel_pages);
+
+ /* now we can compute the new value for pg_class.reltuples */
+ vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
+ vacrel->counters->scanned_pages,
+ vacrel->counters->live_tuples);
+
+ /*
+ * Also compute the total number of surviving heap entries. In the
+ * (unlikely) scenario that new_live_tuples is -1, take it as zero.
+ */
+ vacrel->new_rel_tuples =
+ Max(vacrel->new_live_tuples, 0) + vacrel->counters->recently_dead_tuples +
+ vacrel->counters->missed_dead_tuples;
+
+ /*
+ * Do index vacuuming (call each index's ambulkdelete routine), then do
+ * related heap vacuuming
+ */
+ if (dead_items_info->num_items > 0)
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the remainder of the Free Space Map. We must do this whether or
+ * not there were indexes, and whether or not we bypassed index vacuuming.
+ */
+ if (rel_pages > vacrel->next_fsm_block_to_vacuum)
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ rel_pages);
+
+ /* report all blocks vacuumed */
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, rel_pages);
+
+ /* Do final index cleanup (call each index's amvacuumcleanup routine) */
+ if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
+ lazy_cleanup_all_indexes(vacrel);
+}
+
+/*
+ * Workhorse for lazy_scan_heap().
+ *
+ * Returns true if we processed all blocks; returns false if we exited before
+ * completing the heap scan because the space for dead item TIDs filled up. In
+ * the serial heap scan case, this function always returns true. In the parallel
+ * heap scan case, it is called by both the workers and the leader, and can return false.
+ */
+static bool
+do_lazy_scan_heap(LVRelState *vacrel)
+{
+ bool all_visible_according_to_vm;
+ TidStore *dead_items = vacrel->dead_items;
+ VacDeadItemsInfo *dead_items_info = vacrel->dead_items_info;
+ BlockNumber blkno;
+ Buffer vmbuffer = InvalidBuffer;
+ bool scan_done = true;
+
while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
{
Buffer buf;
@@ -849,7 +1002,7 @@ lazy_scan_heap(LVRelState *vacrel)
bool has_lpdead_items;
bool got_cleanup_lock = false;
- vacrel->scanned_pages++;
+ vacrel->counters->scanned_pages++;
/* Report as block scanned, update error traceback information */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
@@ -867,46 +1020,10 @@ lazy_scan_heap(LVRelState *vacrel)
* one-pass strategy, and the two-pass strategy with the index_cleanup
* param set to 'off'.
*/
- if (vacrel->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
+ if (!IsParallelWorker() &&
+ vacrel->counters->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
lazy_check_wraparound_failsafe(vacrel);
- /*
- * Consider if we definitely have enough space to process TIDs on page
- * already. If we are close to overrunning the available space for
- * dead_items TIDs, pause and do a cycle of vacuuming before we tackle
- * this page.
- */
- if (TidStoreMemoryUsage(dead_items) > dead_items_info->max_bytes)
- {
- /*
- * Before beginning index vacuuming, we release any pin we may
- * hold on the visibility map page. This isn't necessary for
- * correctness, but we do it anyway to avoid holding the pin
- * across a lengthy, unrelated operation.
- */
- if (BufferIsValid(vmbuffer))
- {
- ReleaseBuffer(vmbuffer);
- vmbuffer = InvalidBuffer;
- }
-
- /* Perform a round of index and heap vacuuming */
- vacrel->consider_bypass_optimization = false;
- lazy_vacuum(vacrel);
-
- /*
- * Vacuum the Free Space Map to make newly-freed space visible on
- * upper-level FSM pages. Note we have not yet processed blkno.
- */
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
- blkno);
- next_fsm_block_to_vacuum = blkno;
-
- /* Report that we are once again scanning the heap */
- pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_SCAN_HEAP);
- }
-
/*
* Pin the visibility map page in case we need to mark the page
* all-visible. In most cases this will be very cheap, because we'll
@@ -994,10 +1111,14 @@ lazy_scan_heap(LVRelState *vacrel)
* also be no opportunity to update the FSM later, because we'll never
* revisit this page. Since updating the FSM is desirable but not
* absolutely required, that's OK.
+ *
+ * XXX: in parallel heap scan, some blocks before blkno might not have been
+ * processed yet. Is it worth vacuuming the FSM?
*/
- if (vacrel->nindexes == 0
- || !vacrel->do_index_vacuuming
- || !has_lpdead_items)
+ if (!IsParallelWorker() &&
+ (vacrel->nindexes == 0
+ || !vacrel->do_index_vacuuming
+ || !has_lpdead_items))
{
Size freespace = PageGetHeapFreeSpace(page);
@@ -1011,57 +1132,154 @@ lazy_scan_heap(LVRelState *vacrel)
* held the cleanup lock and lazy_scan_prune() was called.
*/
if (got_cleanup_lock && vacrel->nindexes == 0 && has_lpdead_items &&
- blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+ blkno - vacrel->next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
{
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
blkno);
- next_fsm_block_to_vacuum = blkno;
+ vacrel->next_fsm_block_to_vacuum = blkno;
}
}
else
UnlockReleaseBuffer(buf);
+
+ /*
+ * Consider if we definitely have enough space to process TIDs on page
+ * already. If we are close to overrunning the available space for
+ * dead_items TIDs, pause and do a cycle of vacuuming before we tackle
+ * this page.
+ */
+ if (TidStoreMemoryUsage(dead_items) > dead_items_info->max_bytes)
+ {
+ /*
+ * Before beginning index vacuuming, we release any pin we may
+ * hold on the visibility map page. This isn't necessary for
+ * correctness, but we do it anyway to avoid holding the pin
+ * across a lengthy, unrelated operation.
+ */
+ if (BufferIsValid(vmbuffer))
+ {
+ ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
+ }
+
+ if (ParallelHeapVacuumIsActive(vacrel))
+ {
+ /*
+ * In parallel heap vacuum case, both the leader process and
+ * the worker processes have to exit without invoking index
+ * and heap vacuuming. The leader process will wait for all
+ * workers to finish and perform index and heap vacuuming.
+ */
+ scan_done = false;
+ break;
+ }
+
+ /* Perform a round of index and heap vacuuming */
+ vacrel->consider_bypass_optimization = false;
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the Free Space Map to make newly-freed space visible on
+ * upper-level FSM pages.
+ *
+ * XXX: in parallel heap scan, some blocks before blkno might not
+ * been processed yet. Is it worth vacuuming FSM?
+ */
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ blkno + 1);
+ vacrel->next_fsm_block_to_vacuum = blkno;
+
+ /* Report that we are once again scanning the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+ continue;
+ }
}
- vacrel->blkno = InvalidBlockNumber;
if (BufferIsValid(vmbuffer))
ReleaseBuffer(vmbuffer);
- /* report that everything is now scanned */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ return scan_done;
+}
- /* now we can compute the new value for pg_class.reltuples */
- vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
- vacrel->scanned_pages,
- vacrel->live_tuples);
+/*
+ * A parallel scan variant of heap_vac_scan_next_block.
+ *
+ * In parallel vacuum scan, we don't use the SKIP_PAGES_THRESHOLD optimization.
+ */
+static bool
+heap_vac_scan_next_block_parallel(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm)
+{
+ PHVState *phvstate = vacrel->phvstate;
+ BlockNumber next_block;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 mapbits = 0;
- /*
- * Also compute the total number of surviving heap entries. In the
- * (unlikely) scenario that new_live_tuples is -1, take it as zero.
- */
- vacrel->new_rel_tuples =
- Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
- vacrel->missed_dead_tuples;
+ Assert(ParallelHeapVacuumIsActive(vacrel));
- /*
- * Do index vacuuming (call each index's ambulkdelete routine), then do
- * related heap vacuuming
- */
- if (dead_items_info->num_items > 0)
- lazy_vacuum(vacrel);
+ for (;;)
+ {
+ next_block = table_block_parallelscan_nextpage(vacrel->rel,
+ &(phvstate->myscanstate->state),
+ phvstate->pscandesc);
- /*
- * Vacuum the remainder of the Free Space Map. We must do this whether or
- * not there were indexes, and whether or not we bypassed index vacuuming.
- */
- if (blkno > next_fsm_block_to_vacuum)
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno);
+ /* Have we reached the end of the table? */
+ if (!BlockNumberIsValid(next_block) || next_block >= vacrel->rel_pages)
+ {
+ if (BufferIsValid(vmbuffer))
+ ReleaseBuffer(vmbuffer);
- /* report all blocks vacuumed */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
+ *blkno = vacrel->rel_pages;
+ return false;
+ }
- /* Do final index cleanup (call each index's amvacuumcleanup routine) */
- if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
- lazy_cleanup_all_indexes(vacrel);
+ /* We always treat the last block as unsafe to skip */
+ if (next_block == vacrel->rel_pages - 1)
+ break;
+
+ mapbits = visibilitymap_get_status(vacrel->rel, next_block, &vmbuffer);
+
+ /*
+ * A block is unskippable if it is not all visible according to the
+ * visibility map.
+ */
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ {
+ Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
+ break;
+ }
+
+ /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
+ if (!vacrel->skipwithvm)
+ break;
+
+ /*
+ * Aggressive VACUUM caller can't skip pages just because they are
+ * all-visible.
+ */
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+
+ if (vacrel->aggressive)
+ break;
+
+ /*
+ * All-visible block is safe to skip in non-aggressive case. But
+ * remember that the final range contains such a block for later.
+ */
+ vacrel->counters->skippedallvis = true;
+ }
+ }
+
+ if (BufferIsValid(vmbuffer))
+ ReleaseBuffer(vmbuffer);
+
+ *blkno = next_block;
+ *all_visible_according_to_vm = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
+
+ return true;
}
/*
@@ -1088,6 +1306,9 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
{
BlockNumber next_block;
+ if (ParallelHeapVacuumIsActive(vacrel))
+ return heap_vac_scan_next_block_parallel(vacrel, blkno, all_visible_according_to_vm);
+
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1137,7 +1358,7 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
{
next_block = vacrel->next_unskippable_block;
if (skipsallvis)
- vacrel->skippedallvis = true;
+ vacrel->counters->skippedallvis = true;
}
}
@@ -1210,11 +1431,12 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
/*
* Caller must scan the last page to determine whether it has tuples
- * (caller must have the opportunity to set vacrel->nonempty_pages).
- * This rule avoids having lazy_truncate_heap() take access-exclusive
- * lock on rel to attempt a truncation that fails anyway, just because
- * there are tuples on the last page (it is likely that there will be
- * tuples on other nearby pages as well, but those can be skipped).
+ * (caller must have the opportunity to set
+ * vacrel->counters->nonempty_pages). This rule avoids having
+ * lazy_truncate_heap() take access-exclusive lock on rel to attempt a
+ * truncation that fails anyway, just because there are tuples on the
+ * last page (it is likely that there will be tuples on other nearby
+ * pages as well, but those can be skipped).
*
* Implement this by always treating the last block as unsafe to skip.
*/
@@ -1439,10 +1661,10 @@ lazy_scan_prune(LVRelState *vacrel,
heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
- &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
+ &vacrel->counters->NewRelfrozenXid, &vacrel->counters->NewRelminMxid);
- Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
+ Assert(MultiXactIdIsValid(vacrel->counters->NewRelminMxid));
- Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
+ Assert(TransactionIdIsValid(vacrel->counters->NewRelfrozenXid));
if (presult.nfrozen > 0)
{
@@ -1451,7 +1673,7 @@ lazy_scan_prune(LVRelState *vacrel,
* nfrozen == 0, since it only counts pages with newly frozen tuples
* (don't confuse that with pages newly set all-frozen in VM).
*/
- vacrel->frozen_pages++;
+ vacrel->counters->frozen_pages++;
}
/*
@@ -1486,7 +1708,7 @@ lazy_scan_prune(LVRelState *vacrel,
*/
if (presult.lpdead_items > 0)
{
- vacrel->lpdead_item_pages++;
+ vacrel->counters->lpdead_item_pages++;
/*
* deadoffsets are collected incrementally in
@@ -1501,15 +1723,15 @@ lazy_scan_prune(LVRelState *vacrel,
}
/* Finally, add page-local counts to whole-VACUUM counts */
- vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += presult.nfrozen;
- vacrel->lpdead_items += presult.lpdead_items;
- vacrel->live_tuples += presult.live_tuples;
- vacrel->recently_dead_tuples += presult.recently_dead_tuples;
+ vacrel->counters->tuples_deleted += presult.ndeleted;
+ vacrel->counters->tuples_frozen += presult.nfrozen;
+ vacrel->counters->lpdead_items += presult.lpdead_items;
+ vacrel->counters->live_tuples += presult.live_tuples;
+ vacrel->counters->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
if (presult.hastup)
- vacrel->nonempty_pages = blkno + 1;
+ vacrel->counters->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
@@ -1659,8 +1881,8 @@ lazy_scan_noprune(LVRelState *vacrel,
missed_dead_tuples;
bool hastup;
HeapTupleHeader tupleheader;
- TransactionId NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- MultiXactId NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ TransactionId NoFreezePageRelfrozenXid = vacrel->counters->NewRelfrozenXid;
+ MultiXactId NoFreezePageRelminMxid = vacrel->counters->NewRelminMxid;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1787,8 +2009,8 @@ lazy_scan_noprune(LVRelState *vacrel,
* this particular page until the next VACUUM. Remember its details now.
* (lazy_scan_prune expects a clean slate, so we have to do this last.)
*/
- vacrel->NewRelfrozenXid = NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = NoFreezePageRelminMxid;
+ vacrel->counters->NewRelfrozenXid = NoFreezePageRelfrozenXid;
+ vacrel->counters->NewRelminMxid = NoFreezePageRelminMxid;
/* Save any LP_DEAD items found on the page in dead_items */
if (vacrel->nindexes == 0)
@@ -1815,25 +2037,25 @@ lazy_scan_noprune(LVRelState *vacrel,
* indexes will be deleted during index vacuuming (and then marked
* LP_UNUSED in the heap)
*/
- vacrel->lpdead_item_pages++;
+ vacrel->counters->lpdead_item_pages++;
dead_items_add(vacrel, blkno, deadoffsets, lpdead_items);
- vacrel->lpdead_items += lpdead_items;
+ vacrel->counters->lpdead_items += lpdead_items;
}
/*
* Finally, add relevant page-local counts to whole-VACUUM counts
*/
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
- vacrel->missed_dead_tuples += missed_dead_tuples;
+ vacrel->counters->live_tuples += live_tuples;
+ vacrel->counters->recently_dead_tuples += recently_dead_tuples;
+ vacrel->counters->missed_dead_tuples += missed_dead_tuples;
if (missed_dead_tuples > 0)
- vacrel->missed_dead_pages++;
+ vacrel->counters->missed_dead_pages++;
/* Can't truncate this page */
if (hastup)
- vacrel->nonempty_pages = blkno + 1;
+ vacrel->counters->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
*has_lpdead_items = (lpdead_items > 0);
@@ -1862,7 +2084,7 @@ lazy_vacuum(LVRelState *vacrel)
/* Should not end up here with no indexes */
Assert(vacrel->nindexes > 0);
- Assert(vacrel->lpdead_item_pages > 0);
+ Assert(vacrel->counters->lpdead_item_pages > 0);
if (!vacrel->do_index_vacuuming)
{
@@ -1896,7 +2118,7 @@ lazy_vacuum(LVRelState *vacrel)
BlockNumber threshold;
Assert(vacrel->num_index_scans == 0);
- Assert(vacrel->lpdead_items == vacrel->dead_items_info->num_items);
+ Assert(vacrel->counters->lpdead_items == vacrel->dead_items_info->num_items);
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
@@ -1923,7 +2145,7 @@ lazy_vacuum(LVRelState *vacrel)
* cases then this may need to be reconsidered.
*/
threshold = (double) vacrel->rel_pages * BYPASS_THRESHOLD_PAGES;
- bypass = (vacrel->lpdead_item_pages < threshold &&
+ bypass = (vacrel->counters->lpdead_item_pages < threshold &&
(TidStoreMemoryUsage(vacrel->dead_items) < (32L * 1024L * 1024L)));
}
@@ -2061,7 +2283,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
* place).
*/
Assert(vacrel->num_index_scans > 0 ||
- vacrel->dead_items_info->num_items == vacrel->lpdead_items);
+ vacrel->dead_items_info->num_items == vacrel->counters->lpdead_items);
Assert(allindexes || VacuumFailsafeActive);
/*
@@ -2165,8 +2387,8 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
* the second heap pass. No more, no less.
*/
Assert(vacrel->num_index_scans > 1 ||
- (vacrel->dead_items_info->num_items == vacrel->lpdead_items &&
- vacuumed_pages == vacrel->lpdead_item_pages));
+ (vacrel->dead_items_info->num_items == vacrel->counters->lpdead_items &&
+ vacuumed_pages == vacrel->counters->lpdead_item_pages));
ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
@@ -2347,7 +2569,7 @@ static void
lazy_cleanup_all_indexes(LVRelState *vacrel)
{
double reltuples = vacrel->new_rel_tuples;
- bool estimated_count = vacrel->scanned_pages < vacrel->rel_pages;
+ bool estimated_count = vacrel->counters->scanned_pages < vacrel->rel_pages;
const int progress_start_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_INDEXES_TOTAL
@@ -2528,7 +2750,7 @@ should_attempt_truncation(LVRelState *vacrel)
if (!vacrel->do_rel_truncate || VacuumFailsafeActive)
return false;
- possibly_freeable = vacrel->rel_pages - vacrel->nonempty_pages;
+ possibly_freeable = vacrel->rel_pages - vacrel->counters->nonempty_pages;
if (possibly_freeable > 0 &&
(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
possibly_freeable >= vacrel->rel_pages / REL_TRUNCATE_FRACTION))
@@ -2554,7 +2776,7 @@ lazy_truncate_heap(LVRelState *vacrel)
/* Update error traceback information one last time */
update_vacuum_error_info(vacrel, NULL, VACUUM_ERRCB_PHASE_TRUNCATE,
- vacrel->nonempty_pages, InvalidOffsetNumber);
+ vacrel->counters->nonempty_pages, InvalidOffsetNumber);
/*
* Loop until no more truncating can be done.
@@ -2655,7 +2877,7 @@ lazy_truncate_heap(LVRelState *vacrel)
* without also touching reltuples, since the tuple count wasn't
* changed by the truncation.
*/
- vacrel->removed_pages += orig_rel_pages - new_rel_pages;
+ vacrel->counters->removed_pages += orig_rel_pages - new_rel_pages;
vacrel->rel_pages = new_rel_pages;
ereport(vacrel->verbose ? INFO : DEBUG2,
@@ -2663,7 +2885,7 @@ lazy_truncate_heap(LVRelState *vacrel)
vacrel->relname,
orig_rel_pages, new_rel_pages)));
orig_rel_pages = new_rel_pages;
- } while (new_rel_pages > vacrel->nonempty_pages && lock_waiter_detected);
+ } while (new_rel_pages > vacrel->counters->nonempty_pages && lock_waiter_detected);
}
/*
@@ -2691,7 +2913,7 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
StaticAssertStmt((PREFETCH_SIZE & (PREFETCH_SIZE - 1)) == 0,
"prefetch size must be power of 2");
prefetchedUntil = InvalidBlockNumber;
- while (blkno > vacrel->nonempty_pages)
+ while (blkno > vacrel->counters->nonempty_pages)
{
Buffer buf;
Page page;
@@ -2803,7 +3025,7 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
* pages still are; we need not bother to look at the last known-nonempty
* page.
*/
- return vacrel->nonempty_pages;
+ return vacrel->counters->nonempty_pages;
}
/*
@@ -2821,12 +3043,8 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
autovacuum_work_mem != -1 ?
autovacuum_work_mem : maintenance_work_mem;
- /*
- * Initialize state for a parallel vacuum. As of now, only one worker can
- * be used for an index, so we invoke parallelism only if there are at
- * least two indexes on a table.
- */
- if (nworkers >= 0 && vacrel->nindexes > 1 && vacrel->do_index_vacuuming)
+ /* Initialize state for a parallel vacuum */
+ if (nworkers >= 0)
{
/*
* Since parallel workers cannot access data in temporary tables, we
@@ -2844,11 +3062,18 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
+ {
+ /*
+ * For parallel index vacuuming, only one worker can be used for
+ * an index, we invoke parallelism only if there are at least two
+ * indexes on a table.
+ */
vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
vacrel->nindexes, nworkers,
vac_work_mem,
vacrel->verbose ? INFO : DEBUG2,
- vacrel->bstrategy);
+ vacrel->bstrategy, (void *) vacrel);
+ }
/*
* If parallel mode started, dead_items and dead_items_info spaces are
@@ -2889,9 +3114,19 @@ dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *offsets,
};
int64 prog_val[2];
+ /*
+ * Protect both dead_items and dead_items_info from concurrent updates in
+ * parallel heap scan cases.
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ TidStoreLockExclusive(dead_items);
+
TidStoreSetBlockOffsets(dead_items, blkno, offsets, num_offsets);
vacrel->dead_items_info->num_items += num_offsets;
+ if (ParallelHeapVacuumIsActive(vacrel))
+ TidStoreUnlock(dead_items);
+
/* update the progress information */
prog_val[0] = vacrel->dead_items_info->num_items;
prog_val[1] = TidStoreMemoryUsage(dead_items);
@@ -3093,6 +3328,357 @@ update_relstats_all_indexes(LVRelState *vacrel)
}
}
+/*
+ * Compute the number of parallel workers for parallel vacuum heap scan.
+ *
+ * The calculation logic is borrowed from compute_parallel_worker().
+ */
+int
+heap_parallel_vacuum_compute_workers(Relation rel, int nrequested)
+{
+ int parallel_workers = 0;
+ int heap_parallel_threshold;
+ int heap_pages;
+
+ if (nrequested == 0)
+ {
+ /*
+ * Select the number of workers based on the log of the size of the
+ * relation. This probably needs to be a good deal more
+ * sophisticated, but we need something here for now. Note that the
+ * upper limit of the min_parallel_table_scan_size GUC is chosen to
+ * prevent overflow here.
+ */
+ heap_parallel_threshold = Max(min_parallel_table_scan_size, 1);
+ heap_pages = RelationGetNumberOfBlocks(rel);
+ while (heap_pages >= (BlockNumber) (heap_parallel_threshold * 3))
+ {
+ parallel_workers++;
+ heap_parallel_threshold *= 3;
+ if (heap_parallel_threshold > INT_MAX / 3)
+ break;
+ }
+ }
+ else
+ parallel_workers = nrequested;
+
+ return parallel_workers;
+}
+
+/*
+ * Compute the amount of space we'll need in the parallel heap vacuum
+ * DSM, and inform pcxt->estimator about our needs.
+ *
+ * nworkers is the number of workers for the table vacuum. Note that it could
+ * be different from pcxt->nworkers, since that is the maximum of the numbers
+ * of workers for table vacuum and index vacuum.
+ */
+void
+heap_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state)
+{
+ Size size = 0;
+ LVRelState *vacrel = (LVRelState *) state;
+
+ /* space for PHVShared */
+ size = add_size(size, SizeOfPHVShared);
+ size = add_size(size, mul_size(sizeof(LVRelCounters), nworkers));
+ vacrel->shared_len = size;
+ shm_toc_estimate_chunk(&pcxt->estimator, size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* space for ParallelBlockTableScanDesc */
+ vacrel->pscan_len = table_block_parallelscan_estimate(rel);
+ shm_toc_estimate_chunk(&pcxt->estimator, vacrel->pscan_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* space for per-worker scan state, PHVScanWorkerState */
+ vacrel->pscanwork_len = mul_size(sizeof(PHVScanWorkerState), nworkers);
+ shm_toc_estimate_chunk(&pcxt->estimator, vacrel->pscanwork_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/*
+ * Set up shared memory for parallel heap vacuum.
+ */
+void
+heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state)
+{
+ LVRelState *vacrel = (LVRelState *) state;
+ ParallelBlockTableScanDesc pscan;
+ PHVScanWorkerState *pscanwork;
+ PHVShared *shared;
+ PHVState *phvstate;
+
+ phvstate = (PHVState *) palloc(sizeof(PHVState));
+
+ shared = shm_toc_allocate(pcxt->toc, vacrel->shared_len);
+
+ /* Prepare the shared information */
+
+ MemSet(shared, 0, vacrel->shared_len);
+ shared->aggressive = vacrel->aggressive;
+ shared->skipwithvm = vacrel->skipwithvm;
+ shared->cutoffs = vacrel->cutoffs;
+ shared->NewRelfrozenXid = vacrel->counters->NewRelfrozenXid;
+ shared->NewRelminMxid = vacrel->counters->NewRelminMxid;
+ shared->skippedallvis = vacrel->counters->skippedallvis;
+
+ /*
+ * XXX: we copy the contents of vistest to the shared area, but in order
+ * to do that, we need to either expose GlobalVisState or provide
+ * functions to copy the contents of GlobalVisState somewhere. Currently we
+ * do the former, but it is not clear that this is the best choice.
+ *
+ * An alternative idea is to have each worker determine the cutoff and have
+ * its own vistest. But we need to consider this carefully, since parallel
+ * workers would end up having different cutoffs and horizons.
+ */
+ shared->vistest = *vacrel->vistest;
+
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_SCAN_SHARED, shared);
+
+ phvstate->shared = shared;
+
+ /* prepare the parallel block table scan description */
+ pscan = shm_toc_allocate(pcxt->toc, vacrel->pscan_len);
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_SCAN_DESC, pscan);
+
+ /* initialize parallel scan description */
+ table_block_parallelscan_initialize(rel, (ParallelTableScanDesc) pscan);
+ phvstate->pscandesc = pscan;
+
+ /* prepare the workers' parallel block table scan state */
+ pscanwork = shm_toc_allocate(pcxt->toc, vacrel->pscanwork_len);
+ MemSet(pscanwork, 0, vacrel->pscanwork_len);
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_SCAN_DESC_WORKER, pscanwork);
+ phvstate->scanstates = pscanwork;
+
+ vacrel->phvstate = phvstate;
+}
+
+/*
+ * Main function for parallel heap vacuum workers.
+ */
+void
+heap_parallel_vacuum_scan_worker(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt)
+{
+ LVRelState vacrel = {0};
+ PHVState *phvstate;
+ PHVShared *shared;
+ ParallelBlockTableScanDesc pscandesc;
+ PHVScanWorkerState *scanstate;
+ LVRelCounters *counters;
+ bool scan_done;
+
+ phvstate = palloc(sizeof(PHVState));
+
+ pscandesc = (ParallelBlockTableScanDesc) shm_toc_lookup(pwcxt->toc,
+ LV_PARALLEL_SCAN_DESC,
+ false);
+ phvstate->pscandesc = pscandesc;
+
+ shared = (PHVShared *) shm_toc_lookup(pwcxt->toc, LV_PARALLEL_SCAN_SHARED,
+ false);
+ phvstate->shared = shared;
+
+ scanstate = (PHVScanWorkerState *) shm_toc_lookup(pwcxt->toc,
+ LV_PARALLEL_SCAN_DESC_WORKER,
+ false);
+
+ phvstate->myscanstate = &(scanstate[ParallelWorkerNumber]);
+ counters = &(shared->worker_relcnts[ParallelWorkerNumber]);
+
+ /* Prepare LVRelState */
+ vacrel.rel = rel;
+ vacrel.indrels = parallel_vacuum_get_table_indexes(pvs, &vacrel.nindexes);
+ vacrel.pvs = pvs;
+ vacrel.phvstate = phvstate;
+ vacrel.aggressive = shared->aggressive;
+ vacrel.skipwithvm = shared->skipwithvm;
+ vacrel.cutoffs = shared->cutoffs;
+ vacrel.vistest = &(shared->vistest);
+ vacrel.dead_items = parallel_vacuum_get_dead_items(pvs,
+ &vacrel.dead_items_info);
+ vacrel.rel_pages = RelationGetNumberOfBlocks(rel);
+ vacrel.counters = counters;
+
+ /* initialize per-worker relation statistics */
+ MemSet(counters, 0, sizeof(LVRelCounters));
+
+ vacrel.counters->NewRelfrozenXid = shared->NewRelfrozenXid;
+ vacrel.counters->NewRelminMxid = shared->NewRelminMxid;
+ vacrel.counters->skippedallvis = shared->skippedallvis;
+
+ /*
+ * XXX: the following fields are not set yet:
+ * - index vacuum related fields such as consider_bypass_optimization,
+ *   do_index_vacuuming, etc.
+ * - error reporting state
+ * - statistics such as scanned_pages etc.
+ * - oldest extant XID/MXID
+ * - states maintained by heap_vac_scan_next_block()
+ */
+
+ /* Initialize the start block if not yet done */
+ if (!phvstate->myscanstate->maybe_have_blocks)
+ {
+ table_block_parallelscan_startblock_init(rel,
+ &(phvstate->myscanstate->state),
+ phvstate->pscandesc);
+
+ phvstate->myscanstate->maybe_have_blocks = false;
+ }
+
+ /*
+ * XXX: if we want to support parallel heap *vacuum*, we need to allow
+ * workers to call different function based on the shared information.
+ */
+ scan_done = do_lazy_scan_heap(&vacrel);
+
+ phvstate->myscanstate->maybe_have_blocks = !scan_done;
+}
+
+/*
+ * Complete parallel heap scans that have remaining blocks in their
+ * chunks.
+ */
+static void
+parallel_heap_complete_unfinished_scan(LVRelState *vacrel)
+{
+ int nworkers;
+
+ Assert(!IsParallelWorker());
+
+ nworkers = parallel_vacuum_get_nworkers_table(vacrel->pvs);
+
+ for (int i = 0; i < nworkers; i++)
+ {
+ PHVScanWorkerState *wstate = &(vacrel->phvstate->scanstates[i]);
+ bool scan_done PG_USED_FOR_ASSERTS_ONLY;
+
+ if (!wstate->maybe_have_blocks)
+ continue;
+
+ vacrel->phvstate->myscanstate = wstate;
+
+ scan_done = do_lazy_scan_heap(vacrel);
+
+ Assert(scan_done);
+ }
+}
+
+/*
+ * Accumulate relation counters that parallel workers collected into the
+ * leader's counters.
+ */
+static void
+parallel_heap_vacuum_gather_scan_stats(LVRelState *vacrel)
+{
+ PHVState *phvstate = vacrel->phvstate;
+
+ Assert(ParallelHeapVacuumIsActive(vacrel));
+
+ for (int i = 0; i < phvstate->nworkers_launched; i++)
+ {
+ LVRelCounters *counters = &(phvstate->shared->worker_relcnts[i]);
+
+#define LV_ACCUM_ITEM(item) (vacrel)->counters->item += (counters)->item
+
+ LV_ACCUM_ITEM(scanned_pages);
+ LV_ACCUM_ITEM(removed_pages);
+ LV_ACCUM_ITEM(frozen_pages);
+ LV_ACCUM_ITEM(lpdead_item_pages);
+ LV_ACCUM_ITEM(missed_dead_pages);
+ LV_ACCUM_ITEM(nonempty_pages);
+ LV_ACCUM_ITEM(tuples_deleted);
+ LV_ACCUM_ITEM(tuples_frozen);
+ LV_ACCUM_ITEM(lpdead_items);
+ LV_ACCUM_ITEM(live_tuples);
+ LV_ACCUM_ITEM(recently_dead_tuples);
+ LV_ACCUM_ITEM(missed_dead_tuples);
+
+#undef LV_ACCUM_ITEM
+
+ if (TransactionIdPrecedes(counters->NewRelfrozenXid, vacrel->counters->NewRelfrozenXid))
+ vacrel->counters->NewRelfrozenXid = counters->NewRelfrozenXid;
+
+ if (MultiXactIdPrecedesOrEquals(counters->NewRelminMxid, vacrel->counters->NewRelminMxid))
+ vacrel->counters->NewRelminMxid = counters->NewRelminMxid;
+
+ if (!vacrel->counters->skippedallvis && counters->skippedallvis)
+ vacrel->counters->skippedallvis = true;
+ }
+}
+
+/*
+ * A parallel variant of do_lazy_scan_heap(). The leader process launches parallel
+ * workers to scan the heap in parallel.
+ */
+static void
+do_parallel_lazy_scan_heap(LVRelState *vacrel)
+{
+ PHVScanWorkerState *scanstate;
+ TidStore *dead_items = vacrel->dead_items;
+ VacDeadItemsInfo *dead_items_info = vacrel->dead_items_info;
+
+ Assert(ParallelHeapVacuumIsActive(vacrel));
+ Assert(!IsParallelWorker());
+
+ /* launch parallel workers */
+ vacrel->phvstate->nworkers_launched = parallel_vacuum_table_scan_begin(vacrel->pvs);
+
+ /* initialize the leader's scan state to join the scan as a worker */
+ scanstate = palloc(sizeof(PHVScanWorkerState));
+ table_block_parallelscan_startblock_init(vacrel->rel, &(scanstate->state),
+ vacrel->phvstate->pscandesc);
+ vacrel->phvstate->myscanstate = scanstate;
+
+ for (;;)
+ {
+ bool scan_done PG_USED_FOR_ASSERTS_ONLY;
+
+ /*
+ * Scan the table until either we are close to overrunning the
+ * available space for dead_items TIDs or we reach the end of the
+ * table.
+ */
+ scan_done = do_lazy_scan_heap(vacrel);
+
+ /* stop parallel workers and gather the collected stats */
+ parallel_vacuum_table_scan_end(vacrel->pvs);
+ parallel_heap_vacuum_gather_scan_stats(vacrel);
+
+ /*
+ * If we are close to overrunning the available space for dead_items
+ * TIDs, do a round of index and heap vacuuming before resuming the
+ * parallel scan.
+ */
+ if (TidStoreMemoryUsage(dead_items) > dead_items_info->max_bytes)
+ {
+ /* Perform a round of index and heap vacuuming */
+ vacrel->consider_bypass_optimization = false;
+ lazy_vacuum(vacrel);
+
+ /* Report that we are once again scanning the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+ /* re-launch parallel workers */
+ vacrel->phvstate->nworkers_launched =
+ parallel_vacuum_table_scan_begin(vacrel->pvs);
+
+ continue;
+ }
+
+ /* We reached the end of the table */
+ Assert(scan_done);
+ break;
+ }
+
+ parallel_heap_complete_unfinished_scan(vacrel);
+}
+
/*
* Error context callback for errors occurring during vacuum. The error
* context messages for index phases should match the messages set in parallel
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index f26070bff2..e1759da69a 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -28,6 +28,7 @@
#include "access/amapi.h"
#include "access/table.h"
+#include "access/tableam.h"
#include "access/xact.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
@@ -64,6 +65,12 @@ typedef struct PVShared
Oid relid;
int elevel;
+ /*
+ * True if the caller wants parallel workers to invoke the vacuum table
+ * scan callback.
+ */
+ bool do_vacuum_table_scan;
+
/*
* Fields for both index vacuum and cleanup.
*
@@ -163,6 +170,9 @@ struct ParallelVacuumState
/* NULL for worker processes */
ParallelContext *pcxt;
+ /* Passed to parallel table scan workers. NULL for leader process */
+ ParallelWorkerContext *pwcxt;
+
/* Parent Heap Relation */
Relation heaprel;
@@ -192,6 +202,16 @@ struct ParallelVacuumState
/* Points to WAL usage area in DSM */
WalUsage *wal_usage;
+ /*
+ * The number of workers for parallel table scan/vacuuming and index
+ * vacuuming, respectively.
+ */
+ int nworkers_for_table;
+ int nworkers_for_index;
+
+ /* How many parallel table vacuum scans have been performed? */
+ int num_table_scans;
+
/*
* False if the index is totally unsuitable target for all parallel
* processing. For example, the index could be <
@@ -220,8 +240,9 @@ struct ParallelVacuumState
PVIndVacStatus status;
};
-static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
- bool *will_parallel_vacuum);
+static void parallel_vacuum_compute_workers(Relation rel, Relation *indrels, int nindexes,
+ int nrequested, int *nworkers_table,
+ int *nworkers_index, bool *will_parallel_vacuum);
static void parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scans,
bool vacuum);
static void parallel_vacuum_process_safe_indexes(ParallelVacuumState *pvs);
@@ -241,7 +262,7 @@ static void parallel_vacuum_error_callback(void *arg);
ParallelVacuumState *
parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
int nrequested_workers, int vac_work_mem,
- int elevel, BufferAccessStrategy bstrategy)
+ int elevel, BufferAccessStrategy bstrategy, void *state)
{
ParallelVacuumState *pvs;
ParallelContext *pcxt;
@@ -255,6 +276,8 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
Size est_shared_len;
int nindexes_mwm = 0;
int parallel_workers = 0;
+ int nworkers_table;
+ int nworkers_index;
int querylen;
/*
@@ -262,15 +285,17 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
* relation
*/
Assert(nrequested_workers >= 0);
- Assert(nindexes > 0);
/*
* Compute the number of parallel vacuum workers to launch
*/
will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = parallel_vacuum_compute_workers(indrels, nindexes,
- nrequested_workers,
- will_parallel_vacuum);
+ parallel_vacuum_compute_workers(rel, indrels, nindexes, nrequested_workers,
+ &nworkers_table, &nworkers_index,
+ will_parallel_vacuum);
+
+ parallel_workers = Max(nworkers_table, nworkers_index);
+
if (parallel_workers <= 0)
{
/* Can't perform vacuum in parallel -- return NULL */
@@ -284,6 +309,8 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
pvs->will_parallel_vacuum = will_parallel_vacuum;
pvs->bstrategy = bstrategy;
pvs->heaprel = rel;
+ pvs->nworkers_for_table = nworkers_table;
+ pvs->nworkers_for_index = nworkers_index;
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
@@ -326,6 +353,10 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
else
querylen = 0; /* keep compiler quiet */
+ /* Estimate AM-specific space for parallel table vacuum */
+ if (nworkers_table > 0)
+ table_parallel_vacuum_estimate(rel, pcxt, nworkers_table, state);
+
InitializeParallelDSM(pcxt);
/* Prepare index vacuum stats */
@@ -417,6 +448,10 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
}
+ /* Prepare AM-specific DSM for parallel table vacuum */
+ if (nworkers_table > 0)
+ table_parallel_vacuum_initialize(rel, pcxt, nworkers_table, state);
+
/* Success -- return parallel vacuum state */
return pvs;
}
@@ -538,27 +573,41 @@ parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tup
* min_parallel_index_scan_size as invoking workers for very small indexes
* can hurt performance.
*
+ * XXX needs to mention the number of workers for the table scan.
+ *
* nrequested is the number of parallel workers that user requested. If
* nrequested is 0, we compute the parallel degree based on nindexes, that is
* the number of indexes that support parallel vacuum. This function also
* sets will_parallel_vacuum to remember indexes that participate in parallel
* vacuum.
*/
-static int
-parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
- bool *will_parallel_vacuum)
+static void
+parallel_vacuum_compute_workers(Relation rel, Relation *indrels, int nindexes,
+ int nrequested, int *nworkers_table,
+ int *nworkers_index, bool *will_parallel_vacuum)
{
int nindexes_parallel = 0;
int nindexes_parallel_bulkdel = 0;
int nindexes_parallel_cleanup = 0;
- int parallel_workers;
+ int parallel_workers_table = 0;
+ int parallel_workers_index = 0;
+
+ *nworkers_table = 0;
+ *nworkers_index = 0;
/*
* We don't allow performing parallel operation in standalone backend or
* when parallelism is disabled.
*/
if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
+ return;
+
+ /*
+ * Compute the number of workers for parallel table scan. Cap by
+ * max_parallel_maintenance_workers.
+ */
+ parallel_workers_table = Min(table_parallel_vacuum_compute_workers(rel, nrequested),
+ max_parallel_maintenance_workers);
/*
* Compute the number of indexes that can participate in parallel vacuum.
@@ -589,17 +638,18 @@ parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
nindexes_parallel--;
/* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
+ if (nindexes_parallel > 0)
+ {
+ /* Compute the parallel degree for parallel index vacuum */
+ parallel_workers_index = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers_index = Min(parallel_workers_index, max_parallel_maintenance_workers);
+ }
- return parallel_workers;
+ *nworkers_table = parallel_workers_table;
+ *nworkers_index = parallel_workers_index;
}
/*
@@ -669,7 +719,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
if (nworkers > 0)
{
/* Reinitialize parallel context to relaunch parallel workers */
- if (num_index_scans > 0)
+ if (num_index_scans > 0 || pvs->num_table_scans > 0)
ReinitializeParallelDSM(pvs->pcxt);
/*
@@ -978,6 +1028,120 @@ parallel_vacuum_index_is_parallel_safe(Relation indrel, int num_index_scans,
return true;
}
+/*
+ * A parallel worker invokes the table-AM-specified vacuum scan callback.
+ */
+static void
+parallel_vacuum_process_table(ParallelVacuumState *pvs)
+{
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ /* Do table vacuum scan */
+ table_parallel_vacuum_scan(pvs->heaprel, pvs, pvs->pwcxt);
+
+ /*
+ * We have completed the table vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Prepare DSM and vacuum delay, and launch parallel workers for parallel
+ * table vacuum scan.
+ */
+int
+parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ if (pvs->nworkers_for_table == 0)
+ return 0;
+
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ pvs->shared->do_vacuum_table_scan = true;
+
+ if (pvs->num_table_scans > 0)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ ReinitializeParallelWorkers(pvs->pcxt, pvs->nworkers_for_table);
+
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have already
+ * accumulated the remaining balance of heap.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+ }
+
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for table scanning (planned: %d)",
+ "launched %d parallel vacuum workers for table scanning (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, pvs->nworkers_for_table)));
+
+ return pvs->pcxt->nworkers_launched;
+}
+
+/*
+ * Wait for all workers for parallel table vacuum scan, and gather statistics.
+ */
+void
+parallel_vacuum_table_scan_end(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ if (pvs->nworkers_for_table == 0)
+ return;
+
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+
+ /*
+ * Carry the shared balance value to heap scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+
+ pvs->shared->do_vacuum_table_scan = false;
+ pvs->num_table_scans++;
+}
+
+Relation *
+parallel_vacuum_get_table_indexes(ParallelVacuumState *pvs, int *nindexes)
+{
+ *nindexes = pvs->nindexes;
+
+ return pvs->indrels;
+}
+
+int
+parallel_vacuum_get_nworkers_table(ParallelVacuumState *pvs)
+{
+ return pvs->nworkers_for_table;
+}
+
/*
* Perform work within a launched parallel process.
*
@@ -1026,7 +1190,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
* matched to the leader's one.
*/
vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
if (shared->maintenance_work_mem_worker > 0)
maintenance_work_mem = shared->maintenance_work_mem_worker;
@@ -1060,6 +1223,10 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
pvs.relname = pstrdup(RelationGetRelationName(rel));
pvs.heaprel = rel;
+ pvs.pwcxt = palloc(sizeof(ParallelWorkerContext));
+ pvs.pwcxt->toc = toc;
+ pvs.pwcxt->seg = seg;
+
/* These fields will be filled during index vacuum or cleanup */
pvs.indname = NULL;
pvs.status = PARALLEL_INDVAC_STATUS_INITIAL;
@@ -1077,8 +1244,15 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
/* Prepare to track buffer usage during parallel execution */
InstrStartParallelQuery();
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&pvs);
+ if (pvs.shared->do_vacuum_table_scan)
+ {
+ parallel_vacuum_process_table(&pvs);
+ }
+ else
+ {
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_process_safe_indexes(&pvs);
+ }
/* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index af3b15e93d..63c2548c54 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -164,15 +164,6 @@ typedef struct ProcArrayStruct
*
* The typedef is in the header.
*/
-struct GlobalVisState
-{
- /* XIDs >= are considered running by some backend */
- FullTransactionId definitely_needed;
-
- /* XIDs < are not considered to be running by any backend */
- FullTransactionId maybe_needed;
-};
-
/*
* Result of ComputeXidHorizons().
*/
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9e9aec88a6..a80b3a17bf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -20,6 +20,7 @@
#include "access/skey.h"
#include "access/table.h" /* for backward compatibility */
#include "access/tableam.h"
+#include "commands/vacuum.h"
#include "nodes/lockoptions.h"
#include "nodes/primnodes.h"
#include "storage/bufpage.h"
@@ -393,6 +394,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
+extern int heap_parallel_vacuum_compute_workers(Relation rel, int requested);
+extern void heap_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state);
+extern void heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state);
+extern void heap_parallel_vacuum_scan_worker(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index da661289c1..e1bacc95cd 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -20,6 +20,7 @@
#include "access/relscan.h"
#include "access/sdir.h"
#include "access/xact.h"
+#include "commands/vacuum.h"
#include "executor/tuptable.h"
#include "storage/read_stream.h"
#include "utils/rel.h"
@@ -655,6 +656,46 @@ typedef struct TableAmRoutine
struct VacuumParams *params,
BufferAccessStrategy bstrategy);
+ /* ------------------------------------------------------------------------
+ * Callbacks for parallel table vacuum.
+ * ------------------------------------------------------------------------
+ */
+
+ /*
+ * Compute the number of parallel workers for parallel table vacuum. The
+ * function must return 0 to disable parallel table vacuum.
+ */
+ int (*parallel_vacuum_compute_workers) (Relation rel, int requested);
+
+ /*
+ * Compute the amount of DSM space the AM needs for parallel table vacuum.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_estimate) (Relation rel,
+ ParallelContext *pcxt,
+ int nworkers,
+ void *state);
+
+ /*
+ * Initialize DSM space for parallel table vacuum.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_initialize) (Relation rel,
+ ParallelContext *pctx,
+ int nworkers,
+ void *state);
+
+ /*
+ * This callback is called for parallel table vacuum workers.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_scan_worker) (Relation rel,
+ ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt);
+
/*
* Prepare to analyze block `blockno` of `scan`. The scan has been started
* with table_beginscan_analyze(). See also
@@ -1710,6 +1751,33 @@ table_relation_vacuum(Relation rel, struct VacuumParams *params,
rel->rd_tableam->relation_vacuum(rel, params, bstrategy);
}
+static inline int
+table_parallel_vacuum_compute_workers(Relation rel, int requested)
+{
+ return rel->rd_tableam->parallel_vacuum_compute_workers(rel, requested);
+}
+
+static inline void
+table_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt, int nworkers,
+ void *state)
+{
+ rel->rd_tableam->parallel_vacuum_estimate(rel, pcxt, nworkers, state);
+}
+
+static inline void
+table_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt, int nworkers,
+ void *state)
+{
+ rel->rd_tableam->parallel_vacuum_initialize(rel, pcxt, nworkers, state);
+}
+
+static inline void
+table_parallel_vacuum_scan(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt)
+{
+ rel->rd_tableam->parallel_vacuum_scan_worker(rel, pvs, pwcxt);
+}
+
/*
* Prepare to analyze the next block in the read stream. The scan needs to
* have been started with table_beginscan_analyze(). Note that this routine
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 759f9a87d3..e665335b6f 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -360,7 +360,8 @@ extern void VacuumUpdateCosts(void);
extern ParallelVacuumState *parallel_vacuum_init(Relation rel, Relation *indrels,
int nindexes, int nrequested_workers,
int vac_work_mem, int elevel,
- BufferAccessStrategy bstrategy);
+ BufferAccessStrategy bstrategy,
+ void *state);
extern void parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats);
extern TidStore *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs,
VacDeadItemsInfo **dead_items_info_p);
@@ -372,6 +373,10 @@ extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
long num_table_tuples,
int num_index_scans,
bool estimated_count);
+extern int parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs);
+extern void parallel_vacuum_table_scan_end(ParallelVacuumState *pvs);
+extern int parallel_vacuum_get_nworkers_table(ParallelVacuumState *pvs);
+extern Relation *parallel_vacuum_get_table_indexes(ParallelVacuumState *pvs, int *nindexes);
extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in commands/analyze.c */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 9398a84051..6ccb19a29f 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -102,8 +102,20 @@ extern char *ExportSnapshot(Snapshot snapshot);
/*
* These live in procarray.c because they're intimately linked to the
* procarray contents, but thematically they better fit into snapmgr.h.
+ *
+ * XXX the struct definition is temporarily moved from procarray.c for
+ * parallel table vacuum development. We need to find a suitable way for
+ * parallel table vacuum workers to share the GlobalVisState.
*/
-typedef struct GlobalVisState GlobalVisState;
+typedef struct GlobalVisState
+{
+ /* XIDs >= are considered running by some backend */
+ FullTransactionId definitely_needed;
+
+ /* XIDs < are not considered to be running by any backend */
+ FullTransactionId maybe_needed;
+} GlobalVisState;
+
extern GlobalVisState *GlobalVisTestFor(Relation rel);
extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
On Fri, Jun 28, 2024 at 9:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jun 28, 2024 at 9:44 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
# Benchmark results
* Test-1: parallel heap scan on the table without indexes
I created 20GB table, made garbage on the table, and run vacuum while
changing parallel degree:

create unlogged table test (a int) with (autovacuum_enabled = off);
insert into test select generate_series(1, 600000000); --- 20GB table
delete from test where a % 5 = 0;
vacuum (verbose, parallel 0) test;

Here are the results (total time and heap scan time):
PARALLEL 0: 21.99 s (single process)
PARALLEL 1: 11.39 s
PARALLEL 2: 8.36 s
PARALLEL 3: 6.14 s
PARALLEL 4: 5.08 s

* Test-2: parallel heap scan on the table with one index
I used a similar table to the test case 1 but created one btree index on it:
create unlogged table test (a int) with (autovacuum_enabled = off);
insert into test select generate_series(1, 600000000); --- 20GB table
create index on test (a);
delete from test where a % 5 = 0;
vacuum (verbose, parallel 0) test;

I've measured the total execution time as well as the time of each
vacuum phase (from left heap scan time, index vacuum time, and heap
vacuum time):

PARALLEL 0: 45.11 s (21.89, 16.74, 6.48)
PARALLEL 1: 42.13 s (12.75, 22.04, 7.23)
PARALLEL 2: 39.27 s (8.93, 22.78, 7.45)
PARALLEL 3: 36.53 s (6.76, 22.00, 7.65)
PARALLEL 4: 35.84 s (5.85, 22.04, 7.83)

Overall, I can see that the parallel heap scan in lazy vacuum has decent
scalability; in both test-1 and test-2, the execution time of heap
scan got ~4x faster with 4 parallel workers. On the other hand, when
it comes to the total vacuum execution time, I could not see much
performance improvement in test-2 (45.11 vs. 35.84). Looking at the
results PARALLEL 0 vs. PARALLEL 1 in test-2, the heap scan got faster
(21.89 vs. 12.75) whereas index vacuum got slower (16.74 vs. 22.04),
and heap scan in case 2 was not as fast as in case 1 with 1 parallel
worker (12.75 vs. 11.39).I think the reason is the shared TidStore is not very scalable since
we have a single lock on it. In all cases in test-1, we don't use
the shared TidStore since all dead tuples are removed during heap
pruning. So the scalability was better overall than in test-2. In
parallel 0 case in test-2, we use the local TidStore, and from
parallel degree of 1 in test-2, we use the shared TidStore and
parallel worker concurrently update it. Also, I guess that the lookup
performance of the local TidStore is better than the shared TidStore's
lookup performance because of the differences between a bump context
and a DSA area. I think that this difference contributed to the fact
that index vacuuming got slower (16.74 vs. 22.04).
Thank you for the comments!
There are two obvious ideas to improve the overall vacuum
execution time: (1) improve the shared TidStore scalability and (2)
support parallel heap vacuum. For (1), several ideas are proposed by
the ART authors[1]. I've not tried these ideas but it might be
applicable to our ART implementation. But I prefer to start with (2)
since it would be easier. Feedback is very welcome.

Starting with (2) sounds like a reasonable approach. We should study a
few more things like (a) the performance results where there are 3-4
indexes,
Here are the results with 4 indexes (and restarting the server before
the benchmark):
PARALLEL 0: 115.48 s (32.76, 64.46, 18.24)
PARALLEL 1: 74.88 s (17.11, 44.43, 13.25)
PARALLEL 2: 71.15 s (14.13, 44.82, 12.12)
PARALLEL 3: 46.78 s (10.74, 24.50, 11.43)
PARALLEL 4: 46.42 s (8.95, 24.96, 12.39) (launched 4 workers for heap
scan and 3 workers for index vacuum)
(b) What is the reason for performance improvement seen with
only heap scans. We normally get benefits of parallelism because of
using multiple CPUs but parallelizing scans (I/O) shouldn't give much
benefit. Is it possible that you are seeing benefits because most of
the data is either in shared_buffers or in memory? We can probably try
vacuuming tables by restarting the nodes to ensure the data is not in
memory.
I think it depends on the storage performance. FYI I use an EC2
instance (m6id.metal).
I've run the same benchmark script (table with no index) with
restarting the server before executing the vacuum, and here are the
results:
PARALLEL 0: 32.75 s
PARALLEL 1: 17.46 s
PARALLEL 2: 13.41 s
PARALLEL 3: 10.31 s
PARALLEL 4: 8.48 s
With the above two tests, I used the updated patch that I just submitted [1].
Regards,
[1]: /messages/by-id/CAD21AoAWHHnCg9OvtoEJnnvCc-3isyOyAGn+2KYoSXEv=vXauw@mail.gmail.com
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Dear Sawada-san,
Thank you for the test!
I could reproduce this issue and it's a bug; it skipped even
non-all-visible pages. I've attached the new version patch.BTW since we compute the number of parallel workers for the heap scan
based on the table size, it's possible that we launch multiple workers
even if most blocks are all-visible. It seems to be better if we
calculate it based on (relpages - relallvisible).
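
For illustration, a sketch of that suggested calculation, adapted from the
v2 patch's heap_parallel_vacuum_compute_workers() (untested, and note that
relallvisible comes from pg_class, so it may be stale):

```
static int
heap_parallel_vacuum_compute_workers_sketch(Relation rel, int nrequested)
{
	int			parallel_workers = 0;
	int			heap_parallel_threshold;
	BlockNumber heap_pages;
	BlockNumber allvisible;

	if (nrequested > 0)
		return nrequested;

	/* Count only the pages the scan is actually likely to visit */
	heap_pages = RelationGetNumberOfBlocks(rel);
	allvisible = rel->rd_rel->relallvisible;	/* may be stale */
	heap_pages = (heap_pages > allvisible) ? heap_pages - allvisible : 0;

	/* Same log-scale ramp-up as in the v2 patch */
	heap_parallel_threshold = Max(min_parallel_table_scan_size, 1);
	while (heap_pages >= (BlockNumber) (heap_parallel_threshold * 3))
	{
		parallel_workers++;
		heap_parallel_threshold *= 3;
		if (heap_parallel_threshold > INT_MAX / 3)
			break;
	}

	return parallel_workers;
}
```

This keeps the existing worker ramp-up but bases it only on the pages the
scan is expected to visit.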
Thanks for updating the patch. I applied and confirmed all pages are scanned.
I used almost the same script (just changed max_parallel_maintenance_workers)
and got below result. I think the tendency was the same as yours.
```
parallel 0: 61114.369 ms
parallel 1: 34870.316 ms
parallel 2: 23598.115 ms
parallel 3: 17732.363 ms
parallel 4: 15203.271 ms
parallel 5: 13836.025 ms
```
I started to read your code, but it takes much time because I've never seen it before...
The below part contains initial comments.
1.
This patch cannot be built when debug mode is enabled. See [1].
IIUC, this was because NewRelminMxid was moved from struct LVRelState to PHVShared.
So you should update it like "vacrel->counters->NewRelminMxid".
2.
I compared parallel heap scan and found that it does not have compute_worker API.
Can you clarify the reason why there is an inconsistency?
(I feel it is intentional because the calculation logic seems to depend on the heap structure,
so should we add the API for table scan as well?)
[1]:
```
vacuumlazy.c: In function ‘lazy_scan_prune’:
vacuumlazy.c:1666:34: error: ‘LVRelState’ {aka ‘struct LVRelState’} has no member named ‘NewRelminMxid’
Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
^~
....
```
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Thu, Jul 25, 2024 at 2:58 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Sawada-san,
Thank you for the test!
I could reproduce this issue and it's a bug; it skipped even
non-all-visible pages. I've attached the new version patch.

BTW since we compute the number of parallel workers for the heap scan
based on the table size, it's possible that we launch multiple workers
even if most blocks are all-visible. It seems to be better if we
calculate it based on (relpages - relallvisible).

Thanks for updating the patch. I applied and confirmed all pages are scanned.
I used almost the same script (just changed max_parallel_maintenance_workers)
and got below result. I think the tendency was the same as yours.

```
parallel 0: 61114.369 ms
parallel 1: 34870.316 ms
parallel 2: 23598.115 ms
parallel 3: 17732.363 ms
parallel 4: 15203.271 ms
parallel 5: 13836.025 ms
```
Thank you for testing!
I started to read your code, but it takes much time because I've never seen it before...
The below part contains initial comments.

1.
This patch cannot be built when debug mode is enabled. See [1].
IIUC, this was because NewRelminMxid was moved from struct LVRelState to PHVShared.
So you should update it like "vacrel->counters->NewRelminMxid".
Right, will fix.
2.
I compared parallel heap scan and found that it does not have compute_worker API.
Can you clarify the reason why there is an inconsistency?
(I feel it is intentional because the calculation logic seems to depend on the heap structure,
so should we add the API for table scan as well?)
There is room to consider a better API design, but yes, the reason is
that the calculation logic depends on table AM implementation. For
example, I thought it might make sense to consider taking the number
of all-visible pages into account for the calculation of the number of
parallel workers as we don't want to launch many workers on the table
where most pages are all-visible. Which might not work for other table
AMs.
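
(For reference, the new callbacks would presumably be wired into each AM's
TableAmRoutine; for heap that would look roughly like the sketch below. The
heapam_handler.c hunk is not included in the patch excerpt above, so this
wiring is an assumption based on the declared functions.)

```
static const TableAmRoutine heapam_methods = {
	/* ... existing callbacks elided ... */
	.relation_vacuum = heap_vacuum_rel,

	/* parallel table vacuum callbacks added by the patch */
	.parallel_vacuum_compute_workers = heap_parallel_vacuum_compute_workers,
	.parallel_vacuum_estimate = heap_parallel_vacuum_estimate,
	.parallel_vacuum_initialize = heap_parallel_vacuum_initialize,
	.parallel_vacuum_scan_worker = heap_parallel_vacuum_scan_worker,
	/* ... */
};
```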
I'm updating the patch to implement parallel heap vacuum and will
share the updated patch. It might take time as it requires implementing
shared iteration support in the radix tree.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Dear Sawada-san,
Thank you for testing!
I tried to profile the vacuuming with the larger case (40 workers for the 20G table),
and the attached FlameGraph shows the result. IIUC, I cannot find any bottlenecks.
2.
I compared parallel heap scan and found that it does not have compute_worker API.
Can you clarify the reason why there is an inconsistency?
(I feel it is intentional because the calculation logic seems to depend on the heap structure,
so should we add the API for table scan as well?)
There is room to consider a better API design, but yes, the reason is
that the calculation logic depends on table AM implementation. For
example, I thought it might make sense to consider taking the number
of all-visible pages into account for the calculation of the number of
parallel workers as we don't want to launch many workers on the table
where most pages are all-visible. Which might not work for other table
AMs.
Okay, thanks for confirming. I wanted to ask others as well.
I'm updating the patch to implement parallel heap vacuum and will
share the updated patch. It might take time as it requires implementing
shared iteration support in the radix tree.
Here are other preliminary comments for the v2 patch. These do not include
cosmetic ones.
1.
The shared data structure PHVShared does not contain a mutex lock. Is this intentional
because its fields are accessed by the leader only after the parallel workers exit?
2.
Per my understanding, the vacuuming goes through the steps below.
a. parallel workers are launched for scanning pages
b. leader waits until scans are done
c. leader does vacuum alone (you may extend here...)
d. parallel workers are launched again to clean up indexes
If so, can we reuse parallel workers for the cleanup? Or is this more
painful engineering than the benefit is worth?
3.
According to LaunchParallelWorkers(), the bgw_name and bgw_type are hardcoded as
"parallel worker ..." Can we extend this to improve the trackability in the
pg_stat_activity?
4.
I'm not an expert on TidStore, but as you said, TidStoreLockExclusive() might be a
bottleneck when a TID is added to the shared TidStore. Another primitive idea
is to prepare per-worker TidStores (in the PHVScanWorkerState or LVRelCounters?)
and gather them after the heap scan, as in the sketch below. If you extend the
patch so that parallel workers also do the vacuuming, the gathering may not be
needed: each worker can access its own TidStore and clean up. One downside is
that the memory consumption may be quite large.
What do you think?
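
A minimal sketch of the gathering step, assuming the TidStore iteration
interface exposes per-block offsets roughly as below (the exact field and
function names of the iterator are an assumption):

```
/*
 * Hypothetical leader-side merge of one worker's local TidStore into the
 * shared one.  TidStoreSetBlockOffsets() is the same call the patch already
 * uses in dead_items_add(); the TidStoreIterResult layout is assumed.
 */
static void
merge_worker_dead_items(TidStore *shared_ts, TidStore *worker_ts)
{
	TidStoreIter *iter = TidStoreBeginIterate(worker_ts);
	TidStoreIterResult *result;

	while ((result = TidStoreIterateNext(iter)) != NULL)
	{
		/* The leader is the only writer, so no TidStoreLockExclusive() */
		TidStoreSetBlockOffsets(shared_ts, result->blkno,
								result->offsets, result->num_offsets);
	}

	TidStoreEndIterate(iter);
}
```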
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
40_flamegraph.svg (application/octet-stream)
rect.attributes.fill.value = "rgb(230,0,230)";
// remember matches
if (matches[x] == undefined) {
matches[x] = w;
} else {
if (w > matches[x]) {
// overwrite with parent
matches[x] = w;
}
}
searching = 1;
}
}
if (!searching)
return;
var params = get_params();
params.s = currentSearchTerm;
history.replaceState(null, null, parse_params(params));
searchbtn.classList.add("show");
searchbtn.firstChild.nodeValue = "Reset Search";
// calculate percent matched, excluding vertical overlap
var count = 0;
var lastx = -1;
var lastw = 0;
var keys = Array();
for (k in matches) {
if (matches.hasOwnProperty(k))
keys.push(k);
}
// sort the matched frames by their x location
// ascending, then width descending
keys.sort(function(a, b){
return a - b;
});
// Step through frames saving only the biggest bottom-up frames
// thanks to the sort order. This relies on the tree property
// where children are always smaller than their parents.
var fudge = 0.0001; // JavaScript floating point
for (var k in keys) {
var x = parseFloat(keys[k]);
var w = matches[keys[k]];
if (x >= lastx + lastw - fudge) {
count += w;
lastx = x;
lastw = w;
}
}
// display matched percent
matchedtxt.classList.remove("hide");
var pct = 100 * count / maxwidth;
if (pct != 100) pct = pct.toFixed(1)
matchedtxt.firstChild.nodeValue = "Matched: " + pct + "%";
}
]]>
</script>
<rect x="0.0" y="0" width="1200.0" height="854.0" fill="url(#background)" />
<text id="title" x="600.00" y="24" >Flame Graph</text>
<text id="details" x="10.00" y="837" > </text>
<text id="unzoom" x="10.00" y="24" class="hide">Reset Zoom</text>
<text id="search" x="1090.00" y="24" >Search</text>
<text id="ignorecase" x="1174.00" y="24" >ic</text>
<text id="matched" x="1090.00" y="837" > </text>
<g id="frames">
<g >
<title>__rmqueue (322,728,072 samples, 0.04%)</title><rect x="126.4" y="101" width="0.5" height="15.0" fill="rgb(249,203,48)" rx="2" ry="2" />
<text x="129.40" y="111.5" ></text>
</g>
<g >
<title>enqueue_entity (295,591,692 samples, 0.04%)</title><rect x="1180.2" y="661" width="0.4" height="15.0" fill="rgb(218,62,15)" rx="2" ry="2" />
<text x="1183.22" y="671.5" ></text>
</g>
<g >
<title>cpu_startup_entry (18,163,022,403 samples, 2.20%)</title><rect x="1162.7" y="741" width="26.0" height="15.0" fill="rgb(252,220,52)" rx="2" ry="2" />
<text x="1165.66" y="751.5" >c..</text>
</g>
<g >
<title>scheduler_tick (118,516,889 samples, 0.01%)</title><rect x="709.3" y="389" width="0.1" height="15.0" fill="rgb(246,190,45)" rx="2" ry="2" />
<text x="712.27" y="399.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32_impl (773,495,997 samples, 0.09%)</title><rect x="319.2" y="405" width="1.1" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="322.24" y="415.5" ></text>
</g>
<g >
<title>tas (614,778,162 samples, 0.07%)</title><rect x="75.0" y="453" width="0.9" height="15.0" fill="rgb(244,182,43)" rx="2" ry="2" />
<text x="78.03" y="463.5" ></text>
</g>
<g >
<title>pgstat_count_io_op (72,274,524 samples, 0.01%)</title><rect x="621.1" y="421" width="0.2" height="15.0" fill="rgb(207,10,2)" rx="2" ry="2" />
<text x="624.15" y="431.5" ></text>
</g>
<g >
<title>heap_tuple_should_freeze (95,669,608 samples, 0.01%)</title><rect x="54.1" y="341" width="0.2" height="15.0" fill="rgb(247,194,46)" rx="2" ry="2" />
<text x="57.11" y="351.5" ></text>
</g>
<g >
<title>__pte_alloc (1,508,822,768 samples, 0.18%)</title><rect x="386.4" y="245" width="2.1" height="15.0" fill="rgb(218,62,15)" rx="2" ry="2" />
<text x="389.38" y="255.5" ></text>
</g>
<g >
<title>list_del (631,365,829 samples, 0.08%)</title><rect x="573.4" y="101" width="0.9" height="15.0" fill="rgb(235,140,33)" rx="2" ry="2" />
<text x="576.42" y="111.5" ></text>
</g>
<g >
<title>dequeue_entity (423,622,158 samples, 0.05%)</title><rect x="593.1" y="213" width="0.6" height="15.0" fill="rgb(233,130,31)" rx="2" ry="2" />
<text x="596.10" y="223.5" ></text>
</g>
<g >
<title>heap_prune_record_dead_or_unused (2,386,563,437 samples, 0.29%)</title><rect x="935.6" y="501" width="3.4" height="15.0" fill="rgb(226,96,23)" rx="2" ry="2" />
<text x="938.56" y="511.5" ></text>
</g>
<g >
<title>lazy_scan_heap (22,604,331,570 samples, 2.74%)</title><rect x="11.0" y="693" width="32.3" height="15.0" fill="rgb(248,198,47)" rx="2" ry="2" />
<text x="13.98" y="703.5" >la..</text>
</g>
<g >
<title>ReleaseBuffer (155,476,154 samples, 0.02%)</title><rect x="133.0" y="517" width="0.2" height="15.0" fill="rgb(220,71,17)" rx="2" ry="2" />
<text x="136.00" y="527.5" ></text>
</g>
<g >
<title>LWLockAcquire (101,433,190 samples, 0.01%)</title><rect x="232.3" y="485" width="0.2" height="15.0" fill="rgb(209,20,4)" rx="2" ry="2" />
<text x="235.33" y="495.5" ></text>
</g>
<g >
<title>int_sqrt (184,053,246 samples, 0.02%)</title><rect x="1177.9" y="677" width="0.2" height="15.0" fill="rgb(253,220,52)" rx="2" ry="2" />
<text x="1180.89" y="687.5" ></text>
</g>
<g >
<title>x86_64_start_kernel (379,723,816 samples, 0.05%)</title><rect x="1189.5" y="757" width="0.5" height="15.0" fill="rgb(206,7,1)" rx="2" ry="2" />
<text x="1192.46" y="767.5" ></text>
</g>
<g >
<title>pte_alloc_one (1,494,467,745 samples, 0.18%)</title><rect x="386.4" y="229" width="2.1" height="15.0" fill="rgb(252,217,51)" rx="2" ry="2" />
<text x="389.40" y="239.5" ></text>
</g>
<g >
<title>wb_writeback (115,471,936 samples, 0.01%)</title><rect x="10.1" y="693" width="0.2" height="15.0" fill="rgb(222,80,19)" rx="2" ry="2" />
<text x="13.10" y="703.5" ></text>
</g>
<g >
<title>GetPrivateRefCount (123,131,901 samples, 0.01%)</title><rect x="257.7" y="421" width="0.2" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="260.69" y="431.5" ></text>
</g>
<g >
<title>__radix_tree_insert (4,428,321,008 samples, 0.54%)</title><rect x="456.2" y="165" width="6.4" height="15.0" fill="rgb(235,140,33)" rx="2" ry="2" />
<text x="459.24" y="175.5" ></text>
</g>
<g >
<title>do_softirq (535,134,408 samples, 0.06%)</title><rect x="1164.6" y="645" width="0.7" height="15.0" fill="rgb(223,87,20)" rx="2" ry="2" />
<text x="1167.57" y="655.5" ></text>
</g>
<g >
<title>do_generic_file_read.constprop.52 (169,978,536,003 samples, 20.62%)</title><rect x="343.5" y="325" width="243.4" height="15.0" fill="rgb(205,4,1)" rx="2" ry="2" />
<text x="346.51" y="335.5" >do_generic_file_read.constprop.52</text>
</g>
<g >
<title>BufferDescriptorGetContentLock (145,479,215 samples, 0.02%)</title><rect x="1089.3" y="469" width="0.2" height="15.0" fill="rgb(238,152,36)" rx="2" ry="2" />
<text x="1092.31" y="479.5" ></text>
</g>
<g >
<title>ItemPointerSet (2,156,126,831 samples, 0.26%)</title><rect x="627.1" y="533" width="3.1" height="15.0" fill="rgb(237,147,35)" rx="2" ry="2" />
<text x="630.10" y="543.5" ></text>
</g>
<g >
<title>TransactionIdPrecedes (2,515,838,353 samples, 0.31%)</title><rect x="705.6" y="517" width="3.6" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="708.59" y="527.5" ></text>
</g>
<g >
<title>ss_report_location (266,921,260 samples, 0.03%)</title><rect x="607.5" y="517" width="0.4" height="15.0" fill="rgb(249,202,48)" rx="2" ry="2" />
<text x="610.51" y="527.5" ></text>
</g>
<g >
<title>xfs_file_aio_write (701,647,072 samples, 0.09%)</title><rect x="50.5" y="149" width="1.0" height="15.0" fill="rgb(251,211,50)" rx="2" ry="2" />
<text x="53.47" y="159.5" ></text>
</g>
<g >
<title>LockBufHdr (166,567,596 samples, 0.02%)</title><rect x="1118.6" y="453" width="0.2" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="1121.57" y="463.5" ></text>
</g>
<g >
<title>WaitReadBuffers (40,272,838,991 samples, 4.89%)</title><rect x="74.7" y="501" width="57.6" height="15.0" fill="rgb(210,26,6)" rx="2" ry="2" />
<text x="77.68" y="511.5" >WaitRe..</text>
</g>
<g >
<title>__do_fault.isra.61 (79,157,688 samples, 0.01%)</title><rect x="75.5" y="357" width="0.1" height="15.0" fill="rgb(227,102,24)" rx="2" ry="2" />
<text x="78.50" y="367.5" ></text>
</g>
<g >
<title>__hrtimer_run_queues (105,887,461 samples, 0.01%)</title><rect x="203.0" y="389" width="0.2" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="206.01" y="399.5" ></text>
</g>
<g >
<title>security_file_permission (1,027,140,287 samples, 0.12%)</title><rect x="597.9" y="373" width="1.5" height="15.0" fill="rgb(225,96,23)" rx="2" ry="2" />
<text x="600.88" y="383.5" ></text>
</g>
<g >
<title>system_call_fastpath (707,254,508 samples, 0.09%)</title><rect x="50.5" y="213" width="1.0" height="15.0" fill="rgb(252,217,52)" rx="2" ry="2" />
<text x="53.47" y="223.5" ></text>
</g>
<g >
<title>shmem_add_to_page_cache.isra.26 (81,789,067,321 samples, 9.92%)</title><rect x="452.5" y="181" width="117.1" height="15.0" fill="rgb(250,207,49)" rx="2" ry="2" />
<text x="455.52" y="191.5" >shmem_add_to_p..</text>
</g>
<g >
<title>apic_timer_interrupt (137,726,205 samples, 0.02%)</title><rect x="853.8" y="469" width="0.2" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="856.81" y="479.5" ></text>
</g>
<g >
<title>compactify_tuples (105,072,182 samples, 0.01%)</title><rect x="854.3" y="501" width="0.2" height="15.0" fill="rgb(209,21,5)" rx="2" ry="2" />
<text x="857.34" y="511.5" ></text>
</g>
<g >
<title>schedule (596,433,624 samples, 0.07%)</title><rect x="16.3" y="277" width="0.9" height="15.0" fill="rgb(254,229,54)" rx="2" ry="2" />
<text x="19.34" y="287.5" ></text>
</g>
<g >
<title>lapic_next_deadline (194,545,753 samples, 0.02%)</title><rect x="1185.4" y="597" width="0.3" height="15.0" fill="rgb(222,82,19)" rx="2" ry="2" />
<text x="1188.38" y="607.5" ></text>
</g>
<g >
<title>_raw_qspin_lock_irq (113,817,551 samples, 0.01%)</title><rect x="130.2" y="261" width="0.2" height="15.0" fill="rgb(251,214,51)" rx="2" ry="2" />
<text x="133.19" y="271.5" ></text>
</g>
<g >
<title>PinBufferForBlock (162,226,846 samples, 0.02%)</title><rect x="22.8" y="533" width="0.2" height="15.0" fill="rgb(241,168,40)" rx="2" ry="2" />
<text x="25.79" y="543.5" ></text>
</g>
<g >
<title>apic_timer_interrupt (2,334,526,026 samples, 0.28%)</title><rect x="1163.7" y="693" width="3.4" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="1166.73" y="703.5" ></text>
</g>
<g >
<title>xfs_vm_writepages (115,471,936 samples, 0.01%)</title><rect x="10.1" y="613" width="0.2" height="15.0" fill="rgb(246,191,45)" rx="2" ry="2" />
<text x="13.10" y="623.5" ></text>
</g>
<g >
<title>LWLockWakeup (72,140,377 samples, 0.01%)</title><rect x="321.5" y="453" width="0.1" height="15.0" fill="rgb(210,24,5)" rx="2" ry="2" />
<text x="324.53" y="463.5" ></text>
</g>
<g >
<title>TransactionIdFollows (1,453,985,001 samples, 0.18%)</title><rect x="930.7" y="501" width="2.1" height="15.0" fill="rgb(222,79,18)" rx="2" ry="2" />
<text x="933.74" y="511.5" ></text>
</g>
<g >
<title>PageGetMaxOffsetNumber (135,737,711 samples, 0.02%)</title><rect x="864.6" y="485" width="0.2" height="15.0" fill="rgb(234,133,32)" rx="2" ry="2" />
<text x="867.59" y="495.5" ></text>
</g>
<g >
<title>shmem_recalc_inode (1,343,862,804 samples, 0.16%)</title><rect x="576.0" y="181" width="1.9" height="15.0" fill="rgb(214,42,10)" rx="2" ry="2" />
<text x="578.99" y="191.5" ></text>
</g>
<g >
<title>InitializeOneGUCOption (71,976,486 samples, 0.01%)</title><rect x="55.0" y="597" width="0.1" height="15.0" fill="rgb(229,111,26)" rx="2" ry="2" />
<text x="58.01" y="607.5" ></text>
</g>
<g >
<title>LWLockRelease (967,891,315 samples, 0.12%)</title><rect x="1141.4" y="501" width="1.4" height="15.0" fill="rgb(217,58,13)" rx="2" ry="2" />
<text x="1144.44" y="511.5" ></text>
</g>
<g >
<title>LWLockAttemptLock (104,243,313 samples, 0.01%)</title><rect x="23.5" y="469" width="0.2" height="15.0" fill="rgb(235,138,33)" rx="2" ry="2" />
<text x="26.54" y="479.5" ></text>
</g>
<g >
<title>LWLockWakeup (528,609,363 samples, 0.06%)</title><rect x="1141.8" y="485" width="0.7" height="15.0" fill="rgb(210,24,5)" rx="2" ry="2" />
<text x="1144.77" y="495.5" ></text>
</g>
<g >
<title>ExecVacuum (3,088,212,841 samples, 0.37%)</title><rect x="50.2" y="549" width="4.4" height="15.0" fill="rgb(213,37,9)" rx="2" ry="2" />
<text x="53.20" y="559.5" ></text>
</g>
<g >
<title>ConditionalLockBufferForCleanup (226,808,609 samples, 0.03%)</title><rect x="55.4" y="533" width="0.3" height="15.0" fill="rgb(216,53,12)" rx="2" ry="2" />
<text x="58.41" y="543.5" ></text>
</g>
<g >
<title>copy_user_enhanced_fast_string (575,488,434 samples, 0.07%)</title><rect x="21.4" y="405" width="0.8" height="15.0" fill="rgb(238,155,37)" rx="2" ry="2" />
<text x="24.41" y="415.5" ></text>
</g>
<g >
<title>HeapTupleSatisfiesVacuumHorizon (1,168,021,112 samples, 0.14%)</title><rect x="144.3" y="485" width="1.6" height="15.0" fill="rgb(207,13,3)" rx="2" ry="2" />
<text x="147.26" y="495.5" ></text>
</g>
<g >
<title>call_softirq (203,950,887 samples, 0.02%)</title><rect x="1189.6" y="565" width="0.3" height="15.0" fill="rgb(225,94,22)" rx="2" ry="2" />
<text x="1192.63" y="575.5" ></text>
</g>
<g >
<title>do_softirq (203,950,887 samples, 0.02%)</title><rect x="1189.6" y="581" width="0.3" height="15.0" fill="rgb(223,87,20)" rx="2" ry="2" />
<text x="1192.63" y="591.5" ></text>
</g>
<g >
<title>hrtimer_start_range_ns (801,244,050 samples, 0.10%)</title><rect x="1185.0" y="661" width="1.2" height="15.0" fill="rgb(244,179,42)" rx="2" ry="2" />
<text x="1188.03" y="671.5" ></text>
</g>
<g >
<title>do_nanosleep (82,563,287 samples, 0.01%)</title><rect x="71.8" y="309" width="0.2" height="15.0" fill="rgb(253,220,52)" rx="2" ry="2" />
<text x="74.85" y="319.5" ></text>
</g>
<g >
<title>_raw_qspin_lock_irq (417,446,285 samples, 0.05%)</title><rect x="591.7" y="277" width="0.6" height="15.0" fill="rgb(251,214,51)" rx="2" ry="2" />
<text x="594.69" y="287.5" ></text>
</g>
<g >
<title>LWLockAttemptLock (73,671,132 samples, 0.01%)</title><rect x="618.4" y="389" width="0.1" height="15.0" fill="rgb(235,138,33)" rx="2" ry="2" />
<text x="621.39" y="399.5" ></text>
</g>
<g >
<title>radix_tree_lookup_slot (486,036,641 samples, 0.06%)</title><rect x="256.3" y="261" width="0.7" height="15.0" fill="rgb(210,23,5)" rx="2" ry="2" />
<text x="259.29" y="271.5" ></text>
</g>
<g >
<title>FileReadV (186,288,969,029 samples, 22.60%)</title><rect x="333.8" y="469" width="266.7" height="15.0" fill="rgb(222,81,19)" rx="2" ry="2" />
<text x="336.81" y="479.5" >FileReadV</text>
</g>
<g >
<title>__audit_syscall_entry (70,480,025 samples, 0.01%)</title><rect x="309.9" y="357" width="0.1" height="15.0" fill="rgb(243,176,42)" rx="2" ry="2" />
<text x="312.87" y="367.5" ></text>
</g>
<g >
<title>TransactionIdPrecedes (823,765,745 samples, 0.10%)</title><rect x="203.9" y="469" width="1.1" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="206.86" y="479.5" ></text>
</g>
<g >
<title>__libc_start_main (128,045,675,684 samples, 15.54%)</title><rect x="50.2" y="773" width="183.3" height="15.0" fill="rgb(236,142,34)" rx="2" ry="2" />
<text x="53.19" y="783.5" >__libc_start_main</text>
</g>
<g >
<title>hash_search_with_hash_value (318,195,656 samples, 0.04%)</title><rect x="615.9" y="373" width="0.5" height="15.0" fill="rgb(249,205,49)" rx="2" ry="2" />
<text x="618.92" y="383.5" ></text>
</g>
<g >
<title>find_next_bit (73,626,817 samples, 0.01%)</title><rect x="311.2" y="213" width="0.1" height="15.0" fill="rgb(244,179,43)" rx="2" ry="2" />
<text x="314.21" y="223.5" ></text>
</g>
<g >
<title>futex_wake (280,422,222 samples, 0.03%)</title><rect x="1142.1" y="405" width="0.4" height="15.0" fill="rgb(219,65,15)" rx="2" ry="2" />
<text x="1145.10" y="415.5" ></text>
</g>
<g >
<title>ConditionVariableBroadcast (2,698,740,417 samples, 0.33%)</title><rect x="325.6" y="485" width="3.9" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="328.62" y="495.5" ></text>
</g>
<g >
<title>smp_apic_timer_interrupt (369,814,789 samples, 0.04%)</title><rect x="784.3" y="501" width="0.5" height="15.0" fill="rgb(221,74,17)" rx="2" ry="2" />
<text x="787.30" y="511.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32_impl (135,558,230 samples, 0.02%)</title><rect x="132.6" y="437" width="0.2" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="135.57" y="447.5" ></text>
</g>
<g >
<title>update_process_times (105,887,461 samples, 0.01%)</title><rect x="203.0" y="341" width="0.2" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="206.01" y="351.5" ></text>
</g>
<g >
<title>StartBufferIO (492,043,252 samples, 0.06%)</title><rect x="330.9" y="485" width="0.7" height="15.0" fill="rgb(244,183,43)" rx="2" ry="2" />
<text x="333.90" y="495.5" ></text>
</g>
<g >
<title>xfs_vn_update_time (100,541,237 samples, 0.01%)</title><rect x="50.9" y="69" width="0.1" height="15.0" fill="rgb(234,136,32)" rx="2" ry="2" />
<text x="53.88" y="79.5" ></text>
</g>
<g >
<title>apic_timer_interrupt (159,855,016 samples, 0.02%)</title><rect x="349.7" y="309" width="0.2" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="352.69" y="319.5" ></text>
</g>
<g >
<title>heap_page_prune_and_freeze (1,488,034,874 samples, 0.18%)</title><rect x="52.4" y="405" width="2.2" height="15.0" fill="rgb(213,40,9)" rx="2" ry="2" />
<text x="55.43" y="415.5" ></text>
</g>
<g >
<title>xfs_trans_commit (112,688,247 samples, 0.01%)</title><rect x="586.7" y="261" width="0.1" height="15.0" fill="rgb(250,210,50)" rx="2" ry="2" />
<text x="589.66" y="271.5" ></text>
</g>
<g >
<title>maybe_start_bgworkers (637,265,816,681 samples, 77.32%)</title><rect x="233.6" y="709" width="912.4" height="15.0" fill="rgb(240,161,38)" rx="2" ry="2" />
<text x="236.63" y="719.5" >maybe_start_bgworkers</text>
</g>
<g >
<title>GetPrivateRefCountEntry (89,716,643 samples, 0.01%)</title><rect x="1138.2" y="485" width="0.1" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="1141.19" y="495.5" ></text>
</g>
<g >
<title>sys_pread64 (37,575,822,762 samples, 4.56%)</title><rect x="78.4" y="405" width="53.8" height="15.0" fill="rgb(212,35,8)" rx="2" ry="2" />
<text x="81.38" y="415.5" >sys_p..</text>
</g>
<g >
<title>visibilitymap_pin (797,848,131 samples, 0.10%)</title><rect x="1144.3" y="549" width="1.1" height="15.0" fill="rgb(253,221,53)" rx="2" ry="2" />
<text x="1147.26" y="559.5" ></text>
</g>
<g >
<title>heap_prepare_freeze_tuple (1,685,453,234 samples, 0.20%)</title><rect x="37.4" y="581" width="2.4" height="15.0" fill="rgb(227,101,24)" rx="2" ry="2" />
<text x="40.38" y="591.5" ></text>
</g>
<g >
<title>table_block_parallelscan_nextpage (1,394,665,687 samples, 0.17%)</title><rect x="607.9" y="517" width="2.0" height="15.0" fill="rgb(251,212,50)" rx="2" ry="2" />
<text x="610.89" y="527.5" ></text>
</g>
<g >
<title>BufferGetPage (291,133,286 samples, 0.04%)</title><rect x="1089.5" y="469" width="0.4" height="15.0" fill="rgb(253,220,52)" rx="2" ry="2" />
<text x="1092.52" y="479.5" ></text>
</g>
<g >
<title>PinBuffer (836,501,856 samples, 0.10%)</title><rect x="619.1" y="389" width="1.2" height="15.0" fill="rgb(219,64,15)" rx="2" ry="2" />
<text x="622.08" y="399.5" ></text>
</g>
<g >
<title>apic_timer_interrupt (131,841,886 samples, 0.02%)</title><rect x="682.4" y="501" width="0.1" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="685.35" y="511.5" ></text>
</g>
<g >
<title>sysret_check (295,261,045 samples, 0.04%)</title><rect x="77.8" y="421" width="0.5" height="15.0" fill="rgb(249,205,49)" rx="2" ry="2" />
<text x="80.83" y="431.5" ></text>
</g>
<g >
<title>TransactionIdPrecedes (147,600,825 samples, 0.02%)</title><rect x="38.5" y="565" width="0.2" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="41.49" y="575.5" ></text>
</g>
<g >
<title>system_call_fastpath (10,517,164,883 samples, 1.28%)</title><rect x="1146.0" y="773" width="15.0" height="15.0" fill="rgb(252,217,52)" rx="2" ry="2" />
<text x="1148.99" y="783.5" ></text>
</g>
<g >
<title>hrtimer_interrupt (70,484,646 samples, 0.01%)</title><rect x="1058.9" y="405" width="0.1" height="15.0" fill="rgb(228,109,26)" rx="2" ry="2" />
<text x="1061.89" y="415.5" ></text>
</g>
<g >
<title>get_user_pages_fast (128,367,624 samples, 0.02%)</title><rect x="49.0" y="645" width="0.2" height="15.0" fill="rgb(229,111,26)" rx="2" ry="2" />
<text x="52.02" y="655.5" ></text>
</g>
<g >
<title>kthread (115,471,936 samples, 0.01%)</title><rect x="10.1" y="757" width="0.2" height="15.0" fill="rgb(239,159,38)" rx="2" ry="2" />
<text x="13.10" y="767.5" ></text>
</g>
<g >
<title>heap_tuple_should_freeze (254,280,701 samples, 0.03%)</title><rect x="217.9" y="469" width="0.4" height="15.0" fill="rgb(247,194,46)" rx="2" ry="2" />
<text x="220.90" y="479.5" ></text>
</g>
<g >
<title>BlockIdSet (2,378,065,078 samples, 0.29%)</title><rect x="766.0" y="501" width="3.4" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="769.01" y="511.5" ></text>
</g>
<g >
<title>mem_cgroup_charge_common (1,009,313,280 samples, 0.12%)</title><rect x="97.1" y="149" width="1.5" height="15.0" fill="rgb(239,158,37)" rx="2" ry="2" />
<text x="100.12" y="159.5" ></text>
</g>
<g >
<title>xfs_iunlock (344,246,781 samples, 0.04%)</title><rect x="131.3" y="325" width="0.5" height="15.0" fill="rgb(232,127,30)" rx="2" ry="2" />
<text x="134.33" y="335.5" ></text>
</g>
<g >
<title>pgstat_tracks_io_op (234,846,491 samples, 0.03%)</title><rect x="332.4" y="469" width="0.3" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="335.39" y="479.5" ></text>
</g>
<g >
<title>mdreadv (157,402,898 samples, 0.02%)</title><rect x="51.7" y="357" width="0.2" height="15.0" fill="rgb(239,159,38)" rx="2" ry="2" />
<text x="54.67" y="367.5" ></text>
</g>
<g >
<title>touch_atime (547,498,164 samples, 0.07%)</title><rect x="586.1" y="309" width="0.8" height="15.0" fill="rgb(205,2,0)" rx="2" ry="2" />
<text x="589.08" y="319.5" ></text>
</g>
<g >
<title>iomap_apply (270,636,716 samples, 0.03%)</title><rect x="10.4" y="533" width="0.4" height="15.0" fill="rgb(247,194,46)" rx="2" ry="2" />
<text x="13.44" y="543.5" ></text>
</g>
<g >
<title>xfs_log_commit_cil (355,451,941 samples, 0.04%)</title><rect x="15.1" y="245" width="0.5" height="15.0" fill="rgb(207,11,2)" rx="2" ry="2" />
<text x="18.11" y="255.5" ></text>
</g>
<g >
<title>hrtimer_interrupt (86,182,567 samples, 0.01%)</title><rect x="96.9" y="117" width="0.1" height="15.0" fill="rgb(228,109,26)" rx="2" ry="2" />
<text x="99.88" y="127.5" ></text>
</g>
<g >
<title>xfs_trans_alloc (102,176,742 samples, 0.01%)</title><rect x="14.9" y="277" width="0.2" height="15.0" fill="rgb(214,45,10)" rx="2" ry="2" />
<text x="17.91" y="287.5" ></text>
</g>
<g >
<title>__inc_zone_page_state (243,537,605 samples, 0.03%)</title><rect x="455.8" y="165" width="0.3" height="15.0" fill="rgb(209,22,5)" rx="2" ry="2" />
<text x="458.78" y="175.5" ></text>
</g>
<g >
<title>mark_page_accessed (2,072,833,213 samples, 0.25%)</title><rect x="1151.6" y="629" width="2.9" height="15.0" fill="rgb(217,57,13)" rx="2" ry="2" />
<text x="1154.57" y="639.5" ></text>
</g>
<g >
<title>__find_get_page (3,924,892,558 samples, 0.48%)</title><rect x="344.0" y="309" width="5.6" height="15.0" fill="rgb(229,114,27)" rx="2" ry="2" />
<text x="347.03" y="319.5" ></text>
</g>
<g >
<title>BufferIsValid (980,288,190 samples, 0.12%)</title><rect x="1107.9" y="421" width="1.4" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="1110.92" y="431.5" ></text>
</g>
<g >
<title>do_group_exit (10,517,164,883 samples, 1.28%)</title><rect x="1146.0" y="741" width="15.0" height="15.0" fill="rgb(219,67,16)" rx="2" ry="2" />
<text x="1148.99" y="751.5" ></text>
</g>
<g >
<title>PinBufferForBlock (6,500,970,346 samples, 0.79%)</title><rect x="11.2" y="581" width="9.3" height="15.0" fill="rgb(241,168,40)" rx="2" ry="2" />
<text x="14.17" y="591.5" ></text>
</g>
<g >
<title>PageGetItemId (163,735,936 samples, 0.02%)</title><rect x="32.5" y="581" width="0.3" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="35.53" y="591.5" ></text>
</g>
<g >
<title>vm_readbuf (95,273,516 samples, 0.01%)</title><rect x="621.6" y="517" width="0.1" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="624.56" y="527.5" ></text>
</g>
<g >
<title>apic_timer_interrupt (206,275,440 samples, 0.03%)</title><rect x="709.2" y="517" width="0.3" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="712.19" y="527.5" ></text>
</g>
<g >
<title>do_page_fault (143,314,527,755 samples, 17.39%)</title><rect x="380.7" y="293" width="205.1" height="15.0" fill="rgb(216,54,13)" rx="2" ry="2" />
<text x="383.67" y="303.5" >do_page_fault</text>
</g>
<g >
<title>__audit_syscall_exit (735,968,673 samples, 0.09%)</title><rect x="335.9" y="421" width="1.1" height="15.0" fill="rgb(218,62,14)" rx="2" ry="2" />
<text x="338.90" y="431.5" ></text>
</g>
<g >
<title>ReleaseBuffer (424,736,470 samples, 0.05%)</title><rect x="606.9" y="517" width="0.6" height="15.0" fill="rgb(220,71,17)" rx="2" ry="2" />
<text x="609.89" y="527.5" ></text>
</g>
<g >
<title>__find_lock_page (89,010,712 samples, 0.01%)</title><rect x="13.9" y="261" width="0.1" height="15.0" fill="rgb(251,214,51)" rx="2" ry="2" />
<text x="16.87" y="271.5" ></text>
</g>
<g >
<title>select_task_rq_fair (170,990,994 samples, 0.02%)</title><rect x="1165.9" y="565" width="0.2" height="15.0" fill="rgb(211,29,7)" rx="2" ry="2" />
<text x="1168.86" y="575.5" ></text>
</g>
<g >
<title>BufferIsValid (1,080,172,541 samples, 0.13%)</title><rect x="1109.9" y="453" width="1.5" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="1112.86" y="463.5" ></text>
</g>
<g >
<title>__radix_tree_lookup (636,857,395 samples, 0.08%)</title><rect x="245.3" y="229" width="0.9" height="15.0" fill="rgb(253,222,53)" rx="2" ry="2" />
<text x="248.32" y="239.5" ></text>
</g>
<g >
<title>file_update_time (1,997,989,982 samples, 0.24%)</title><rect x="579.3" y="229" width="2.9" height="15.0" fill="rgb(210,27,6)" rx="2" ry="2" />
<text x="582.35" y="239.5" ></text>
</g>
<g >
<title>visibilitymap_get_status (285,548,717 samples, 0.03%)</title><rect x="23.3" y="613" width="0.4" height="15.0" fill="rgb(217,59,14)" rx="2" ry="2" />
<text x="26.34" y="623.5" ></text>
</g>
<g >
<title>call_rwsem_down_write_failed (843,114,110 samples, 0.10%)</title><rect x="16.0" y="309" width="1.2" height="15.0" fill="rgb(205,0,0)" rx="2" ry="2" />
<text x="18.99" y="319.5" ></text>
</g>
<g >
<title>LWLockConditionalAcquire (408,957,962 samples, 0.05%)</title><rect x="235.2" y="517" width="0.6" height="15.0" fill="rgb(245,185,44)" rx="2" ry="2" />
<text x="238.24" y="527.5" ></text>
</g>
<g >
<title>BufferIsValid (147,989,814 samples, 0.02%)</title><rect x="769.8" y="501" width="0.2" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="772.76" y="511.5" ></text>
</g>
<g >
<title>PageGetItem (283,395,127 samples, 0.03%)</title><rect x="29.1" y="613" width="0.4" height="15.0" fill="rgb(214,43,10)" rx="2" ry="2" />
<text x="32.12" y="623.5" ></text>
</g>
<g >
<title>perform_spin_delay (100,806,989 samples, 0.01%)</title><rect x="247.7" y="389" width="0.2" height="15.0" fill="rgb(247,196,46)" rx="2" ry="2" />
<text x="250.71" y="399.5" ></text>
</g>
<g >
<title>pg_preadv (188,855,651 samples, 0.02%)</title><rect x="600.0" y="453" width="0.3" height="15.0" fill="rgb(213,40,9)" rx="2" ry="2" />
<text x="602.98" y="463.5" ></text>
</g>
<g >
<title>queued_spin_lock_slowpath (200,576,965 samples, 0.02%)</title><rect x="1159.0" y="533" width="0.3" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="1162.04" y="543.5" ></text>
</g>
<g >
<title>rw_verify_area (194,418,688 samples, 0.02%)</title><rect x="131.9" y="373" width="0.2" height="15.0" fill="rgb(218,64,15)" rx="2" ry="2" />
<text x="134.86" y="383.5" ></text>
</g>
<g >
<title>__radix_tree_lookup (122,104,010 samples, 0.01%)</title><rect x="64.1" y="229" width="0.2" height="15.0" fill="rgb(253,222,53)" rx="2" ry="2" />
<text x="67.12" y="239.5" ></text>
</g>
<g >
<title>apic_timer_interrupt (127,019,827 samples, 0.02%)</title><rect x="44.0" y="725" width="0.1" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="46.96" y="735.5" ></text>
</g>
<g >
<title>StartReadBuffer (6,511,909,804 samples, 0.79%)</title><rect x="11.2" y="613" width="9.3" height="15.0" fill="rgb(222,78,18)" rx="2" ry="2" />
<text x="14.15" y="623.5" ></text>
</g>
<g >
<title>FlushBuffer (5,304,482,542 samples, 0.64%)</title><rect x="11.9" y="533" width="7.6" height="15.0" fill="rgb(254,226,54)" rx="2" ry="2" />
<text x="14.93" y="543.5" ></text>
</g>
<g >
<title>tick_sched_timer (75,893,920 samples, 0.01%)</title><rect x="173.3" y="373" width="0.2" height="15.0" fill="rgb(254,227,54)" rx="2" ry="2" />
<text x="176.34" y="383.5" ></text>
</g>
<g >
<title>apic_timer_interrupt (263,118,266 samples, 0.03%)</title><rect x="933.4" y="501" width="0.4" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="936.44" y="511.5" ></text>
</g>
<g >
<title>HeapTupleHeaderAdvanceConflictHorizon (6,228,913,769 samples, 0.76%)</title><rect x="907.3" y="501" width="8.9" height="15.0" fill="rgb(240,164,39)" rx="2" ry="2" />
<text x="910.26" y="511.5" ></text>
</g>
<g >
<title>BufferGetBlockNumber (856,293,468 samples, 0.10%)</title><rect x="666.3" y="517" width="1.2" height="15.0" fill="rgb(206,7,1)" rx="2" ry="2" />
<text x="669.32" y="527.5" ></text>
</g>
<g >
<title>hrtimer_cancel (115,533,294 samples, 0.01%)</title><rect x="1187.4" y="693" width="0.2" height="15.0" fill="rgb(254,228,54)" rx="2" ry="2" />
<text x="1190.43" y="703.5" ></text>
</g>
<g >
<title>compactify_tuples (3,494,990,954 samples, 0.42%)</title><rect x="174.1" y="469" width="5.0" height="15.0" fill="rgb(209,21,5)" rx="2" ry="2" />
<text x="177.11" y="479.5" ></text>
</g>
<g >
<title>fsm_readbuf (279,923,013 samples, 0.03%)</title><rect x="22.6" y="613" width="0.4" height="15.0" fill="rgb(234,136,32)" rx="2" ry="2" />
<text x="25.62" y="623.5" ></text>
</g>
<g >
<title>radix_tree_descend (1,276,223,732 samples, 0.15%)</title><rect x="458.6" y="133" width="1.8" height="15.0" fill="rgb(243,175,41)" rx="2" ry="2" />
<text x="461.62" y="143.5" ></text>
</g>
<g >
<title>heap_vacuum_rel (22,616,657,143 samples, 2.74%)</title><rect x="11.0" y="709" width="32.4" height="15.0" fill="rgb(231,119,28)" rx="2" ry="2" />
<text x="13.98" y="719.5" >he..</text>
</g>
<g >
<title>vfs_write (313,556,868 samples, 0.04%)</title><rect x="10.4" y="613" width="0.5" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="13.42" y="623.5" ></text>
</g>
<g >
<title>shmem_fault (595,808,066 samples, 0.07%)</title><rect x="327.8" y="357" width="0.8" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="330.77" y="367.5" ></text>
</g>
<g >
<title>HeapTupleSatisfiesVacuumHorizon (78,231,985 samples, 0.01%)</title><rect x="28.6" y="613" width="0.2" height="15.0" fill="rgb(207,13,3)" rx="2" ry="2" />
<text x="31.64" y="623.5" ></text>
</g>
<g >
<title>down_write (82,091,217 samples, 0.01%)</title><rect x="586.5" y="245" width="0.1" height="15.0" fill="rgb(222,79,18)" rx="2" ry="2" />
<text x="589.51" y="255.5" ></text>
</g>
<g >
<title>TransactionIdPrecedes (916,817,104 samples, 0.11%)</title><rect x="216.1" y="437" width="1.3" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="219.08" y="447.5" ></text>
</g>
<g >
<title>__do_page_fault (2,320,058,077 samples, 0.28%)</title><rect x="243.3" y="373" width="3.4" height="15.0" fill="rgb(239,158,37)" rx="2" ry="2" />
<text x="246.35" y="383.5" ></text>
</g>
<g >
<title>TidStoreMemoryUsage (2,275,949,099 samples, 0.28%)</title><rect x="601.4" y="549" width="3.2" height="15.0" fill="rgb(229,113,27)" rx="2" ry="2" />
<text x="604.37" y="559.5" ></text>
</g>
<g >
<title>InvalidateVictimBuffer (344,974,810 samples, 0.04%)</title><rect x="19.5" y="533" width="0.5" height="15.0" fill="rgb(248,198,47)" rx="2" ry="2" />
<text x="22.53" y="543.5" ></text>
</g>
<g >
<title>ItemPointerIsValid (1,580,431,853 samples, 0.19%)</title><rect x="1125.9" y="501" width="2.3" height="15.0" fill="rgb(206,7,1)" rx="2" ry="2" />
<text x="1128.95" y="511.5" ></text>
</g>
<g >
<title>[unknown] (858,525,970 samples, 0.10%)</title><rect x="46.9" y="757" width="1.3" height="15.0" fill="rgb(210,24,5)" rx="2" ry="2" />
<text x="49.93" y="767.5" ></text>
</g>
<g >
<title>__do_page_fault (30,497,477,926 samples, 3.70%)</title><rect x="85.4" y="261" width="43.7" height="15.0" fill="rgb(239,158,37)" rx="2" ry="2" />
<text x="88.42" y="271.5" >__do..</text>
</g>
<g >
<title>ReadBufferExtended (1,340,860,850 samples, 0.16%)</title><rect x="133.8" y="469" width="1.9" height="15.0" fill="rgb(242,171,40)" rx="2" ry="2" />
<text x="136.78" y="479.5" ></text>
</g>
<g >
<title>GetPrivateRefCount (185,930,761 samples, 0.02%)</title><rect x="1138.1" y="501" width="0.2" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="1141.06" y="511.5" ></text>
</g>
<g >
<title>smp_apic_timer_interrupt (77,288,274 samples, 0.01%)</title><rect x="621.3" y="453" width="0.1" height="15.0" fill="rgb(221,74,17)" rx="2" ry="2" />
<text x="624.34" y="463.5" ></text>
</g>
<g >
<title>__perf_event_task_sched_in (110,725,430 samples, 0.01%)</title><rect x="1181.9" y="677" width="0.2" height="15.0" fill="rgb(231,121,29)" rx="2" ry="2" />
<text x="1184.90" y="687.5" ></text>
</g>
<g >
<title>process_one_work (115,471,936 samples, 0.01%)</title><rect x="10.1" y="725" width="0.2" height="15.0" fill="rgb(237,151,36)" rx="2" ry="2" />
<text x="13.10" y="735.5" ></text>
</g>
<g >
<title>heap_tuple_should_freeze (761,463,094 samples, 0.09%)</title><rect x="38.7" y="565" width="1.1" height="15.0" fill="rgb(247,194,46)" rx="2" ry="2" />
<text x="41.70" y="575.5" ></text>
</g>
<g >
<title>__radix_tree_lookup (196,157,364 samples, 0.02%)</title><rect x="328.2" y="277" width="0.3" height="15.0" fill="rgb(253,222,53)" rx="2" ry="2" />
<text x="331.22" y="287.5" ></text>
</g>
<g >
<title>smp_apic_timer_interrupt (137,726,205 samples, 0.02%)</title><rect x="853.8" y="453" width="0.2" height="15.0" fill="rgb(221,74,17)" rx="2" ry="2" />
<text x="856.81" y="463.5" ></text>
</g>
<g >
<title>parallel_vacuum_process_table (637,236,586,138 samples, 77.31%)</title><rect x="233.7" y="613" width="912.3" height="15.0" fill="rgb(205,3,0)" rx="2" ry="2" />
<text x="236.67" y="623.5" >parallel_vacuum_process_table</text>
</g>
<g >
<title>BufTableLookup (205,061,762 samples, 0.02%)</title><rect x="11.6" y="549" width="0.3" height="15.0" fill="rgb(224,89,21)" rx="2" ry="2" />
<text x="14.60" y="559.5" ></text>
</g>
<g >
<title>BufferIsValid (149,567,202 samples, 0.02%)</title><rect x="759.8" y="517" width="0.2" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="762.81" y="527.5" ></text>
</g>
<g >
<title>system_call_after_swapgs (297,624,001 samples, 0.04%)</title><rect x="43.5" y="661" width="0.4" height="15.0" fill="rgb(243,179,42)" rx="2" ry="2" />
<text x="46.50" y="671.5" ></text>
</g>
<g >
<title>update_process_times (132,283,144 samples, 0.02%)</title><rect x="709.3" y="405" width="0.1" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="712.26" y="415.5" ></text>
</g>
<g >
<title>hash_initial_lookup (779,209,400 samples, 0.09%)</title><rect x="253.8" y="421" width="1.1" height="15.0" fill="rgb(251,214,51)" rx="2" ry="2" />
<text x="256.83" y="431.5" ></text>
</g>
<g >
<title>fsnotify (273,688,962 samples, 0.03%)</title><rect x="598.2" y="357" width="0.4" height="15.0" fill="rgb(215,50,12)" rx="2" ry="2" />
<text x="601.17" y="367.5" ></text>
</g>
<g >
<title>list_del (263,101,215 samples, 0.03%)</title><rect x="126.5" y="85" width="0.4" height="15.0" fill="rgb(235,140,33)" rx="2" ry="2" />
<text x="129.49" y="95.5" ></text>
</g>
<g >
<title>TransactionIdFollows (483,413,334 samples, 0.06%)</title><rect x="203.2" y="469" width="0.7" height="15.0" fill="rgb(222,79,18)" rx="2" ry="2" />
<text x="206.17" y="479.5" ></text>
</g>
<g >
<title>radix_tree_descend (120,946,008 samples, 0.01%)</title><rect x="246.2" y="229" width="0.2" height="15.0" fill="rgb(243,175,41)" rx="2" ry="2" />
<text x="249.24" y="239.5" ></text>
</g>
<g >
<title>update_process_times (95,227,023 samples, 0.01%)</title><rect x="393.6" y="85" width="0.2" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="396.64" y="95.5" ></text>
</g>
<g >
<title>radix_tree_lookup_slot (242,208,178 samples, 0.03%)</title><rect x="328.2" y="293" width="0.4" height="15.0" fill="rgb(210,23,5)" rx="2" ry="2" />
<text x="331.21" y="303.5" ></text>
</g>
<g >
<title>__mem_cgroup_try_charge (429,001,947 samples, 0.05%)</title><rect x="451.6" y="149" width="0.6" height="15.0" fill="rgb(237,147,35)" rx="2" ry="2" />
<text x="454.59" y="159.5" ></text>
</g>
<g >
<title>CheckBufferIsPinnedOnce (177,920,144 samples, 0.02%)</title><rect x="257.6" y="437" width="0.3" height="15.0" fill="rgb(244,183,43)" rx="2" ry="2" />
<text x="260.63" y="447.5" ></text>
</g>
<g >
<title>WaitReadBuffers (183,808,446 samples, 0.02%)</title><rect x="51.6" y="389" width="0.3" height="15.0" fill="rgb(210,26,6)" rx="2" ry="2" />
<text x="54.63" y="399.5" ></text>
</g>
<g >
<title>generic_file_aio_read (170,394,819,281 samples, 20.67%)</title><rect x="343.1" y="341" width="243.9" height="15.0" fill="rgb(216,53,12)" rx="2" ry="2" />
<text x="346.07" y="351.5" >generic_file_aio_read</text>
</g>
<g >
<title>smp_apic_timer_interrupt (131,841,886 samples, 0.02%)</title><rect x="682.4" y="485" width="0.1" height="15.0" fill="rgb(221,74,17)" rx="2" ry="2" />
<text x="685.35" y="495.5" ></text>
</g>
<g >
<title>[unknown] (4,331,051,912 samples, 0.53%)</title><rect x="43.9" y="773" width="6.2" height="15.0" fill="rgb(210,24,5)" rx="2" ry="2" />
<text x="46.93" y="783.5" ></text>
</g>
<g >
<title>ItemPointerSet (87,248,740 samples, 0.01%)</title><rect x="23.9" y="629" width="0.1" height="15.0" fill="rgb(237,147,35)" rx="2" ry="2" />
<text x="26.86" y="639.5" ></text>
</g>
<g >
<title>BufferIsValid (119,329,650 samples, 0.01%)</title><rect x="227.3" y="437" width="0.2" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="230.28" y="447.5" ></text>
</g>
<g >
<title>radix_tree_descend (367,628,238 samples, 0.04%)</title><rect x="348.8" y="261" width="0.6" height="15.0" fill="rgb(243,175,41)" rx="2" ry="2" />
<text x="351.83" y="271.5" ></text>
</g>
<g >
<title>tas (88,089,515 samples, 0.01%)</title><rect x="73.3" y="405" width="0.1" height="15.0" fill="rgb(244,182,43)" rx="2" ry="2" />
<text x="76.32" y="415.5" ></text>
</g>
<g >
<title>StartReadBuffer (398,349,091 samples, 0.05%)</title><rect x="43.4" y="773" width="0.5" height="15.0" fill="rgb(222,78,18)" rx="2" ry="2" />
<text x="46.36" y="783.5" ></text>
</g>
<g >
<title>idle_exit_fair (78,500,553 samples, 0.01%)</title><rect x="1182.5" y="661" width="0.1" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="1185.46" y="671.5" ></text>
</g>
<g >
<title>radix_tree_descend (162,364,900 samples, 0.02%)</title><rect x="60.4" y="213" width="0.2" height="15.0" fill="rgb(243,175,41)" rx="2" ry="2" />
<text x="63.36" y="223.5" ></text>
</g>
<g >
<title>heap_parallel_vacuum_scan_worker (637,236,586,138 samples, 77.31%)</title><rect x="233.7" y="581" width="912.3" height="15.0" fill="rgb(209,21,5)" rx="2" ry="2" />
<text x="236.67" y="591.5" >heap_parallel_vacuum_scan_worker</text>
</g>
<g >
<title>shmem_fault (270,519,733 samples, 0.03%)</title><rect x="63.9" y="309" width="0.4" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="66.93" y="319.5" ></text>
</g>
<g >
<title>lazy_scan_prune (67,794,568,622 samples, 8.23%)</title><rect x="135.8" y="533" width="97.1" height="15.0" fill="rgb(243,178,42)" rx="2" ry="2" />
<text x="138.82" y="543.5" >lazy_scan_p..</text>
</g>
<g >
<title>__radix_tree_lookup (448,623,926 samples, 0.05%)</title><rect x="256.3" y="245" width="0.6" height="15.0" fill="rgb(253,222,53)" rx="2" ry="2" />
<text x="259.29" y="255.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32 (812,554,938 samples, 0.10%)</title><rect x="319.2" y="421" width="1.1" height="15.0" fill="rgb(248,202,48)" rx="2" ry="2" />
<text x="322.18" y="431.5" ></text>
</g>
<g >
<title>error_swapgs (82,319,755 samples, 0.01%)</title><rect x="49.6" y="709" width="0.1" height="15.0" fill="rgb(251,212,50)" rx="2" ry="2" />
<text x="52.58" y="719.5" ></text>
</g>
<g >
<title>native_queued_spin_lock_slowpath (2,701,879,035 samples, 0.33%)</title><rect x="440.9" y="149" width="3.8" height="15.0" fill="rgb(238,153,36)" rx="2" ry="2" />
<text x="443.88" y="159.5" ></text>
</g>
<g >
<title>HeapTupleSatisfiesVacuumHorizon (197,255,195 samples, 0.02%)</title><rect x="146.4" y="501" width="0.3" height="15.0" fill="rgb(207,13,3)" rx="2" ry="2" />
<text x="149.40" y="511.5" ></text>
</g>
<g >
<title>xfs_file_aio_read (178,159,674,754 samples, 21.62%)</title><rect x="342.1" y="373" width="255.0" height="15.0" fill="rgb(224,90,21)" rx="2" ry="2" />
<text x="345.07" y="383.5" >xfs_file_aio_read</text>
</g>
<g >
<title>ItemPointerIsValid (1,407,275,130 samples, 0.17%)</title><rect x="678.9" y="485" width="2.0" height="15.0" fill="rgb(206,7,1)" rx="2" ry="2" />
<text x="681.89" y="495.5" ></text>
</g>
<g >
<title>TransactionIdFollows (121,998,621 samples, 0.01%)</title><rect x="35.2" y="597" width="0.2" height="15.0" fill="rgb(222,79,18)" rx="2" ry="2" />
<text x="38.18" y="607.5" ></text>
</g>
<g >
<title>postmaster_child_launch (637,265,816,681 samples, 77.32%)</title><rect x="233.6" y="677" width="912.4" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="236.63" y="687.5" >postmaster_child_launch</text>
</g>
<g >
<title>xfs_ilock (963,834,659 samples, 0.12%)</title><rect x="15.8" y="341" width="1.4" height="15.0" fill="rgb(249,203,48)" rx="2" ry="2" />
<text x="18.82" y="351.5" ></text>
</g>
<g >
<title>unmap_vmas (10,374,169,931 samples, 1.26%)</title><rect x="1146.1" y="677" width="14.9" height="15.0" fill="rgb(243,176,42)" rx="2" ry="2" />
<text x="1149.15" y="687.5" ></text>
</g>
<g >
<title>PortalRun (3,088,212,841 samples, 0.37%)</title><rect x="50.2" y="629" width="4.4" height="15.0" fill="rgb(223,85,20)" rx="2" ry="2" />
<text x="53.20" y="639.5" ></text>
</g>
<g >
<title>intel_idle (6,010,443,106 samples, 0.73%)</title><rect x="1167.2" y="677" width="8.6" height="15.0" fill="rgb(237,147,35)" rx="2" ry="2" />
<text x="1170.23" y="687.5" ></text>
</g>
<g >
<title>account_entity_dequeue (90,354,745 samples, 0.01%)</title><rect x="593.2" y="197" width="0.1" height="15.0" fill="rgb(231,120,28)" rx="2" ry="2" />
<text x="596.16" y="207.5" ></text>
</g>
<g >
<title>tick_sched_timer (74,985,907 samples, 0.01%)</title><rect x="621.3" y="389" width="0.1" height="15.0" fill="rgb(254,227,54)" rx="2" ry="2" />
<text x="624.34" y="399.5" ></text>
</g>
<g >
<title>TransactionIdDidCommit (474,727,945 samples, 0.06%)</title><rect x="1128.6" y="501" width="0.7" height="15.0" fill="rgb(216,51,12)" rx="2" ry="2" />
<text x="1131.58" y="511.5" ></text>
</g>
<g >
<title>BufferGetPage (127,844,494 samples, 0.02%)</title><rect x="610.6" y="485" width="0.2" height="15.0" fill="rgb(253,220,52)" rx="2" ry="2" />
<text x="613.60" y="495.5" ></text>
</g>
<g >
<title>smgrwritev (5,080,836,104 samples, 0.62%)</title><rect x="12.3" y="501" width="7.2" height="15.0" fill="rgb(217,56,13)" rx="2" ry="2" />
<text x="15.25" y="511.5" ></text>
</g>
<g >
<title>BufferGetBlockNumber (1,369,405,226 samples, 0.17%)</title><rect x="1136.4" y="517" width="1.9" height="15.0" fill="rgb(206,7,1)" rx="2" ry="2" />
<text x="1139.39" y="527.5" ></text>
</g>
<g >
<title>vm_readbuf (7,777,835,010 samples, 0.94%)</title><rect x="610.4" y="501" width="11.2" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="613.43" y="511.5" ></text>
</g>
<g >
<title>ss_search (126,871,703 samples, 0.02%)</title><rect x="609.7" y="485" width="0.2" height="15.0" fill="rgb(244,181,43)" rx="2" ry="2" />
<text x="612.70" y="495.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32_impl (782,945,425 samples, 0.09%)</title><rect x="617.2" y="341" width="1.1" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="620.21" y="351.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32 (140,930,970 samples, 0.02%)</title><rect x="225.4" y="437" width="0.2" height="15.0" fill="rgb(248,202,48)" rx="2" ry="2" />
<text x="228.39" y="447.5" ></text>
</g>
<g >
<title>GetPrivateRefCountEntry (118,334,939 samples, 0.01%)</title><rect x="607.3" y="469" width="0.2" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="610.33" y="479.5" ></text>
</g>
<g >
<title>smgrreadv (187,449,915,647 samples, 22.74%)</title><rect x="332.8" y="501" width="268.3" height="15.0" fill="rgb(240,165,39)" rx="2" ry="2" />
<text x="335.78" y="511.5" >smgrreadv</text>
</g>
<g >
<title>ret_from_fork_nospec_end (115,471,936 samples, 0.01%)</title><rect x="10.1" y="773" width="0.2" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="13.10" y="783.5" ></text>
</g>
<g >
<title>__put_single_page (500,981,483 samples, 0.06%)</title><rect x="1164.6" y="533" width="0.7" height="15.0" fill="rgb(214,45,10)" rx="2" ry="2" />
<text x="1167.61" y="543.5" ></text>
</g>
<g >
<title>heap_prune_record_unchanged_lp_normal (3,066,171,706 samples, 0.37%)</title><rect x="35.5" y="597" width="4.4" height="15.0" fill="rgb(221,76,18)" rx="2" ry="2" />
<text x="38.55" y="607.5" ></text>
</g>
<g >
<title>get_hash_entry (206,305,776 samples, 0.03%)</title><rect x="331.6" y="501" width="0.3" height="15.0" fill="rgb(225,93,22)" rx="2" ry="2" />
<text x="334.62" y="511.5" ></text>
</g>
<g >
<title>smp_apic_timer_interrupt (119,060,468 samples, 0.01%)</title><rect x="1058.9" y="437" width="0.1" height="15.0" fill="rgb(221,74,17)" rx="2" ry="2" />
<text x="1061.88" y="447.5" ></text>
</g>
<g >
<title>auditsys (394,905,399 samples, 0.05%)</title><rect x="335.1" y="437" width="0.6" height="15.0" fill="rgb(240,161,38)" rx="2" ry="2" />
<text x="338.13" y="447.5" ></text>
</g>
<g >
<title>tick_sched_timer (73,045,894 samples, 0.01%)</title><rect x="87.4" y="101" width="0.1" height="15.0" fill="rgb(254,227,54)" rx="2" ry="2" />
<text x="90.39" y="111.5" ></text>
</g>
<g >
<title>update_curr (220,741,522 samples, 0.03%)</title><rect x="593.4" y="197" width="0.3" height="15.0" fill="rgb(227,105,25)" rx="2" ry="2" />
<text x="596.38" y="207.5" ></text>
</g>
<g >
<title>vfs_write (707,254,508 samples, 0.09%)</title><rect x="50.5" y="181" width="1.0" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="53.47" y="191.5" ></text>
</g>
<g >
<title>TransactionIdPrecedes (433,353,918 samples, 0.05%)</title><rect x="932.8" y="501" width="0.6" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="935.82" y="511.5" ></text>
</g>
<g >
<title>ReleaseBuffer (636,129,423 samples, 0.08%)</title><rect x="605.2" y="533" width="0.9" height="15.0" fill="rgb(220,71,17)" rx="2" ry="2" />
<text x="608.24" y="543.5" ></text>
</g>
<g >
<title>PageGetItemId (485,394,818 samples, 0.06%)</title><rect x="178.4" y="453" width="0.7" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="181.37" y="463.5" ></text>
</g>
<g >
<title>heap_page_is_all_visible (1,796,979,870 samples, 0.22%)</title><rect x="24.1" y="629" width="2.6" height="15.0" fill="rgb(228,107,25)" rx="2" ry="2" />
<text x="27.11" y="639.5" ></text>
</g>
<g >
<title>pg_rotate_left32 (324,572,656 samples, 0.04%)</title><rect x="614.5" y="325" width="0.5" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="617.49" y="335.5" ></text>
</g>
<g >
<title>finish_task_switch (147,777,065 samples, 0.02%)</title><rect x="1181.9" y="693" width="0.2" height="15.0" fill="rgb(234,136,32)" rx="2" ry="2" />
<text x="1184.85" y="703.5" ></text>
</g>
<g >
<title>sched_ttwu_pending (1,564,184,133 samples, 0.19%)</title><rect x="1179.1" y="725" width="2.2" height="15.0" fill="rgb(223,85,20)" rx="2" ry="2" />
<text x="1182.08" y="735.5" ></text>
</g>
<g >
<title>LWLockWakeup (86,751,378 samples, 0.01%)</title><rect x="232.6" y="469" width="0.1" height="15.0" fill="rgb(210,24,5)" rx="2" ry="2" />
<text x="235.58" y="479.5" ></text>
</g>
<g >
<title>pagevec_lru_move_fn (1,054,133,464 samples, 0.13%)</title><rect x="439.3" y="165" width="1.5" height="15.0" fill="rgb(205,0,0)" rx="2" ry="2" />
<text x="442.26" y="175.5" ></text>
</g>
<g >
<title>TidStoreMemoryUsage (75,936,328 samples, 0.01%)</title><rect x="52.0" y="421" width="0.1" height="15.0" fill="rgb(229,113,27)" rx="2" ry="2" />
<text x="54.95" y="431.5" ></text>
</g>
<g >
<title>system_call_after_swapgs (109,404,767 samples, 0.01%)</title><rect x="48.0" y="725" width="0.2" height="15.0" fill="rgb(243,179,42)" rx="2" ry="2" />
<text x="51.01" y="735.5" ></text>
</g>
<g >
<title>enqueue_task_fair (498,688,020 samples, 0.06%)</title><rect x="1180.0" y="677" width="0.7" height="15.0" fill="rgb(216,52,12)" rx="2" ry="2" />
<text x="1182.96" y="687.5" ></text>
</g>
<g >
<title>__hrtimer_run_queues (150,595,284 samples, 0.02%)</title><rect x="709.2" y="453" width="0.2" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="712.23" y="463.5" ></text>
</g>
<g >
<title>UnpinBuffer (544,244,404 samples, 0.07%)</title><rect x="605.4" y="517" width="0.7" height="15.0" fill="rgb(252,219,52)" rx="2" ry="2" />
<text x="608.37" y="527.5" ></text>
</g>
<g >
<title>nohz_balance_enter_idle (246,102,363 samples, 0.03%)</title><rect x="1186.2" y="677" width="0.3" height="15.0" fill="rgb(222,81,19)" rx="2" ry="2" />
<text x="1189.18" y="687.5" ></text>
</g>
<g >
<title>__schedule (1,160,883,236 samples, 0.14%)</title><rect x="1181.3" y="709" width="1.7" height="15.0" fill="rgb(227,103,24)" rx="2" ry="2" />
<text x="1184.32" y="719.5" ></text>
</g>
<g >
<title>proclist_delete_offset (95,612,548 samples, 0.01%)</title><rect x="1141.0" y="469" width="0.1" height="15.0" fill="rgb(221,76,18)" rx="2" ry="2" />
<text x="1143.97" y="479.5" ></text>
</g>
<g >
<title>clear_page_c_e (2,160,487,749 samples, 0.26%)</title><rect x="87.5" y="181" width="3.1" height="15.0" fill="rgb(209,22,5)" rx="2" ry="2" />
<text x="90.50" y="191.5" ></text>
</g>
<g >
[Attachment: CPU flame graph (SVG) of the parallel heap vacuum run. Dominant frames include lazy_scan_prune (44.22% of samples), mdreadv (22.73%), do_sync_read (21.66%), process_pm_pmsignal (15.16%), table_parallel_vacuum_scan (15.08%), StartReadBuffer (7.29%), ReadBufferExtended (6.49%), GetVictimBuffer (5.16%), vfs_read (4.54%), heap_prune_chain (3.16%), SetHintBits (2.92%), and vacuum (2.74%).]
</g>
<g >
<title>sys_futex (323,879,808 samples, 0.04%)</title><rect x="1142.1" y="437" width="0.4" height="15.0" fill="rgb(240,164,39)" rx="2" ry="2" />
<text x="1145.06" y="447.5" ></text>
</g>
<g >
<title>cpuidle_idle_call (321,864,645 samples, 0.04%)</title><rect x="1189.5" y="661" width="0.4" height="15.0" fill="rgb(207,9,2)" rx="2" ry="2" />
<text x="1192.46" y="671.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32_impl (70,813,112 samples, 0.01%)</title><rect x="73.6" y="389" width="0.1" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="76.61" y="399.5" ></text>
</g>
<g >
<title>PageGetItemId (914,641,063 samples, 0.11%)</title><rect x="164.6" y="501" width="1.3" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="167.60" y="511.5" ></text>
</g>
<g >
<title>get_hash_entry (852,511,860 samples, 0.10%)</title><rect x="56.9" y="405" width="1.2" height="15.0" fill="rgb(225,93,22)" rx="2" ry="2" />
<text x="59.85" y="415.5" ></text>
</g>
<g >
<title>_cond_resched (110,296,033 samples, 0.01%)</title><rect x="79.0" y="309" width="0.2" height="15.0" fill="rgb(231,121,29)" rx="2" ry="2" />
<text x="82.01" y="319.5" ></text>
</g>
<g >
<title>vfs_read (128,631,091 samples, 0.02%)</title><rect x="51.7" y="277" width="0.2" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="54.71" y="287.5" ></text>
</g>
<g >
<title>tas (129,434,267 samples, 0.02%)</title><rect x="57.9" y="389" width="0.2" height="15.0" fill="rgb(244,182,43)" rx="2" ry="2" />
<text x="60.89" y="399.5" ></text>
</g>
<g >
<title>do_exit (10,517,164,883 samples, 1.28%)</title><rect x="1146.0" y="725" width="15.0" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="1148.99" y="735.5" ></text>
</g>
<g >
<title>__do_fault.isra.61 (760,965,668 samples, 0.09%)</title><rect x="59.6" y="325" width="1.1" height="15.0" fill="rgb(227,102,24)" rx="2" ry="2" />
<text x="62.63" y="335.5" ></text>
</g>
<g >
<title>alloc_pages_current (1,465,407,686 samples, 0.18%)</title><rect x="386.4" y="213" width="2.1" height="15.0" fill="rgb(216,51,12)" rx="2" ry="2" />
<text x="389.40" y="223.5" ></text>
</g>
<g >
<title>TransactionIdPrecedes (120,317,963 samples, 0.01%)</title><rect x="37.2" y="581" width="0.2" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="40.20" y="591.5" ></text>
</g>
<g >
<title>__list_del_entry (606,310,164 samples, 0.07%)</title><rect x="573.5" y="85" width="0.8" height="15.0" fill="rgb(214,41,9)" rx="2" ry="2" />
<text x="576.46" y="95.5" ></text>
</g>
<g >
<title>heap_prepare_freeze_tuple (47,580,684,367 samples, 5.77%)</title><rect x="990.9" y="485" width="68.1" height="15.0" fill="rgb(227,101,24)" rx="2" ry="2" />
<text x="993.93" y="495.5" >heap_pr..</text>
</g>
<g >
<title>PageGetItemId (2,476,381,808 samples, 0.30%)</title><rect x="850.3" y="469" width="3.5" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="853.26" y="479.5" ></text>
</g>
<g >
<title>page_fault (1,158,273,366 samples, 0.14%)</title><rect x="59.2" y="405" width="1.7" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="62.23" y="415.5" ></text>
</g>
<g >
<title>system_call_after_swapgs (192,189,271 samples, 0.02%)</title><rect x="331.6" y="485" width="0.3" height="15.0" fill="rgb(243,179,42)" rx="2" ry="2" />
<text x="334.64" y="495.5" ></text>
</g>
<g >
<title>BufferGetPage (367,660,041 samples, 0.04%)</title><rect x="226.8" y="437" width="0.5" height="15.0" fill="rgb(253,220,52)" rx="2" ry="2" />
<text x="229.76" y="447.5" ></text>
</g>
<g >
<title>futex_wait (730,129,569 samples, 0.09%)</title><rect x="48.4" y="693" width="1.0" height="15.0" fill="rgb(235,138,33)" rx="2" ry="2" />
<text x="51.40" y="703.5" ></text>
</g>
<g >
<title>tick_check_idle (254,503,972 samples, 0.03%)</title><rect x="1164.1" y="645" width="0.4" height="15.0" fill="rgb(208,17,4)" rx="2" ry="2" />
<text x="1167.15" y="655.5" ></text>
</g>
<g >
<title>LockBuffer (2,663,642,562 samples, 0.32%)</title><rect x="1139.1" y="517" width="3.8" height="15.0" fill="rgb(235,142,34)" rx="2" ry="2" />
<text x="1142.09" y="527.5" ></text>
</g>
<g >
<title>radix_tree_descend (365,643,101 samples, 0.04%)</title><rect x="94.9" y="117" width="0.5" height="15.0" fill="rgb(243,175,41)" rx="2" ry="2" />
<text x="97.92" y="127.5" ></text>
</g>
<g >
<title>LWLockAcquire (1,136,756,674 samples, 0.14%)</title><rect x="616.8" y="389" width="1.6" height="15.0" fill="rgb(209,20,4)" rx="2" ry="2" />
<text x="619.76" y="399.5" ></text>
</g>
<g >
<title>queued_spin_lock_slowpath (16,873,002,420 samples, 2.05%)</title><rect x="101.4" y="133" width="24.2" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="104.40" y="143.5" >q..</text>
</g>
<g >
<title>down_write (956,045,785 samples, 0.12%)</title><rect x="15.8" y="325" width="1.4" height="15.0" fill="rgb(222,79,18)" rx="2" ry="2" />
<text x="18.83" y="335.5" ></text>
</g>
<g >
<title>heap_page_is_all_visible (9,611,264,937 samples, 1.17%)</title><rect x="138.1" y="517" width="13.8" height="15.0" fill="rgb(228,107,25)" rx="2" ry="2" />
<text x="141.11" y="527.5" ></text>
</g>
<g >
<title>hrtimer_interrupt (1,123,942,176 samples, 0.14%)</title><rect x="1165.4" y="645" width="1.7" height="15.0" fill="rgb(228,109,26)" rx="2" ry="2" />
<text x="1168.44" y="655.5" ></text>
</g>
<g >
<title>retint_userspace_restore_args (600,586,378 samples, 0.07%)</title><rect x="246.8" y="405" width="0.9" height="15.0" fill="rgb(215,46,11)" rx="2" ry="2" />
<text x="249.83" y="415.5" ></text>
</g>
<g >
<title>tag_hash (1,038,492,742 samples, 0.13%)</title><rect x="613.8" y="357" width="1.4" height="15.0" fill="rgb(245,185,44)" rx="2" ry="2" />
<text x="616.76" y="367.5" ></text>
</g>
<g >
<title>PinBufferForBlock (1,183,027,082 samples, 0.14%)</title><rect x="134.0" y="405" width="1.7" height="15.0" fill="rgb(241,168,40)" rx="2" ry="2" />
<text x="136.99" y="415.5" ></text>
</g>
<g >
<title>mem_cgroup_cache_charge (1,087,429,015 samples, 0.13%)</title><rect x="97.0" y="165" width="1.6" height="15.0" fill="rgb(251,215,51)" rx="2" ry="2" />
<text x="100.01" y="175.5" ></text>
</g>
<g >
<title>BackendInitialize (3,095,632,095 samples, 0.38%)</title><rect x="50.2" y="677" width="4.4" height="15.0" fill="rgb(239,159,38)" rx="2" ry="2" />
<text x="53.19" y="687.5" ></text>
</g>
<g >
<title>LWLockAttemptLock (146,853,501 samples, 0.02%)</title><rect x="135.0" y="357" width="0.2" height="15.0" fill="rgb(235,138,33)" rx="2" ry="2" />
<text x="138.03" y="367.5" ></text>
</g>
<g >
<title>ServerLoop (128,045,675,684 samples, 15.54%)</title><rect x="50.2" y="725" width="183.3" height="15.0" fill="rgb(238,155,37)" rx="2" ry="2" />
<text x="53.19" y="735.5" >ServerLoop</text>
</g>
<g >
<title>vacuum_rel (3,086,806,387 samples, 0.37%)</title><rect x="50.2" y="517" width="4.4" height="15.0" fill="rgb(219,65,15)" rx="2" ry="2" />
<text x="53.20" y="527.5" ></text>
</g>
<g >
<title>__pread_nocancel (140,093,762 samples, 0.02%)</title><rect x="51.7" y="325" width="0.2" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="54.69" y="335.5" ></text>
</g>
<g >
<title>get_hash_value (380,506,797 samples, 0.05%)</title><rect x="238.0" y="437" width="0.6" height="15.0" fill="rgb(211,27,6)" rx="2" ry="2" />
<text x="241.02" y="447.5" ></text>
</g>
<g >
<title>__rwsem_mark_wake (137,865,275 samples, 0.02%)</title><rect x="51.1" y="53" width="0.2" height="15.0" fill="rgb(206,8,1)" rx="2" ry="2" />
<text x="54.10" y="63.5" ></text>
</g>
<g >
<title>clear_page_c_e (11,909,543,008 samples, 1.44%)</title><rect x="393.8" y="197" width="17.0" height="15.0" fill="rgb(209,22,5)" rx="2" ry="2" />
<text x="396.79" y="207.5" ></text>
</g>
<g >
<title>heap_page_is_all_visible (51,361,490,837 samples, 6.23%)</title><rect x="636.0" y="533" width="73.6" height="15.0" fill="rgb(228,107,25)" rx="2" ry="2" />
<text x="639.03" y="543.5" >heap_pag..</text>
</g>
<g >
<title>ItemPointerIsValid (235,071,207 samples, 0.03%)</title><rect x="230.2" y="485" width="0.3" height="15.0" fill="rgb(206,7,1)" rx="2" ry="2" />
<text x="233.17" y="495.5" ></text>
</g>
<g >
<title>dequeue_entity (154,556,759 samples, 0.02%)</title><rect x="130.8" y="197" width="0.2" height="15.0" fill="rgb(233,130,31)" rx="2" ry="2" />
<text x="133.76" y="207.5" ></text>
</g>
<g >
<title>do_futex (730,129,569 samples, 0.09%)</title><rect x="48.4" y="709" width="1.0" height="15.0" fill="rgb(245,184,44)" rx="2" ry="2" />
<text x="51.40" y="719.5" ></text>
</g>
<g >
<title>radix_tree_descend (88,773,751 samples, 0.01%)</title><rect x="256.8" y="229" width="0.1" height="15.0" fill="rgb(243,175,41)" rx="2" ry="2" />
<text x="259.80" y="239.5" ></text>
</g>
<g >
<title>tick_nohz_idle_exit (1,244,621,438 samples, 0.15%)</title><rect x="1186.8" y="725" width="1.8" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="1189.77" y="735.5" ></text>
</g>
<g >
<title>native_queued_spin_lock_slowpath (1,521,504,898 samples, 0.18%)</title><rect x="1152.2" y="549" width="2.2" height="15.0" fill="rgb(238,153,36)" rx="2" ry="2" />
<text x="1155.22" y="559.5" ></text>
</g>
<g >
<title>tick_sched_handle (105,887,461 samples, 0.01%)</title><rect x="203.0" y="357" width="0.2" height="15.0" fill="rgb(219,68,16)" rx="2" ry="2" />
<text x="206.01" y="367.5" ></text>
</g>
<g >
<title>LWLockAcquire (174,718,745 samples, 0.02%)</title><rect x="135.0" y="373" width="0.3" height="15.0" fill="rgb(209,20,4)" rx="2" ry="2" />
<text x="138.00" y="383.5" ></text>
</g>
<g >
<title>StrategyGetBuffer (8,227,865,316 samples, 1.00%)</title><rect x="61.7" y="421" width="11.7" height="15.0" fill="rgb(216,54,13)" rx="2" ry="2" />
<text x="64.66" y="431.5" ></text>
</g>
<g >
<title>BlockIdSet (793,078,598 samples, 0.10%)</title><rect x="758.2" y="517" width="1.1" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="761.16" y="527.5" ></text>
</g>
<g >
<title>tick_sched_handle (74,180,775 samples, 0.01%)</title><rect x="621.3" y="373" width="0.1" height="15.0" fill="rgb(219,68,16)" rx="2" ry="2" />
<text x="624.34" y="383.5" ></text>
</g>
<g >
<title>up_read (79,012,962 samples, 0.01%)</title><rect x="585.5" y="261" width="0.2" height="15.0" fill="rgb(209,18,4)" rx="2" ry="2" />
<text x="588.55" y="271.5" ></text>
</g>
<g >
<title>heap_tuple_should_freeze (99,586,058 samples, 0.01%)</title><rect x="39.8" y="581" width="0.1" height="15.0" fill="rgb(247,194,46)" rx="2" ry="2" />
<text x="42.79" y="591.5" ></text>
</g>
<g >
<title>xfs_file_buffered_aio_write (300,277,888 samples, 0.04%)</title><rect x="10.4" y="565" width="0.5" height="15.0" fill="rgb(243,176,42)" rx="2" ry="2" />
<text x="13.43" y="575.5" ></text>
</g>
<g >
<title>PinBufferForBlock (5,817,463,213 samples, 0.71%)</title><rect x="612.8" y="421" width="8.3" height="15.0" fill="rgb(241,168,40)" rx="2" ry="2" />
<text x="615.82" y="431.5" ></text>
</g>
<g >
<title>_cond_resched (76,797,490 samples, 0.01%)</title><rect x="343.4" y="325" width="0.1" height="15.0" fill="rgb(231,121,29)" rx="2" ry="2" />
<text x="346.40" y="335.5" ></text>
</g>
<g >
<title>page_fault (177,844,528 samples, 0.02%)</title><rect x="74.1" y="405" width="0.3" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="77.12" y="415.5" ></text>
</g>
<g >
<title>PageGetItemId (2,397,128,644 samples, 0.29%)</title><rect x="788.6" y="501" width="3.5" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="791.63" y="511.5" ></text>
</g>
<g >
<title>unlock_page (100,983,691 samples, 0.01%)</title><rect x="49.3" y="645" width="0.1" height="15.0" fill="rgb(220,69,16)" rx="2" ry="2" />
<text x="52.29" y="655.5" ></text>
</g>
<g >
<title>tick_sched_handle (144,074,202 samples, 0.02%)</title><rect x="828.8" y="389" width="0.3" height="15.0" fill="rgb(219,68,16)" rx="2" ry="2" />
<text x="831.85" y="399.5" ></text>
</g>
<g >
<title>PageGetItemId (141,210,959 samples, 0.02%)</title><rect x="29.5" y="613" width="0.2" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="32.53" y="623.5" ></text>
</g>
<g >
<title>find_busiest_group (73,446,886 samples, 0.01%)</title><rect x="1189.8" y="485" width="0.1" height="15.0" fill="rgb(239,158,37)" rx="2" ry="2" />
<text x="1192.76" y="495.5" ></text>
</g>
<g >
<title>tick_sched_timer (279,325,368 samples, 0.03%)</title><rect x="445.1" y="101" width="0.4" height="15.0" fill="rgb(254,227,54)" rx="2" ry="2" />
<text x="448.13" y="111.5" ></text>
</g>
<g >
<title>GetPrivateRefCount (364,696,395 samples, 0.04%)</title><rect x="1102.1" y="469" width="0.5" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="1105.06" y="479.5" ></text>
</g>
<g >
<title>free_pages_and_swap_cache (1,475,718,560 samples, 0.18%)</title><rect x="1158.6" y="613" width="2.1" height="15.0" fill="rgb(222,82,19)" rx="2" ry="2" />
<text x="1161.63" y="623.5" ></text>
</g>
<g >
<title>iomap_write_end (374,091,118 samples, 0.05%)</title><rect x="14.0" y="293" width="0.5" height="15.0" fill="rgb(242,171,41)" rx="2" ry="2" />
<text x="17.00" y="303.5" ></text>
</g>
<g >
<title>do_read_fault.isra.63 (1,097,689,706 samples, 0.13%)</title><rect x="255.5" y="357" width="1.6" height="15.0" fill="rgb(216,52,12)" rx="2" ry="2" />
<text x="258.51" y="367.5" ></text>
</g>
<g >
<title>pg_atomic_fetch_or_u32 (213,806,923 samples, 0.03%)</title><rect x="329.7" y="469" width="0.3" height="15.0" fill="rgb(221,74,17)" rx="2" ry="2" />
<text x="332.73" y="479.5" ></text>
</g>
<g >
<title>PinBuffer (132,780,443 samples, 0.02%)</title><rect x="135.3" y="373" width="0.2" height="15.0" fill="rgb(219,64,15)" rx="2" ry="2" />
<text x="138.34" y="383.5" ></text>
</g>
<g >
<title>BufferIsValid (199,830,704 samples, 0.02%)</title><rect x="227.9" y="421" width="0.3" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="230.89" y="431.5" ></text>
</g>
<g >
<title>GetPrivateRefCountEntry (94,510,130 samples, 0.01%)</title><rect x="258.1" y="421" width="0.1" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="261.07" y="431.5" ></text>
</g>
<g >
<title>LWLockQueueSelf (69,869,533 samples, 0.01%)</title><rect x="320.4" y="437" width="0.1" height="15.0" fill="rgb(236,146,35)" rx="2" ry="2" />
<text x="323.36" y="447.5" ></text>
</g>
<g >
<title>__mem_cgroup_commit_charge (3,671,260,686 samples, 0.45%)</title><rect x="446.3" y="149" width="5.3" height="15.0" fill="rgb(212,32,7)" rx="2" ry="2" />
<text x="449.33" y="159.5" ></text>
</g>
<g >
<title>PinBufferForBlock (986,844,257 samples, 0.12%)</title><rect x="50.2" y="357" width="1.4" height="15.0" fill="rgb(241,168,40)" rx="2" ry="2" />
<text x="53.22" y="367.5" ></text>
</g>
<g >
<title>spin_delay (136,825,160 samples, 0.02%)</title><rect x="72.0" y="373" width="0.2" height="15.0" fill="rgb(240,162,38)" rx="2" ry="2" />
<text x="75.00" y="383.5" ></text>
</g>
<g >
<title>WaitReadBuffers (1,367,408,349 samples, 0.17%)</title><rect x="20.5" y="613" width="1.9" height="15.0" fill="rgb(210,26,6)" rx="2" ry="2" />
<text x="23.48" y="623.5" ></text>
</g>
<g >
<title>do_writepages (115,471,936 samples, 0.01%)</title><rect x="10.1" y="629" width="0.2" height="15.0" fill="rgb(240,164,39)" rx="2" ry="2" />
<text x="13.10" y="639.5" ></text>
</g>
<g >
<title>sys_write (314,355,073 samples, 0.04%)</title><rect x="10.4" y="629" width="0.5" height="15.0" fill="rgb(247,194,46)" rx="2" ry="2" />
<text x="13.42" y="639.5" ></text>
</g>
<g >
<title>rcu_idle_enter (103,348,351 samples, 0.01%)</title><rect x="1178.8" y="725" width="0.2" height="15.0" fill="rgb(228,109,26)" rx="2" ry="2" />
<text x="1181.83" y="735.5" ></text>
</g>
<g >
<title>pg_atomic_fetch_add_u32_impl (778,073,501 samples, 0.09%)</title><rect x="275.2" y="405" width="1.1" height="15.0" fill="rgb(238,155,37)" rx="2" ry="2" />
<text x="278.18" y="415.5" ></text>
</g>
<g >
<title>HeapTupleSatisfiesVacuumHorizon (6,968,404,220 samples, 0.85%)</title><rect x="220.2" y="485" width="10.0" height="15.0" fill="rgb(207,13,3)" rx="2" ry="2" />
<text x="223.19" y="495.5" ></text>
</g>
<g >
<title>pgstat_count_io_op_time (87,632,648 samples, 0.01%)</title><rect x="76.6" y="485" width="0.1" height="15.0" fill="rgb(209,19,4)" rx="2" ry="2" />
<text x="79.57" y="495.5" ></text>
</g>
<g >
<title>FileReadV (38,665,271,064 samples, 4.69%)</title><rect x="76.9" y="453" width="55.3" height="15.0" fill="rgb(222,81,19)" rx="2" ry="2" />
<text x="79.87" y="463.5" >FileR..</text>
</g>
<g >
<title>get_hash_entry (182,309,876 samples, 0.02%)</title><rect x="11.2" y="517" width="0.3" height="15.0" fill="rgb(225,93,22)" rx="2" ry="2" />
<text x="14.24" y="527.5" ></text>
</g>
<g >
<title>pgstat_count_io_op_time (471,510,682 samples, 0.06%)</title><rect x="332.1" y="501" width="0.6" height="15.0" fill="rgb(209,19,4)" rx="2" ry="2" />
<text x="335.06" y="511.5" ></text>
</g>
<g >
<title>smgropen (85,656,911 samples, 0.01%)</title><rect x="12.1" y="517" width="0.2" height="15.0" fill="rgb(211,28,6)" rx="2" ry="2" />
<text x="15.13" y="527.5" ></text>
</g>
<g >
<title>MarkBufferDirtyHint (539,831,467 samples, 0.07%)</title><rect x="41.7" y="565" width="0.8" height="15.0" fill="rgb(234,136,32)" rx="2" ry="2" />
<text x="44.69" y="575.5" ></text>
</g>
<g >
<title>__audit_syscall_exit (144,066,144 samples, 0.02%)</title><rect x="77.6" y="405" width="0.2" height="15.0" fill="rgb(218,62,14)" rx="2" ry="2" />
<text x="80.60" y="415.5" ></text>
</g>
<g >
<title>__schedule (396,109,392 samples, 0.05%)</title><rect x="310.8" y="293" width="0.6" height="15.0" fill="rgb(227,103,24)" rx="2" ry="2" />
<text x="313.79" y="303.5" ></text>
</g>
<g >
<title>system_call_fastpath (704,492,392 samples, 0.09%)</title><rect x="310.5" y="373" width="1.0" height="15.0" fill="rgb(252,217,52)" rx="2" ry="2" />
<text x="313.46" y="383.5" ></text>
</g>
<g >
<title>smgrwrite (5,080,836,104 samples, 0.62%)</title><rect x="12.3" y="517" width="7.2" height="15.0" fill="rgb(229,112,26)" rx="2" ry="2" />
<text x="15.25" y="527.5" ></text>
</g>
<g >
<title>BufTableHashCode (393,071,281 samples, 0.05%)</title><rect x="238.0" y="453" width="0.6" height="15.0" fill="rgb(215,47,11)" rx="2" ry="2" />
<text x="241.01" y="463.5" ></text>
</g>
<g >
<title>sys_futex (73,620,781 samples, 0.01%)</title><rect x="232.6" y="421" width="0.1" height="15.0" fill="rgb(240,164,39)" rx="2" ry="2" />
<text x="235.59" y="431.5" ></text>
</g>
<g >
<title>hash_search_with_hash_value (93,560,043 samples, 0.01%)</title><rect x="134.8" y="357" width="0.1" height="15.0" fill="rgb(249,205,49)" rx="2" ry="2" />
<text x="137.80" y="367.5" ></text>
</g>
<g >
<title>dequeue_task_fair (529,132,857 samples, 0.06%)</title><rect x="593.0" y="229" width="0.7" height="15.0" fill="rgb(230,119,28)" rx="2" ry="2" />
<text x="595.97" y="239.5" ></text>
</g>
<g >
<title>deactivate_task (699,485,140 samples, 0.08%)</title><rect x="592.8" y="245" width="1.0" height="15.0" fill="rgb(206,8,2)" rx="2" ry="2" />
<text x="595.82" y="255.5" ></text>
</g>
<g >
<title>activate_task (862,742,985 samples, 0.10%)</title><rect x="1179.5" y="693" width="1.2" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="1182.48" y="703.5" ></text>
</g>
<g >
<title>handle_mm_fault (1,155,822,417 samples, 0.14%)</title><rect x="327.4" y="405" width="1.7" height="15.0" fill="rgb(234,135,32)" rx="2" ry="2" />
<text x="330.40" y="415.5" ></text>
</g>
<g >
<title>ItemPointerIsValid (328,053,928 samples, 0.04%)</title><rect x="145.9" y="485" width="0.5" height="15.0" fill="rgb(206,7,1)" rx="2" ry="2" />
<text x="148.93" y="495.5" ></text>
</g>
<g >
<title>MarkBufferDirty (536,993,730 samples, 0.07%)</title><rect x="1142.9" y="517" width="0.8" height="15.0" fill="rgb(238,152,36)" rx="2" ry="2" />
<text x="1145.91" y="527.5" ></text>
</g>
<g >
<title>hash_search_with_hash_value (89,443,719 samples, 0.01%)</title><rect x="43.4" y="693" width="0.1" height="15.0" fill="rgb(249,205,49)" rx="2" ry="2" />
<text x="46.36" y="703.5" ></text>
</g>
<g >
<title>ttwu_do_activate (1,314,857,365 samples, 0.16%)</title><rect x="1179.4" y="709" width="1.9" height="15.0" fill="rgb(215,48,11)" rx="2" ry="2" />
<text x="1182.44" y="719.5" ></text>
</g>
<g >
<title>try_to_wake_up (660,719,524 samples, 0.08%)</title><rect x="1165.7" y="581" width="1.0" height="15.0" fill="rgb(220,70,16)" rx="2" ry="2" />
<text x="1168.71" y="591.5" ></text>
</g>
<g >
<title>StartReadBuffer (6,579,262,292 samples, 0.80%)</title><rect x="611.8" y="453" width="9.5" height="15.0" fill="rgb(222,78,18)" rx="2" ry="2" />
<text x="614.83" y="463.5" ></text>
</g>
<g >
<title>do_page_fault (1,157,242,416 samples, 0.14%)</title><rect x="59.2" y="389" width="1.7" height="15.0" fill="rgb(216,54,13)" rx="2" ry="2" />
<text x="62.23" y="399.5" ></text>
</g>
<g >
<title>start_kernel (379,723,816 samples, 0.05%)</title><rect x="1189.5" y="725" width="0.5" height="15.0" fill="rgb(254,227,54)" rx="2" ry="2" />
<text x="1192.46" y="735.5" ></text>
</g>
<g >
<title>page_fault (149,153,652,515 samples, 18.10%)</title><rect x="372.3" y="309" width="213.5" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="375.31" y="319.5" >page_fault</text>
</g>
<g >
<title>generic_segment_checks (75,732,677 samples, 0.01%)</title><rect x="587.0" y="341" width="0.1" height="15.0" fill="rgb(236,144,34)" rx="2" ry="2" />
<text x="590.01" y="351.5" ></text>
</g>
<g >
<title>GetBufferDescriptor (334,944,306 samples, 0.04%)</title><rect x="1093.2" y="453" width="0.5" height="15.0" fill="rgb(249,202,48)" rx="2" ry="2" />
<text x="1096.23" y="463.5" ></text>
</g>
<g >
<title>StrategyGetBuffer (41,781,460,625 samples, 5.07%)</title><rect x="258.5" y="437" width="59.8" height="15.0" fill="rgb(216,54,13)" rx="2" ry="2" />
<text x="261.48" y="447.5" >Strate..</text>
</g>
<g >
<title>wake_up_q (372,517,381 samples, 0.05%)</title><rect x="594.6" y="277" width="0.5" height="15.0" fill="rgb(237,151,36)" rx="2" ry="2" />
<text x="597.61" y="287.5" ></text>
</g>
<g >
<title>sem_post@@GLIBC_2.2.5 (109,531,290 samples, 0.01%)</title><rect x="73.9" y="405" width="0.2" height="15.0" fill="rgb(214,41,9)" rx="2" ry="2" />
<text x="76.93" y="415.5" ></text>
</g>
<g >
<title>quiet_vmstat (479,485,129 samples, 0.06%)</title><rect x="1188.7" y="741" width="0.6" height="15.0" fill="rgb(249,204,48)" rx="2" ry="2" />
<text x="1191.66" y="751.5" ></text>
</g>
<g >
<title>shared_ts_memory_usage (326,223,408 samples, 0.04%)</title><rect x="132.4" y="517" width="0.5" height="15.0" fill="rgb(215,46,11)" rx="2" ry="2" />
<text x="135.41" y="527.5" ></text>
</g>
<g >
<title>xfs_file_aio_read (123,055,875 samples, 0.01%)</title><rect x="51.7" y="245" width="0.2" height="15.0" fill="rgb(224,90,21)" rx="2" ry="2" />
<text x="54.71" y="255.5" ></text>
</g>
<g >
<title>page_fault (1,241,456,530 samples, 0.15%)</title><rect x="327.3" y="453" width="1.8" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="330.31" y="463.5" ></text>
</g>
<g >
<title>shmem_getpage_gfp (116,623,553,586 samples, 14.15%)</title><rect x="410.9" y="197" width="167.0" height="15.0" fill="rgb(227,105,25)" rx="2" ry="2" />
<text x="413.95" y="207.5" >shmem_getpage_gfp</text>
</g>
<g >
<title>pg_atomic_read_u32_impl (73,640,039 samples, 0.01%)</title><rect x="20.2" y="501" width="0.2" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="23.25" y="511.5" ></text>
</g>
<g >
<title>postgres (803,306,357,975 samples, 97.46%)</title><rect x="11.0" y="789" width="1150.0" height="15.0" fill="rgb(233,131,31)" rx="2" ry="2" />
<text x="13.98" y="799.5" >postgres</text>
</g>
<g >
<title>vm_readbuf (95,772,990 samples, 0.01%)</title><rect x="1145.3" y="533" width="0.1" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="1148.27" y="543.5" ></text>
</g>
<g >
<title>radix_tree_lookup_slot (457,486,155 samples, 0.06%)</title><rect x="60.0" y="245" width="0.7" height="15.0" fill="rgb(210,23,5)" rx="2" ry="2" />
<text x="63.05" y="255.5" ></text>
</g>
<g >
<title>TidStoreMemoryUsage (341,841,353 samples, 0.04%)</title><rect x="132.4" y="533" width="0.5" height="15.0" fill="rgb(229,113,27)" rx="2" ry="2" />
<text x="135.39" y="543.5" ></text>
</g>
<g >
<title>LWLockAcquire (83,031,264 samples, 0.01%)</title><rect x="20.2" y="549" width="0.2" height="15.0" fill="rgb(209,20,4)" rx="2" ry="2" />
<text x="23.23" y="559.5" ></text>
</g>
<g >
<title>__inc_zone_state (88,589,607 samples, 0.01%)</title><rect x="126.2" y="101" width="0.1" height="15.0" fill="rgb(241,168,40)" rx="2" ry="2" />
<text x="129.18" y="111.5" ></text>
</g>
<g >
<title>ReadBufferExtended (254,631,147,709 samples, 30.89%)</title><rect x="236.8" y="549" width="364.5" height="15.0" fill="rgb(242,171,40)" rx="2" ry="2" />
<text x="239.80" y="559.5" >ReadBufferExtended</text>
</g>
<g >
<title>pick_next_task_rt (76,627,109 samples, 0.01%)</title><rect x="1182.7" y="693" width="0.1" height="15.0" fill="rgb(206,7,1)" rx="2" ry="2" />
<text x="1185.71" y="703.5" ></text>
</g>
<g >
<title>heap_page_prune_and_freeze (55,344,640,583 samples, 6.71%)</title><rect x="151.9" y="517" width="79.2" height="15.0" fill="rgb(213,40,9)" rx="2" ry="2" />
<text x="154.87" y="527.5" >heap_page..</text>
</g>
<g >
<title>tick_sched_handle (100,961,491 samples, 0.01%)</title><rect x="1034.0" y="373" width="0.1" height="15.0" fill="rgb(219,68,16)" rx="2" ry="2" />
<text x="1036.96" y="383.5" ></text>
</g>
<g >
<title>TransactionIdGetCommitLSN (365,727,757 samples, 0.04%)</title><rect x="1123.1" y="485" width="0.5" height="15.0" fill="rgb(238,152,36)" rx="2" ry="2" />
<text x="1126.08" y="495.5" ></text>
</g>
<g >
<title>futex_wait_queue_me (110,500,112 samples, 0.01%)</title><rect x="48.6" y="677" width="0.1" height="15.0" fill="rgb(254,228,54)" rx="2" ry="2" />
<text x="51.57" y="687.5" ></text>
</g>
<g >
<title>BackgroundWorkerMain (637,248,213,532 samples, 77.32%)</title><rect x="233.7" y="661" width="912.3" height="15.0" fill="rgb(240,161,38)" rx="2" ry="2" />
<text x="236.66" y="671.5" >BackgroundWorkerMain</text>
</g>
<g >
<title>BufTableInsert (1,361,390,705 samples, 0.17%)</title><rect x="56.3" y="437" width="2.0" height="15.0" fill="rgb(206,8,1)" rx="2" ry="2" />
<text x="59.34" y="447.5" ></text>
</g>
<g >
<title>get_hash_value (178,509,464 samples, 0.02%)</title><rect x="56.1" y="421" width="0.2" height="15.0" fill="rgb(211,27,6)" rx="2" ry="2" />
<text x="59.09" y="431.5" ></text>
</g>
<g >
<title>rwsem_down_read_failed (3,012,432,972 samples, 0.37%)</title><rect x="590.8" y="293" width="4.3" height="15.0" fill="rgb(254,225,54)" rx="2" ry="2" />
<text x="593.83" y="303.5" ></text>
</g>
<g >
<title>schedule (1,619,490,228 samples, 0.20%)</title><rect x="592.3" y="277" width="2.3" height="15.0" fill="rgb(254,229,54)" rx="2" ry="2" />
<text x="595.29" y="287.5" ></text>
</g>
<g >
<title>native_queued_spin_lock_slowpath (72,545,272,590 samples, 8.80%)</title><rect x="465.7" y="133" width="103.9" height="15.0" fill="rgb(238,153,36)" rx="2" ry="2" />
<text x="468.73" y="143.5" >native_queue..</text>
</g>
<g >
<title>heap_prune_satisfies_vacuum (45,735,515,921 samples, 5.55%)</title><rect x="1064.3" y="517" width="65.4" height="15.0" fill="rgb(252,219,52)" rx="2" ry="2" />
<text x="1067.26" y="527.5" >heap_pr..</text>
</g>
<g >
<title>sysret_audit (229,738,891 samples, 0.03%)</title><rect x="77.5" y="421" width="0.3" height="15.0" fill="rgb(238,152,36)" rx="2" ry="2" />
<text x="80.50" y="431.5" ></text>
</g>
<g >
<title>GetPrivateRefCountEntry (71,568,769 samples, 0.01%)</title><rect x="625.4" y="501" width="0.1" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="628.42" y="511.5" ></text>
</g>
<g >
<title>TerminateBufferIO (829,824,700 samples, 0.10%)</title><rect x="75.0" y="485" width="1.1" height="15.0" fill="rgb(239,160,38)" rx="2" ry="2" />
<text x="77.96" y="495.5" ></text>
</g>
<g >
<title>__xfs_trans_commit (112,688,247 samples, 0.01%)</title><rect x="586.7" y="245" width="0.1" height="15.0" fill="rgb(223,85,20)" rx="2" ry="2" />
<text x="589.66" y="255.5" ></text>
</g>
<g >
<title>LockBufHdr (107,984,003 samples, 0.01%)</title><rect x="236.4" y="533" width="0.1" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="239.38" y="543.5" ></text>
</g>
<g >
<title>TransactionIdPrecedes (491,740,979 samples, 0.06%)</title><rect x="151.1" y="501" width="0.7" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="154.15" y="511.5" ></text>
</g>
<g >
<title>wake_up_q (128,474,798 samples, 0.02%)</title><rect x="51.3" y="53" width="0.2" height="15.0" fill="rgb(237,151,36)" rx="2" ry="2" />
<text x="54.30" y="63.5" ></text>
</g>
<g >
<title>_raw_spin_lock_irqsave (1,537,044,185 samples, 0.19%)</title><rect x="1152.2" y="581" width="2.2" height="15.0" fill="rgb(247,195,46)" rx="2" ry="2" />
<text x="1155.20" y="591.5" ></text>
</g>
<g >
<title>GetBufferDescriptor (290,238,862 samples, 0.04%)</title><rect x="1111.4" y="453" width="0.4" height="15.0" fill="rgb(249,202,48)" rx="2" ry="2" />
<text x="1114.41" y="463.5" ></text>
</g>
<g >
<title>get_page_from_freelist (829,319,855 samples, 0.10%)</title><rect x="125.9" y="117" width="1.2" height="15.0" fill="rgb(252,218,52)" rx="2" ry="2" />
<text x="128.91" y="127.5" ></text>
</g>
<g >
<title>BufferIsPermanent (297,412,599 samples, 0.04%)</title><rect x="1083.9" y="485" width="0.4" height="15.0" fill="rgb(250,210,50)" rx="2" ry="2" />
<text x="1086.91" y="495.5" ></text>
</g>
<g >
<title>do_sync_read (96,959,467 samples, 0.01%)</title><rect x="340.7" y="405" width="0.2" height="15.0" fill="rgb(237,147,35)" rx="2" ry="2" />
<text x="343.73" y="415.5" ></text>
</g>
<g >
<title>hash_initial_lookup (122,832,498 samples, 0.01%)</title><rect x="58.1" y="405" width="0.1" height="15.0" fill="rgb(251,214,51)" rx="2" ry="2" />
<text x="61.07" y="415.5" ></text>
</g>
<g >
<title>activate_task (298,546,138 samples, 0.04%)</title><rect x="1166.1" y="549" width="0.4" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="1169.12" y="559.5" ></text>
</g>
<g >
<title>enqueue_task_fair (212,481,452 samples, 0.03%)</title><rect x="1166.2" y="533" width="0.3" height="15.0" fill="rgb(216,52,12)" rx="2" ry="2" />
<text x="1169.22" y="543.5" ></text>
</g>
<g >
<title>StrategyGetBuffer (400,166,752 samples, 0.05%)</title><rect x="47.0" y="741" width="0.5" height="15.0" fill="rgb(216,54,13)" rx="2" ry="2" />
<text x="49.95" y="751.5" ></text>
</g>
<g >
<title>pgstat_progress_update_param (94,577,348 samples, 0.01%)</title><rect x="232.9" y="533" width="0.1" height="15.0" fill="rgb(227,103,24)" rx="2" ry="2" />
<text x="235.88" y="543.5" ></text>
</g>
<g >
<title>hrtimer_nanosleep (609,658,589 samples, 0.07%)</title><rect x="310.6" y="341" width="0.9" height="15.0" fill="rgb(237,147,35)" rx="2" ry="2" />
<text x="313.59" y="351.5" ></text>
</g>
<g >
<title>LWLockAttemptLock (328,016,233 samples, 0.04%)</title><rect x="235.3" y="501" width="0.5" height="15.0" fill="rgb(235,138,33)" rx="2" ry="2" />
<text x="238.31" y="511.5" ></text>
</g>
<g >
<title>hrtimer_interrupt (134,744,497 samples, 0.02%)</title><rect x="853.8" y="421" width="0.2" height="15.0" fill="rgb(228,109,26)" rx="2" ry="2" />
<text x="856.81" y="431.5" ></text>
</g>
<g >
<title>BufferGetBlockNumber (127,765,126 samples, 0.02%)</title><rect x="42.9" y="613" width="0.2" height="15.0" fill="rgb(206,7,1)" rx="2" ry="2" />
<text x="45.91" y="623.5" ></text>
</g>
<g >
<title>__find_get_page (1,512,560,062 samples, 0.18%)</title><rect x="269.2" y="277" width="2.2" height="15.0" fill="rgb(229,114,27)" rx="2" ry="2" />
<text x="272.23" y="287.5" ></text>
</g>
<g >
<title>__pagevec_lru_add_fn (549,387,897 samples, 0.07%)</title><rect x="439.4" y="149" width="0.7" height="15.0" fill="rgb(244,183,43)" rx="2" ry="2" />
<text x="442.36" y="159.5" ></text>
</g>
<g >
<title>xfs_trans_commit (404,505,494 samples, 0.05%)</title><rect x="15.1" y="277" width="0.5" height="15.0" fill="rgb(250,210,50)" rx="2" ry="2" />
<text x="18.06" y="287.5" ></text>
</g>
<g >
<title>FileReadV (1,107,321,897 samples, 0.13%)</title><rect x="20.8" y="565" width="1.6" height="15.0" fill="rgb(222,81,19)" rx="2" ry="2" />
<text x="23.84" y="575.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32 (867,878,838 samples, 0.11%)</title><rect x="1120.8" y="469" width="1.3" height="15.0" fill="rgb(248,202,48)" rx="2" ry="2" />
<text x="1123.85" y="479.5" ></text>
</g>
<g >
<title>arch_cpu_idle (321,864,645 samples, 0.04%)</title><rect x="1189.5" y="677" width="0.4" height="15.0" fill="rgb(218,62,14)" rx="2" ry="2" />
<text x="1192.46" y="687.5" ></text>
</g>
<g >
<title>radix_tree_lookup_slot (135,504,497 samples, 0.02%)</title><rect x="64.1" y="245" width="0.2" height="15.0" fill="rgb(210,23,5)" rx="2" ry="2" />
<text x="67.12" y="255.5" ></text>
</g>
<g >
<title>__do_fault.isra.61 (1,016,725,410 samples, 0.12%)</title><rect x="255.5" y="341" width="1.5" height="15.0" fill="rgb(227,102,24)" rx="2" ry="2" />
<text x="258.54" y="351.5" ></text>
</g>
<g >
<title>tick_sched_timer (141,961,854 samples, 0.02%)</title><rect x="709.2" y="437" width="0.2" height="15.0" fill="rgb(254,227,54)" rx="2" ry="2" />
<text x="712.24" y="447.5" ></text>
</g>
<g >
<title>queued_spin_lock_slowpath (107,024,501 samples, 0.01%)</title><rect x="130.2" y="245" width="0.2" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="133.20" y="255.5" ></text>
</g>
<g >
<title>GetPrivateRefCountEntry (71,259,274 samples, 0.01%)</title><rect x="1143.0" y="485" width="0.1" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="1146.04" y="495.5" ></text>
</g>
<g >
<title>TransactionIdPrecedes (136,932,215 samples, 0.02%)</title><rect x="1125.3" y="485" width="0.2" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="1128.32" y="495.5" ></text>
</g>
<g >
<title>__pread_nocancel (989,449,507 samples, 0.12%)</title><rect x="21.0" y="549" width="1.4" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="24.01" y="559.5" ></text>
</g>
<g >
<title>scheduler_tick (207,063,124 samples, 0.03%)</title><rect x="445.2" y="53" width="0.3" height="15.0" fill="rgb(246,190,45)" rx="2" ry="2" />
<text x="448.23" y="63.5" ></text>
</g>
<g >
<title>iomap_apply (1,349,795,106 samples, 0.16%)</title><rect x="12.7" y="325" width="2.0" height="15.0" fill="rgb(247,194,46)" rx="2" ry="2" />
<text x="15.74" y="335.5" ></text>
</g>
<g >
<title>PageGetItemId (1,217,995,004 samples, 0.15%)</title><rect x="632.9" y="533" width="1.7" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="635.85" y="543.5" ></text>
</g>
<g >
<title>__do_page_fault (1,204,569,436 samples, 0.15%)</title><rect x="327.3" y="421" width="1.8" height="15.0" fill="rgb(239,158,37)" rx="2" ry="2" />
<text x="330.33" y="431.5" ></text>
</g>
<g >
<title>BufferIsValid (612,407,810 samples, 0.07%)</title><rect x="1094.4" y="437" width="0.9" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="1097.42" y="447.5" ></text>
</g>
<g >
<title>LockBufHdr (2,046,264,994 samples, 0.25%)</title><rect x="263.7" y="421" width="2.9" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="266.66" y="431.5" ></text>
</g>
<g >
<title>ItemPointerIsValid (1,938,716,806 samples, 0.24%)</title><rect x="1084.3" y="485" width="2.8" height="15.0" fill="rgb(206,7,1)" rx="2" ry="2" />
<text x="1087.33" y="495.5" ></text>
</g>
<g >
<title>__set_page_dirty_no_writeback (139,953,307 samples, 0.02%)</title><rect x="577.9" y="229" width="0.2" height="15.0" fill="rgb(223,86,20)" rx="2" ry="2" />
<text x="580.93" y="239.5" ></text>
</g>
<g >
<title>mdreadv (1,124,397,244 samples, 0.14%)</title><rect x="20.8" y="581" width="1.6" height="15.0" fill="rgb(239,159,38)" rx="2" ry="2" />
<text x="23.83" y="591.5" ></text>
</g>
<g >
<title>smp_apic_timer_interrupt (105,887,461 samples, 0.01%)</title><rect x="203.0" y="437" width="0.2" height="15.0" fill="rgb(221,74,17)" rx="2" ry="2" />
<text x="206.01" y="447.5" ></text>
</g>
<g >
<title>hash_search_with_hash_value (281,396,594 samples, 0.03%)</title><rect x="11.2" y="533" width="0.4" height="15.0" fill="rgb(249,205,49)" rx="2" ry="2" />
<text x="14.20" y="543.5" ></text>
</g>
<g >
<title>StrategyGetBuffer (308,807,514 samples, 0.04%)</title><rect x="43.5" y="677" width="0.4" height="15.0" fill="rgb(216,54,13)" rx="2" ry="2" />
<text x="46.49" y="687.5" ></text>
</g>
<g >
<title>do_page_fault (30,574,677,911 samples, 3.71%)</title><rect x="85.4" y="277" width="43.8" height="15.0" fill="rgb(216,54,13)" rx="2" ry="2" />
<text x="88.39" y="287.5" >do_p..</text>
</g>
<g >
<title>tick_nohz_restart (519,686,314 samples, 0.06%)</title><rect x="1187.4" y="709" width="0.7" height="15.0" fill="rgb(246,191,45)" rx="2" ry="2" />
<text x="1190.41" y="719.5" ></text>
</g>
<g >
<title>__alloc_pages_nodemask (1,339,431,443 samples, 0.16%)</title><rect x="386.5" y="197" width="1.9" height="15.0" fill="rgb(228,108,25)" rx="2" ry="2" />
<text x="389.50" y="207.5" ></text>
</g>
<g >
<title>do_page_fault (1,241,456,530 samples, 0.15%)</title><rect x="327.3" y="437" width="1.8" height="15.0" fill="rgb(216,54,13)" rx="2" ry="2" />
<text x="330.31" y="447.5" ></text>
</g>
<g >
<title>get_hash_value (290,571,195 samples, 0.04%)</title><rect x="134.3" y="357" width="0.4" height="15.0" fill="rgb(211,27,6)" rx="2" ry="2" />
<text x="137.28" y="367.5" ></text>
</g>
<g >
<title>dequeue_entity (218,929,197 samples, 0.03%)</title><rect x="16.6" y="213" width="0.3" height="15.0" fill="rgb(233,130,31)" rx="2" ry="2" />
<text x="19.59" y="223.5" ></text>
</g>
<g >
<title>balance_dirty_pages_ratelimited (437,419,495 samples, 0.05%)</title><rect x="389.2" y="245" width="0.6" height="15.0" fill="rgb(212,34,8)" rx="2" ry="2" />
<text x="392.19" y="255.5" ></text>
</g>
<g >
<title>ItemPointerSet (994,427,062 samples, 0.12%)</title><rect x="161.6" y="501" width="1.5" height="15.0" fill="rgb(237,147,35)" rx="2" ry="2" />
<text x="164.64" y="511.5" ></text>
</g>
<g >
<title>call_rwsem_down_read_failed (897,090,581 samples, 0.11%)</title><rect x="130.0" y="293" width="1.3" height="15.0" fill="rgb(228,110,26)" rx="2" ry="2" />
<text x="133.04" y="303.5" ></text>
</g>
<g >
<title>retint_userspace_restore_args (765,542,700 samples, 0.09%)</title><rect x="276.4" y="421" width="1.1" height="15.0" fill="rgb(215,46,11)" rx="2" ry="2" />
<text x="279.42" y="431.5" ></text>
</g>
<g >
<title>perf (495,522,613 samples, 0.06%)</title><rect x="10.3" y="789" width="0.7" height="15.0" fill="rgb(242,171,40)" rx="2" ry="2" />
<text x="13.27" y="799.5" ></text>
</g>
<g >
<title>UnlockBufHdr (89,022,688 samples, 0.01%)</title><rect x="1118.8" y="453" width="0.1" height="15.0" fill="rgb(219,68,16)" rx="2" ry="2" />
<text x="1121.81" y="463.5" ></text>
</g>
<g >
<title>system_call_fastpath (109,695,662 samples, 0.01%)</title><rect x="604.3" y="453" width="0.2" height="15.0" fill="rgb(252,217,52)" rx="2" ry="2" />
<text x="607.35" y="463.5" ></text>
</g>
<g >
<title>__schedule (1,476,060,735 samples, 0.18%)</title><rect x="592.4" y="261" width="2.1" height="15.0" fill="rgb(227,103,24)" rx="2" ry="2" />
<text x="595.41" y="271.5" ></text>
</g>
<g >
<title>dequeue_task_fair (337,071,229 samples, 0.04%)</title><rect x="16.4" y="229" width="0.5" height="15.0" fill="rgb(230,119,28)" rx="2" ry="2" />
<text x="19.44" y="239.5" ></text>
</g>
<g >
<title>BufTableLookup (693,381,643 samples, 0.08%)</title><rect x="615.4" y="389" width="1.0" height="15.0" fill="rgb(224,89,21)" rx="2" ry="2" />
<text x="618.39" y="399.5" ></text>
</g>
<g >
<title>__mem_cgroup_count_vm_event (82,486,615 samples, 0.01%)</title><rect x="268.2" y="357" width="0.2" height="15.0" fill="rgb(217,56,13)" rx="2" ry="2" />
<text x="271.23" y="367.5" ></text>
</g>
<g >
<title>page_fault (191,062,306 samples, 0.02%)</title><rect x="49.7" y="709" width="0.3" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="52.70" y="719.5" ></text>
</g>
<g >
<title>__find_lock_page (237,787,344 samples, 0.03%)</title><rect x="64.0" y="277" width="0.3" height="15.0" fill="rgb(251,214,51)" rx="2" ry="2" />
<text x="66.98" y="287.5" ></text>
</g>
<g >
<title>put_prev_task_idle (82,518,274 samples, 0.01%)</title><rect x="1182.5" y="677" width="0.1" height="15.0" fill="rgb(234,136,32)" rx="2" ry="2" />
<text x="1185.45" y="687.5" ></text>
</g>
<g >
<title>table_relation_vacuum (22,616,657,143 samples, 2.74%)</title><rect x="11.0" y="725" width="32.4" height="15.0" fill="rgb(214,43,10)" rx="2" ry="2" />
<text x="13.98" y="735.5" >ta..</text>
</g>
<g >
<title>__hrtimer_run_queues (137,764,498 samples, 0.02%)</title><rect x="1033.9" y="405" width="0.2" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="1036.92" y="415.5" ></text>
</g>
<g >
<title>do_sync_read (813,584,497 samples, 0.10%)</title><rect x="21.2" y="485" width="1.2" height="15.0" fill="rgb(237,147,35)" rx="2" ry="2" />
<text x="24.20" y="495.5" ></text>
</g>
<g >
<title>select_task_rq_fair (138,619,233 samples, 0.02%)</title><rect x="18.9" y="245" width="0.2" height="15.0" fill="rgb(211,29,7)" rx="2" ry="2" />
<text x="21.86" y="255.5" ></text>
</g>
<g >
<title>GetPrivateRefCountEntry (91,262,100 samples, 0.01%)</title><rect x="1144.8" y="517" width="0.1" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="1147.80" y="527.5" ></text>
</g>
<g >
<title>smp_apic_timer_interrupt (137,764,122 samples, 0.02%)</title><rect x="990.7" y="469" width="0.2" height="15.0" fill="rgb(221,74,17)" rx="2" ry="2" />
<text x="993.73" y="479.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32_impl (90,077,236 samples, 0.01%)</title><rect x="1143.5" y="485" width="0.2" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="1146.55" y="495.5" ></text>
</g>
<g >
<title>parallel_vacuum_main (637,238,905,240 samples, 77.31%)</title><rect x="233.7" y="629" width="912.3" height="15.0" fill="rgb(213,40,9)" rx="2" ry="2" />
<text x="236.67" y="639.5" >parallel_vacuum_main</text>
</g>
<g >
<title>__memcmp_sse4_1 (184,818,275 samples, 0.02%)</title><rect x="615.6" y="373" width="0.3" height="15.0" fill="rgb(240,162,38)" rx="2" ry="2" />
<text x="618.64" y="383.5" ></text>
</g>
<g >
<title>tick_sched_timer (70,310,500 samples, 0.01%)</title><rect x="96.9" y="85" width="0.1" height="15.0" fill="rgb(254,227,54)" rx="2" ry="2" />
<text x="99.89" y="95.5" ></text>
</g>
<g >
<title>TransactionIdFollows (75,328,725 samples, 0.01%)</title><rect x="37.1" y="581" width="0.1" height="15.0" fill="rgb(222,79,18)" rx="2" ry="2" />
<text x="40.10" y="591.5" ></text>
</g>
<g >
<title>BufferAlloc (135,121,430 samples, 0.02%)</title><rect x="22.8" y="517" width="0.2" height="15.0" fill="rgb(252,220,52)" rx="2" ry="2" />
<text x="25.80" y="527.5" ></text>
</g>
<g >
<title>smp_apic_timer_interrupt (102,529,432 samples, 0.01%)</title><rect x="1125.8" y="469" width="0.1" height="15.0" fill="rgb(221,74,17)" rx="2" ry="2" />
<text x="1128.79" y="479.5" ></text>
</g>
<g >
<title>postmaster_child_launch (3,096,750,624 samples, 0.38%)</title><rect x="50.2" y="693" width="4.4" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="53.19" y="703.5" ></text>
</g>
<g >
<title>hrtimer_interrupt (177,305,943 samples, 0.02%)</title><rect x="709.2" y="469" width="0.3" height="15.0" fill="rgb(228,109,26)" rx="2" ry="2" />
<text x="712.23" y="479.5" ></text>
</g>
<g >
<title>BufferIsValid (92,358,521 samples, 0.01%)</title><rect x="630.7" y="517" width="0.1" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="633.71" y="527.5" ></text>
</g>
<g >
<title>BufTableHashCode (179,304,943 samples, 0.02%)</title><rect x="56.1" y="437" width="0.2" height="15.0" fill="rgb(215,47,11)" rx="2" ry="2" />
<text x="59.09" y="447.5" ></text>
</g>
<g >
<title>update_process_times (263,322,681 samples, 0.03%)</title><rect x="445.1" y="69" width="0.4" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="448.15" y="79.5" ></text>
</g>
<g >
<title>BufferIsValid (171,181,149 samples, 0.02%)</title><rect x="225.7" y="453" width="0.2" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="228.67" y="463.5" ></text>
</g>
<g >
<title>do_generic_file_read.constprop.52 (35,083,865,799 samples, 4.26%)</title><rect x="79.2" y="309" width="50.2" height="15.0" fill="rgb(205,4,1)" rx="2" ry="2" />
<text x="82.16" y="319.5" >do_ge..</text>
</g>
<g >
<title>local_apic_timer_interrupt (385,779,583 samples, 0.05%)</title><rect x="445.1" y="149" width="0.5" height="15.0" fill="rgb(213,37,9)" rx="2" ry="2" />
<text x="448.05" y="159.5" ></text>
</g>
<g >
<title>PageGetItem (491,686,765 samples, 0.06%)</title><rect x="148.4" y="501" width="0.7" height="15.0" fill="rgb(214,43,10)" rx="2" ry="2" />
<text x="151.41" y="511.5" ></text>
</g>
<g >
<title>xfs_file_buffered_aio_read (36,944,327,015 samples, 4.48%)</title><rect x="78.9" y="341" width="52.9" height="15.0" fill="rgb(217,55,13)" rx="2" ry="2" />
<text x="81.93" y="351.5" >xfs_f..</text>
</g>
<g >
<title>UnlockBufHdr (102,139,725 samples, 0.01%)</title><rect x="330.5" y="485" width="0.2" height="15.0" fill="rgb(219,68,16)" rx="2" ry="2" />
<text x="333.55" y="495.5" ></text>
</g>
<g >
<title>get_futex_key (101,293,331 samples, 0.01%)</title><rect x="1142.1" y="389" width="0.2" height="15.0" fill="rgb(252,216,51)" rx="2" ry="2" />
<text x="1145.14" y="399.5" ></text>
</g>
<g >
<title>nohz_balance_exit_idle.part.59 (74,180,775 samples, 0.01%)</title><rect x="621.3" y="309" width="0.1" height="15.0" fill="rgb(227,105,25)" rx="2" ry="2" />
<text x="624.34" y="319.5" ></text>
</g>
<g >
<title>heap_page_prune_and_freeze (11,253,277,969 samples, 1.37%)</title><rect x="26.7" y="629" width="16.1" height="15.0" fill="rgb(213,40,9)" rx="2" ry="2" />
<text x="29.68" y="639.5" ></text>
</g>
<g >
<title>tick_sched_timer (149,652,535 samples, 0.02%)</title><rect x="828.8" y="405" width="0.3" height="15.0" fill="rgb(254,227,54)" rx="2" ry="2" />
<text x="831.84" y="415.5" ></text>
</g>
<g >
<title>apic_timer_interrupt (102,529,432 samples, 0.01%)</title><rect x="1125.8" y="485" width="0.1" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="1128.79" y="495.5" ></text>
</g>
<g >
<title>radix_tree_descend (1,477,690,372 samples, 0.18%)</title><rect x="460.5" y="149" width="2.1" height="15.0" fill="rgb(243,175,41)" rx="2" ry="2" />
<text x="463.45" y="159.5" ></text>
</g>
<g >
<title>update_vacuum_error_info (86,440,702 samples, 0.01%)</title><rect x="1145.7" y="565" width="0.1" height="15.0" fill="rgb(231,119,28)" rx="2" ry="2" />
<text x="1148.71" y="575.5" ></text>
</g>
<g >
<title>TransactionLogFetch (274,508,106 samples, 0.03%)</title><rect x="1122.7" y="469" width="0.4" height="15.0" fill="rgb(244,180,43)" rx="2" ry="2" />
<text x="1125.68" y="479.5" ></text>
</g>
<g >
<title>pick_next_task_fair (161,094,855 samples, 0.02%)</title><rect x="594.2" y="245" width="0.2" height="15.0" fill="rgb(242,170,40)" rx="2" ry="2" />
<text x="597.17" y="255.5" ></text>
</g>
<g >
<title>HeapTupleSatisfiesVacuumHorizon (1,074,397,054 samples, 0.13%)</title><rect x="760.4" y="517" width="1.5" height="15.0" fill="rgb(207,13,3)" rx="2" ry="2" />
<text x="763.38" y="527.5" ></text>
</g>
<g >
<title>BackgroundWorkerMain (124,816,047,342 samples, 15.14%)</title><rect x="54.6" y="645" width="178.7" height="15.0" fill="rgb(240,161,38)" rx="2" ry="2" />
<text x="57.63" y="655.5" >BackgroundWorkerMain</text>
</g>
<g >
<title>shmem_fault (993,215,265 samples, 0.12%)</title><rect x="255.6" y="325" width="1.4" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="258.56" y="335.5" ></text>
</g>
<g >
<title>HeapTupleSatisfiesVacuumHorizon (170,406,319 samples, 0.02%)</title><rect x="25.4" y="597" width="0.3" height="15.0" fill="rgb(207,13,3)" rx="2" ry="2" />
<text x="28.42" y="607.5" ></text>
</g>
<g >
<title>perf_event_task_tick (76,661,909 samples, 0.01%)</title><rect x="445.3" y="37" width="0.1" height="15.0" fill="rgb(205,3,0)" rx="2" ry="2" />
<text x="448.27" y="47.5" ></text>
</g>
<g >
<title>PageGetItem (3,316,241,201 samples, 0.40%)</title><rect x="692.7" y="517" width="4.8" height="15.0" fill="rgb(214,43,10)" rx="2" ry="2" />
<text x="695.73" y="527.5" ></text>
</g>
<g >
<title>do_futex (73,620,781 samples, 0.01%)</title><rect x="232.6" y="405" width="0.1" height="15.0" fill="rgb(245,184,44)" rx="2" ry="2" />
<text x="235.59" y="415.5" ></text>
</g>
<g >
<title>cpuidle_enter_state (6,320,770,283 samples, 0.77%)</title><rect x="1167.1" y="693" width="9.0" height="15.0" fill="rgb(221,73,17)" rx="2" ry="2" />
<text x="1170.08" y="703.5" ></text>
</g>
<g >
<title>do_page_fault (324,679,284 samples, 0.04%)</title><rect x="57.2" y="373" width="0.5" height="15.0" fill="rgb(216,54,13)" rx="2" ry="2" />
<text x="60.20" y="383.5" ></text>
</g>
<g >
<title>apic_timer_interrupt (372,390,171 samples, 0.05%)</title><rect x="784.3" y="517" width="0.5" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="787.29" y="527.5" ></text>
</g>
<g >
<title>ttwu_do_activate (151,810,473 samples, 0.02%)</title><rect x="19.1" y="245" width="0.2" height="15.0" fill="rgb(215,48,11)" rx="2" ry="2" />
<text x="22.06" y="255.5" ></text>
</g>
<g >
<title>ItemPointerSet (299,085,291 samples, 0.04%)</title><rect x="25.8" y="613" width="0.4" height="15.0" fill="rgb(237,147,35)" rx="2" ry="2" />
<text x="28.80" y="623.5" ></text>
</g>
<g >
<title>pgstat_count_io_op_n (81,315,588 samples, 0.01%)</title><rect x="76.6" y="469" width="0.1" height="15.0" fill="rgb(232,128,30)" rx="2" ry="2" />
<text x="79.57" y="479.5" ></text>
</g>
<g >
<title>smgrwrite (749,558,632 samples, 0.09%)</title><rect x="50.4" y="293" width="1.1" height="15.0" fill="rgb(229,112,26)" rx="2" ry="2" />
<text x="53.45" y="303.5" ></text>
</g>
<g >
<title>tick_sched_handle (95,227,023 samples, 0.01%)</title><rect x="393.6" y="101" width="0.2" height="15.0" fill="rgb(219,68,16)" rx="2" ry="2" />
<text x="396.64" y="111.5" ></text>
</g>
<g >
<title>HeapTupleSatisfiesVacuumHorizon (142,917,883 samples, 0.02%)</title><rect x="54.3" y="373" width="0.2" height="15.0" fill="rgb(207,13,3)" rx="2" ry="2" />
<text x="57.30" y="383.5" ></text>
</g>
<g >
[Attached: perf CPU flame graph (SVG markup omitted) of a parallel heap vacuum run. Most samples fall under ParallelWorkerMain (77.3% of all samples); the dominant frames are heap_page_prune_and_freeze (35.8%), ReadBuffer_common (30.9%), system_call_fastpath/vfs_read (~22%, page-cache reads), __do_page_fault (17.4%) and shmem_fault (15.8%) from shared-buffer page faults, heap_prune_chain (16.8%), and heap_parallel_vacuum_scan_worker (15.1%).]
<text x="57.63" y="687.5" >do_start_bgworker</text>
</g>
<g >
<title>cpuidle_get_cpu_driver (152,075,996 samples, 0.02%)</title><rect x="1176.1" y="693" width="0.2" height="15.0" fill="rgb(231,121,29)" rx="2" ry="2" />
<text x="1179.13" y="703.5" ></text>
</g>
<g >
<title>BufferIsValid (116,426,514 samples, 0.01%)</title><rect x="227.1" y="405" width="0.1" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="230.06" y="415.5" ></text>
</g>
<g >
<title>heap_prune_record_unchanged_lp_normal (16,015,892,933 samples, 1.94%)</title><rect x="195.3" y="485" width="23.0" height="15.0" fill="rgb(221,76,18)" rx="2" ry="2" />
<text x="198.33" y="495.5" >h..</text>
</g>
<g >
<title>PageGetItem (74,045,246 samples, 0.01%)</title><rect x="53.3" y="373" width="0.1" height="15.0" fill="rgb(214,43,10)" rx="2" ry="2" />
<text x="56.27" y="383.5" ></text>
</g>
<g >
<title>StartReadBuffer (987,780,807 samples, 0.12%)</title><rect x="50.2" y="389" width="1.4" height="15.0" fill="rgb(222,78,18)" rx="2" ry="2" />
<text x="53.22" y="399.5" ></text>
</g>
<g >
<title>WaitReadBuffers (194,162,665,756 samples, 23.56%)</title><rect x="323.2" y="517" width="277.9" height="15.0" fill="rgb(210,26,6)" rx="2" ry="2" />
<text x="326.17" y="527.5" >WaitReadBuffers</text>
</g>
<g >
<title>__list_del_entry (174,296,378 samples, 0.02%)</title><rect x="574.7" y="101" width="0.2" height="15.0" fill="rgb(214,41,9)" rx="2" ry="2" />
<text x="577.66" y="111.5" ></text>
</g>
<g >
<title>call_rwsem_wake (1,442,932,814 samples, 0.18%)</title><rect x="17.2" y="309" width="2.1" height="15.0" fill="rgb(231,119,28)" rx="2" ry="2" />
<text x="20.21" y="319.5" ></text>
</g>
<g >
<title>BufferAlloc (5,090,339,570 samples, 0.62%)</title><rect x="613.1" y="405" width="7.3" height="15.0" fill="rgb(252,220,52)" rx="2" ry="2" />
<text x="616.13" y="415.5" ></text>
</g>
<g >
<title>start_cpu (19,105,050,665 samples, 2.32%)</title><rect x="1162.6" y="773" width="27.4" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="1165.65" y="783.5" >s..</text>
</g>
<g >
<title>_raw_spin_lock_irqsave (97,476,627 samples, 0.01%)</title><rect x="14.2" y="213" width="0.1" height="15.0" fill="rgb(247,195,46)" rx="2" ry="2" />
<text x="17.17" y="223.5" ></text>
</g>
<g >
<title>WaitReadBuffersCanStartIO (75,733,651 samples, 0.01%)</title><rect x="20.6" y="597" width="0.1" height="15.0" fill="rgb(210,27,6)" rx="2" ry="2" />
<text x="23.64" y="607.5" ></text>
</g>
<g >
<title>BlockIdSet (428,700,237 samples, 0.05%)</title><rect x="147.8" y="485" width="0.6" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="150.80" y="495.5" ></text>
</g>
<g >
<title>BufferGetBlockNumber (74,426,847 samples, 0.01%)</title><rect x="143.4" y="501" width="0.1" height="15.0" fill="rgb(206,7,1)" rx="2" ry="2" />
<text x="146.41" y="511.5" ></text>
</g>
<g >
<title>auditsys (75,952,356 samples, 0.01%)</title><rect x="77.3" y="421" width="0.2" height="15.0" fill="rgb(240,161,38)" rx="2" ry="2" />
<text x="80.35" y="431.5" ></text>
</g>
<g >
<title>pg_atomic_fetch_or_u32 (630,445,434 samples, 0.08%)</title><rect x="321.8" y="437" width="0.9" height="15.0" fill="rgb(221,74,17)" rx="2" ry="2" />
<text x="324.85" y="447.5" ></text>
</g>
<g >
<title>InitBufferTag (219,492,168 samples, 0.03%)</title><rect x="616.4" y="389" width="0.4" height="15.0" fill="rgb(230,116,27)" rx="2" ry="2" />
<text x="619.45" y="399.5" ></text>
</g>
<g >
<title>vacuum_delay_point (91,371,963 samples, 0.01%)</title><rect x="1145.8" y="565" width="0.2" height="15.0" fill="rgb(208,17,4)" rx="2" ry="2" />
<text x="1148.83" y="575.5" ></text>
</g>
<g >
<title>postmaster_child_launch (124,939,849,462 samples, 15.16%)</title><rect x="54.6" y="661" width="178.9" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="57.63" y="671.5" >postmaster_child_launch</text>
</g>
<g >
<title>pick_next_entity (84,032,717 samples, 0.01%)</title><rect x="1182.3" y="677" width="0.2" height="15.0" fill="rgb(244,181,43)" rx="2" ry="2" />
<text x="1185.33" y="687.5" ></text>
</g>
<g >
<title>error_swapgs (212,144,618 samples, 0.03%)</title><rect x="47.0" y="725" width="0.3" height="15.0" fill="rgb(251,212,50)" rx="2" ry="2" />
<text x="49.96" y="735.5" ></text>
</g>
<g >
<title>radix_tree_lookup_slot (8,265,494,430 samples, 1.00%)</title><rect x="426.9" y="149" width="11.9" height="15.0" fill="rgb(210,23,5)" rx="2" ry="2" />
<text x="429.94" y="159.5" ></text>
</g>
<g >
<title>sysret_check (2,042,178,839 samples, 0.25%)</title><rect x="337.0" y="437" width="3.0" height="15.0" fill="rgb(249,205,49)" rx="2" ry="2" />
<text x="340.04" y="447.5" ></text>
</g>
<g >
<title>TransactionIdFollows (486,638,360 samples, 0.06%)</title><rect x="634.6" y="533" width="0.7" height="15.0" fill="rgb(222,79,18)" rx="2" ry="2" />
<text x="637.64" y="543.5" ></text>
</g>
<g >
<title>pg_atomic_compare_exchange_u32 (327,137,038 samples, 0.04%)</title><rect x="602.0" y="469" width="0.5" height="15.0" fill="rgb(253,220,52)" rx="2" ry="2" />
<text x="605.04" y="479.5" ></text>
</g>
<g >
<title>GetPrivateRefCount (3,363,540,090 samples, 0.41%)</title><rect x="1111.8" y="453" width="4.8" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="1114.82" y="463.5" ></text>
</g>
<g >
<title>PageGetItemId (2,601,470,039 samples, 0.32%)</title><rect x="977.3" y="485" width="3.7" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="980.26" y="495.5" ></text>
</g>
<g >
<title>StartReadBuffersImpl (6,507,529,578 samples, 0.79%)</title><rect x="11.2" y="597" width="9.3" height="15.0" fill="rgb(232,125,30)" rx="2" ry="2" />
<text x="14.16" y="607.5" ></text>
</g>
<g >
<title>LWLockHeldByMeInMode (122,673,726 samples, 0.01%)</title><rect x="631.5" y="517" width="0.2" height="15.0" fill="rgb(207,12,2)" rx="2" ry="2" />
<text x="634.51" y="527.5" ></text>
</g>
<g >
<title>__radix_tree_lookup (80,440,470 samples, 0.01%)</title><rect x="79.9" y="261" width="0.2" height="15.0" fill="rgb(253,222,53)" rx="2" ry="2" />
<text x="82.95" y="271.5" ></text>
</g>
<g >
<title>LWLockAcquire (1,565,521,410 samples, 0.19%)</title><rect x="601.8" y="501" width="2.2" height="15.0" fill="rgb(209,20,4)" rx="2" ry="2" />
<text x="604.80" y="511.5" ></text>
</g>
<g >
<title>UnpinBuffer (371,735,190 samples, 0.05%)</title><rect x="607.0" y="501" width="0.5" height="15.0" fill="rgb(252,219,52)" rx="2" ry="2" />
<text x="609.97" y="511.5" ></text>
</g>
<g >
<title>LockBuffer (366,250,569 samples, 0.04%)</title><rect x="232.2" y="501" width="0.6" height="15.0" fill="rgb(235,142,34)" rx="2" ry="2" />
<text x="235.24" y="511.5" ></text>
</g>
<g >
<title>hrtimer_wakeup (760,595,344 samples, 0.09%)</title><rect x="1165.6" y="613" width="1.1" height="15.0" fill="rgb(252,219,52)" rx="2" ry="2" />
<text x="1168.57" y="623.5" ></text>
</g>
<g >
<title>apic_timer_interrupt (75,893,920 samples, 0.01%)</title><rect x="173.3" y="453" width="0.2" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="176.34" y="463.5" ></text>
</g>
<g >
<title>TransactionIdFollows (180,256,970 samples, 0.02%)</title><rect x="191.0" y="469" width="0.2" height="15.0" fill="rgb(222,79,18)" rx="2" ry="2" />
<text x="193.98" y="479.5" ></text>
</g>
<g >
<title>native_write_msr_safe (147,847,573 samples, 0.02%)</title><rect x="1187.9" y="629" width="0.2" height="15.0" fill="rgb(243,176,42)" rx="2" ry="2" />
<text x="1190.86" y="639.5" ></text>
</g>
<g >
<title>parallel_vacuum_process_table (124,325,521,611 samples, 15.08%)</title><rect x="55.2" y="597" width="178.0" height="15.0" fill="rgb(205,3,0)" rx="2" ry="2" />
<text x="58.21" y="607.5" >parallel_vacuum_process..</text>
</g>
<g >
<title>__hrtimer_run_queues (80,197,691 samples, 0.01%)</title><rect x="853.8" y="405" width="0.1" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="856.82" y="415.5" ></text>
</g>
<g >
<title>_raw_qspin_lock (603,472,699 samples, 0.07%)</title><rect x="392.6" y="197" width="0.9" height="15.0" fill="rgb(210,23,5)" rx="2" ry="2" />
<text x="395.62" y="207.5" ></text>
</g>
<g >
<title>htsv_get_valid_status (208,668,088 samples, 0.03%)</title><rect x="230.8" y="501" width="0.3" height="15.0" fill="rgb(251,212,50)" rx="2" ry="2" />
<text x="233.80" y="511.5" ></text>
</g>
<g >
<title>TransactionIdPrecedes (450,921,617 samples, 0.05%)</title><rect x="635.3" y="533" width="0.7" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="638.33" y="543.5" ></text>
</g>
<g >
<title>heap_tuple_should_freeze (1,253,447,553 samples, 0.15%)</title><rect x="1059.0" y="485" width="1.8" height="15.0" fill="rgb(247,194,46)" rx="2" ry="2" />
<text x="1062.05" y="495.5" ></text>
</g>
<g >
<title>BufferGetBlock (2,078,122,577 samples, 0.25%)</title><rect x="1106.4" y="437" width="2.9" height="15.0" fill="rgb(242,172,41)" rx="2" ry="2" />
<text x="1109.37" y="447.5" ></text>
</g>
<g >
<title>page_fault (100,582,489 samples, 0.01%)</title><rect x="11.3" y="501" width="0.2" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="14.32" y="511.5" ></text>
</g>
<g >
<title>GetPrivateRefCount (342,353,158 samples, 0.04%)</title><rect x="235.8" y="533" width="0.5" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="238.85" y="543.5" ></text>
</g>
<g >
<title>ReadBufferExtended (1,180,547,054 samples, 0.14%)</title><rect x="50.2" y="421" width="1.7" height="15.0" fill="rgb(242,171,40)" rx="2" ry="2" />
<text x="53.22" y="431.5" ></text>
</g>
<g >
<title>__alloc_pages_nodemask (175,142,150 samples, 0.02%)</title><rect x="86.5" y="181" width="0.2" height="15.0" fill="rgb(228,108,25)" rx="2" ry="2" />
<text x="89.45" y="191.5" ></text>
</g>
<g >
<title>__hrtimer_run_queues (75,893,920 samples, 0.01%)</title><rect x="173.3" y="389" width="0.2" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="176.34" y="399.5" ></text>
</g>
<g >
<title>GetPrivateRefCount (117,806,007 samples, 0.01%)</title><rect x="770.0" y="501" width="0.2" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="772.98" y="511.5" ></text>
</g>
<g >
<title>heap_prepare_freeze_tuple (130,238,813 samples, 0.02%)</title><rect x="194.3" y="485" width="0.2" height="15.0" fill="rgb(227,101,24)" rx="2" ry="2" />
<text x="197.30" y="495.5" ></text>
</g>
<g >
<title>__tick_nohz_idle_enter (2,292,490,269 samples, 0.28%)</title><rect x="1183.3" y="709" width="3.3" height="15.0" fill="rgb(223,85,20)" rx="2" ry="2" />
<text x="1186.33" y="719.5" ></text>
</g>
<g >
<title>SetHintBits (928,911,221 samples, 0.11%)</title><rect x="41.2" y="581" width="1.3" height="15.0" fill="rgb(225,93,22)" rx="2" ry="2" />
<text x="44.16" y="591.5" ></text>
</g>
<g >
<title>StartReadBuffer (173,270,851 samples, 0.02%)</title><rect x="22.8" y="565" width="0.2" height="15.0" fill="rgb(222,78,18)" rx="2" ry="2" />
<text x="25.77" y="575.5" ></text>
</g>
<g >
<title>pgstat_tracks_io_object (129,551,904 samples, 0.02%)</title><rect x="332.5" y="453" width="0.2" height="15.0" fill="rgb(207,13,3)" rx="2" ry="2" />
<text x="335.54" y="463.5" ></text>
</g>
<g >
<title>hrtimer_interrupt (372,039,770 samples, 0.05%)</title><rect x="445.1" y="133" width="0.5" height="15.0" fill="rgb(228,109,26)" rx="2" ry="2" />
<text x="448.07" y="143.5" ></text>
</g>
<g >
<title>TransactionIdFollows (2,709,146,850 samples, 0.33%)</title><rect x="981.0" y="485" width="3.9" height="15.0" fill="rgb(222,79,18)" rx="2" ry="2" />
<text x="983.98" y="495.5" ></text>
</g>
<g >
<title>_raw_qspin_lock (423,365,647 samples, 0.05%)</title><rect x="388.5" y="245" width="0.6" height="15.0" fill="rgb(210,23,5)" rx="2" ry="2" />
<text x="391.54" y="255.5" ></text>
</g>
<g >
<title>smp_reschedule_interrupt (205,569,025 samples, 0.02%)</title><rect x="1189.6" y="629" width="0.3" height="15.0" fill="rgb(225,96,23)" rx="2" ry="2" />
<text x="1192.62" y="639.5" ></text>
</g>
<g >
<title>lru_add_drain (232,758,886 samples, 0.03%)</title><rect x="1159.0" y="597" width="0.3" height="15.0" fill="rgb(229,113,27)" rx="2" ry="2" />
<text x="1162.01" y="607.5" ></text>
</g>
<g >
<title>heap_page_prune_execute (277,559,257 samples, 0.03%)</title><rect x="52.7" y="389" width="0.4" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="55.69" y="399.5" ></text>
</g>
<g >
<title>LWLockAttemptLock (1,182,210,687 samples, 0.14%)</title><rect x="318.7" y="437" width="1.6" height="15.0" fill="rgb(235,138,33)" rx="2" ry="2" />
<text x="321.66" y="447.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32 (97,460,343 samples, 0.01%)</title><rect x="135.1" y="341" width="0.1" height="15.0" fill="rgb(248,202,48)" rx="2" ry="2" />
<text x="138.11" y="351.5" ></text>
</g>
<g >
<title>LWLockRelease (126,604,575 samples, 0.02%)</title><rect x="605.1" y="517" width="0.1" height="15.0" fill="rgb(217,58,13)" rx="2" ry="2" />
<text x="608.06" y="527.5" ></text>
</g>
<g >
<title>TransactionIdGetCommitLSN (97,888,579 samples, 0.01%)</title><rect x="229.6" y="469" width="0.1" height="15.0" fill="rgb(238,152,36)" rx="2" ry="2" />
<text x="232.60" y="479.5" ></text>
</g>
<g >
<title>UnlockReleaseBuffer (1,116,857,671 samples, 0.14%)</title><rect x="604.6" y="549" width="1.6" height="15.0" fill="rgb(215,47,11)" rx="2" ry="2" />
<text x="607.63" y="559.5" ></text>
</g>
<g >
<title>system_call_fastpath (105,899,380 samples, 0.01%)</title><rect x="320.8" y="405" width="0.2" height="15.0" fill="rgb(252,217,52)" rx="2" ry="2" />
<text x="323.82" y="415.5" ></text>
</g>
<g >
<title>xfs_vn_update_time (608,268,816 samples, 0.07%)</title><rect x="14.9" y="293" width="0.9" height="15.0" fill="rgb(234,136,32)" rx="2" ry="2" />
<text x="17.90" y="303.5" ></text>
</g>
<g >
<title>__radix_tree_lookup (771,909,975 samples, 0.09%)</title><rect x="348.3" y="277" width="1.1" height="15.0" fill="rgb(253,222,53)" rx="2" ry="2" />
<text x="351.25" y="287.5" ></text>
</g>
<g >
<title>__do_page_fault (1,125,840,029 samples, 0.14%)</title><rect x="59.2" y="373" width="1.6" height="15.0" fill="rgb(239,158,37)" rx="2" ry="2" />
<text x="62.24" y="383.5" ></text>
</g>
<g >
<title>table_block_parallelscan_nextpage (164,549,416 samples, 0.02%)</title><rect x="133.4" y="501" width="0.3" height="15.0" fill="rgb(251,212,50)" rx="2" ry="2" />
<text x="136.43" y="511.5" ></text>
</g>
<g >
<title>GetPrivateRefCountEntry (738,358,079 samples, 0.09%)</title><rect x="666.5" y="485" width="1.0" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="669.47" y="495.5" ></text>
</g>
<g >
<title>BufferIsValid (130,432,966 samples, 0.02%)</title><rect x="631.2" y="485" width="0.2" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="634.18" y="495.5" ></text>
</g>
<g >
<title>UnpinBufferNoOwner (271,007,839 samples, 0.03%)</title><rect x="605.7" y="501" width="0.4" height="15.0" fill="rgb(253,221,53)" rx="2" ry="2" />
<text x="608.72" y="511.5" ></text>
</g>
<g >
<title>ResourceOwnerForgetBufferIO (328,641,149 samples, 0.04%)</title><rect x="330.1" y="485" width="0.4" height="15.0" fill="rgb(215,46,11)" rx="2" ry="2" />
<text x="333.08" y="495.5" ></text>
</g>
<g >
<title>xfs_file_buffered_aio_read (177,545,279,973 samples, 21.54%)</title><rect x="342.9" y="357" width="254.2" height="15.0" fill="rgb(217,55,13)" rx="2" ry="2" />
<text x="345.87" y="367.5" >xfs_file_buffered_aio_read</text>
</g>
<g >
<title>ktime_get (103,708,651 samples, 0.01%)</title><rect x="1176.4" y="693" width="0.2" height="15.0" fill="rgb(207,10,2)" rx="2" ry="2" />
<text x="1179.41" y="703.5" ></text>
</g>
<g >
<title>local_apic_timer_interrupt (75,893,920 samples, 0.01%)</title><rect x="173.3" y="421" width="0.2" height="15.0" fill="rgb(213,37,9)" rx="2" ry="2" />
<text x="176.34" y="431.5" ></text>
</g>
<g >
<title>PageGetItemId (6,143,384,329 samples, 0.75%)</title><rect x="921.9" y="501" width="8.8" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="924.95" y="511.5" ></text>
</g>
<g >
<title>do_sync_read (125,082,965 samples, 0.02%)</title><rect x="51.7" y="261" width="0.2" height="15.0" fill="rgb(237,147,35)" rx="2" ry="2" />
<text x="54.71" y="271.5" ></text>
</g>
<g >
<title>pg_atomic_fetch_or_u32_impl (1,664,615,677 samples, 0.20%)</title><rect x="264.2" y="389" width="2.4" height="15.0" fill="rgb(253,224,53)" rx="2" ry="2" />
<text x="267.20" y="399.5" ></text>
</g>
<g >
<title>touch_softlockup_watchdog_sched (74,319,380 samples, 0.01%)</title><rect x="1188.1" y="709" width="0.2" height="15.0" fill="rgb(208,17,4)" rx="2" ry="2" />
<text x="1191.15" y="719.5" ></text>
</g>
<g >
<title>error_sti (97,105,749 samples, 0.01%)</title><rect x="58.9" y="405" width="0.2" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="61.94" y="415.5" ></text>
</g>
<g >
<title>down_read_trylock (71,760,383 samples, 0.01%)</title><rect x="382.0" y="261" width="0.1" height="15.0" fill="rgb(219,66,15)" rx="2" ry="2" />
<text x="384.97" y="271.5" ></text>
</g>
<g >
<title>__hrtimer_run_queues (331,602,417 samples, 0.04%)</title><rect x="784.3" y="453" width="0.5" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="787.34" y="463.5" ></text>
</g>
<g >
<title>HeapTupleSatisfiesVacuumHorizon (279,402,846 samples, 0.03%)</title><rect x="161.2" y="501" width="0.4" height="15.0" fill="rgb(207,13,3)" rx="2" ry="2" />
<text x="164.24" y="511.5" ></text>
</g>
<g >
<title>__do_softirq (534,263,057 samples, 0.06%)</title><rect x="1164.6" y="613" width="0.7" height="15.0" fill="rgb(246,191,45)" rx="2" ry="2" />
<text x="1167.57" y="623.5" ></text>
</g>
<g >
<title>FileWriteV (4,930,570,578 samples, 0.60%)</title><rect x="12.3" y="469" width="7.0" height="15.0" fill="rgb(248,201,48)" rx="2" ry="2" />
<text x="15.28" y="479.5" ></text>
</g>
<g >
<title>set_page_dirty (1,451,339,273 samples, 0.18%)</title><rect x="582.4" y="229" width="2.0" height="15.0" fill="rgb(231,123,29)" rx="2" ry="2" />
<text x="585.36" y="239.5" ></text>
</g>
<g >
<title>do_futex_wait.constprop.1 (842,534,330 samples, 0.10%)</title><rect x="48.2" y="757" width="1.2" height="15.0" fill="rgb(237,150,36)" rx="2" ry="2" />
<text x="51.24" y="767.5" ></text>
</g>
<g >
<title>xfs_file_aio_read (37,063,306,932 samples, 4.50%)</title><rect x="78.8" y="357" width="53.0" height="15.0" fill="rgb(224,90,21)" rx="2" ry="2" />
<text x="81.78" y="367.5" >xfs_f..</text>
</g>
<g >
<title>generic_file_aio_read (735,764,864 samples, 0.09%)</title><rect x="21.3" y="437" width="1.0" height="15.0" fill="rgb(216,53,12)" rx="2" ry="2" />
<text x="24.25" y="447.5" ></text>
</g>
<g >
<title>futex_wake (72,503,396 samples, 0.01%)</title><rect x="232.6" y="389" width="0.1" height="15.0" fill="rgb(219,65,15)" rx="2" ry="2" />
<text x="235.59" y="399.5" ></text>
</g>
<g >
<title>TransactionIdFollows (1,004,135,401 samples, 0.12%)</title><rect x="914.7" y="485" width="1.4" height="15.0" fill="rgb(222,79,18)" rx="2" ry="2" />
<text x="917.71" y="495.5" ></text>
</g>
<g >
<title>__dec_zone_page_state (184,851,726 samples, 0.02%)</title><rect x="1155.2" y="613" width="0.2" height="15.0" fill="rgb(250,208,49)" rx="2" ry="2" />
<text x="1158.17" y="623.5" ></text>
</g>
<g >
<title>do_page_fault (3,492,358,859 samples, 0.42%)</title><rect x="266.9" y="405" width="5.0" height="15.0" fill="rgb(216,54,13)" rx="2" ry="2" />
<text x="269.87" y="415.5" ></text>
</g>
<g >
<title>visibilitymap_set (252,733,328 samples, 0.03%)</title><rect x="42.9" y="629" width="0.4" height="15.0" fill="rgb(220,73,17)" rx="2" ry="2" />
<text x="45.89" y="639.5" ></text>
</g>
<g >
<title>BufferIsValid (143,364,206 samples, 0.02%)</title><rect x="42.0" y="533" width="0.2" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="44.98" y="543.5" ></text>
</g>
<g >
<title>ktime_get (151,921,201 samples, 0.02%)</title><rect x="1175.8" y="677" width="0.3" height="15.0" fill="rgb(207,10,2)" rx="2" ry="2" />
<text x="1178.84" y="687.5" ></text>
</g>
<g >
<title>__rmqueue (1,131,138,278 samples, 0.14%)</title><rect x="572.7" y="117" width="1.6" height="15.0" fill="rgb(249,203,48)" rx="2" ry="2" />
<text x="575.71" y="127.5" ></text>
</g>
<g >
<title>retint_userspace_restore_args (400,029,322 samples, 0.05%)</title><rect x="60.9" y="405" width="0.6" height="15.0" fill="rgb(215,46,11)" rx="2" ry="2" />
<text x="63.89" y="415.5" ></text>
</g>
<g >
<title>perf_pmu_sched_task (95,040,665 samples, 0.01%)</title><rect x="1181.9" y="661" width="0.2" height="15.0" fill="rgb(205,0,0)" rx="2" ry="2" />
<text x="1184.93" y="671.5" ></text>
</g>
<g >
<title>pg_atomic_compare_exchange_u32 (152,333,997 samples, 0.02%)</title><rect x="617.0" y="357" width="0.2" height="15.0" fill="rgb(253,220,52)" rx="2" ry="2" />
<text x="619.98" y="367.5" ></text>
</g>
<g >
<title>do_start_bgworker (637,265,816,681 samples, 77.32%)</title><rect x="233.6" y="693" width="912.4" height="15.0" fill="rgb(217,58,14)" rx="2" ry="2" />
<text x="236.63" y="703.5" >do_start_bgworker</text>
</g>
<g >
<title>tag_hash (365,341,020 samples, 0.04%)</title><rect x="238.0" y="421" width="0.6" height="15.0" fill="rgb(245,185,44)" rx="2" ry="2" />
<text x="241.05" y="431.5" ></text>
</g>
<g >
<title>heap_page_prune_execute (2,120,731,342 samples, 0.26%)</title><rect x="29.7" y="613" width="3.1" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="32.74" y="623.5" ></text>
</g>
<g >
<title>try_to_wake_up (149,194,408 samples, 0.02%)</title><rect x="1142.3" y="373" width="0.2" height="15.0" fill="rgb(220,70,16)" rx="2" ry="2" />
<text x="1145.29" y="383.5" ></text>
</g>
<g >
<title>do_page_fault (306,316,030 samples, 0.04%)</title><rect x="75.4" y="421" width="0.4" height="15.0" fill="rgb(216,54,13)" rx="2" ry="2" />
<text x="78.40" y="431.5" ></text>
</g>
<g >
<title>__hrtimer_run_queues (974,293,906 samples, 0.12%)</title><rect x="1165.4" y="629" width="1.4" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="1168.45" y="639.5" ></text>
</g>
<g >
<title>writeback_sb_inodes (115,471,936 samples, 0.01%)</title><rect x="10.1" y="661" width="0.2" height="15.0" fill="rgb(237,148,35)" rx="2" ry="2" />
<text x="13.10" y="671.5" ></text>
</g>
<g >
<title>__nanosleep_nocancel (161,702,524 samples, 0.02%)</title><rect x="71.7" y="373" width="0.3" height="15.0" fill="rgb(244,182,43)" rx="2" ry="2" />
<text x="74.74" y="383.5" ></text>
</g>
<g >
<title>__pte_alloc (280,034,695 samples, 0.03%)</title><rect x="86.3" y="229" width="0.4" height="15.0" fill="rgb(218,62,15)" rx="2" ry="2" />
<text x="89.31" y="239.5" ></text>
</g>
<g >
<title>__list_del_entry (257,289,144 samples, 0.03%)</title><rect x="126.5" y="69" width="0.4" height="15.0" fill="rgb(214,41,9)" rx="2" ry="2" />
<text x="129.50" y="79.5" ></text>
</g>
<g >
<title>copy_user_enhanced_fast_string (2,495,620,242 samples, 0.30%)</title><rect x="80.2" y="293" width="3.6" height="15.0" fill="rgb(238,155,37)" rx="2" ry="2" />
<text x="83.18" y="303.5" ></text>
</g>
<g >
<title>ResourceOwnerRememberBuffer (186,384,007 samples, 0.02%)</title><rect x="620.0" y="373" width="0.2" height="15.0" fill="rgb(205,0,0)" rx="2" ry="2" />
<text x="622.96" y="383.5" ></text>
</g>
<g >
<title>PageGetItemId (206,443,479 samples, 0.03%)</title><rect x="137.4" y="517" width="0.3" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="140.40" y="527.5" ></text>
</g>
<g >
<title>BlockIdSet (2,751,605,618 samples, 0.33%)</title><rect x="688.8" y="501" width="3.9" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="691.78" y="511.5" ></text>
</g>
<g >
<title>iomap_write_begin (201,277,307 samples, 0.02%)</title><rect x="13.7" y="293" width="0.3" height="15.0" fill="rgb(211,30,7)" rx="2" ry="2" />
<text x="16.71" y="303.5" ></text>
</g>
<g >
<title>GetPrivateRefCount (2,576,624,425 samples, 0.31%)</title><rect x="1093.7" y="453" width="3.7" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="1096.71" y="463.5" ></text>
</g>
<g >
<title>cpu_startup_entry (376,644,420 samples, 0.05%)</title><rect x="1189.5" y="693" width="0.5" height="15.0" fill="rgb(252,220,52)" rx="2" ry="2" />
<text x="1192.46" y="703.5" ></text>
</g>
<g >
<title>hash_initial_lookup (104,078,404 samples, 0.01%)</title><rect x="248.7" y="421" width="0.1" height="15.0" fill="rgb(251,214,51)" rx="2" ry="2" />
<text x="251.69" y="431.5" ></text>
</g>
<g >
<title>rwsem_wake (267,114,761 samples, 0.03%)</title><rect x="51.1" y="69" width="0.4" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="54.10" y="79.5" ></text>
</g>
<g >
<title>mdreadv (38,842,203,263 samples, 4.71%)</title><rect x="76.7" y="469" width="55.6" height="15.0" fill="rgb(239,159,38)" rx="2" ry="2" />
<text x="79.73" y="479.5" >mdreadv</text>
</g>
<g >
<title>GetPrivateRefCountEntry (233,531,901 samples, 0.03%)</title><rect x="228.2" y="421" width="0.3" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="231.17" y="431.5" ></text>
</g>
<g >
<title>radix_tree_descend (404,985,780 samples, 0.05%)</title><rect x="94.3" y="101" width="0.6" height="15.0" fill="rgb(243,175,41)" rx="2" ry="2" />
<text x="97.31" y="111.5" ></text>
</g>
<g >
<title>do_lazy_scan_heap (124,261,154,376 samples, 15.08%)</title><rect x="55.3" y="549" width="177.9" height="15.0" fill="rgb(221,75,18)" rx="2" ry="2" />
<text x="58.27" y="559.5" >do_lazy_scan_heap</text>
</g>
<g >
<title>__nanosleep_nocancel (1,192,127,972 samples, 0.14%)</title><rect x="309.8" y="389" width="1.7" height="15.0" fill="rgb(244,182,43)" rx="2" ry="2" />
<text x="312.76" y="399.5" ></text>
</g>
<g >
<title>get_page_from_freelist (303,916,593 samples, 0.04%)</title><rect x="387.9" y="181" width="0.5" height="15.0" fill="rgb(252,218,52)" rx="2" ry="2" />
<text x="390.93" y="191.5" ></text>
</g>
<g >
<title>unlock_page (653,952,329 samples, 0.08%)</title><rect x="584.4" y="229" width="1.0" height="15.0" fill="rgb(220,69,16)" rx="2" ry="2" />
<text x="587.44" y="239.5" ></text>
</g>
<g >
<title>deactivate_task (180,545,601 samples, 0.02%)</title><rect x="310.9" y="277" width="0.3" height="15.0" fill="rgb(206,8,2)" rx="2" ry="2" />
<text x="313.89" y="287.5" ></text>
</g>
<g >
<title>release_pages (982,397,947 samples, 0.12%)</title><rect x="1159.3" y="597" width="1.4" height="15.0" fill="rgb(228,106,25)" rx="2" ry="2" />
<text x="1162.34" y="607.5" ></text>
</g>
<g >
<title>spin_delay (1,097,293,537 samples, 0.13%)</title><rect x="311.6" y="389" width="1.6" height="15.0" fill="rgb(240,162,38)" rx="2" ry="2" />
<text x="314.61" y="399.5" ></text>
</g>
<g >
<title>PageGetItem (4,897,701,292 samples, 0.59%)</title><rect x="770.9" y="517" width="7.0" height="15.0" fill="rgb(214,43,10)" rx="2" ry="2" />
<text x="773.87" y="527.5" ></text>
</g>
<g >
<title>BufferIsPermanent (6,832,761,335 samples, 0.83%)</title><rect x="1089.9" y="469" width="9.8" height="15.0" fill="rgb(250,210,50)" rx="2" ry="2" />
<text x="1092.94" y="479.5" ></text>
</g>
<g >
<title>tick_sched_timer (105,887,461 samples, 0.01%)</title><rect x="203.0" y="373" width="0.2" height="15.0" fill="rgb(254,227,54)" rx="2" ry="2" />
<text x="206.01" y="383.5" ></text>
</g>
<g >
<title>local_apic_timer_interrupt (1,175,890,158 samples, 0.14%)</title><rect x="1165.4" y="661" width="1.7" height="15.0" fill="rgb(213,37,9)" rx="2" ry="2" />
<text x="1168.38" y="671.5" ></text>
</g>
<g >
<title>radix_tree_descend (2,204,797,539 samples, 0.27%)</title><rect x="435.6" y="133" width="3.2" height="15.0" fill="rgb(243,175,41)" rx="2" ry="2" />
<text x="438.62" y="143.5" ></text>
</g>
<g >
<title>pgstat_count_io_op_n (295,160,674 samples, 0.04%)</title><rect x="620.7" y="389" width="0.4" height="15.0" fill="rgb(232,128,30)" rx="2" ry="2" />
<text x="623.71" y="399.5" ></text>
</g>
<g >
<title>PageGetItem (152,358,508 samples, 0.02%)</title><rect x="137.2" y="517" width="0.2" height="15.0" fill="rgb(214,43,10)" rx="2" ry="2" />
<text x="140.19" y="527.5" ></text>
</g>
<g >
<title>hash_initial_lookup (94,115,665 samples, 0.01%)</title><rect x="616.2" y="357" width="0.2" height="15.0" fill="rgb(251,214,51)" rx="2" ry="2" />
<text x="619.24" y="367.5" ></text>
</g>
<g >
<title>PageGetItemId (3,912,143,579 samples, 0.47%)</title><rect x="819.2" y="485" width="5.6" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="822.16" y="495.5" ></text>
</g>
<g >
<title>PageGetItemId (724,909,639 samples, 0.09%)</title><rect x="202.1" y="469" width="1.1" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="205.13" y="479.5" ></text>
</g>
<g >
<title>spin_delay (520,729,305 samples, 0.06%)</title><rect x="72.2" y="389" width="0.8" height="15.0" fill="rgb(240,162,38)" rx="2" ry="2" />
<text x="75.22" y="399.5" ></text>
</g>
<g >
<title>StartReadBuffersImpl (13,161,404,722 samples, 1.60%)</title><rect x="55.8" y="485" width="18.9" height="15.0" fill="rgb(232,125,30)" rx="2" ry="2" />
<text x="58.84" y="495.5" ></text>
</g>
<g >
<title>handle_mm_fault (305,172,058 samples, 0.04%)</title><rect x="57.2" y="341" width="0.5" height="15.0" fill="rgb(234,135,32)" rx="2" ry="2" />
<text x="60.23" y="351.5" ></text>
</g>
<g >
<title>update_vacuum_error_info (91,667,286 samples, 0.01%)</title><rect x="1144.0" y="549" width="0.1" height="15.0" fill="rgb(231,119,28)" rx="2" ry="2" />
<text x="1146.97" y="559.5" ></text>
</g>
<g >
<title>page_verify_redirects (1,503,325,323 samples, 0.18%)</title><rect x="179.2" y="485" width="2.1" height="15.0" fill="rgb(214,43,10)" rx="2" ry="2" />
<text x="182.16" y="495.5" ></text>
</g>
<g >
<title>touch_atime (109,998,855 samples, 0.01%)</title><rect x="129.2" y="293" width="0.2" height="15.0" fill="rgb(205,2,0)" rx="2" ry="2" />
<text x="132.24" y="303.5" ></text>
</g>
<g >
<title>PageGetItem (726,318,877 samples, 0.09%)</title><rect x="191.2" y="485" width="1.1" height="15.0" fill="rgb(214,43,10)" rx="2" ry="2" />
<text x="194.24" y="495.5" ></text>
</g>
<g >
<title>current_fs_time (107,024,710 samples, 0.01%)</title><rect x="582.0" y="213" width="0.2" height="15.0" fill="rgb(219,67,16)" rx="2" ry="2" />
<text x="585.00" y="223.5" ></text>
</g>
<g >
<title>startup_hacks (637,279,068,599 samples, 77.32%)</title><rect x="233.6" y="773" width="912.4" height="15.0" fill="rgb(243,178,42)" rx="2" ry="2" />
<text x="236.61" y="783.5" >startup_hacks</text>
</g>
<g >
<title>heap_vac_scan_next_block (349,956,658 samples, 0.04%)</title><rect x="23.3" y="645" width="0.5" height="15.0" fill="rgb(220,70,16)" rx="2" ry="2" />
<text x="26.25" y="655.5" ></text>
</g>
<g >
<title>xfs_iunlock (1,323,827,092 samples, 0.16%)</title><rect x="595.2" y="341" width="1.9" height="15.0" fill="rgb(232,127,30)" rx="2" ry="2" />
<text x="598.16" y="351.5" ></text>
</g>
<g >
<title>GetPrivateRefCount (71,803,743 samples, 0.01%)</title><rect x="226.0" y="453" width="0.1" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="229.01" y="463.5" ></text>
</g>
<g >
<title>shmem_alloc_page (1,182,607,381 samples, 0.14%)</title><rect x="125.6" y="165" width="1.6" height="15.0" fill="rgb(214,42,10)" rx="2" ry="2" />
<text x="128.56" y="175.5" ></text>
</g>
<g >
<title>GetPrivateRefCountEntry (445,167,955 samples, 0.05%)</title><rect x="1116.6" y="453" width="0.7" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="1119.64" y="463.5" ></text>
</g>
<g >
<title>LockBufHdr (81,409,022 samples, 0.01%)</title><rect x="228.8" y="437" width="0.1" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="231.81" y="447.5" ></text>
</g>
<g >
<title>page_fault (31,565,685,400 samples, 3.83%)</title><rect x="84.0" y="293" width="45.2" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="86.97" y="303.5" >page..</text>
</g>
<g >
<title>pg_atomic_sub_fetch_u32 (184,691,260 samples, 0.02%)</title><rect x="1142.5" y="485" width="0.3" height="15.0" fill="rgb(242,174,41)" rx="2" ry="2" />
<text x="1145.53" y="495.5" ></text>
</g>
<g >
<title>scheduler_tick (105,887,461 samples, 0.01%)</title><rect x="203.0" y="325" width="0.2" height="15.0" fill="rgb(246,190,45)" rx="2" ry="2" />
<text x="206.01" y="335.5" ></text>
</g>
<g >
<title>nr_iowait_cpu (143,367,153 samples, 0.02%)</title><rect x="1164.2" y="629" width="0.2" height="15.0" fill="rgb(252,216,51)" rx="2" ry="2" />
<text x="1167.17" y="639.5" ></text>
</g>
<g >
<title>heap_prune_record_unchanged_lp_normal (85,121,277,223 samples, 10.33%)</title><rect x="939.0" y="501" width="121.8" height="15.0" fill="rgb(221,76,18)" rx="2" ry="2" />
<text x="941.98" y="511.5" >heap_prune_reco..</text>
</g>
<g >
<title>hash_bytes (302,876,303 samples, 0.04%)</title><rect x="238.1" y="405" width="0.4" height="15.0" fill="rgb(227,102,24)" rx="2" ry="2" />
<text x="241.06" y="415.5" ></text>
</g>
<g >
<title>PageGetItem (194,036,472 samples, 0.02%)</title><rect x="34.6" y="597" width="0.3" height="15.0" fill="rgb(214,43,10)" rx="2" ry="2" />
<text x="37.61" y="607.5" ></text>
</g>
<g >
<title>mem_cgroup_update_page_stat (85,938,277 samples, 0.01%)</title><rect x="579.2" y="197" width="0.1" height="15.0" fill="rgb(220,71,17)" rx="2" ry="2" />
<text x="582.22" y="207.5" ></text>
</g>
<g >
<title>up_write (267,816,803 samples, 0.03%)</title><rect x="51.1" y="101" width="0.4" height="15.0" fill="rgb(235,139,33)" rx="2" ry="2" />
<text x="54.10" y="111.5" ></text>
</g>
<g >
<title>__find_lock_page (1,317,373,682 samples, 0.16%)</title><rect x="244.5" y="277" width="1.9" height="15.0" fill="rgb(251,214,51)" rx="2" ry="2" />
<text x="247.53" y="287.5" ></text>
</g>
<g >
<title>GetPrivateRefCount (88,086,298 samples, 0.01%)</title><rect x="41.5" y="549" width="0.1" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="44.47" y="559.5" ></text>
</g>
<g >
<title>ItemPointerSet (1,206,980,740 samples, 0.15%)</title><rect x="146.7" y="501" width="1.7" height="15.0" fill="rgb(237,147,35)" rx="2" ry="2" />
<text x="149.69" y="511.5" ></text>
</g>
<g >
<title>TransactionIdFollows (834,428,765 samples, 0.10%)</title><rect x="150.0" y="501" width="1.1" height="15.0" fill="rgb(222,79,18)" rx="2" ry="2" />
<text x="152.95" y="511.5" ></text>
</g>
<g >
<title>lapic_next_deadline (167,573,385 samples, 0.02%)</title><rect x="1187.8" y="645" width="0.3" height="15.0" fill="rgb(222,82,19)" rx="2" ry="2" />
<text x="1190.83" y="655.5" ></text>
</g>
<g >
<title>iomap_write_actor (257,110,195 samples, 0.03%)</title><rect x="10.4" y="517" width="0.4" height="15.0" fill="rgb(232,125,30)" rx="2" ry="2" />
<text x="13.44" y="527.5" ></text>
</g>
<g >
<title>do_set_pte (99,454,556 samples, 0.01%)</title><rect x="271.4" y="341" width="0.2" height="15.0" fill="rgb(253,221,52)" rx="2" ry="2" />
<text x="274.44" y="351.5" ></text>
</g>
<g >
<title>schedule (104,722,866 samples, 0.01%)</title><rect x="48.6" y="661" width="0.1" height="15.0" fill="rgb(254,229,54)" rx="2" ry="2" />
<text x="51.57" y="671.5" ></text>
</g>
<g >
<title>__find_get_page (232,454,527 samples, 0.03%)</title><rect x="64.0" y="261" width="0.3" height="15.0" fill="rgb(229,114,27)" rx="2" ry="2" />
<text x="66.99" y="271.5" ></text>
</g>
<g >
<title>PageRepairFragmentation (209,052,116 samples, 0.03%)</title><rect x="52.7" y="373" width="0.3" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="55.73" y="383.5" ></text>
</g>
<g >
<title>sys_pread64 (889,747,881 samples, 0.11%)</title><rect x="21.2" y="517" width="1.2" height="15.0" fill="rgb(212,35,8)" rx="2" ry="2" />
<text x="24.15" y="527.5" ></text>
</g>
<g >
<title>GetPrivateRefCount (115,895,397 samples, 0.01%)</title><rect x="1139.5" y="501" width="0.2" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="1142.53" y="511.5" ></text>
</g>
<g >
<title>update_process_times (110,255,351 samples, 0.01%)</title><rect x="1086.9" y="357" width="0.2" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="1089.95" y="367.5" ></text>
</g>
<g >
<title>__find_get_page (720,144,232 samples, 0.09%)</title><rect x="256.0" y="277" width="1.0" height="15.0" fill="rgb(229,114,27)" rx="2" ry="2" />
<text x="258.95" y="287.5" ></text>
</g>
<g >
<title>HeapTupleSatisfiesVacuumHorizon (38,186,273,238 samples, 4.63%)</title><rect x="1071.3" y="501" width="54.6" height="15.0" fill="rgb(207,13,3)" rx="2" ry="2" />
<text x="1074.28" y="511.5" >HeapT..</text>
</g>
<g >
<title>get_hash_entry (92,960,302 samples, 0.01%)</title><rect x="73.8" y="405" width="0.1" height="15.0" fill="rgb(225,93,22)" rx="2" ry="2" />
<text x="76.80" y="415.5" ></text>
</g>
<g >
<title>lazy_scan_prune (1,746,448,023 samples, 0.21%)</title><rect x="52.1" y="421" width="2.5" height="15.0" fill="rgb(243,178,42)" rx="2" ry="2" />
<text x="55.10" y="431.5" ></text>
</g>
<g >
<title>PortalRunMulti (3,088,212,841 samples, 0.37%)</title><rect x="50.2" y="613" width="4.4" height="15.0" fill="rgb(245,184,44)" rx="2" ry="2" />
<text x="53.20" y="623.5" ></text>
</g>
<g >
<title>timerqueue_del (139,858,843 samples, 0.02%)</title><rect x="1185.7" y="629" width="0.2" height="15.0" fill="rgb(236,145,34)" rx="2" ry="2" />
<text x="1188.70" y="639.5" ></text>
</g>
<g >
<title>radix_tree_lookup_slot (761,273,605 samples, 0.09%)</title><rect x="245.3" y="245" width="1.1" height="15.0" fill="rgb(210,23,5)" rx="2" ry="2" />
<text x="248.32" y="255.5" ></text>
</g>
<g >
<title>do_parallel_lazy_scan_heap (3,081,731,940 samples, 0.37%)</title><rect x="50.2" y="453" width="4.4" height="15.0" fill="rgb(249,202,48)" rx="2" ry="2" />
<text x="53.20" y="463.5" ></text>
</g>
<g >
<title>tag_hash (178,509,464 samples, 0.02%)</title><rect x="56.1" y="405" width="0.2" height="15.0" fill="rgb(245,185,44)" rx="2" ry="2" />
<text x="59.09" y="415.5" ></text>
</g>
<g >
<title>ItemPointerSet (6,201,621,340 samples, 0.75%)</title><rect x="683.9" y="517" width="8.8" height="15.0" fill="rgb(237,147,35)" rx="2" ry="2" />
<text x="686.86" y="527.5" ></text>
</g>
<g >
<title>GetPrivateRefCount (82,655,314 samples, 0.01%)</title><rect x="604.9" y="517" width="0.1" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="607.93" y="527.5" ></text>
</g>
<g >
<title>sys_futex (105,899,380 samples, 0.01%)</title><rect x="320.8" y="389" width="0.2" height="15.0" fill="rgb(240,164,39)" rx="2" ry="2" />
<text x="323.82" y="399.5" ></text>
</g>
<g >
<title>memcg_check_events (289,441,041 samples, 0.04%)</title><rect x="451.2" y="133" width="0.4" height="15.0" fill="rgb(206,4,1)" rx="2" ry="2" />
<text x="454.17" y="143.5" ></text>
</g>
<g >
<title>__hrtimer_run_queues (74,985,907 samples, 0.01%)</title><rect x="621.3" y="405" width="0.1" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="624.34" y="415.5" ></text>
</g>
<g >
<title>pgstat_progress_update_param (117,893,398 samples, 0.01%)</title><rect x="1143.8" y="549" width="0.1" height="15.0" fill="rgb(227,103,24)" rx="2" ry="2" />
<text x="1146.77" y="559.5" ></text>
</g>
<g >
<title>BackendStartup (3,096,750,624 samples, 0.38%)</title><rect x="50.2" y="709" width="4.4" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="53.19" y="719.5" ></text>
</g>
<g >
<title>UnlockReleaseBuffer (241,450,219 samples, 0.03%)</title><rect x="132.9" y="533" width="0.3" height="15.0" fill="rgb(215,47,11)" rx="2" ry="2" />
<text x="135.88" y="543.5" ></text>
</g>
<g >
<title>pg_atomic_sub_fetch_u32 (233,788,298 samples, 0.03%)</title><rect x="321.1" y="437" width="0.4" height="15.0" fill="rgb(242,174,41)" rx="2" ry="2" />
<text x="324.14" y="447.5" ></text>
</g>
<g >
<title>fget_light (260,561,970 samples, 0.03%)</title><rect x="340.9" y="405" width="0.3" height="15.0" fill="rgb(211,28,6)" rx="2" ry="2" />
<text x="343.87" y="415.5" ></text>
</g>
<g >
<title>__find_get_page (1,332,434,262 samples, 0.16%)</title><rect x="93.5" y="149" width="1.9" height="15.0" fill="rgb(229,114,27)" rx="2" ry="2" />
<text x="96.54" y="159.5" ></text>
</g>
<g >
<title>__list_del_entry (105,743,768 samples, 0.01%)</title><rect x="572.5" y="117" width="0.1" height="15.0" fill="rgb(214,41,9)" rx="2" ry="2" />
<text x="575.48" y="127.5" ></text>
</g>
<g >
<title>register_dirty_segment (93,260,048 samples, 0.01%)</title><rect x="19.4" y="469" width="0.1" height="15.0" fill="rgb(253,223,53)" rx="2" ry="2" />
<text x="22.39" y="479.5" ></text>
</g>
<g >
<title>path_put (75,882,312 samples, 0.01%)</title><rect x="77.7" y="389" width="0.1" height="15.0" fill="rgb(249,206,49)" rx="2" ry="2" />
<text x="80.70" y="399.5" ></text>
</g>
<g >
<title>hrtimer_interrupt (105,887,461 samples, 0.01%)</title><rect x="203.0" y="405" width="0.2" height="15.0" fill="rgb(228,109,26)" rx="2" ry="2" />
<text x="206.01" y="415.5" ></text>
</g>
<g >
<title>clear_page_c_e (120,456,993 samples, 0.01%)</title><rect x="86.5" y="165" width="0.1" height="15.0" fill="rgb(209,22,5)" rx="2" ry="2" />
<text x="89.46" y="175.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32_impl (179,584,817 samples, 0.02%)</title><rect x="1120.0" y="437" width="0.2" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="1122.99" y="447.5" ></text>
</g>
<g >
<title>__pread_nocancel (185,374,778,929 samples, 22.49%)</title><rect x="334.6" y="453" width="265.4" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="337.57" y="463.5" >__pread_nocancel</text>
</g>
<g >
<title>__lru_cache_add (1,253,252,879 samples, 0.15%)</title><rect x="439.0" y="181" width="1.8" height="15.0" fill="rgb(220,70,16)" rx="2" ry="2" />
<text x="441.97" y="191.5" ></text>
</g>
<g >
<title>heap_page_prune_execute (10,657,505,639 samples, 1.29%)</title><rect x="166.1" y="501" width="15.2" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="169.05" y="511.5" ></text>
</g>
<g >
<title>StartReadBuffer (1,298,543,922 samples, 0.16%)</title><rect x="133.8" y="437" width="1.9" height="15.0" fill="rgb(222,78,18)" rx="2" ry="2" />
<text x="136.83" y="447.5" ></text>
</g>
<g >
<title>visibilitymap_pin (88,388,723 samples, 0.01%)</title><rect x="233.0" y="533" width="0.2" height="15.0" fill="rgb(253,221,53)" rx="2" ry="2" />
<text x="236.04" y="543.5" ></text>
</g>
<g >
<title>MarkBufferDirtyHint (12,348,481,497 samples, 1.50%)</title><rect x="1102.8" y="469" width="17.7" height="15.0" fill="rgb(234,136,32)" rx="2" ry="2" />
<text x="1105.78" y="479.5" ></text>
</g>
<g >
<title>rwsem_down_write_failed (77,779,055 samples, 0.01%)</title><rect x="586.5" y="213" width="0.1" height="15.0" fill="rgb(230,116,27)" rx="2" ry="2" />
<text x="589.51" y="223.5" ></text>
</g>
<g >
<title>BufTableDelete (214,760,085 samples, 0.03%)</title><rect x="19.5" y="517" width="0.3" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="22.53" y="527.5" ></text>
</g>
<g >
<title>BufferGetPage (161,013,276 samples, 0.02%)</title><rect x="1138.3" y="517" width="0.3" height="15.0" fill="rgb(253,220,52)" rx="2" ry="2" />
<text x="1141.35" y="527.5" ></text>
</g>
<g >
<title>retint_userspace_restore_args (224,533,485 samples, 0.03%)</title><rect x="257.1" y="421" width="0.3" height="15.0" fill="rgb(215,46,11)" rx="2" ry="2" />
<text x="260.13" y="431.5" ></text>
</g>
<g >
<title>call_rwsem_wake (177,494,685 samples, 0.02%)</title><rect x="596.8" y="309" width="0.3" height="15.0" fill="rgb(231,119,28)" rx="2" ry="2" />
<text x="599.80" y="319.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32_impl (136,246,566 samples, 0.02%)</title><rect x="1120.2" y="453" width="0.2" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="1123.25" y="463.5" ></text>
</g>
<g >
<title>handle_mm_fault (142,095,976,800 samples, 17.24%)</title><rect x="382.1" y="261" width="203.4" height="15.0" fill="rgb(234,135,32)" rx="2" ry="2" />
<text x="385.10" y="271.5" >handle_mm_fault</text>
</g>
<g >
<title>unmap_page_range (10,140,091,964 samples, 1.23%)</title><rect x="1146.3" y="645" width="14.6" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="1149.35" y="655.5" ></text>
</g>
<g >
<title>__hrtimer_run_queues (126,685,162 samples, 0.02%)</title><rect x="990.7" y="421" width="0.2" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="993.75" y="431.5" ></text>
</g>
<g >
<title>pg_rotate_left32 (89,420,642 samples, 0.01%)</title><rect x="238.4" y="389" width="0.1" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="241.37" y="399.5" ></text>
</g>
<g >
<title>page_remove_rmap (2,012,295,836 samples, 0.24%)</title><rect x="1154.6" y="629" width="2.8" height="15.0" fill="rgb(252,219,52)" rx="2" ry="2" />
<text x="1157.57" y="639.5" ></text>
</g>
<g >
<title>__find_get_page (89,010,712 samples, 0.01%)</title><rect x="13.9" y="245" width="0.1" height="15.0" fill="rgb(229,114,27)" rx="2" ry="2" />
<text x="16.87" y="255.5" ></text>
</g>
<g >
<title>LWLockAttemptLock (1,061,775,149 samples, 0.13%)</title><rect x="616.8" y="373" width="1.5" height="15.0" fill="rgb(235,138,33)" rx="2" ry="2" />
<text x="619.81" y="383.5" ></text>
</g>
<g >
<title>__find_lock_page (461,878,109 samples, 0.06%)</title><rect x="328.0" y="325" width="0.6" height="15.0" fill="rgb(251,214,51)" rx="2" ry="2" />
<text x="330.96" y="335.5" ></text>
</g>
<g >
<title>__do_fault.isra.61 (609,936,763 samples, 0.07%)</title><rect x="327.7" y="373" width="0.9" height="15.0" fill="rgb(227,102,24)" rx="2" ry="2" />
<text x="330.75" y="383.5" ></text>
</g>
<g >
<title>scheduler_tick (177,710,001 samples, 0.02%)</title><rect x="784.6" y="389" width="0.2" height="15.0" fill="rgb(246,190,45)" rx="2" ry="2" />
<text x="787.56" y="399.5" ></text>
</g>
<g >
<title>native_queued_spin_lock_slowpath (200,576,965 samples, 0.02%)</title><rect x="1159.0" y="517" width="0.3" height="15.0" fill="rgb(238,153,36)" rx="2" ry="2" />
<text x="1162.04" y="527.5" ></text>
</g>
<g >
<title>update_process_times (96,190,387 samples, 0.01%)</title><rect x="990.8" y="373" width="0.1" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="993.79" y="383.5" ></text>
</g>
<g >
<title>sys_pread64 (181,223,138,798 samples, 21.99%)</title><rect x="340.5" y="421" width="259.5" height="15.0" fill="rgb(212,35,8)" rx="2" ry="2" />
<text x="343.50" y="431.5" >sys_pread64</text>
</g>
<g >
<title>startup_hacks (128,045,675,684 samples, 15.54%)</title><rect x="50.2" y="757" width="183.3" height="15.0" fill="rgb(243,178,42)" rx="2" ry="2" />
<text x="53.19" y="767.5" >startup_hacks</text>
</g>
<g >
<title>pg_atomic_read_u32 (99,054,752 samples, 0.01%)</title><rect x="23.5" y="453" width="0.2" height="15.0" fill="rgb(248,202,48)" rx="2" ry="2" />
<text x="26.54" y="463.5" ></text>
</g>
<g >
<title>__do_page_fault (323,802,914 samples, 0.04%)</title><rect x="57.2" y="357" width="0.5" height="15.0" fill="rgb(239,158,37)" rx="2" ry="2" />
<text x="60.20" y="367.5" ></text>
</g>
<g >
<title>compactify_tuples (17,388,591,782 samples, 2.11%)</title><rect x="829.1" y="485" width="24.9" height="15.0" fill="rgb(209,21,5)" rx="2" ry="2" />
<text x="832.11" y="495.5" >c..</text>
</g>
<g >
<title>__libc_start_main (495,367,615 samples, 0.06%)</title><rect x="10.3" y="773" width="0.7" height="15.0" fill="rgb(236,142,34)" rx="2" ry="2" />
<text x="13.27" y="783.5" ></text>
</g>
<g >
<title>StartReadBuffersImpl (1,284,090,552 samples, 0.16%)</title><rect x="133.9" y="421" width="1.8" height="15.0" fill="rgb(232,125,30)" rx="2" ry="2" />
<text x="136.85" y="431.5" ></text>
</g>
<g >
<title>radix_tree_descend (166,031,505 samples, 0.02%)</title><rect x="270.9" y="229" width="0.3" height="15.0" fill="rgb(243,175,41)" rx="2" ry="2" />
<text x="273.93" y="239.5" ></text>
</g>
<g >
<title>page_fault (75,077,316 samples, 0.01%)</title><rect x="11.8" y="517" width="0.1" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="14.78" y="527.5" ></text>
</g>
<g >
<title>heap_prune_record_unchanged_lp_normal (565,069,961 samples, 0.07%)</title><rect x="53.5" y="373" width="0.8" height="15.0" fill="rgb(221,76,18)" rx="2" ry="2" />
<text x="56.46" y="383.5" ></text>
</g>
<g >
<title>page_fault (3,497,024,377 samples, 0.42%)</title><rect x="266.9" y="421" width="5.0" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="269.86" y="431.5" ></text>
</g>
<g >
<title>local_apic_timer_interrupt (151,872,239 samples, 0.02%)</title><rect x="393.6" y="165" width="0.2" height="15.0" fill="rgb(213,37,9)" rx="2" ry="2" />
<text x="396.57" y="175.5" ></text>
</g>
<g >
<title>sys_futex (109,531,290 samples, 0.01%)</title><rect x="73.9" y="373" width="0.2" height="15.0" fill="rgb(240,164,39)" rx="2" ry="2" />
<text x="76.93" y="383.5" ></text>
</g>
<g >
<title>queued_spin_lock_slowpath (72,546,724,437 samples, 8.80%)</title><rect x="465.7" y="149" width="103.9" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="468.73" y="159.5" >queued_spin_..</text>
</g>
<g >
<title>[perf] (424,046,642 samples, 0.05%)</title><rect x="10.3" y="709" width="0.6" height="15.0" fill="rgb(253,223,53)" rx="2" ry="2" />
<text x="13.27" y="719.5" ></text>
</g>
<g >
<title>PageIsNew (82,558,643 samples, 0.01%)</title><rect x="22.6" y="597" width="0.2" height="15.0" fill="rgb(212,35,8)" rx="2" ry="2" />
<text x="25.63" y="607.5" ></text>
</g>
<g >
<title>hash_search_with_hash_value (87,695,082 samples, 0.01%)</title><rect x="23.4" y="469" width="0.1" height="15.0" fill="rgb(249,205,49)" rx="2" ry="2" />
<text x="26.40" y="479.5" ></text>
</g>
<g >
<title>__block_write_begin_int (101,153,593 samples, 0.01%)</title><rect x="13.7" y="277" width="0.2" height="15.0" fill="rgb(253,222,53)" rx="2" ry="2" />
<text x="16.71" y="287.5" ></text>
</g>
<g >
<title>LWLockAcquire (113,831,556 samples, 0.01%)</title><rect x="23.5" y="485" width="0.2" height="15.0" fill="rgb(209,20,4)" rx="2" ry="2" />
<text x="26.53" y="495.5" ></text>
</g>
<g >
<title>radix_tree_descend (2,324,884,247 samples, 0.28%)</title><rect x="432.2" y="117" width="3.4" height="15.0" fill="rgb(243,175,41)" rx="2" ry="2" />
<text x="435.24" y="127.5" ></text>
</g>
<g >
<title>BufferIsPermanent (1,355,635,527 samples, 0.16%)</title><rect x="223.7" y="453" width="2.0" height="15.0" fill="rgb(250,210,50)" rx="2" ry="2" />
<text x="226.73" y="463.5" ></text>
</g>
<g >
<title>xfs_file_aio_write (301,734,021 samples, 0.04%)</title><rect x="10.4" y="581" width="0.5" height="15.0" fill="rgb(251,211,50)" rx="2" ry="2" />
<text x="13.43" y="591.5" ></text>
</g>
<g >
<title>clockevents_program_event (70,291,015 samples, 0.01%)</title><rect x="1167.0" y="613" width="0.1" height="15.0" fill="rgb(244,182,43)" rx="2" ry="2" />
<text x="1169.95" y="623.5" ></text>
</g>
<g >
<title>GetPrivateRefCountEntry (94,565,604 samples, 0.01%)</title><rect x="631.4" y="517" width="0.1" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="634.37" y="527.5" ></text>
</g>
<g >
<title>__radix_tree_insert (943,612,261 samples, 0.11%)</title><rect x="99.5" y="149" width="1.3" height="15.0" fill="rgb(235,140,33)" rx="2" ry="2" />
<text x="102.46" y="159.5" ></text>
</g>
<g >
<title>do_read_fault.isra.63 (304,123,019 samples, 0.04%)</title><rect x="63.9" y="341" width="0.5" height="15.0" fill="rgb(216,52,12)" rx="2" ry="2" />
<text x="66.92" y="351.5" ></text>
</g>
<g >
<title>start_secondary (18,725,326,849 samples, 2.27%)</title><rect x="1162.6" y="757" width="26.9" height="15.0" fill="rgb(242,170,40)" rx="2" ry="2" />
<text x="1165.65" y="767.5" >s..</text>
</g>
<g >
<title>HeapTupleSatisfiesVacuum (291,239,335 samples, 0.04%)</title><rect x="25.3" y="613" width="0.4" height="15.0" fill="rgb(220,71,17)" rx="2" ry="2" />
<text x="28.31" y="623.5" ></text>
</g>
<g >
<title>wake_up_process (744,716,072 samples, 0.09%)</title><rect x="1165.6" y="597" width="1.1" height="15.0" fill="rgb(213,37,8)" rx="2" ry="2" />
<text x="1168.59" y="607.5" ></text>
</g>
<g >
<title>system_call_after_swapgs (89,443,719 samples, 0.01%)</title><rect x="43.4" y="677" width="0.1" height="15.0" fill="rgb(243,179,42)" rx="2" ry="2" />
<text x="46.36" y="687.5" ></text>
</g>
<g >
<title>GetBufferDescriptor (149,671,047 samples, 0.02%)</title><rect x="227.5" y="437" width="0.2" height="15.0" fill="rgb(249,202,48)" rx="2" ry="2" />
<text x="230.47" y="447.5" ></text>
</g>
<g >
<title>ResourceOwnerForget (307,334,323 samples, 0.04%)</title><rect x="330.1" y="469" width="0.4" height="15.0" fill="rgb(235,142,33)" rx="2" ry="2" />
<text x="333.11" y="479.5" ></text>
</g>
<g >
<title>page_fault (98,350,942 samples, 0.01%)</title><rect x="310.0" y="373" width="0.2" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="313.04" y="383.5" ></text>
</g>
<g >
<title>BufferGetPage (174,057,608 samples, 0.02%)</title><rect x="233.7" y="565" width="0.2" height="15.0" fill="rgb(253,220,52)" rx="2" ry="2" />
<text x="236.67" y="575.5" ></text>
</g>
<g >
<title>smgrreadv (38,868,154,468 samples, 4.72%)</title><rect x="76.7" y="485" width="55.6" height="15.0" fill="rgb(240,165,39)" rx="2" ry="2" />
<text x="79.69" y="495.5" >smgrr..</text>
</g>
<g >
<title>__find_lock_page (1,338,191,759 samples, 0.16%)</title><rect x="93.5" y="165" width="1.9" height="15.0" fill="rgb(251,214,51)" rx="2" ry="2" />
<text x="96.53" y="175.5" ></text>
</g>
<g >
<title>tick_sched_handle (96,945,934 samples, 0.01%)</title><rect x="990.8" y="389" width="0.1" height="15.0" fill="rgb(219,68,16)" rx="2" ry="2" />
<text x="993.79" y="399.5" ></text>
</g>
<g >
<title>TransactionIdGetCommitLSN (261,442,926 samples, 0.03%)</title><rect x="1120.5" y="469" width="0.3" height="15.0" fill="rgb(238,152,36)" rx="2" ry="2" />
<text x="1123.45" y="479.5" ></text>
</g>
<g >
<title>native_queued_spin_lock_slowpath (376,838,370 samples, 0.05%)</title><rect x="1164.8" y="453" width="0.5" height="15.0" fill="rgb(238,153,36)" rx="2" ry="2" />
<text x="1167.76" y="463.5" ></text>
</g>
<g >
<title>ServerLoop (637,279,068,599 samples, 77.32%)</title><rect x="233.6" y="741" width="912.4" height="15.0" fill="rgb(238,155,37)" rx="2" ry="2" />
<text x="236.61" y="751.5" >ServerLoop</text>
</g>
<g >
<title>xfs_iunlock (1,452,303,115 samples, 0.18%)</title><rect x="17.2" y="341" width="2.1" height="15.0" fill="rgb(232,127,30)" rx="2" ry="2" />
<text x="20.20" y="351.5" ></text>
</g>
<g >
<title>update_process_times (75,893,920 samples, 0.01%)</title><rect x="173.3" y="341" width="0.2" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="176.34" y="351.5" ></text>
</g>
<g >
<title>parallel_vacuum_main (124,426,515,195 samples, 15.10%)</title><rect x="55.2" y="613" width="178.1" height="15.0" fill="rgb(213,40,9)" rx="2" ry="2" />
<text x="58.19" y="623.5" >parallel_vacuum_main</text>
</g>
<g >
<title>MarkBufferDirty (184,363,082 samples, 0.02%)</title><rect x="163.1" y="501" width="0.3" height="15.0" fill="rgb(238,152,36)" rx="2" ry="2" />
<text x="166.09" y="511.5" ></text>
</g>
<g >
<title>BufferIsValid (148,150,867 samples, 0.02%)</title><rect x="228.3" y="405" width="0.2" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="231.29" y="415.5" ></text>
</g>
<g >
<title>shmem_getpage_gfp (25,879,854,090 samples, 3.14%)</title><rect x="90.6" y="181" width="37.1" height="15.0" fill="rgb(227,105,25)" rx="2" ry="2" />
<text x="93.61" y="191.5" >shm..</text>
</g>
<g >
<title>pg_rotate_left32 (207,721,547 samples, 0.03%)</title><rect x="615.0" y="341" width="0.2" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="617.95" y="351.5" ></text>
</g>
<g >
<title>try_to_wake_up (987,409,050 samples, 0.12%)</title><rect x="17.9" y="261" width="1.4" height="15.0" fill="rgb(220,70,16)" rx="2" ry="2" />
<text x="20.86" y="271.5" ></text>
</g>
<g >
<title>ReadBuffer_common (1,309,935,066 samples, 0.16%)</title><rect x="133.8" y="453" width="1.9" height="15.0" fill="rgb(213,40,9)" rx="2" ry="2" />
<text x="136.82" y="463.5" ></text>
</g>
<g >
<title>deactivate_task (196,962,692 samples, 0.02%)</title><rect x="130.7" y="229" width="0.3" height="15.0" fill="rgb(206,8,2)" rx="2" ry="2" />
<text x="133.73" y="239.5" ></text>
</g>
<g >
<title>iomap_file_buffered_write (271,395,752 samples, 0.03%)</title><rect x="10.4" y="549" width="0.4" height="15.0" fill="rgb(206,6,1)" rx="2" ry="2" />
<text x="13.44" y="559.5" ></text>
</g>
<g >
<title>BufferIsValid (86,736,907 samples, 0.01%)</title><rect x="606.7" y="517" width="0.1" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="609.72" y="527.5" ></text>
</g>
<g >
<title>clockevents_program_event (287,736,488 samples, 0.03%)</title><rect x="1185.3" y="613" width="0.4" height="15.0" fill="rgb(244,182,43)" rx="2" ry="2" />
<text x="1188.26" y="623.5" ></text>
</g>
<g >
<title>update_time (618,631,583 samples, 0.08%)</title><rect x="14.9" y="309" width="0.9" height="15.0" fill="rgb(211,31,7)" rx="2" ry="2" />
<text x="17.88" y="319.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32_impl (399,479,590 samples, 0.05%)</title><rect x="1098.9" y="437" width="0.5" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="1101.86" y="447.5" ></text>
</g>
<g >
<title>pg_atomic_compare_exchange_u32_impl (78,077,511 samples, 0.01%)</title><rect x="235.5" y="469" width="0.1" height="15.0" fill="rgb(235,141,33)" rx="2" ry="2" />
<text x="238.51" y="479.5" ></text>
</g>
<g >
<title>tick_nohz_idle_enter (2,419,701,390 samples, 0.29%)</title><rect x="1183.3" y="725" width="3.5" height="15.0" fill="rgb(250,211,50)" rx="2" ry="2" />
<text x="1186.31" y="735.5" ></text>
</g>
<g >
<title>BufferIsValid (358,459,423 samples, 0.04%)</title><rect x="1096.9" y="421" width="0.5" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="1099.85" y="431.5" ></text>
</g>
<g >
<title>shmem_fault (78,256,442 samples, 0.01%)</title><rect x="75.5" y="341" width="0.1" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="78.50" y="351.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32 (78,424,895 samples, 0.01%)</title><rect x="235.7" y="485" width="0.1" height="15.0" fill="rgb(248,202,48)" rx="2" ry="2" />
<text x="238.65" y="495.5" ></text>
</g>
<g >
<title>do_lazy_scan_heap (3,079,689,901 samples, 0.37%)</title><rect x="50.2" y="437" width="4.4" height="15.0" fill="rgb(221,75,18)" rx="2" ry="2" />
<text x="53.21" y="447.5" ></text>
</g>
<g >
<title>__hrtimer_run_queues (74,441,753 samples, 0.01%)</title><rect x="87.4" y="117" width="0.1" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="90.38" y="127.5" ></text>
</g>
<g >
<title>MarkBufferDirty (1,113,157,223 samples, 0.14%)</title><rect x="630.2" y="533" width="1.6" height="15.0" fill="rgb(238,152,36)" rx="2" ry="2" />
<text x="633.20" y="543.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32_impl (916,327,583 samples, 0.11%)</title><rect x="602.5" y="453" width="1.4" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="605.55" y="463.5" ></text>
</g>
<g >
<title>pg_atomic_fetch_or_u32_impl (194,896,887 samples, 0.02%)</title><rect x="329.8" y="453" width="0.2" height="15.0" fill="rgb(253,224,53)" rx="2" ry="2" />
<text x="332.76" y="463.5" ></text>
</g>
<g >
<title>HeapTupleHeaderAdvanceConflictHorizon (1,221,295,602 samples, 0.15%)</title><rect x="189.5" y="485" width="1.7" height="15.0" fill="rgb(240,164,39)" rx="2" ry="2" />
<text x="192.49" y="495.5" ></text>
</g>
<g >
<title>get_hash_entry (5,600,904,123 samples, 0.68%)</title><rect x="240.7" y="421" width="8.0" height="15.0" fill="rgb(225,93,22)" rx="2" ry="2" />
<text x="243.67" y="431.5" ></text>
</g>
<g >
<title>TransactionIdIsInProgress (201,566,214 samples, 0.02%)</title><rect x="229.7" y="469" width="0.3" height="15.0" fill="rgb(208,16,3)" rx="2" ry="2" />
<text x="232.74" y="479.5" ></text>
</g>
<g >
<title>ResourceOwnerForget (102,780,829 samples, 0.01%)</title><rect x="607.1" y="469" width="0.2" height="15.0" fill="rgb(235,142,33)" rx="2" ry="2" />
<text x="610.12" y="479.5" ></text>
</g>
<g >
<title>__list_add (74,140,569 samples, 0.01%)</title><rect x="570.6" y="133" width="0.1" height="15.0" fill="rgb(235,141,33)" rx="2" ry="2" />
<text x="573.60" y="143.5" ></text>
</g>
<g >
<title>TransactionIdDidCommit (686,273,186 samples, 0.08%)</title><rect x="1122.1" y="485" width="1.0" height="15.0" fill="rgb(216,51,12)" rx="2" ry="2" />
<text x="1125.10" y="495.5" ></text>
</g>
<g >
<title>native_queued_spin_lock_slowpath (107,024,501 samples, 0.01%)</title><rect x="130.2" y="229" width="0.2" height="15.0" fill="rgb(238,153,36)" rx="2" ry="2" />
<text x="133.20" y="239.5" ></text>
</g>
<g >
<title>BufTableLookup (89,512,577 samples, 0.01%)</title><rect x="23.4" y="485" width="0.1" height="15.0" fill="rgb(224,89,21)" rx="2" ry="2" />
<text x="26.40" y="495.5" ></text>
</g>
<g >
<title>TransactionIdPrecedes (82,446,602 samples, 0.01%)</title><rect x="229.9" y="453" width="0.1" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="232.91" y="463.5" ></text>
</g>
<g >
<title>TransactionIdFollows (96,453,919 samples, 0.01%)</title><rect x="26.5" y="613" width="0.1" height="15.0" fill="rgb(222,79,18)" rx="2" ry="2" />
<text x="29.47" y="623.5" ></text>
</g>
<g >
<title>retint_swapgs (87,849,845 samples, 0.01%)</title><rect x="246.7" y="405" width="0.1" height="15.0" fill="rgb(253,222,53)" rx="2" ry="2" />
<text x="249.70" y="415.5" ></text>
</g>
<g >
<title>__radix_tree_create (2,480,859,096 samples, 0.30%)</title><rect x="456.9" y="149" width="3.6" height="15.0" fill="rgb(248,201,48)" rx="2" ry="2" />
<text x="459.90" y="159.5" ></text>
</g>
<g >
<title>xfs_trans_ijoin (84,314,857 samples, 0.01%)</title><rect x="15.6" y="277" width="0.2" height="15.0" fill="rgb(224,89,21)" rx="2" ry="2" />
<text x="18.64" y="287.5" ></text>
</g>
<g >
<title>GetVictimBuffer (5,804,088,260 samples, 0.70%)</title><rect x="11.9" y="549" width="8.3" height="15.0" fill="rgb(209,18,4)" rx="2" ry="2" />
<text x="14.92" y="559.5" ></text>
</g>
<g >
<title>lru_add_drain_cpu (231,948,564 samples, 0.03%)</title><rect x="1159.0" y="581" width="0.3" height="15.0" fill="rgb(247,194,46)" rx="2" ry="2" />
<text x="1162.01" y="591.5" ></text>
</g>
<g >
<title>_raw_spin_lock_irqsave (200,576,965 samples, 0.02%)</title><rect x="1159.0" y="549" width="0.3" height="15.0" fill="rgb(247,195,46)" rx="2" ry="2" />
<text x="1162.04" y="559.5" ></text>
</g>
<g >
<title>local_apic_timer_interrupt (134,744,497 samples, 0.02%)</title><rect x="853.8" y="437" width="0.2" height="15.0" fill="rgb(213,37,9)" rx="2" ry="2" />
<text x="856.81" y="447.5" ></text>
</g>
<g >
<title>__writeback_inodes_wb (115,471,936 samples, 0.01%)</title><rect x="10.1" y="677" width="0.2" height="15.0" fill="rgb(234,133,32)" rx="2" ry="2" />
<text x="13.10" y="687.5" ></text>
</g>
<g >
<title>GetPrivateRefCount (585,022,069 samples, 0.07%)</title><rect x="224.4" y="437" width="0.9" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="227.43" y="447.5" ></text>
</g>
<g >
<title>__hrtimer_run_queues (202,480,015 samples, 0.02%)</title><rect x="933.5" y="437" width="0.3" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="936.47" y="447.5" ></text>
</g>
<g >
<title>pg_atomic_fetch_add_u32 (179,345,556 samples, 0.02%)</title><rect x="65.1" y="405" width="0.3" height="15.0" fill="rgb(206,4,1)" rx="2" ry="2" />
<text x="68.14" y="415.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32 (927,468,592 samples, 0.11%)</title><rect x="602.5" y="469" width="1.4" height="15.0" fill="rgb(248,202,48)" rx="2" ry="2" />
<text x="605.53" y="479.5" ></text>
</g>
<g >
<title>BufTableInsert (7,254,884,841 samples, 0.88%)</title><rect x="238.6" y="453" width="10.4" height="15.0" fill="rgb(206,8,1)" rx="2" ry="2" />
<text x="241.58" y="463.5" ></text>
</g>
<g >
<title>__find_get_page (414,148,384 samples, 0.05%)</title><rect x="328.0" y="309" width="0.6" height="15.0" fill="rgb(229,114,27)" rx="2" ry="2" />
<text x="330.97" y="319.5" ></text>
</g>
<g >
<title>_raw_qspin_lock_irq (74,745,830,129 samples, 9.07%)</title><rect x="462.6" y="165" width="107.0" height="15.0" fill="rgb(251,214,51)" rx="2" ry="2" />
<text x="465.58" y="175.5" >_raw_qspin_lo..</text>
</g>
<g >
<title>smp_apic_timer_interrupt (89,760,923 samples, 0.01%)</title><rect x="301.0" y="389" width="0.1" height="15.0" fill="rgb(221,74,17)" rx="2" ry="2" />
<text x="304.00" y="399.5" ></text>
</g>
<g >
<title>iput (96,801,624 samples, 0.01%)</title><rect x="48.4" y="661" width="0.2" height="15.0" fill="rgb(206,7,1)" rx="2" ry="2" />
<text x="51.43" y="671.5" ></text>
</g>
<g >
<title>__hrtimer_run_queues (111,981,977 samples, 0.01%)</title><rect x="1086.9" y="405" width="0.2" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="1089.95" y="415.5" ></text>
</g>
<g >
<title>PageRepairFragmentation (8,308,151,997 samples, 1.01%)</title><rect x="167.3" y="485" width="11.9" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="170.26" y="495.5" ></text>
</g>
<g >
<title>vfs_read (861,773,929 samples, 0.10%)</title><rect x="21.2" y="501" width="1.2" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="24.19" y="511.5" ></text>
</g>
<g >
<title>LWLockWaitListLock (133,068,234 samples, 0.02%)</title><rect x="604.1" y="469" width="0.2" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="607.14" y="479.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32 (74,369,605 samples, 0.01%)</title><rect x="20.2" y="517" width="0.2" height="15.0" fill="rgb(248,202,48)" rx="2" ry="2" />
<text x="23.25" y="527.5" ></text>
</g>
<g >
<title>rwsem_down_write_failed (832,917,593 samples, 0.10%)</title><rect x="16.0" y="293" width="1.2" height="15.0" fill="rgb(230,116,27)" rx="2" ry="2" />
<text x="19.01" y="303.5" ></text>
</g>
<g >
<title>__hrtimer_run_queues (137,640,001 samples, 0.02%)</title><rect x="393.6" y="133" width="0.2" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="396.58" y="143.5" ></text>
</g>
<g >
<title>pgstat_progress_update_param (159,560,102 samples, 0.02%)</title><rect x="1145.5" y="565" width="0.2" height="15.0" fill="rgb(227,103,24)" rx="2" ry="2" />
<text x="1148.48" y="575.5" ></text>
</g>
<g >
<title>hash_bytes (144,268,336 samples, 0.02%)</title><rect x="613.6" y="357" width="0.2" height="15.0" fill="rgb(227,102,24)" rx="2" ry="2" />
<text x="616.56" y="367.5" ></text>
</g>
<g >
<title>__list_add (78,715,736 samples, 0.01%)</title><rect x="591.5" y="277" width="0.1" height="15.0" fill="rgb(235,141,33)" rx="2" ry="2" />
<text x="594.48" y="287.5" ></text>
</g>
<g >
<title>PinBufferForBlock (59,826,216,177 samples, 7.26%)</title><rect x="237.5" y="485" width="85.6" height="15.0" fill="rgb(241,168,40)" rx="2" ry="2" />
<text x="240.45" y="495.5" >PinBufferF..</text>
</g>
<g >
<title>radix_tree_maybe_preload (118,390,622 samples, 0.01%)</title><rect x="452.3" y="181" width="0.2" height="15.0" fill="rgb(221,76,18)" rx="2" ry="2" />
<text x="455.35" y="191.5" ></text>
</g>
<g >
<title>update_process_times (76,404,427 samples, 0.01%)</title><rect x="933.7" y="389" width="0.1" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="936.65" y="399.5" ></text>
</g>
<g >
<title>GetPrivateRefCount (89,160,177 samples, 0.01%)</title><rect x="1143.0" y="501" width="0.1" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="1146.01" y="511.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32 (77,677,794 samples, 0.01%)</title><rect x="73.6" y="405" width="0.1" height="15.0" fill="rgb(248,202,48)" rx="2" ry="2" />
<text x="76.60" y="415.5" ></text>
</g>
<g >
<title>ItemPointerSet (5,252,318,095 samples, 0.64%)</title><rect x="761.9" y="517" width="7.5" height="15.0" fill="rgb(237,147,35)" rx="2" ry="2" />
<text x="764.92" y="527.5" ></text>
</g>
<g >
<title>page_fault (1,186,716,279 samples, 0.14%)</title><rect x="45.1" y="725" width="1.7" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="48.10" y="735.5" ></text>
</g>
<g >
<title>heap_prune_record_unchanged_lp_normal (896,865,081 samples, 0.11%)</title><rect x="1063.0" y="517" width="1.3" height="15.0" fill="rgb(221,76,18)" rx="2" ry="2" />
<text x="1065.98" y="527.5" ></text>
</g>
<g >
<title>page_add_file_rmap (289,036,469 samples, 0.04%)</title><rect x="578.9" y="213" width="0.4" height="15.0" fill="rgb(207,13,3)" rx="2" ry="2" />
<text x="581.93" y="223.5" ></text>
</g>
<g >
<title>rebalance_domains (140,183,602 samples, 0.02%)</title><rect x="1189.7" y="517" width="0.2" height="15.0" fill="rgb(248,202,48)" rx="2" ry="2" />
<text x="1192.70" y="527.5" ></text>
</g>
<g >
<title>system_call_fastpath (129,399,723 samples, 0.02%)</title><rect x="51.7" y="309" width="0.2" height="15.0" fill="rgb(252,217,52)" rx="2" ry="2" />
<text x="54.71" y="319.5" ></text>
</g>
<g >
<title>shmem_fault (760,194,814 samples, 0.09%)</title><rect x="59.6" y="309" width="1.1" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="62.64" y="319.5" ></text>
</g>
<g >
<title>BufferIsValid (1,258,646,007 samples, 0.15%)</title><rect x="1091.4" y="453" width="1.8" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="1094.43" y="463.5" ></text>
</g>
<g >
<title>PageGetItemId (359,397,888 samples, 0.04%)</title><rect x="166.7" y="485" width="0.5" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="169.73" y="495.5" ></text>
</g>
<g >
<title>pg_atomic_compare_exchange_u32 (545,736,451 samples, 0.07%)</title><rect x="1140.0" y="469" width="0.8" height="15.0" fill="rgb(253,220,52)" rx="2" ry="2" />
<text x="1143.05" y="479.5" ></text>
</g>
<g >
<title>pg_atomic_compare_exchange_u32 (190,081,086 samples, 0.02%)</title><rect x="770.3" y="501" width="0.3" height="15.0" fill="rgb(253,220,52)" rx="2" ry="2" />
<text x="773.31" y="511.5" ></text>
</g>
<g >
<title>shmem_getpage_gfp (72,870,510 samples, 0.01%)</title><rect x="75.5" y="325" width="0.1" height="15.0" fill="rgb(227,105,25)" rx="2" ry="2" />
<text x="78.51" y="335.5" ></text>
</g>
<g >
<title>do_sync_write (302,519,142 samples, 0.04%)</title><rect x="10.4" y="597" width="0.5" height="15.0" fill="rgb(213,37,9)" rx="2" ry="2" />
<text x="13.43" y="607.5" ></text>
</g>
<g >
<title>GetVictimBuffer (8,376,972,519 samples, 1.02%)</title><rect x="61.5" y="437" width="12.0" height="15.0" fill="rgb(209,18,4)" rx="2" ry="2" />
<text x="64.47" y="447.5" ></text>
</g>
<g >
<title>do_parallel_lazy_scan_heap (22,573,331,038 samples, 2.74%)</title><rect x="11.0" y="677" width="32.3" height="15.0" fill="rgb(249,202,48)" rx="2" ry="2" />
<text x="14.02" y="687.5" >do..</text>
</g>
<g >
<title>tick_do_update_jiffies64 (94,199,503 samples, 0.01%)</title><rect x="1183.2" y="725" width="0.1" height="15.0" fill="rgb(208,14,3)" rx="2" ry="2" />
<text x="1186.17" y="735.5" ></text>
</g>
<g >
<title>main (495,367,615 samples, 0.06%)</title><rect x="10.3" y="757" width="0.7" height="15.0" fill="rgb(243,179,42)" rx="2" ry="2" />
<text x="13.27" y="767.5" ></text>
</g>
<g >
<title>GetPrivateRefCountEntry (271,714,266 samples, 0.03%)</title><rect x="236.0" y="517" width="0.3" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="238.95" y="527.5" ></text>
</g>
<g >
<title>LWLockAcquire (1,441,347,497 samples, 0.17%)</title><rect x="318.5" y="453" width="2.0" height="15.0" fill="rgb(209,20,4)" rx="2" ry="2" />
<text x="321.46" y="463.5" ></text>
</g>
<g >
<title>pick_next_task_fair (116,656,498 samples, 0.01%)</title><rect x="311.2" y="277" width="0.1" height="15.0" fill="rgb(242,170,40)" rx="2" ry="2" />
<text x="314.17" y="287.5" ></text>
</g>
<g >
<title>dsa_get_total_size (118,829,265 samples, 0.01%)</title><rect x="601.5" y="533" width="0.1" height="15.0" fill="rgb(238,152,36)" rx="2" ry="2" />
<text x="604.46" y="543.5" ></text>
</g>
<g >
<title>heap_tuple_should_freeze (17,407,587,303 samples, 2.11%)</title><rect x="1034.1" y="469" width="24.9" height="15.0" fill="rgb(247,194,46)" rx="2" ry="2" />
<text x="1037.12" y="479.5" >h..</text>
</g>
<g >
<title>pg_atomic_read_u32_impl (80,991,435 samples, 0.01%)</title><rect x="225.5" y="421" width="0.1" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="228.47" y="431.5" ></text>
</g>
<g >
<title>mem_cgroup_page_lruvec (162,476,310 samples, 0.02%)</title><rect x="440.2" y="149" width="0.2" height="15.0" fill="rgb(212,32,7)" rx="2" ry="2" />
<text x="443.20" y="159.5" ></text>
</g>
<g >
<title>hash_search (84,772,866 samples, 0.01%)</title><rect x="12.1" y="501" width="0.2" height="15.0" fill="rgb(216,55,13)" rx="2" ry="2" />
<text x="15.13" y="511.5" ></text>
</g>
<g >
<title>heap_prune_chain (5,056,190,885 samples, 0.61%)</title><rect x="32.8" y="613" width="7.2" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="35.78" y="623.5" ></text>
</g>
<g >
<title>tick_program_event (220,321,226 samples, 0.03%)</title><rect x="1187.8" y="677" width="0.3" height="15.0" fill="rgb(241,166,39)" rx="2" ry="2" />
<text x="1190.77" y="687.5" ></text>
</g>
<g >
<title>ParallelWorkerMain (124,784,586,574 samples, 15.14%)</title><rect x="54.7" y="629" width="178.6" height="15.0" fill="rgb(253,221,53)" rx="2" ry="2" />
<text x="57.67" y="639.5" >ParallelWorkerMain</text>
</g>
<g >
<title>get_futex_key (435,364,263 samples, 0.05%)</title><rect x="48.8" y="661" width="0.6" height="15.0" fill="rgb(252,216,51)" rx="2" ry="2" />
<text x="51.82" y="671.5" ></text>
</g>
<g >
<title>pg_atomic_sub_fetch_u32_impl (182,633,630 samples, 0.02%)</title><rect x="321.2" y="421" width="0.3" height="15.0" fill="rgb(225,94,22)" rx="2" ry="2" />
<text x="324.22" y="431.5" ></text>
</g>
<g >
<title>tick_sched_handle (120,082,497 samples, 0.01%)</title><rect x="933.6" y="405" width="0.2" height="15.0" fill="rgb(219,68,16)" rx="2" ry="2" />
<text x="936.59" y="415.5" ></text>
</g>
<g >
<title>__mem_cgroup_commit_charge (817,128,077 samples, 0.10%)</title><rect x="97.2" y="133" width="1.1" height="15.0" fill="rgb(212,32,7)" rx="2" ry="2" />
<text x="100.16" y="143.5" ></text>
</g>
<g >
<title>hrtimer_interrupt (128,189,296 samples, 0.02%)</title><rect x="990.7" y="437" width="0.2" height="15.0" fill="rgb(228,109,26)" rx="2" ry="2" />
<text x="993.75" y="447.5" ></text>
</g>
<g >
<title>pg_atomic_compare_exchange_u32_impl (166,198,298 samples, 0.02%)</title><rect x="1143.3" y="485" width="0.2" height="15.0" fill="rgb(235,141,33)" rx="2" ry="2" />
<text x="1146.30" y="495.5" ></text>
</g>
<g >
<title>retint_userspace_restore_args (155,169,508 samples, 0.02%)</title><rect x="57.7" y="389" width="0.2" height="15.0" fill="rgb(215,46,11)" rx="2" ry="2" />
<text x="60.66" y="399.5" ></text>
</g>
<g >
<title>radix_tree_descend (190,671,547 samples, 0.02%)</title><rect x="349.4" y="277" width="0.2" height="15.0" fill="rgb(243,175,41)" rx="2" ry="2" />
<text x="352.37" y="287.5" ></text>
</g>
<g >
<title>spin_delay (1,619,860,846 samples, 0.20%)</title><rect x="313.3" y="405" width="2.3" height="15.0" fill="rgb(240,162,38)" rx="2" ry="2" />
<text x="316.32" y="415.5" ></text>
</g>
<g >
<title>page_verify_redirects (7,203,449,051 samples, 0.87%)</title><rect x="854.5" y="501" width="10.3" height="15.0" fill="rgb(214,43,10)" rx="2" ry="2" />
<text x="857.49" y="511.5" ></text>
</g>
<g >
<title>mdreadv (319,415,208 samples, 0.04%)</title><rect x="49.5" y="757" width="0.5" height="15.0" fill="rgb(239,159,38)" rx="2" ry="2" />
<text x="52.51" y="767.5" ></text>
</g>
<g >
<title>lazy_scan_new_or_empty (140,622,587 samples, 0.02%)</title><rect x="621.8" y="549" width="0.2" height="15.0" fill="rgb(248,201,48)" rx="2" ry="2" />
<text x="624.77" y="559.5" ></text>
</g>
<g >
<title>handle_mm_fault (30,313,051,968 samples, 3.68%)</title><rect x="85.7" y="245" width="43.4" height="15.0" fill="rgb(234,135,32)" rx="2" ry="2" />
<text x="88.67" y="255.5" >hand..</text>
</g>
<g >
<title>__hrtimer_run_queues (73,285,622 samples, 0.01%)</title><rect x="96.9" y="101" width="0.1" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="99.88" y="111.5" ></text>
</g>
<g >
<title>MarkBufferDirtyHint (2,113,199,118 samples, 0.26%)</title><rect x="226.2" y="453" width="3.0" height="15.0" fill="rgb(234,136,32)" rx="2" ry="2" />
<text x="229.16" y="463.5" ></text>
</g>
<g >
<title>wake_up_q (155,771,975 samples, 0.02%)</title><rect x="596.8" y="277" width="0.3" height="15.0" fill="rgb(237,151,36)" rx="2" ry="2" />
<text x="599.83" y="287.5" ></text>
</g>
<g >
<title>ReadBuffer_common (7,888,684,788 samples, 0.96%)</title><rect x="11.1" y="629" width="11.3" height="15.0" fill="rgb(213,40,9)" rx="2" ry="2" />
<text x="14.14" y="639.5" ></text>
</g>
<g >
<title>xfs_file_aio_read (797,197,097 samples, 0.10%)</title><rect x="21.2" y="469" width="1.2" height="15.0" fill="rgb(224,90,21)" rx="2" ry="2" />
<text x="24.22" y="479.5" ></text>
</g>
<g >
<title>s_lock (28,144,310,941 samples, 3.41%)</title><rect x="277.5" y="421" width="40.3" height="15.0" fill="rgb(211,29,7)" rx="2" ry="2" />
<text x="280.52" y="431.5" >s_l..</text>
</g>
<g >
<title>vm_extend (93,639,631 samples, 0.01%)</title><rect x="1145.3" y="517" width="0.1" height="15.0" fill="rgb(247,195,46)" rx="2" ry="2" />
<text x="1148.27" y="527.5" ></text>
</g>
<g >
<title>__hrtimer_run_queues (80,712,318 samples, 0.01%)</title><rect x="301.0" y="341" width="0.1" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="304.00" y="351.5" ></text>
</g>
<g >
<title>visibilitymap_set (5,583,768,820 samples, 0.68%)</title><rect x="1135.8" y="533" width="8.0" height="15.0" fill="rgb(220,73,17)" rx="2" ry="2" />
<text x="1138.78" y="543.5" ></text>
</g>
<g >
<title>FileReadV (142,578,174 samples, 0.02%)</title><rect x="51.7" y="341" width="0.2" height="15.0" fill="rgb(222,81,19)" rx="2" ry="2" />
<text x="54.69" y="351.5" ></text>
</g>
<g >
<title>do_read_fault.isra.63 (248,307,353 samples, 0.03%)</title><rect x="57.3" y="325" width="0.4" height="15.0" fill="rgb(216,52,12)" rx="2" ry="2" />
<text x="60.31" y="335.5" ></text>
</g>
<g >
<title>__libc_pread64 (104,848,999 samples, 0.01%)</title><rect x="334.4" y="453" width="0.2" height="15.0" fill="rgb(254,226,54)" rx="2" ry="2" />
<text x="337.42" y="463.5" ></text>
</g>
<g >
<title>pg_atomic_sub_fetch_u32_impl (131,575,020 samples, 0.02%)</title><rect x="1142.6" y="469" width="0.2" height="15.0" fill="rgb(225,94,22)" rx="2" ry="2" />
<text x="1145.60" y="479.5" ></text>
</g>
<g >
<title>do_shared_fault.isra.64 (136,590,186,189 samples, 16.57%)</title><rect x="389.8" y="245" width="195.6" height="15.0" fill="rgb(245,185,44)" rx="2" ry="2" />
<text x="392.82" y="255.5" >do_shared_fault.isra.64</text>
</g>
<g >
<title>heap_prune_record_unused (136,873,054 samples, 0.02%)</title><rect x="1060.8" y="501" width="0.2" height="15.0" fill="rgb(227,105,25)" rx="2" ry="2" />
<text x="1063.84" y="511.5" ></text>
</g>
<g >
<title>free_page_and_swap_cache (517,871,509 samples, 0.06%)</title><rect x="1164.6" y="565" width="0.7" height="15.0" fill="rgb(215,49,11)" rx="2" ry="2" />
<text x="1167.58" y="575.5" ></text>
</g>
<g >
<title>call_string_check_hook (71,216,627 samples, 0.01%)</title><rect x="55.0" y="581" width="0.1" height="15.0" fill="rgb(236,144,34)" rx="2" ry="2" />
<text x="58.02" y="591.5" ></text>
</g>
<g >
<title>__do_fault.isra.61 (130,815,869,069 samples, 15.87%)</title><rect x="390.6" y="229" width="187.3" height="15.0" fill="rgb(227,102,24)" rx="2" ry="2" />
<text x="393.65" y="239.5" >__do_fault.isra.61</text>
</g>
<g >
<title>DataChecksumsEnabled (70,930,903 samples, 0.01%)</title><rect x="50.3" y="277" width="0.1" height="15.0" fill="rgb(226,96,23)" rx="2" ry="2" />
<text x="53.30" y="287.5" ></text>
</g>
<g >
<title>ReadBufferExtended (187,539,598 samples, 0.02%)</title><rect x="22.8" y="597" width="0.2" height="15.0" fill="rgb(242,171,40)" rx="2" ry="2" />
<text x="25.75" y="607.5" ></text>
</g>
<g >
<title>up_read (342,688,447 samples, 0.04%)</title><rect x="131.3" y="309" width="0.5" height="15.0" fill="rgb(209,18,4)" rx="2" ry="2" />
<text x="134.34" y="319.5" ></text>
</g>
<g >
<title>LWLockAcquire (160,900,720 samples, 0.02%)</title><rect x="73.5" y="437" width="0.2" height="15.0" fill="rgb(209,20,4)" rx="2" ry="2" />
<text x="76.49" y="447.5" ></text>
</g>
<g >
<title>rwsem_wake (174,896,789 samples, 0.02%)</title><rect x="596.8" y="293" width="0.3" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="599.81" y="303.5" ></text>
</g>
<g >
<title>sem_post@@GLIBC_2.2.5 (131,267,901 samples, 0.02%)</title><rect x="320.8" y="421" width="0.2" height="15.0" fill="rgb(214,41,9)" rx="2" ry="2" />
<text x="323.78" y="431.5" ></text>
</g>
<g >
<title>pg_atomic_compare_exchange_u32 (122,520,356 samples, 0.01%)</title><rect x="235.4" y="485" width="0.2" height="15.0" fill="rgb(253,220,52)" rx="2" ry="2" />
<text x="238.45" y="495.5" ></text>
</g>
<g >
<title>ReleaseBuffer (76,446,006 samples, 0.01%)</title><rect x="133.3" y="501" width="0.1" height="15.0" fill="rgb(220,71,17)" rx="2" ry="2" />
<text x="136.29" y="511.5" ></text>
</g>
<g >
<title>HeapTupleSatisfiesVacuumHorizon (1,463,672,283 samples, 0.18%)</title><rect x="40.5" y="597" width="2.1" height="15.0" fill="rgb(207,13,3)" rx="2" ry="2" />
<text x="43.51" y="607.5" ></text>
</g>
<g >
<title>PageIsVerifiedExtended (77,882,144 samples, 0.01%)</title><rect x="74.8" y="485" width="0.2" height="15.0" fill="rgb(251,215,51)" rx="2" ry="2" />
<text x="77.84" y="495.5" ></text>
</g>
<g >
<title>do_lazy_scan_heap (636,588,717,219 samples, 77.24%)</title><rect x="234.0" y="565" width="911.4" height="15.0" fill="rgb(221,75,18)" rx="2" ry="2" />
<text x="237.04" y="575.5" >do_lazy_scan_heap</text>
</g>
<g >
<title>xfs_file_buffered_aio_write (597,755,680 samples, 0.07%)</title><rect x="50.6" y="133" width="0.9" height="15.0" fill="rgb(243,176,42)" rx="2" ry="2" />
<text x="53.62" y="143.5" ></text>
</g>
<g >
<title>shmem_fault (2,001,318,628 samples, 0.24%)</title><rect x="268.6" y="325" width="2.8" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="271.57" y="335.5" ></text>
</g>
<g >
<title>mark_buffer_dirty (215,333,607 samples, 0.03%)</title><rect x="14.1" y="245" width="0.3" height="15.0" fill="rgb(240,163,39)" rx="2" ry="2" />
<text x="17.12" y="255.5" ></text>
</g>
<g >
<title>run_rebalance_domains (203,779,720 samples, 0.02%)</title><rect x="1189.6" y="533" width="0.3" height="15.0" fill="rgb(232,126,30)" rx="2" ry="2" />
<text x="1192.63" y="543.5" ></text>
</g>
<g >
<title>__hrtimer_run_queues (159,713,281 samples, 0.02%)</title><rect x="828.8" y="421" width="0.3" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="831.83" y="431.5" ></text>
</g>
<g >
<title>tag_hash (91,291,506 samples, 0.01%)</title><rect x="615.2" y="373" width="0.2" height="15.0" fill="rgb(245,185,44)" rx="2" ry="2" />
<text x="618.25" y="383.5" ></text>
</g>
<g >
<title>__schedule (588,202,006 samples, 0.07%)</title><rect x="16.4" y="261" width="0.8" height="15.0" fill="rgb(227,103,24)" rx="2" ry="2" />
<text x="19.35" y="271.5" ></text>
</g>
<g >
<title>BlockIdSet (134,762,803 samples, 0.02%)</title><rect x="160.9" y="501" width="0.2" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="163.89" y="511.5" ></text>
</g>
<g >
<title>update_process_times (100,961,491 samples, 0.01%)</title><rect x="1034.0" y="357" width="0.1" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="1036.96" y="367.5" ></text>
</g>
<g >
<title>do_read_fault.isra.63 (854,347,269 samples, 0.10%)</title><rect x="59.6" y="341" width="1.2" height="15.0" fill="rgb(216,52,12)" rx="2" ry="2" />
<text x="62.62" y="351.5" ></text>
</g>
<g >
<title>__alloc_pages_nodemask (3,905,384,102 samples, 0.47%)</title><rect x="570.0" y="149" width="5.6" height="15.0" fill="rgb(228,108,25)" rx="2" ry="2" />
<text x="573.05" y="159.5" ></text>
</g>
<g >
<title>dput (207,161,830 samples, 0.03%)</title><rect x="336.6" y="389" width="0.3" height="15.0" fill="rgb(238,155,37)" rx="2" ry="2" />
<text x="339.60" y="399.5" ></text>
</g>
<g >
<title>WaitReadBuffersCanStartIO (146,203,233 samples, 0.02%)</title><rect x="76.2" y="485" width="0.2" height="15.0" fill="rgb(210,27,6)" rx="2" ry="2" />
<text x="79.16" y="495.5" ></text>
</g>
<g >
<title>PinBufferForBlock (225,838,527 samples, 0.03%)</title><rect x="23.4" y="517" width="0.3" height="15.0" fill="rgb(241,168,40)" rx="2" ry="2" />
<text x="26.39" y="527.5" ></text>
</g>
<g >
<title>system_call_fastpath (109,531,290 samples, 0.01%)</title><rect x="73.9" y="389" width="0.2" height="15.0" fill="rgb(252,217,52)" rx="2" ry="2" />
<text x="76.93" y="399.5" ></text>
</g>
<g >
<title>__inc_zone_page_state (81,189,101 samples, 0.01%)</title><rect x="99.3" y="149" width="0.1" height="15.0" fill="rgb(209,22,5)" rx="2" ry="2" />
<text x="102.31" y="159.5" ></text>
</g>
<g >
<title>GetPrivateRefCount (132,829,552 samples, 0.02%)</title><rect x="1144.6" y="517" width="0.2" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="1147.61" y="527.5" ></text>
</g>
<g >
<title>sys_nanosleep (94,502,853 samples, 0.01%)</title><rect x="71.8" y="341" width="0.2" height="15.0" fill="rgb(248,200,48)" rx="2" ry="2" />
<text x="74.83" y="351.5" ></text>
</g>
<g >
<title>__switch_to (352,258,769 samples, 0.04%)</title><rect x="1161.1" y="773" width="0.5" height="15.0" fill="rgb(205,2,0)" rx="2" ry="2" />
<text x="1164.10" y="783.5" ></text>
</g>
<g >
<title>PageGetItemId (3,037,372,577 samples, 0.37%)</title><rect x="697.5" y="517" width="4.3" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="700.48" y="527.5" ></text>
</g>
<g >
<title>iomap_write_actor (1,258,576,371 samples, 0.15%)</title><rect x="12.8" y="309" width="1.8" height="15.0" fill="rgb(232,125,30)" rx="2" ry="2" />
<text x="15.77" y="319.5" ></text>
</g>
<g >
<title>PageGetItemId (579,037,268 samples, 0.07%)</title><rect x="149.1" y="501" width="0.8" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="152.12" y="511.5" ></text>
</g>
<g >
<title>__find_get_page (620,921,777 samples, 0.08%)</title><rect x="79.2" y="293" width="0.9" height="15.0" fill="rgb(229,114,27)" rx="2" ry="2" />
<text x="82.22" y="303.5" ></text>
</g>
<g >
<title>BufferAlloc (59,632,114,198 samples, 7.24%)</title><rect x="237.6" y="469" width="85.4" height="15.0" fill="rgb(252,220,52)" rx="2" ry="2" />
<text x="240.63" y="479.5" >BufferAlloc</text>
</g>
<g >
<title>MarkBufferDirty (139,688,583 samples, 0.02%)</title><rect x="137.0" y="517" width="0.2" height="15.0" fill="rgb(238,152,36)" rx="2" ry="2" />
<text x="139.98" y="527.5" ></text>
</g>
<g >
<title>BufferIsValid (92,825,077 samples, 0.01%)</title><rect x="224.7" y="421" width="0.2" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="227.75" y="431.5" ></text>
</g>
<g >
<title>pg_atomic_compare_exchange_u32_impl (95,971,567 samples, 0.01%)</title><rect x="617.1" y="341" width="0.1" height="15.0" fill="rgb(235,141,33)" rx="2" ry="2" />
<text x="620.06" y="351.5" ></text>
</g>
<g >
<title>BufferGetBlock (89,798,587 samples, 0.01%)</title><rect x="667.6" y="501" width="0.2" height="15.0" fill="rgb(242,172,41)" rx="2" ry="2" />
<text x="670.63" y="511.5" ></text>
</g>
<g >
<title>GetPrivateRefCountEntry (79,157,899 samples, 0.01%)</title><rect x="619.8" y="373" width="0.1" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="622.83" y="383.5" ></text>
</g>
<g >
<title>table_parallel_vacuum_scan (637,236,586,138 samples, 77.31%)</title><rect x="233.7" y="597" width="912.3" height="15.0" fill="rgb(240,165,39)" rx="2" ry="2" />
<text x="236.67" y="607.5" >table_parallel_vacuum_scan</text>
</g>
<g >
<title>get_page_from_freelist (2,979,438,510 samples, 0.36%)</title><rect x="570.9" y="133" width="4.3" height="15.0" fill="rgb(252,218,52)" rx="2" ry="2" />
<text x="573.92" y="143.5" ></text>
</g>
<g >
<title>pg_atomic_compare_exchange_u32_impl (417,280,752 samples, 0.05%)</title><rect x="1140.2" y="453" width="0.6" height="15.0" fill="rgb(235,141,33)" rx="2" ry="2" />
<text x="1143.23" y="463.5" ></text>
</g>
<g >
<title>iomap_apply (157,026,362 samples, 0.02%)</title><rect x="50.6" y="101" width="0.3" height="15.0" fill="rgb(247,194,46)" rx="2" ry="2" />
<text x="53.63" y="111.5" ></text>
</g>
<g >
<title>__set_page_dirty (194,182,657 samples, 0.02%)</title><rect x="14.1" y="229" width="0.3" height="15.0" fill="rgb(254,227,54)" rx="2" ry="2" />
<text x="17.15" y="239.5" ></text>
</g>
<g >
<title>selinux_file_permission (145,028,512 samples, 0.02%)</title><rect x="599.4" y="373" width="0.2" height="15.0" fill="rgb(249,204,48)" rx="2" ry="2" />
<text x="602.35" y="383.5" ></text>
</g>
<g >
<title>ReadBuffer_common (53,497,394,339 samples, 6.49%)</title><rect x="55.8" y="517" width="76.6" height="15.0" fill="rgb(213,40,9)" rx="2" ry="2" />
<text x="58.78" y="527.5" >ReadBuff..</text>
</g>
<g >
<title>radix_tree_descend (251,125,417 samples, 0.03%)</title><rect x="99.9" y="117" width="0.3" height="15.0" fill="rgb(243,175,41)" rx="2" ry="2" />
<text x="102.89" y="127.5" ></text>
</g>
<g >
<title>set_page_dirty (250,383,969 samples, 0.03%)</title><rect x="128.6" y="213" width="0.4" height="15.0" fill="rgb(231,123,29)" rx="2" ry="2" />
<text x="131.60" y="223.5" ></text>
</g>
<g >
<title>ResourceOwnerForgetBuffer (164,220,958 samples, 0.02%)</title><rect x="605.5" y="501" width="0.2" height="15.0" fill="rgb(247,193,46)" rx="2" ry="2" />
<text x="608.49" y="511.5" ></text>
</g>
<g >
<title>LWLockAttemptLock (1,336,748,520 samples, 0.16%)</title><rect x="601.9" y="485" width="2.0" height="15.0" fill="rgb(235,138,33)" rx="2" ry="2" />
<text x="604.95" y="495.5" ></text>
</g>
<g >
<title>lazy_scan_prune (13,601,458,906 samples, 1.65%)</title><rect x="23.8" y="645" width="19.5" height="15.0" fill="rgb(243,178,42)" rx="2" ry="2" />
<text x="26.78" y="655.5" ></text>
</g>
<g >
<title>PinBufferForBlock (398,251,233 samples, 0.05%)</title><rect x="43.4" y="741" width="0.5" height="15.0" fill="rgb(241,168,40)" rx="2" ry="2" />
<text x="46.36" y="751.5" ></text>
</g>
<g >
<title>block_write_end (301,601,796 samples, 0.04%)</title><rect x="14.0" y="277" width="0.4" height="15.0" fill="rgb(213,38,9)" rx="2" ry="2" />
<text x="17.00" y="287.5" ></text>
</g>
<g >
<title>xfs_file_buffered_aio_write (4,576,999,294 samples, 0.56%)</title><rect x="12.7" y="357" width="6.6" height="15.0" fill="rgb(243,176,42)" rx="2" ry="2" />
<text x="15.72" y="367.5" ></text>
</g>
<g >
<title>scheduler_tick (74,180,775 samples, 0.01%)</title><rect x="621.3" y="341" width="0.1" height="15.0" fill="rgb(246,190,45)" rx="2" ry="2" />
<text x="624.34" y="351.5" ></text>
</g>
<g >
<title>GetPrivateRefCount (116,993,009 samples, 0.01%)</title><rect x="625.4" y="517" width="0.1" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="628.35" y="527.5" ></text>
</g>
<g >
<title>exit_mmap (10,483,653,966 samples, 1.27%)</title><rect x="1146.0" y="693" width="15.0" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="1148.99" y="703.5" ></text>
</g>
<g >
<title>heap_vac_scan_next_block (1,797,995,109 samples, 0.22%)</title><rect x="133.2" y="533" width="2.6" height="15.0" fill="rgb(220,70,16)" rx="2" ry="2" />
<text x="136.23" y="543.5" ></text>
</g>
<g >
<title>ConditionVariableBroadcast (642,734,508 samples, 0.08%)</title><rect x="75.0" y="469" width="0.9" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="77.99" y="479.5" ></text>
</g>
<g >
<title>shmem_fault (28,372,881,604 samples, 3.44%)</title><rect x="87.0" y="197" width="40.7" height="15.0" fill="rgb(236,143,34)" rx="2" ry="2" />
<text x="90.04" y="207.5" >shm..</text>
</g>
<g >
<title>sys_pwrite64 (707,254,508 samples, 0.09%)</title><rect x="50.5" y="197" width="1.0" height="15.0" fill="rgb(238,156,37)" rx="2" ry="2" />
<text x="53.47" y="207.5" ></text>
</g>
<g >
<title>LWLockWakeup (157,903,922 samples, 0.02%)</title><rect x="320.7" y="437" width="0.3" height="15.0" fill="rgb(210,24,5)" rx="2" ry="2" />
<text x="323.74" y="447.5" ></text>
</g>
<g >
<title>shmem_add_to_page_cache.isra.26 (18,810,627,991 samples, 2.28%)</title><rect x="98.6" y="165" width="27.0" height="15.0" fill="rgb(250,207,49)" rx="2" ry="2" />
<text x="101.63" y="175.5" >s..</text>
</g>
<g >
<title>TransactionIdPrecedes (4,804,527,150 samples, 0.58%)</title><rect x="1048.8" y="453" width="6.9" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="1051.83" y="463.5" ></text>
</g>
<g >
<title>PageGetItemId (632,391,766 samples, 0.08%)</title><rect x="180.4" y="469" width="0.9" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="183.39" y="479.5" ></text>
</g>
<g >
<title>file_update_time (643,537,680 samples, 0.08%)</title><rect x="14.9" y="325" width="0.9" height="15.0" fill="rgb(210,27,6)" rx="2" ry="2" />
<text x="17.85" y="335.5" ></text>
</g>
<g >
<title>TransactionIdDidCommit (73,967,477 samples, 0.01%)</title><rect x="230.6" y="485" width="0.1" height="15.0" fill="rgb(216,51,12)" rx="2" ry="2" />
<text x="233.57" y="495.5" ></text>
</g>
<g >
<title>xfs_file_buffered_aio_read (781,770,891 samples, 0.09%)</title><rect x="21.2" y="453" width="1.2" height="15.0" fill="rgb(217,55,13)" rx="2" ry="2" />
<text x="24.25" y="463.5" ></text>
</g>
<g >
<title>__list_add (174,328,507 samples, 0.02%)</title><rect x="572.2" y="117" width="0.3" height="15.0" fill="rgb(235,141,33)" rx="2" ry="2" />
<text x="575.23" y="127.5" ></text>
</g>
<g >
<title>ReadBuffer_common (185,939,462 samples, 0.02%)</title><rect x="22.8" y="581" width="0.2" height="15.0" fill="rgb(213,40,9)" rx="2" ry="2" />
<text x="25.75" y="591.5" ></text>
</g>
<g >
<title>clockevents_program_event (209,766,194 samples, 0.03%)</title><rect x="1187.8" y="661" width="0.3" height="15.0" fill="rgb(244,182,43)" rx="2" ry="2" />
<text x="1190.77" y="671.5" ></text>
</g>
<g >
<title>zone_statistics (165,101,474 samples, 0.02%)</title><rect x="575.0" y="117" width="0.2" height="15.0" fill="rgb(232,125,29)" rx="2" ry="2" />
<text x="577.95" y="127.5" ></text>
</g>
<g >
<title>system_call_fastpath (314,355,073 samples, 0.04%)</title><rect x="10.4" y="645" width="0.5" height="15.0" fill="rgb(252,217,52)" rx="2" ry="2" />
<text x="13.42" y="655.5" ></text>
</g>
<g >
<title>pg_atomic_fetch_or_u32 (88,573,341 samples, 0.01%)</title><rect x="276.3" y="421" width="0.1" height="15.0" fill="rgb(221,74,17)" rx="2" ry="2" />
<text x="279.30" y="431.5" ></text>
</g>
<g >
<title>release_pages (223,385,334 samples, 0.03%)</title><rect x="440.4" y="149" width="0.4" height="15.0" fill="rgb(228,106,25)" rx="2" ry="2" />
<text x="443.45" y="159.5" ></text>
</g>
<g >
<title>ss_report_location (222,040,834 samples, 0.03%)</title><rect x="609.6" y="501" width="0.3" height="15.0" fill="rgb(249,202,48)" rx="2" ry="2" />
<text x="612.57" y="511.5" ></text>
</g>
<g >
<title>hash_search_with_hash_value (2,211,397,386 samples, 0.27%)</title><rect x="58.3" y="421" width="3.2" height="15.0" fill="rgb(249,205,49)" rx="2" ry="2" />
<text x="61.30" y="431.5" ></text>
</g>
<g >
<title>set_page_dirty (819,418,366 samples, 0.10%)</title><rect x="1157.4" y="629" width="1.2" height="15.0" fill="rgb(231,123,29)" rx="2" ry="2" />
<text x="1160.45" y="639.5" ></text>
</g>
<g >
<title>__pwrite_nocancel (4,885,135,977 samples, 0.59%)</title><rect x="12.3" y="453" width="7.0" height="15.0" fill="rgb(219,67,16)" rx="2" ry="2" />
<text x="15.34" y="463.5" ></text>
</g>
<g >
<title>PageGetItemId (4,430,694,609 samples, 0.54%)</title><rect x="777.9" y="517" width="6.3" height="15.0" fill="rgb(246,192,46)" rx="2" ry="2" />
<text x="780.88" y="527.5" ></text>
</g>
<g >
<title>BufTableHashCode (1,411,021,184 samples, 0.17%)</title><rect x="613.4" y="389" width="2.0" height="15.0" fill="rgb(215,47,11)" rx="2" ry="2" />
<text x="616.36" y="399.5" ></text>
</g>
<g >
<title>heap_prune_satisfies_vacuum (184,225,118 samples, 0.02%)</title><rect x="54.3" y="389" width="0.3" height="15.0" fill="rgb(252,219,52)" rx="2" ry="2" />
<text x="57.29" y="399.5" ></text>
</g>
<g >
<title>GetPrivateRefCountEntry (266,032,186 samples, 0.03%)</title><rect x="224.9" y="421" width="0.4" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="227.88" y="431.5" ></text>
</g>
<g >
<title>process_pm_pmsignal (637,265,816,681 samples, 77.32%)</title><rect x="233.6" y="725" width="912.4" height="15.0" fill="rgb(254,228,54)" rx="2" ry="2" />
<text x="236.63" y="735.5" >process_pm_pmsignal</text>
</g>
<g >
<title>htsv_get_valid_status (1,297,094,185 samples, 0.16%)</title><rect x="1061.0" y="501" width="1.9" height="15.0" fill="rgb(251,212,50)" rx="2" ry="2" />
<text x="1064.04" y="511.5" ></text>
</g>
<g >
<title>get_hash_value (1,265,120,321 samples, 0.15%)</title><rect x="613.4" y="373" width="1.8" height="15.0" fill="rgb(211,27,6)" rx="2" ry="2" />
<text x="616.44" y="383.5" ></text>
</g>
<g >
<title>scheduler_tick (90,492,093 samples, 0.01%)</title><rect x="393.6" y="69" width="0.2" height="15.0" fill="rgb(246,190,45)" rx="2" ry="2" />
<text x="396.65" y="79.5" ></text>
</g>
<g >
<title>auditsys (81,869,134 samples, 0.01%)</title><rect x="309.9" y="373" width="0.1" height="15.0" fill="rgb(240,161,38)" rx="2" ry="2" />
<text x="312.85" y="383.5" ></text>
</g>
<g >
<title>sys_pread64 (129,399,723 samples, 0.02%)</title><rect x="51.7" y="293" width="0.2" height="15.0" fill="rgb(212,35,8)" rx="2" ry="2" />
<text x="54.71" y="303.5" ></text>
</g>
<g >
<title>local_apic_timer_interrupt (131,208,502 samples, 0.02%)</title><rect x="682.4" y="469" width="0.1" height="15.0" fill="rgb(213,37,9)" rx="2" ry="2" />
<text x="685.35" y="479.5" ></text>
</g>
<g >
<title>pgstat_count_io_op_n (431,293,237 samples, 0.05%)</title><rect x="332.1" y="485" width="0.6" height="15.0" fill="rgb(232,128,30)" rx="2" ry="2" />
<text x="335.11" y="495.5" ></text>
</g>
<g >
<title>set_next_entity (71,827,163 samples, 0.01%)</title><rect x="1182.6" y="677" width="0.1" height="15.0" fill="rgb(232,125,29)" rx="2" ry="2" />
<text x="1185.61" y="687.5" ></text>
</g>
<g >
<title>file_update_time (456,508,965 samples, 0.06%)</title><rect x="127.9" y="213" width="0.7" height="15.0" fill="rgb(210,27,6)" rx="2" ry="2" />
<text x="130.92" y="223.5" ></text>
</g>
<g >
<title>hrtimer_interrupt (200,799,339 samples, 0.02%)</title><rect x="828.8" y="437" width="0.3" height="15.0" fill="rgb(228,109,26)" rx="2" ry="2" />
<text x="831.83" y="447.5" ></text>
</g>
<g >
<title>mmput (10,483,653,966 samples, 1.27%)</title><rect x="1146.0" y="709" width="15.0" height="15.0" fill="rgb(226,99,23)" rx="2" ry="2" />
<text x="1148.99" y="719.5" ></text>
</g>
<g >
<title>__hrtimer_run_queues (315,709,195 samples, 0.04%)</title><rect x="445.1" y="117" width="0.4" height="15.0" fill="rgb(237,150,35)" rx="2" ry="2" />
<text x="448.07" y="127.5" ></text>
</g>
<g >
<title>do_shared_fault.isra.64 (29,497,082,559 samples, 3.58%)</title><rect x="86.8" y="229" width="42.2" height="15.0" fill="rgb(245,185,44)" rx="2" ry="2" />
<text x="89.81" y="239.5" >do_..</text>
</g>
<g >
<title>page_fault (328,458,750 samples, 0.04%)</title><rect x="57.2" y="389" width="0.5" height="15.0" fill="rgb(243,177,42)" rx="2" ry="2" />
<text x="60.19" y="399.5" ></text>
</g>
<g >
<title>hrtimer_interrupt (129,480,975 samples, 0.02%)</title><rect x="682.4" y="453" width="0.1" height="15.0" fill="rgb(228,109,26)" rx="2" ry="2" />
<text x="685.35" y="463.5" ></text>
</g>
<g >
<title>generic_file_aio_read (35,237,479,854 samples, 4.28%)</title><rect x="79.0" y="325" width="50.4" height="15.0" fill="rgb(216,53,12)" rx="2" ry="2" />
<text x="81.96" y="335.5" >gener..</text>
</g>
<g >
<title>tick_sched_timer (125,874,696 samples, 0.02%)</title><rect x="682.4" y="421" width="0.1" height="15.0" fill="rgb(254,227,54)" rx="2" ry="2" />
<text x="685.36" y="431.5" ></text>
</g>
<g >
<title>__do_page_fault (3,447,512,278 samples, 0.42%)</title><rect x="266.9" y="389" width="4.9" height="15.0" fill="rgb(239,158,37)" rx="2" ry="2" />
<text x="269.88" y="399.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32 (160,828,911 samples, 0.02%)</title><rect x="770.6" y="501" width="0.2" height="15.0" fill="rgb(248,202,48)" rx="2" ry="2" />
<text x="773.61" y="511.5" ></text>
</g>
<g >
<title>BufferAlloc (12,993,680,265 samples, 1.58%)</title><rect x="56.0" y="453" width="18.6" height="15.0" fill="rgb(252,220,52)" rx="2" ry="2" />
<text x="59.04" y="463.5" ></text>
</g>
<g >
<title>local_apic_timer_interrupt (245,327,453 samples, 0.03%)</title><rect x="933.5" y="469" width="0.3" height="15.0" fill="rgb(213,37,9)" rx="2" ry="2" />
<text x="936.47" y="479.5" ></text>
</g>
<g >
<title>__find_lock_page (135,896,440 samples, 0.02%)</title><rect x="57.4" y="261" width="0.2" height="15.0" fill="rgb(251,214,51)" rx="2" ry="2" />
<text x="60.37" y="271.5" ></text>
</g>
<g >
<title>smp_apic_timer_interrupt (206,275,440 samples, 0.03%)</title><rect x="709.2" y="501" width="0.3" height="15.0" fill="rgb(221,74,17)" rx="2" ry="2" />
<text x="712.19" y="511.5" ></text>
</g>
<g >
<title>wake_up_q (99,261,607 samples, 0.01%)</title><rect x="131.2" y="261" width="0.1" height="15.0" fill="rgb(237,151,36)" rx="2" ry="2" />
<text x="134.18" y="271.5" ></text>
</g>
<g >
<title>queued_spin_lock_slowpath (640,204,755 samples, 0.08%)</title><rect x="95.9" y="149" width="0.9" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="98.87" y="159.5" ></text>
</g>
<g >
<title>local_apic_timer_interrupt (200,799,339 samples, 0.02%)</title><rect x="828.8" y="453" width="0.3" height="15.0" fill="rgb(213,37,9)" rx="2" ry="2" />
<text x="831.83" y="463.5" ></text>
</g>
<g >
<title>BufferIsValid (1,132,572,914 samples, 0.14%)</title><rect x="1099.7" y="469" width="1.6" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="1102.72" y="479.5" ></text>
</g>
<g >
<title>hash_search_with_hash_value (205,061,762 samples, 0.02%)</title><rect x="11.6" y="533" width="0.3" height="15.0" fill="rgb(249,205,49)" rx="2" ry="2" />
<text x="14.60" y="543.5" ></text>
</g>
<g >
<title>BufTableHashCode (293,916,447 samples, 0.04%)</title><rect x="134.3" y="373" width="0.4" height="15.0" fill="rgb(215,47,11)" rx="2" ry="2" />
<text x="137.27" y="383.5" ></text>
</g>
<g >
<title>get_next_timer_interrupt (76,093,774 samples, 0.01%)</title><rect x="1184.9" y="677" width="0.1" height="15.0" fill="rgb(254,229,54)" rx="2" ry="2" />
<text x="1187.90" y="687.5" ></text>
</g>
<g >
<title>vfs_write (4,721,878,079 samples, 0.57%)</title><rect x="12.6" y="405" width="6.7" height="15.0" fill="rgb(250,209,50)" rx="2" ry="2" />
<text x="15.57" y="415.5" ></text>
</g>
<g >
<title>TransactionIdPrecedes (4,099,787,855 samples, 0.50%)</title><rect x="984.9" y="485" width="5.8" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="987.86" y="495.5" ></text>
</g>
<g >
<title>ItemPointerSet (231,346,316 samples, 0.03%)</title><rect x="28.8" y="613" width="0.3" height="15.0" fill="rgb(237,147,35)" rx="2" ry="2" />
<text x="31.76" y="623.5" ></text>
</g>
<g >
<title>LWLockHeldByMe (905,157,760 samples, 0.11%)</title><rect x="1117.3" y="453" width="1.3" height="15.0" fill="rgb(252,219,52)" rx="2" ry="2" />
<text x="1120.28" y="463.5" ></text>
</g>
<g >
<title>pg_atomic_read_u32 (1,164,123,826 samples, 0.14%)</title><rect x="1097.8" y="453" width="1.6" height="15.0" fill="rgb(248,202,48)" rx="2" ry="2" />
<text x="1100.77" y="463.5" ></text>
</g>
<g >
<title>generic_file_aio_read (108,562,124 samples, 0.01%)</title><rect x="51.7" y="213" width="0.2" height="15.0" fill="rgb(216,53,12)" rx="2" ry="2" />
<text x="54.73" y="223.5" ></text>
</g>
<g >
<title>apic_timer_interrupt (81,334,197 samples, 0.01%)</title><rect x="87.4" y="181" width="0.1" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="90.38" y="191.5" ></text>
</g>
<g >
<title>do_sync_write (4,667,300,246 samples, 0.57%)</title><rect x="12.6" y="389" width="6.7" height="15.0" fill="rgb(213,37,9)" rx="2" ry="2" />
<text x="15.60" y="399.5" ></text>
</g>
<g >
<title>up_read (93,745,128 samples, 0.01%)</title><rect x="587.1" y="341" width="0.2" height="15.0" fill="rgb(209,18,4)" rx="2" ry="2" />
<text x="590.12" y="351.5" ></text>
</g>
<g >
<title>__schedule (98,781,414 samples, 0.01%)</title><rect x="48.6" y="645" width="0.1" height="15.0" fill="rgb(227,103,24)" rx="2" ry="2" />
<text x="51.57" y="655.5" ></text>
</g>
<g >
<title>LockBuffer (80,265,410 samples, 0.01%)</title><rect x="43.1" y="613" width="0.2" height="15.0" fill="rgb(235,142,34)" rx="2" ry="2" />
<text x="46.14" y="623.5" ></text>
</g>
<g >
<title>check_preempt_curr (192,581,581 samples, 0.02%)</title><rect x="1180.9" y="677" width="0.3" height="15.0" fill="rgb(231,122,29)" rx="2" ry="2" />
<text x="1183.94" y="687.5" ></text>
</g>
<g >
<title>copy_user_enhanced_fast_string (14,635,789,779 samples, 1.78%)</title><rect x="349.9" y="309" width="21.0" height="15.0" fill="rgb(238,155,37)" rx="2" ry="2" />
<text x="352.92" y="319.5" ></text>
</g>
<g >
<title>MarkBufferDirty (73,475,188 samples, 0.01%)</title><rect x="232.8" y="501" width="0.1" height="15.0" fill="rgb(238,152,36)" rx="2" ry="2" />
<text x="235.76" y="511.5" ></text>
</g>
<g >
<title>vm_readbuf (1,434,602,771 samples, 0.17%)</title><rect x="133.7" y="485" width="2.1" height="15.0" fill="rgb(224,88,21)" rx="2" ry="2" />
<text x="136.75" y="495.5" ></text>
</g>
<g >
<title>TerminateBufferIO (3,770,178,884 samples, 0.46%)</title><rect x="325.4" y="501" width="5.4" height="15.0" fill="rgb(239,160,38)" rx="2" ry="2" />
<text x="328.40" y="511.5" ></text>
</g>
<g >
<title>vacuum_delay_point (112,464,995 samples, 0.01%)</title><rect x="1144.1" y="549" width="0.2" height="15.0" fill="rgb(208,17,4)" rx="2" ry="2" />
<text x="1147.10" y="559.5" ></text>
</g>
<g >
<title>lookup_page_cgroup (218,104,246 samples, 0.03%)</title><rect x="450.7" y="133" width="0.3" height="15.0" fill="rgb(228,107,25)" rx="2" ry="2" />
<text x="453.66" y="143.5" ></text>
</g>
<g >
<title>_raw_qspin_lock_irq (17,281,462,928 samples, 2.10%)</title><rect x="100.8" y="149" width="24.8" height="15.0" fill="rgb(251,214,51)" rx="2" ry="2" />
<text x="103.81" y="159.5" >_..</text>
</g>
<g >
<title>pg_atomic_fetch_or_u32_impl (581,773,591 samples, 0.07%)</title><rect x="321.9" y="421" width="0.8" height="15.0" fill="rgb(253,224,53)" rx="2" ry="2" />
<text x="324.92" y="431.5" ></text>
</g>
<g >
<title>TransactionIdPrecedes (386,075,659 samples, 0.05%)</title><rect x="39.2" y="549" width="0.5" height="15.0" fill="rgb(226,98,23)" rx="2" ry="2" />
<text x="42.18" y="559.5" ></text>
</g>
<g >
<title>iomap_write_begin (113,094,523 samples, 0.01%)</title><rect x="10.5" y="501" width="0.2" height="15.0" fill="rgb(211,30,7)" rx="2" ry="2" />
<text x="13.54" y="511.5" ></text>
</g>
<g >
<title>perform_spin_delay (2,282,751,802 samples, 0.28%)</title><rect x="271.9" y="421" width="3.2" height="15.0" fill="rgb(247,196,46)" rx="2" ry="2" />
<text x="274.87" y="431.5" ></text>
</g>
<g >
<title>update_time (254,983,261 samples, 0.03%)</title><rect x="586.5" y="293" width="0.4" height="15.0" fill="rgb(211,31,7)" rx="2" ry="2" />
<text x="589.50" y="303.5" ></text>
</g>
<g >
<title>try_to_wake_up (128,474,798 samples, 0.02%)</title><rect x="51.3" y="37" width="0.2" height="15.0" fill="rgb(220,70,16)" rx="2" ry="2" />
<text x="54.30" y="47.5" ></text>
</g>
<g >
<title>heap_tuple_should_freeze (3,112,739,937 samples, 0.38%)</title><rect x="213.4" y="453" width="4.5" height="15.0" fill="rgb(247,194,46)" rx="2" ry="2" />
<text x="216.44" y="463.5" ></text>
</g>
<g >
<title>BufferGetPage (2,664,606,174 samples, 0.32%)</title><rect x="1106.0" y="453" width="3.9" height="15.0" fill="rgb(253,220,52)" rx="2" ry="2" />
<text x="1109.04" y="463.5" ></text>
</g>
<g >
<title>BufTableLookup (169,301,111 samples, 0.02%)</title><rect x="134.7" y="373" width="0.2" height="15.0" fill="rgb(224,89,21)" rx="2" ry="2" />
<text x="137.70" y="383.5" ></text>
</g>
<g >
<title>tick_sched_timer (110,974,429 samples, 0.01%)</title><rect x="1086.9" y="389" width="0.2" height="15.0" fill="rgb(254,227,54)" rx="2" ry="2" />
<text x="1089.95" y="399.5" ></text>
</g>
<g >
<title>resched_curr (73,591,642 samples, 0.01%)</title><rect x="1181.2" y="677" width="0.1" height="15.0" fill="rgb(240,161,38)" rx="2" ry="2" />
<text x="1184.22" y="687.5" ></text>
</g>
<g >
<title>local_apic_timer_interrupt (342,711,918 samples, 0.04%)</title><rect x="784.3" y="485" width="0.5" height="15.0" fill="rgb(213,37,9)" rx="2" ry="2" />
<text x="787.33" y="495.5" ></text>
</g>
<g >
<title>__write_nocancel (335,326,487 samples, 0.04%)</title><rect x="10.4" y="661" width="0.5" height="15.0" fill="rgb(243,175,42)" rx="2" ry="2" />
<text x="13.39" y="671.5" ></text>
</g>
<g >
<title>set_cpu_sd_state_idle (84,511,405 samples, 0.01%)</title><rect x="1186.6" y="709" width="0.2" height="15.0" fill="rgb(211,29,6)" rx="2" ry="2" />
<text x="1189.65" y="719.5" ></text>
</g>
<g >
<title>do_read_fault.isra.63 (1,608,472,116 samples, 0.20%)</title><rect x="244.3" y="341" width="2.3" height="15.0" fill="rgb(216,52,12)" rx="2" ry="2" />
<text x="247.31" y="351.5" ></text>
</g>
<g >
<title>BufferIsValid (243,620,002 samples, 0.03%)</title><rect x="224.0" y="437" width="0.4" height="15.0" fill="rgb(206,5,1)" rx="2" ry="2" />
<text x="227.03" y="447.5" ></text>
</g>
<g >
<title>TransactionIdFollows (2,590,304,802 samples, 0.31%)</title><rect x="701.9" y="517" width="3.7" height="15.0" fill="rgb(222,79,18)" rx="2" ry="2" />
<text x="704.88" y="527.5" ></text>
</g>
<g >
<title>GetVictimBuffer (920,252,455 samples, 0.11%)</title><rect x="50.3" y="325" width="1.3" height="15.0" fill="rgb(209,18,4)" rx="2" ry="2" />
<text x="53.29" y="335.5" ></text>
</g>
<g >
<title>apic_timer_interrupt (75,510,927 samples, 0.01%)</title><rect x="697.4" y="501" width="0.1" height="15.0" fill="rgb(205,1,0)" rx="2" ry="2" />
<text x="700.37" y="511.5" ></text>
</g>
<g >
<title>maybe_start_bgworkers (124,943,333,272 samples, 15.16%)</title><rect x="54.6" y="693" width="178.9" height="15.0" fill="rgb(240,161,38)" rx="2" ry="2" />
<text x="57.63" y="703.5" >maybe_start_bgworkers</text>
</g>
<g >
<title>native_queued_spin_lock_slowpath (16,850,311,132 samples, 2.04%)</title><rect x="101.4" y="117" width="24.2" height="15.0" fill="rgb(238,153,36)" rx="2" ry="2" />
<text x="104.43" y="127.5" >n..</text>
</g>
<g >
<title>heap_prune_record_unused (1,565,040,447 samples, 0.19%)</title><rect x="936.7" y="485" width="2.3" height="15.0" fill="rgb(227,105,25)" rx="2" ry="2" />
<text x="939.74" y="495.5" ></text>
</g>
<g >
<title>down_read (5,430,949,971 samples, 0.66%)</title><rect x="587.4" y="325" width="7.8" height="15.0" fill="rgb(246,188,45)" rx="2" ry="2" />
<text x="590.39" y="335.5" ></text>
</g>
<g >
<title>__radix_tree_lookup (384,526,653 samples, 0.05%)</title><rect x="60.0" y="229" width="0.6" height="15.0" fill="rgb(253,222,53)" rx="2" ry="2" />
<text x="63.05" y="239.5" ></text>
</g>
</g>
</svg>
Sorry for the very late reply.
On Tue, Jul 30, 2024 at 8:54 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Sawada-san,
Thank you for testing!
I tried to profile the vacuuming with the larger case (40 workers for the 20G table)
and the attached FlameGraph shows the result. IIUC, I cannot find bottlenecks.

2.
I compared parallel heap scan and found that it does not have a compute_worker API.
Can you clarify the reason why there is an inconsistency?
(I feel it is intentional because the calculation logic seems to depend on the heap structure,
so should we add the API for table scan as well?)
There is room to consider a better API design, but yes, the reason is
that the calculation logic depends on the table AM implementation. For
example, I thought it might make sense to take the number of
all-visible pages into account when calculating the number of parallel
workers, as we don't want to launch many workers on a table where most
pages are all-visible. That might not work for other table AMs (a
rough sketch of the idea follows below).

Okay, thanks for confirming. I wanted to ask others as well.
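To make the all-visible heuristic above concrete, here is a minimal sketch of what such a table AM callback could look like. This is not the patch's code: the callback name matches the one added by the 0001 patch, but the body, the tripling rule, and the reuse of min_parallel_table_scan_size are illustrative assumptions only.

static int
heap_parallel_vacuum_compute_workers_sketch(Relation rel, int nrequested)
{
    BlockNumber total_pages = RelationGetNumberOfBlocks(rel);
    BlockNumber all_visible;
    BlockNumber all_frozen;
    BlockNumber pages_to_scan;
    int         workers = 0;

    /* Pages the scan will actually visit: everything not all-visible */
    visibilitymap_count(rel, &all_visible, &all_frozen);
    pages_to_scan = total_pages - all_visible;

    /* An explicit parallel degree request wins */
    if (nrequested > 0)
        return Min(nrequested, max_parallel_maintenance_workers);

    /* Made-up scaling rule: one more worker each time the pages triple */
    while (pages_to_scan >= (BlockNumber) min_parallel_table_scan_size * 3)
    {
        workers++;
        pages_to_scan /= 3;
    }

    return Min(workers, max_parallel_maintenance_workers);
}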
I'm updating the patch to implement parallel heap vacuum and will
share the updated patch. It might take time as it requires
implementing shared iteration support in the radix tree.

Here are other preliminary comments for the v2 patch. This does not
contain cosmetic ones.

1.
The shared data structure PHVShared does not contain a mutex lock. Is that intentional
because its fields are accessed by the leader only after the parallel workers exit?

Yes, the fields in PHVShared are read-only for workers. Since no
concurrent reads/writes happen on these fields, we don't need to
protect them.
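For illustration, a minimal sketch of why that is safe, using the struct and DSM key names from the attached 0001 patch (the exact code in the patch differs, and shared_len is a placeholder): the leader fully initializes the shared area before any worker is launched, so workers only ever read it.

    /* Leader, before launching any worker: fill the read-only fields. */
    shared = (PHVShared *) shm_toc_allocate(pcxt->toc, shared_len);
    shared->aggressive = vacrel->aggressive;
    shared->skipwithvm = vacrel->skipwithvm;
    shared->cutoffs = vacrel->cutoffs;  /* struct copy of the vacuum cutoffs */
    shm_toc_insert(pcxt->toc, LV_PARALLEL_SCAN_SHARED, shared);

    LaunchParallelWorkers(pcxt);        /* workers attach only after this */

    /* Worker side: look up and read, never write. */
    shared = (PHVShared *) shm_toc_lookup(toc, LV_PARALLEL_SCAN_SHARED, false);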
2.
Per my understanding, the vacuuming goes through the steps below:
a. parallel workers are launched for scanning pages
b. the leader waits until the scans are done
c. the leader does the vacuum alone (you may extend here...)
d. parallel workers are launched again to clean up indexes

If so, can we reuse the parallel workers for the cleanup? Or is that more painful
engineering than it is worth?
I've not thought of this idea, but I think it's possible from a
technical perspective. It saves the overhead of relaunching workers,
but I'm not sure how much it would improve performance, and I'm
concerned it would make the code complex. For example, different
numbers of workers might be required for table vacuuming and index
vacuuming, so we would end up increasing or decreasing the workers.
3.
According to LaunchParallelWorkers(), the bgw_name and bgw_type are hardcoded as
"parallel worker ...". Can we extend this to improve trackability in
pg_stat_activity?
It would be a good improvement for trackability, but I think we
should do that in a separate patch, as it's not a problem specific to
parallel heap vacuum.
4.
I'm not an expert on TidStore, but as you said, TidStoreLockExclusive() might be a
bottleneck when a TID is added to the shared TidStore. Another primitive idea
is to prepare per-worker TidStores (in the PHVScanWorkerState or LVRelCounters?)
and gather them after the heap scanning. If you extend the patch so that parallel
workers do the vacuuming, the gathering may not be needed: each worker can access
its own TidStore and clean it up. One downside is that the memory consumption may
be quite large.
Interesting idea. Suppose we supported parallel heap vacuum as well;
then we wouldn't need locks or shared-iteration support on TidStore. I
think each worker should use a fraction of maintenance_work_mem.
However, one downside would be that we need to check as many TidStores
as there are workers during index vacuuming.
FYI I've implemented the parallel heap vacuum part and am doing some
benchmark tests. I'll share the updated patches along with test
results this week.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Tue, Oct 22, 2024 at 4:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
However, one downside would be that we need to check as many TidStores as there are workers during index vacuuming.
On further thought, I don't think the per-worker TidStore idea works
well. Index vacuuming is the most time-consuming phase among the
vacuum phases, so it would not be a good idea to make it slower even
if we could do the parallel heap scan and heap vacuum without any
locking. Also, merging multiple TidStores into one is not
straightforward, since the block ranges that each worker processes
overlap.
Please find the attached patches. From the previous version, I made a
lot of changes, including bug fixes, addressing review comments, and
adding parallel heap vacuum support. The parallel vacuum related
infrastructure is implemented in vacuumparallel.c, and vacuumlazy.c
now uses ParallelVacuumState for parallel heap scan/vacuum, index
bulkdelete/cleanup, or both. Parallel vacuum workers are launched at
the beginning of each phase and exit at the end of it. Since different
numbers of workers could be used for heap scan/vacuum and index
bulkdelete/cleanup, it's possible that only one of heap scan/vacuum
and index bulkdelete/cleanup is parallelized.
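As a rough sketch of that per-phase flow, using function and flag names that appear in the attached patches (error handling and the surrounding logic are elided; this condenses, not reproduces, the patch):

    /* Phase 1: parallel heap scan -- the leader participates too. */
    phvstate->shared->do_heap_vacuum = false;
    phvstate->nworkers_launched = parallel_vacuum_table_scan_begin(vacrel->pvs);
    do_lazy_scan_heap(vacrel);
    parallel_vacuum_table_scan_end(vacrel->pvs);   /* waits for workers to exit */

    /* Phase 2: parallel index bulkdelete (existing infrastructure). */

    /* Phase 3: parallel heap vacuum, driven by a shared TidStore iteration. */
    phvstate->shared->do_heap_vacuum = true;
    phvstate->nworkers_launched = parallel_vacuum_table_scan_begin(vacrel->pvs);
    do_lazy_vacuum_heap_rel(vacrel, iter);
    parallel_vacuum_table_scan_end(vacrel->pvs);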
In order to implement parallel heap vacuum, I extended the radix tree
and TidStore to support shared iteration. A shared iteration works
only with a shared TidStore, whereas a non-shared iteration works with
a local TidStore as well as a shared one. For example, if a table is
large and has one index, we use only the parallel heap scan/vacuum. In
this case, we store dead item TIDs into a shared TidStore during the
parallel heap scan, but during index bulk-deletion we perform a
non-shared iteration on the shared TidStore, which is more efficient
as it doesn't acquire any locks during the iteration.
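To sketch how the shared-iteration API from the attached 0003 patch fits together, assuming (as the 0004 patch does) that the iterator handle travels to workers through the DSM area:

    /* Leader: begin a shared iteration and export its handle. */
    TidStoreIter *iter = TidStoreBeginIterateShared(dead_items);
    shared->shared_iter_handle = TidStoreGetSharedIterHandle(iter);
    /* ... launch workers, then consume blocks via TidStoreIterateNext(iter) ... */

    /* Worker: attach to the same iteration and consume blocks from it. */
    TidStoreIter *witer =
        TidStoreAttachIterateShared(dead_items, shared->shared_iter_handle);
    TidStoreIterResult *result;

    while ((result = TidStoreIterateNext(witer)) != NULL)
    {
        /* vacuum the LP_DEAD items recorded for this block */
    }
    TidStoreEndIterate(witer);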
I've done benchmark tests with a 10GB unlogged table (created on a
tmpfs tablespace) having 4 btree indexes, while changing the parallel
degree. I restarted the postgres server before each run to ensure that
the data is not already cached in shared memory, and I avoided disk
I/O during lazy vacuum as much as possible. Here is a comparison
between HEAD and patched, taking the median of 5 runs (times in
milliseconds; ratio = patched/HEAD, lower is better):

+----------+-----------+-----------+-------+
| parallel | HEAD      | patched   | ratio |
+----------+-----------+-----------+-------+
|        0 | 53079.530 | 53468.734 | 1.007 |
|        1 | 48101.460 | 35712.613 | 0.742 |
|        2 | 37767.902 | 23566.426 | 0.624 |
|        4 | 38005.836 | 20192.055 | 0.531 |
|        8 | 37754.470 | 18614.717 | 0.493 |
+----------+-----------+-----------+-------+
Here are the breakdowns of the execution times of each vacuum phase
(from left: heap scan, index bulkdel, heap vacuum):
- HEAD
parallel 0: 53079.530 (15886, 28039, 9270)
parallel 1: 48101.460 (15931, 23247, 9215)
parallel 2: 37767.902 (15259, 12888, 9479)
parallel 4: 38005.836 (16097, 12683, 9217)
parallel 8: 37754.470 (16016, 12535, 9306)
- Patched
parallel 0: 53468.734 (15990, 28296, 9465)
parallel 1: 35712.613 ( 8254, 23569, 3700)
parallel 2: 23566.426 ( 6180, 12760, 3283)
parallel 4: 20192.055 ( 4058, 12776, 2154)
parallel 8: 18614.717 ( 2797, 13244, 1579)
The index bulkdel phase saturates at parallel 2, as one worker is
assigned to each index. On HEAD, there is no further performance gain
beyond 'parallel 4'. On the other hand, the patched version got faster
even at 'parallel 4' and 'parallel 8', since the other two phases were
also done by parallel workers.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
v3-0004-Support-parallel-heap-vacuum-during-lazy-vacuum.patch
From dd9f54e11877f7de08b084eac1701b35859e0fbc Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 24 Oct 2024 17:37:45 -0700
Subject: [PATCH v3 4/4] Support parallel heap vacuum during lazy vacuum.
This commit further extends parallel vacuum to perform the heap vacuum
phase with parallel workers. It leverages the shared TidStore iteration.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
src/backend/access/heap/vacuumlazy.c | 157 ++++++++++++++++++---------
1 file changed, 106 insertions(+), 51 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fd6c054901..6c22ca5a62 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -160,6 +160,7 @@ typedef struct LVRelScanStats
BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+ BlockNumber vacuumed_pages; /* # pages vacuumed in one second-pass cycle */
/* Counters that follow are only for scanned_pages */
int64 tuples_deleted; /* # deleted from table */
@@ -192,6 +193,9 @@ typedef struct PHVShared
struct VacuumCutoffs cutoffs;
GlobalVisState vistest;
+ dsa_pointer shared_iter_handle;
+ bool do_heap_vacuum;
+
/* per-worker scan stats for parallel heap vacuum scan */
LVRelScanStats worker_scan_stats[FLEXIBLE_ARRAY_MEMBER];
} PHVShared;
@@ -353,6 +357,7 @@ static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
static void lazy_vacuum(LVRelState *vacrel);
static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
static void lazy_vacuum_heap_rel(LVRelState *vacrel);
+static void do_lazy_vacuum_heap_rel(LVRelState *vacrel, TidStoreIter *iter);
static void lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
Buffer buffer, OffsetNumber *deadoffsets,
int num_offsets, Buffer vmbuffer);
@@ -531,6 +536,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
scan_stats->lpdead_item_pages = 0;
scan_stats->missed_dead_pages = 0;
scan_stats->nonempty_pages = 0;
+ scan_stats->vacuumed_pages = 0;
/* Initialize remaining counters (be tidy) */
scan_stats->tuples_deleted = 0;
@@ -2363,46 +2369,14 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
return allindexes;
}
-/*
- * lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
- *
- * This routine marks LP_DEAD items in vacrel->dead_items as LP_UNUSED. Pages
- * that never had lazy_scan_prune record LP_DEAD items are not visited at all.
- *
- * We may also be able to truncate the line pointer array of the heap pages we
- * visit. If there is a contiguous group of LP_UNUSED items at the end of the
- * array, it can be reclaimed as free space. These LP_UNUSED items usually
- * start out as LP_DEAD items recorded by lazy_scan_prune (we set items from
- * each page to LP_UNUSED, and then consider if it's possible to truncate the
- * page's line pointer array).
- *
- * Note: the reason for doing this as a second pass is we cannot remove the
- * tuples until we've removed their index entries, and we want to process
- * index entry removal in batches as large as possible.
- */
static void
-lazy_vacuum_heap_rel(LVRelState *vacrel)
+do_lazy_vacuum_heap_rel(LVRelState *vacrel, TidStoreIter *iter)
{
- BlockNumber vacuumed_pages = 0;
Buffer vmbuffer = InvalidBuffer;
- LVSavedErrInfo saved_err_info;
- TidStoreIter *iter;
- TidStoreIterResult *iter_result;
-
- Assert(vacrel->do_index_vacuuming);
- Assert(vacrel->do_index_cleanup);
- Assert(vacrel->num_index_scans > 0);
- /* Report that we are now vacuuming the heap */
- pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
-
- /* Update error traceback information */
- update_vacuum_error_info(vacrel, &saved_err_info,
- VACUUM_ERRCB_PHASE_VACUUM_HEAP,
- InvalidBlockNumber, InvalidOffsetNumber);
+ /* LVSavedErrInfo saved_err_info; */
+ TidStoreIterResult *iter_result;
- iter = TidStoreBeginIterate(vacrel->dead_items);
while ((iter_result = TidStoreIterateNext(iter)) != NULL)
{
BlockNumber blkno;
@@ -2440,26 +2414,88 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
UnlockReleaseBuffer(buf);
RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
- vacuumed_pages++;
+ vacrel->scan_stats->vacuumed_pages++;
}
- TidStoreEndIterate(iter);
vacrel->blkno = InvalidBlockNumber;
if (BufferIsValid(vmbuffer))
ReleaseBuffer(vmbuffer);
+}
+
+/*
+ * lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
+ *
+ * This routine marks LP_DEAD items in vacrel->dead_items as LP_UNUSED. Pages
+ * that never had lazy_scan_prune record LP_DEAD items are not visited at all.
+ *
+ * We may also be able to truncate the line pointer array of the heap pages we
+ * visit. If there is a contiguous group of LP_UNUSED items at the end of the
+ * array, it can be reclaimed as free space. These LP_UNUSED items usually
+ * start out as LP_DEAD items recorded by lazy_scan_prune (we set items from
+ * each page to LP_UNUSED, and then consider if it's possible to truncate the
+ * page's line pointer array).
+ *
+ * Note: the reason for doing this as a second pass is we cannot remove the
+ * tuples until we've removed their index entries, and we want to process
+ * index entry removal in batches as large as possible.
+ */
+static void
+lazy_vacuum_heap_rel(LVRelState *vacrel)
+{
+ LVSavedErrInfo saved_err_info;
+ TidStoreIter *iter;
+
+ Assert(vacrel->do_index_vacuuming);
+ Assert(vacrel->do_index_cleanup);
+ Assert(vacrel->num_index_scans > 0);
+
+ /* Report that we are now vacuuming the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
+
+ /* Update error traceback information */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ vacrel->scan_stats->vacuumed_pages = 0;
+
+ if (ParallelHeapVacuumIsActive(vacrel))
+ {
+ PHVState *phvstate = vacrel->phvstate;
+
+ iter = TidStoreBeginIterateShared(vacrel->dead_items);
+
+ phvstate->shared->do_heap_vacuum = true;
+ phvstate->shared->shared_iter_handle = TidStoreGetSharedIterHandle(iter);
+
+ /* launch workers */
+ vacrel->phvstate->nworkers_launched = parallel_vacuum_table_scan_begin(vacrel->pvs);
+ }
+ else
+ iter = TidStoreBeginIterate(vacrel->dead_items);
+
+ /* do the real work */
+ do_lazy_vacuum_heap_rel(vacrel, iter);
+
+ if (ParallelHeapVacuumIsActive(vacrel))
+ parallel_vacuum_table_scan_end(vacrel->pvs);
+
+ TidStoreEndIterate(iter);
+
/*
* We set all LP_DEAD items from the first heap pass to LP_UNUSED during
* the second heap pass. No more, no less.
*/
Assert(vacrel->num_index_scans > 1 ||
(vacrel->dead_items_info->num_items == vacrel->scan_stats->lpdead_items &&
- vacuumed_pages == vacrel->scan_stats->lpdead_item_pages));
+ vacrel->scan_stats->vacuumed_pages == vacrel->scan_stats->lpdead_item_pages));
ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
vacrel->relname, (long long) vacrel->dead_items_info->num_items,
- vacuumed_pages)));
+ vacrel->scan_stats->vacuumed_pages)));
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3563,7 +3599,6 @@ heap_parallel_vacuum_scan_worker(Relation rel, ParallelVacuumState *pvs,
PHVScanWorkerState *scanstate;
LVRelScanStats *scan_stats;
ErrorContextCallback errcallback;
- bool scan_done;
phvstate = palloc(sizeof(PHVState));
@@ -3625,25 +3660,44 @@ heap_parallel_vacuum_scan_worker(Relation rel, ParallelVacuumState *pvs,
vacrel.relnamespace = get_database_name(RelationGetNamespace(rel));
vacrel.relname = pstrdup(RelationGetRelationName(rel));
vacrel.indname = NULL;
- vacrel.phase = VACUUM_ERRCB_PHASE_SCAN_HEAP;
errcallback.callback = vacuum_error_callback;
errcallback.arg = &vacrel;
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
- scan_done = do_lazy_scan_heap(&vacrel);
+ if (shared->do_heap_vacuum)
+ {
+ TidStoreIter *iter;
+
+ iter = TidStoreAttachIterateShared(vacrel.dead_items, shared->shared_iter_handle);
+
+ /* Join parallel heap vacuum */
+ vacrel.phase = VACUUM_ERRCB_PHASE_VACUUM_HEAP;
+ do_lazy_vacuum_heap_rel(&vacrel, iter);
+
+ TidStoreEndIterate(iter);
+ }
+ else
+ {
+ bool scan_done;
+
+ /* Join parallel heap scan */
+ vacrel.phase = VACUUM_ERRCB_PHASE_SCAN_HEAP;
+ scan_done = do_lazy_scan_heap(&vacrel);
+
+ /*
+ * If the leader or a worker finishes the heap scan because the
+ * dead_items TIDs are close to the limit, it might have some allocated
+ * blocks in its scan state. Since this scan state might not be used in
+ * the next heap scan, we remember that it might have some unconsumed
+ * blocks so that the leader completes the scans after the heap scan
+ * phase finishes.
+ */
+ phvstate->myscanstate->maybe_have_blocks = !scan_done;
+ }
/* Pop the error context stack */
error_context_stack = errcallback.previous;
-
- /*
- * If the leader or a worker finishes the heap scan because dead_items
- * TIDs is close to the limit, it might have some allocated blocks in its
- * scan state. Since this scan state might not be used in the next heap
- * scan, we remember that it might have some unconsumed blocks so that the
- * leader complete the scans after the heap scan phase finishes.
- */
- phvstate->myscanstate->maybe_have_blocks = !scan_done;
}
/*
@@ -3771,6 +3825,7 @@ do_parallel_lazy_scan_heap(LVRelState *vacrel)
Assert(!IsParallelWorker());
/* launcher workers */
+ vacrel->phvstate->shared->do_heap_vacuum = false;
vacrel->phvstate->nworkers_launched = parallel_vacuum_table_scan_begin(vacrel->pvs);
/* initialize parallel scan description to join as a worker */
--
2.43.5
v3-0003-Support-shared-itereation-on-TidStore.patch
From 09b7bcd6c8e3fbc9438c6edf1aac75a55b3909be Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 24 Oct 2024 17:34:57 -0700
Subject: [PATCH v3 3/4] Support shared iteration on TidStore.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
src/backend/access/common/tidstore.c | 59 ++++++++++++++++++++++++++++
src/include/access/tidstore.h | 3 ++
2 files changed, 62 insertions(+)
diff --git a/src/backend/access/common/tidstore.c b/src/backend/access/common/tidstore.c
index a7179759d6..637d26012d 100644
--- a/src/backend/access/common/tidstore.c
+++ b/src/backend/access/common/tidstore.c
@@ -483,6 +483,7 @@ TidStoreBeginIterate(TidStore *ts)
iter = palloc0(sizeof(TidStoreIter));
iter->ts = ts;
+ /* begin iteration on the radix tree */
if (TidStoreIsShared(ts))
iter->tree_iter.shared = shared_ts_begin_iterate(ts->tree.shared);
else
@@ -533,6 +534,56 @@ TidStoreEndIterate(TidStoreIter *iter)
pfree(iter);
}
+/*
+ * Prepare to iterate through a shared TidStore in shared mode. This function
+ * is intended to start the iteration on the given TidStore with parallel workers.
+ *
+ * The TidStoreIter struct is created in the caller's memory context, and it
+ * will be freed in TidStoreEndIterate.
+ *
+ * The caller is responsible for locking TidStore until the iteration is
+ * finished.
+ */
+TidStoreIter *
+TidStoreBeginIterateShared(TidStore *ts)
+{
+ TidStoreIter *iter;
+
+ if (!TidStoreIsShared(ts))
+ elog(ERROR, "cannot begin shared iteration on local TidStore");
+
+ iter = palloc0(sizeof(TidStoreIter));
+ iter->ts = ts;
+
+ /* begin the shared iteration on radix tree */
+ iter->tree_iter.shared =
+ (shared_ts_iter *) shared_ts_begin_iterate_shared(ts->tree.shared);
+
+ return iter;
+}
+
+/*
+ * Attach to the shared TidStore iterator. 'iter_handle' is the dsa_pointer
+ * returned by TidStoreGetSharedIterHandle(). The returned object is allocated
+ * in backend-local memory using CurrentMemoryContext.
+ */
+TidStoreIter *
+TidStoreAttachIterateShared(TidStore *ts, dsa_pointer iter_handle)
+{
+ TidStoreIter *iter;
+
+ Assert(TidStoreIsShared(ts));
+
+ iter = palloc0(sizeof(TidStoreIter));
+ iter->ts = ts;
+
+ /* Attach to the shared iterator */
+ iter->tree_iter.shared = shared_ts_attach_iterate_shared(ts->tree.shared,
+ iter_handle);
+
+ return iter;
+}
+
/*
* Return the memory usage of TidStore.
*/
@@ -564,6 +615,14 @@ TidStoreGetHandle(TidStore *ts)
return (dsa_pointer) shared_ts_get_handle(ts->tree.shared);
}
+dsa_pointer
+TidStoreGetSharedIterHandle(TidStoreIter *iter)
+{
+ Assert(TidStoreIsShared(iter->ts));
+
+ return (dsa_pointer) shared_ts_get_iter_handle(iter->tree_iter.shared);
+}
+
/*
* Given a TidStoreIterResult returned by TidStoreIterateNext(), extract the
* offset numbers. Returns the number of offsets filled in, if <=
diff --git a/src/include/access/tidstore.h b/src/include/access/tidstore.h
index d95cabd7b5..0c79a101fd 100644
--- a/src/include/access/tidstore.h
+++ b/src/include/access/tidstore.h
@@ -37,6 +37,9 @@ extern void TidStoreDetach(TidStore *ts);
extern void TidStoreLockExclusive(TidStore *ts);
extern void TidStoreLockShare(TidStore *ts);
extern void TidStoreUnlock(TidStore *ts);
+extern TidStoreIter *TidStoreBeginIterateShared(TidStore *ts);
+extern TidStoreIter *TidStoreAttachIterateShared(TidStore *ts, dsa_pointer iter_handle);
+extern dsa_pointer TidStoreGetSharedIterHandle(TidStoreIter *iter);
extern void TidStoreDestroy(TidStore *ts);
extern void TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumber *offsets,
int num_offsets);
--
2.43.5
v3-0001-Support-parallel-heap-scan-during-lazy-vacuum.patch
From a8c8a2bbf943b157eb6f0e754cb9aaa432e5bce3 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 1 Jul 2024 15:17:46 +0900
Subject: [PATCH v3 1/4] Support parallel heap scan during lazy vacuum.
Commit 40d964ec99 allowed vacuum command to process indexes in
parallel. This change extends the parallel vacuum to support parallel
heap scan during lazy vacuum.
---
src/backend/access/heap/heapam_handler.c | 6 +
src/backend/access/heap/vacuumlazy.c | 1135 ++++++++++++++++++----
src/backend/commands/vacuumparallel.c | 311 +++++-
src/backend/storage/ipc/procarray.c | 9 -
src/include/access/heapam.h | 8 +
src/include/access/tableam.h | 87 ++
src/include/commands/vacuum.h | 8 +-
src/include/utils/snapmgr.h | 14 +-
8 files changed, 1313 insertions(+), 265 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 8c59b77b64..c8602f4d30 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2625,6 +2625,12 @@ static const TableAmRoutine heapam_methods = {
.relation_copy_data = heapam_relation_copy_data,
.relation_copy_for_cluster = heapam_relation_copy_for_cluster,
.relation_vacuum = heap_vacuum_rel,
+
+ .parallel_vacuum_compute_workers = heap_parallel_vacuum_compute_workers,
+ .parallel_vacuum_estimate = heap_parallel_vacuum_estimate,
+ .parallel_vacuum_initialize = heap_parallel_vacuum_initialize,
+ .parallel_vacuum_scan_worker = heap_parallel_vacuum_scan_worker,
+
.scan_analyze_next_block = heapam_scan_analyze_next_block,
.scan_analyze_next_tuple = heapam_scan_analyze_next_tuple,
.index_build_range_scan = heapam_index_build_range_scan,
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d82aa3d489..fd6c054901 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -49,6 +49,7 @@
#include "common/int.h"
#include "executor/instrument.h"
#include "miscadmin.h"
+#include "optimizer/paths.h"
#include "pgstat.h"
#include "portability/instr_time.h"
#include "postmaster/autovacuum.h"
@@ -117,10 +118,24 @@
#define PREFETCH_SIZE ((BlockNumber) 32)
/*
- * Macro to check if we are in a parallel vacuum. If true, we are in the
- * parallel mode and the DSM segment is initialized.
+ * DSM keys for heap parallel vacuum scan. Unlike other parallel execution code,
+ * we don't need to worry about DSM keys conflicting with plan_node_id, but we
+ * need to avoid conflicting with DSM keys used in vacuumparallel.c.
+ */
+#define LV_PARALLEL_SCAN_SHARED 0xFFFF0001
+#define LV_PARALLEL_SCAN_DESC 0xFFFF0002
+#define LV_PARALLEL_SCAN_DESC_WORKER 0xFFFF0003
+
+/*
+ * Macros to check if we are in parallel heap vacuuming, parallel index vacuuming,
+ * or both. If ParallelVacuumIsActive() is true, we are in parallel mode, meaning
+ * that the dead item TIDs are stored in a shared memory area.
*/
#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
+#define ParallelIndexVacuumIsActive(vacrel) \
+ (ParallelVacuumIsActive(vacrel) && parallel_vacuum_get_nworkers_index((vacrel)->pvs) > 0)
+#define ParallelHeapVacuumIsActive(vacrel) \
+ (ParallelVacuumIsActive(vacrel) && parallel_vacuum_get_nworkers_table((vacrel)->pvs) > 0)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -133,6 +148,108 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE,
} VacErrPhase;
+/*
+ * Relation statistics collected during heap scanning that need to be shared
+ * among parallel vacuum workers.
+ */
+typedef struct LVRelScanStats
+{
+ BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */
+ BlockNumber removed_pages; /* # pages removed by relation truncation */
+ BlockNumber frozen_pages; /* # pages with newly frozen tuples */
+ BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
+ BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
+ BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+
+ /* Counters that follow are only for scanned_pages */
+ int64 tuples_deleted; /* # deleted from table */
+ int64 tuples_frozen; /* # newly frozen */
+ int64 lpdead_items; /* # deleted from indexes */
+ int64 live_tuples; /* # live tuples remaining */
+ int64 recently_dead_tuples; /* # dead, but not yet removable */
+ int64 missed_dead_tuples; /* # removable, but not removed */
+
+ /* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid. */
+ TransactionId NewRelfrozenXid;
+ MultiXactId NewRelminMxid;
+ bool skippedallvis;
+} LVRelScanStats;
+
+/*
+ * Struct for information that needs to be shared among parallel vacuum workers
+ */
+typedef struct PHVShared
+{
+ bool aggressive;
+ bool skipwithvm;
+
+ /* The initial values shared by the leader process */
+ TransactionId NewRelfrozenXid;
+ MultiXactId NewRelminMxid;
+ bool skippedallvis;
+
+ /* VACUUM operation's cutoffs for freezing and pruning */
+ struct VacuumCutoffs cutoffs;
+ GlobalVisState vistest;
+
+ /* per-worker scan stats for parallel heap vacuum scan */
+ LVRelScanStats worker_scan_stats[FLEXIBLE_ARRAY_MEMBER];
+} PHVShared;
+#define SizeOfPHVShared (offsetof(PHVShared, worker_scan_stats))
+
+/* Per-worker scan state for parallel heap vacuum scan */
+typedef struct PHVScanWorkerState
+{
+ bool initialized;
+
+ /* per-worker parallel table scan state */
+ ParallelBlockTableScanWorkerData state;
+
+ /*
+ * True if a parallel vacuum scan worker allocated blocks in its scan state
+ * but might not have scanned all of them. The leader process will take over
+ * scanning these remaining blocks.
+ */
+ bool maybe_have_blocks;
+
+ /* current block number being processed */
+ pg_atomic_uint32 cur_blkno;
+} PHVScanWorkerState;
+
+/* Struct for parallel heap vacuum */
+typedef struct PHVState
+{
+ /* Parallel scan description shared among parallel workers */
+ ParallelBlockTableScanDesc pscandesc;
+
+ /* Shared information */
+ PHVShared *shared;
+
+ /*
+ * Points to all per-worker scan state array stored on DSM area.
+ *
+ * During parallel heap scan, each worker allocates some chunks of blocks
+ * to scan in its scan state, and could exit while leaving some chunks
+ * un-scanned if the size of dead_items TIDs is close to overrunning the
+ * available space. We store scan states on the shared memory area so that
+ * workers can resume heap scans from the previous point.
+ */
+ PHVScanWorkerState *scanstates;
+
+ /* Assigned per-worker scan state */
+ PHVScanWorkerState *myscanstate;
+
+ /*
+ * All blocks up to this value have been scanned, i.e. the minimum of cur_blkno
+ * among all PHVScanWorkerState. It's updated by
+ * parallel_heap_vacuum_compute_min_blkno().
+ */
+ BlockNumber min_blkno;
+
+ /* The number of workers launched for parallel heap vacuum */
+ int nworkers_launched;
+} PHVState;
+
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -144,6 +261,9 @@ typedef struct LVRelState
BufferAccessStrategy bstrategy;
ParallelVacuumState *pvs;
+ /* Parallel heap vacuum state and sizes for each struct */
+ PHVState *phvstate;
+
/* Aggressive VACUUM? (must set relfrozenxid >= FreezeLimit) */
bool aggressive;
/* Use visibility map to skip? (disabled by DISABLE_PAGE_SKIPPING) */
@@ -159,10 +279,6 @@ typedef struct LVRelState
/* VACUUM operation's cutoffs for freezing and pruning */
struct VacuumCutoffs cutoffs;
GlobalVisState *vistest;
- /* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
- TransactionId NewRelfrozenXid;
- MultiXactId NewRelminMxid;
- bool skippedallvis;
/* Error reporting state */
char *dbname;
@@ -188,12 +304,10 @@ typedef struct LVRelState
VacDeadItemsInfo *dead_items_info;
BlockNumber rel_pages; /* total number of pages */
- BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */
- BlockNumber removed_pages; /* # pages removed by relation truncation */
- BlockNumber frozen_pages; /* # pages with newly frozen tuples */
- BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
- BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
- BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+ BlockNumber next_fsm_block_to_vacuum;
+
+ /* Statistics collected during heap scan */
+ LVRelScanStats *scan_stats;
/* Statistics output by us, for table */
double new_rel_tuples; /* new estimated total # of tuples */
@@ -203,13 +317,6 @@ typedef struct LVRelState
/* Instrumentation counters */
int num_index_scans;
- /* Counters that follow are only for scanned_pages */
- int64 tuples_deleted; /* # deleted from table */
- int64 tuples_frozen; /* # newly frozen */
- int64 lpdead_items; /* # deleted from indexes */
- int64 live_tuples; /* # live tuples remaining */
- int64 recently_dead_tuples; /* # dead, but not yet removable */
- int64 missed_dead_tuples; /* # removable, but not removed */
/* State maintained by heap_vac_scan_next_block() */
BlockNumber current_block; /* last block returned */
@@ -229,6 +336,7 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
+static bool do_lazy_scan_heap(LVRelState *vacrel);
static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
bool *all_visible_according_to_vm);
static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
@@ -271,6 +379,12 @@ static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
static void update_relstats_all_indexes(LVRelState *vacrel);
+
+static void do_parallel_lazy_scan_heap(LVRelState *vacrel);
+static void parallel_heap_vacuum_compute_min_blkno(LVRelState *vacrel);
+static void parallel_heap_vacuum_gather_scan_stats(LVRelState *vacrel);
+static void parallel_heap_complete_unfinised_scan(LVRelState *vacrel);
+
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -296,6 +410,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
BufferAccessStrategy bstrategy)
{
LVRelState *vacrel;
+ LVRelScanStats *scan_stats;
bool verbose,
instrument,
skipwithvm,
@@ -406,14 +521,28 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
Assert(params->index_cleanup == VACOPTVALUE_AUTO);
}
+ vacrel->next_fsm_block_to_vacuum = 0;
+
/* Initialize page counters explicitly (be tidy) */
- vacrel->scanned_pages = 0;
- vacrel->removed_pages = 0;
- vacrel->frozen_pages = 0;
- vacrel->lpdead_item_pages = 0;
- vacrel->missed_dead_pages = 0;
- vacrel->nonempty_pages = 0;
- /* dead_items_alloc allocates vacrel->dead_items later on */
+ scan_stats = palloc(sizeof(LVRelScanStats));
+ scan_stats->scanned_pages = 0;
+ scan_stats->removed_pages = 0;
+ scan_stats->frozen_pages = 0;
+ scan_stats->lpdead_item_pages = 0;
+ scan_stats->missed_dead_pages = 0;
+ scan_stats->nonempty_pages = 0;
+
+ /* Initialize remaining counters (be tidy) */
+ scan_stats->tuples_deleted = 0;
+ scan_stats->tuples_frozen = 0;
+ scan_stats->lpdead_items = 0;
+ scan_stats->live_tuples = 0;
+ scan_stats->recently_dead_tuples = 0;
+ scan_stats->missed_dead_tuples = 0;
+
+ vacrel->scan_stats = scan_stats;
+
+ vacrel->num_index_scans = 0;
/* Allocate/initialize output statistics state */
vacrel->new_rel_tuples = 0;
@@ -421,14 +550,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->indstats = (IndexBulkDeleteResult **)
palloc0(vacrel->nindexes * sizeof(IndexBulkDeleteResult *));
- /* Initialize remaining counters (be tidy) */
- vacrel->num_index_scans = 0;
- vacrel->tuples_deleted = 0;
- vacrel->tuples_frozen = 0;
- vacrel->lpdead_items = 0;
- vacrel->live_tuples = 0;
- vacrel->recently_dead_tuples = 0;
- vacrel->missed_dead_tuples = 0;
+ /* dead_items_alloc allocates vacrel->dead_items later on */
/*
* Get cutoffs that determine which deleted tuples are considered DEAD,
@@ -450,9 +572,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
vacrel->vistest = GlobalVisTestFor(rel);
/* Initialize state used to track oldest extant XID/MXID */
- vacrel->NewRelfrozenXid = vacrel->cutoffs.OldestXmin;
- vacrel->NewRelminMxid = vacrel->cutoffs.OldestMxact;
- vacrel->skippedallvis = false;
+ vacrel->scan_stats->NewRelfrozenXid = vacrel->cutoffs.OldestXmin;
+ vacrel->scan_stats->NewRelminMxid = vacrel->cutoffs.OldestMxact;
+ vacrel->scan_stats->skippedallvis = false;
skipwithvm = true;
if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
{
@@ -533,15 +655,15 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* value >= FreezeLimit, and relminmxid to a value >= MultiXactCutoff.
* Non-aggressive VACUUMs may advance them by any amount, or not at all.
*/
- Assert(vacrel->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
+ Assert(vacrel->scan_stats->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
TransactionIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.FreezeLimit :
vacrel->cutoffs.relfrozenxid,
- vacrel->NewRelfrozenXid));
- Assert(vacrel->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
+ vacrel->scan_stats->NewRelfrozenXid));
+ Assert(vacrel->scan_stats->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
MultiXactIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.MultiXactCutoff :
vacrel->cutoffs.relminmxid,
- vacrel->NewRelminMxid));
- if (vacrel->skippedallvis)
+ vacrel->scan_stats->NewRelminMxid));
+ if (vacrel->scan_stats->skippedallvis)
{
/*
* Must keep original relfrozenxid in a non-aggressive VACUUM that
@@ -549,8 +671,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* values will have missed unfrozen XIDs from the pages we skipped.
*/
Assert(!vacrel->aggressive);
- vacrel->NewRelfrozenXid = InvalidTransactionId;
- vacrel->NewRelminMxid = InvalidMultiXactId;
+ vacrel->scan_stats->NewRelfrozenXid = InvalidTransactionId;
+ vacrel->scan_stats->NewRelminMxid = InvalidMultiXactId;
}
/*
@@ -571,7 +693,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
*/
vac_update_relstats(rel, new_rel_pages, vacrel->new_live_tuples,
new_rel_allvisible, vacrel->nindexes > 0,
- vacrel->NewRelfrozenXid, vacrel->NewRelminMxid,
+ vacrel->scan_stats->NewRelfrozenXid, vacrel->scan_stats->NewRelminMxid,
&frozenxid_updated, &minmulti_updated, false);
/*
@@ -587,8 +709,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
pgstat_report_vacuum(RelationGetRelid(rel),
rel->rd_rel->relisshared,
Max(vacrel->new_live_tuples, 0),
- vacrel->recently_dead_tuples +
- vacrel->missed_dead_tuples);
+ vacrel->scan_stats->recently_dead_tuples +
+ vacrel->scan_stats->missed_dead_tuples);
pgstat_progress_end_command();
if (instrument)
@@ -661,21 +783,21 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->relname,
vacrel->num_index_scans);
appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total)\n"),
- vacrel->removed_pages,
+ vacrel->scan_stats->removed_pages,
new_rel_pages,
- vacrel->scanned_pages,
+ vacrel->scan_stats->scanned_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->scanned_pages / orig_rel_pages);
+ 100.0 * vacrel->scan_stats->scanned_pages / orig_rel_pages);
appendStringInfo(&buf,
_("tuples: %lld removed, %lld remain, %lld are dead but not yet removable\n"),
- (long long) vacrel->tuples_deleted,
+ (long long) vacrel->scan_stats->tuples_deleted,
(long long) vacrel->new_rel_tuples,
- (long long) vacrel->recently_dead_tuples);
- if (vacrel->missed_dead_tuples > 0)
+ (long long) vacrel->scan_stats->recently_dead_tuples);
+ if (vacrel->scan_stats->missed_dead_tuples > 0)
appendStringInfo(&buf,
_("tuples missed: %lld dead from %u pages not removed due to cleanup lock contention\n"),
- (long long) vacrel->missed_dead_tuples,
- vacrel->missed_dead_pages);
+ (long long) vacrel->scan_stats->missed_dead_tuples,
+ vacrel->scan_stats->missed_dead_pages);
diff = (int32) (ReadNextTransactionId() -
vacrel->cutoffs.OldestXmin);
appendStringInfo(&buf,
@@ -683,25 +805,25 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->cutoffs.OldestXmin, diff);
if (frozenxid_updated)
{
- diff = (int32) (vacrel->NewRelfrozenXid -
+ diff = (int32) (vacrel->scan_stats->NewRelfrozenXid -
vacrel->cutoffs.relfrozenxid);
appendStringInfo(&buf,
_("new relfrozenxid: %u, which is %d XIDs ahead of previous value\n"),
- vacrel->NewRelfrozenXid, diff);
+ vacrel->scan_stats->NewRelfrozenXid, diff);
}
if (minmulti_updated)
{
- diff = (int32) (vacrel->NewRelminMxid -
+ diff = (int32) (vacrel->scan_stats->NewRelminMxid -
vacrel->cutoffs.relminmxid);
appendStringInfo(&buf,
_("new relminmxid: %u, which is %d MXIDs ahead of previous value\n"),
- vacrel->NewRelminMxid, diff);
+ vacrel->scan_stats->NewRelminMxid, diff);
}
appendStringInfo(&buf, _("frozen: %u pages from table (%.2f%% of total) had %lld tuples frozen\n"),
- vacrel->frozen_pages,
+ vacrel->scan_stats->frozen_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->frozen_pages / orig_rel_pages,
- (long long) vacrel->tuples_frozen);
+ 100.0 * vacrel->scan_stats->frozen_pages / orig_rel_pages,
+ (long long) vacrel->scan_stats->tuples_frozen);
if (vacrel->do_index_vacuuming)
{
if (vacrel->nindexes == 0 || vacrel->num_index_scans == 0)
@@ -721,10 +843,10 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
msgfmt = _("%u pages from table (%.2f%% of total) have %lld dead item identifiers\n");
}
appendStringInfo(&buf, msgfmt,
- vacrel->lpdead_item_pages,
+ vacrel->scan_stats->lpdead_item_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->lpdead_item_pages / orig_rel_pages,
- (long long) vacrel->lpdead_items);
+ 100.0 * vacrel->scan_stats->lpdead_item_pages / orig_rel_pages,
+ (long long) vacrel->scan_stats->lpdead_items);
for (int i = 0; i < vacrel->nindexes; i++)
{
IndexBulkDeleteResult *istat = vacrel->indstats[i];
@@ -825,14 +947,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel)
{
- BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
- next_fsm_block_to_vacuum = 0;
- bool all_visible_according_to_vm;
-
- TidStore *dead_items = vacrel->dead_items;
+ BlockNumber rel_pages = vacrel->rel_pages;
VacDeadItemsInfo *dead_items_info = vacrel->dead_items_info;
- Buffer vmbuffer = InvalidBuffer;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -852,6 +968,72 @@ lazy_scan_heap(LVRelState *vacrel)
vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
+ /*
+ * Do the actual work. If parallel heap vacuum is active, we scan and
+ * vacuum heap with parallel workers.
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ do_parallel_lazy_scan_heap(vacrel);
+ else
+ do_lazy_scan_heap(vacrel);
+
+ /* report that everything is now scanned */
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, rel_pages);
+
+ /* now we can compute the new value for pg_class.reltuples */
+ vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
+ vacrel->scan_stats->scanned_pages,
+ vacrel->scan_stats->live_tuples);
+
+ /*
+ * Also compute the total number of surviving heap entries. In the
+ * (unlikely) scenario that new_live_tuples is -1, take it as zero.
+ */
+ vacrel->new_rel_tuples =
+ Max(vacrel->new_live_tuples, 0) + vacrel->scan_stats->recently_dead_tuples +
+ vacrel->scan_stats->missed_dead_tuples;
+
+ /*
+ * Do index vacuuming (call each index's ambulkdelete routine), then do
+ * related heap vacuuming
+ */
+ if (dead_items_info->num_items > 0)
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the remainder of the Free Space Map. We must do this whether or
+ * not there were indexes, and whether or not we bypassed index vacuuming.
+ */
+ if (rel_pages > vacrel->next_fsm_block_to_vacuum)
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ rel_pages);
+
+ /* report all blocks vacuumed */
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, rel_pages);
+
+ /* Do final index cleanup (call each index's amvacuumcleanup routine) */
+ if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
+ lazy_cleanup_all_indexes(vacrel);
+}
+
+/*
+ * Workhorse for lazy_scan_heap().
+ *
+ * Return true if we processed all blocks, or false if we exited before completing
+ * the heap scan because the space for dead item TIDs filled up. In the serial heap
+ * scan case, this function always returns true. In a parallel heap vacuum scan, this
+ * function is called by both worker processes and the leader process, and could return false.
+ */
+static bool
+do_lazy_scan_heap(LVRelState *vacrel)
+{
+ bool all_visible_according_to_vm;
+ TidStore *dead_items = vacrel->dead_items;
+ VacDeadItemsInfo *dead_items_info = vacrel->dead_items_info;
+ BlockNumber blkno;
+ Buffer vmbuffer = InvalidBuffer;
+ bool scan_done = true;
+
while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
{
Buffer buf;
@@ -859,13 +1041,20 @@ lazy_scan_heap(LVRelState *vacrel)
bool has_lpdead_items;
bool got_cleanup_lock = false;
- vacrel->scanned_pages++;
+ vacrel->scan_stats->scanned_pages++;
/* Report as block scanned, update error traceback information */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
update_vacuum_error_info(vacrel, NULL, VACUUM_ERRCB_PHASE_SCAN_HEAP,
blkno, InvalidOffsetNumber);
+ /*
+ * If parallel vacuum scan is enabled, advertise the current block
+ * number
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ pg_atomic_write_u32(&(vacrel->phvstate->myscanstate->cur_blkno), (uint32) blkno);
+
vacuum_delay_point();
/*
@@ -877,46 +1066,10 @@ lazy_scan_heap(LVRelState *vacrel)
* one-pass strategy, and the two-pass strategy with the index_cleanup
* param set to 'off'.
*/
- if (vacrel->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
+ if (!IsParallelWorker() &&
+ vacrel->scan_stats->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
lazy_check_wraparound_failsafe(vacrel);
- /*
- * Consider if we definitely have enough space to process TIDs on page
- * already. If we are close to overrunning the available space for
- * dead_items TIDs, pause and do a cycle of vacuuming before we tackle
- * this page.
- */
- if (TidStoreMemoryUsage(dead_items) > dead_items_info->max_bytes)
- {
- /*
- * Before beginning index vacuuming, we release any pin we may
- * hold on the visibility map page. This isn't necessary for
- * correctness, but we do it anyway to avoid holding the pin
- * across a lengthy, unrelated operation.
- */
- if (BufferIsValid(vmbuffer))
- {
- ReleaseBuffer(vmbuffer);
- vmbuffer = InvalidBuffer;
- }
-
- /* Perform a round of index and heap vacuuming */
- vacrel->consider_bypass_optimization = false;
- lazy_vacuum(vacrel);
-
- /*
- * Vacuum the Free Space Map to make newly-freed space visible on
- * upper-level FSM pages. Note we have not yet processed blkno.
- */
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
- blkno);
- next_fsm_block_to_vacuum = blkno;
-
- /* Report that we are once again scanning the heap */
- pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_SCAN_HEAP);
- }
-
/*
* Pin the visibility map page in case we need to mark the page
* all-visible. In most cases this will be very cheap, because we'll
@@ -1005,9 +1158,10 @@ lazy_scan_heap(LVRelState *vacrel)
* revisit this page. Since updating the FSM is desirable but not
* absolutely required, that's OK.
*/
- if (vacrel->nindexes == 0
- || !vacrel->do_index_vacuuming
- || !has_lpdead_items)
+ if (!IsParallelWorker() &&
+ (vacrel->nindexes == 0
+ || !vacrel->do_index_vacuuming
+ || !has_lpdead_items))
{
Size freespace = PageGetHeapFreeSpace(page);
@@ -1021,57 +1175,172 @@ lazy_scan_heap(LVRelState *vacrel)
* held the cleanup lock and lazy_scan_prune() was called.
*/
if (got_cleanup_lock && vacrel->nindexes == 0 && has_lpdead_items &&
- blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+ blkno - vacrel->next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
{
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
- blkno);
- next_fsm_block_to_vacuum = blkno;
+ BlockNumber fsm_vac_up_to;
+
+ /*
+ * If parallel heap vacuum scan is active, compute the minimum
+ * block number we scanned so far.
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ {
+ parallel_heap_vacuum_compute_min_blkno(vacrel);
+ fsm_vac_up_to = vacrel->phvstate->min_blkno;
+ }
+ else
+ {
+ /* blkno is already processed */
+ fsm_vac_up_to = blkno + 1;
+ }
+
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ fsm_vac_up_to);
+ vacrel->next_fsm_block_to_vacuum = fsm_vac_up_to;
}
}
else
UnlockReleaseBuffer(buf);
+
+ /*
+ * Consider if we definitely have enough space to process TIDs on page
+ * already. If we are close to overrunning the available space for
+ * dead_items TIDs, pause and do a cycle of vacuuming before we tackle
+ * this page.
+ */
+ if (TidStoreMemoryUsage(dead_items) > dead_items_info->max_bytes)
+ {
+ /*
+ * Before beginning index vacuuming, we release any pin we may
+ * hold on the visibility map page. This isn't necessary for
+ * correctness, but we do it anyway to avoid holding the pin
+ * across a lengthy, unrelated operation.
+ */
+ if (BufferIsValid(vmbuffer))
+ {
+ ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
+ }
+
+ if (ParallelHeapVacuumIsActive(vacrel))
+ {
+ /* Remember we might have some unprocessed blocks */
+ scan_done = false;
+
+ /*
+ * Pause the heap scan without invoking index and heap
+ * vacuuming. The leader process also skips FSM vacuum since
+ * some blocks before blkno might not have been processed yet. The
+ * leader will wait for all workers to finish and perform
+ * index and heap vacuuming, and then perform FSM vacuum.
+ */
+ break;
+ }
+
+ /* Perform a round of index and heap vacuuming */
+ vacrel->consider_bypass_optimization = false;
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the Free Space Map to make newly-freed space visible on
+ * upper-level FSM pages.
+ */
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ blkno + 1);
+ vacrel->next_fsm_block_to_vacuum = blkno + 1;
+
+ /* Report that we are once again scanning the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+ continue;
+ }
}
vacrel->blkno = InvalidBlockNumber;
if (BufferIsValid(vmbuffer))
ReleaseBuffer(vmbuffer);
- /* report that everything is now scanned */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ return scan_done;
+}
- /* now we can compute the new value for pg_class.reltuples */
- vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
- vacrel->scanned_pages,
- vacrel->live_tuples);
+/*
+ * A parallel scan variant of heap_vac_scan_next_block.
+ *
+ * In parallel vacuum scan, we don't use the SKIP_PAGES_THRESHOLD optimization.
+ */
+static bool
+heap_vac_scan_next_block_parallel(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm)
+{
+ PHVState *phvstate = vacrel->phvstate;
+ BlockNumber next_block;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 mapbits = 0;
- /*
- * Also compute the total number of surviving heap entries. In the
- * (unlikely) scenario that new_live_tuples is -1, take it as zero.
- */
- vacrel->new_rel_tuples =
- Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
- vacrel->missed_dead_tuples;
+ Assert(ParallelHeapVacuumIsActive(vacrel));
- /*
- * Do index vacuuming (call each index's ambulkdelete routine), then do
- * related heap vacuuming
- */
- if (dead_items_info->num_items > 0)
- lazy_vacuum(vacrel);
+ for (;;)
+ {
+ next_block = table_block_parallelscan_nextpage(vacrel->rel,
+ &(phvstate->myscanstate->state),
+ phvstate->pscandesc);
- /*
- * Vacuum the remainder of the Free Space Map. We must do this whether or
- * not there were indexes, and whether or not we bypassed index vacuuming.
- */
- if (blkno > next_fsm_block_to_vacuum)
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno);
+ /* Have we reached the end of the table? */
+ if (!BlockNumberIsValid(next_block) || next_block >= vacrel->rel_pages)
+ {
+ if (BufferIsValid(vmbuffer))
+ ReleaseBuffer(vmbuffer);
- /* report all blocks vacuumed */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
+ *blkno = vacrel->rel_pages;
+ return false;
+ }
- /* Do final index cleanup (call each index's amvacuumcleanup routine) */
- if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
- lazy_cleanup_all_indexes(vacrel);
+ /* We always treat the last block as unsafe to skip */
+ if (next_block == vacrel->rel_pages - 1)
+ break;
+
+ mapbits = visibilitymap_get_status(vacrel->rel, next_block, &vmbuffer);
+
+ /*
+ * A block is unskippable if it is not all visible according to the
+ * visibility map.
+ */
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ {
+ Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
+ break;
+ }
+
+ /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
+ if (!vacrel->skipwithvm)
+ break;
+
+ /*
+ * Aggressive VACUUM caller can't skip pages just because they are
+ * all-visible.
+ */
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+
+ if (vacrel->aggressive)
+ break;
+
+ /*
+ * All-visible block is safe to skip in non-aggressive case. But
+ * remember that the final range contains such a block for later.
+ */
+ vacrel->scan_stats->skippedallvis = true;
+ }
+ }
+
+ if (BufferIsValid(vmbuffer))
+ ReleaseBuffer(vmbuffer);
+
+ *blkno = next_block;
+ *all_visible_according_to_vm = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
+
+ return true;
}
/*
@@ -1098,6 +1367,9 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
{
BlockNumber next_block;
+ if (ParallelHeapVacuumIsActive(vacrel))
+ return heap_vac_scan_next_block_parallel(vacrel, blkno, all_visible_according_to_vm);
+
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1147,7 +1419,7 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
{
next_block = vacrel->next_unskippable_block;
if (skipsallvis)
- vacrel->skippedallvis = true;
+ vacrel->scan_stats->skippedallvis = true;
}
}
@@ -1220,11 +1492,12 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
/*
* Caller must scan the last page to determine whether it has tuples
- * (caller must have the opportunity to set vacrel->nonempty_pages).
- * This rule avoids having lazy_truncate_heap() take access-exclusive
- * lock on rel to attempt a truncation that fails anyway, just because
- * there are tuples on the last page (it is likely that there will be
- * tuples on other nearby pages as well, but those can be skipped).
+ * (caller must have the opportunity to set
+ * vacrel->scan_stats->nonempty_pages). This rule avoids having
+ * lazy_truncate_heap() take access-exclusive lock on rel to attempt a
+ * truncation that fails anyway, just because there are tuples on the
+ * last page (it is likely that there will be tuples on other nearby
+ * pages as well, but those can be skipped).
*
* Implement this by always treating the last block as unsafe to skip.
*/
@@ -1449,10 +1722,10 @@ lazy_scan_prune(LVRelState *vacrel,
heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
- &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
+ &vacrel->scan_stats->NewRelfrozenXid, &vacrel->scan_stats->NewRelminMxid);
- Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
- Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
+ Assert(MultiXactIdIsValid(vacrel->scan_stats->NewRelminMxid));
+ Assert(TransactionIdIsValid(vacrel->scan_stats->NewRelfrozenXid));
if (presult.nfrozen > 0)
{
@@ -1461,7 +1734,7 @@ lazy_scan_prune(LVRelState *vacrel,
* nfrozen == 0, since it only counts pages with newly frozen tuples
* (don't confuse that with pages newly set all-frozen in VM).
*/
- vacrel->frozen_pages++;
+ vacrel->scan_stats->frozen_pages++;
}
/*
@@ -1496,7 +1769,7 @@ lazy_scan_prune(LVRelState *vacrel,
*/
if (presult.lpdead_items > 0)
{
- vacrel->lpdead_item_pages++;
+ vacrel->scan_stats->lpdead_item_pages++;
/*
* deadoffsets are collected incrementally in
@@ -1511,15 +1784,15 @@ lazy_scan_prune(LVRelState *vacrel,
}
/* Finally, add page-local counts to whole-VACUUM counts */
- vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += presult.nfrozen;
- vacrel->lpdead_items += presult.lpdead_items;
- vacrel->live_tuples += presult.live_tuples;
- vacrel->recently_dead_tuples += presult.recently_dead_tuples;
+ vacrel->scan_stats->tuples_deleted += presult.ndeleted;
+ vacrel->scan_stats->tuples_frozen += presult.nfrozen;
+ vacrel->scan_stats->lpdead_items += presult.lpdead_items;
+ vacrel->scan_stats->live_tuples += presult.live_tuples;
+ vacrel->scan_stats->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
if (presult.hastup)
- vacrel->nonempty_pages = blkno + 1;
+ vacrel->scan_stats->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
@@ -1669,8 +1942,8 @@ lazy_scan_noprune(LVRelState *vacrel,
missed_dead_tuples;
bool hastup;
HeapTupleHeader tupleheader;
- TransactionId NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- MultiXactId NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ TransactionId NoFreezePageRelfrozenXid = vacrel->scan_stats->NewRelfrozenXid;
+ MultiXactId NoFreezePageRelminMxid = vacrel->scan_stats->NewRelminMxid;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1797,8 +2070,8 @@ lazy_scan_noprune(LVRelState *vacrel,
* this particular page until the next VACUUM. Remember its details now.
* (lazy_scan_prune expects a clean slate, so we have to do this last.)
*/
- vacrel->NewRelfrozenXid = NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = NoFreezePageRelminMxid;
+ vacrel->scan_stats->NewRelfrozenXid = NoFreezePageRelfrozenXid;
+ vacrel->scan_stats->NewRelminMxid = NoFreezePageRelminMxid;
/* Save any LP_DEAD items found on the page in dead_items */
if (vacrel->nindexes == 0)
@@ -1825,25 +2098,25 @@ lazy_scan_noprune(LVRelState *vacrel,
* indexes will be deleted during index vacuuming (and then marked
* LP_UNUSED in the heap)
*/
- vacrel->lpdead_item_pages++;
+ vacrel->scan_stats->lpdead_item_pages++;
dead_items_add(vacrel, blkno, deadoffsets, lpdead_items);
- vacrel->lpdead_items += lpdead_items;
+ vacrel->scan_stats->lpdead_items += lpdead_items;
}
/*
* Finally, add relevant page-local counts to whole-VACUUM counts
*/
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
- vacrel->missed_dead_tuples += missed_dead_tuples;
+ vacrel->scan_stats->live_tuples += live_tuples;
+ vacrel->scan_stats->recently_dead_tuples += recently_dead_tuples;
+ vacrel->scan_stats->missed_dead_tuples += missed_dead_tuples;
if (missed_dead_tuples > 0)
- vacrel->missed_dead_pages++;
+ vacrel->scan_stats->missed_dead_pages++;
/* Can't truncate this page */
if (hastup)
- vacrel->nonempty_pages = blkno + 1;
+ vacrel->scan_stats->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
*has_lpdead_items = (lpdead_items > 0);
@@ -1872,7 +2145,7 @@ lazy_vacuum(LVRelState *vacrel)
/* Should not end up here with no indexes */
Assert(vacrel->nindexes > 0);
- Assert(vacrel->lpdead_item_pages > 0);
+ Assert(vacrel->scan_stats->lpdead_item_pages > 0);
if (!vacrel->do_index_vacuuming)
{
@@ -1906,7 +2179,7 @@ lazy_vacuum(LVRelState *vacrel)
BlockNumber threshold;
Assert(vacrel->num_index_scans == 0);
- Assert(vacrel->lpdead_items == vacrel->dead_items_info->num_items);
+ Assert(vacrel->scan_stats->lpdead_items == vacrel->dead_items_info->num_items);
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
@@ -1933,7 +2206,7 @@ lazy_vacuum(LVRelState *vacrel)
* cases then this may need to be reconsidered.
*/
threshold = (double) vacrel->rel_pages * BYPASS_THRESHOLD_PAGES;
- bypass = (vacrel->lpdead_item_pages < threshold &&
+ bypass = (vacrel->scan_stats->lpdead_item_pages < threshold &&
(TidStoreMemoryUsage(vacrel->dead_items) < (32L * 1024L * 1024L)));
}
@@ -2026,7 +2299,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
progress_start_val[1] = vacrel->nindexes;
pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val);
- if (!ParallelVacuumIsActive(vacrel))
+ if (!ParallelIndexVacuumIsActive(vacrel))
{
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
@@ -2071,7 +2344,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
* place).
*/
Assert(vacrel->num_index_scans > 0 ||
- vacrel->dead_items_info->num_items == vacrel->lpdead_items);
+ vacrel->dead_items_info->num_items == vacrel->scan_stats->lpdead_items);
Assert(allindexes || VacuumFailsafeActive);
/*
@@ -2180,8 +2453,8 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
* the second heap pass. No more, no less.
*/
Assert(vacrel->num_index_scans > 1 ||
- (vacrel->dead_items_info->num_items == vacrel->lpdead_items &&
- vacuumed_pages == vacrel->lpdead_item_pages));
+ (vacrel->dead_items_info->num_items == vacrel->scan_stats->lpdead_items &&
+ vacuumed_pages == vacrel->scan_stats->lpdead_item_pages));
ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
@@ -2334,7 +2607,7 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
vacrel->do_index_cleanup = false;
vacrel->do_rel_truncate = false;
/* Reset the progress counters */
pgstat_progress_update_multi_param(2, progress_index, progress_val);
ereport(WARNING,
@@ -2362,7 +2635,7 @@ static void
lazy_cleanup_all_indexes(LVRelState *vacrel)
{
double reltuples = vacrel->new_rel_tuples;
- bool estimated_count = vacrel->scanned_pages < vacrel->rel_pages;
+ bool estimated_count = vacrel->scan_stats->scanned_pages < vacrel->rel_pages;
const int progress_start_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_INDEXES_TOTAL
@@ -2385,7 +2658,7 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
progress_start_val[1] = vacrel->nindexes;
pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val);
- if (!ParallelVacuumIsActive(vacrel))
+ if (!ParallelIndexVacuumIsActive(vacrel))
{
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
@@ -2409,7 +2682,7 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
estimated_count);
}
/* Reset the progress counters */
pgstat_progress_update_multi_param(2, progress_end_index, progress_end_val);
}
@@ -2543,7 +2816,7 @@ should_attempt_truncation(LVRelState *vacrel)
if (!vacrel->do_rel_truncate || VacuumFailsafeActive)
return false;
- possibly_freeable = vacrel->rel_pages - vacrel->nonempty_pages;
+ possibly_freeable = vacrel->rel_pages - vacrel->scan_stats->nonempty_pages;
if (possibly_freeable > 0 &&
(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
possibly_freeable >= vacrel->rel_pages / REL_TRUNCATE_FRACTION))
@@ -2569,7 +2842,7 @@ lazy_truncate_heap(LVRelState *vacrel)
/* Update error traceback information one last time */
update_vacuum_error_info(vacrel, NULL, VACUUM_ERRCB_PHASE_TRUNCATE,
- vacrel->nonempty_pages, InvalidOffsetNumber);
+ vacrel->scan_stats->nonempty_pages, InvalidOffsetNumber);
/*
* Loop until no more truncating can be done.
@@ -2670,7 +2943,7 @@ lazy_truncate_heap(LVRelState *vacrel)
* without also touching reltuples, since the tuple count wasn't
* changed by the truncation.
*/
- vacrel->removed_pages += orig_rel_pages - new_rel_pages;
+ vacrel->scan_stats->removed_pages += orig_rel_pages - new_rel_pages;
vacrel->rel_pages = new_rel_pages;
ereport(vacrel->verbose ? INFO : DEBUG2,
@@ -2678,7 +2951,7 @@ lazy_truncate_heap(LVRelState *vacrel)
vacrel->relname,
orig_rel_pages, new_rel_pages)));
orig_rel_pages = new_rel_pages;
- } while (new_rel_pages > vacrel->nonempty_pages && lock_waiter_detected);
+ } while (new_rel_pages > vacrel->scan_stats->nonempty_pages && lock_waiter_detected);
}
/*
@@ -2706,7 +2979,7 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
StaticAssertStmt((PREFETCH_SIZE & (PREFETCH_SIZE - 1)) == 0,
"prefetch size must be power of 2");
prefetchedUntil = InvalidBlockNumber;
- while (blkno > vacrel->nonempty_pages)
+ while (blkno > vacrel->scan_stats->nonempty_pages)
{
Buffer buf;
Page page;
@@ -2818,7 +3091,7 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
* pages still are; we need not bother to look at the last known-nonempty
* page.
*/
- return vacrel->nonempty_pages;
+ return vacrel->scan_stats->nonempty_pages;
}
/*
@@ -2836,12 +3109,8 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
autovacuum_work_mem != -1 ?
autovacuum_work_mem : maintenance_work_mem;
- /*
- * Initialize state for a parallel vacuum. As of now, only one worker can
- * be used for an index, so we invoke parallelism only if there are at
- * least two indexes on a table.
- */
- if (nworkers >= 0 && vacrel->nindexes > 1 && vacrel->do_index_vacuuming)
+ /* Initialize state for a parallel vacuum */
+ if (nworkers >= 0)
{
/*
* Since parallel workers cannot access data in temporary tables, we
@@ -2859,11 +3128,20 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
+ {
+ /*
+ * We initialize parallel heap scan/vacuum, index vacuuming, or both
+ * based on the table size and the number of indexes. Note that since
+ * only one worker can be used per index, we invoke parallelism for
+ * index vacuuming only if there are at least two indexes on the
+ * table.
+ */
vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
vacrel->nindexes, nworkers,
vac_work_mem,
vacrel->verbose ? INFO : DEBUG2,
- vacrel->bstrategy);
+ vacrel->bstrategy, (void *) vacrel);
+ }
/*
* If parallel mode started, dead_items and dead_items_info spaces are
@@ -2904,9 +3182,19 @@ dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *offsets,
};
int64 prog_val[2];
+ /*
+ * Protect both dead_items and dead_items_info from concurrent updates in
+ * parallel heap scan cases.
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ TidStoreLockExclusive(dead_items);
+
TidStoreSetBlockOffsets(dead_items, blkno, offsets, num_offsets);
vacrel->dead_items_info->num_items += num_offsets;
+ if (ParallelHeapVacuumIsActive(vacrel))
+ TidStoreUnlock(dead_items);
+
/* update the progress information */
prog_val[0] = vacrel->dead_items_info->num_items;
prog_val[1] = TidStoreMemoryUsage(dead_items);
@@ -3108,6 +3396,453 @@ update_relstats_all_indexes(LVRelState *vacrel)
}
}
+/*
+ * Compute the number of parallel workers for parallel vacuum heap scan.
+ *
+ * The calculation logic is borrowed from compute_parallel_worker().
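+ *
+ * For example, assuming the default min_parallel_table_scan_size of 8MB
+ * (1024 pages with 8kB blocks), this selects one worker once the table
+ * reaches 3072 pages (24MB), two workers at 9216 pages (72MB), three at
+ * 27648 pages (216MB), and so on, tripling the threshold for each
+ * additional worker.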
+ */
+int
+heap_parallel_vacuum_compute_workers(Relation rel, int nrequested)
+{
+ int parallel_workers = 0;
+ int heap_parallel_threshold;
+ int heap_pages;
+
+ if (nrequested == 0)
+ {
+ /*
+ * Select the number of workers based on the log of the size of the
+ * relation. This probably needs to be a good deal more
+ * sophisticated, but we need something here for now. Note that the
+ * upper limit of the min_parallel_table_scan_size GUC is chosen to
+ * prevent overflow here.
+ */
+ heap_parallel_threshold = Max(min_parallel_table_scan_size, 1);
+ heap_pages = RelationGetNumberOfBlocks(rel);
+ while (heap_pages >= (BlockNumber) (heap_parallel_threshold * 3))
+ {
+ parallel_workers++;
+ heap_parallel_threshold *= 3;
+ if (heap_parallel_threshold > INT_MAX / 3)
+ break;
+ }
+ }
+ else
+ parallel_workers = nrequested;
+
+ return parallel_workers;
+}
+
+/* Estimate shared memory sizes required for parallel heap vacuum */
+static inline void
+heap_parallel_estimate_shared_memory_size(Relation rel, int nworkers, Size *pscan_len,
+ Size *shared_len, Size *pscanwork_len)
+{
+ Size size = 0;
+
+ size = add_size(size, SizeOfPHVShared);
+ size = add_size(size, mul_size(sizeof(LVRelScanStats), nworkers));
+ *shared_len = size;
+
+ *pscan_len = table_block_parallelscan_estimate(rel);
+
+ *pscanwork_len = mul_size(sizeof(PHVScanWorkerState), nworkers);
+}
+
+/*
+ * Compute the amount of space we'll need in the parallel heap vacuum
+ * DSM, and inform pcxt->estimator about our needs.
+ *
+ * nworkers is the number of workers for the table vacuum. Note that it could
+ * differ from pcxt->nworkers, which is the maximum of the number of workers
+ * for table vacuum and the number of workers for index vacuum.
+ */
+void
+heap_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state)
+{
+ Size pscan_len;
+ Size shared_len;
+ Size pscanwork_len;
+
+ heap_parallel_estimate_shared_memory_size(rel, nworkers, &pscan_len,
+ &shared_len, &pscanwork_len);
+
+ /* space for PHVShared */
+ shm_toc_estimate_chunk(&pcxt->estimator, shared_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* space for ParallelBlockTableScanDesc */
+ shm_toc_estimate_chunk(&pcxt->estimator, pscan_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* space for per-worker scan state, PHVScanWorkerState */
+ shm_toc_estimate_chunk(&pcxt->estimator, pscanwork_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/*
+ * Set up shared memory for parallel heap vacuum.
+ */
+void
+heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state)
+{
+ LVRelState *vacrel = (LVRelState *) state;
+ PHVState *phvstate;
+ ParallelBlockTableScanDesc pscan;
+ PHVScanWorkerState *pscanwork;
+ PHVShared *shared;
+ Size pscan_len;
+ Size shared_len;
+ Size pscanwork_len;
+
+ phvstate = (PHVState *) palloc(sizeof(PHVState));
+
+ heap_parallel_estimate_shared_memory_size(rel, nworkers, &pscan_len,
+ &shared_len, &pscanwork_len);
+
+ shared = shm_toc_allocate(pcxt->toc, shared_len);
+
+ /* Prepare the shared information */
+
+ MemSet(shared, 0, shared_len);
+ shared->aggressive = vacrel->aggressive;
+ shared->skipwithvm = vacrel->skipwithvm;
+ shared->cutoffs = vacrel->cutoffs;
+ shared->NewRelfrozenXid = vacrel->scan_stats->NewRelfrozenXid;
+ shared->NewRelminMxid = vacrel->scan_stats->NewRelminMxid;
+ shared->skippedallvis = vacrel->scan_stats->skippedallvis;
+
+ /*
+ * XXX: we copy the contents of vistest to the shared area, but in order
+ * to do that, we need to either expose GlobalVisTest or provide functions
+ * to copy the contents of GlobalVisTest somewhere. Currently we do the
+ * former, but it's not clear that's the best choice.
+ *
+ * An alternative idea is to have each worker determine the cutoff and have
+ * its own vistest. But we need to consider that carefully, since parallel
+ * workers would end up having different cutoffs and horizons.
+ */
+ shared->vistest = *vacrel->vistest;
+
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_SCAN_SHARED, shared);
+
+ phvstate->shared = shared;
+
+ /* prepare the parallel block table scan description */
+ pscan = shm_toc_allocate(pcxt->toc, pscan_len);
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_SCAN_DESC, pscan);
+
+ /* initialize parallel scan description */
+ table_block_parallelscan_initialize(rel, (ParallelTableScanDesc) pscan);
+ phvstate->pscandesc = pscan;
+
+ /* prepare the workers' parallel block table scan state */
+ pscanwork = shm_toc_allocate(pcxt->toc, pscanwork_len);
+ MemSet(pscanwork, 0, pscanwork_len);
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_SCAN_DESC_WORKER, pscanwork);
+ phvstate->scanstates = pscanwork;
+
+ vacrel->phvstate = phvstate;
+}
+
+/*
+ * Main function for parallel heap vacuum workers.
+ */
+void
+heap_parallel_vacuum_scan_worker(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt)
+{
+ LVRelState vacrel = {0};
+ PHVState *phvstate;
+ PHVShared *shared;
+ ParallelBlockTableScanDesc pscandesc;
+ PHVScanWorkerState *scanstate;
+ LVRelScanStats *scan_stats;
+ ErrorContextCallback errcallback;
+ bool scan_done;
+
+ phvstate = palloc(sizeof(PHVState));
+
+ pscandesc = (ParallelBlockTableScanDesc) shm_toc_lookup(pwcxt->toc,
+ LV_PARALLEL_SCAN_DESC,
+ false);
+ phvstate->pscandesc = pscandesc;
+
+ shared = (PHVShared *) shm_toc_lookup(pwcxt->toc, LV_PARALLEL_SCAN_SHARED,
+ false);
+ phvstate->shared = shared;
+
+ scanstate = (PHVScanWorkerState *) shm_toc_lookup(pwcxt->toc,
+ LV_PARALLEL_SCAN_DESC_WORKER,
+ false);
+
+ phvstate->myscanstate = &(scanstate[ParallelWorkerNumber]);
+ scan_stats = &(shared->worker_scan_stats[ParallelWorkerNumber]);
+
+ /* Prepare LVRelState */
+ vacrel.rel = rel;
+ vacrel.indrels = parallel_vacuum_get_table_indexes(pvs, &vacrel.nindexes);
+ vacrel.pvs = pvs;
+ vacrel.phvstate = phvstate;
+ vacrel.aggressive = shared->aggressive;
+ vacrel.skipwithvm = shared->skipwithvm;
+ vacrel.cutoffs = shared->cutoffs;
+ vacrel.vistest = &(shared->vistest);
+ vacrel.dead_items = parallel_vacuum_get_dead_items(pvs,
+ &vacrel.dead_items_info);
+ vacrel.rel_pages = RelationGetNumberOfBlocks(rel);
+ vacrel.scan_stats = scan_stats;
+
+ /* initialize per-worker relation statistics */
+ MemSet(scan_stats, 0, sizeof(LVRelScanStats));
+
+ /* Set fields necessary for heap scan */
+ vacrel.scan_stats->NewRelfrozenXid = shared->NewRelfrozenXid;
+ vacrel.scan_stats->NewRelminMxid = shared->NewRelminMxid;
+ vacrel.scan_stats->skippedallvis = shared->skippedallvis;
+
+ /* Initialize the per-worker scan state if not yet */
+ if (!phvstate->myscanstate->initialized)
+ {
+ table_block_parallelscan_startblock_init(rel,
+ &(phvstate->myscanstate->state),
+ phvstate->pscandesc);
+
+ pg_atomic_init_u32(&(phvstate->myscanstate->cur_blkno), 0);
+ phvstate->myscanstate->maybe_have_blocks = false;
+ phvstate->myscanstate->initialized = true;
+ }
+
+ /*
+ * Setup error traceback support for ereport() for parallel table vacuum
+ * workers
+ */
+ vacrel.dbname = get_database_name(MyDatabaseId);
+ vacrel.relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ vacrel.relname = pstrdup(RelationGetRelationName(rel));
+ vacrel.indname = NULL;
+ vacrel.phase = VACUUM_ERRCB_PHASE_SCAN_HEAP;
+ errcallback.callback = vacuum_error_callback;
+ errcallback.arg = &vacrel;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ scan_done = do_lazy_scan_heap(&vacrel);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ /*
+ * If the leader or a worker finishes the heap scan because the space for
+ * dead_items TIDs is close to the limit, it might still have some allocated
+ * blocks in its scan state. Since this scan state might not be picked up in
+ * the next heap scan, we remember that it might have some unconsumed blocks
+ * so that the leader can complete the scan after the heap scan phase
+ * finishes.
+ */
+ phvstate->myscanstate->maybe_have_blocks = !scan_done;
+}
+
+/*
+ * Complete parallel heap scans that have remaining blocks in their
+ * chunks.
+ */
+static void
+parallel_heap_complete_unfinished_scan(LVRelState *vacrel)
+{
+ int nworkers;
+
+ Assert(!IsParallelWorker());
+
+ nworkers = parallel_vacuum_get_nworkers_table(vacrel->pvs);
+
+ for (int i = 0; i < nworkers; i++)
+ {
+ PHVScanWorkerState *wstate = &(vacrel->phvstate->scanstates[i]);
+ bool scan_done PG_USED_FOR_ASSERTS_ONLY;
+
+ if (!wstate->maybe_have_blocks)
+ continue;
+
+ /* Attach the worker's scan state and do the heap scan */
+ vacrel->phvstate->myscanstate = wstate;
+ scan_done = do_lazy_scan_heap(vacrel);
+
+ Assert(scan_done);
+ }
+
+ /*
+ * We don't need to gather the scan statistics here because the leader
+ * performed these scans itself, so the counters have been accumulated
+ * into the leader's statistics directly.
+ */
+}
+
+/*
+ * Compute the minimum block number among all workers' current scan
+ * positions and update phvstate->min_blkno.
+ */
+static void
+parallel_heap_vacuum_compute_min_blkno(LVRelState *vacrel)
+{
+ PHVState *phvstate = vacrel->phvstate;
+
+ Assert(ParallelHeapVacuumIsActive(vacrel));
+
+ /*
+ * We check all worker scan states here to compute the minimum block
+ * number among all scan states.
+ */
+ for (int i = 0; i < phvstate->nworkers_launched; i++)
+ {
+ PHVScanWorkerState *wstate = &(phvstate->scanstates[i]);
+ BlockNumber blkno;
+
+ /* Skip if the scan state has not been initialized by any worker */
+ if (!wstate->initialized)
+ continue;
+
+ blkno = pg_atomic_read_u32(&(wstate->cur_blkno));
+ if (blkno < phvstate->min_blkno)
+ phvstate->min_blkno = blkno;
+ }
+}
+
+/*
+ * Accumulate the relation scan statistics that parallel workers collected into the
+ * leader's counters.
+ */
+static void
+parallel_heap_vacuum_gather_scan_stats(LVRelState *vacrel)
+{
+ PHVState *phvstate = vacrel->phvstate;
+
+ Assert(ParallelHeapVacuumIsActive(vacrel));
+ Assert(!IsParallelWorker());
+
+ /* Gather the scan statistics that workers collected */
+ for (int i = 0; i < phvstate->nworkers_launched; i++)
+ {
+ LVRelScanStats *ss = &(phvstate->shared->worker_scan_stats[i]);
+
+ vacrel->scan_stats->scanned_pages += ss->scanned_pages;
+ vacrel->scan_stats->removed_pages += ss->removed_pages;
+ vacrel->scan_stats->frozen_pages += ss->frozen_pages;
+ vacrel->scan_stats->lpdead_item_pages += ss->lpdead_item_pages;
+ vacrel->scan_stats->missed_dead_pages += ss->missed_dead_pages;
+ vacrel->scan_stats->vacuumed_pages += ss->vacuumed_pages;
+ vacrel->scan_stats->tuples_deleted += ss->tuples_deleted;
+ vacrel->scan_stats->tuples_frozen += ss->tuples_frozen;
+ vacrel->scan_stats->lpdead_items += ss->lpdead_items;
+ vacrel->scan_stats->live_tuples += ss->live_tuples;
+ vacrel->scan_stats->recently_dead_tuples += ss->recently_dead_tuples;
+ vacrel->scan_stats->missed_dead_tuples += ss->missed_dead_tuples;
+
+ /* Take the maximum, i.e. the last known nonempty page across workers */
+ if (ss->nonempty_pages > vacrel->scan_stats->nonempty_pages)
+ vacrel->scan_stats->nonempty_pages = ss->nonempty_pages;
+
+ if (TransactionIdPrecedes(ss->NewRelfrozenXid, vacrel->scan_stats->NewRelfrozenXid))
+ vacrel->scan_stats->NewRelfrozenXid = ss->NewRelfrozenXid;
+
+ if (MultiXactIdPrecedesOrEquals(ss->NewRelminMxid, vacrel->scan_stats->NewRelminMxid))
+ vacrel->scan_stats->NewRelminMxid = ss->NewRelminMxid;
+
+ if (!vacrel->scan_stats->skippedallvis && ss->skippedallvis)
+ vacrel->scan_stats->skippedallvis = true;
+ }
+
+ /* Also, compute the minimum block number we scanned so far */
+ parallel_heap_vacuum_compute_min_blkno(vacrel);
+}
+
+/*
+ * A parallel variant of do_lazy_scan_heap(). The leader process launches
+ * parallel workers to scan the heap in parallel, and also joins the scan
+ * itself.
+ */
+static void
+do_parallel_lazy_scan_heap(LVRelState *vacrel)
+{
+ PHVScanWorkerState *scanstate;
+
+ Assert(ParallelHeapVacuumIsActive(vacrel));
+ Assert(!IsParallelWorker());
+
+ /* launch parallel workers */
+ vacrel->phvstate->nworkers_launched = parallel_vacuum_table_scan_begin(vacrel->pvs);
+
+ /* initialize the parallel scan state so the leader can join as a worker */
+ scanstate = palloc(sizeof(PHVScanWorkerState));
+ table_block_parallelscan_startblock_init(vacrel->rel, &(scanstate->state),
+ vacrel->phvstate->pscandesc);
+ vacrel->phvstate->myscanstate = scanstate;
+
+ for (;;)
+ {
+ bool scan_done;
+
+ /*
+ * Scan the table until either we are close to overrunning the
+ * available space for dead_items TIDs or we reach the end of the
+ * table.
+ */
+ scan_done = do_lazy_scan_heap(vacrel);
+
+ /* stop parallel workers and gather the collected stats */
+ parallel_vacuum_table_scan_end(vacrel->pvs);
+ parallel_heap_vacuum_gather_scan_stats(vacrel);
+
+ /*
+ * If the heap scan paused in the middle of the table because the space
+ * for dead_items TIDs filled up, perform a round of index and heap
+ * vacuuming.
+ */
+ if (!scan_done)
+ {
+ /* Perform a round of index and heap vacuuming */
+ vacrel->consider_bypass_optimization = false;
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the Free Space Map to make newly-freed space visible on
+ * upper-level FSM pages.
+ */
+ if (vacrel->phvstate->min_blkno > vacrel->next_fsm_block_to_vacuum)
+ {
+ /*
+ * min_blkno should have already been updated when gathering
+ * statistics
+ */
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ vacrel->phvstate->min_blkno + 1);
+ vacrel->next_fsm_block_to_vacuum = vacrel->phvstate->min_blkno;
+ }
+
+ /* Report that we are once again scanning the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+ /* re-launch parallel workers */
+ vacrel->phvstate->nworkers_launched =
+ parallel_vacuum_table_scan_begin(vacrel->pvs);
+
+ continue;
+ }
+
+ /* We reached the end of the table */
+ break;
+ }
+
+ /*
+ * The parallel heap scan finished, but it's possible that some workers
+ * have allocated blocks that they have not yet processed. This can
+ * happen, for example, when workers exit because the space for dead_items
+ * TIDs filled up and the leader then launches fewer workers in the next
+ * cycle.
+ */
+ parallel_heap_complete_unfinished_scan(vacrel);
+}
+
/*
* Error context callback for errors occurring during vacuum. The error
* context messages for index phases should match the messages set in parallel
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 4fd6574e12..1101e799f9 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -6,15 +6,24 @@
* This file contains routines that are intended to support setting up, using,
* and tearing down a ParallelVacuumState.
*
- * In a parallel vacuum, we perform both index bulk deletion and index cleanup
- * with parallel worker processes. Individual indexes are processed by one
- * vacuum process. ParallelVacuumState contains shared information as well as
- * the memory space for storing dead items allocated in the DSA area. We
- * launch parallel worker processes at the start of parallel index
- * bulk-deletion and index cleanup and once all indexes are processed, the
- * parallel worker processes exit. Each time we process indexes in parallel,
- * the parallel context is re-initialized so that the same DSM can be used for
- * multiple passes of index bulk-deletion and index cleanup.
+ * In a parallel vacuum, we perform the table scan, index bulk-deletion and
+ * index cleanup, or all of them with parallel worker processes. Different
+ * numbers of workers are launched for table vacuuming and index processing.
+ * ParallelVacuumState contains shared information as well as the memory space
+ * for storing dead items allocated in the DSA area.
+ *
+ * When initializing a parallel table vacuum scan, we invoke table AM routines
+ * to estimate DSM sizes and initialize DSM memory. Parallel table vacuum
+ * workers invoke the table AM routine for vacuuming the table.
+ *
+ * For processing indexes in parallel, each index is processed by one vacuum
+ * process. We launch parallel worker processes at the start of parallel index
+ * bulk-deletion and index cleanup, and once all indexes are processed, the
+ * parallel worker processes exit.
+ *
+ * Each time we process the table or indexes in parallel, the parallel context
+ * is re-initialized so that the same DSM can be used for multiple passes of
+ * table vacuum or index bulk-deletion and index cleanup.
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -28,6 +37,7 @@
#include "access/amapi.h"
#include "access/table.h"
+#include "access/tableam.h"
#include "access/xact.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
@@ -65,6 +75,12 @@ typedef struct PVShared
int elevel;
uint64 queryid;
+ /*
+ * True if the caller wants parallel workers to invoke the table vacuum
+ * scan callback.
+ */
+ bool do_vacuum_table_scan;
+
/*
* Fields for both index vacuum and cleanup.
*
@@ -164,6 +180,9 @@ struct ParallelVacuumState
/* NULL for worker processes */
ParallelContext *pcxt;
+ /* Passed to parallel table scan workers. NULL for leader process */
+ ParallelWorkerContext *pwcxt;
+
/* Parent Heap Relation */
Relation heaprel;
@@ -193,6 +212,16 @@ struct ParallelVacuumState
/* Points to WAL usage area in DSM */
WalUsage *wal_usage;
+ /*
+ * The number of workers for parallel table scan/vacuuming and index
+ * vacuuming, respectively.
+ */
+ int nworkers_for_table;
+ int nworkers_for_index;
+
+ /* The number of times the parallel table vacuum scan has been performed */
+ int num_table_scans;
+
/*
* False if the index is totally unsuitable target for all parallel
* processing. For example, the index could be <
@@ -221,8 +250,9 @@ struct ParallelVacuumState
PVIndVacStatus status;
};
-static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
- bool *will_parallel_vacuum);
+static void parallel_vacuum_compute_workers(Relation rel, Relation *indrels, int nindexes,
+ int nrequested, int *nworkers_table,
+ int *nworkers_index, bool *will_parallel_vacuum);
static void parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scans,
bool vacuum);
static void parallel_vacuum_process_safe_indexes(ParallelVacuumState *pvs);
@@ -242,7 +272,7 @@ static void parallel_vacuum_error_callback(void *arg);
ParallelVacuumState *
parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
int nrequested_workers, int vac_work_mem,
- int elevel, BufferAccessStrategy bstrategy)
+ int elevel, BufferAccessStrategy bstrategy, void *state)
{
ParallelVacuumState *pvs;
ParallelContext *pcxt;
@@ -256,6 +286,8 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
Size est_shared_len;
int nindexes_mwm = 0;
int parallel_workers = 0;
+ int nworkers_table;
+ int nworkers_index;
int querylen;
/*
@@ -263,15 +295,17 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
* relation
*/
Assert(nrequested_workers >= 0);
- Assert(nindexes > 0);
/*
* Compute the number of parallel vacuum workers to launch
*/
will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = parallel_vacuum_compute_workers(indrels, nindexes,
- nrequested_workers,
- will_parallel_vacuum);
+ parallel_vacuum_compute_workers(rel, indrels, nindexes, nrequested_workers,
+ &nworkers_table, &nworkers_index,
+ will_parallel_vacuum);
+
+ parallel_workers = Max(nworkers_table, nworkers_index);
+
if (parallel_workers <= 0)
{
/* Can't perform vacuum in parallel -- return NULL */
@@ -285,6 +319,8 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
pvs->will_parallel_vacuum = will_parallel_vacuum;
pvs->bstrategy = bstrategy;
pvs->heaprel = rel;
+ pvs->nworkers_for_table = nworkers_table;
+ pvs->nworkers_for_index = nworkers_index;
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
@@ -327,6 +363,10 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
else
querylen = 0; /* keep compiler quiet */
+ /* Estimate AM-specific space for parallel table vacuum */
+ if (nworkers_table > 0)
+ table_parallel_vacuum_estimate(rel, pcxt, nworkers_table, state);
+
InitializeParallelDSM(pcxt);
/* Prepare index vacuum stats */
@@ -419,6 +459,10 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
}
+ /* Prepare AM-specific DSM for parallel table vacuum */
+ if (nworkers_table > 0)
+ table_parallel_vacuum_initialize(rel, pcxt, nworkers_table, state);
+
/* Success -- return parallel vacuum state */
return pvs;
}
@@ -534,33 +578,47 @@ parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tup
}
/*
- * Compute the number of parallel worker processes to request. Both index
- * vacuum and index cleanup can be executed with parallel workers.
- * The index is eligible for parallel vacuum iff its size is greater than
- * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt performance.
+ * Compute the number of parallel worker processes to request for table
+ * vacuum and index vacuum/cleanup.
+ *
+ * For parallel table vacuum, we ask the AM-specific routine to compute the
+ * number of parallel workers. The result is set to *nworkers_table.
*
- * nrequested is the number of parallel workers that user requested. If
- * nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
+ * For parallel index vacuum, an index is eligible for parallel vacuum iff
+ * its size is greater than min_parallel_index_scan_size, as invoking workers
+ * for very small indexes can hurt performance. nrequested is the number of
+ * parallel workers that user requested. If nrequested is 0, we compute the
+ * parallel degree based on nindexes, that is the number of indexes that
+ * support parallel vacuum. This function also sets will_parallel_vacuum to
+ * remember indexes that participate in parallel vacuum.
*/
-static int
-parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
- bool *will_parallel_vacuum)
+static void
+parallel_vacuum_compute_workers(Relation rel, Relation *indrels, int nindexes,
+ int nrequested, int *nworkers_table,
+ int *nworkers_index, bool *will_parallel_vacuum)
{
int nindexes_parallel = 0;
int nindexes_parallel_bulkdel = 0;
int nindexes_parallel_cleanup = 0;
- int parallel_workers;
+ int parallel_workers_table = 0;
+ int parallel_workers_index = 0;
+
+ *nworkers_table = 0;
+ *nworkers_index = 0;
/*
* We don't allow performing parallel operation in standalone backend or
* when parallelism is disabled.
*/
if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
+ return;
+
+ /*
+ * Compute the number of workers for parallel table scan. Cap by
+ * max_parallel_maintenance_workers.
+ */
+ parallel_workers_table = Min(table_parallel_vacuum_compute_workers(rel, nrequested),
+ max_parallel_maintenance_workers);
/*
* Compute the number of indexes that can participate in parallel vacuum.
@@ -591,17 +649,18 @@ parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
nindexes_parallel--;
/* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
+ if (nindexes_parallel > 0)
+ {
+ /* Compute the parallel degree for parallel index vacuum */
+ parallel_workers_index = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers_index = Min(parallel_workers_index, max_parallel_maintenance_workers);
+ }
- return parallel_workers;
+ *nworkers_table = parallel_workers_table;
+ *nworkers_index = parallel_workers_index;
}
/*
@@ -671,7 +730,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
if (nworkers > 0)
{
/* Reinitialize parallel context to relaunch parallel workers */
- if (num_index_scans > 0)
+ if (num_index_scans > 0 || pvs->num_table_scans > 0)
ReinitializeParallelDSM(pvs->pcxt);
/*
@@ -980,6 +1039,139 @@ parallel_vacuum_index_is_parallel_safe(Relation indrel, int num_index_scans,
return true;
}
+/*
+ * Prepare the DSM and the shared vacuum cost balance, and launch parallel
+ * workers for parallel table vacuum. Return the number of workers launched.
+ *
+ * The caller must call parallel_vacuum_table_scan_end() to finish the parallel
+ * table vacuum.
+ */
+int
+parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ if (pvs->nworkers_for_table == 0)
+ return 0;
+
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ pvs->shared->do_vacuum_table_scan = true;
+
+ if (pvs->num_table_scans > 0)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ /*
+ * The number of workers might vary between table vacuum and index
+ * processing
+ */
+ ReinitializeParallelWorkers(pvs->pcxt, pvs->nworkers_for_table);
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have already
+ * accumulated the remaining balance of heap.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+
+ /* Include the worker count for the leader itself */
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+ }
+
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for table processing (planned: %d)",
+ "launched %d parallel vacuum workers for table processing (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, pvs->nworkers_for_table)));
+
+ return pvs->pcxt->nworkers_launched;
+}
+
+/*
+ * Wait for all parallel table vacuum scan workers to finish, and accumulate
+ * their buffer usage and WAL usage.
+ */
+void
+parallel_vacuum_table_scan_end(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ if (pvs->nworkers_for_table == 0)
+ return;
+
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ /* Decrement the worker count for the leader itself */
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+
+ /*
+ * Carry the shared balance value to heap scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+
+ pvs->shared->do_vacuum_table_scan = false;
+ pvs->num_table_scans++;
+}
+
+/* Return the array of indexes associated with the given table to be vacuumed */
+Relation *
+parallel_vacuum_get_table_indexes(ParallelVacuumState *pvs, int *nindexes)
+{
+ *nindexes = pvs->nindexes;
+
+ return pvs->indrels;
+}
+
+/* Return the number of workers for parallel table vacuum */
+int
+parallel_vacuum_get_nworkers_table(ParallelVacuumState *pvs)
+{
+ return pvs->nworkers_for_table;
+}
+
+/* Return the number of workers for parallel index processing */
+int
+parallel_vacuum_get_nworkers_index(ParallelVacuumState *pvs)
+{
+ return pvs->nworkers_for_index;
+}
+
+/*
+ * A parallel worker invokes the table-AM-specific vacuum scan callback.
+ */
+static void
+parallel_vacuum_process_table(ParallelVacuumState *pvs)
+{
+ Assert(VacuumActiveNWorkers);
+
+ /* Increment the active worker count before starting the table vacuum */
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ /* Do table vacuum scan */
+ table_parallel_vacuum_scan(pvs->heaprel, pvs, pvs->pwcxt);
+
+ /*
+ * We have completed the table vacuum so decrement the active worker
+ * count.
+ */
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
/*
* Perform work within a launched parallel process.
*
@@ -999,7 +1191,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
WalUsage *wal_usage;
int nindexes;
char *sharedquery;
- ErrorContextCallback errcallback;
/*
* A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
@@ -1031,7 +1222,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
* matched to the leader's one.
*/
vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
if (shared->maintenance_work_mem_worker > 0)
maintenance_work_mem = shared->maintenance_work_mem_worker;
@@ -1062,6 +1252,10 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
pvs.relname = pstrdup(RelationGetRelationName(rel));
pvs.heaprel = rel;
+ pvs.pwcxt = palloc(sizeof(ParallelWorkerContext));
+ pvs.pwcxt->toc = toc;
+ pvs.pwcxt->seg = seg;
+
/* These fields will be filled during index vacuum or cleanup */
pvs.indname = NULL;
pvs.status = PARALLEL_INDVAC_STATUS_INITIAL;
@@ -1070,17 +1264,29 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
pvs.bstrategy = GetAccessStrategyWithSize(BAS_VACUUM,
shared->ring_nbuffers * (BLCKSZ / 1024));
- /* Setup error traceback support for ereport() */
- errcallback.callback = parallel_vacuum_error_callback;
- errcallback.arg = &pvs;
- errcallback.previous = error_context_stack;
- error_context_stack = &errcallback;
-
/* Prepare to track buffer usage during parallel execution */
InstrStartParallelQuery();
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&pvs);
+ if (pvs.shared->do_vacuum_table_scan)
+ {
+ parallel_vacuum_process_table(&pvs);
+ }
+ else
+ {
+ ErrorContextCallback errcallback;
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = &pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_process_safe_indexes(&pvs);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+ }
/* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
@@ -1090,9 +1296,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
TidStoreDetach(dead_items);
- /* Pop the error context stack */
- error_context_stack = errcallback.previous;
-
vac_close_indexes(nindexes, indrels, RowExclusiveLock);
table_close(rel, ShareUpdateExclusiveLock);
FreeAccessStrategy(pvs.bstrategy);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 36610a1c7e..5b2b08a844 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -164,15 +164,6 @@ typedef struct ProcArrayStruct
*
* The typedef is in the header.
*/
-struct GlobalVisState
-{
- /* XIDs >= are considered running by some backend */
- FullTransactionId definitely_needed;
-
- /* XIDs < are not considered to be running by any backend */
- FullTransactionId maybe_needed;
-};
-
/*
* Result of ComputeXidHorizons().
*/
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b951466ced..e81513c2db 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -21,6 +21,7 @@
#include "access/skey.h"
#include "access/table.h" /* for backward compatibility */
#include "access/tableam.h"
+#include "commands/vacuum.h"
#include "nodes/lockoptions.h"
#include "nodes/primnodes.h"
#include "storage/bufpage.h"
@@ -400,6 +401,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
+extern int heap_parallel_vacuum_compute_workers(Relation rel, int requested);
+extern void heap_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state);
+extern void heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state);
+extern void heap_parallel_vacuum_scan_worker(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index da661289c1..fc48f74828 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -20,6 +20,7 @@
#include "access/relscan.h"
#include "access/sdir.h"
#include "access/xact.h"
+#include "commands/vacuum.h"
#include "executor/tuptable.h"
#include "storage/read_stream.h"
#include "utils/rel.h"
@@ -655,6 +656,46 @@ typedef struct TableAmRoutine
struct VacuumParams *params,
BufferAccessStrategy bstrategy);
+ /* ------------------------------------------------------------------------
+ * Callbacks for parallel table vacuum.
+ * ------------------------------------------------------------------------
+ */
+
+ /*
+ * Compute the number of parallel workers for parallel table vacuum. The
+ * function must return 0 to disable parallel table vacuum.
+ */
+ int (*parallel_vacuum_compute_workers) (Relation rel, int requested);
+
+ /*
+ * Compute the amount of DSM space the AM needs for parallel table vacuum.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_estimate) (Relation rel,
+ ParallelContext *pcxt,
+ int nworkers,
+ void *state);
+
+ /*
+ * Initialize DSM space for parallel table vacuum.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_initialize) (Relation rel,
+ ParallelContext *pctx,
+ int nworkers,
+ void *state);
+
+ /*
+ * This callback is called for parallel table vacuum workers.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_scan_worker) (Relation rel,
+ ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt);
+
/*
* Prepare to analyze block `blockno` of `scan`. The scan has been started
* with table_beginscan_analyze(). See also
@@ -1710,6 +1751,52 @@ table_relation_vacuum(Relation rel, struct VacuumParams *params,
rel->rd_tableam->relation_vacuum(rel, params, bstrategy);
}
+/* ----------------------------------------------------------------------------
+ * Parallel vacuum related functions.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * Return the number of parallel workers for a parallel vacuum scan of this
+ * relation.
+ */
+static inline int
+table_parallel_vacuum_compute_workers(Relation rel, int requested)
+{
+ return rel->rd_tableam->parallel_vacuum_compute_workers(rel, requested);
+}
+
+/*
+ * Estimate the size of shared memory needed for a parallel vacuum scan of
+ * this relation.
+ */
+static inline void
+table_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt, int nworkers,
+ void *state)
+{
+ rel->rd_tableam->parallel_vacuum_estimate(rel, pcxt, nworkers, state);
+}
+
+/*
+ * Initialize shared memory area for a parallel vacuum scan of this relation.
+ */
+static inline void
+table_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt, int nworkers,
+ void *state)
+{
+ rel->rd_tableam->parallel_vacuum_initialize(rel, pcxt, nworkers, state);
+}
+
+/*
+ * Start a parallel vacuum scan of this relation.
+ */
+static inline void
+table_parallel_vacuum_scan(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt)
+{
+ rel->rd_tableam->parallel_vacuum_scan_worker(rel, pvs, pwcxt);
+}
+
/*
* Prepare to analyze the next block in the read stream. The scan needs to
* have been started with table_beginscan_analyze(). Note that this routine
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 759f9a87d3..a225f31429 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -360,7 +360,8 @@ extern void VacuumUpdateCosts(void);
extern ParallelVacuumState *parallel_vacuum_init(Relation rel, Relation *indrels,
int nindexes, int nrequested_workers,
int vac_work_mem, int elevel,
- BufferAccessStrategy bstrategy);
+ BufferAccessStrategy bstrategy,
+ void *state);
extern void parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats);
extern TidStore *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs,
VacDeadItemsInfo **dead_items_info_p);
@@ -372,6 +373,11 @@ extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
long num_table_tuples,
int num_index_scans,
bool estimated_count);
+extern int parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs);
+extern void parallel_vacuum_table_scan_end(ParallelVacuumState *pvs);
+extern int parallel_vacuum_get_nworkers_table(ParallelVacuumState *pvs);
+extern int parallel_vacuum_get_nworkers_index(ParallelVacuumState *pvs);
+extern Relation *parallel_vacuum_get_table_indexes(ParallelVacuumState *pvs, int *nindexes);
extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in commands/analyze.c */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 9398a84051..6ccb19a29f 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -102,8 +102,20 @@ extern char *ExportSnapshot(Snapshot snapshot);
/*
* These live in procarray.c because they're intimately linked to the
* procarray contents, but thematically they better fit into snapmgr.h.
+ *
+ * XXX the struct definition is temporarily moved from procarray.c for
+ * parallel table vacuum development. We need to find a suitable way for
+ * parallel table vacuum workers to share the GlobalVisState.
*/
-typedef struct GlobalVisState GlobalVisState;
+typedef struct GlobalVisState
+{
+ /* XIDs >= are considered running by some backend */
+ FullTransactionId definitely_needed;
+
+ /* XIDs < are not considered to be running by any backend */
+ FullTransactionId maybe_needed;
+} GlobalVisState;
+
extern GlobalVisState *GlobalVisTestFor(Relation rel);
extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
--
2.43.5
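(For context: a minimal sketch, not part of the patch above, of how a table
AM would register the new parallel vacuum callbacks. The heap AM's actual
registration hunk is not shown in this excerpt, so the placement in heapam's
TableAmRoutine is an assumption; the callback and function names come from
the tableam.h and heapam.h hunks.)

    /* Hypothetical registration of the new parallel table vacuum callbacks */
    static const TableAmRoutine heapam_methods = {
        .type = T_TableAmRoutine,

        /* ... existing callbacks elided ... */

        /* returns 0 to disable parallel table vacuum */
        .parallel_vacuum_compute_workers = heap_parallel_vacuum_compute_workers,
        /* estimate and initialize the AM-specific DSM area */
        .parallel_vacuum_estimate = heap_parallel_vacuum_estimate,
        .parallel_vacuum_initialize = heap_parallel_vacuum_initialize,
        /* per-worker entry point for the parallel heap scan */
        .parallel_vacuum_scan_worker = heap_parallel_vacuum_scan_worker,
    };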
v3-0002-raidxtree.h-support-shared-iteration.patch
From b8254de5f092f9b51c0a2537858813c59adc560f Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 24 Oct 2024 17:29:51 -0700
Subject: [PATCH v3 2/4] radixtree.h: support shared iteration.
This commit supports a shared iteration operation on a radix tree with
multiple processes. The radix tree must be in shared mode to start a
shared iteration. Parallel workers can attach to the shared iteration
using the iterator handle given by the leader process. As with normal
iteration, the shared iteration is guaranteed to return key-values in
ascending order.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
---
src/include/lib/radixtree.h | 221 +++++++++++++++++++++++++++++++-----
1 file changed, 190 insertions(+), 31 deletions(-)
diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h
index 88bf695e3f..b93553200d 100644
--- a/src/include/lib/radixtree.h
+++ b/src/include/lib/radixtree.h
@@ -177,6 +177,9 @@
#define RT_ATTACH RT_MAKE_NAME(attach)
#define RT_DETACH RT_MAKE_NAME(detach)
#define RT_GET_HANDLE RT_MAKE_NAME(get_handle)
+#define RT_BEGIN_ITERATE_SHARED RT_MAKE_NAME(begin_iterate_shared)
+#define RT_ATTACH_ITERATE_SHARED RT_MAKE_NAME(attach_iterate_shared)
+#define RT_GET_ITER_HANDLE RT_MAKE_NAME(get_iter_handle)
#define RT_LOCK_EXCLUSIVE RT_MAKE_NAME(lock_exclusive)
#define RT_LOCK_SHARE RT_MAKE_NAME(lock_share)
#define RT_UNLOCK RT_MAKE_NAME(unlock)
@@ -236,15 +239,19 @@
#define RT_SHRINK_NODE_16 RT_MAKE_NAME(shrink_child_16)
#define RT_SHRINK_NODE_48 RT_MAKE_NAME(shrink_child_48)
#define RT_SHRINK_NODE_256 RT_MAKE_NAME(shrink_child_256)
+#define RT_INITIALIZE_ITER RT_MAKE_NAME(initialize_iter)
#define RT_NODE_ITERATE_NEXT RT_MAKE_NAME(node_iterate_next)
#define RT_VERIFY_NODE RT_MAKE_NAME(verify_node)
/* type declarations */
#define RT_RADIX_TREE RT_MAKE_NAME(radix_tree)
#define RT_RADIX_TREE_CONTROL RT_MAKE_NAME(radix_tree_control)
+#define RT_ITER_CONTROL RT_MAKE_NAME(iter_control)
#define RT_ITER RT_MAKE_NAME(iter)
#ifdef RT_SHMEM
#define RT_HANDLE RT_MAKE_NAME(handle)
+#define RT_ITER_CONTROL_SHARED RT_MAKE_NAME(iter_control_shared)
+#define RT_ITER_HANDLE RT_MAKE_NAME(iter_handle)
#endif
#define RT_NODE RT_MAKE_NAME(node)
#define RT_CHILD_PTR RT_MAKE_NAME(child_ptr)
@@ -270,6 +277,7 @@ typedef struct RT_ITER RT_ITER;
#ifdef RT_SHMEM
typedef dsa_pointer RT_HANDLE;
+typedef dsa_pointer RT_ITER_HANDLE;
#endif
#ifdef RT_SHMEM
@@ -687,6 +695,7 @@ typedef struct RT_RADIX_TREE_CONTROL
RT_HANDLE handle;
uint32 magic;
LWLock lock;
+ int tranche_id;
#endif
RT_PTR_ALLOC root;
@@ -740,11 +749,9 @@ typedef struct RT_NODE_ITER
int idx;
} RT_NODE_ITER;
-/* state for iterating over the whole radix tree */
-struct RT_ITER
+/* Contains the iteration state data */
+typedef struct RT_ITER_CONTROL
{
- RT_RADIX_TREE *tree;
-
/*
* A stack to track iteration for each level. Level 0 is the lowest (or
* leaf) level
@@ -755,8 +762,36 @@ struct RT_ITER
/* The key constructed during iteration */
uint64 key;
-};
+} RT_ITER_CONTROL;
+
+#ifdef RT_SHMEM
+/* Contains the shared iteration state data */
+typedef struct RT_ITER_CONTROL_SHARED
+{
+ /* Actual shared iteration state data */
+ RT_ITER_CONTROL common;
+
+ /* protect the control data */
+ LWLock lock;
+
+ RT_ITER_HANDLE handle;
+ pg_atomic_uint32 refcnt;
+} RT_ITER_CONTROL_SHARED;
+#endif
+
+/* state for iterating over the whole radix tree */
+struct RT_ITER
+{
+ RT_RADIX_TREE *tree;
+ /* pointing to either local memory or DSA */
+ RT_ITER_CONTROL *ctl;
+
+#ifdef RT_SHMEM
+ /* True if the iterator is for shared iteration */
+ bool shared;
+#endif
+};
/* verification (available only in assert-enabled builds) */
static void RT_VERIFY_NODE(RT_NODE * node);
@@ -1848,6 +1883,7 @@ RT_CREATE(MemoryContext ctx)
tree->ctl = (RT_RADIX_TREE_CONTROL *) dsa_get_address(dsa, dp);
tree->ctl->handle = dp;
tree->ctl->magic = RT_RADIX_TREE_MAGIC;
+ tree->ctl->tranche_id = tranche_id;
LWLockInitialize(&tree->ctl->lock, tranche_id);
#else
tree->ctl = (RT_RADIX_TREE_CONTROL *) palloc0(sizeof(RT_RADIX_TREE_CONTROL));
@@ -1900,6 +1936,9 @@ RT_ATTACH(dsa_area *dsa, RT_HANDLE handle)
dsa_pointer control;
tree = (RT_RADIX_TREE *) palloc0(sizeof(RT_RADIX_TREE));
+ tree->iter_context = AllocSetContextCreate(CurrentMemoryContext,
+ RT_STR(RT_PREFIX) "_radix_tree iter context",
+ ALLOCSET_SMALL_SIZES);
/* Find the control object in shared memory */
control = handle;
@@ -2072,35 +2111,86 @@ RT_FREE(RT_RADIX_TREE * tree)
/***************** ITERATION *****************/
+/* Common routine to initialize the given iterator */
+static void
+RT_INITIALIZE_ITER(RT_RADIX_TREE * tree, RT_ITER * iter)
+{
+ RT_CHILD_PTR root;
+
+ iter->tree = tree;
+
+ Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
+ root.alloc = iter->tree->ctl->root;
+ RT_PTR_SET_LOCAL(tree, &root);
+
+ iter->ctl->top_level = iter->tree->ctl->start_shift / RT_SPAN;
+
+ /* Set the root to start */
+ iter->ctl->cur_level = iter->ctl->top_level;
+ iter->ctl->node_iters[iter->ctl->cur_level].node = root;
+ iter->ctl->node_iters[iter->ctl->cur_level].idx = 0;
+}
+
/*
* Create and return the iterator for the given radix tree.
*
- * Taking a lock in shared mode during the iteration is the caller's
- * responsibility.
+ * Taking a lock on a radix tree in shared mode during the iteration is the
+ * caller's responsibility.
*/
RT_SCOPE RT_ITER *
RT_BEGIN_ITERATE(RT_RADIX_TREE * tree)
{
RT_ITER *iter;
- RT_CHILD_PTR root;
iter = (RT_ITER *) MemoryContextAllocZero(tree->iter_context,
sizeof(RT_ITER));
- iter->tree = tree;
+ iter->ctl = (RT_ITER_CONTROL *) MemoryContextAllocZero(tree->iter_context,
+ sizeof(RT_ITER_CONTROL));
- Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
- root.alloc = iter->tree->ctl->root;
- RT_PTR_SET_LOCAL(tree, &root);
+ RT_INITIALIZE_ITER(tree, iter);
- iter->top_level = iter->tree->ctl->start_shift / RT_SPAN;
+#ifdef RT_SHMEM
+ /* we will do a non-shared iteration on a shared radix tree */
+ iter->shared = false;
+#endif
- /* Set the root to start */
- iter->cur_level = iter->top_level;
- iter->node_iters[iter->cur_level].node = root;
- iter->node_iters[iter->cur_level].idx = 0;
+ return iter;
+}
+
+#ifdef RT_SHMEM
+/*
+ * Create and return a shared iterator for the given shared radix tree.
+ *
+ * Taking a lock on a radix tree in shared mode during the shared iteration to
+ * prevent concurrent writes is the caller's responsibility.
+ */
+RT_SCOPE RT_ITER *
+RT_BEGIN_ITERATE_SHARED(RT_RADIX_TREE * tree)
+{
+ RT_ITER *iter;
+ RT_ITER_CONTROL_SHARED *ctl_shared;
+ dsa_pointer dp;
+
+ /* The radix tree must be in shared mode */
+ Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
+
+ dp = dsa_allocate(tree->dsa, sizeof(RT_ITER_CONTROL_SHARED));
+ ctl_shared = (RT_ITER_CONTROL_SHARED *) dsa_get_address(tree->dsa, dp);
+ ctl_shared->handle = dp;
+ LWLockInitialize(&ctl_shared->lock, tree->ctl->tranche_id);
+ pg_atomic_init_u32(&ctl_shared->refcnt, 1);
+
+ iter = (RT_ITER *) MemoryContextAllocZero(tree->iter_context,
+ sizeof(RT_ITER));
+
+ iter->ctl = (RT_ITER_CONTROL *) ctl_shared;
+ iter->shared = true;
+
+ RT_INITIALIZE_ITER(tree, iter);
return iter;
}
+#endif
/*
* Scan the inner node and return the next child pointer if one exists, otherwise
@@ -2114,12 +2204,18 @@ RT_NODE_ITERATE_NEXT(RT_ITER * iter, int level)
RT_CHILD_PTR node;
RT_PTR_ALLOC *slot = NULL;
+ node_iter = &(iter->ctl->node_iters[level]);
+ node = node_iter->node;
+
#ifdef RT_SHMEM
- Assert(iter->tree->ctl->magic == RT_RADIX_TREE_MAGIC);
-#endif
- node_iter = &(iter->node_iters[level]);
- node = node_iter->node;
+ /*
+ * Since the iterator is shared, the node's local pointer might have been
+ * set by another backend, so we need to make sure to use our own local
+ * pointer.
+ */
+ if (iter->shared)
+ RT_PTR_SET_LOCAL(iter->tree, &node);
+#endif
Assert(node.local != NULL);
@@ -2192,8 +2288,8 @@ RT_NODE_ITERATE_NEXT(RT_ITER * iter, int level)
}
/* Update the key */
- iter->key &= ~(((uint64) RT_CHUNK_MASK) << (level * RT_SPAN));
- iter->key |= (((uint64) key_chunk) << (level * RT_SPAN));
+ iter->ctl->key &= ~(((uint64) RT_CHUNK_MASK) << (level * RT_SPAN));
+ iter->ctl->key |= (((uint64) key_chunk) << (level * RT_SPAN));
return slot;
}
@@ -2207,18 +2303,29 @@ RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p)
{
RT_PTR_ALLOC *slot = NULL;
- while (iter->cur_level <= iter->top_level)
+#ifdef RT_SHMEM
+ /* Prevent the shared iterator from being updated concurrently */
+ if (iter->shared)
+ LWLockAcquire(&((RT_ITER_CONTROL_SHARED *) iter->ctl)->lock, LW_EXCLUSIVE);
+#endif
+
+ while (iter->ctl->cur_level <= iter->ctl->top_level)
{
RT_CHILD_PTR node;
- slot = RT_NODE_ITERATE_NEXT(iter, iter->cur_level);
+ slot = RT_NODE_ITERATE_NEXT(iter, iter->ctl->cur_level);
- if (iter->cur_level == 0 && slot != NULL)
+ if (iter->ctl->cur_level == 0 && slot != NULL)
{
/* Found a value at the leaf node */
- *key_p = iter->key;
+ *key_p = iter->ctl->key;
node.alloc = *slot;
+#ifdef RT_SHMEM
+ if (iter->shared)
+ LWLockRelease(&((RT_ITER_CONTROL_SHARED *) iter->ctl)->lock);
+#endif
+
if (RT_CHILDPTR_IS_VALUE(*slot))
return (RT_VALUE_TYPE *) slot;
else
@@ -2234,17 +2341,23 @@ RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p)
node.alloc = *slot;
RT_PTR_SET_LOCAL(iter->tree, &node);
- iter->cur_level--;
- iter->node_iters[iter->cur_level].node = node;
- iter->node_iters[iter->cur_level].idx = 0;
+ iter->ctl->cur_level--;
+ iter->ctl->node_iters[iter->ctl->cur_level].node = node;
+ iter->ctl->node_iters[iter->ctl->cur_level].idx = 0;
}
else
{
/* Not found the child slot, move up the tree */
- iter->cur_level++;
+ iter->ctl->cur_level++;
}
}
+#ifdef RT_SHMEM
+ if (iter->shared)
+ LWLockRelease(&((RT_ITER_CONTROL_SHARED *) iter->ctl)->lock);
+#endif
+
/* We've visited all nodes, so the iteration finished */
return NULL;
}
@@ -2255,9 +2368,45 @@ RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p)
RT_SCOPE void
RT_END_ITERATE(RT_ITER * iter)
{
+#ifdef RT_SHMEM
+ RT_ITER_CONTROL_SHARED *ctl = (RT_ITER_CONTROL_SHARED *) iter->ctl;
+
+ if (iter->shared &&
+ pg_atomic_sub_fetch_u32(&ctl->refcnt, 1) == 0)
+ dsa_free(iter->tree->dsa, ctl->handle);
+#endif
pfree(iter);
}
+#ifdef RT_SHMEM
+RT_SCOPE RT_ITER_HANDLE
+RT_GET_ITER_HANDLE(RT_ITER * iter)
+{
+ Assert(iter->shared);
+ return ((RT_ITER_CONTROL_SHARED *) iter->ctl)->handle;
+}
+
+RT_SCOPE RT_ITER *
+RT_ATTACH_ITERATE_SHARED(RT_RADIX_TREE * tree, RT_ITER_HANDLE handle)
+{
+ RT_ITER *iter;
+ RT_ITER_CONTROL_SHARED *ctl;
+
+ iter = (RT_ITER *) MemoryContextAllocZero(tree->iter_context,
+ sizeof(RT_ITER));
+ iter->tree = tree;
+ ctl = (RT_ITER_CONTROL_SHARED *) dsa_get_address(tree->dsa, handle);
+ iter->ctl = (RT_ITER_CONTROL *) ctl;
+ iter->shared = true;
+
+ /* For every iterator, increase the refcnt by 1 */
+ pg_atomic_add_fetch_u32(&ctl->refcnt, 1);
+
+ return iter;
+}
+#endif
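
(A brief usage sketch, editorial and not part of the patch: one backend creates the shared iterator and exports its handle, other backends attach via that handle, and every participant just calls RT_ITERATE_NEXT, which serializes position updates through the iterator's LWLock. The RT_* names below stand for the prefix-expanded functions; process_entry() is a hypothetical callback.)

/* Backend A: create the shared iterator and export its handle. */
RT_ITER    *iter = RT_BEGIN_ITERATE_SHARED(tree);
RT_ITER_HANDLE handle = RT_GET_ITER_HANDLE(iter);   /* passed via DSM */

/* Backend B: attach to the same iteration state. */
RT_ITER    *iter_b = RT_ATTACH_ITERATE_SHARED(tree, handle);

/* Any participant: pull entries until the shared iteration is exhausted. */
uint64      key;
RT_VALUE_TYPE *value;

while ((value = RT_ITERATE_NEXT(iter_b, &key)) != NULL)
    process_entry(key, value);      /* hypothetical per-entry work */

/* Each participant detaches; the last RT_END_ITERATE frees the DSA memory. */
RT_END_ITERATE(iter_b);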
+
/***************** DELETION *****************/
#ifdef RT_USE_DELETE
@@ -2957,7 +3106,11 @@ RT_DUMP_NODE(RT_NODE * node)
#undef RT_PTR_ALLOC
#undef RT_INVALID_PTR_ALLOC
#undef RT_HANDLE
+#undef RT_ITER_HANDLE
+#undef RT_ITER_CONTROL
#undef RT_ITER
+#undef RT_SHARED_ITER
#undef RT_NODE
#undef RT_NODE_ITER
#undef RT_NODE_KIND_4
@@ -2994,6 +3147,11 @@ RT_DUMP_NODE(RT_NODE * node)
#undef RT_LOCK_SHARE
#undef RT_UNLOCK
#undef RT_GET_HANDLE
+#undef RT_BEGIN_ITERATE_SHARED
+#undef RT_ATTACH_ITERATE_SHARED
+#undef RT_GET_ITER_HANDLE
+#undef RT_ATTACH_ITER
#undef RT_FIND
#undef RT_SET
#undef RT_BEGIN_ITERATE
@@ -3050,5 +3208,6 @@ RT_DUMP_NODE(RT_NODE * node)
#undef RT_SHRINK_NODE_256
#undef RT_NODE_DELETE
#undef RT_NODE_INSERT
+#undef RT_INITIALIZE_ITER
#undef RT_NODE_ITERATE_NEXT
#undef RT_VERIFY_NODE
--
2.43.5
On Fri, Oct 25, 2024 at 12:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Oct 22, 2024 at 4:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Sorry for the very late reply.
On Tue, Jul 30, 2024 at 8:54 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Sawada-san,
Thank you for testing!
I tried to profile the vacuuming with the larger case (40 workers for the 20G
table) and the attached FlameGraph shows the result. IIUC, I cannot find
bottlenecks.

2.
I compared parallel heap scan and found that it does not have a
compute_worker API. Can you clarify the reason why there is an inconsistency?
(I feel it is intentional because the calculation logic seems to depend on the
heap structure, so should we add the API for table scan as well?)
There is room to consider a better API design, but yes, the reason is
that the calculation logic depends on the table AM implementation. For
example, I thought it might make sense to take the number of all-visible
pages into account when calculating the number of parallel workers, as we
don't want to launch many workers on a table where most pages are
all-visible. That might not work for other table AMs.

Okay, thanks for confirming. I wanted to ask others as well.
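
To illustrate the kind of AM-specific heuristic being discussed, here is a rough sketch; the function name and threshold are hypothetical, and only the idea of discounting all-visible pages comes from the discussion above:

/*
 * Sketch: scale the worker count by the number of pages that are NOT
 * all-visible, since a mostly all-visible table leaves little to scan.
 * The 8192-block step is a made-up threshold for illustration.
 */
static int
heap_compute_vacuum_workers_sketch(BlockNumber rel_pages,
                                   BlockNumber all_visible_pages,
                                   int max_workers)
{
    BlockNumber pages_to_scan = rel_pages - Min(rel_pages, all_visible_pages);
    int         workers = 0;

    while (pages_to_scan >= 8192 && workers < max_workers)
    {
        workers++;
        pages_to_scan /= 2;
    }

    return workers;
}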
I'm updating the patch to implement parallel heap vacuum and will
share the updated patch. It might take time as it requires implementing
shared iteration support in the radix tree.

Here are other preliminary comments for the v2 patch. This does not contain
cosmetic ones.

1.
The shared data structure PHVShared does not contain a mutex lock. Is it
intentional because the fields are accessed by the leader only after parallel
workers exit?

Yes, the fields in PHVShared are read-only for workers. Since no
concurrent reads/writes happen on these fields, we don't need to
protect them.
2.
Per my understanding, the vacuuming goes through the steps below:

a. parallel workers are launched for scanning pages
b. leader waits until scans are done
c. leader does vacuum alone (you may extend here...)
d. parallel workers are launched again to clean up indexes

If so, can we reuse the parallel workers for the cleanup? Or is that more
painful engineering than the benefit warrants?

I've not thought of this idea, but I think it's possible from a
technical perspective. It saves the overhead of relaunching workers, but
I'm not sure how much it would help performance, and I'm concerned it
would make the code complex. For example, different numbers of workers
might be required for table vacuuming and index vacuuming, so we would
end up increasing or decreasing the number of workers.

3.
According to LaunchParallelWorkers(), the bgw_name and bgw_type are
hardcoded as "parallel worker ...". Can we extend this to improve
trackability in pg_stat_activity?

It would be a good improvement for better trackability, but I think we
should do that in a separate patch as it's not just a problem for
parallel heap vacuum.
4.
I'm not an expert on TidStore, but as you said, TidStoreLockExclusive() might
be a bottleneck when TIDs are added to the shared TidStore. Another primitive
idea is to prepare per-worker TidStores (in the PHVScanWorkerState or
LVRelCounters?) and gather them after the heap scanning. If you extend the
patch so that parallel workers do the vacuuming as well, the gathering may
not be needed: each worker can access its own TidStore and clean up. One
downside is that the memory consumption may be quite large.

Interesting idea. Supposing we supported parallel heap vacuum as well, we
wouldn't need locks or shared-iteration support on TidStore. I think each
worker should use a fraction of maintenance_work_mem. However, one downside
would be that we would need to check as many TidStores as there are workers
during index vacuuming.

On further thought, I don't think this idea goes well. Index vacuuming is
the most time-consuming phase among the vacuum phases, so it would not be
a good idea to make it slower even if we could do parallel heap scan and
heap vacuum without any locking. Also, merging multiple TidStores into one
is not straightforward since the block ranges that each worker processes
overlap.

FYI I've implemented the parallel heap vacuum part and am doing some
benchmark tests. I'll share the updated patches along with test
results this week.

Please find the attached patches. From the previous version, I made a
lot of changes including bug fixes, addressing review comments, and
adding parallel heap vacuum support. Parallel vacuum related
infrastructure is implemented in vacuumparallel.c, and lazyvacuum.c
now uses ParallelVacuumState for parallel heap scan/vacuum, index
bulkdelete/cleanup, or both. Parallel vacuum workers launch at the
beginning of each phase and exit at the end of each phase. Since
different numbers of workers could be used for heap scan/vacuum and
index bulkdelete/cleanup, it's possible that only one of heap
scan/vacuum and index bulkdelete/cleanup is parallelized.

In order to implement parallel heap vacuum, I extended the radix tree
and tidstore to support shared iteration. The shared iteration works
only with a shared tidstore, but a non-shared iteration works with a
local tidstore as well as a shared tidstore. For example, if a table is
large and has one index, we use only parallel heap scan/vacuum. In
this case, we store dead item TIDs into a shared tidstore during the
parallel heap scan, but during index bulk-deletion we perform a
non-shared iteration on the shared tidstore, which is more efficient
as it doesn't acquire any locks during the iteration.

I've done benchmark tests with a 10GB unlogged table (created on a
tmpfs tablespace) having 4 btree indexes while changing the parallel
degree. I restarted the postgres server before each run to ensure that
the data is not in shared memory, and I avoided disk I/O during lazy
vacuum as much as possible. Here is a comparison between HEAD and
patched (took the median of 5 runs):

+----------+-----------+-----------+-------------+
| parallel | HEAD      | patched   | improvement |
+----------+-----------+-----------+-------------+
| 0        | 53079.53  | 53468.734 | 1.007       |
| 1        | 48101.46  | 35712.613 | 0.742       |
| 2        | 37767.902 | 23566.426 | 0.624       |
| 4        | 38005.836 | 20192.055 | 0.531       |
| 8        | 37754.47  | 18614.717 | 0.493       |
+----------+-----------+-----------+-------------+

Here are the breakdowns of the execution times of each vacuum phase
(from left: heap scan, index bulkdel, heap vacuum):

- HEAD
parallel 0: 53079.530 (15886, 28039, 9270)
parallel 1: 48101.460 (15931, 23247, 9215)
parallel 2: 37767.902 (15259, 12888, 9479)
parallel 4: 38005.836 (16097, 12683, 9217)
parallel 8: 37754.470 (16016, 12535, 9306)

- Patched
parallel 0: 53468.734 (15990, 28296, 9465)
parallel 1: 35712.613 ( 8254, 23569, 3700)
parallel 2: 23566.426 ( 6180, 12760, 3283)
parallel 4: 20192.055 ( 4058, 12776, 2154)
parallel 8: 18614.717 ( 2797, 13244, 1579)

The index bulkdel phase is saturated at parallel 2, as one worker is
assigned to each index. On HEAD, there is no further performance gain
with more than 'parallel 4'. On the other hand, on patched, it got
faster even at 'parallel 4' and 'parallel 8' since the other two phases
were also done with parallel workers.
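
To make the shared-iteration flow concrete, here is a minimal sketch of how the leader and a worker might drive the TidStore APIs added by these patches (TidStoreBeginIterateShared, TidStoreGetSharedIterHandle, TidStoreAttachIterateShared); launch_workers(), wait_for_workers(), and vacuum_one_page() are hypothetical placeholders, and 'shared' lives in the DSM segment. As at the radix tree layer, whichever backend detaches last frees the shared iterator state.

static void
leader_heap_vacuum_pass(TidStore *dead_items, PHVShared *shared)
{
    TidStoreIter *iter = TidStoreBeginIterateShared(dead_items);
    TidStoreIterResult *res;

    /* Publish the iterator so workers can attach to it. */
    shared->shared_iter_handle = TidStoreGetSharedIterHandle(iter);
    launch_workers();

    /* The leader consumes pages alongside the workers. */
    while ((res = TidStoreIterateNext(iter)) != NULL)
        vacuum_one_page(res);

    wait_for_workers();
    TidStoreEndIterate(iter);   /* the last detach frees the shared state */
}

static void
worker_heap_vacuum_pass(TidStore *dead_items, PHVShared *shared)
{
    TidStoreIter *iter = TidStoreAttachIterateShared(dead_items,
                                                     shared->shared_iter_handle);
    TidStoreIterResult *res;

    while ((res = TidStoreIterateNext(iter)) != NULL)
        vacuum_one_page(res);

    TidStoreEndIterate(iter);
}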
I've attached new version patches that fix the failures reported by
cfbot. I hope these changes make cfbot happy.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
v4-0004-Support-parallel-heap-vacuum-during-lazy-vacuum.patch
From 92cd53dff4e9a3da1278e7b666c15c03132c434d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 24 Oct 2024 17:37:45 -0700
Subject: [PATCH v4 4/4] Support parallel heap vacuum during lazy vacuum.
This commit further extends parallel vacuum to perform the heap vacuum
phase with parallel workers. It leverages the shared TidStore iteration.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
src/backend/access/heap/vacuumlazy.c | 175 +++++++++++++++++++--------
1 file changed, 122 insertions(+), 53 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 10991666e0b..1ab34732833 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -158,6 +158,7 @@ typedef struct LVRelScanStats
BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+ BlockNumber vacuumed_pages; /* # pages vacuumed in one second-pass cycle */
/* Counters that follow are only for scanned_pages */
int64 tuples_deleted; /* # deleted from table */
@@ -186,11 +187,15 @@ typedef struct PHVShared
MultiXactId NewRelminMxid;
bool skippedallvis;
+ bool do_index_vacuuming;
/* VACUUM operation's cutoffs for freezing and pruning */
struct VacuumCutoffs cutoffs;
GlobalVisState vistest;
+ dsa_pointer shared_iter_handle;
+ bool do_heap_vacuum;
+
/* per-worker scan stats for parallel heap vacuum scan */
LVRelScanStats worker_scan_stats[FLEXIBLE_ARRAY_MEMBER];
} PHVShared;
@@ -352,6 +357,7 @@ static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
static void lazy_vacuum(LVRelState *vacrel);
static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
static void lazy_vacuum_heap_rel(LVRelState *vacrel);
+static void do_lazy_vacuum_heap_rel(LVRelState *vacrel, TidStoreIter *iter);
static void lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
Buffer buffer, OffsetNumber *deadoffsets,
int num_offsets, Buffer vmbuffer);
@@ -530,6 +536,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
scan_stats->lpdead_item_pages = 0;
scan_stats->missed_dead_pages = 0;
scan_stats->nonempty_pages = 0;
+ scan_stats->vacuumed_pages = 0;
/* Initialize remaining counters (be tidy) */
scan_stats->tuples_deleted = 0;
@@ -2362,46 +2369,14 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
return allindexes;
}
-/*
- * lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
- *
- * This routine marks LP_DEAD items in vacrel->dead_items as LP_UNUSED. Pages
- * that never had lazy_scan_prune record LP_DEAD items are not visited at all.
- *
- * We may also be able to truncate the line pointer array of the heap pages we
- * visit. If there is a contiguous group of LP_UNUSED items at the end of the
- * array, it can be reclaimed as free space. These LP_UNUSED items usually
- * start out as LP_DEAD items recorded by lazy_scan_prune (we set items from
- * each page to LP_UNUSED, and then consider if it's possible to truncate the
- * page's line pointer array).
- *
- * Note: the reason for doing this as a second pass is we cannot remove the
- * tuples until we've removed their index entries, and we want to process
- * index entry removal in batches as large as possible.
- */
static void
-lazy_vacuum_heap_rel(LVRelState *vacrel)
+do_lazy_vacuum_heap_rel(LVRelState *vacrel, TidStoreIter *iter)
{
- BlockNumber vacuumed_pages = 0;
Buffer vmbuffer = InvalidBuffer;
- LVSavedErrInfo saved_err_info;
- TidStoreIter *iter;
- TidStoreIterResult *iter_result;
- Assert(vacrel->do_index_vacuuming);
- Assert(vacrel->do_index_cleanup);
- Assert(vacrel->num_index_scans > 0);
-
- /* Report that we are now vacuuming the heap */
- pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
-
- /* Update error traceback information */
- update_vacuum_error_info(vacrel, &saved_err_info,
- VACUUM_ERRCB_PHASE_VACUUM_HEAP,
- InvalidBlockNumber, InvalidOffsetNumber);
+ TidStoreIterResult *iter_result;
- iter = TidStoreBeginIterate(vacrel->dead_items);
while ((iter_result = TidStoreIterateNext(iter)) != NULL)
{
BlockNumber blkno;
@@ -2439,26 +2414,100 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
UnlockReleaseBuffer(buf);
RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
- vacuumed_pages++;
+ vacrel->scan_stats->vacuumed_pages++;
}
- TidStoreEndIterate(iter);
vacrel->blkno = InvalidBlockNumber;
if (BufferIsValid(vmbuffer))
ReleaseBuffer(vmbuffer);
+}
+
+/*
+ * lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
+ *
+ * This routine marks LP_DEAD items in vacrel->dead_items as LP_UNUSED. Pages
+ * that never had lazy_scan_prune record LP_DEAD items are not visited at all.
+ *
+ * We may also be able to truncate the line pointer array of the heap pages we
+ * visit. If there is a contiguous group of LP_UNUSED items at the end of the
+ * array, it can be reclaimed as free space. These LP_UNUSED items usually
+ * start out as LP_DEAD items recorded by lazy_scan_prune (we set items from
+ * each page to LP_UNUSED, and then consider if it's possible to truncate the
+ * page's line pointer array).
+ *
+ * Note: the reason for doing this as a second pass is we cannot remove the
+ * tuples until we've removed their index entries, and we want to process
+ * index entry removal in batches as large as possible.
+ */
+static void
+lazy_vacuum_heap_rel(LVRelState *vacrel)
+{
+ LVSavedErrInfo saved_err_info;
+ TidStoreIter *iter;
+
+ Assert(vacrel->do_index_vacuuming);
+ Assert(vacrel->do_index_cleanup);
+ Assert(vacrel->num_index_scans > 0);
+
+ /* Report that we are now vacuuming the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
+
+ /* Update error traceback information */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ vacrel->scan_stats->vacuumed_pages = 0;
+
+ if (ParallelHeapVacuumIsActive(vacrel))
+ {
+ PHVState *phvstate = vacrel->phvstate;
+
+ iter = TidStoreBeginIterateShared(vacrel->dead_items);
+
+ phvstate->shared->do_heap_vacuum = true;
+ phvstate->shared->shared_iter_handle = TidStoreGetSharedIterHandle(iter);
+
+ /* launch workers */
+ vacrel->phvstate->nworkers_launched = parallel_vacuum_table_scan_begin(vacrel->pvs);
+ }
+ else
+ iter = TidStoreBeginIterate(vacrel->dead_items);
+
+ /* do the real work */
+ do_lazy_vacuum_heap_rel(vacrel, iter);
+
+ if (ParallelHeapVacuumIsActive(vacrel))
+ {
+ PHVState *phvstate = vacrel->phvstate;
+
+ parallel_vacuum_table_scan_end(vacrel->pvs);
+
+ /* Gather the heap vacuum statistics that workers collected */
+ for (int i = 0; i < phvstate->nworkers_launched; i++)
+ {
+ LVRelScanStats *ss = &(phvstate->shared->worker_scan_stats[i]);
+
+ vacrel->scan_stats->vacuumed_pages += ss->vacuumed_pages;
+ }
+ }
+
+ TidStoreEndIterate(iter);
+
/*
* We set all LP_DEAD items from the first heap pass to LP_UNUSED during
* the second heap pass. No more, no less.
*/
Assert(vacrel->num_index_scans > 1 ||
(vacrel->dead_items_info->num_items == vacrel->scan_stats->lpdead_items &&
- vacuumed_pages == vacrel->scan_stats->lpdead_item_pages));
+ vacrel->scan_stats->vacuumed_pages == vacrel->scan_stats->lpdead_item_pages));
ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
vacrel->relname, (long long) vacrel->dead_items_info->num_items,
- vacuumed_pages)));
+ vacrel->scan_stats->vacuumed_pages)));
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3514,6 +3563,7 @@ heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
shared->NewRelfrozenXid = vacrel->scan_stats->NewRelfrozenXid;
shared->NewRelminMxid = vacrel->scan_stats->NewRelminMxid;
shared->skippedallvis = vacrel->scan_stats->skippedallvis;
+ shared->do_index_vacuuming = vacrel->do_index_vacuuming;
/*
* XXX: we copy the contents of vistest to the shared area, but in order
@@ -3566,7 +3616,6 @@ heap_parallel_vacuum_scan_worker(Relation rel, ParallelVacuumState *pvs,
PHVScanWorkerState *scanstate;
LVRelScanStats *scan_stats;
ErrorContextCallback errcallback;
- bool scan_done;
phvstate = palloc(sizeof(PHVState));
@@ -3603,10 +3652,11 @@ heap_parallel_vacuum_scan_worker(Relation rel, ParallelVacuumState *pvs,
/* initialize per-worker relation statistics */
MemSet(scan_stats, 0, sizeof(LVRelScanStats));
- /* Set fields necessary for heap scan */
+ /* Set fields necessary for heap scan and vacuum */
vacrel.scan_stats->NewRelfrozenXid = shared->NewRelfrozenXid;
vacrel.scan_stats->NewRelminMxid = shared->NewRelminMxid;
vacrel.scan_stats->skippedallvis = shared->skippedallvis;
+ vacrel.do_index_vacuuming = shared->do_index_vacuuming;
/* Initialize the per-worker scan state if not yet */
if (!phvstate->myscanstate->initialized)
@@ -3628,25 +3678,44 @@ heap_parallel_vacuum_scan_worker(Relation rel, ParallelVacuumState *pvs,
vacrel.relnamespace = get_database_name(RelationGetNamespace(rel));
vacrel.relname = pstrdup(RelationGetRelationName(rel));
vacrel.indname = NULL;
- vacrel.phase = VACUUM_ERRCB_PHASE_SCAN_HEAP;
errcallback.callback = vacuum_error_callback;
errcallback.arg = &vacrel;
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
- scan_done = do_lazy_scan_heap(&vacrel);
+ if (shared->do_heap_vacuum)
+ {
+ TidStoreIter *iter;
+
+ iter = TidStoreAttachIterateShared(vacrel.dead_items, shared->shared_iter_handle);
+
+ /* Join parallel heap vacuum */
+ vacrel.phase = VACUUM_ERRCB_PHASE_VACUUM_HEAP;
+ do_lazy_vacuum_heap_rel(&vacrel, iter);
+
+ TidStoreEndIterate(iter);
+ }
+ else
+ {
+ bool scan_done;
+
+ /* Join parallel heap scan */
+ vacrel.phase = VACUUM_ERRCB_PHASE_SCAN_HEAP;
+ scan_done = do_lazy_scan_heap(&vacrel);
+
+ /*
+ * If the leader or a worker finishes the heap scan because the dead_items
+ * TID store is close to the limit, it might still have some allocated
+ * blocks in its scan state. Since this scan state might not be used in
+ * the next heap scan, we remember that it might have some unconsumed
+ * blocks so that the leader can complete the scans after the heap scan
+ * phase finishes.
+ */
+ phvstate->myscanstate->maybe_have_blocks = !scan_done;
+ }
/* Pop the error context stack */
error_context_stack = errcallback.previous;
-
- /*
- * If the leader or a worker finishes the heap scan because dead_items
- * TIDs is close to the limit, it might have some allocated blocks in its
- * scan state. Since this scan state might not be used in the next heap
- * scan, we remember that it might have some unconsumed blocks so that the
- * leader complete the scans after the heap scan phase finishes.
- */
- phvstate->myscanstate->maybe_have_blocks = !scan_done;
}
/*
@@ -3736,7 +3805,6 @@ parallel_heap_vacuum_gather_scan_stats(LVRelState *vacrel)
vacrel->scan_stats->frozen_pages += ss->frozen_pages;
vacrel->scan_stats->lpdead_item_pages += ss->lpdead_item_pages;
vacrel->scan_stats->missed_dead_pages += ss->missed_dead_pages;
- vacrel->scan_stats->vacuumed_pages += ss->vacuumed_pages;
vacrel->scan_stats->tuples_deleted += ss->tuples_deleted;
vacrel->scan_stats->tuples_frozen += ss->tuples_frozen;
vacrel->scan_stats->lpdead_items += ss->lpdead_items;
@@ -3774,6 +3842,7 @@ do_parallel_lazy_scan_heap(LVRelState *vacrel)
Assert(!IsParallelWorker());
/* launch workers */
+ vacrel->phvstate->shared->do_heap_vacuum = false;
vacrel->phvstate->nworkers_launched = parallel_vacuum_table_scan_begin(vacrel->pvs);
/* initialize parallel scan description to join as a worker */
--
2.43.5
v4-0001-Support-parallel-heap-scan-during-lazy-vacuum.patch
From 00a4337e8bd74a4764d9b4ed854c6684e92cb4f6 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 1 Jul 2024 15:17:46 +0900
Subject: [PATCH v4 1/4] Support parallel heap scan during lazy vacuum.
Commit 40d964ec99 allowed the VACUUM command to process indexes in
parallel. This change extends parallel vacuum to support parallel
heap scan during lazy vacuum.
---
src/backend/access/heap/heapam_handler.c | 6 +
src/backend/access/heap/vacuumlazy.c | 1140 ++++++++++++++++++----
src/backend/commands/vacuumparallel.c | 311 +++++-
src/backend/storage/ipc/procarray.c | 9 -
src/include/access/heapam.h | 8 +
src/include/access/tableam.h | 87 ++
src/include/commands/vacuum.h | 8 +-
src/include/utils/snapmgr.h | 14 +-
8 files changed, 1318 insertions(+), 265 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index a8d95e0f1c1..c49eed81e24 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2659,6 +2659,12 @@ static const TableAmRoutine heapam_methods = {
.relation_copy_data = heapam_relation_copy_data,
.relation_copy_for_cluster = heapam_relation_copy_for_cluster,
.relation_vacuum = heap_vacuum_rel,
+
+ .parallel_vacuum_compute_workers = heap_parallel_vacuum_compute_workers,
+ .parallel_vacuum_estimate = heap_parallel_vacuum_estimate,
+ .parallel_vacuum_initialize = heap_parallel_vacuum_initialize,
+ .parallel_vacuum_scan_worker = heap_parallel_vacuum_scan_worker,
+
.scan_analyze_next_block = heapam_scan_analyze_next_block,
.scan_analyze_next_tuple = heapam_scan_analyze_next_tuple,
.index_build_range_scan = heapam_index_build_range_scan,
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 793bd33cb4d..10991666e0b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -48,6 +48,7 @@
#include "common/int.h"
#include "executor/instrument.h"
#include "miscadmin.h"
+#include "optimizer/paths.h"
#include "pgstat.h"
#include "portability/instr_time.h"
#include "postmaster/autovacuum.h"
@@ -115,10 +116,24 @@
#define PREFETCH_SIZE ((BlockNumber) 32)
/*
- * Macro to check if we are in a parallel vacuum. If true, we are in the
- * parallel mode and the DSM segment is initialized.
+ * DSM keys for heap parallel vacuum scan. Unlike other parallel execution code,
+ * we don't need to worry about DSM keys conflicting with plan_node_id, but we do
+ * need to avoid conflicting with the DSM keys used in vacuumparallel.c.
+ */
+#define LV_PARALLEL_SCAN_SHARED 0xFFFF0001
+#define LV_PARALLEL_SCAN_DESC 0xFFFF0002
+#define LV_PARALLEL_SCAN_DESC_WORKER 0xFFFF0003
+
+/*
+ * Macros to check if we are in parallel heap vacuuming, parallel index vacuuming,
+ * or both. If ParallelVacuumIsActive() is true, we are in parallel mode, meaning
+ * that the dead item TIDs are stored in a shared memory area.
*/
#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
+#define ParallelIndexVacuumIsActive(vacrel) \
+ (ParallelVacuumIsActive(vacrel) && parallel_vacuum_get_nworkers_index((vacrel)->pvs) > 0)
+#define ParallelHeapVacuumIsActive(vacrel) \
+ (ParallelVacuumIsActive(vacrel) && parallel_vacuum_get_nworkers_table((vacrel)->pvs) > 0)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -131,6 +146,109 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE,
} VacErrPhase;
+/*
+ * Relation statistics collected during heap scanning that need to be shared
+ * among parallel vacuum workers.
+ */
+typedef struct LVRelScanStats
+{
+ BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */
+ BlockNumber removed_pages; /* # pages removed by relation truncation */
+ BlockNumber frozen_pages; /* # pages with newly frozen tuples */
+ BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
+ BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
+ BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+
+ /* Counters that follow are only for scanned_pages */
+ int64 tuples_deleted; /* # deleted from table */
+ int64 tuples_frozen; /* # newly frozen */
+ int64 lpdead_items; /* # deleted from indexes */
+ int64 live_tuples; /* # live tuples remaining */
+ int64 recently_dead_tuples; /* # dead, but not yet removable */
+ int64 missed_dead_tuples; /* # removable, but not removed */
+
+ /* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid. */
+ TransactionId NewRelfrozenXid;
+ MultiXactId NewRelminMxid;
+ bool skippedallvis;
+} LVRelScanStats;
+
+/*
+ * Struct for information that needs to be shared among parallel vacuum workers
+ */
+typedef struct PHVShared
+{
+ bool aggressive;
+ bool skipwithvm;
+
+ /* The current oldest extant XID/MXID shared by the leader process */
+ TransactionId NewRelfrozenXid;
+ MultiXactId NewRelminMxid;
+
+ bool skippedallvis;
+
+ /* VACUUM operation's cutoffs for freezing and pruning */
+ struct VacuumCutoffs cutoffs;
+ GlobalVisState vistest;
+
+ /* per-worker scan stats for parallel heap vacuum scan */
+ LVRelScanStats worker_scan_stats[FLEXIBLE_ARRAY_MEMBER];
+} PHVShared;
+#define SizeOfPHVShared (offsetof(PHVShared, worker_scan_stats))
+
+/* Per-worker scan state for parallel heap vacuum scan */
+typedef struct PHVScanWorkerState
+{
+ bool initialized;
+
+ /* per-worker parallel table scan state */
+ ParallelBlockTableScanWorkerData state;
+
+ /*
+ * True if a parallel vacuum scan worker allocated blocks in state but
+ * might not have scanned all of them. The leader process will take over
+ * scanning these remaining blocks.
+ */
+ bool maybe_have_blocks;
+
+ /* current block number being processed */
+ pg_atomic_uint32 cur_blkno;
+} PHVScanWorkerState;
+
+/* Struct for parallel heap vacuum */
+typedef struct PHVState
+{
+ /* Parallel scan description shared among parallel workers */
+ ParallelBlockTableScanDesc pscandesc;
+
+ /* Shared information */
+ PHVShared *shared;
+
+ /*
+ * Points to the array of per-worker scan states stored in the DSM area.
+ *
+ * During parallel heap scan, each worker allocates some chunks of blocks
+ * to scan in its scan state, and could exit while leaving some chunks
+ * unscanned if the size of the dead_items TIDs is close to overrunning
+ * the available space. We store scan states in the shared memory area so
+ * that workers can resume heap scans from the previous point.
+ */
+ PHVScanWorkerState *scanstates;
+
+ /* Assigned per-worker scan state */
+ PHVScanWorkerState *myscanstate;
+
+ /*
+ * All blocks up to this value have been scanned, i.e. the minimum of
+ * cur_blkno among all PHVScanWorkerStates. It's updated by
+ * parallel_heap_vacuum_compute_min_blkno().
+ */
+ BlockNumber min_blkno;
+
+ /* The number of workers launched for parallel heap vacuum */
+ int nworkers_launched;
+} PHVState;
+
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -142,6 +260,9 @@ typedef struct LVRelState
BufferAccessStrategy bstrategy;
ParallelVacuumState *pvs;
+ /* Parallel heap vacuum state */
+ PHVState *phvstate;
+
/* Aggressive VACUUM? (must set relfrozenxid >= FreezeLimit) */
bool aggressive;
/* Use visibility map to skip? (disabled by DISABLE_PAGE_SKIPPING) */
@@ -157,10 +278,6 @@ typedef struct LVRelState
/* VACUUM operation's cutoffs for freezing and pruning */
struct VacuumCutoffs cutoffs;
GlobalVisState *vistest;
- /* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
- TransactionId NewRelfrozenXid;
- MultiXactId NewRelminMxid;
- bool skippedallvis;
/* Error reporting state */
char *dbname;
@@ -186,12 +303,10 @@ typedef struct LVRelState
VacDeadItemsInfo *dead_items_info;
BlockNumber rel_pages; /* total number of pages */
- BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */
- BlockNumber removed_pages; /* # pages removed by relation truncation */
- BlockNumber frozen_pages; /* # pages with newly frozen tuples */
- BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
- BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
- BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+ BlockNumber next_fsm_block_to_vacuum;
+
+ /* Statistics collected during heap scan */
+ LVRelScanStats *scan_stats;
/* Statistics output by us, for table */
double new_rel_tuples; /* new estimated total # of tuples */
@@ -201,13 +316,6 @@ typedef struct LVRelState
/* Instrumentation counters */
int num_index_scans;
- /* Counters that follow are only for scanned_pages */
- int64 tuples_deleted; /* # deleted from table */
- int64 tuples_frozen; /* # newly frozen */
- int64 lpdead_items; /* # deleted from indexes */
- int64 live_tuples; /* # live tuples remaining */
- int64 recently_dead_tuples; /* # dead, but not yet removable */
- int64 missed_dead_tuples; /* # removable, but not removed */
/* State maintained by heap_vac_scan_next_block() */
BlockNumber current_block; /* last block returned */
@@ -227,6 +335,7 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
+static bool do_lazy_scan_heap(LVRelState *vacrel);
static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
bool *all_visible_according_to_vm);
static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
@@ -269,6 +378,12 @@ static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
static void update_relstats_all_indexes(LVRelState *vacrel);
+
+static void do_parallel_lazy_scan_heap(LVRelState *vacrel);
+static void parallel_heap_vacuum_compute_min_blkno(LVRelState *vacrel);
+static void parallel_heap_vacuum_gather_scan_stats(LVRelState *vacrel);
+static void parallel_heap_complete_unfinised_scan(LVRelState *vacrel);
+
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -294,6 +409,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
BufferAccessStrategy bstrategy)
{
LVRelState *vacrel;
+ LVRelScanStats *scan_stats;
bool verbose,
instrument,
skipwithvm,
@@ -404,14 +520,28 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
Assert(params->index_cleanup == VACOPTVALUE_AUTO);
}
+ vacrel->next_fsm_block_to_vacuum = 0;
+
/* Initialize page counters explicitly (be tidy) */
- vacrel->scanned_pages = 0;
- vacrel->removed_pages = 0;
- vacrel->frozen_pages = 0;
- vacrel->lpdead_item_pages = 0;
- vacrel->missed_dead_pages = 0;
- vacrel->nonempty_pages = 0;
- /* dead_items_alloc allocates vacrel->dead_items later on */
+ scan_stats = palloc(sizeof(LVRelScanStats));
+ scan_stats->scanned_pages = 0;
+ scan_stats->removed_pages = 0;
+ scan_stats->frozen_pages = 0;
+ scan_stats->lpdead_item_pages = 0;
+ scan_stats->missed_dead_pages = 0;
+ scan_stats->nonempty_pages = 0;
+
+ /* Initialize remaining counters (be tidy) */
+ scan_stats->tuples_deleted = 0;
+ scan_stats->tuples_frozen = 0;
+ scan_stats->lpdead_items = 0;
+ scan_stats->live_tuples = 0;
+ scan_stats->recently_dead_tuples = 0;
+ scan_stats->missed_dead_tuples = 0;
+
+ vacrel->scan_stats = scan_stats;
+
+ vacrel->num_index_scans = 0;
/* Allocate/initialize output statistics state */
vacrel->new_rel_tuples = 0;
@@ -419,14 +549,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->indstats = (IndexBulkDeleteResult **)
palloc0(vacrel->nindexes * sizeof(IndexBulkDeleteResult *));
- /* Initialize remaining counters (be tidy) */
- vacrel->num_index_scans = 0;
- vacrel->tuples_deleted = 0;
- vacrel->tuples_frozen = 0;
- vacrel->lpdead_items = 0;
- vacrel->live_tuples = 0;
- vacrel->recently_dead_tuples = 0;
- vacrel->missed_dead_tuples = 0;
+ /* dead_items_alloc allocates vacrel->dead_items later on */
/*
* Get cutoffs that determine which deleted tuples are considered DEAD,
@@ -448,9 +571,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
vacrel->vistest = GlobalVisTestFor(rel);
/* Initialize state used to track oldest extant XID/MXID */
- vacrel->NewRelfrozenXid = vacrel->cutoffs.OldestXmin;
- vacrel->NewRelminMxid = vacrel->cutoffs.OldestMxact;
- vacrel->skippedallvis = false;
+ vacrel->scan_stats->NewRelfrozenXid = vacrel->cutoffs.OldestXmin;
+ vacrel->scan_stats->NewRelminMxid = vacrel->cutoffs.OldestMxact;
+ vacrel->scan_stats->skippedallvis = false;
skipwithvm = true;
if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
{
@@ -531,15 +654,15 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* value >= FreezeLimit, and relminmxid to a value >= MultiXactCutoff.
* Non-aggressive VACUUMs may advance them by any amount, or not at all.
*/
- Assert(vacrel->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
+ Assert(vacrel->scan_stats->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
TransactionIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.FreezeLimit :
vacrel->cutoffs.relfrozenxid,
- vacrel->NewRelfrozenXid));
- Assert(vacrel->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
+ vacrel->scan_stats->NewRelfrozenXid));
+ Assert(vacrel->scan_stats->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
MultiXactIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.MultiXactCutoff :
vacrel->cutoffs.relminmxid,
- vacrel->NewRelminMxid));
- if (vacrel->skippedallvis)
+ vacrel->scan_stats->NewRelminMxid));
+ if (vacrel->scan_stats->skippedallvis)
{
/*
* Must keep original relfrozenxid in a non-aggressive VACUUM that
@@ -547,8 +670,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* values will have missed unfrozen XIDs from the pages we skipped.
*/
Assert(!vacrel->aggressive);
- vacrel->NewRelfrozenXid = InvalidTransactionId;
- vacrel->NewRelminMxid = InvalidMultiXactId;
+ vacrel->scan_stats->NewRelfrozenXid = InvalidTransactionId;
+ vacrel->scan_stats->NewRelminMxid = InvalidMultiXactId;
}
/*
@@ -569,7 +692,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
*/
vac_update_relstats(rel, new_rel_pages, vacrel->new_live_tuples,
new_rel_allvisible, vacrel->nindexes > 0,
- vacrel->NewRelfrozenXid, vacrel->NewRelminMxid,
+ vacrel->scan_stats->NewRelfrozenXid, vacrel->scan_stats->NewRelminMxid,
&frozenxid_updated, &minmulti_updated, false);
/*
@@ -585,8 +708,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
pgstat_report_vacuum(RelationGetRelid(rel),
rel->rd_rel->relisshared,
Max(vacrel->new_live_tuples, 0),
- vacrel->recently_dead_tuples +
- vacrel->missed_dead_tuples);
+ vacrel->scan_stats->recently_dead_tuples +
+ vacrel->scan_stats->missed_dead_tuples);
pgstat_progress_end_command();
if (instrument)
@@ -659,21 +782,21 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->relname,
vacrel->num_index_scans);
appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total)\n"),
- vacrel->removed_pages,
+ vacrel->scan_stats->removed_pages,
new_rel_pages,
- vacrel->scanned_pages,
+ vacrel->scan_stats->scanned_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->scanned_pages / orig_rel_pages);
+ 100.0 * vacrel->scan_stats->scanned_pages / orig_rel_pages);
appendStringInfo(&buf,
_("tuples: %lld removed, %lld remain, %lld are dead but not yet removable\n"),
- (long long) vacrel->tuples_deleted,
+ (long long) vacrel->scan_stats->tuples_deleted,
(long long) vacrel->new_rel_tuples,
- (long long) vacrel->recently_dead_tuples);
- if (vacrel->missed_dead_tuples > 0)
+ (long long) vacrel->scan_stats->recently_dead_tuples);
+ if (vacrel->scan_stats->missed_dead_tuples > 0)
appendStringInfo(&buf,
_("tuples missed: %lld dead from %u pages not removed due to cleanup lock contention\n"),
- (long long) vacrel->missed_dead_tuples,
- vacrel->missed_dead_pages);
+ (long long) vacrel->scan_stats->missed_dead_tuples,
+ vacrel->scan_stats->missed_dead_pages);
diff = (int32) (ReadNextTransactionId() -
vacrel->cutoffs.OldestXmin);
appendStringInfo(&buf,
@@ -681,25 +804,25 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->cutoffs.OldestXmin, diff);
if (frozenxid_updated)
{
- diff = (int32) (vacrel->NewRelfrozenXid -
+ diff = (int32) (vacrel->scan_stats->NewRelfrozenXid -
vacrel->cutoffs.relfrozenxid);
appendStringInfo(&buf,
_("new relfrozenxid: %u, which is %d XIDs ahead of previous value\n"),
- vacrel->NewRelfrozenXid, diff);
+ vacrel->scan_stats->NewRelfrozenXid, diff);
}
if (minmulti_updated)
{
- diff = (int32) (vacrel->NewRelminMxid -
+ diff = (int32) (vacrel->scan_stats->NewRelminMxid -
vacrel->cutoffs.relminmxid);
appendStringInfo(&buf,
_("new relminmxid: %u, which is %d MXIDs ahead of previous value\n"),
- vacrel->NewRelminMxid, diff);
+ vacrel->scan_stats->NewRelminMxid, diff);
}
appendStringInfo(&buf, _("frozen: %u pages from table (%.2f%% of total) had %lld tuples frozen\n"),
- vacrel->frozen_pages,
+ vacrel->scan_stats->frozen_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->frozen_pages / orig_rel_pages,
- (long long) vacrel->tuples_frozen);
+ 100.0 * vacrel->scan_stats->frozen_pages / orig_rel_pages,
+ (long long) vacrel->scan_stats->tuples_frozen);
if (vacrel->do_index_vacuuming)
{
if (vacrel->nindexes == 0 || vacrel->num_index_scans == 0)
@@ -719,10 +842,10 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
msgfmt = _("%u pages from table (%.2f%% of total) have %lld dead item identifiers\n");
}
appendStringInfo(&buf, msgfmt,
- vacrel->lpdead_item_pages,
+ vacrel->scan_stats->lpdead_item_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->lpdead_item_pages / orig_rel_pages,
- (long long) vacrel->lpdead_items);
+ 100.0 * vacrel->scan_stats->lpdead_item_pages / orig_rel_pages,
+ (long long) vacrel->scan_stats->lpdead_items);
for (int i = 0; i < vacrel->nindexes; i++)
{
IndexBulkDeleteResult *istat = vacrel->indstats[i];
@@ -823,14 +946,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel)
{
- BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
- next_fsm_block_to_vacuum = 0;
- bool all_visible_according_to_vm;
-
- TidStore *dead_items = vacrel->dead_items;
+ BlockNumber rel_pages = vacrel->rel_pages;
VacDeadItemsInfo *dead_items_info = vacrel->dead_items_info;
- Buffer vmbuffer = InvalidBuffer;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -850,6 +967,72 @@ lazy_scan_heap(LVRelState *vacrel)
vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
+ /*
+ * Do the actual work. If parallel heap vacuum is active, we scan and
+ * vacuum heap with parallel workers.
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ do_parallel_lazy_scan_heap(vacrel);
+ else
+ do_lazy_scan_heap(vacrel);
+
+ /* report that everything is now scanned */
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, rel_pages);
+
+ /* now we can compute the new value for pg_class.reltuples */
+ vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
+ vacrel->scan_stats->scanned_pages,
+ vacrel->scan_stats->live_tuples);
+
+ /*
+ * Also compute the total number of surviving heap entries. In the
+ * (unlikely) scenario that new_live_tuples is -1, take it as zero.
+ */
+ vacrel->new_rel_tuples =
+ Max(vacrel->new_live_tuples, 0) + vacrel->scan_stats->recently_dead_tuples +
+ vacrel->scan_stats->missed_dead_tuples;
+
+ /*
+ * Do index vacuuming (call each index's ambulkdelete routine), then do
+ * related heap vacuuming
+ */
+ if (dead_items_info->num_items > 0)
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the remainder of the Free Space Map. We must do this whether or
+ * not there were indexes, and whether or not we bypassed index vacuuming.
+ */
+ if (rel_pages > vacrel->next_fsm_block_to_vacuum)
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ rel_pages);
+
+ /* report all blocks vacuumed */
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, rel_pages);
+
+ /* Do final index cleanup (call each index's amvacuumcleanup routine) */
+ if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
+ lazy_cleanup_all_indexes(vacrel);
+}
+
+/*
+ * Workhorse for lazy_scan_heap().
+ *
+ * Return true if we processed all blocks, or false if we exited from this
+ * function without completing the heap scan because the space for dead item
+ * TIDs was nearly full. In the serial heap scan case, this function always
+ * returns true. In a parallel heap vacuum scan, this function is called by
+ * both the worker processes and the leader process, and could return false.
+ */
+static bool
+do_lazy_scan_heap(LVRelState *vacrel)
+{
+ bool all_visible_according_to_vm;
+ TidStore *dead_items = vacrel->dead_items;
+ VacDeadItemsInfo *dead_items_info = vacrel->dead_items_info;
+ BlockNumber blkno;
+ Buffer vmbuffer = InvalidBuffer;
+ bool scan_done = true;
+
while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
{
Buffer buf;
@@ -857,13 +1040,20 @@ lazy_scan_heap(LVRelState *vacrel)
bool has_lpdead_items;
bool got_cleanup_lock = false;
- vacrel->scanned_pages++;
+ vacrel->scan_stats->scanned_pages++;
/* Report as block scanned, update error traceback information */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
update_vacuum_error_info(vacrel, NULL, VACUUM_ERRCB_PHASE_SCAN_HEAP,
blkno, InvalidOffsetNumber);
+ /*
+ * If parallel vacuum scan is enabled, advertise the current block
+ * number
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ pg_atomic_write_u32(&(vacrel->phvstate->myscanstate->cur_blkno), (uint32) blkno);
+
vacuum_delay_point();
/*
@@ -875,46 +1065,10 @@ lazy_scan_heap(LVRelState *vacrel)
* one-pass strategy, and the two-pass strategy with the index_cleanup
* param set to 'off'.
*/
- if (vacrel->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
+ if (!IsParallelWorker() &&
+ vacrel->scan_stats->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
lazy_check_wraparound_failsafe(vacrel);
- /*
- * Consider if we definitely have enough space to process TIDs on page
- * already. If we are close to overrunning the available space for
- * dead_items TIDs, pause and do a cycle of vacuuming before we tackle
- * this page.
- */
- if (TidStoreMemoryUsage(dead_items) > dead_items_info->max_bytes)
- {
- /*
- * Before beginning index vacuuming, we release any pin we may
- * hold on the visibility map page. This isn't necessary for
- * correctness, but we do it anyway to avoid holding the pin
- * across a lengthy, unrelated operation.
- */
- if (BufferIsValid(vmbuffer))
- {
- ReleaseBuffer(vmbuffer);
- vmbuffer = InvalidBuffer;
- }
-
- /* Perform a round of index and heap vacuuming */
- vacrel->consider_bypass_optimization = false;
- lazy_vacuum(vacrel);
-
- /*
- * Vacuum the Free Space Map to make newly-freed space visible on
- * upper-level FSM pages. Note we have not yet processed blkno.
- */
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
- blkno);
- next_fsm_block_to_vacuum = blkno;
-
- /* Report that we are once again scanning the heap */
- pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_SCAN_HEAP);
- }
-
/*
* Pin the visibility map page in case we need to mark the page
* all-visible. In most cases this will be very cheap, because we'll
@@ -1003,9 +1157,10 @@ lazy_scan_heap(LVRelState *vacrel)
* revisit this page. Since updating the FSM is desirable but not
* absolutely required, that's OK.
*/
- if (vacrel->nindexes == 0
- || !vacrel->do_index_vacuuming
- || !has_lpdead_items)
+ if (!IsParallelWorker() &&
+ (vacrel->nindexes == 0
+ || !vacrel->do_index_vacuuming
+ || !has_lpdead_items))
{
Size freespace = PageGetHeapFreeSpace(page);
@@ -1019,57 +1174,172 @@ lazy_scan_heap(LVRelState *vacrel)
* held the cleanup lock and lazy_scan_prune() was called.
*/
if (got_cleanup_lock && vacrel->nindexes == 0 && has_lpdead_items &&
- blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+ blkno - vacrel->next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
{
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
- blkno);
- next_fsm_block_to_vacuum = blkno;
+ BlockNumber fsm_vac_up_to;
+
+ /*
+ * If parallel heap vacuum scan is active, compute the minimum
+ * block number we scanned so far.
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ {
+ parallel_heap_vacuum_compute_min_blkno(vacrel);
+ fsm_vac_up_to = vacrel->phvstate->min_blkno;
+ }
+ else
+ {
+ /* blkno is already processed */
+ fsm_vac_up_to = blkno + 1;
+ }
+
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ fsm_vac_up_to);
+ vacrel->next_fsm_block_to_vacuum = fsm_vac_up_to;
}
}
else
UnlockReleaseBuffer(buf);
+
+ /*
+ * Consider if we definitely have enough space to process TIDs on page
+ * already. If we are close to overrunning the available space for
+ * dead_items TIDs, pause and do a cycle of vacuuming before we tackle
+ * this page.
+ */
+ if (TidStoreMemoryUsage(dead_items) > dead_items_info->max_bytes)
+ {
+ /*
+ * Before beginning index vacuuming, we release any pin we may
+ * hold on the visibility map page. This isn't necessary for
+ * correctness, but we do it anyway to avoid holding the pin
+ * across a lengthy, unrelated operation.
+ */
+ if (BufferIsValid(vmbuffer))
+ {
+ ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
+ }
+
+ if (ParallelHeapVacuumIsActive(vacrel))
+ {
+ /* Remember we might have some unprocessed blocks */
+ scan_done = false;
+
+ /*
+ * Pause the heap scan without invoking index and heap
+ * vacuuming. The leader process also skips FSM vacuum since
+ * some blocks before blkno might not have been processed yet. The
+ * leader will wait for all workers to finish and perform
+ * index and heap vacuuming, and then perform FSM vacuum.
+ */
+ break;
+ }
+
+ /* Perform a round of index and heap vacuuming */
+ vacrel->consider_bypass_optimization = false;
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the Free Space Map to make newly-freed space visible on
+ * upper-level FSM pages.
+ */
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ blkno + 1);
+ vacrel->next_fsm_block_to_vacuum = blkno;
+
+ /* Report that we are once again scanning the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+ continue;
+ }
}
vacrel->blkno = InvalidBlockNumber;
if (BufferIsValid(vmbuffer))
ReleaseBuffer(vmbuffer);
- /* report that everything is now scanned */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ return scan_done;
+}
- /* now we can compute the new value for pg_class.reltuples */
- vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
- vacrel->scanned_pages,
- vacrel->live_tuples);
+/*
+ * A parallel scan variant of heap_vac_scan_next_block.
+ *
+ * In parallel vacuum scan, we don't use the SKIP_PAGES_THRESHOLD optimization.
+ */
+static bool
+heap_vac_scan_next_block_parallel(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm)
+{
+ PHVState *phvstate = vacrel->phvstate;
+ BlockNumber next_block;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 mapbits = 0;
- /*
- * Also compute the total number of surviving heap entries. In the
- * (unlikely) scenario that new_live_tuples is -1, take it as zero.
- */
- vacrel->new_rel_tuples =
- Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
- vacrel->missed_dead_tuples;
+ Assert(ParallelHeapVacuumIsActive(vacrel));
- /*
- * Do index vacuuming (call each index's ambulkdelete routine), then do
- * related heap vacuuming
- */
- if (dead_items_info->num_items > 0)
- lazy_vacuum(vacrel);
+ for (;;)
+ {
+ next_block = table_block_parallelscan_nextpage(vacrel->rel,
+ &(phvstate->myscanstate->state),
+ phvstate->pscandesc);
- /*
- * Vacuum the remainder of the Free Space Map. We must do this whether or
- * not there were indexes, and whether or not we bypassed index vacuuming.
- */
- if (blkno > next_fsm_block_to_vacuum)
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno);
+ /* Have we reached the end of the table? */
+ if (!BlockNumberIsValid(next_block) || next_block >= vacrel->rel_pages)
+ {
+ if (BufferIsValid(vmbuffer))
+ ReleaseBuffer(vmbuffer);
- /* report all blocks vacuumed */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
+ *blkno = vacrel->rel_pages;
+ return false;
+ }
- /* Do final index cleanup (call each index's amvacuumcleanup routine) */
- if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
- lazy_cleanup_all_indexes(vacrel);
+ /* We always treat the last block as unsafe to skip */
+ if (next_block == vacrel->rel_pages - 1)
+ break;
+
+ mapbits = visibilitymap_get_status(vacrel->rel, next_block, &vmbuffer);
+
+ /*
+ * A block is unskippable if it is not all visible according to the
+ * visibility map.
+ */
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ {
+ Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
+ break;
+ }
+
+ /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
+ if (!vacrel->skipwithvm)
+ break;
+
+ /*
+ * Aggressive VACUUM caller can't skip pages just because they are
+ * all-visible.
+ */
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+ if (vacrel->aggressive)
+ break;
+
+ /*
+ * All-visible block is safe to skip in non-aggressive case. But
+ * remember that the final range contains such a block for later.
+ */
+ vacrel->scan_stats->skippedallvis = true;
+ }
+ }
+
+ if (BufferIsValid(vmbuffer))
+ ReleaseBuffer(vmbuffer);
+
+ *blkno = next_block;
+ *all_visible_according_to_vm = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
+
+ return true;
}
/*
@@ -1096,6 +1366,9 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
{
BlockNumber next_block;
+ if (ParallelHeapVacuumIsActive(vacrel))
+ return heap_vac_scan_next_block_parallel(vacrel, blkno, all_visible_according_to_vm);
+
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1145,7 +1418,7 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
{
next_block = vacrel->next_unskippable_block;
if (skipsallvis)
- vacrel->skippedallvis = true;
+ vacrel->scan_stats->skippedallvis = true;
}
}
@@ -1218,11 +1491,12 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
/*
* Caller must scan the last page to determine whether it has tuples
- * (caller must have the opportunity to set vacrel->nonempty_pages).
- * This rule avoids having lazy_truncate_heap() take access-exclusive
- * lock on rel to attempt a truncation that fails anyway, just because
- * there are tuples on the last page (it is likely that there will be
- * tuples on other nearby pages as well, but those can be skipped).
+ * (caller must have the opportunity to set
+ * vacrel->scan_stats->nonempty_pages). This rule avoids having
+ * lazy_truncate_heap() take access-exclusive lock on rel to attempt a
+ * truncation that fails anyway, just because there are tuples on the
+ * last page (it is likely that there will be tuples on other nearby
+ * pages as well, but those can be skipped).
*
* Implement this by always treating the last block as unsafe to skip.
*/
@@ -1447,10 +1721,10 @@ lazy_scan_prune(LVRelState *vacrel,
heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
- &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
+ &vacrel->scan_stats->NewRelfrozenXid, &vacrel->scan_stats->NewRelminMxid);
- Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
- Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
+ Assert(MultiXactIdIsValid(vacrel->scan_stats->NewRelminMxid));
+ Assert(TransactionIdIsValid(vacrel->scan_stats->NewRelfrozenXid));
if (presult.nfrozen > 0)
{
@@ -1459,7 +1733,7 @@ lazy_scan_prune(LVRelState *vacrel,
* nfrozen == 0, since it only counts pages with newly frozen tuples
* (don't confuse that with pages newly set all-frozen in VM).
*/
- vacrel->frozen_pages++;
+ vacrel->scan_stats->frozen_pages++;
}
/*
@@ -1494,7 +1768,7 @@ lazy_scan_prune(LVRelState *vacrel,
*/
if (presult.lpdead_items > 0)
{
- vacrel->lpdead_item_pages++;
+ vacrel->scan_stats->lpdead_item_pages++;
/*
* deadoffsets are collected incrementally in
@@ -1509,15 +1783,15 @@ lazy_scan_prune(LVRelState *vacrel,
}
/* Finally, add page-local counts to whole-VACUUM counts */
- vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += presult.nfrozen;
- vacrel->lpdead_items += presult.lpdead_items;
- vacrel->live_tuples += presult.live_tuples;
- vacrel->recently_dead_tuples += presult.recently_dead_tuples;
+ vacrel->scan_stats->tuples_deleted += presult.ndeleted;
+ vacrel->scan_stats->tuples_frozen += presult.nfrozen;
+ vacrel->scan_stats->lpdead_items += presult.lpdead_items;
+ vacrel->scan_stats->live_tuples += presult.live_tuples;
+ vacrel->scan_stats->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
if (presult.hastup)
- vacrel->nonempty_pages = blkno + 1;
+ vacrel->scan_stats->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
@@ -1667,8 +1941,8 @@ lazy_scan_noprune(LVRelState *vacrel,
missed_dead_tuples;
bool hastup;
HeapTupleHeader tupleheader;
- TransactionId NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- MultiXactId NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ TransactionId NoFreezePageRelfrozenXid = vacrel->scan_stats->NewRelfrozenXid;
+ MultiXactId NoFreezePageRelminMxid = vacrel->scan_stats->NewRelminMxid;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1795,8 +2069,8 @@ lazy_scan_noprune(LVRelState *vacrel,
* this particular page until the next VACUUM. Remember its details now.
* (lazy_scan_prune expects a clean slate, so we have to do this last.)
*/
- vacrel->NewRelfrozenXid = NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = NoFreezePageRelminMxid;
+ vacrel->scan_stats->NewRelfrozenXid = NoFreezePageRelfrozenXid;
+ vacrel->scan_stats->NewRelminMxid = NoFreezePageRelminMxid;
/* Save any LP_DEAD items found on the page in dead_items */
if (vacrel->nindexes == 0)
@@ -1823,25 +2097,25 @@ lazy_scan_noprune(LVRelState *vacrel,
* indexes will be deleted during index vacuuming (and then marked
* LP_UNUSED in the heap)
*/
- vacrel->lpdead_item_pages++;
+ vacrel->scan_stats->lpdead_item_pages++;
dead_items_add(vacrel, blkno, deadoffsets, lpdead_items);
- vacrel->lpdead_items += lpdead_items;
+ vacrel->scan_stats->lpdead_items += lpdead_items;
}
/*
* Finally, add relevant page-local counts to whole-VACUUM counts
*/
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
- vacrel->missed_dead_tuples += missed_dead_tuples;
+ vacrel->scan_stats->live_tuples += live_tuples;
+ vacrel->scan_stats->recently_dead_tuples += recently_dead_tuples;
+ vacrel->scan_stats->missed_dead_tuples += missed_dead_tuples;
if (missed_dead_tuples > 0)
- vacrel->missed_dead_pages++;
+ vacrel->scan_stats->missed_dead_pages++;
/* Can't truncate this page */
if (hastup)
- vacrel->nonempty_pages = blkno + 1;
+ vacrel->scan_stats->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
*has_lpdead_items = (lpdead_items > 0);
@@ -1870,7 +2144,7 @@ lazy_vacuum(LVRelState *vacrel)
/* Should not end up here with no indexes */
Assert(vacrel->nindexes > 0);
- Assert(vacrel->lpdead_item_pages > 0);
+ Assert(vacrel->scan_stats->lpdead_item_pages > 0);
if (!vacrel->do_index_vacuuming)
{
@@ -1904,7 +2178,7 @@ lazy_vacuum(LVRelState *vacrel)
BlockNumber threshold;
Assert(vacrel->num_index_scans == 0);
- Assert(vacrel->lpdead_items == vacrel->dead_items_info->num_items);
+ Assert(vacrel->scan_stats->lpdead_items == vacrel->dead_items_info->num_items);
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
@@ -1931,7 +2205,7 @@ lazy_vacuum(LVRelState *vacrel)
* cases then this may need to be reconsidered.
*/
threshold = (double) vacrel->rel_pages * BYPASS_THRESHOLD_PAGES;
- bypass = (vacrel->lpdead_item_pages < threshold &&
+ bypass = (vacrel->scan_stats->lpdead_item_pages < threshold &&
(TidStoreMemoryUsage(vacrel->dead_items) < (32L * 1024L * 1024L)));
}
@@ -2024,7 +2298,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
progress_start_val[1] = vacrel->nindexes;
pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val);
- if (!ParallelVacuumIsActive(vacrel))
+ if (!ParallelIndexVacuumIsActive(vacrel))
{
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
@@ -2069,7 +2343,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
* place).
*/
Assert(vacrel->num_index_scans > 0 ||
- vacrel->dead_items_info->num_items == vacrel->lpdead_items);
+ vacrel->dead_items_info->num_items == vacrel->scan_stats->lpdead_items);
Assert(allindexes || VacuumFailsafeActive);
/*
@@ -2178,8 +2452,8 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
* the second heap pass. No more, no less.
*/
Assert(vacrel->num_index_scans > 1 ||
- (vacrel->dead_items_info->num_items == vacrel->lpdead_items &&
- vacuumed_pages == vacrel->lpdead_item_pages));
+ (vacrel->dead_items_info->num_items == vacrel->scan_stats->lpdead_items &&
+ vacuumed_pages == vacrel->scan_stats->lpdead_item_pages));
ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
@@ -2332,7 +2606,7 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
vacrel->do_index_cleanup = false;
vacrel->do_rel_truncate = false;
/* Reset the progress counters */
pgstat_progress_update_multi_param(2, progress_index, progress_val);
ereport(WARNING,
@@ -2360,7 +2634,7 @@ static void
lazy_cleanup_all_indexes(LVRelState *vacrel)
{
double reltuples = vacrel->new_rel_tuples;
- bool estimated_count = vacrel->scanned_pages < vacrel->rel_pages;
+ bool estimated_count = vacrel->scan_stats->scanned_pages < vacrel->rel_pages;
const int progress_start_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_INDEXES_TOTAL
@@ -2383,7 +2657,7 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
progress_start_val[1] = vacrel->nindexes;
pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val);
- if (!ParallelVacuumIsActive(vacrel))
+ if (!ParallelIndexVacuumIsActive(vacrel))
{
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
@@ -2407,7 +2681,7 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
estimated_count);
}
/* Reset the progress counters */
pgstat_progress_update_multi_param(2, progress_end_index, progress_end_val);
}
@@ -2541,7 +2815,7 @@ should_attempt_truncation(LVRelState *vacrel)
if (!vacrel->do_rel_truncate || VacuumFailsafeActive)
return false;
- possibly_freeable = vacrel->rel_pages - vacrel->nonempty_pages;
+ possibly_freeable = vacrel->rel_pages - vacrel->scan_stats->nonempty_pages;
if (possibly_freeable > 0 &&
(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
possibly_freeable >= vacrel->rel_pages / REL_TRUNCATE_FRACTION))
@@ -2567,7 +2841,7 @@ lazy_truncate_heap(LVRelState *vacrel)
/* Update error traceback information one last time */
update_vacuum_error_info(vacrel, NULL, VACUUM_ERRCB_PHASE_TRUNCATE,
- vacrel->nonempty_pages, InvalidOffsetNumber);
+ vacrel->scan_stats->nonempty_pages, InvalidOffsetNumber);
/*
* Loop until no more truncating can be done.
@@ -2668,7 +2942,7 @@ lazy_truncate_heap(LVRelState *vacrel)
* without also touching reltuples, since the tuple count wasn't
* changed by the truncation.
*/
- vacrel->removed_pages += orig_rel_pages - new_rel_pages;
+ vacrel->scan_stats->removed_pages += orig_rel_pages - new_rel_pages;
vacrel->rel_pages = new_rel_pages;
ereport(vacrel->verbose ? INFO : DEBUG2,
@@ -2676,7 +2950,7 @@ lazy_truncate_heap(LVRelState *vacrel)
vacrel->relname,
orig_rel_pages, new_rel_pages)));
orig_rel_pages = new_rel_pages;
- } while (new_rel_pages > vacrel->nonempty_pages && lock_waiter_detected);
+ } while (new_rel_pages > vacrel->scan_stats->nonempty_pages && lock_waiter_detected);
}
/*
@@ -2704,7 +2978,7 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
StaticAssertStmt((PREFETCH_SIZE & (PREFETCH_SIZE - 1)) == 0,
"prefetch size must be power of 2");
prefetchedUntil = InvalidBlockNumber;
- while (blkno > vacrel->nonempty_pages)
+ while (blkno > vacrel->scan_stats->nonempty_pages)
{
Buffer buf;
Page page;
@@ -2816,7 +3090,7 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
* pages still are; we need not bother to look at the last known-nonempty
* page.
*/
- return vacrel->nonempty_pages;
+ return vacrel->scan_stats->nonempty_pages;
}
/*
@@ -2834,12 +3108,8 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
autovacuum_work_mem != -1 ?
autovacuum_work_mem : maintenance_work_mem;
- /*
- * Initialize state for a parallel vacuum. As of now, only one worker can
- * be used for an index, so we invoke parallelism only if there are at
- * least two indexes on a table.
- */
- if (nworkers >= 0 && vacrel->nindexes > 1 && vacrel->do_index_vacuuming)
+ /* Initialize state for a parallel vacuum */
+ if (nworkers >= 0)
{
/*
* Since parallel workers cannot access data in temporary tables, we
@@ -2857,11 +3127,20 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
+ {
+ /*
+ * We initialize parallel heap scan/vacuuming, index vacuuming, or
+ * both, based on the table size and the number of indexes. Note
+ * that since only one worker can be used per index, we invoke
+ * parallelism for index vacuuming only if there are at least two
+ * indexes on the table.
+ */
vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
vacrel->nindexes, nworkers,
vac_work_mem,
vacrel->verbose ? INFO : DEBUG2,
- vacrel->bstrategy);
+ vacrel->bstrategy, (void *) vacrel);
+ }
/*
* If parallel mode started, dead_items and dead_items_info spaces are
@@ -2902,9 +3181,19 @@ dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *offsets,
};
int64 prog_val[2];
+ /*
+ * Protect both dead_items and dead_items_info from concurrent updates in
+ * parallel heap scan cases.
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ TidStoreLockExclusive(dead_items);
+
TidStoreSetBlockOffsets(dead_items, blkno, offsets, num_offsets);
vacrel->dead_items_info->num_items += num_offsets;
+ if (ParallelHeapVacuumIsActive(vacrel))
+ TidStoreUnlock(dead_items);
+
/* update the progress information */
prog_val[0] = vacrel->dead_items_info->num_items;
prog_val[1] = TidStoreMemoryUsage(dead_items);
@@ -3106,6 +3395,457 @@ update_relstats_all_indexes(LVRelState *vacrel)
}
}
+/*
+ * Compute the number of parallel workers for parallel vacuum heap scan.
+ *
+ * The calculation logic is borrowed from compute_parallel_worker().
+ */
+int
+heap_parallel_vacuum_compute_workers(Relation rel, int nrequested)
+{
+ int parallel_workers = 0;
+ int heap_parallel_threshold;
+ int heap_pages;
+
+ if (nrequested == 0)
+ {
+ /*
+ * Select the number of workers based on the log of the size of the
+ * relation. This probably needs to be a good deal more
+ * sophisticated, but we need something here for now. Note that the
+ * upper limit of the min_parallel_table_scan_size GUC is chosen to
+ * prevent overflow here.
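+ *
+ * For example, with the default min_parallel_table_scan_size of 1024
+ * blocks (8MB), this selects one worker at 3072 blocks (24MB), two at
+ * 9216 blocks (72MB), three at 27648 blocks (216MB), and so on,
+ * tripling the threshold for each additional worker (before the cap
+ * the caller applies).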
+ */
+ heap_parallel_threshold = Max(min_parallel_table_scan_size, 1);
+ heap_pages = RelationGetNumberOfBlocks(rel);
+ while (heap_pages >= (BlockNumber) (heap_parallel_threshold * 3))
+ {
+ parallel_workers++;
+ heap_parallel_threshold *= 3;
+ if (heap_parallel_threshold > INT_MAX / 3)
+ break;
+ }
+ }
+ else
+ parallel_workers = nrequested;
+
+ return parallel_workers;
+}
+
+/* Estimate shared memory sizes required for parallel heap vacuum */
+static inline void
+heap_parallel_estimate_shared_memory_size(Relation rel, int nworkers, Size *pscan_len,
+ Size *shared_len, Size *pscanwork_len)
+{
+ Size size = 0;
+
+ size = add_size(size, SizeOfPHVShared);
+ size = add_size(size, mul_size(sizeof(LVRelScanStats), nworkers));
+ *shared_len = size;
+
+ *pscan_len = table_block_parallelscan_estimate(rel);
+
+ *pscanwork_len = mul_size(sizeof(PHVScanWorkerState), nworkers);
+}
+
+/*
+ * Compute the amount of space we'll need in the parallel heap vacuum
+ * DSM, and inform pcxt->estimator about our needs.
+ *
+ * nworkers is the number of workers for the table vacuum. Note that it could
+ * differ from pcxt->nworkers, since pcxt->nworkers is the maximum of the
+ * number of workers for table vacuum and for index vacuum.
+ */
+void
+heap_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state)
+{
+ Size pscan_len;
+ Size shared_len;
+ Size pscanwork_len;
+
+ heap_parallel_estimate_shared_memory_size(rel, nworkers, &pscan_len,
+ &shared_len, &pscanwork_len);
+
+ /* space for PHVShared */
+ shm_toc_estimate_chunk(&pcxt->estimator, shared_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* space for ParallelBlockTableScanDesc */
+ shm_toc_estimate_chunk(&pcxt->estimator, pscan_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* space for per-worker scan state, PHVScanWorkerState */
+ shm_toc_estimate_chunk(&pcxt->estimator, pscanwork_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/*
+ * Set up shared memory for parallel heap vacuum.
+ */
+void
+heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state)
+{
+ LVRelState *vacrel = (LVRelState *) state;
+ PHVState *phvstate;
+ ParallelBlockTableScanDesc pscan;
+ PHVScanWorkerState *pscanwork;
+ PHVShared *shared;
+ Size pscan_len;
+ Size shared_len;
+ Size pscanwork_len;
+
+ phvstate = (PHVState *) palloc(sizeof(PHVState));
+
+ heap_parallel_estimate_shared_memory_size(rel, nworkers, &pscan_len,
+ &shared_len, &pscanwork_len);
+
+ shared = shm_toc_allocate(pcxt->toc, shared_len);
+
+ /* Prepare the shared information */
+
+ MemSet(shared, 0, shared_len);
+ shared->aggressive = vacrel->aggressive;
+ shared->skipwithvm = vacrel->skipwithvm;
+ shared->cutoffs = vacrel->cutoffs;
+ shared->NewRelfrozenXid = vacrel->scan_stats->NewRelfrozenXid;
+ shared->NewRelminMxid = vacrel->scan_stats->NewRelminMxid;
+ shared->skippedallvis = vacrel->scan_stats->skippedallvis;
+
+ /*
+ * XXX: we copy the contents of vistest to the shared area, but in order
+ * to do that, we need to either expose GlobalVisTest or to provide
+ * functions that copy the contents of GlobalVisTest somewhere. Currently we
+ * do the former, but it's not clear that's the best choice.
+ *
+ * An alternative idea is to have each worker determine the cutoff and use
+ * its own vistest. But that needs careful consideration since parallel
+ * workers would end up with different cutoffs and horizons.
+ */
+ shared->vistest = *vacrel->vistest;
+
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_SCAN_SHARED, shared);
+
+ phvstate->shared = shared;
+
+ /* prepare the parallel block table scan description */
+ pscan = shm_toc_allocate(pcxt->toc, pscan_len);
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_SCAN_DESC, pscan);
+
+ /* initialize parallel scan description */
+ table_block_parallelscan_initialize(rel, (ParallelTableScanDesc) pscan);
+
+ /* Disable sync scan to always start from the first block */
+ pscan->base.phs_syncscan = false;
+
+ phvstate->pscandesc = pscan;
+
+ /* prepare the workers' parallel block table scan state */
+ pscanwork = shm_toc_allocate(pcxt->toc, pscanwork_len);
+ MemSet(pscanwork, 0, pscanwork_len);
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_SCAN_DESC_WORKER, pscanwork);
+ phvstate->scanstates = pscanwork;
+
+ vacrel->phvstate = phvstate;
+}
+
+/*
+ * Main function for parallel heap vacuum workers.
+ */
+void
+heap_parallel_vacuum_scan_worker(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt)
+{
+ LVRelState vacrel = {0};
+ PHVState *phvstate;
+ PHVShared *shared;
+ ParallelBlockTableScanDesc pscandesc;
+ PHVScanWorkerState *scanstate;
+ LVRelScanStats *scan_stats;
+ ErrorContextCallback errcallback;
+ bool scan_done;
+
+ phvstate = palloc(sizeof(PHVState));
+
+ pscandesc = (ParallelBlockTableScanDesc) shm_toc_lookup(pwcxt->toc,
+ LV_PARALLEL_SCAN_DESC,
+ false);
+ phvstate->pscandesc = pscandesc;
+
+ shared = (PHVShared *) shm_toc_lookup(pwcxt->toc, LV_PARALLEL_SCAN_SHARED,
+ false);
+ phvstate->shared = shared;
+
+ scanstate = (PHVScanWorkerState *) shm_toc_lookup(pwcxt->toc,
+ LV_PARALLEL_SCAN_DESC_WORKER,
+ false);
+
+ phvstate->myscanstate = &(scanstate[ParallelWorkerNumber]);
+ scan_stats = &(shared->worker_scan_stats[ParallelWorkerNumber]);
+
+ /* Prepare LVRelState */
+ vacrel.rel = rel;
+ vacrel.indrels = parallel_vacuum_get_table_indexes(pvs, &vacrel.nindexes);
+ vacrel.pvs = pvs;
+ vacrel.phvstate = phvstate;
+ vacrel.aggressive = shared->aggressive;
+ vacrel.skipwithvm = shared->skipwithvm;
+ vacrel.cutoffs = shared->cutoffs;
+ vacrel.vistest = &(shared->vistest);
+ vacrel.dead_items = parallel_vacuum_get_dead_items(pvs,
+ &vacrel.dead_items_info);
+ vacrel.rel_pages = RelationGetNumberOfBlocks(rel);
+ vacrel.scan_stats = scan_stats;
+
+ /* initialize per-worker relation statistics */
+ MemSet(scan_stats, 0, sizeof(LVRelScanStats));
+
+ /* Set fields necessary for heap scan */
+ vacrel.scan_stats->NewRelfrozenXid = shared->NewRelfrozenXid;
+ vacrel.scan_stats->NewRelminMxid = shared->NewRelminMxid;
+ vacrel.scan_stats->skippedallvis = shared->skippedallvis;
+
+ /* Initialize the per-worker scan state if not yet */
+ if (!phvstate->myscanstate->initialized)
+ {
+ table_block_parallelscan_startblock_init(rel,
+ &(phvstate->myscanstate->state),
+ phvstate->pscandesc);
+
+ pg_atomic_init_u32(&(phvstate->myscanstate->cur_blkno), 0);
+ phvstate->myscanstate->maybe_have_blocks = false;
+ phvstate->myscanstate->initialized = true;
+ }
+
+ /*
+ * Setup error traceback support for ereport() for parallel table vacuum
+ * workers
+ */
+ vacrel.dbname = get_database_name(MyDatabaseId);
+ vacrel.relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ vacrel.relname = pstrdup(RelationGetRelationName(rel));
+ vacrel.indname = NULL;
+ vacrel.phase = VACUUM_ERRCB_PHASE_SCAN_HEAP;
+ errcallback.callback = vacuum_error_callback;
+ errcallback.arg = &vacrel;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ scan_done = do_lazy_scan_heap(&vacrel);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ /*
+ * If the leader or a worker finishes the heap scan because the space for
+ * dead_items TIDs is close to the limit, it might still have blocks
+ * allocated in its scan state. Since this scan state might not be picked
+ * up in the next heap scan, we remember that it may have unconsumed
+ * blocks so that the leader can complete the scan after the heap scan
+ * phase finishes.
+ */
+ phvstate->myscanstate->maybe_have_blocks = !scan_done;
+}
+
+/*
+ * Complete parallel heap scans that have remaining blocks in their
+ * chunks.
+ */
+static void
+parallel_heap_complete_unfinished_scan(LVRelState *vacrel)
+{
+ int nworkers;
+
+ Assert(!IsParallelWorker());
+
+ nworkers = parallel_vacuum_get_nworkers_table(vacrel->pvs);
+
+ for (int i = 0; i < nworkers; i++)
+ {
+ PHVScanWorkerState *wstate = &(vacrel->phvstate->scanstates[i]);
+ bool scan_done PG_USED_FOR_ASSERTS_ONLY;
+
+ if (!wstate->maybe_have_blocks)
+ continue;
+
+ /* Attach the worker's scan state and do the heap scan */
+ vacrel->phvstate->myscanstate = wstate;
+ scan_done = do_lazy_scan_heap(vacrel);
+
+ Assert(scan_done);
+ }
+
+ /*
+ * We don't need to gather the scan statistics here because the leader
+ * performs these scans itself, accumulating directly into its own
+ * statistics.
+ */
+}
+
+/*
+ * Compute the minimum block number we have scanned so far and update
+ * vacrel->phvstate->min_blkno.
+ */
+static void
+parallel_heap_vacuum_compute_min_blkno(LVRelState *vacrel)
+{
+ PHVState *phvstate = vacrel->phvstate;
+
+ Assert(ParallelHeapVacuumIsActive(vacrel));
+
+ /*
+ * We check all worker scan states here to compute the minimum block
+ * number among all scan states.
+ */
+ for (int i = 0; i < phvstate->nworkers_launched; i++)
+ {
+ PHVScanWorkerState *wstate = &(phvstate->scanstates[i]);
+ BlockNumber blkno;
+
+ /* Skip if the scan state was never initialized by any worker */
+ if (!wstate->initialized)
+ continue;
+
+ blkno = pg_atomic_read_u32(&(wstate->cur_blkno));
+ if (blkno < phvstate->min_blkno)
+ phvstate->min_blkno = blkno;
+ }
+}
+
+/*
+ * Accumulate relation scan_stats that parallel workers collected into the
+ * leader's counters.
+ */
+static void
+parallel_heap_vacuum_gather_scan_stats(LVRelState *vacrel)
+{
+ PHVState *phvstate = vacrel->phvstate;
+
+ Assert(ParallelHeapVacuumIsActive(vacrel));
+ Assert(!IsParallelWorker());
+
+ /* Gather the scan statistics that workers collected */
+ for (int i = 0; i < phvstate->nworkers_launched; i++)
+ {
+ LVRelScanStats *ss = &(phvstate->shared->worker_scan_stats[i]);
+
+ vacrel->scan_stats->scanned_pages += ss->scanned_pages;
+ vacrel->scan_stats->removed_pages += ss->removed_pages;
+ vacrel->scan_stats->frozen_pages += ss->frozen_pages;
+ vacrel->scan_stats->lpdead_item_pages += ss->lpdead_item_pages;
+ vacrel->scan_stats->missed_dead_pages += ss->missed_dead_pages;
+ vacrel->scan_stats->vacuumed_pages += ss->vacuumed_pages;
+ vacrel->scan_stats->tuples_deleted += ss->tuples_deleted;
+ vacrel->scan_stats->tuples_frozen += ss->tuples_frozen;
+ vacrel->scan_stats->lpdead_items += ss->lpdead_items;
+ vacrel->scan_stats->live_tuples += ss->live_tuples;
+ vacrel->scan_stats->recently_dead_tuples += ss->recently_dead_tuples;
+ vacrel->scan_stats->missed_dead_tuples += ss->missed_dead_tuples;
+
+ /* keep the maximum; pages before nonempty_pages must not be truncated */
+ if (ss->nonempty_pages > vacrel->scan_stats->nonempty_pages)
+ vacrel->scan_stats->nonempty_pages = ss->nonempty_pages;
+
+ if (TransactionIdPrecedes(ss->NewRelfrozenXid, vacrel->scan_stats->NewRelfrozenXid))
+ vacrel->scan_stats->NewRelfrozenXid = ss->NewRelfrozenXid;
+
+ if (MultiXactIdPrecedesOrEquals(ss->NewRelminMxid, vacrel->scan_stats->NewRelminMxid))
+ vacrel->scan_stats->NewRelminMxid = ss->NewRelminMxid;
+
+ if (!vacrel->scan_stats->skippedallvis && ss->skippedallvis)
+ vacrel->scan_stats->skippedallvis = true;
+ }
+
+ /* Also, compute the minimum block number we scanned so far */
+ parallel_heap_vacuum_compute_min_blkno(vacrel);
+}
+
+/*
+ * A parallel variant of do_lazy_scan_heap(). The leader process launches parallel
+ * workers to scan the heap in parallel.
+ */
+static void
+do_parallel_lazy_scan_heap(LVRelState *vacrel)
+{
+ PHVScanWorkerState *scanstate;
+
+ Assert(ParallelHeapVacuumIsActive(vacrel));
+ Assert(!IsParallelWorker());
+
+ /* launch parallel workers */
+ vacrel->phvstate->nworkers_launched = parallel_vacuum_table_scan_begin(vacrel->pvs);
+
+ /* initialize the leader's own scan state so it can join as a worker */
+ scanstate = palloc0(sizeof(PHVScanWorkerState));
+ table_block_parallelscan_startblock_init(vacrel->rel, &(scanstate->state),
+ vacrel->phvstate->pscandesc);
+ vacrel->phvstate->myscanstate = scanstate;
+
+ for (;;)
+ {
+ bool scan_done;
+
+ /*
+ * Scan the table until either we are close to overrunning the
+ * available space for dead_items TIDs or we reach the end of the
+ * table.
+ */
+ scan_done = do_lazy_scan_heap(vacrel);
+
+ /* stop parallel workers and gather the collected stats */
+ parallel_vacuum_table_scan_end(vacrel->pvs);
+ parallel_heap_vacuum_gather_scan_stats(vacrel);
+
+ /*
+ * If the heap scan paused in the middle of the table because the
+ * dead_items TID store became full, perform a round of index and heap
+ * vacuuming.
+ */
+ if (!scan_done)
+ {
+ /* Perform a round of index and heap vacuuming */
+ vacrel->consider_bypass_optimization = false;
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the Free Space Map to make newly-freed space visible on
+ * upper-level FSM pages.
+ */
+ if (vacrel->phvstate->min_blkno > vacrel->next_fsm_block_to_vacuum)
+ {
+ /*
+ * min_blkno should have already been updated when gathering
+ * statistics
+ */
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ vacrel->phvstate->min_blkno + 1);
+ vacrel->next_fsm_block_to_vacuum = vacrel->phvstate->min_blkno;
+ }
+
+ /* Report that we are once again scanning the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+ /* re-launch parallel workers */
+ vacrel->phvstate->nworkers_launched =
+ parallel_vacuum_table_scan_begin(vacrel->pvs);
+
+ continue;
+ }
+
+ /* We reached the end of the table */
+ break;
+ }
+
+ /*
+ * The parallel heap scan has finished, but some workers might still have
+ * allocated blocks that they have not processed yet. This can happen,
+ * for example, when workers exit because the dead_items TID store is
+ * full and the leader launches fewer workers in the next cycle.
+ */
+ parallel_heap_complete_unfinished_scan(vacrel);
+}
+
/*
* Error context callback for errors occurring during vacuum. The error
* context messages for index phases should match the messages set in parallel
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 4fd6574e129..3aea80a29c4 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -6,15 +6,24 @@
* This file contains routines that are intended to support setting up, using,
* and tearing down a ParallelVacuumState.
*
- * In a parallel vacuum, we perform both index bulk deletion and index cleanup
- * with parallel worker processes. Individual indexes are processed by one
- * vacuum process. ParallelVacuumState contains shared information as well as
- * the memory space for storing dead items allocated in the DSA area. We
- * launch parallel worker processes at the start of parallel index
- * bulk-deletion and index cleanup and once all indexes are processed, the
- * parallel worker processes exit. Each time we process indexes in parallel,
- * the parallel context is re-initialized so that the same DSM can be used for
- * multiple passes of index bulk-deletion and index cleanup.
+ * In a parallel vacuum, we perform the table scan, index bulk deletion and
+ * index cleanup, or all of them with parallel worker processes. Different
+ * numbers of workers may be launched for table vacuuming and index processing.
+ * ParallelVacuumState contains shared information as well as the memory space
+ * for storing dead items allocated in the DSA area.
+ *
+ * When initializing a parallel table vacuum scan, we invoke table AM routines for
+ * estimating DSM sizes and initializing DSM memory. Parallel table vacuum
+ * workers invoke the table AM routine for vacuuming the table.
+ *
+ * For processing indexes in parallel, individual indexes are processed by one
+ * vacuum process. We launch parallel worker processes at the start of parallel index
+ * bulk-deletion and index cleanup and once all indexes are processed, the parallel
+ * worker processes exit.
+ *
+ * Each time we process the table or indexes in parallel, the parallel context is
+ * re-initialized so that the same DSM can be used for multiple passes of table vacuum
+ * or index bulk-deletion and index cleanup.
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -28,6 +37,7 @@
#include "access/amapi.h"
#include "access/table.h"
+#include "access/tableam.h"
#include "access/xact.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
@@ -65,6 +75,12 @@ typedef struct PVShared
int elevel;
uint64 queryid;
+ /*
+ * True if the caller wants parallel workers to invoke the vacuum table
+ * scan callback.
+ */
+ bool do_vacuum_table_scan;
+
/*
* Fields for both index vacuum and cleanup.
*
@@ -101,6 +117,13 @@ typedef struct PVShared
*/
pg_atomic_uint32 cost_balance;
+ /*
+ * The number of workers for parallel table scan/vacuuming and index
+ * vacuuming, respectively.
+ */
+ int nworkers_for_table;
+ int nworkers_for_index;
+
/*
* Number of active parallel workers. This is used for computing the
* minimum threshold of the vacuum cost balance before a worker sleeps for
@@ -164,6 +187,9 @@ struct ParallelVacuumState
/* NULL for worker processes */
ParallelContext *pcxt;
+ /* Passed to parallel table scan workers. NULL for leader process */
+ ParallelWorkerContext *pwcxt;
+
/* Parent Heap Relation */
Relation heaprel;
@@ -193,6 +219,9 @@ struct ParallelVacuumState
/* Points to WAL usage area in DSM */
WalUsage *wal_usage;
+ /* Number of times the parallel table vacuum scan has been performed */
+ int num_table_scans;
+
/*
* False if the index is totally unsuitable target for all parallel
* processing. For example, the index could be <
@@ -221,8 +250,9 @@ struct ParallelVacuumState
PVIndVacStatus status;
};
-static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
- bool *will_parallel_vacuum);
+static void parallel_vacuum_compute_workers(Relation rel, Relation *indrels, int nindexes,
+ int nrequested, int *nworkers_table,
+ int *nworkers_index, bool *will_parallel_vacuum);
static void parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scans,
bool vacuum);
static void parallel_vacuum_process_safe_indexes(ParallelVacuumState *pvs);
@@ -242,7 +272,7 @@ static void parallel_vacuum_error_callback(void *arg);
ParallelVacuumState *
parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
int nrequested_workers, int vac_work_mem,
- int elevel, BufferAccessStrategy bstrategy)
+ int elevel, BufferAccessStrategy bstrategy, void *state)
{
ParallelVacuumState *pvs;
ParallelContext *pcxt;
@@ -256,6 +286,8 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
Size est_shared_len;
int nindexes_mwm = 0;
int parallel_workers = 0;
+ int nworkers_table;
+ int nworkers_index;
int querylen;
/*
@@ -263,15 +295,17 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
* relation
*/
Assert(nrequested_workers >= 0);
- Assert(nindexes > 0);
/*
* Compute the number of parallel vacuum workers to launch
*/
will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = parallel_vacuum_compute_workers(indrels, nindexes,
- nrequested_workers,
- will_parallel_vacuum);
+ parallel_vacuum_compute_workers(rel, indrels, nindexes, nrequested_workers,
+ &nworkers_table, &nworkers_index,
+ will_parallel_vacuum);
+
+ parallel_workers = Max(nworkers_table, nworkers_index);
+
if (parallel_workers <= 0)
{
/* Can't perform vacuum in parallel -- return NULL */
@@ -327,6 +361,10 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
else
querylen = 0; /* keep compiler quiet */
+ /* Estimate AM-specific space for parallel table vacuum */
+ if (nworkers_table > 0)
+ table_parallel_vacuum_estimate(rel, pcxt, nworkers_table, state);
+
InitializeParallelDSM(pcxt);
/* Prepare index vacuum stats */
@@ -371,6 +409,8 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
shared->relid = RelationGetRelid(rel);
shared->elevel = elevel;
shared->queryid = pgstat_get_my_query_id();
+ shared->nworkers_for_table = nworkers_table;
+ shared->nworkers_for_index = nworkers_index;
shared->maintenance_work_mem_worker =
(nindexes_mwm > 0) ?
maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
@@ -419,6 +459,10 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
}
+ /* Prepare AM-specific DSM for parallel table vacuum */
+ if (nworkers_table > 0)
+ table_parallel_vacuum_initialize(rel, pcxt, nworkers_table, state);
+
/* Success -- return parallel vacuum state */
return pvs;
}
@@ -534,33 +578,47 @@ parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tup
}
/*
- * Compute the number of parallel worker processes to request. Both index
- * vacuum and index cleanup can be executed with parallel workers.
- * The index is eligible for parallel vacuum iff its size is greater than
- * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt performance.
+ * Compute the number of parallel worker processes to request for table
+ * vacuum and index vacuum/cleanup.
+ *
+ * For parallel table vacuum, we ask the table AM routine to compute the
+ * number of parallel worker processes. The result is set in *nworkers_table.
*
- * nrequested is the number of parallel workers that user requested. If
- * nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
+ * For parallel index vacuum, an index is eligible for parallel vacuum iff
+ * its size is greater than min_parallel_index_scan_size, as invoking workers
+ * for very small indexes can hurt performance. nrequested is the number of
+ * parallel workers that the user requested. If nrequested is 0, we compute the
+ * parallel degree based on nindexes, that is the number of indexes that
+ * support parallel vacuum. This function also sets will_parallel_vacuum to
+ * remember indexes that participate in parallel vacuum.
*/
-static int
-parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
- bool *will_parallel_vacuum)
+static void
+parallel_vacuum_compute_workers(Relation rel, Relation *indrels, int nindexes,
+ int nrequested, int *nworkers_table,
+ int *nworkers_index, bool *will_parallel_vacuum)
{
int nindexes_parallel = 0;
int nindexes_parallel_bulkdel = 0;
int nindexes_parallel_cleanup = 0;
- int parallel_workers;
+ int parallel_workers_table = 0;
+ int parallel_workers_index = 0;
+
+ *nworkers_table = 0;
+ *nworkers_index = 0;
/*
* We don't allow performing parallel operation in standalone backend or
* when parallelism is disabled.
*/
if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
+ return;
+
+ /*
+ * Compute the number of workers for parallel table scan. Cap by
+ * max_parallel_maintenance_workers.
+ */
+ parallel_workers_table = Min(table_parallel_vacuum_compute_workers(rel, nrequested),
+ max_parallel_maintenance_workers);
/*
* Compute the number of indexes that can participate in parallel vacuum.
@@ -591,17 +649,18 @@ parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
nindexes_parallel--;
/* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
+ if (nindexes_parallel > 0)
+ {
+ /* Compute the parallel degree for parallel index vacuum */
+ parallel_workers_index = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers_index = Min(parallel_workers_index, max_parallel_maintenance_workers);
+ }
- return parallel_workers;
+ *nworkers_table = parallel_workers_table;
+ *nworkers_index = parallel_workers_index;
}
/*
@@ -671,7 +730,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
if (nworkers > 0)
{
/* Reinitialize parallel context to relaunch parallel workers */
- if (num_index_scans > 0)
+ if (num_index_scans > 0 || pvs->num_table_scans > 0)
ReinitializeParallelDSM(pvs->pcxt);
/*
@@ -980,6 +1039,139 @@ parallel_vacuum_index_is_parallel_safe(Relation indrel, int num_index_scans,
return true;
}
+/*
+ * Prepare the shared vacuum cost-delay state and launch parallel workers
+ * for parallel table vacuum. Return the number of parallel workers launched.
+ *
+ * The caller must call parallel_vacuum_table_scan_end() to finish the parallel
+ * table vacuum.
+ */
+int
+parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ if (pvs->shared->nworkers_for_table == 0)
+ return 0;
+
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ pvs->shared->do_vacuum_table_scan = true;
+
+ if (pvs->num_table_scans > 0)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ /*
+ * The number of workers might vary between table vacuum and index
+ * processing
+ */
+ ReinitializeParallelWorkers(pvs->pcxt, pvs->shared->nworkers_for_table);
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have already
+ * accumulated the remaining balance of heap.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+
+ /* Include the worker count for the leader itself */
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+ }
+
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for table processing (planned: %d)",
+ "launched %d parallel vacuum workers for table processing (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, pvs->shared->nworkers_for_table)));
+
+ return pvs->pcxt->nworkers_launched;
+}
+
+/*
+ * Wait for all workers of the parallel table vacuum scan to finish, and
+ * accumulate their buffer and WAL usage.
+ */
+void
+parallel_vacuum_table_scan_end(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ if (pvs->shared->nworkers_for_table == 0)
+ return;
+
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ /* Decrement the worker count for the leader itself */
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+
+ /*
+ * Carry the shared balance value to heap scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+
+ pvs->shared->do_vacuum_table_scan = false;
+ pvs->num_table_scans++;
+}
+
+/* Return the array of indexes associated with the table being vacuumed */
+Relation *
+parallel_vacuum_get_table_indexes(ParallelVacuumState *pvs, int *nindexes)
+{
+ *nindexes = pvs->nindexes;
+
+ return pvs->indrels;
+}
+
+/* Return the number of workers for parallel table vacuum */
+int
+parallel_vacuum_get_nworkers_table(ParallelVacuumState *pvs)
+{
+ return pvs->shared->nworkers_for_table;
+}
+
+/* Return the number of workers for parallel index processing */
+int
+parallel_vacuum_get_nworkers_index(ParallelVacuumState *pvs)
+{
+ return pvs->shared->nworkers_for_index;
+}
+
+/*
+ * A parallel worker invokes the table-AM-specific vacuum scan callback.
+ */
+static void
+parallel_vacuum_process_table(ParallelVacuumState *pvs)
+{
+ Assert(VacuumActiveNWorkers);
+
+ /* Increment the active worker count before starting the table vacuum */
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ /* Do table vacuum scan */
+ table_parallel_vacuum_scan(pvs->heaprel, pvs, pvs->pwcxt);
+
+ /*
+ * We have completed the table vacuum so decrement the active worker
+ * count.
+ */
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
/*
* Perform work within a launched parallel process.
*
@@ -999,7 +1191,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
WalUsage *wal_usage;
int nindexes;
char *sharedquery;
- ErrorContextCallback errcallback;
/*
* A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
@@ -1031,7 +1222,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
* matched to the leader's one.
*/
vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
if (shared->maintenance_work_mem_worker > 0)
maintenance_work_mem = shared->maintenance_work_mem_worker;
@@ -1062,6 +1252,10 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
pvs.relname = pstrdup(RelationGetRelationName(rel));
pvs.heaprel = rel;
+ pvs.pwcxt = palloc(sizeof(ParallelWorkerContext));
+ pvs.pwcxt->toc = toc;
+ pvs.pwcxt->seg = seg;
+
/* These fields will be filled during index vacuum or cleanup */
pvs.indname = NULL;
pvs.status = PARALLEL_INDVAC_STATUS_INITIAL;
@@ -1070,17 +1264,29 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
pvs.bstrategy = GetAccessStrategyWithSize(BAS_VACUUM,
shared->ring_nbuffers * (BLCKSZ / 1024));
- /* Setup error traceback support for ereport() */
- errcallback.callback = parallel_vacuum_error_callback;
- errcallback.arg = &pvs;
- errcallback.previous = error_context_stack;
- error_context_stack = &errcallback;
-
/* Prepare to track buffer usage during parallel execution */
InstrStartParallelQuery();
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&pvs);
+ if (pvs.shared->do_vacuum_table_scan)
+ {
+ parallel_vacuum_process_table(&pvs);
+ }
+ else
+ {
+ ErrorContextCallback errcallback;
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = &pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_process_safe_indexes(&pvs);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+ }
/* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
@@ -1090,9 +1296,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
TidStoreDetach(dead_items);
- /* Pop the error context stack */
- error_context_stack = errcallback.previous;
-
vac_close_indexes(nindexes, indrels, RowExclusiveLock);
table_close(rel, ShareUpdateExclusiveLock);
FreeAccessStrategy(pvs.bstrategy);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 36610a1c7e7..5b2b08a844c 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -164,15 +164,6 @@ typedef struct ProcArrayStruct
*
* The typedef is in the header.
*/
-struct GlobalVisState
-{
- /* XIDs >= are considered running by some backend */
- FullTransactionId definitely_needed;
-
- /* XIDs < are not considered to be running by any backend */
- FullTransactionId maybe_needed;
-};
-
/*
* Result of ComputeXidHorizons().
*/
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 96cf82f97b7..427a2f97105 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -21,6 +21,7 @@
#include "access/skey.h"
#include "access/table.h" /* for backward compatibility */
#include "access/tableam.h"
+#include "commands/vacuum.h"
#include "nodes/lockoptions.h"
#include "nodes/primnodes.h"
#include "storage/bufpage.h"
@@ -401,6 +402,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
+extern int heap_parallel_vacuum_compute_workers(Relation rel, int requested);
+extern void heap_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state);
+extern void heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state);
+extern void heap_parallel_vacuum_scan_worker(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index adb478a93ca..26e36d90790 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -20,6 +20,7 @@
#include "access/relscan.h"
#include "access/sdir.h"
#include "access/xact.h"
+#include "commands/vacuum.h"
#include "executor/tuptable.h"
#include "storage/read_stream.h"
#include "utils/rel.h"
@@ -654,6 +655,46 @@ typedef struct TableAmRoutine
struct VacuumParams *params,
BufferAccessStrategy bstrategy);
+ /* ------------------------------------------------------------------------
+ * Callbacks for parallel table vacuum.
+ * ------------------------------------------------------------------------
+ */
+
+ /*
+ * Compute the number of parallel workers for parallel table vacuum. The
+ * function must return 0 to disable parallel table vacuum.
+ */
+ int (*parallel_vacuum_compute_workers) (Relation rel, int requested);
+
+ /*
+ * Compute the amount of DSM space the AM needs for parallel table vacuum.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_estimate) (Relation rel,
+ ParallelContext *pcxt,
+ int nworkers,
+ void *state);
+
+ /*
+ * Initialize DSM space for parallel table vacuum.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_initialize) (Relation rel,
+ ParallelContext *pctx,
+ int nworkers,
+ void *state);
+
+ /*
+ * This callback is called in parallel table vacuum worker processes.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_scan_worker) (Relation rel,
+ ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt);
+
/*
* Prepare to analyze block `blockno` of `scan`. The scan has been started
* with table_beginscan_analyze(). See also
@@ -1719,6 +1760,52 @@ table_relation_vacuum(Relation rel, struct VacuumParams *params,
rel->rd_tableam->relation_vacuum(rel, params, bstrategy);
}
+/* ----------------------------------------------------------------------------
+ * Parallel vacuum related functions.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * Return the number of parallel workers for a parallel vacuum scan of this
+ * relation.
+ */
+static inline int
+table_parallel_vacuum_compute_workers(Relation rel, int requested)
+{
+ return rel->rd_tableam->parallel_vacuum_compute_workers(rel, requested);
+}
+
+/*
+ * Estimate the size of shared memory needed for a parallel vacuum scan of
+ * this relation.
+ */
+static inline void
+table_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt, int nworkers,
+ void *state)
+{
+ rel->rd_tableam->parallel_vacuum_estimate(rel, pcxt, nworkers, state);
+}
+
+/*
+ * Initialize shared memory area for a parallel vacuum scan of this relation.
+ */
+static inline void
+table_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt, int nworkers,
+ void *state)
+{
+ rel->rd_tableam->parallel_vacuum_initialize(rel, pcxt, nworkers, state);
+}
+
+/*
+ * Start a parallel vacuum scan of this relation.
+ */
+static inline void
+table_parallel_vacuum_scan(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt)
+{
+ rel->rd_tableam->parallel_vacuum_scan_worker(rel, pvs, pwcxt);
+}
+
/*
* Prepare to analyze the next block in the read stream. The scan needs to
* have been started with table_beginscan_analyze(). Note that this routine
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 759f9a87d38..a225f314290 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -360,7 +360,8 @@ extern void VacuumUpdateCosts(void);
extern ParallelVacuumState *parallel_vacuum_init(Relation rel, Relation *indrels,
int nindexes, int nrequested_workers,
int vac_work_mem, int elevel,
- BufferAccessStrategy bstrategy);
+ BufferAccessStrategy bstrategy,
+ void *state);
extern void parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats);
extern TidStore *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs,
VacDeadItemsInfo **dead_items_info_p);
@@ -372,6 +373,11 @@ extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
long num_table_tuples,
int num_index_scans,
bool estimated_count);
+extern int parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs);
+extern void parallel_vacuum_table_scan_end(ParallelVacuumState *pvs);
+extern int parallel_vacuum_get_nworkers_table(ParallelVacuumState *pvs);
+extern int parallel_vacuum_get_nworkers_index(ParallelVacuumState *pvs);
+extern Relation *parallel_vacuum_get_table_indexes(ParallelVacuumState *pvs, int *nindexes);
extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in commands/analyze.c */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 9398a84051c..6ccb19a29ff 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -102,8 +102,20 @@ extern char *ExportSnapshot(Snapshot snapshot);
/*
* These live in procarray.c because they're intimately linked to the
* procarray contents, but thematically they better fit into snapmgr.h.
+ *
+ * XXX the struct definition is temporarily moved from procarray.c for
+ * parallel table vacuum development. We need to find a suitable way for
+ * parallel table vacuum workers to share the GlobalVisState.
*/
-typedef struct GlobalVisState GlobalVisState;
+typedef struct GlobalVisState
+{
+ /* XIDs >= are considered running by some backend */
+ FullTransactionId definitely_needed;
+
+ /* XIDs < are not considered to be running by any backend */
+ FullTransactionId maybe_needed;
+} GlobalVisState;
+
extern GlobalVisState *GlobalVisTestFor(Relation rel);
extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
--
2.43.5
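
For reference, the new table AM callbacks are expected to be wired up in
heapam's TableAmRoutine (heapam_handler.c) roughly as follows. This is a
sketch based on the declarations above, with the existing members elided;
the actual heapam_handler.c hunk is not shown in this excerpt:

    static const TableAmRoutine heapam_methods = {
        /* ... existing callbacks elided ... */
        .relation_vacuum = heap_vacuum_rel,

        /* new callbacks for parallel table vacuum */
        .parallel_vacuum_compute_workers = heap_parallel_vacuum_compute_workers,
        .parallel_vacuum_estimate = heap_parallel_vacuum_estimate,
        .parallel_vacuum_initialize = heap_parallel_vacuum_initialize,
        .parallel_vacuum_scan_worker = heap_parallel_vacuum_scan_worker,
    };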
Attachment: v4-0003-Support-shared-itereation-on-TidStore.patch (application/octet-stream)
From 8d802ea873622abc43265615b8e6537da70987b7 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 24 Oct 2024 17:34:57 -0700
Subject: [PATCH v4 3/4] Support shared iteration on TidStore.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
src/backend/access/common/tidstore.c | 59 ++++++++++++++++++++++++++++
src/include/access/tidstore.h | 3 ++
2 files changed, 62 insertions(+)
diff --git a/src/backend/access/common/tidstore.c b/src/backend/access/common/tidstore.c
index a7179759d67..637d26012d2 100644
--- a/src/backend/access/common/tidstore.c
+++ b/src/backend/access/common/tidstore.c
@@ -483,6 +483,7 @@ TidStoreBeginIterate(TidStore *ts)
iter = palloc0(sizeof(TidStoreIter));
iter->ts = ts;
+ /* begin iteration on the radix tree */
if (TidStoreIsShared(ts))
iter->tree_iter.shared = shared_ts_begin_iterate(ts->tree.shared);
else
@@ -533,6 +534,56 @@ TidStoreEndIterate(TidStoreIter *iter)
pfree(iter);
}
+/*
+ * Prepare to iterate through a shared TidStore in shared mode. This function
+ * is aimed to start the iteration on the given TidStore with parallel workers.
+ *
+ * The TidStoreIter struct is created in the caller's memory context, and it
+ * will be freed in TidStoreEndIterate.
+ *
+ * The caller is responsible for locking TidStore until the iteration is
+ * finished.
+ */
+TidStoreIter *
+TidStoreBeginIterateShared(TidStore *ts)
+{
+ TidStoreIter *iter;
+
+ if (!TidStoreIsShared(ts))
+ elog(ERROR, "cannot begin shared iteration on local TidStore");
+
+ iter = palloc0(sizeof(TidStoreIter));
+ iter->ts = ts;
+
+ /* begin the shared iteration on radix tree */
+ iter->tree_iter.shared =
+ (shared_ts_iter *) shared_ts_begin_iterate_shared(ts->tree.shared);
+
+ return iter;
+}
+
+/*
+ * Attach to the shared TidStore iterator. 'iter_handle' is the dsa_pointer
+ * returned by TidStoreGetSharedIterHandle(). The returned object is allocated
+ * in backend-local memory using CurrentMemoryContext.
+ */
+TidStoreIter *
+TidStoreAttachIterateShared(TidStore *ts, dsa_pointer iter_handle)
+{
+ TidStoreIter *iter;
+
+ Assert(TidStoreIsShared(ts));
+
+ iter = palloc0(sizeof(TidStoreIter));
+ iter->ts = ts;
+
+ /* Attach to the shared iterator */
+ iter->tree_iter.shared = shared_ts_attach_iterate_shared(ts->tree.shared,
+ iter_handle);
+
+ return iter;
+}
+
/*
* Return the memory usage of TidStore.
*/
@@ -564,6 +615,14 @@ TidStoreGetHandle(TidStore *ts)
return (dsa_pointer) shared_ts_get_handle(ts->tree.shared);
}
+dsa_pointer
+TidStoreGetSharedIterHandle(TidStoreIter *iter)
+{
+ Assert(TidStoreIsShared(iter->ts));
+
+ return (dsa_pointer) shared_ts_get_iter_handle(iter->tree_iter.shared);
+}
+
/*
* Given a TidStoreIterResult returned by TidStoreIterateNext(), extract the
* offset numbers. Returns the number of offsets filled in, if <=
diff --git a/src/include/access/tidstore.h b/src/include/access/tidstore.h
index aeaf563b6a9..f20c9a92e55 100644
--- a/src/include/access/tidstore.h
+++ b/src/include/access/tidstore.h
@@ -37,6 +37,9 @@ extern void TidStoreDetach(TidStore *ts);
extern void TidStoreLockExclusive(TidStore *ts);
extern void TidStoreLockShare(TidStore *ts);
extern void TidStoreUnlock(TidStore *ts);
+extern TidStoreIter *TidStoreBeginIterateShared(TidStore *ts);
+extern TidStoreIter *TidStoreAttachIterateShared(TidStore *ts, dsa_pointer iter_handle);
+extern dsa_pointer TidStoreGetSharedIterHandle(TidStoreIter *iter);
extern void TidStoreDestroy(TidStore *ts);
extern void TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumber *offsets,
int num_offsets);
--
2.43.5
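
To illustrate the intended use of the shared iteration API (a sketch with
error handling elided; per the comments above, the caller is responsible
for locking the TidStore until the iteration finishes):

    /* leader: start the shared iteration and publish its handle */
    TidStoreIter *iter = TidStoreBeginIterateShared(ts);
    dsa_pointer handle = TidStoreGetSharedIterHandle(iter);
    /* ... stash "handle" in the DSM segment for workers to find ... */

    /* worker: attach to the shared iterator and consume blocks */
    TidStoreIter *it = TidStoreAttachIterateShared(ts, handle);
    TidStoreIterResult *res;

    while ((res = TidStoreIterateNext(it)) != NULL)
    {
        /* each block is handed out to exactly one attached process */
    }
    TidStoreEndIterate(it);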
Attachment: v4-0002-raidxtree.h-support-shared-iteration.patch (application/octet-stream)
From 57e745ab91adbc41b08ae821f1fd5e5e2024349e Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 24 Oct 2024 17:29:51 -0700
Subject: [PATCH v4 2/4] radixtree.h: support shared iteration.
This commit supports a shared iteration operation on a radix tree with
multiple processes. The radix tree must be in shared mode to start a
shared iteration. Parallel workers can attach to the shared iteration
using the iterator handle given by the leader process. As with normal
iteration, the shared iteration is guaranteed to return key-values in
ascending order.
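
For example, with a template instance generated with RT_PREFIX shared_ts
(as tidstore.c does), the expected usage is roughly:

    /* leader */
    it = shared_ts_begin_iterate_shared(tree);
    handle = shared_ts_get_iter_handle(it);
    /* pass handle to workers via shared memory */

    /* worker */
    it = shared_ts_attach_iterate_shared(tree, handle);
    while ((val = shared_ts_iterate_next(it, &key)) != NULL)
    {
        /* each key-value is returned to exactly one attached process,
         * in ascending key order across all of them */
    }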
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
---
src/include/lib/radixtree.h | 221 +++++++++++++++++++++++++++++++-----
1 file changed, 190 insertions(+), 31 deletions(-)
diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h
index 88bf695e3f3..bd5b8eed1bf 100644
--- a/src/include/lib/radixtree.h
+++ b/src/include/lib/radixtree.h
@@ -177,6 +177,9 @@
#define RT_ATTACH RT_MAKE_NAME(attach)
#define RT_DETACH RT_MAKE_NAME(detach)
#define RT_GET_HANDLE RT_MAKE_NAME(get_handle)
+#define RT_BEGIN_ITERATE_SHARED RT_MAKE_NAME(begin_iterate_shared)
+#define RT_ATTACH_ITERATE_SHARED RT_MAKE_NAME(attach_iterate_shared)
+#define RT_GET_ITER_HANDLE RT_MAKE_NAME(get_iter_handle)
#define RT_LOCK_EXCLUSIVE RT_MAKE_NAME(lock_exclusive)
#define RT_LOCK_SHARE RT_MAKE_NAME(lock_share)
#define RT_UNLOCK RT_MAKE_NAME(unlock)
@@ -236,15 +239,19 @@
#define RT_SHRINK_NODE_16 RT_MAKE_NAME(shrink_child_16)
#define RT_SHRINK_NODE_48 RT_MAKE_NAME(shrink_child_48)
#define RT_SHRINK_NODE_256 RT_MAKE_NAME(shrink_child_256)
+#define RT_INITIALIZE_ITER RT_MAKE_NAME(initialize_iter)
#define RT_NODE_ITERATE_NEXT RT_MAKE_NAME(node_iterate_next)
#define RT_VERIFY_NODE RT_MAKE_NAME(verify_node)
/* type declarations */
#define RT_RADIX_TREE RT_MAKE_NAME(radix_tree)
#define RT_RADIX_TREE_CONTROL RT_MAKE_NAME(radix_tree_control)
+#define RT_ITER_CONTROL RT_MAKE_NAME(iter_control)
#define RT_ITER RT_MAKE_NAME(iter)
#ifdef RT_SHMEM
#define RT_HANDLE RT_MAKE_NAME(handle)
+#define RT_ITER_CONTROL_SHARED RT_MAKE_NAME(iter_control_shared)
+#define RT_ITER_HANDLE RT_MAKE_NAME(iter_handle)
#endif
#define RT_NODE RT_MAKE_NAME(node)
#define RT_CHILD_PTR RT_MAKE_NAME(child_ptr)
@@ -270,6 +277,7 @@ typedef struct RT_ITER RT_ITER;
#ifdef RT_SHMEM
typedef dsa_pointer RT_HANDLE;
+typedef dsa_pointer RT_ITER_HANDLE;
#endif
#ifdef RT_SHMEM
@@ -687,6 +695,7 @@ typedef struct RT_RADIX_TREE_CONTROL
RT_HANDLE handle;
uint32 magic;
LWLock lock;
+ int tranche_id;
#endif
RT_PTR_ALLOC root;
@@ -740,11 +749,9 @@ typedef struct RT_NODE_ITER
int idx;
} RT_NODE_ITER;
-/* state for iterating over the whole radix tree */
-struct RT_ITER
+/* Contains the iteration state data */
+typedef struct RT_ITER_CONTROL
{
- RT_RADIX_TREE *tree;
-
/*
* A stack to track iteration for each level. Level 0 is the lowest (or
* leaf) level
@@ -755,8 +762,36 @@ struct RT_ITER
/* The key constructed during iteration */
uint64 key;
-};
+} RT_ITER_CONTROL;
+
+#ifdef RT_SHMEM
+/* Contains the shared iteration state data */
+typedef struct RT_ITER_CONTROL_SHARED
+{
+ /* Actual shared iteration state data */
+ RT_ITER_CONTROL common;
+
+ /* protect the control data */
+ LWLock lock;
+
+ RT_ITER_HANDLE handle;
+ pg_atomic_uint32 refcnt;
+} RT_ITER_CONTROL_SHARED;
+#endif
+
+/* state for iterating over the whole radix tree */
+struct RT_ITER
+{
+ RT_RADIX_TREE *tree;
+ /* pointing to either local memory or DSA */
+ RT_ITER_CONTROL *ctl;
+
+#ifdef RT_SHMEM
+ /* True if the iterator is for shared iteration */
+ bool shared;
+#endif
+};
/* verification (available only in assert-enabled builds) */
static void RT_VERIFY_NODE(RT_NODE * node);
@@ -1848,6 +1883,7 @@ RT_CREATE(MemoryContext ctx)
tree->ctl = (RT_RADIX_TREE_CONTROL *) dsa_get_address(dsa, dp);
tree->ctl->handle = dp;
tree->ctl->magic = RT_RADIX_TREE_MAGIC;
+ tree->ctl->tranche_id = tranche_id;
LWLockInitialize(&tree->ctl->lock, tranche_id);
#else
tree->ctl = (RT_RADIX_TREE_CONTROL *) palloc0(sizeof(RT_RADIX_TREE_CONTROL));
@@ -1900,6 +1936,9 @@ RT_ATTACH(dsa_area *dsa, RT_HANDLE handle)
dsa_pointer control;
tree = (RT_RADIX_TREE *) palloc0(sizeof(RT_RADIX_TREE));
+ tree->iter_context = AllocSetContextCreate(CurrentMemoryContext,
+ RT_STR(RT_PREFIX) "_radix_tree iter context",
+ ALLOCSET_SMALL_SIZES);
/* Find the control object in shared memory */
control = handle;
@@ -2072,35 +2111,86 @@ RT_FREE(RT_RADIX_TREE * tree)
/***************** ITERATION *****************/
+/* Common routine to initialize the given iterator */
+static void
+RT_INITIALIZE_ITER(RT_RADIX_TREE * tree, RT_ITER * iter)
+{
+ RT_CHILD_PTR root;
+
+ iter->tree = tree;
+
+ Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
+ root.alloc = iter->tree->ctl->root;
+ RT_PTR_SET_LOCAL(tree, &root);
+
+ iter->ctl->top_level = iter->tree->ctl->start_shift / RT_SPAN;
+
+ /* Set the root to start */
+ iter->ctl->cur_level = iter->ctl->top_level;
+ iter->ctl->node_iters[iter->ctl->cur_level].node = root;
+ iter->ctl->node_iters[iter->ctl->cur_level].idx = 0;
+}
+
/*
* Create and return the iterator for the given radix tree.
*
- * Taking a lock in shared mode during the iteration is the caller's
- * responsibility.
+ * Taking a lock on a radix tree in shared mode during the iteration is the
+ * caller's responsibility.
*/
RT_SCOPE RT_ITER *
RT_BEGIN_ITERATE(RT_RADIX_TREE * tree)
{
RT_ITER *iter;
- RT_CHILD_PTR root;
iter = (RT_ITER *) MemoryContextAllocZero(tree->iter_context,
sizeof(RT_ITER));
- iter->tree = tree;
+ iter->ctl = (RT_ITER_CONTROL *) MemoryContextAllocZero(tree->iter_context,
+ sizeof(RT_ITER_CONTROL));
- Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
- root.alloc = iter->tree->ctl->root;
- RT_PTR_SET_LOCAL(tree, &root);
+ RT_INITIALIZE_ITER(tree, iter);
- iter->top_level = iter->tree->ctl->start_shift / RT_SPAN;
+#ifdef RT_SHMEM
+ /* this is a non-shared iteration, even if the tree itself is shared */
+ iter->shared = false;
+#endif
- /* Set the root to start */
- iter->cur_level = iter->top_level;
- iter->node_iters[iter->cur_level].node = root;
- iter->node_iters[iter->cur_level].idx = 0;
+ return iter;
+}
+
+#ifdef RT_SHMEM
+/*
+ * Create and return the shared iterator for the given shared radix tree.
+ *
+ * Taking a lock on a radix tree in shared mode during the shared iteration to
+ * prevent concurrent writes is the caller's responsibility.
+ */
+RT_SCOPE RT_ITER *
+RT_BEGIN_ITERATE_SHARED(RT_RADIX_TREE * tree)
+{
+ RT_ITER *iter;
+ RT_ITER_CONTROL_SHARED *ctl_shared;
+ dsa_pointer dp;
+
+ /* The radix tree must be in shared mode */
+ Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
+
+ dp = dsa_allocate0(tree->dsa, sizeof(RT_ITER_CONTROL_SHARED));
+ ctl_shared = (RT_ITER_CONTROL_SHARED *) dsa_get_address(tree->dsa, dp);
+ ctl_shared->handle = dp;
+ LWLockInitialize(&ctl_shared->lock, tree->ctl->tranche_id);
+ pg_atomic_init_u32(&ctl_shared->refcnt, 1);
+
+ iter = (RT_ITER *) MemoryContextAllocZero(tree->iter_context,
+ sizeof(RT_ITER));
+
+ iter->ctl = (RT_ITER_CONTROL *) ctl_shared;
+ iter->shared = true;
+
+ RT_INITIALIZE_ITER(tree, iter);
return iter;
}
+#endif
/*
* Scan the inner node and return the next child pointer if one exists, otherwise
@@ -2114,12 +2204,18 @@ RT_NODE_ITERATE_NEXT(RT_ITER * iter, int level)
RT_CHILD_PTR node;
RT_PTR_ALLOC *slot = NULL;
+ node_iter = &(iter->ctl->node_iters[level]);
+ node = node_iter->node;
+
#ifdef RT_SHMEM
- Assert(iter->tree->ctl->magic == RT_RADIX_TREE_MAGIC);
-#endif
- node_iter = &(iter->node_iters[level]);
- node = node_iter->node;
+ /*
+ * Since the iterator is shared, the node's local pointer might have been
+ * set by another backend, so make sure to use a pointer local to this backend.
+ */
+ if (iter->shared)
+ RT_PTR_SET_LOCAL(iter->tree, &node);
+#endif
Assert(node.local != NULL);
@@ -2192,8 +2288,8 @@ RT_NODE_ITERATE_NEXT(RT_ITER * iter, int level)
}
/* Update the key */
- iter->key &= ~(((uint64) RT_CHUNK_MASK) << (level * RT_SPAN));
- iter->key |= (((uint64) key_chunk) << (level * RT_SPAN));
+ iter->ctl->key &= ~(((uint64) RT_CHUNK_MASK) << (level * RT_SPAN));
+ iter->ctl->key |= (((uint64) key_chunk) << (level * RT_SPAN));
return slot;
}
@@ -2207,18 +2303,29 @@ RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p)
{
RT_PTR_ALLOC *slot = NULL;
- while (iter->cur_level <= iter->top_level)
+#ifdef RT_SHMEM
+ /* Prevent the shared iterator from being updated concurrently */
+ if (iter->shared)
+ LWLockAcquire(&((RT_ITER_CONTROL_SHARED *) iter->ctl)->lock, LW_EXCLUSIVE);
+#endif
+
+ while (iter->ctl->cur_level <= iter->ctl->top_level)
{
RT_CHILD_PTR node;
- slot = RT_NODE_ITERATE_NEXT(iter, iter->cur_level);
+ slot = RT_NODE_ITERATE_NEXT(iter, iter->ctl->cur_level);
- if (iter->cur_level == 0 && slot != NULL)
+ if (iter->ctl->cur_level == 0 && slot != NULL)
{
/* Found a value at the leaf node */
- *key_p = iter->key;
+ *key_p = iter->ctl->key;
node.alloc = *slot;
+#ifdef RT_SHMEM
+ if (iter->shared)
+ LWLockRelease(&((RT_ITER_CONTROL_SHARED *) iter->ctl)->lock);
+#endif
+
if (RT_CHILDPTR_IS_VALUE(*slot))
return (RT_VALUE_TYPE *) slot;
else
@@ -2234,17 +2341,23 @@ RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p)
node.alloc = *slot;
RT_PTR_SET_LOCAL(iter->tree, &node);
- iter->cur_level--;
- iter->node_iters[iter->cur_level].node = node;
- iter->node_iters[iter->cur_level].idx = 0;
+ iter->ctl->cur_level--;
+ iter->ctl->node_iters[iter->ctl->cur_level].node = node;
+ iter->ctl->node_iters[iter->ctl->cur_level].idx = 0;
}
else
{
/* Not found the child slot, move up the tree */
- iter->cur_level++;
+ iter->ctl->cur_level++;
}
+
}
+#ifdef RT_SHMEM
+ if (iter->shared)
+ LWLockRelease(&((RT_ITER_CONTROL_SHARED *) iter->ctl)->lock);
+#endif
+
/* We've visited all nodes, so the iteration finished */
return NULL;
}
@@ -2255,9 +2368,45 @@ RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p)
RT_SCOPE void
RT_END_ITERATE(RT_ITER * iter)
{
+#ifdef RT_SHMEM
+ RT_ITER_CONTROL_SHARED *ctl = (RT_ITER_CONTROL_SHARED *) iter->ctl;
+
+ if (iter->shared &&
+ pg_atomic_sub_fetch_u32(&ctl->refcnt, 1) == 0)
+ dsa_free(iter->tree->dsa, ctl->handle);
+#endif
pfree(iter);
}
+#ifdef RT_SHMEM
+RT_SCOPE RT_ITER_HANDLE
+RT_GET_ITER_HANDLE(RT_ITER * iter)
+{
+ Assert(iter->shared);
+ return ((RT_ITER_CONTROL_SHARED *) iter->ctl)->handle;
+}
+
+RT_SCOPE RT_ITER *
+RT_ATTACH_ITERATE_SHARED(RT_RADIX_TREE * tree, RT_ITER_HANDLE handle)
+{
+ RT_ITER *iter;
+ RT_ITER_CONTROL_SHARED *ctl;
+
+ iter = (RT_ITER *) MemoryContextAllocZero(tree->iter_context,
+ sizeof(RT_ITER));
+ iter->tree = tree;
+ ctl = (RT_ITER_CONTROL_SHARED *) dsa_get_address(tree->dsa, handle);
+ iter->ctl = (RT_ITER_CONTROL *) ctl;
+ iter->shared = true;
+
+ /* For every iterator, increase the refcnt by 1 */
+ pg_atomic_add_fetch_u32(&ctl->refcnt, 1);
+
+ return iter;
+}
+#endif
+
/***************** DELETION *****************/
#ifdef RT_USE_DELETE
@@ -2957,7 +3106,11 @@ RT_DUMP_NODE(RT_NODE * node)
#undef RT_PTR_ALLOC
#undef RT_INVALID_PTR_ALLOC
#undef RT_HANDLE
+#undef RT_ITER_HANDLE
+#undef RT_ITER_CONTROL
#undef RT_ITER
+#undef RT_SHARED_ITER
#undef RT_NODE
#undef RT_NODE_ITER
#undef RT_NODE_KIND_4
@@ -2994,6 +3147,11 @@ RT_DUMP_NODE(RT_NODE * node)
#undef RT_LOCK_SHARE
#undef RT_UNLOCK
#undef RT_GET_HANDLE
+#undef RT_BEGIN_ITERATE_SHARED
+#undef RT_ATTACH_ITERATE_SHARED
+#undef RT_GET_ITER_HANDLE
+#undef RT_ATTACH_ITER
#undef RT_FIND
#undef RT_SET
#undef RT_BEGIN_ITERATE
@@ -3050,5 +3208,6 @@ RT_DUMP_NODE(RT_NODE * node)
#undef RT_SHRINK_NODE_256
#undef RT_NODE_DELETE
#undef RT_NODE_INSERT
+#undef RT_INITIALIZE_ITER
#undef RT_NODE_ITERATE_NEXT
#undef RT_VERIFY_NODE
--
2.43.5
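To illustrate how the pieces added above fit together, here is a minimal
usage sketch (shown with the unexpanded RT_* template names; in a real
instantiation they expand according to RT_PREFIX, and passing the handle
through DSM is an assumption based on how the patch uses it):
```
RT_ITER    *iter;
RT_ITER_HANDLE handle;
uint64      key;
RT_VALUE_TYPE *value;

/* Leader: create the shared iterator and publish its handle (e.g. via DSM) */
iter = RT_BEGIN_ITERATE_SHARED(tree);
handle = RT_GET_ITER_HANDLE(iter);

/* Worker: attach to the same iteration state through the handle */
iter = RT_ATTACH_ITERATE_SHARED(tree, handle);

/*
 * Any participant: RT_ITERATE_NEXT() serializes access to the shared
 * control data with its LWLock, so each key/value pair is consumed by
 * exactly one process.
 */
while ((value = RT_ITERATE_NEXT(iter, &key)) != NULL)
{
    /* process (key, value) */
}

/*
 * Detach; the last participant to call this frees the DSA-allocated
 * control data once the refcnt drops to zero.
 */
RT_END_ITERATE(iter);
```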
Dear Sawada-san,
I've attached new version patches that fix failures reported by
cfbot. I hope these changes make cfbot happy.
Thanks for updating the patch and sorry for the delayed reply. I confirmed that cfbot
for Linux/Windows said OK.
I'm still learning the feature so I can post only one comment :-(.
I wanted to know whether TidStoreBeginIterateShared() was needed. IIUC, the pre-existing API,
TidStoreBeginIterate(), already accepts a shared TidStore. The only difference
is whether elog(ERROR) exists, but I wonder if it benefits others. Is there another
reason that lazy_vacuum_heap_rel() uses TidStoreBeginIterateShared()?
Another approach is to restrict TidStoreBeginIterate() to support only the local one.
What do you think?
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Mon, Nov 11, 2024 at 5:08 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Sawda-san,
I've attached new version patches that fix failures reported by
cfbot. I hope these changes make cfbot happy.
Thanks for updating the patch and sorry for the delayed reply. I confirmed that cfbot
for Linux/Windows said ok.
I'm still learning the feature so I can post only one comment :-(.
I wanted to know whether TidStoreBeginIterateShared() was needed. IIUC, the pre-existing API,
TidStoreBeginIterate(), already accepts a shared TidStore. The only difference
is whether elog(ERROR) exists, but I wonder if it benefits others. Is there another
reason that lazy_vacuum_heap_rel() uses TidStoreBeginIterateShared()?
TidStoreBeginIterateShared() is designed for multiple parallel workers
to iterate a shared TidStore. During an iteration, parallel workers
share the iteration state and iterate the underlying radix tree while
taking appropriate locks. Therefore, it's available only for a shared
TidStore. This is required to implement the parallel heap vacuum,
where multiple parallel workers do the iteration on the shared
TidStore.
On the other hand, TidStoreBeginIterate() is designed for a single
process to iterate a TidStore. It accepts even a shared TidStore as
you mentioned, but during an iteration there is no inter-process
coordination such as locking. When it comes to parallel vacuum,
supporting TidStoreBeginIterate() on a shared TidStore is necessary to
cover the case where we use only parallel index vacuum but not
parallel heap scan/vacuum. In this case, we need to store dead tuple
TIDs on the shared TidStore during heap scan so parallel workers can
use it during index vacuum. But it's not necessary to use
TidStoreBeginIterateShared() because only one (leader) process does
heap vacuum.
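To make the distinction concrete, the intended call patterns would look
roughly like this (TidStoreBeginIterateShared() is from the patch; the
other calls are the existing tidstore API, and the per-block processing
is only a placeholder):
```
TidStoreIter *iter;
TidStoreIterResult *result;

/*
 * Single-process iteration: works on a local or shared TidStore, but
 * with no inter-process coordination during the iteration.
 */
iter = TidStoreBeginIterate(dead_items);
while ((result = TidStoreIterateNext(iter)) != NULL)
{
    /* vacuum the heap page result->blkno ... */
}
TidStoreEndIterate(iter);

/*
 * Shared iteration: multiple workers call this on the same shared
 * TidStore; the iteration state lives in DSA and access to it is
 * locked, so each block is handed out to exactly one worker.
 */
iter = TidStoreBeginIterateShared(dead_items);
while ((result = TidStoreIterateNext(iter)) != NULL)
{
    /* vacuum the heap page result->blkno ... */
}
TidStoreEndIterate(iter);
```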
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Wed, 30 Oct 2024 at 22:48, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached new version patches that fix failures reported by
cfbot. I hope these changes make cfbot happy.
I just started reviewing the patch and found the following comments
while going through the patch:
1) I felt we should add some documentation for this at [1].
2) Can we add some tests in vacuum_parallel with tables having no
indexes and having dead tuples?
3) This should be included in typedefs.list:
3.a)
+/*
+ * Relation statistics collected during heap scanning and need to be
shared among
+ * parallel vacuum workers.
+ */
+typedef struct LVRelScanStats
+{
+ BlockNumber scanned_pages; /* # pages examined (not
skipped via VM) */
+ BlockNumber removed_pages; /* # pages removed by relation
truncation */
+ BlockNumber frozen_pages; /* # pages with newly frozen tuples */
3.b) Similarly this too:
+/*
+ * Struct for information that need to be shared among parallel vacuum workers
+ */
+typedef struct PHVShared
+{
+ bool aggressive;
+ bool skipwithvm;
+
3.c) Similarly this too:
+/* Per-worker scan state for parallel heap vacuum scan */
+typedef struct PHVScanWorkerState
+{
+ bool initialized;
3.d) Similarly this too:
+/* Struct for parallel heap vacuum */
+typedef struct PHVState
+{
+ /* Parallel scan description shared among parallel workers */
4) Since we are initializing almost all the members of the structure,
should we use palloc0 in this case:
+ scan_stats = palloc(sizeof(LVRelScanStats));
+ scan_stats->scanned_pages = 0;
+ scan_stats->removed_pages = 0;
+ scan_stats->frozen_pages = 0;
+ scan_stats->lpdead_item_pages = 0;
+ scan_stats->missed_dead_pages = 0;
+ scan_stats->nonempty_pages = 0;
+
+ /* Initialize remaining counters (be tidy) */
+ scan_stats->tuples_deleted = 0;
+ scan_stats->tuples_frozen = 0;
+ scan_stats->lpdead_items = 0;
+ scan_stats->live_tuples = 0;
+ scan_stats->recently_dead_tuples = 0;
+ scan_stats->missed_dead_tuples = 0;
5) Typo: "paralle" should be "parallel"
+/*
+ * Return the number of parallel workers for a parallel vacuum scan of this
+ * relation.
+ */
+static inline int
+table_paralle_vacuum_compute_workers(Relation rel, int requested)
+{
+ return rel->rd_tableam->parallel_vacuum_compute_workers(rel, requested);
+}
[1]: https://www.postgresql.org/docs/devel/sql-vacuum.html
Regards,
Vignesh
Dear Sawada-san,
TidStoreBeginIterateShared() is designed for multiple parallel workers
to iterate a shared TidStore. During an iteration, parallel workers
share the iteration state and iterate the underlying radix tree while
taking appropriate locks. Therefore, it's available only for a shared
TidStore. This is required to implement the parallel heap vacuum,
where multiple parallel workers do the iteration on the shared
TidStore.
On the other hand, TidStoreBeginIterate() is designed for a single
process to iterate a TidStore. It accepts even a shared TidStore as
you mentioned, but during an iteration there is no inter-process
coordination such as locking. When it comes to parallel vacuum,
supporting TidStoreBeginIterate() on a shared TidStore is necessary to
cover the case where we use only parallel index vacuum but not
parallel heap scan/vacuum. In this case, we need to store dead tuple
TIDs on the shared TidStore during heap scan so parallel workers can
use it during index vacuum. But it's not necessary to use
TidStoreBeginIterateShared() because only one (leader) process does
heap vacuum.
Okay, thanks for the description. I felt it is OK to keep.
I read 0001 again and here are my comments.
01. vacuumlazy.c
```
+#define LV_PARALLEL_SCAN_SHARED 0xFFFF0001
+#define LV_PARALLEL_SCAN_DESC 0xFFFF0002
+#define LV_PARALLEL_SCAN_DESC_WORKER 0xFFFF0003
```
I checked other DSM keys used for parallel work, and they seem to have names
like PARALLEL_KEY_XXX. Can we follow that?
02. LVRelState
```
+ BlockNumber next_fsm_block_to_vacuum;
```
Only this attribute does not have a comment. Can we add something like:
"Next freespace map page to be checked"?
03. parallel_heap_vacuum_gather_scan_stats
```
+ vacrel->scan_stats->vacuumed_pages += ss->vacuumed_pages;
```
Note that `scan_stats->vacuumed_pages` does not exist in 0001; it is defined
in 0004. Can you move it?
04. heap_parallel_vacuum_estimate
```
+
+ heap_parallel_estimate_shared_memory_size(rel, nworkers, &pscan_len,
+ &shared_len, &pscanwork_len);
+
+ /* space for PHVShared */
+ shm_toc_estimate_chunk(&pcxt->estimator, shared_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* space for ParallelBlockTableScanDesc */
+ pscan_len = table_block_parallelscan_estimate(rel);
+ shm_toc_estimate_chunk(&pcxt->estimator, pscan_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* space for per-worker scan state, PHVScanWorkerState */
+ pscanwork_len = mul_size(sizeof(PHVScanWorkerState), nworkers);
+ shm_toc_estimate_chunk(&pcxt->estimator, pscanwork_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
```
I feel pscan_len and pscanwork_len are already calculated in heap_parallel_estimate_shared_memory_size().
Can we remove table_block_parallelscan_estimate() and mul_size() from here?
05. Idea
Can you update the documentation?
06. Idea
AFAICS pg_stat_progress_vacuum does not contain information related to the
parallel execution. What do you think about adding an attribute that shows a list of PIDs?
Not sure it is helpful for users, but it can show the parallelism.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Wed, Nov 13, 2024 at 3:10 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Sawada-san,
TidStoreBeginIterateShared() is designed for multiple parallel workers
to iterate a shared TidStore. During an iteration, parallel workers
share the iteration state and iterate the underlying radix tree while
taking appropriate locks. Therefore, it's available only for a shared
TidStore. This is required to implement the parallel heap vacuum,
where multiple parallel workers do the iteration on the shared
TidStore.
On the other hand, TidStoreBeginIterate() is designed for a single
process to iterate a TidStore. It accepts even a shared TidStore as
you mentioned, but during an iteration there is no inter-process
coordination such as locking. When it comes to parallel vacuum,
supporting TidStoreBeginIterate() on a shared TidStore is necessary to
cover the case where we use only parallel index vacuum but not
parallel heap scan/vacuum. In this case, we need to store dead tuple
TIDs on the shared TidStore during heap scan so parallel workers can
use it during index vacuum. But it's not necessary to use
TidStoreBeginIterateShared() because only one (leader) process does
heap vacuum.
Okay, thanks for the description. I felt it is OK to keep.
I read 0001 again and here are comments.
Thank you for the review comments!
01. vacuumlazy.c
```
+#define LV_PARALLEL_SCAN_SHARED 0xFFFF0001
+#define LV_PARALLEL_SCAN_DESC 0xFFFF0002
+#define LV_PARALLEL_SCAN_DESC_WORKER 0xFFFF0003
```
I checked other DSM keys used for parallel work, and they seem to have names
like PARALLEL_KEY_XXX. Can we follow that?
Yes. How about LV_PARALLEL_KEY_XXX?
02. LVRelState
```
+ BlockNumber next_fsm_block_to_vacuum;
```
Only this attribute does not have a comment. Can we add something like:
"Next freespace map page to be checked"?
Agreed. I'll add a comment "next block to check for FSM vacuum".
03. parallel_heap_vacuum_gather_scan_stats
```
+ vacrel->scan_stats->vacuumed_pages += ss->vacuumed_pages;
```
Note that `scan_stats->vacuumed_pages` does not exist in 0001; it is defined
in 0004. Can you move it?
Will remove.
04. heap_parallel_vacuum_estimate
```
+
+ heap_parallel_estimate_shared_memory_size(rel, nworkers, &pscan_len,
+ &shared_len, &pscanwork_len);
+
+ /* space for PHVShared */
+ shm_toc_estimate_chunk(&pcxt->estimator, shared_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* space for ParallelBlockTableScanDesc */
+ pscan_len = table_block_parallelscan_estimate(rel);
+ shm_toc_estimate_chunk(&pcxt->estimator, pscan_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* space for per-worker scan state, PHVScanWorkerState */
+ pscanwork_len = mul_size(sizeof(PHVScanWorkerState), nworkers);
+ shm_toc_estimate_chunk(&pcxt->estimator, pscanwork_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
```
I feel pscan_len and pscanwork_len are already calculated in heap_parallel_estimate_shared_memory_size().
Can we remove table_block_parallelscan_estimate() and mul_size() from here?
Yes, it's an oversight. Will remove.
05. Idea
Can you update the documentation?
Will update the doc as well.
06. Idea
AFAICS pg_stat_progress_vacuum does not contain information related to the
parallel execution. What do you think about adding an attribute that shows a list of PIDs?
Not sure it is helpful for users, but it can show the parallelism.
I think it's possible to show the parallelism even today (for parallel
index vacuuming):
=# select leader.pid, leader.query, array_agg(worker.pid) from
pg_stat_activity as leader, pg_stat_activity as worker,
pg_stat_progress_vacuum as v where leader.pid = worker.leader_pid and
leader.pid = v.pid group by 1, 2;
pid | query | array_agg
---------+---------------------+-------------------
2952103 | vacuum (verbose) t; | {2952257,2952258}
(1 row)
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Tue, Nov 12, 2024 at 3:21 AM vignesh C <vignesh21@gmail.com> wrote:
On Wed, 30 Oct 2024 at 22:48, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached new version patches that fix failures reported by
cfbot. I hope these changes make cfbot happy.
I just started reviewing the patch and found the following comments
while going through the patch:
1) I felt we should add some documentation for this at [1].
2) Can we add some tests in vacuum_parallel with tables having no
indexes and having dead tuples?
3) This should be included in typedefs.list:
3.a)
+/*
+ * Relation statistics collected during heap scanning and need to be shared among
+ * parallel vacuum workers.
+ */
+typedef struct LVRelScanStats
+{
+ BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */
+ BlockNumber removed_pages; /* # pages removed by relation truncation */
+ BlockNumber frozen_pages; /* # pages with newly frozen tuples */
3.b) Similarly this too:
+/*
+ * Struct for information that need to be shared among parallel vacuum workers
+ */
+typedef struct PHVShared
+{
+ bool aggressive;
+ bool skipwithvm;
+
3.c) Similarly this too:
+/* Per-worker scan state for parallel heap vacuum scan */
+typedef struct PHVScanWorkerState
+{
+ bool initialized;
3.d) Similarly this too:
+/* Struct for parallel heap vacuum */
+typedef struct PHVState
+{
+ /* Parallel scan description shared among parallel workers */
4) Since we are initializing almost all the members of the structure,
should we use palloc0 in this case:
+ scan_stats = palloc(sizeof(LVRelScanStats));
+ scan_stats->scanned_pages = 0;
+ scan_stats->removed_pages = 0;
+ scan_stats->frozen_pages = 0;
+ scan_stats->lpdead_item_pages = 0;
+ scan_stats->missed_dead_pages = 0;
+ scan_stats->nonempty_pages = 0;
+
+ /* Initialize remaining counters (be tidy) */
+ scan_stats->tuples_deleted = 0;
+ scan_stats->tuples_frozen = 0;
+ scan_stats->lpdead_items = 0;
+ scan_stats->live_tuples = 0;
+ scan_stats->recently_dead_tuples = 0;
+ scan_stats->missed_dead_tuples = 0;
5) Typo: "paralle" should be "parallel"
+/*
+ * Return the number of parallel workers for a parallel vacuum scan of this
+ * relation.
+ */
+static inline int
+table_paralle_vacuum_compute_workers(Relation rel, int requested)
+{
+ return rel->rd_tableam->parallel_vacuum_compute_workers(rel, requested);
+}
Thank you for the comments! I'll address these comments in the next
version patch.
BTW while updating the patch, I found that we might want to launch
different numbers of workers for scanning heap and vacuuming heap. The
number of parallel workers is determined based on the number of blocks
in the table. However, even if this number is high, it could happen
that we want to launch fewer workers to vacuum heap pages when not many
pages have garbage. And the number of workers for
vacuuming heap could vary on each vacuum pass. I'm considering
implementing it.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Dear Sawada-san,
BTW while updating the patch, I found that we might want to launch
different numbers of workers for scanning heap and vacuuming heap. The
number of parallel workers is determined based on the number of blocks
in the table. However, even if this number is high, it could happen
that we want to launch fewer workers to vacuum heap pages when not many
pages have garbage. And the number of workers for
vacuuming heap could vary on each vacuum pass. I'm considering
implementing it.
Just to clarify - this idea looks good to me. I imagine you will add new APIs for
tableam like parallel_vacuum_compute_workers_for_scanning and parallel_vacuum_compute_workers_for_vacuuming.
If other tableam developers want to use the same number of workers as scanning,
they can pass the same function to the pointer. Is that right?
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Hi Sawada-San,
FYI, the patch 0001 fails to build stand-alone
vacuumlazy.c: In function ‘parallel_heap_vacuum_gather_scan_stats’:
vacuumlazy.c:3739:21: error: ‘LVRelScanStats’ has no member named
‘vacuumed_pages’
vacrel->scan_stats->vacuumed_pages += ss->vacuumed_pages;
^
vacuumlazy.c:3739:43: error: ‘LVRelScanStats’ has no member named
‘vacuumed_pages’
vacrel->scan_stats->vacuumed_pages += ss->vacuumed_pages;
^
make[4]: *** [vacuumlazy.o] Error 1
It appears to be using a struct field which is not even introduced
until patch 0004 of the patch set.
======
Kind Regards,
Peter Smith.
Fujitsu Australia.
Hi Sawada-San,
I started to look at patch v4-0001 in this thread.
It is quite a big patch so this is a WIP, and these below are just the
comments I have so far.
======
src/backend/access/heap/vacuumlazy.c
1.1.
+/*
+ * Relation statistics collected during heap scanning and need to be
shared among
+ * parallel vacuum workers.
+ */
+typedef struct LVRelScanStats
The comment wording is not quite right.
/Relation statistics collected during heap scanning/Relation
statistics that are collected during heap scanning/
~~~
1.2
+/*
+ * Struct for information that need to be shared among parallel vacuum workers
+ */
+typedef struct PHVShared
The comment wording is not quite right.
/that need to be shared/that needs to be shared/
~~~
1.3.
+/* Struct for parallel heap vacuum */
+typedef struct PHVState
+{
+ /* Parallel scan description shared among parallel workers */
+ ParallelBlockTableScanDesc pscandesc;
+
+ /* Shared information */
+ PHVShared *shared;
If 'pscandesc' is described as 'shared among parallel workers', should
that field be within 'PHVShared' instead?
~~~
1.4.
/* Initialize page counters explicitly (be tidy) */
- vacrel->scanned_pages = 0;
- vacrel->removed_pages = 0;
- vacrel->frozen_pages = 0;
- vacrel->lpdead_item_pages = 0;
- vacrel->missed_dead_pages = 0;
- vacrel->nonempty_pages = 0;
- /* dead_items_alloc allocates vacrel->dead_items later on */
+ scan_stats = palloc(sizeof(LVRelScanStats));
+ scan_stats->scanned_pages = 0;
+ scan_stats->removed_pages = 0;
+ scan_stats->frozen_pages = 0;
+ scan_stats->lpdead_item_pages = 0;
+ scan_stats->missed_dead_pages = 0;
+ scan_stats->nonempty_pages = 0;
+
+ /* Initialize remaining counters (be tidy) */
+ scan_stats->tuples_deleted = 0;
+ scan_stats->tuples_frozen = 0;
+ scan_stats->lpdead_items = 0;
+ scan_stats->live_tuples = 0;
+ scan_stats->recently_dead_tuples = 0;
+ scan_stats->missed_dead_tuples = 0;
+
+ vacrel->scan_stats = scan_stats;
1.4a.
Or, maybe just palloc0 this and provide a comment to say all counters
have been zapped to 0.
~
1.4b.
Maybe you don't need that 'scan_stats' variable; just assign the
palloc0 directly to the field instead.
~~~
1.5.
- vacrel->missed_dead_tuples = 0;
+ /* dead_items_alloc allocates vacrel->dead_items later on */
The patch seems to have moved this "dead_items_alloc" comment to now
be below the "Allocate/initialize output statistics state" stuff (??).
======
src/backend/commands/vacuumparallel.c
parallel_vacuum_init:
1.6.
int parallel_workers = 0;
+ int nworkers_table;
+ int nworkers_index;
The local vars and function params are named like this (here and in
other functions). But the struct field names say 'nworkers_for_XXX'
e.g.
shared->nworkers_for_table = nworkers_table;
shared->nworkers_for_index = nworkers_index;
It may be better to use these 'nworkers_for_table' and
'nworkers_for_index' names consistently everywhere.
~~~
parallel_vacuum_compute_workers:
1.7.
- int parallel_workers;
+ int parallel_workers_table = 0;
+ int parallel_workers_index = 0;
+
+ *nworkers_table = 0;
+ *nworkers_index = 0;
The local variables 'parallel_workers_table' and
'parallel_workers_index; are hardly needed because AFAICT those
results can be directly assigned to *nworkers_table and
*nworkers_index.
~~~
parallel_vacuum_process_all_indexes:
1.8.
/* Reinitialize parallel context to relaunch parallel workers */
- if (num_index_scans > 0)
+ if (num_index_scans > 0 || pvs->num_table_scans > 0)
ReinitializeParallelDSM(pvs->pcxt);
I don't know if it is feasible or even makes sense to change, but
somehow it seemed strange that the 'num_index_scans' field is not
co-located with the 'num_table_scans' in the ParallelVacuumState. If
this is doable, then lots of functions also would no longer need to
pass 'num_index_scans' since they are already passing 'pvs'.
~~~
parallel_vacuum_table_scan_begin:
1.9.
+ (errmsg(ngettext("launched %d parallel vacuum worker for table
processing (planned: %d)",
+ "launched %d parallel vacuum workers for table processing (planned: %d)",
+ pvs->pcxt->nworkers_launched),
Isn't this the same as errmsg_plural?
~~~
1.10.
+/* Return the array of indexes associated to the given table to be vacuumed */
+Relation *
+parallel_vacuum_get_table_indexes(ParallelVacuumState *pvs, int *nindexes)
Even though the function comment can fit on one line it is nicer to
use a block-style comment with a period, like below. It then will be
consistent with other function comments (e.g.
parallel_vacuum_table_scan_end, parallel_vacuum_process_table, etc).
There are multiple places that this review comment can apply to.
(also typo /associated to/associated with/)
SUGGESTION
/*
* Return the array of indexes associated with the given table to be vacuumed.
*/
~~~
parallel_vacuum_get_nworkers_table:
parallel_vacuum_get_nworkers_index:
1.11.
+/* Return the number of workers for parallel table vacuum */
+int
+parallel_vacuum_get_nworkers_table(ParallelVacuumState *pvs)
+{
+ return pvs->shared->nworkers_for_table;
+}
+
+/* Return the number of workers for parallel index processing */
+int
+parallel_vacuum_get_nworkers_index(ParallelVacuumState *pvs)
+{
+ return pvs->shared->nworkers_for_index;
+}
+
Are these functions needed? AFAICT, they are called only from macros
where it seems just as easy to reference the pvs fields directly.
~~~
parallel_vacuum_process_table:
1.12.
+/*
+ * A parallel worker invokes table-AM specified vacuum scan callback.
+ */
+static void
+parallel_vacuum_process_table(ParallelVacuumState *pvs)
+{
+ Assert(VacuumActiveNWorkers);
Maybe here also we should Assert(pvs.shared->do_vacuum_table_scan);
~~~
1.13.
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&pvs);
+ if (pvs.shared->do_vacuum_table_scan)
+ {
+ parallel_vacuum_process_table(&pvs);
+ }
+ else
+ {
+ ErrorContextCallback errcallback;
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = &pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_process_safe_indexes(&pvs);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+ }
There are still some functions following this code (like
'shm_toc_lookup') that could potentially raise ERRORs. But, now the
error_context_stack is getting assigned/reset earlier than was
previously the case. Is that going to be a potential problem?
======
src/include/access/tableam.h
1.14.
+ /*
+ * Compute the amount of DSM space AM need in the parallel table vacuum.
+ *
Maybe reword this comment to be more like table_parallelscan_estimate.
SUGGESTION
Estimate the size of shared memory that the parallel table vacuum needs for AM.
~~~
1.15.
+/*
+ * Estimate the size of shared memory needed for a parallel vacuum scan of this
+ * of this relation.
+ */
+static inline void
+table_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt,
int nworkers,
+ void *state)
+{
+ rel->rd_tableam->parallel_vacuum_estimate(rel, pcxt, nworkers, state);
+}
+
+/*
+ * Initialize shared memory area for a parallel vacuum scan of this relation.
+ */
+static inline void
+table_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
int nworkers,
+ void *state)
+{
+ rel->rd_tableam->parallel_vacuum_initialize(rel, pcxt, nworkers, state);
+}
+
+/*
+ * Start a parallel vacuum scan of this relation.
+ */
+static inline void
+table_parallel_vacuum_scan(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt)
+{
+ rel->rd_tableam->parallel_vacuum_scan_worker(rel, pvs, pwcxt);
+}
+
All of the "Callbacks for parallel table vacuum." had comments saying
"Not called if parallel table vacuum is disabled.". So, IIUC that
means all of these table_parallel_vacuum_XXX functions (other than the
compute_workers one) could have Assert(nworkers > 0); just to
double-check that is true.
~~~
table_paralle_vacuum_compute_workers:
1.16.
+static inline int
+table_paralle_vacuum_compute_workers(Relation rel, int requested)
+{
+ return rel->rd_tableam->parallel_vacuum_compute_workers(rel, requested);
+}
Typo in function name. /paralle/parallel/
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Hi Sawada-San.
FWIW, here is the remainder of my review comments for the patch v4-0001
======
src/backend/access/heap/vacuumlazy.c
lazy_scan_heap:
2.1.
+ /*
+ * Do the actual work. If parallel heap vacuum is active, we scan and
+ * vacuum heap with parallel workers.
+ */
/with/using/
~~~
2.2.
+ if (ParallelHeapVacuumIsActive(vacrel))
+ do_parallel_lazy_scan_heap(vacrel);
+ else
+ do_lazy_scan_heap(vacrel);
The do_lazy_scan_heap() returns a boolean and according to that
function comment it should always be true if it is not using the
parallel heap scan. So should we get the function return value here
and Assert that it is true?
~~~
2.3.
Start with uppercase for all the single-line comments, for consistency
with existing code.
e.g.
+ /* report that everything is now scanned */
e.g
+ /* now we can compute the new value for pg_class.reltuples */
e.g.
+ /* report all blocks vacuumed */
~~~
heap_vac_scan_next_block_parallel:
2.4.
+/*
+ * A parallel scan variant of heap_vac_scan_next_block.
+ *
+ * In parallel vacuum scan, we don't use the SKIP_PAGES_THRESHOLD optimization.
+ */
+static bool
+heap_vac_scan_next_block_parallel(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm)
The function comment should explain the return value.
~~~
2.5.
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+
+ if (vacrel->aggressive)
+ break;
Unnecessary whitespace.
~~~
dead_items_alloc:
2.6.
+ /*
+ * We initialize parallel heap scan/vacuuming or index vacuuming
+ * or both based on the table size and the number of indexes. Note
+ * that only one worker can be used for an index, we invoke
+ * parallelism for index vacuuming only if there are at least two
+ * indexes on a table.
+ */
vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
vacrel->nindexes, nworkers,
vac_work_mem,
vacrel->verbose ? INFO : DEBUG2,
- vacrel->bstrategy);
+ vacrel->bstrategy, (void *) vacrel);
Is this information misplaced? Why describe here "only one worker" and
"at least two indexes on a table" I don't see anything here checking
those conditions.
~~~
heap_parallel_vacuum_compute_workers:
2.7.
+ /*
+ * Select the number of workers based on the log of the size of the
+ * relation. This probably needs to be a good deal more
+ * sophisticated, but we need something here for now. Note that the
+ * upper limit of the min_parallel_table_scan_size GUC is chosen to
+ * prevent overflow here.
+ */
The "This probably needs to..." part maybe should have an "XXX" marker
in the comment which AFAIK is used to highlight current decisions and
potential for future changes.
~~~
heap_parallel_vacuum_initialize:
2.8.
There is inconsistent capitalization of the single-line comments in
this function. The same occurs in many functions in this file. but it
is just a bit more obvious in this one. Please see all the others too.
~~~
parallel_heap_complete_unfinised_scan:
2.9.
+static void
+parallel_heap_complete_unfinised_scan(LVRelState *vacrel)
TYPO in function name /unfinised/unfinished/
~~~
2.10.
+ if (!wstate->maybe_have_blocks)
+
+ continue;
Unnecessary blank line.
~~~
2.11.
+
+ /* Attache the worker's scan state and do heap scan */
+ vacrel->phvstate->myscanstate = wstate;
+ scan_done = do_lazy_scan_heap(vacrel);
/Attache/Attach/
~~~
2.12.
+ /*
+ * We don't need to gather the scan statistics here because statistics
+ * have already been accumulated the leaders statistics directly.
+ */
"have already been accumulated the leaders" -- missing word there somewhere?
~~~
do_parallel_lazy_scan_heap:
2.13.
+ /*
+ * If the heap scan paused in the middle of the table due to full of
+ * dead_items TIDs, perform a round of index and heap vacuuming.
+ */
+ if (!scan_done)
+ {
+ /* Perform a round of index and heap vacuuming */
+ vacrel->consider_bypass_optimization = false;
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the Free Space Map to make newly-freed space visible on
+ * upper-level FSM pages.
+ */
+ if (vacrel->phvstate->min_blkno > vacrel->next_fsm_block_to_vacuum)
+ {
+ /*
+ * min_blkno should have already been updated when gathering
+ * statistics
+ */
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ vacrel->phvstate->min_blkno + 1);
+ vacrel->next_fsm_block_to_vacuum = vacrel->phvstate->min_blkno;
+ }
+
+ /* Report that we are once again scanning the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+ /* re-launcher workers */
+ vacrel->phvstate->nworkers_launched =
+ parallel_vacuum_table_scan_begin(vacrel->pvs);
+
+ continue;
+ }
+
+ /* We reach the end of the table */
+ break;
Instead of:
if (!scan_done)
{
<other code ...>
continue;
}
break;
Won't it be better to refactor like:
SUGGESTION
if (scan_done)
break;
<other code...>
~~~
2.14.
+ /*
+ * The parallel heap vacuum finished, but it's possible that some workers
+ * have allocated blocks but not processed yet. This can happen for
+ * example when workers exit because of full of dead_items TIDs and the
+ * leader process could launch fewer workers in the next cycle.
+ */
There seem to be some missing words:
e.g. /not processed yet./not processed them yet./
e.g. /because of full of dead_items/because they are full of dead_items/
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Hi,
Thanks for working on this. I took a quick look at this today, to do
some basic review. I plan to do a bunch of testing, but that's mostly to
get a better idea of what kind of improvements to expect - the initial
results look quite nice and sensible.
A couple basic comments:
1) I really like the idea of introducing "compute_workers" callback to
the heap AM interface. I faced a similar issue with calculating workers
for index builds, because right now plan_create_index_workers is doing
that; the logic works for btree, but really not for brin etc. It didn't occur
to me we might make this part of the index AM ...
2) I find it a bit weird vacuumlazy.c needs to include optimizer/paths.h
because it really has nothing to do with planning / paths. I realize it
needs the min_parallel_table_scan_size, but it doesn't seem right. I
guess it's a sign this bit of code (calculating parallel workers based
on log of relation size) should be in some "shared" location.
3) The difference in naming ParallelVacuumState vs. PHVState is a bit
weird. I suggest ParallelIndexVacuumState and ParallelHeapVacuumState to
make it consistent and clear.
4) I think it would be good to have some sort of README explaining how
the parallel heap vacuum works, i.e. how it's driven by FSM. Took me a
while to realize how the workers coordinate which blocks to scan.
5) Wouldn't it be better to introduce the scan_stats (grouping some of
the fields in a separate patch)? Seems entirely independent from the
parallel part, so doing it separately would make it easier to review.
Also, maybe reference the fields through scan_stats->x, instead of
through vacrel->scan_stats->x, when there's the pointer.
6) Is it a good idea to move NewRelfrozenXID/... to the scan_stats?
AFAIK it's not a statistic, it's actually a parameter affecting the
decisions, right?
7) I find it a bit strange that heap_vac_scan_next_block() needs to
check if it's a parallel scan, and redirect to the parallel callback. I
mean, shouldn't the caller know which callback to invoke? Why should the
serial callback care about this?
8) It's not clear to me why do_lazy_scan_heap() needs to "advertise" the
current block. Can you explain?
9) I'm a bit confused why the code checks IsParallelWorker() in so many
places. Doesn't that mean the leader can't participate in the vacuum
like a regular worker?
10) I'm not quite sure I understand the comments at the end of
do_lazy_scan_heap - it says "do a cycle of vacuuming" but I guess that
means "index vacuuming", right? And then it says "pause without invoking
index and heap vacuuming" but isn't the whole point of this block to do
that cleanup so that the TidStore can be discarded? Maybe I just don't
understand how the work is divided between the leader and workers ...
11) Why does GlobalVisState need to move to snapmgr.h? If I undo this
the patch still builds fine for me.
thanks
--
Tomas Vondra
Dear Tomas,
1) I really like the idea of introducing "compute_workers" callback to
the heap AM interface. I faced a similar issue with calculating workers
for index builds, because right now plan_create_index_workers is doing
that; the logic works for btree, but really not for brin etc. It didn't occur
to me we might make this part of the index AM ...
+1, so let's keep the proposed style. Or, can we even propose the idea
to table/index access method API?
I've considered it a bit, and the point seems to be which arguments should be required.
4) I think it would be good to have some sort of README explaining how
the parallel heap vacuum works, i.e. how it's driven by FSM. Took me a
while to realize how the workers coordinate which blocks to scan.
I love the idea; it would be quite helpful for reviewers like me.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Mon, Dec 9, 2024 at 2:11 PM Tomas Vondra <tomas@vondra.me> wrote:
Hi,
Thanks for working on this. I took a quick look at this today, to do
some basic review. I plan to do a bunch of testing, but that's mostly to
get a better idea of what kind of improvements to expect - the initial
results look quite nice and sensible.
Thank you for reviewing the patch!
A couple basic comments:
1) I really like the idea of introducing "compute_workers" callback to
the heap AM interface. I faced a similar issue with calculating workers
for index builds, because right now plan_create_index_workers is doing
that; the logic works for btree, but really not for brin etc. It didn't occur
to me we might make this part of the index AM ...
Thanks.
2) I find it a bit weird vacuumlazy.c needs to include optimizer/paths.h
because it really has nothing to do with planning / paths. I realize it
needs the min_parallel_table_scan_size, but it doesn't seem right. I
guess it's a sign this bit of code (calculating parallel workers based
on log of relation size) should be in some "shared" location.
True. The same is actually true also for vacuumparallel.c. It includes
optimizer/paths.h to use min_parallel_index_scan_size.
3) The difference in naming ParallelVacuumState vs. PHVState is a bit
weird. I suggest ParallelIndexVacuumState and ParallelHeapVacuumState to
make it consistent and clear.
With the patch, since ParallelVacuumState is no longer dedicated to
parallel index vacuuming, we cannot rename them in this way. Both
parallel table scanning/vacuuming and parallel index vacuuming can use
the same ParallelVacuumState instance. The heap-specific necessary
data for parallel heap scanning and vacuuming are stored in PHVState.
4) I think it would be good to have some sort of README explaining how
the parallel heap vacuum works, i.e. how it's driven by FSM. Took me a
while to realize how the workers coordinate which blocks to scan.
+1. I will add README in the next version patch.
5) Wouldn't it be better to introduce the scan_stats (grouping some of
the fields in a separate patch)? Seems entirely independent from the
parallel part, so doing it separately would make it easier to review.
Also, maybe reference the fields through scan_stats->x, instead of
through vacrel->scan_stats->x, when there's the pointer.
Agreed.
6) Is it a good idea to move NewRelfrozenXID/... to the scan_stats?
AFAIK it's not a statistic, it's actually a parameter affecting the
decisions, right?
Right. It would be better to move them to a separate struct or somewhere.
7) I find it a bit strange that heap_vac_scan_next_block() needs to
check if it's a parallel scan, and redirect to the parallel callback. I
mean, shouldn't the caller know which callback to invoke? Why should the
serial callback care about this?
do_lazy_scan_heap(), the sole caller of heap_vac_scan_next_block(), is
called in serial vacuum and parallel vacuum cases. I wanted to make
heap_vac_scan_next_block() workable in both cases. I think it also
makes sense to have do_lazy_scan_heap() call either function depending
on whether parallel scan is enabled.
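So the dispatch would look roughly like this (paraphrased from the
patch, with the serial branch elided):
```
static bool
heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
                         bool *all_visible_according_to_vm)
{
    /* In a parallel scan, blocks are allocated via the parallel variant */
    if (ParallelHeapVacuumIsActive(vacrel))
        return heap_vac_scan_next_block_parallel(vacrel, blkno,
                                                 all_visible_according_to_vm);

    /* ... existing serial logic, including SKIP_PAGES_THRESHOLD ... */
}
```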
8) It's not clear to me why do_lazy_scan_heap() needs to "advertise" the
current block. Can you explain?
The workers' current block numbers are used to calculate the minimum
block number we've scanned up to so far. In the serial scan case, we
vacuum the FSM for a particular block range every
VACUUM_FSM_EVERY_PAGES pages. In the parallel scan case, on the other
hand, it doesn't make sense to vacuum the FSM that way because we might
not have processed some blocks in the block range. So the idea is to
calculate the minimum block number we've scanned up to so far and
vacuum the FSM for the range of consecutive already-scanned blocks.
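A sketch of that calculation (the per-worker field names are
hypothetical; the point is that the FSM can be vacuumed only up to the
smallest block number any worker has reached):
```
static BlockNumber
min_scanned_block(PHVState *phvstate, int nworkers)
{
    BlockNumber min_blkno = InvalidBlockNumber;

    for (int i = 0; i < nworkers; i++)
    {
        /* current_blkno is the block each worker advertises */
        BlockNumber blkno = phvstate->scanstates[i].current_blkno;

        if (min_blkno == InvalidBlockNumber || blkno < min_blkno)
            min_blkno = blkno;
    }

    return min_blkno;
}
```
The leader can then call FreeSpaceMapVacuumRange() over the range from
next_fsm_block_to_vacuum up to that minimum, as the patch does in
do_parallel_lazy_scan_heap().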
9) I'm a bit confused why the code checks IsParallelWorker() in so many
places. Doesn't that mean the leader can't participate in the vacuum
like a regular worker?
I used '!IsParallelWorker()' for some jobs that should be done only by
the leader process, for example checking failsafe mode, vacuuming the
FSM, etc.
10) I'm not quite sure I understand the comments at the end of
do_lazy_scan_heap - it says "do a cycle of vacuuming" but I guess that
means "index vacuuming", right?
It means both index vacuuming and heap vacuuming.
And then it says "pause without invoking
index and heap vacuuming" but isn't the whole point of this block to do
that cleanup so that the TidStore can be discarded? Maybe I just don't
understand how the work is divided between the leader and workers ...
The comment needs to be updated. But what the patch does is that when
the memory usage of the shared TidStore reaches the limit, worker
processes exit after updating the shared statistics, and then the
leader invokes (parallel) index vacuuming and parallel heap vacuuming.
Since a different number of workers could be used for parallel heap
scan, parallel index vacuuming, and parallel heap vacuuming, the
leader process waits for all workers to finish at the end of each phase.
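Paraphrased as pseudocode, the leader's control flow is roughly the
following (function names are from the patch; the loop structure is a
sketch of the description above, not the literal implementation):
```
for (;;)
{
    /* (re)launch workers for the scan phase */
    parallel_vacuum_table_scan_begin(pvs);

    /* the leader participates too; returns false if dead_items fills up */
    scan_done = do_lazy_scan_heap(vacrel);

    /* wait for all scan workers to exit */
    parallel_vacuum_table_scan_end(pvs);

    if (scan_done)
        break;

    /*
     * dead_items filled up: one round of (parallel) index vacuuming and
     * heap vacuuming, each of which may launch its own set of workers.
     */
    lazy_vacuum(vacrel);
}
```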
11) Why does GlobalVisState need to move to snapmgr.h? If I undo this
the patch still builds fine for me.
Oh, I might have missed something. I'll check if it's really necessary.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On 12/9/24 19:47, Tomas Vondra wrote:
Hi,
Thanks for working on this. I took a quick look at this today, to do
some basic review. I plan to do a bunch of testing, but that's mostly to
get a better idea of what kind of improvements to expect - the initial
results look quite nice and sensible.
I worked on the benchmarks/testing, mostly to get an idea of how
effective this vacuum parallelism is. But I noticed something a bit
weird ...
Attached is a bash script I used for the testing - it measures vacuum
with varying numbers of indexes, number of deleted tuples, WAL logging,
etc. And it does that both with master and patched builds, with
different numbers of vacuum workers.
It does expect databases "test-small-logged" and "test-small-unlogged",
initialized like this:
create [unlogged] table test_vacuum (a bigint)
with (autovacuum_enabled=off);
insert into test_vacuum select i from generate_series(1,100000000) s(i);
create index idx_0 on test_vacuum (a);
create index idx_1 on test_vacuum (a);
create index idx_2 on test_vacuum (a);
create index idx_3 on test_vacuum (a);
create index idx_4 on test_vacuum (a);
That's a ~2GB table, with a bunch of indexes. Not massive, not tiny.
I wanted to run this on larger datasets too, but for now I have the
small dataset.
One of the things the tests change is the fraction of pages with deleted
rows. The DELETE has
... WHERE mod(id,M) = 0
where "id" is a bigint column with sequential values. There are ~230
rows per page, so the M determines what fraction of pages gets a DELETE.
With M=100, each page gets ~2 deleted rows, with M=500 we get a page
with a delete, then a clean page, etc. Similar for 1000 and 5000.
Attached are results.csv with raw data, and a PDF showing the difference
between master and patched build with varying number of workers. The
columns on the right show timing relative to master (with no parallel
workers). Green means "faster" and "red" would be "slower" (but there
are no such cases). 50% means "half the time" i.e. "twice as fast".
And for M=100 and M=500 the results look quite sensible. But for higher
values of M (i.e. smaller fraction of the table DELETED) things get a
bit strange, especially for the runs with 0 indexes.
Consider for example these runs from i5 machine with M=5000:
           master  patched
 indexes      0       0      1      2      3      4      6      8
 -----------------------------------------------------------------
       0    2.58    2.75   0.17   0.19   0.16   0.24   0.20   0.19
On master it takes 2.58s, and on patched build (0 workers) it's ~2.75s,
so about the same (single run, so the difference is just noise).
But then with 1 worker it drops to 0.17s. That's ~15x faster, but we
only added one worker, so the best we could expect is 2x. Either there's
a bug that skips some work, or the master code is horribly inefficient.
The reason for the difference is this - on master, the vacuum verbose
log looks like this:
---
INFO: vacuuming "test.public.test_vacuum"
INFO: finished vacuuming "test.public.test_vacuum": index scans: 0
pages: 0 removed, 221239 remain, 221239 scanned (100.00% of total)
tuples: 10000 removed, 49590000 remain, 0 are dead but not yet removable
removable cutoff: 20088, which was 0 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (0.00% of total) had 0 dead
item identifiers removed
avg read rate: 642.429 MB/s, avg write rate: 30.650 MB/s
buffer usage: 231616 hits, 210965 reads, 10065 dirtied
WAL usage: 30058 records, 10065 full page images, 72101687 bytes
system usage: CPU: user: 2.29 s, system: 0.27 s, elapsed: 2.56 s
---
and on patched with no parallelism it's almost the same:
---
INFO: vacuuming "test.public.test_vacuum"
INFO: finished vacuuming "test.public.test_vacuum": index scans: 0
pages: 0 removed, 221239 remain, 221239 scanned (100.00% of total)
tuples: 10000 removed, 49570000 remain, 0 are dead but not yet removable
removable cutoff: 20094, which was 0 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (0.00% of total) had 0 dead
item identifiers removed
avg read rate: 602.557 MB/s, avg write rate: 28.748 MB/s
buffer usage: 231620 hits, 210961 reads, 10065 dirtied
WAL usage: 30058 records, 10065 full page images, 71578455 bytes
system usage: CPU: user: 2.42 s, system: 0.30 s, elapsed: 2.73 s
---
But then for vacuum (parallel 1) it changes like this:
---
INFO: vacuuming "test.public.test_vacuum"
INFO: launched 1 parallel vacuum worker for table processing (planned: 1)
INFO: finished vacuuming "test.public.test_vacuum": index scans: 0
pages: 0 removed, 221239 remain, 10001 scanned (4.52% of total)
tuples: 10000 removed, 49137961 remain, 0 are dead but not yet removable
removable cutoff: 20107, which was 0 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (0.00% of total) had 0 dead
item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 525.533 MB/s
buffer usage: 25175 hits, 0 reads, 10065 dirtied
WAL usage: 30058 records, 10065 full page images, 70547639 bytes
system usage: CPU: user: 0.07 s, system: 0.02 s, elapsed: 0.14 s
---
The main difference is here:
master / no parallel workers:
pages: 0 removed, 221239 remain, 221239 scanned (100.00% of total)
1 parallel worker:
pages: 0 removed, 221239 remain, 10001 scanned (4.52% of total)
Clearly, with parallel vacuum we scan only a tiny fraction of the pages,
essentially just those with deleted tuples, which is ~1/20 of pages.
That's close to the 15x speedup.
This effect is clearest without indexes, but it does affect even runs
with indexes - having to scan the indexes makes it much less pronounced,
though. However, these indexes are pretty massive (about the same size
as the table) - multiple times larger than the table. Chances are it'd
be clearer on realistic data sets.
So the question is - is this correct? And if yes, why doesn't the
regular (serial) vacuum do that?
There are some more strange things, though. For example, how come the avg
read rate is 0.000 MB/s?
avg read rate: 0.000 MB/s, avg write rate: 525.533 MB/s
It scanned 10k pages, i.e. ~80MB of data in 0.15 seconds. Surely that's
not 0.000 MB/s? I guess it's calculated from buffer misses, and all the
pages are in shared buffers (thanks to the DELETE earlier in that session).
regards
--
Tomas Vondra
On 12/13/24 00:04, Tomas Vondra wrote:
...
Attached are results.csv with raw data, and a PDF showing the difference
between master and patched build with varying number of workers. The
columns on the right show timing relative to master (with no parallel
workers). Green means "faster" and "red" would be "slower" (but there
are no such cases). 50% means "half the time" i.e. "twice as fast"....
Apologies, forgot the PDF with results, so here it is.
regards
--
Tomas Vondra
Attachments:
On 12/13/24 00:04, Tomas Vondra wrote:
...
The main difference is here:
master / no parallel workers:
pages: 0 removed, 221239 remain, 221239 scanned (100.00% of total)
1 parallel worker:
pages: 0 removed, 221239 remain, 10001 scanned (4.52% of total)
Clearly, with parallel vacuum we scan only a tiny fraction of the pages,
essentially just those with deleted tuples, which is ~1/20 of pages.
That's close to the 15x speedup.
This effect is clearest without indexes, but it does affect even runs
with indexes - having to scan the indexes makes it much less pronounced,
though. However, these indexes are pretty massive (about the same size
as the table) - multiple times larger than the table. Chances are it'd
be clearer on realistic data sets.
So the question is - is this correct? And if yes, why doesn't the
regular (serial) vacuum do that?
There are some more strange things, though. For example, how come the avg
read rate is 0.000 MB/s?
avg read rate: 0.000 MB/s, avg write rate: 525.533 MB/s
It scanned 10k pages, i.e. ~80MB of data in 0.15 seconds. Surely that's
not 0.000 MB/s? I guess it's calculated from buffer misses, and all the
pages are in shared buffers (thanks to the DELETE earlier in that session).
OK, after looking into this a bit more I think the reason is rather
simple - SKIP_PAGES_THRESHOLD.
With serial runs, we end up scanning all pages, because even with an
update every 5000 tuples, that's still only ~25 pages apart, well within
the 32-page window. So we end up skipping no pages, and scan and vacuum
everything.
But parallel runs have this skipping logic disabled, or rather the logic
that switches to sequential scans if the gap is less than 32 pages.
IMHO this raises two questions:
1) Shouldn't parallel runs use SKIP_PAGES_THRESHOLD too, i.e. switch to
sequential scans is the pages are close enough. Maybe there is a reason
for this difference? Workers can reduce the difference between random
and sequential I/O, similarly to prefetching. But that just means the
workers should use a lower threshold, e.g. as
SKIP_PAGES_THRESHOLD / nworkers
or something like that? I don't see this discussed in this thread.
2) It seems the current SKIP_PAGES_THRESHOLD is awfully high for good
storage. If I can get an order of magnitude improvement (or more than
that) by disabling the threshold, and just doing random I/O, maybe
there's time to adjust it a bit.
regards
--
Tomas Vondra
On Sat, Dec 14, 2024 at 1:24 PM Tomas Vondra <tomas@vondra.me> wrote:
On 12/13/24 00:04, Tomas Vondra wrote:
...
The main difference is here:
master / no parallel workers:
pages: 0 removed, 221239 remain, 221239 scanned (100.00% of total)
1 parallel worker:
pages: 0 removed, 221239 remain, 10001 scanned (4.52% of total)
Clearly, with parallel vacuum we scan only a tiny fraction of the pages,
essentially just those with deleted tuples, which is ~1/20 of pages.
That's close to the 15x speedup.
This effect is clearest without indexes, but it does affect even runs
with indexes - having to scan the indexes makes it much less pronounced,
though. However, these indexes are pretty massive (about the same size
as the table) - multiple times larger than the table. Chances are it'd
be clearer on realistic data sets.
So the question is - is this correct? And if yes, why doesn't the
regular (serial) vacuum do that?
There are some more strange things, though. For example, how come the avg
read rate is 0.000 MB/s?
avg read rate: 0.000 MB/s, avg write rate: 525.533 MB/s
It scanned 10k pages, i.e. ~80MB of data in 0.15 seconds. Surely that's
not 0.000 MB/s? I guess it's calculated from buffer misses, and all the
pages are in shared buffers (thanks to the DELETE earlier in that session).
OK, after looking into this a bit more I think the reason is rather
simple - SKIP_PAGES_THRESHOLD.
With serial runs, we end up scanning all pages, because even with an
update every 5000 tuples, that's still only ~25 pages apart, well within
the 32-page window. So we end up skipping no pages, and scan and vacuum
everything.
But parallel runs have this skipping logic disabled, or rather the logic
that switches to sequential scans if the gap is less than 32 pages.
IMHO this raises two questions:
1) Shouldn't parallel runs use SKIP_PAGES_THRESHOLD too, i.e. switch to
sequential scans if the pages are close enough? Maybe there is a reason
for this difference? Workers can reduce the difference between random
and sequential I/O, similarly to prefetching. But that just means the
workers should use a lower threshold, e.g. as
SKIP_PAGES_THRESHOLD / nworkers
or something like that? I don't see this discussed in this thread.
Each parallel heap scan worker allocates a chunk of blocks which is
8192 blocks at maximum, so we would need to use the
SKIP_PAGES_THRESHOLD optimization within the chunk. I agree that we
need to evaluate the differences anyway. Will do the benchmark test
and share the results.
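In other words, the skipping decision would be confined to the chunk of
blocks a worker was allocated, roughly like this (all names except
SKIP_PAGES_THRESHOLD are illustrative):
```
/*
 * Within the worker's chunk [chunk_start, chunk_end], apply the same
 * heuristic as the serial scan: if the next block that must be scanned
 * is close, read the intervening pages sequentially instead of skipping.
 */
if (next_unskippable_block <= chunk_end &&
    next_unskippable_block - blkno < SKIP_PAGES_THRESHOLD)
{
    blkno++;                            /* small gap: read sequentially */
}
else
{
    blkno = next_unskippable_block;     /* large gap: actually skip */
}
```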
2) It seems the current SKIP_PAGES_THRESHOLD is awfully high for good
storage. If I can get an order of magnitude improvement (or more than
that) by disabling the threshold, and just doing random I/O, maybe
there's time to adjust it a bit.
Yeah, you've started a thread for this so let's discuss it there.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Wed, Dec 11, 2024 at 12:07 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Dec 9, 2024 at 2:11 PM Tomas Vondra <tomas@vondra.me> wrote:
Hi,
Thanks for working on this. I took a quick look at this today, to do
some basic review. I plan to do a bunch of testing, but that's mostly to
get a better idea of what kind of improvements to expect - the initial
results look quite nice and sensible.

Thank you for reviewing the patch!
I've attached the updated patches. Here are my responses to some of the review comments:
2) I find it a bit weird vacuumlazy.c needs to include optimizer/paths.h
because it really has nothing to do with planning / paths. I realize it
needs the min_parallel_table_scan_size, but it doesn't seem right. I
guess it's a sign this bit of code (calculating parallel workers based
on log of relation size) should live in some "shared" location.

True. The same is actually true for vacuumparallel.c as well. It includes
optimizer/paths.h to use min_parallel_index_scan_size.
I left this change for now. Since vacuumparallel.c already has the same
issue, I think we can address it in a separate patch.
4) I think it would be good to have some sort of README explaining how
the parallel heap vacuum works, i.e. how it's driven by FSM. Took me a
while to realize how the workers coordinate which blocks to scan.+1. I will add README in the next version patch.
I've added the comment at the top of vacuumlazy.c to explain the
overall of how parallel vacuum works (done in 0008 patch).
5) Wouldn't it be better to introduce the scan_stats (grouping some of
the fields) in a separate patch? Seems entirely independent from the
parallel part, so doing it separately would make it easier to review.
Also, maybe reference the fields through scan_stats->x, instead of
through vacrel->scan_stats->x, when there's the pointer.

Agreed.
Done in 0001 patch.
6) Is it a good idea to move NewRelfrozenXID/... to the scan_stats?
AFAIK it's not a statistic, it's actually a parameter affecting the
decisions, right?

Right. It would be better to move them to a separate struct or similar.
I've renamed it to LVRelScanState.
8) It's not clear to me why do_lazy_scan_heap() needs to "advertise" the
current block. Can you explain?

The workers' current block numbers are used to calculate the minimum
block number where we've scanned so far. In the serial scan case, we
vacuum the FSM of a particular block range every
VACUUM_FSM_EVERY_PAGES pages. On the other hand, in the parallel scan
case, it doesn't make sense to vacuum the FSM that way because we might
not have processed some blocks in the block range. So the idea is to
calculate the minimum block number where we've scanned so far and
vacuum FSM of the range of consecutive already-scanned blocks.
I've simplified the logic to calculate the minimum scanned block. We
didn't actually need to advertise the current block.
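For reference, the calculation is essentially a minimum over the
per-worker progress, along these lines (a rough sketch of what
parallel_heap_vacuum_compute_min_scanned_blkno() does; the array layout
here is illustrative):

static BlockNumber
compute_min_scanned_blkno(PHVScanWorkerState *states, int nworkers)
{
    BlockNumber min_blkno = InvalidBlockNumber;

    for (int i = 0; i < nworkers; i++)
    {
        /* ignore workers that haven't scanned anything yet */
        if (!BlockNumberIsValid(states[i].last_blkno))
            continue;

        if (!BlockNumberIsValid(min_blkno) ||
            states[i].last_blkno < min_blkno)
            min_blkno = states[i].last_blkno;
    }

    /* all blocks up to min_blkno are known to be scanned already */
    return min_blkno;
}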
11) Why does GlobalVisState need to move to snapmgr.h? If I undo this
the patch still builds fine for me.

Oh, I might have missed something. I'll check if it's really necessary.
I've tried to undo that change, but now that we copy the contents of
GlobalVisState in vacuumlazy.c it seems we need to expose the
declaration of GlobalVisState.
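That is, since PHVShared embeds the struct by value, a plain struct
assignment like the following compiles only when the complete type is
visible (an illustrative line, matching what the patch's DSM
initialization does):

    /* requires the full definition, not just a forward declaration */
    phvstate->shared->vistest = *vacrel->vistest;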
The attached patches address all comments I got so far, including
comments from Peter[1][2]. From the previous version, I've made many
changes, not only to fix bugs but also to improve the parallel vacuum
logic itself and its comments. So some review comments about typos and
clarifying the comments are not addressed where I've removed those
comments themselves.
I'm running some benchmark tests and will share the results.
Feedback is very welcome!
Regards,
[1]: /messages/by-id/CAHut+PtnyLVkgg7BsfXy0ciVeyCBaXNRSSi0h8AVdx9cTL9_ug@mail.gmail.com
[2]: /messages/by-id/CAHut+PsA=9UOFKd52A41DSTgeUreMuuweWHmxsokqLzTMao=Rw@mail.gmail.com
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
v5-0006-radixtree.h-Add-RT_NUM_KEY-API-to-get-the-number-.patch
From c95bc0f1241c3196dbe09e3ecc617a450a0c094a Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 13 Dec 2024 16:54:46 -0800
Subject: [PATCH v5 6/8] radixtree.h: Add RT_NUM_KEY API to get the number of
keys.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
src/include/lib/radixtree.h | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h
index d5767f31c55..3e36f7577b7 100644
--- a/src/include/lib/radixtree.h
+++ b/src/include/lib/radixtree.h
@@ -126,6 +126,7 @@
* RT_ITERATE_NEXT - Return next key-value pair, if any
* RT_END_ITERATE - End iteration
* RT_MEMORY_USAGE - Get the memory as measured by space in memory context blocks
+ * RT_NUM_KEYS - Get the number of key-value pairs in radix tree
*
* Interface for Shared Memory
* ---------
@@ -197,6 +198,7 @@
#define RT_DELETE RT_MAKE_NAME(delete)
#endif
#define RT_MEMORY_USAGE RT_MAKE_NAME(memory_usage)
+#define RT_NUM_KEYS RT_MAKE_NAME(num_keys)
#define RT_DUMP_NODE RT_MAKE_NAME(dump_node)
#define RT_STATS RT_MAKE_NAME(stats)
@@ -313,6 +315,7 @@ RT_SCOPE RT_VALUE_TYPE *RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p);
RT_SCOPE void RT_END_ITERATE(RT_ITER * iter);
RT_SCOPE uint64 RT_MEMORY_USAGE(RT_RADIX_TREE * tree);
+RT_SCOPE int64 RT_NUM_KEYS(RT_RADIX_TREE * tree);
#ifdef RT_DEBUG
RT_SCOPE void RT_STATS(RT_RADIX_TREE * tree);
@@ -2844,6 +2847,15 @@ RT_MEMORY_USAGE(RT_RADIX_TREE * tree)
return total;
}
+RT_SCOPE int64
+RT_NUM_KEYS(RT_RADIX_TREE * tree)
+{
+#ifdef RT_SHMEM
+ Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
+#endif
+ return tree->ctl->num_keys;
+}
+
/*
* Perform some sanity checks on the given node.
*/
@@ -3167,6 +3179,7 @@ RT_DUMP_NODE(RT_NODE * node)
#undef RT_END_ITERATE
#undef RT_DELETE
#undef RT_MEMORY_USAGE
+#undef RT_NUM_KEYS
#undef RT_DUMP_NODE
#undef RT_STATS
--
2.43.5
v5-0005-Support-shared-itereation-on-TidStore.patch
From 7298cb4e3e43ba2355e13e258e244fa62e8d4b13 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 24 Oct 2024 17:34:57 -0700
Subject: [PATCH v5 5/8] Support shared iteration on TidStore.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
src/backend/access/common/tidstore.c | 59 ++++++++++++++++++
src/include/access/tidstore.h | 3 +
.../modules/test_tidstore/test_tidstore.c | 62 ++++++++++++++-----
3 files changed, 110 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/common/tidstore.c b/src/backend/access/common/tidstore.c
index a7179759d67..637d26012d2 100644
--- a/src/backend/access/common/tidstore.c
+++ b/src/backend/access/common/tidstore.c
@@ -483,6 +483,7 @@ TidStoreBeginIterate(TidStore *ts)
iter = palloc0(sizeof(TidStoreIter));
iter->ts = ts;
+ /* begin iteration on the radix tree */
if (TidStoreIsShared(ts))
iter->tree_iter.shared = shared_ts_begin_iterate(ts->tree.shared);
else
@@ -533,6 +534,56 @@ TidStoreEndIterate(TidStoreIter *iter)
pfree(iter);
}
+/*
+ * Prepare to iterate through a shared TidStore in shared mode. This function
+ * starts an iteration on the given TidStore that parallel workers can join.
+ *
+ * The TidStoreIter struct is created in the caller's memory context, and it
+ * will be freed in TidStoreEndIterate.
+ *
+ * The caller is responsible for locking TidStore until the iteration is
+ * finished.
+ */
+TidStoreIter *
+TidStoreBeginIterateShared(TidStore *ts)
+{
+ TidStoreIter *iter;
+
+ if (!TidStoreIsShared(ts))
+ elog(ERROR, "cannot begin shared iteration on local TidStore");
+
+ iter = palloc0(sizeof(TidStoreIter));
+ iter->ts = ts;
+
+ /* begin the shared iteration on radix tree */
+ iter->tree_iter.shared =
+ (shared_ts_iter *) shared_ts_begin_iterate_shared(ts->tree.shared);
+
+ return iter;
+}
+
+/*
+ * Attach to the shared TidStore iterator. 'iter_handle' is the dsa_pointer
+ * returned by TidStoreGetSharedIterHandle(). The returned object is allocated
+ * in backend-local memory using CurrentMemoryContext.
+ */
+TidStoreIter *
+TidStoreAttachIterateShared(TidStore *ts, dsa_pointer iter_handle)
+{
+ TidStoreIter *iter;
+
+ Assert(TidStoreIsShared(ts));
+
+ iter = palloc0(sizeof(TidStoreIter));
+ iter->ts = ts;
+
+ /* Attach to the shared iterator */
+ iter->tree_iter.shared = shared_ts_attach_iterate_shared(ts->tree.shared,
+ iter_handle);
+
+ return iter;
+}
+
/*
* Return the memory usage of TidStore.
*/
@@ -564,6 +615,14 @@ TidStoreGetHandle(TidStore *ts)
return (dsa_pointer) shared_ts_get_handle(ts->tree.shared);
}
+dsa_pointer
+TidStoreGetSharedIterHandle(TidStoreIter *iter)
+{
+ Assert(TidStoreIsShared(iter->ts));
+
+ return (dsa_pointer) shared_ts_get_iter_handle(iter->tree_iter.shared);
+}
+
/*
* Given a TidStoreIterResult returned by TidStoreIterateNext(), extract the
* offset numbers. Returns the number of offsets filled in, if <=
diff --git a/src/include/access/tidstore.h b/src/include/access/tidstore.h
index aeaf563b6a9..f20c9a92e55 100644
--- a/src/include/access/tidstore.h
+++ b/src/include/access/tidstore.h
@@ -37,6 +37,9 @@ extern void TidStoreDetach(TidStore *ts);
extern void TidStoreLockExclusive(TidStore *ts);
extern void TidStoreLockShare(TidStore *ts);
extern void TidStoreUnlock(TidStore *ts);
+extern TidStoreIter *TidStoreBeginIterateShared(TidStore *ts);
+extern TidStoreIter *TidStoreAttachIterateShared(TidStore *ts, dsa_pointer iter_handle);
+extern dsa_pointer TidStoreGetSharedIterHandle(TidStoreIter *iter);
extern void TidStoreDestroy(TidStore *ts);
extern void TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumber *offsets,
int num_offsets);
diff --git a/src/test/modules/test_tidstore/test_tidstore.c b/src/test/modules/test_tidstore/test_tidstore.c
index 25077caf8f1..2dc649124bb 100644
--- a/src/test/modules/test_tidstore/test_tidstore.c
+++ b/src/test/modules/test_tidstore/test_tidstore.c
@@ -33,6 +33,7 @@ PG_FUNCTION_INFO_V1(test_is_full);
PG_FUNCTION_INFO_V1(test_destroy);
static TidStore *tidstore = NULL;
+static bool tidstore_is_shared;
static size_t tidstore_empty_size;
/* array for verification of some tests */
@@ -107,6 +108,7 @@ test_create(PG_FUNCTION_ARGS)
LWLockRegisterTranche(tranche_id, "test_tidstore");
tidstore = TidStoreCreateShared(tidstore_max_size, tranche_id);
+ tidstore_is_shared = true;
/*
* Remain attached until end of backend or explicitly detached so that
@@ -115,8 +117,11 @@ test_create(PG_FUNCTION_ARGS)
dsa_pin_mapping(TidStoreGetDSA(tidstore));
}
else
+ {
/* VACUUM uses insert only, so we test the other option. */
tidstore = TidStoreCreateLocal(tidstore_max_size, false);
+ tidstore_is_shared = false;
+ }
tidstore_empty_size = TidStoreMemoryUsage(tidstore);
@@ -212,14 +217,42 @@ do_set_block_offsets(PG_FUNCTION_ARGS)
PG_RETURN_INT64(blkno);
}
+/* Collect TIDs stored in the tidstore, in order */
+static void
+check_iteration(TidStore *tidstore, int *num_iter_tids, bool shared_iter)
+{
+ TidStoreIter *iter;
+ TidStoreIterResult *iter_result;
+
+ TidStoreLockShare(tidstore);
+
+ if (shared_iter)
+ iter = TidStoreBeginIterateShared(tidstore);
+ else
+ iter = TidStoreBeginIterate(tidstore);
+
+ while ((iter_result = TidStoreIterateNext(iter)) != NULL)
+ {
+ OffsetNumber offsets[MaxOffsetNumber];
+ int num_offsets;
+
+ num_offsets = TidStoreGetBlockOffsets(iter_result, offsets, lengthof(offsets));
+ Assert(num_offsets <= lengthof(offsets));
+ for (int i = 0; i < num_offsets; i++)
+ ItemPointerSet(&(items.iter_tids[(*num_iter_tids)++]), iter_result->blkno,
+ offsets[i]);
+ }
+
+ TidStoreEndIterate(iter);
+ TidStoreUnlock(tidstore);
+}
+
/*
* Verify TIDs in store against the array.
*/
Datum
check_set_block_offsets(PG_FUNCTION_ARGS)
{
- TidStoreIter *iter;
- TidStoreIterResult *iter_result;
int num_iter_tids = 0;
int num_lookup_tids = 0;
BlockNumber prevblkno = 0;
@@ -261,22 +294,23 @@ check_set_block_offsets(PG_FUNCTION_ARGS)
}
/* Collect TIDs stored in the tidstore, in order */
+ check_iteration(tidstore, &num_iter_tids, false);
- TidStoreLockShare(tidstore);
- iter = TidStoreBeginIterate(tidstore);
- while ((iter_result = TidStoreIterateNext(iter)) != NULL)
+ /* If the tidstore is shared, check the shared-iteration as well */
+ if (tidstore_is_shared)
{
- OffsetNumber offsets[MaxOffsetNumber];
- int num_offsets;
+ int num_iter_tids_shared = 0;
- num_offsets = TidStoreGetBlockOffsets(iter_result, offsets, lengthof(offsets));
- Assert(num_offsets <= lengthof(offsets));
- for (int i = 0; i < num_offsets; i++)
- ItemPointerSet(&(items.iter_tids[num_iter_tids++]), iter_result->blkno,
- offsets[i]);
+ check_iteration(tidstore, &num_iter_tids_shared, true);
+
+ /*
+ * Verify that normal iteration and shared iteration returned the
+ * same number of TIDs.
+ */
+ if (num_lookup_tids != num_iter_tids_shared)
+ elog(ERROR, "shared iteration should have %d TIDs, have %d",
+ num_lookup_tids, num_iter_tids_shared);
}
- TidStoreEndIterate(iter);
- TidStoreUnlock(tidstore);
/*
* Sort verification and lookup arrays and test that all arrays are the
--
2.43.5
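To make the intended flow of the new shared-iteration API concrete, here
is a minimal usage sketch (not part of the patch; locking and error
handling reduced to the essentials, and 'ts' is assumed to be a TidStore
created with TidStoreCreateShared() and attached by all participants):

/* Leader: start the shared iteration and publish its handle */
TidStoreIter *iter;
dsa_pointer handle;

TidStoreLockShare(ts);
iter = TidStoreBeginIterateShared(ts);
handle = TidStoreGetSharedIterHandle(iter);
/* ... pass 'handle' to workers through shared memory (DSM) ... */

/*
 * Worker: attach to the same iteration; each TidStoreIterateNext()
 * call hands out a distinct block to whichever backend asks next.
 */
TidStoreIter *witer = TidStoreAttachIterateShared(ts, handle);
TidStoreIterResult *result;

while ((result = TidStoreIterateNext(witer)) != NULL)
{
    /* process result->blkno and its dead item offsets */
}
TidStoreEndIterate(witer);
/* the leader unlocks the TidStore once the iteration is finished */
TidStoreUnlock(ts);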
v5-0008-Support-parallel-heap-vacuum-during-lazy-vacuum.patch
From 8a0e97565e6bcc1b952de0d2b3034a7dce35a62d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 24 Oct 2024 17:37:45 -0700
Subject: [PATCH v5 8/8] Support parallel heap vacuum during lazy vacuum.
This commit further extends parallel vacuum to perform the heap vacuum
phase with parallel workers. It leverages the shared TidStore iteration.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
doc/src/sgml/ref/vacuum.sgml | 17 +-
src/backend/access/heap/vacuumlazy.c | 280 +++++++++++++++++++-------
src/backend/commands/vacuumparallel.c | 10 +-
src/include/commands/vacuum.h | 2 +-
4 files changed, 223 insertions(+), 86 deletions(-)
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index aae0bbcd577..104157b5a56 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -278,20 +278,21 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
<term><literal>PARALLEL</literal></term>
<listitem>
<para>
- Perform scanning heap, index vacuum, and index cleanup phases of
- <command>VACUUM</command> in parallel using
+ Perform scanning heap, vacuuming heap, index vacuum, and index cleanup
+ phases of <command>VACUUM</command> in parallel using
<replaceable class="parameter">integer</replaceable> background workers
(for the details of each vacuum phase, please refer to
<xref linkend="vacuum-phases"/>).
</para>
<para>
For heap tables, the number of workers used to perform the scanning
- heap is determined based on the size of table. A table can participate in
- parallel scanning heap if and only if the size of the table is more than
- <xref linkend="guc-min-parallel-table-scan-size"/>. During scanning heap,
- the heap table's blocks will be divided into ranges and shared among the
- cooperating processes. Each worker process will complete the scanning of
- its given range of blocks before requesting an additional range of blocks.
+   heap and vacuuming heap is determined based on the size of the table. A table
+ can participate in parallel scanning heap if and only if the size of the
+ table is more than <xref linkend="guc-min-parallel-table-scan-size"/>.
+ During scanning heap, the heap table's blocks will be divided into ranges
+ and shared among the cooperating processes. Each worker process will
+ complete the scanning of its given range of blocks before requesting an
+ additional range of blocks.
</para>
<para>
The number of workers used to perform parallel index vacuum and index
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2e70bc68d2c..67516391d89 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -20,6 +20,41 @@
* that there only needs to be one call to lazy_vacuum, after the initial pass
* completes.
*
+ * Parallel Vacuum
+ * ----------------
+ * Lazy vacuum on heap tables supports parallel processing for three vacuum
+ * phases: scanning heap, vacuuming indexes, and vacuuming heap. Before the
+ * scanning heap phase, we initialize parallel vacuum state, ParallelVacuumState,
+ * and allocate the TID store in a DSA area if we can use parallel mode for any
+ * of these three phases.
+ *
+ * We may require a different number of parallel vacuum workers for each phase
+ * depending on various factors such as table size, number of indexes, and the
+ * number of pages having dead tuples. Parallel workers are launched at the
+ * beginning of each phase and exit at the end of each phase.
+ *
+ * For scanning the heap table with parallel workers, we utilize the
+ * table_block_parallelscan_xxx facility, which splits the table into several
+ * chunks that parallel workers allocate and scan. If the dead_items TID store
+ * is close to overrunning the available space during the parallel heap scan,
+ * parallel workers exit and the leader process gathers the scan results. Then,
+ * it performs a round of index and heap vacuuming that can also use
+ * parallelism. After vacuuming both indexes and heap table, the leader process
+ * vacuums the FSM to make newly-freed space visible. Then, it relaunches
+ * parallel workers to resume the scanning heap phase. In order to be able to
+ * resume the parallel heap scan from the previous state, the workers' parallel
+ * scan descriptions are stored in the shared memory (DSM) space shared among
+ * parallel workers. If the leader launches fewer workers when resuming the
+ * parallel heap scan, some blocks remain unscanned. The leader process deals
+ * with such blocks serially at the end of the scanning heap phase (see
+ * parallel_heap_complete_unfinished_scan()).
+ *
+ * At the beginning of the vacuuming heap phase, the leader launches parallel
+ * workers and initiates the shared iteration on the shared TID store. At the
+ * end of the phase, the leader process waits for all workers to finish and
+ * gathers the workers' results.
+ *
+ *
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
@@ -172,6 +207,7 @@ typedef struct LVRelScanState
BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+ BlockNumber vacuumed_pages; /* # pages vacuumed in one second-pass round */
/* Counters that follow are only for scanned_pages */
int64 tuples_deleted; /* # deleted from table */
@@ -205,11 +241,15 @@ typedef struct PHVShared
* The final value is OR of worker's skippedallvis.
*/
bool skippedallvis;
+ bool do_index_vacuuming;
/* VACUUM operation's cutoffs for freezing and pruning */
struct VacuumCutoffs cutoffs;
GlobalVisState vistest;
+ dsa_pointer shared_iter_handle;
+ bool do_heap_vacuum;
+
/* per-worker scan stats for parallel heap vacuum scan */
LVRelScanState worker_scan_state[FLEXIBLE_ARRAY_MEMBER];
} PHVShared;
@@ -257,6 +297,14 @@ typedef struct PHVState
/* Assigned per-worker scan state */
PHVScanWorkerState *myscanstate;
+ /*
+ * The number of parallel workers to launch for parallel heap scanning.
+ * Note that the number of parallel workers for parallel heap vacuuming
+ * could vary but is never more than num_heapscan_workers. So this also works as
+ * the maximum number of workers for parallel heap scanning and vacuuming.
+ */
+ int num_heapscan_workers;
+
/*
* All blocks up to this value has been scanned, i.e. the minimum of all
* PHVScanWorkerState->last_blkno. This field is updated by
@@ -374,6 +422,7 @@ static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
static void lazy_vacuum(LVRelState *vacrel);
static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
static void lazy_vacuum_heap_rel(LVRelState *vacrel);
+static void do_lazy_vacuum_heap_rel(LVRelState *vacrel, TidStoreIter *iter);
static void lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
Buffer buffer, OffsetNumber *deadoffsets,
int num_offsets, Buffer vmbuffer);
@@ -404,6 +453,7 @@ static void do_parallel_lazy_scan_heap(LVRelState *vacrel);
static void parallel_heap_vacuum_compute_min_scanned_blkno(LVRelState *vacrel);
static void parallel_heap_vacuum_gather_scan_results(LVRelState *vacrel);
static void parallel_heap_complete_unfinished_scan(LVRelState *vacrel);
+static int compute_heap_vacuum_parallel_workers(Relation rel, BlockNumber nblocks);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -551,6 +601,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
scan_state->lpdead_item_pages = 0;
scan_state->missed_dead_pages = 0;
scan_state->nonempty_pages = 0;
+ scan_state->vacuumed_pages = 0;
scan_state->tuples_deleted = 0;
scan_state->tuples_frozen = 0;
scan_state->lpdead_items = 0;
@@ -2456,46 +2507,14 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
return allindexes;
}
-/*
- * lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
- *
- * This routine marks LP_DEAD items in vacrel->dead_items as LP_UNUSED. Pages
- * that never had lazy_scan_prune record LP_DEAD items are not visited at all.
- *
- * We may also be able to truncate the line pointer array of the heap pages we
- * visit. If there is a contiguous group of LP_UNUSED items at the end of the
- * array, it can be reclaimed as free space. These LP_UNUSED items usually
- * start out as LP_DEAD items recorded by lazy_scan_prune (we set items from
- * each page to LP_UNUSED, and then consider if it's possible to truncate the
- * page's line pointer array).
- *
- * Note: the reason for doing this as a second pass is we cannot remove the
- * tuples until we've removed their index entries, and we want to process
- * index entry removal in batches as large as possible.
- */
static void
-lazy_vacuum_heap_rel(LVRelState *vacrel)
+do_lazy_vacuum_heap_rel(LVRelState *vacrel, TidStoreIter *iter)
{
- BlockNumber vacuumed_pages = 0;
Buffer vmbuffer = InvalidBuffer;
- LVSavedErrInfo saved_err_info;
- TidStoreIter *iter;
- TidStoreIterResult *iter_result;
-
- Assert(vacrel->do_index_vacuuming);
- Assert(vacrel->do_index_cleanup);
- Assert(vacrel->num_index_scans > 0);
-
- /* Report that we are now vacuuming the heap */
- pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
- /* Update error traceback information */
- update_vacuum_error_info(vacrel, &saved_err_info,
- VACUUM_ERRCB_PHASE_VACUUM_HEAP,
- InvalidBlockNumber, InvalidOffsetNumber);
+ /* LVSavedErrInfo saved_err_info; */
+ TidStoreIterResult *iter_result;
- iter = TidStoreBeginIterate(vacrel->dead_items);
while ((iter_result = TidStoreIterateNext(iter)) != NULL)
{
BlockNumber blkno;
@@ -2533,26 +2552,106 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
UnlockReleaseBuffer(buf);
RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
- vacuumed_pages++;
+ vacrel->scan_state->vacuumed_pages++;
}
- TidStoreEndIterate(iter);
vacrel->blkno = InvalidBlockNumber;
if (BufferIsValid(vmbuffer))
ReleaseBuffer(vmbuffer);
+}
+
+/*
+ * lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
+ *
+ * This routine marks LP_DEAD items in vacrel->dead_items as LP_UNUSED. Pages
+ * that never had lazy_scan_prune record LP_DEAD items are not visited at all.
+ *
+ * We may also be able to truncate the line pointer array of the heap pages we
+ * visit. If there is a contiguous group of LP_UNUSED items at the end of the
+ * array, it can be reclaimed as free space. These LP_UNUSED items usually
+ * start out as LP_DEAD items recorded by lazy_scan_prune (we set items from
+ * each page to LP_UNUSED, and then consider if it's possible to truncate the
+ * page's line pointer array).
+ *
+ * Note: the reason for doing this as a second pass is we cannot remove the
+ * tuples until we've removed their index entries, and we want to process
+ * index entry removal in batches as large as possible.
+ */
+static void
+lazy_vacuum_heap_rel(LVRelState *vacrel)
+{
+ LVSavedErrInfo saved_err_info;
+ TidStoreIter *iter;
+ int nworkers = 0;
+
+ Assert(vacrel->do_index_vacuuming);
+ Assert(vacrel->do_index_cleanup);
+ Assert(vacrel->num_index_scans > 0);
+
+ /* Report that we are now vacuuming the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
+
+ /* Update error traceback information */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ vacrel->scan_state->vacuumed_pages = 0;
+
+ /* Compute parallel workers required to scan blocks to vacuum */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ nworkers = compute_heap_vacuum_parallel_workers(vacrel->rel,
+ TidStoreNumBlocks(vacrel->dead_items));
+
+ if (nworkers > 0)
+ {
+ PHVState *phvstate = vacrel->phvstate;
+
+ iter = TidStoreBeginIterateShared(vacrel->dead_items);
+
+ /* launch workers */
+ phvstate->shared->do_heap_vacuum = true;
+ phvstate->shared->shared_iter_handle = TidStoreGetSharedIterHandle(iter);
+ phvstate->nworkers_launched = parallel_vacuum_table_scan_begin(vacrel->pvs,
+ nworkers);
+ }
+ else
+ iter = TidStoreBeginIterate(vacrel->dead_items);
+
+ /* do the real work */
+ do_lazy_vacuum_heap_rel(vacrel, iter);
+
+ if (ParallelHeapVacuumIsActive(vacrel) && nworkers > 0)
+ {
+ PHVState *phvstate = vacrel->phvstate;
+
+ parallel_vacuum_table_scan_end(vacrel->pvs);
+
+ /* Gather the heap vacuum statistics that workers collected */
+ for (int i = 0; i < phvstate->nworkers_launched; i++)
+ {
+ LVRelScanState *ss = &(phvstate->shared->worker_scan_state[i]);
+
+ vacrel->scan_state->vacuumed_pages += ss->vacuumed_pages;
+ }
+ }
+
+ TidStoreEndIterate(iter);
+
/*
* We set all LP_DEAD items from the first heap pass to LP_UNUSED during
* the second heap pass. No more, no less.
*/
Assert(vacrel->num_index_scans > 1 ||
(vacrel->dead_items_info->num_items == vacrel->scan_state->lpdead_items &&
- vacuumed_pages == vacrel->scan_state->lpdead_item_pages));
+ vacrel->scan_state->vacuumed_pages == vacrel->scan_state->lpdead_item_pages));
ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
vacrel->relname, (long long) vacrel->dead_items_info->num_items,
- vacuumed_pages)));
+ vacrel->scan_state->vacuumed_pages)));
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3261,6 +3360,11 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
{
vacrel->dead_items = parallel_vacuum_get_dead_items(vacrel->pvs,
&vacrel->dead_items_info);
+
+ if (ParallelHeapVacuumIsActive(vacrel))
+ vacrel->phvstate->num_heapscan_workers =
+ parallel_vacuum_get_nworkers_table(vacrel->pvs);
+
return;
}
}
@@ -3508,37 +3612,41 @@ update_relstats_all_indexes(LVRelState *vacrel)
*
* The calculation logic is borrowed from compute_parallel_worker().
*/
-int
-heap_parallel_vacuum_compute_workers(Relation rel, int nrequested)
+static int
+compute_heap_vacuum_parallel_workers(Relation rel, BlockNumber nblocks)
{
int parallel_workers = 0;
int heap_parallel_threshold;
int heap_pages;
- if (nrequested == 0)
+ /*
+ * Select the number of workers based on the log of the size of the
+ * relation. Note that the upper limit of the min_parallel_table_scan_size
+ * GUC is chosen to prevent overflow here.
+ */
+ heap_parallel_threshold = Max(min_parallel_table_scan_size, 1);
+ heap_pages = BlockNumberIsValid(nblocks) ?
+ nblocks : RelationGetNumberOfBlocks(rel);
+ while (heap_pages >= (BlockNumber) (heap_parallel_threshold * 3))
{
- /*
- * Select the number of workers based on the log of the size of the
- * relation. Note that the upper limit of the
- * min_parallel_table_scan_size GUC is chosen to prevent overflow
- * here.
- */
- heap_parallel_threshold = Max(min_parallel_table_scan_size, 1);
- heap_pages = RelationGetNumberOfBlocks(rel);
- while (heap_pages >= (BlockNumber) (heap_parallel_threshold * 3))
- {
- parallel_workers++;
- heap_parallel_threshold *= 3;
- if (heap_parallel_threshold > INT_MAX / 3)
- break;
- }
+ parallel_workers++;
+ heap_parallel_threshold *= 3;
+ if (heap_parallel_threshold > INT_MAX / 3)
+ break;
}
- else
- parallel_workers = nrequested;
return parallel_workers;
}
+int
+heap_parallel_vacuum_compute_workers(Relation rel, int nrequested)
+{
+ if (nrequested == 0)
+ return compute_heap_vacuum_parallel_workers(rel, InvalidBlockNumber);
+ else
+ return nrequested;
+}
+
/* Estimate shared memory sizes required for parallel heap vacuum */
static inline void
heap_parallel_estimate_shared_memory_size(Relation rel, int nworkers, Size *pscan_len,
@@ -3620,6 +3728,7 @@ heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
shared->NewRelfrozenXid = vacrel->scan_state->NewRelfrozenXid;
shared->NewRelminMxid = vacrel->scan_state->NewRelminMxid;
shared->skippedallvis = vacrel->scan_state->skippedallvis;
+ shared->do_index_vacuuming = vacrel->do_index_vacuuming;
/*
* XXX: we copy the contents of vistest to the shared area, but in order
@@ -3672,7 +3781,6 @@ heap_parallel_vacuum_worker(Relation rel, ParallelVacuumState *pvs,
PHVScanWorkerState *scanstate;
LVRelScanState *scan_state;
ErrorContextCallback errcallback;
- bool scan_done;
phvstate = palloc(sizeof(PHVState));
@@ -3709,10 +3817,11 @@ heap_parallel_vacuum_worker(Relation rel, ParallelVacuumState *pvs,
/* initialize per-worker relation statistics */
MemSet(scan_state, 0, sizeof(LVRelScanState));
- /* Set fields necessary for heap scan */
+ /* Set fields necessary for heap scan and vacuum */
vacrel.scan_state->NewRelfrozenXid = shared->NewRelfrozenXid;
vacrel.scan_state->NewRelminMxid = shared->NewRelminMxid;
vacrel.scan_state->skippedallvis = shared->skippedallvis;
+ vacrel.do_index_vacuuming = shared->do_index_vacuuming;
/* Initialize the per-worker scan state if not yet */
if (!phvstate->myscanstate->initialized)
@@ -3734,25 +3843,44 @@ heap_parallel_vacuum_worker(Relation rel, ParallelVacuumState *pvs,
vacrel.relnamespace = get_database_name(RelationGetNamespace(rel));
vacrel.relname = pstrdup(RelationGetRelationName(rel));
vacrel.indname = NULL;
- vacrel.phase = VACUUM_ERRCB_PHASE_SCAN_HEAP;
errcallback.callback = vacuum_error_callback;
errcallback.arg = &vacrel;
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
- scan_done = do_lazy_scan_heap(&vacrel);
+ if (shared->do_heap_vacuum)
+ {
+ TidStoreIter *iter;
+
+ iter = TidStoreAttachIterateShared(vacrel.dead_items, shared->shared_iter_handle);
+
+ /* Join parallel heap vacuum */
+ vacrel.phase = VACUUM_ERRCB_PHASE_VACUUM_HEAP;
+ do_lazy_vacuum_heap_rel(&vacrel, iter);
+
+ TidStoreEndIterate(iter);
+ }
+ else
+ {
+ bool scan_done;
+
+ /* Join parallel heap scan */
+ vacrel.phase = VACUUM_ERRCB_PHASE_SCAN_HEAP;
+ scan_done = do_lazy_scan_heap(&vacrel);
+
+ /*
+ * If the leader or a worker finishes the heap scan because the
+ * dead_items TID store is close to the limit, it might have some
+ * allocated blocks in its scan state. Since this scan state might not
+ * be used in the next heap scan, we remember that it might have some
+ * unconsumed blocks so that the leader can complete the scans after
+ * the heap scan phase finishes.
+ */
+ phvstate->myscanstate->maybe_have_blocks = !scan_done;
+ }
/* Pop the error context stack */
error_context_stack = errcallback.previous;
-
- /*
- * If the leader or a worker finishes the heap scan because dead_items
- * TIDs is close to the limit, it might have some allocated blocks in its
- * scan state. Since this scan state might not be used in the next heap
- * scan, we remember that it might have some unconsumed blocks so that the
- * leader complete the scans after the heap scan phase finishes.
- */
- phvstate->myscanstate->maybe_have_blocks = !scan_done;
}
/*
@@ -3874,7 +4002,10 @@ do_parallel_lazy_scan_heap(LVRelState *vacrel)
Assert(!IsParallelWorker());
/* launcher workers */
- vacrel->phvstate->nworkers_launched = parallel_vacuum_table_scan_begin(vacrel->pvs);
+ vacrel->phvstate->shared->do_heap_vacuum = false;
+ vacrel->phvstate->nworkers_launched =
+ parallel_vacuum_table_scan_begin(vacrel->pvs,
+ vacrel->phvstate->num_heapscan_workers);
/* initialize parallel scan description to join as a worker */
scanstate = palloc0(sizeof(PHVScanWorkerState));
@@ -3933,7 +4064,8 @@ do_parallel_lazy_scan_heap(LVRelState *vacrel)
/* Re-launch workers to restart parallel heap scan */
vacrel->phvstate->nworkers_launched =
- parallel_vacuum_table_scan_begin(vacrel->pvs);
+ parallel_vacuum_table_scan_begin(vacrel->pvs,
+ vacrel->phvstate->num_heapscan_workers);
}
/*
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 3001be84ddf..fd897ddadf3 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -1054,8 +1054,10 @@ parallel_vacuum_index_is_parallel_safe(Relation indrel, int num_index_scans,
* table vacuum.
*/
int
-parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs)
+parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs, int nworkers_request)
{
+ int nworkers;
+
Assert(!IsParallelWorker());
if (pvs->shared->nworkers_for_table == 0)
@@ -1069,11 +1071,13 @@ parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs)
if (pvs->num_table_scans > 0)
ReinitializeParallelDSM(pvs->pcxt);
+ nworkers = Min(nworkers_request, pvs->shared->nworkers_for_table);
+
/*
* The number of workers might vary between table vacuum and index
* processing
*/
- ReinitializeParallelWorkers(pvs->pcxt, pvs->shared->nworkers_for_table);
+ ReinitializeParallelWorkers(pvs->pcxt, nworkers);
LaunchParallelWorkers(pvs->pcxt);
if (pvs->pcxt->nworkers_launched > 0)
@@ -1097,7 +1101,7 @@ parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs)
(errmsg(ngettext("launched %d parallel vacuum worker for table processing (planned: %d)",
"launched %d parallel vacuum workers for table processing (planned: %d)",
pvs->pcxt->nworkers_launched),
- pvs->pcxt->nworkers_launched, pvs->shared->nworkers_for_table)));
+ pvs->pcxt->nworkers_launched, nworkers)));
return pvs->pcxt->nworkers_launched;
}
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index b70e50133fa..ab6b6cde759 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -371,7 +371,7 @@ extern void parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs,
extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
long num_table_tuples,
bool estimated_count);
-extern int parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs);
+extern int parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs, int nworkers_request);
extern void parallel_vacuum_table_scan_end(ParallelVacuumState *pvs);
extern int parallel_vacuum_get_nworkers_table(ParallelVacuumState *pvs);
extern int parallel_vacuum_get_nworkers_index(ParallelVacuumState *pvs);
--
2.43.5
v5-0004-raidxtree.h-support-shared-iteration.patch
From 8d4f5c162f19b080f117294c9f089a95cb731a99 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 24 Oct 2024 17:29:51 -0700
Subject: [PATCH v5 4/8] radixtree.h: support shared iteration.
This commit supports a shared iteration operation on a radix tree with
multiple processes. The radix tree must be in shared mode to start a
shared iteration. Parallel workers can attach to the shared iteration
using the iterator handle given by the leader process. As with a
normal iteration, it's guaranteed that the shared iteration returns
key-values in ascending order.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
---
src/include/lib/radixtree.h | 227 +++++++++++++++---
.../modules/test_radixtree/test_radixtree.c | 128 ++++++----
2 files changed, 281 insertions(+), 74 deletions(-)
diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h
index 1301f3fee44..d5767f31c55 100644
--- a/src/include/lib/radixtree.h
+++ b/src/include/lib/radixtree.h
@@ -136,6 +136,9 @@
* RT_LOCK_SHARE - Lock the radix tree in share mode
* RT_UNLOCK - Unlock the radix tree
* RT_GET_HANDLE - Return the handle of the radix tree
+ * RT_BEGIN_ITERATE_SHARED - Begin iterating in shared mode.
+ * RT_ATTACH_ITERATE_SHARED - Attach to the shared iterator.
+ * RT_GET_ITER_HANDLE - Get the handle of the shared iterator.
*
* Optional Interface
* ---------
@@ -179,6 +182,9 @@
#define RT_ATTACH RT_MAKE_NAME(attach)
#define RT_DETACH RT_MAKE_NAME(detach)
#define RT_GET_HANDLE RT_MAKE_NAME(get_handle)
+#define RT_BEGIN_ITERATE_SHARED RT_MAKE_NAME(begin_iterate_shared)
+#define RT_ATTACH_ITERATE_SHARED RT_MAKE_NAME(attach_iterate_shared)
+#define RT_GET_ITER_HANDLE RT_MAKE_NAME(get_iter_handle)
#define RT_LOCK_EXCLUSIVE RT_MAKE_NAME(lock_exclusive)
#define RT_LOCK_SHARE RT_MAKE_NAME(lock_share)
#define RT_UNLOCK RT_MAKE_NAME(unlock)
@@ -238,15 +244,19 @@
#define RT_SHRINK_NODE_16 RT_MAKE_NAME(shrink_child_16)
#define RT_SHRINK_NODE_48 RT_MAKE_NAME(shrink_child_48)
#define RT_SHRINK_NODE_256 RT_MAKE_NAME(shrink_child_256)
+#define RT_INITIALIZE_ITER RT_MAKE_NAME(initialize_iter)
#define RT_NODE_ITERATE_NEXT RT_MAKE_NAME(node_iterate_next)
#define RT_VERIFY_NODE RT_MAKE_NAME(verify_node)
/* type declarations */
#define RT_RADIX_TREE RT_MAKE_NAME(radix_tree)
#define RT_RADIX_TREE_CONTROL RT_MAKE_NAME(radix_tree_control)
+#define RT_ITER_CONTROL RT_MAKE_NAME(iter_control)
#define RT_ITER RT_MAKE_NAME(iter)
#ifdef RT_SHMEM
#define RT_HANDLE RT_MAKE_NAME(handle)
+#define RT_ITER_CONTROL_SHARED RT_MAKE_NAME(iter_control_shared)
+#define RT_ITER_HANDLE RT_MAKE_NAME(iter_handle)
#endif
#define RT_NODE RT_MAKE_NAME(node)
#define RT_CHILD_PTR RT_MAKE_NAME(child_ptr)
@@ -272,6 +282,7 @@ typedef struct RT_ITER RT_ITER;
#ifdef RT_SHMEM
typedef dsa_pointer RT_HANDLE;
+typedef dsa_pointer RT_ITER_HANDLE;
#endif
#ifdef RT_SHMEM
@@ -282,6 +293,9 @@ RT_SCOPE RT_HANDLE RT_GET_HANDLE(RT_RADIX_TREE * tree);
RT_SCOPE void RT_LOCK_EXCLUSIVE(RT_RADIX_TREE * tree);
RT_SCOPE void RT_LOCK_SHARE(RT_RADIX_TREE * tree);
RT_SCOPE void RT_UNLOCK(RT_RADIX_TREE * tree);
+RT_SCOPE RT_ITER *RT_BEGIN_ITERATE_SHARED(RT_RADIX_TREE * tree);
+RT_SCOPE RT_ITER_HANDLE RT_GET_ITER_HANDLE(RT_ITER * iter);
+RT_SCOPE RT_ITER *RT_ATTACH_ITERATE_SHARED(RT_RADIX_TREE * tree, RT_ITER_HANDLE handle);
#else
RT_SCOPE RT_RADIX_TREE *RT_CREATE(MemoryContext ctx);
#endif
@@ -689,6 +703,7 @@ typedef struct RT_RADIX_TREE_CONTROL
RT_HANDLE handle;
uint32 magic;
LWLock lock;
+ int tranche_id;
#endif
RT_PTR_ALLOC root;
@@ -742,11 +757,9 @@ typedef struct RT_NODE_ITER
int idx;
} RT_NODE_ITER;
-/* state for iterating over the whole radix tree */
-struct RT_ITER
+/* Contains the iteration state data */
+typedef struct RT_ITER_CONTROL
{
- RT_RADIX_TREE *tree;
-
/*
* A stack to track iteration for each level. Level 0 is the lowest (or
* leaf) level
@@ -757,8 +770,36 @@ struct RT_ITER
/* The key constructed during iteration */
uint64 key;
-};
+} RT_ITER_CONTROL;
+
+#ifdef RT_SHMEM
+/* Contains the shared iteration state data */
+typedef struct RT_ITER_CONTROL_SHARED
+{
+ /* Actual shared iteration state data */
+ RT_ITER_CONTROL common;
+
+ /* protect the control data */
+ LWLock lock;
+
+ RT_ITER_HANDLE handle;
+ pg_atomic_uint32 refcnt;
+} RT_ITER_CONTROL_SHARED;
+#endif
+
+/* state for iterating over the whole radix tree */
+struct RT_ITER
+{
+ RT_RADIX_TREE *tree;
+ /* pointing to either local memory or DSA */
+ RT_ITER_CONTROL *ctl;
+
+#ifdef RT_SHMEM
+ /* True if the iterator is for shared iteration */
+ bool shared;
+#endif
+};
/* verification (available only in assert-enabled builds) */
static void RT_VERIFY_NODE(RT_NODE * node);
@@ -1850,6 +1891,7 @@ RT_CREATE(MemoryContext ctx)
tree->ctl = (RT_RADIX_TREE_CONTROL *) dsa_get_address(dsa, dp);
tree->ctl->handle = dp;
tree->ctl->magic = RT_RADIX_TREE_MAGIC;
+ tree->ctl->tranche_id = tranche_id;
LWLockInitialize(&tree->ctl->lock, tranche_id);
#else
tree->ctl = (RT_RADIX_TREE_CONTROL *) palloc0(sizeof(RT_RADIX_TREE_CONTROL));
@@ -1902,6 +1944,9 @@ RT_ATTACH(dsa_area *dsa, RT_HANDLE handle)
dsa_pointer control;
tree = (RT_RADIX_TREE *) palloc0(sizeof(RT_RADIX_TREE));
+ tree->iter_context = AllocSetContextCreate(CurrentMemoryContext,
+ RT_STR(RT_PREFIX) "_radix_tree iter context",
+ ALLOCSET_SMALL_SIZES);
/* Find the control object in shared memory */
control = handle;
@@ -2074,35 +2119,86 @@ RT_FREE(RT_RADIX_TREE * tree)
/***************** ITERATION *****************/
+/* Common routine to initialize the given iterator */
+static void
+RT_INITIALIZE_ITER(RT_RADIX_TREE * tree, RT_ITER * iter)
+{
+ RT_CHILD_PTR root;
+
+ iter->tree = tree;
+
+ Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
+ root.alloc = iter->tree->ctl->root;
+ RT_PTR_SET_LOCAL(tree, &root);
+
+ iter->ctl->top_level = iter->tree->ctl->start_shift / RT_SPAN;
+
+ /* Set the root to start */
+ iter->ctl->cur_level = iter->ctl->top_level;
+ iter->ctl->node_iters[iter->ctl->cur_level].node = root;
+ iter->ctl->node_iters[iter->ctl->cur_level].idx = 0;
+}
+
/*
* Create and return the iterator for the given radix tree.
*
- * Taking a lock in shared mode during the iteration is the caller's
- * responsibility.
+ * Taking a lock on a radix tree in shared mode during the iteration is the
+ * caller's responsibility.
*/
RT_SCOPE RT_ITER *
RT_BEGIN_ITERATE(RT_RADIX_TREE * tree)
{
RT_ITER *iter;
- RT_CHILD_PTR root;
iter = (RT_ITER *) MemoryContextAllocZero(tree->iter_context,
sizeof(RT_ITER));
- iter->tree = tree;
+ iter->ctl = (RT_ITER_CONTROL *) MemoryContextAllocZero(tree->iter_context,
+ sizeof(RT_ITER_CONTROL));
- Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
- root.alloc = iter->tree->ctl->root;
- RT_PTR_SET_LOCAL(tree, &root);
+ RT_INITIALIZE_ITER(tree, iter);
- iter->top_level = iter->tree->ctl->start_shift / RT_SPAN;
+#ifdef RT_SHMEM
+ /* this is a non-shared iteration, even on a shared radix tree */
+ iter->shared = false;
+#endif
- /* Set the root to start */
- iter->cur_level = iter->top_level;
- iter->node_iters[iter->cur_level].node = root;
- iter->node_iters[iter->cur_level].idx = 0;
+ return iter;
+}
+
+#ifdef RT_SHMEM
+/*
+ * Create and return the shared iterator for the given shared radix tree.
+ *
+ * Taking a lock on a radix tree in shared mode during the shared iteration to
+ * prevent concurrent writes is the caller's responsibility.
+ */
+RT_SCOPE RT_ITER *
+RT_BEGIN_ITERATE_SHARED(RT_RADIX_TREE * tree)
+{
+ RT_ITER *iter;
+ RT_ITER_CONTROL_SHARED *ctl_shared;
+ dsa_pointer dp;
+
+ /* The radix tree must be in shared mode */
+ Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
+
+ dp = dsa_allocate0(tree->dsa, sizeof(RT_ITER_CONTROL_SHARED));
+ ctl_shared = (RT_ITER_CONTROL_SHARED *) dsa_get_address(tree->dsa, dp);
+ ctl_shared->handle = dp;
+ LWLockInitialize(&ctl_shared->lock, tree->ctl->tranche_id);
+ pg_atomic_init_u32(&ctl_shared->refcnt, 1);
+
+ iter = (RT_ITER *) MemoryContextAllocZero(tree->iter_context,
+ sizeof(RT_ITER));
+
+ iter->ctl = (RT_ITER_CONTROL *) ctl_shared;
+ iter->shared = true;
+
+ RT_INITIALIZE_ITER(tree, iter);
return iter;
}
+#endif
/*
* Scan the inner node and return the next child pointer if one exists, otherwise
@@ -2116,12 +2212,18 @@ RT_NODE_ITERATE_NEXT(RT_ITER * iter, int level)
RT_CHILD_PTR node;
RT_PTR_ALLOC *slot = NULL;
+ node_iter = &(iter->ctl->node_iters[level]);
+ node = node_iter->node;
+
#ifdef RT_SHMEM
- Assert(iter->tree->ctl->magic == RT_RADIX_TREE_MAGIC);
-#endif
- node_iter = &(iter->node_iters[level]);
- node = node_iter->node;
+ /*
+ * Since the iterator is shared, the node's local pointer might have been
+ * set by another backend, so make sure to use our own local pointer.
+ */
+ if (iter->shared)
+ RT_PTR_SET_LOCAL(iter->tree, &node);
+#endif
Assert(node.local != NULL);
@@ -2194,8 +2296,8 @@ RT_NODE_ITERATE_NEXT(RT_ITER * iter, int level)
}
/* Update the key */
- iter->key &= ~(((uint64) RT_CHUNK_MASK) << (level * RT_SPAN));
- iter->key |= (((uint64) key_chunk) << (level * RT_SPAN));
+ iter->ctl->key &= ~(((uint64) RT_CHUNK_MASK) << (level * RT_SPAN));
+ iter->ctl->key |= (((uint64) key_chunk) << (level * RT_SPAN));
return slot;
}
@@ -2209,18 +2311,29 @@ RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p)
{
RT_PTR_ALLOC *slot = NULL;
- while (iter->cur_level <= iter->top_level)
+#ifdef RT_SHMEM
+ /* Prevent the shared iterator from being updated concurrently */
+ if (iter->shared)
+ LWLockAcquire(&((RT_ITER_CONTROL_SHARED *) iter->ctl)->lock, LW_EXCLUSIVE);
+#endif
+
+ while (iter->ctl->cur_level <= iter->ctl->top_level)
{
RT_CHILD_PTR node;
- slot = RT_NODE_ITERATE_NEXT(iter, iter->cur_level);
+ slot = RT_NODE_ITERATE_NEXT(iter, iter->ctl->cur_level);
- if (iter->cur_level == 0 && slot != NULL)
+ if (iter->ctl->cur_level == 0 && slot != NULL)
{
/* Found a value at the leaf node */
- *key_p = iter->key;
+ *key_p = iter->ctl->key;
node.alloc = *slot;
+#ifdef RT_SHMEM
+ if (iter->shared)
+ LWLockRelease(&((RT_ITER_CONTROL_SHARED *) iter->ctl)->lock);
+#endif
+
if (RT_CHILDPTR_IS_VALUE(*slot))
return (RT_VALUE_TYPE *) slot;
else
@@ -2236,17 +2349,23 @@ RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p)
node.alloc = *slot;
RT_PTR_SET_LOCAL(iter->tree, &node);
- iter->cur_level--;
- iter->node_iters[iter->cur_level].node = node;
- iter->node_iters[iter->cur_level].idx = 0;
+ iter->ctl->cur_level--;
+ iter->ctl->node_iters[iter->ctl->cur_level].node = node;
+ iter->ctl->node_iters[iter->ctl->cur_level].idx = 0;
}
else
{
/* Not found the child slot, move up the tree */
- iter->cur_level++;
+ iter->ctl->cur_level++;
}
+
}
+#ifdef RT_SHMEM
+ if (iter->shared)
+ LWLockRelease(&((RT_ITER_CONTROL_SHARED *) iter->ctl)->lock);
+#endif
+
/* We've visited all nodes, so the iteration finished */
return NULL;
}
@@ -2257,9 +2376,45 @@ RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p)
RT_SCOPE void
RT_END_ITERATE(RT_ITER * iter)
{
+#ifdef RT_SHMEM
+ RT_ITER_CONTROL_SHARED *ctl = (RT_ITER_CONTROL_SHARED *) iter->ctl;
+
+ if (iter->shared &&
+ pg_atomic_sub_fetch_u32(&ctl->refcnt, 1) == 0)
+ dsa_free(iter->tree->dsa, ctl->handle);
+#endif
pfree(iter);
}
+#ifdef RT_SHMEM
+RT_SCOPE RT_ITER_HANDLE
+RT_GET_ITER_HANDLE(RT_ITER * iter)
+{
+ Assert(iter->shared);
+ return ((RT_ITER_CONTROL_SHARED *) iter->ctl)->handle;
+
+}
+
+RT_SCOPE RT_ITER *
+RT_ATTACH_ITERATE_SHARED(RT_RADIX_TREE * tree, RT_ITER_HANDLE handle)
+{
+ RT_ITER *iter;
+ RT_ITER_CONTROL_SHARED *ctl;
+
+ iter = (RT_ITER *) MemoryContextAllocZero(tree->iter_context,
+ sizeof(RT_ITER));
+ iter->tree = tree;
+ ctl = (RT_ITER_CONTROL_SHARED *) dsa_get_address(tree->dsa, handle);
+ iter->ctl = (RT_ITER_CONTROL *) ctl;
+ iter->shared = true;
+
+ /* For every iterator, increase the refcnt by 1 */
+ pg_atomic_add_fetch_u32(&ctl->refcnt, 1);
+
+ return iter;
+}
+#endif
+
/***************** DELETION *****************/
#ifdef RT_USE_DELETE
@@ -2959,7 +3114,11 @@ RT_DUMP_NODE(RT_NODE * node)
#undef RT_PTR_ALLOC
#undef RT_INVALID_PTR_ALLOC
#undef RT_HANDLE
+#undef RT_ITER_HANDLE
+#undef RT_ITER_CONTROL
+#undef RT_ITER_HANDLE
#undef RT_ITER
+#undef RT_SHARED_ITER
#undef RT_NODE
#undef RT_NODE_ITER
#undef RT_NODE_KIND_4
@@ -2996,6 +3155,11 @@ RT_DUMP_NODE(RT_NODE * node)
#undef RT_LOCK_SHARE
#undef RT_UNLOCK
#undef RT_GET_HANDLE
+#undef RT_BEGIN_ITERATE_SHARED
+#undef RT_ATTACH_ITERATE_SHARED
+#undef RT_GET_ITER_HANDLE
+#undef RT_ATTACH_ITER
+#undef RT_GET_ITER_HANDLE
#undef RT_FIND
#undef RT_SET
#undef RT_BEGIN_ITERATE
@@ -3052,5 +3216,6 @@ RT_DUMP_NODE(RT_NODE * node)
#undef RT_SHRINK_NODE_256
#undef RT_NODE_DELETE
#undef RT_NODE_INSERT
+#undef RT_INITIALIZE_ITER
#undef RT_NODE_ITERATE_NEXT
#undef RT_VERIFY_NODE
diff --git a/src/test/modules/test_radixtree/test_radixtree.c b/src/test/modules/test_radixtree/test_radixtree.c
index 3e5aa3720c7..ef9cc6eb507 100644
--- a/src/test/modules/test_radixtree/test_radixtree.c
+++ b/src/test/modules/test_radixtree/test_radixtree.c
@@ -161,13 +161,87 @@ test_empty(void)
#endif
}
+/* Iteration test for test_basic() */
+static void
+test_iterate_basic(rt_radix_tree *radixtree, uint64 *keys, int children,
+ bool asc, bool shared)
+{
+ rt_iter *iter;
+
+#ifdef TEST_SHARED_RT
+ if (!shared)
+ iter = rt_begin_iterate(radixtree);
+ else
+ iter = rt_begin_iterate_shared(radixtree);
+#else
+ iter = rt_begin_iterate(radixtree);
+#endif
+
+ for (int i = 0; i < children; i++)
+ {
+ uint64 expected;
+ uint64 iterkey;
+ TestValueType *iterval;
+
+ /* iteration is ordered by key, so adjust expected value accordingly */
+ if (asc)
+ expected = keys[i];
+ else
+ expected = keys[children - 1 - i];
+
+ iterval = rt_iterate_next(iter, &iterkey);
+
+ EXPECT_TRUE(iterval != NULL);
+ EXPECT_EQ_U64(iterkey, expected);
+ EXPECT_EQ_U64(*iterval, expected);
+ }
+
+ rt_end_iterate(iter);
+}
+
+/* Iteration test for test_random() */
+static void
+test_iterate_random(rt_radix_tree *radixtree, uint64 *keys, int num_keys,
+ bool shared)
+{
+ rt_iter *iter;
+
+#ifdef TEST_SHARED_RT
+ if (!shared)
+ iter = rt_begin_iterate(radixtree);
+ else
+ iter = rt_begin_iterate_shared(radixtree);
+#else
+ iter = rt_begin_iterate(radixtree);
+#endif
+
+ for (int i = 0; i < num_keys; i++)
+ {
+ uint64 expected;
+ uint64 iterkey;
+ TestValueType *iterval;
+
+ /* skip duplicate keys */
+ if (i < num_keys - 1 && keys[i + 1] == keys[i])
+ continue;
+
+ expected = keys[i];
+ iterval = rt_iterate_next(iter, &iterkey);
+
+ EXPECT_TRUE(iterval != NULL);
+ EXPECT_EQ_U64(iterkey, expected);
+ EXPECT_EQ_U64(*iterval, expected);
+ }
+
+ rt_end_iterate(iter);
+}
+
/* Basic set, find, and delete tests */
static void
test_basic(rt_node_class_test_elem *test_info, int shift, bool asc)
{
MemoryContext radixtree_ctx;
rt_radix_tree *radixtree;
- rt_iter *iter;
uint64 *keys;
int children = test_info->nkeys;
#ifdef TEST_SHARED_RT
@@ -250,28 +324,12 @@ test_basic(rt_node_class_test_elem *test_info, int shift, bool asc)
}
/* test that iteration returns the expected keys and values */
- iter = rt_begin_iterate(radixtree);
-
- for (int i = 0; i < children; i++)
- {
- uint64 expected;
- uint64 iterkey;
- TestValueType *iterval;
-
- /* iteration is ordered by key, so adjust expected value accordingly */
- if (asc)
- expected = keys[i];
- else
- expected = keys[children - 1 - i];
-
- iterval = rt_iterate_next(iter, &iterkey);
-
- EXPECT_TRUE(iterval != NULL);
- EXPECT_EQ_U64(iterkey, expected);
- EXPECT_EQ_U64(*iterval, expected);
- }
+ test_iterate_basic(radixtree, keys, children, asc, false);
- rt_end_iterate(iter);
+#ifdef TEST_SHARED_RT
+ /* test shared-iteration as well */
+ test_iterate_basic(radixtree, keys, children, asc, true);
+#endif
/* delete all keys again */
for (int i = 0; i < children; i++)
@@ -302,7 +360,6 @@ test_random(void)
{
MemoryContext radixtree_ctx;
rt_radix_tree *radixtree;
- rt_iter *iter;
pg_prng_state state;
/* limit memory usage by limiting the key space */
@@ -395,27 +452,12 @@ test_random(void)
}
/* test that iteration returns the expected keys and values */
- iter = rt_begin_iterate(radixtree);
-
- for (int i = 0; i < num_keys; i++)
- {
- uint64 expected;
- uint64 iterkey;
- TestValueType *iterval;
+ test_iterate_random(radixtree, keys, num_keys, false);
- /* skip duplicate keys */
- if (i < num_keys - 1 && keys[i + 1] == keys[i])
- continue;
-
- expected = keys[i];
- iterval = rt_iterate_next(iter, &iterkey);
-
- EXPECT_TRUE(iterval != NULL);
- EXPECT_EQ_U64(iterkey, expected);
- EXPECT_EQ_U64(*iterval, expected);
- }
-
- rt_end_iterate(iter);
+#ifdef TEST_SHARED_RT
+ /* test shared-iteration as well */
+ test_iterate_random(radixtree, keys, num_keys, true);
+#endif
/* reset random number generator for deletion */
pg_prng_seed(&state, seed);
--
2.43.5
v5-0007-Add-TidStoreNumBlocks-API-to-get-the-number-of-bl.patch
From 275f367616e608794e9da6869428d55027c55368 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 13 Dec 2024 16:55:52 -0800
Subject: [PATCH v5 7/8] Add TidStoreNumBlocks API to get the number of blocks
in TidStore.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
src/backend/access/common/tidstore.c | 12 ++++++++++++
src/include/access/tidstore.h | 1 +
2 files changed, 13 insertions(+)
diff --git a/src/backend/access/common/tidstore.c b/src/backend/access/common/tidstore.c
index 637d26012d2..18d0e855ab2 100644
--- a/src/backend/access/common/tidstore.c
+++ b/src/backend/access/common/tidstore.c
@@ -596,6 +596,18 @@ TidStoreMemoryUsage(TidStore *ts)
return local_ts_memory_usage(ts->tree.local);
}
+/*
+ * Return the number of blocks (i.e. entries) in TidStore.
+ */
+BlockNumber
+TidStoreNumBlocks(TidStore *ts)
+{
+ if (TidStoreIsShared(ts))
+ return shared_ts_num_keys(ts->tree.shared);
+ else
+ return local_ts_num_keys(ts->tree.local);
+}
+
/*
* Return the DSA area where the TidStore lives.
*/
diff --git a/src/include/access/tidstore.h b/src/include/access/tidstore.h
index f20c9a92e55..1566cb47593 100644
--- a/src/include/access/tidstore.h
+++ b/src/include/access/tidstore.h
@@ -51,6 +51,7 @@ extern int TidStoreGetBlockOffsets(TidStoreIterResult *result,
int max_offsets);
extern void TidStoreEndIterate(TidStoreIter *iter);
extern size_t TidStoreMemoryUsage(TidStore *ts);
+extern BlockNumber TidStoreNumBlocks(TidStore *ts);
extern dsa_pointer TidStoreGetHandle(TidStore *ts);
extern dsa_area *TidStoreGetDSA(TidStore *ts);
--
2.43.5
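For context, the 0008 patch uses this API to size the vacuuming-heap
worker count from the number of blocks that actually contain dead items:

nworkers = compute_heap_vacuum_parallel_workers(vacrel->rel,
                                                TidStoreNumBlocks(vacrel->dead_items));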
v5-0002-Remember-the-number-of-times-parallel-index-vacuu.patch
From eda66e24fbf9357e9c4af262ebb8e2e0d2c46ac0 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 13 Dec 2024 15:54:32 -0800
Subject: [PATCH v5 2/8] Remember the number of times parallel index
vacuuming/cleanup is executed in ParallelVacuumState.
Previously, the caller could pass an arbitrary value for
'num_index_scans' to the parallel index vacuuming and cleanup APIs,
which didn't make much sense: the caller had to carefully count how
many times it executed index vacuuming or cleanup, and otherwise it
would fail to reinitialize the parallel DSM.
This commit changes the parallel vacuum APIs so that
ParallelVacuumState keeps the num_index_scans counter itself and
reinitializes the parallel DSM based on it.
An upcoming patch for parallel table scan will do a similar thing.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
src/backend/access/heap/vacuumlazy.c | 4 +---
src/backend/commands/vacuumparallel.c | 27 +++++++++++++++------------
src/include/commands/vacuum.h | 4 +---
3 files changed, 17 insertions(+), 18 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 05406a0bc5a..61b77af09b1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2143,8 +2143,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, old_live_tuples,
- vacrel->num_index_scans);
+ parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, old_live_tuples);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2514,7 +2513,6 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
{
/* Outsource everything to parallel variant */
parallel_vacuum_cleanup_all_indexes(vacrel->pvs, reltuples,
- vacrel->num_index_scans,
estimated_count);
}
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 67cba17a564..50dd3d7d14d 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -200,6 +200,9 @@ struct ParallelVacuumState
*/
bool *will_parallel_vacuum;
+ /* How many times index vacuuming or cleanup has been executed */
+ int num_index_scans;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -223,8 +226,7 @@ struct ParallelVacuumState
static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
bool *will_parallel_vacuum);
-static void parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scans,
- bool vacuum);
+static void parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool vacuum);
static void parallel_vacuum_process_safe_indexes(ParallelVacuumState *pvs);
static void parallel_vacuum_process_unsafe_indexes(ParallelVacuumState *pvs);
static void parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
@@ -497,8 +499,7 @@ parallel_vacuum_reset_dead_items(ParallelVacuumState *pvs)
* Do parallel index bulk-deletion with parallel workers.
*/
void
-parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tuples,
- int num_index_scans)
+parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tuples)
{
Assert(!IsParallelWorker());
@@ -509,7 +510,7 @@ parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tup
pvs->shared->reltuples = num_table_tuples;
pvs->shared->estimated_count = true;
- parallel_vacuum_process_all_indexes(pvs, num_index_scans, true);
+ parallel_vacuum_process_all_indexes(pvs, true);
}
/*
@@ -517,7 +518,7 @@ parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tup
*/
void
parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tuples,
- int num_index_scans, bool estimated_count)
+ bool estimated_count)
{
Assert(!IsParallelWorker());
@@ -529,7 +530,7 @@ parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tup
pvs->shared->reltuples = num_table_tuples;
pvs->shared->estimated_count = estimated_count;
- parallel_vacuum_process_all_indexes(pvs, num_index_scans, false);
+ parallel_vacuum_process_all_indexes(pvs, false);
}
/*
@@ -608,8 +609,7 @@ parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
* must be used by the parallel vacuum leader process.
*/
static void
-parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scans,
- bool vacuum)
+parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool vacuum)
{
int nworkers;
PVIndVacStatus new_status;
@@ -631,7 +631,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
nworkers = pvs->nindexes_parallel_cleanup;
/* Add conditionally parallel-aware indexes if in the first time call */
- if (num_index_scans == 0)
+ if (pvs->num_index_scans == 0)
nworkers += pvs->nindexes_parallel_condcleanup;
}
@@ -659,7 +659,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
indstats->parallel_workers_can_process =
(pvs->will_parallel_vacuum[i] &&
parallel_vacuum_index_is_parallel_safe(pvs->indrels[i],
- num_index_scans,
+ pvs->num_index_scans,
vacuum));
}
@@ -670,7 +670,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
if (nworkers > 0)
{
/* Reinitialize parallel context to relaunch parallel workers */
- if (num_index_scans > 0)
+ if (pvs->num_index_scans > 0)
ReinitializeParallelDSM(pvs->pcxt);
/*
@@ -764,6 +764,9 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
VacuumSharedCostBalance = NULL;
VacuumActiveNWorkers = NULL;
}
+
+ /* Increment the counter */
+ pvs->num_index_scans++;
}
/*
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 759f9a87d38..7613d00e26f 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -366,11 +366,9 @@ extern TidStore *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs,
VacDeadItemsInfo **dead_items_info_p);
extern void parallel_vacuum_reset_dead_items(ParallelVacuumState *pvs);
extern void parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs,
- long num_table_tuples,
- int num_index_scans);
+ long num_table_tuples);
extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
long num_table_tuples,
- int num_index_scans,
bool estimated_count);
extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
--
2.43.5
Attachment: v5-0003-Support-parallel-heap-scan-during-lazy-vacuum.patch (application/octet-stream)
From 714952711dab3bb3940ed9792caafdfe7e4f7672 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 1 Jul 2024 15:17:46 +0900
Subject: [PATCH v5 3/8] Support parallel heap scan during lazy vacuum.
Commit 40d964ec99 allowed the VACUUM command to process indexes in
parallel. This change extends the parallel vacuum to support parallel
heap scan during lazy vacuum.
---
doc/src/sgml/ref/vacuum.sgml | 58 +-
src/backend/access/heap/heapam_handler.c | 6 +
src/backend/access/heap/vacuumlazy.c | 929 ++++++++++++++++++++---
src/backend/commands/vacuumparallel.c | 305 ++++++--
src/backend/storage/ipc/procarray.c | 74 --
src/include/access/heapam.h | 8 +
src/include/access/tableam.h | 88 +++
src/include/commands/vacuum.h | 8 +-
src/include/utils/snapmgr.h | 2 +-
src/include/utils/snapmgr_internal.h | 89 +++
src/tools/pgindent/typedefs.list | 3 +
11 files changed, 1330 insertions(+), 240 deletions(-)
create mode 100644 src/include/utils/snapmgr_internal.h
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 9110938fab6..aae0bbcd577 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -277,27 +277,43 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
<varlistentry>
<term><literal>PARALLEL</literal></term>
<listitem>
- <para>
- Perform index vacuum and index cleanup phases of <command>VACUUM</command>
- in parallel using <replaceable class="parameter">integer</replaceable>
- background workers (for the details of each vacuum phase, please
- refer to <xref linkend="vacuum-phases"/>). The number of workers used
- to perform the operation is equal to the number of indexes on the
- relation that support parallel vacuum which is limited by the number of
- workers specified with <literal>PARALLEL</literal> option if any which is
- further limited by <xref linkend="guc-max-parallel-maintenance-workers"/>.
- An index can participate in parallel vacuum if and only if the size of the
- index is more than <xref linkend="guc-min-parallel-index-scan-size"/>.
- Please note that it is not guaranteed that the number of parallel workers
- specified in <replaceable class="parameter">integer</replaceable> will be
- used during execution. It is possible for a vacuum to run with fewer
- workers than specified, or even with no workers at all. Only one worker
- can be used per index. So parallel workers are launched only when there
- are at least <literal>2</literal> indexes in the table. Workers for
- vacuum are launched before the start of each phase and exit at the end of
- the phase. These behaviors might change in a future release. This
- option can't be used with the <literal>FULL</literal> option.
- </para>
+ <para>
+ Perform the heap scan, index vacuum, and index cleanup phases of
+ <command>VACUUM</command> in parallel using
+ <replaceable class="parameter">integer</replaceable> background workers
+ (for the details of each vacuum phase, please refer to
+ <xref linkend="vacuum-phases"/>).
+ </para>
+ <para>
+ For heap tables, the number of workers used to perform the heap scan
+ is determined based on the size of the table. A table can participate in
+ a parallel heap scan if and only if the size of the table is more than
+ <xref linkend="guc-min-parallel-table-scan-size"/>. During the heap scan,
+ the heap table's blocks will be divided into ranges and shared among the
+ cooperating processes. Each worker process will complete the scanning of
+ its given range of blocks before requesting an additional range of blocks.
+ </para>
+ <para>
+ The number of workers used to perform parallel index vacuum and index
+ cleanup is equal to the number of indexes on the relation that support
+ parallel vacuum. An index can participate in parallel vacuum if and only
+ if the size of the index is more than <xref linkend="guc-min-parallel-index-scan-size"/>.
+ Only one worker can be used per index. So parallel workers for index vacuum
+ and index cleanup are launched only when there are at least <literal>2</literal>
+ indexes in the table.
+ </para>
+ <para>
+ Workers for vacuum are launched before the start of each phase and exit
+ at the end of the phase. The number of workers for each phase is limited by
+ the number of workers specified with the <literal>PARALLEL</literal> option,
+ if any, which is further limited by <xref linkend="guc-max-parallel-maintenance-workers"/>.
+ Please note that in any parallel vacuum phase, it is not guaranteed that the
+ number of parallel workers specified in <replaceable class="parameter">integer</replaceable>
+ will be used during execution. It is possible for a vacuum to run with fewer
+ workers than specified, or even with no workers at all. These behaviors might
+ change in a future release. This option can't be used with the <literal>FULL</literal>
+ option.
+ </para>
</listitem>
</varlistentry>
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 2da4e4da13e..598fafae4a0 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2662,6 +2662,12 @@ static const TableAmRoutine heapam_methods = {
.relation_copy_data = heapam_relation_copy_data,
.relation_copy_for_cluster = heapam_relation_copy_for_cluster,
.relation_vacuum = heap_vacuum_rel,
+
+ .parallel_vacuum_compute_workers = heap_parallel_vacuum_compute_workers,
+ .parallel_vacuum_estimate = heap_parallel_vacuum_estimate,
+ .parallel_vacuum_initialize = heap_parallel_vacuum_initialize,
+ .parallel_vacuum_relation_worker = heap_parallel_vacuum_worker,
+
.scan_analyze_next_block = heapam_scan_analyze_next_block,
.scan_analyze_next_tuple = heapam_scan_analyze_next_tuple,
.index_build_range_scan = heapam_index_build_range_scan,
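/*
 * Editorial sketch (not part of the patch): the shapes of the new table AM
 * callbacks, taken from the heap implementations added below. A table AM
 * that wants parallel vacuum support would provide functions of these forms:
 *
 *   int  heap_parallel_vacuum_compute_workers(Relation rel, int nrequested);
 *   void heap_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt,
 *                                      int nworkers, void *state);
 *   void heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
 *                                        int nworkers, void *state);
 *   void heap_parallel_vacuum_worker(Relation rel, ParallelVacuumState *pvs,
 *                                    ParallelWorkerContext *pwcxt);
 */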
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 61b77af09b1..2e70bc68d2c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -48,6 +48,7 @@
#include "common/int.h"
#include "executor/instrument.h"
#include "miscadmin.h"
+#include "optimizer/paths.h"
#include "pgstat.h"
#include "portability/instr_time.h"
#include "postmaster/autovacuum.h"
@@ -115,10 +116,24 @@
#define PREFETCH_SIZE ((BlockNumber) 32)
/*
- * Macro to check if we are in a parallel vacuum. If true, we are in the
- * parallel mode and the DSM segment is initialized.
+ * DSM keys for parallel heap vacuum scan. Unlike other parallel execution
+ * code, we don't need to worry about DSM keys conflicting with plan_node_id,
+ * but we do need to avoid conflicting with the DSM keys used in vacuumparallel.c.
+ */
+#define LV_PARALLEL_KEY_SCAN_SHARED 0xFFFF0001
+#define LV_PARALLEL_KEY_SCAN_DESC 0xFFFF0002
+#define LV_PARALLEL_KEY_SCAN_DESC_WORKER 0xFFFF0003
+
+/*
+ * Macros to check if we are in parallel heap vacuuming, parallel index vacuuming,
+ * or both. If ParallelVacuumIsActive() is true, we are in the parallel mode, meaning
+ * that the dead item TIDs are stored in a shared memory area.
*/
#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
+#define ParallelIndexVacuumIsActive(vacrel) \
+ (ParallelVacuumIsActive(vacrel) && parallel_vacuum_get_nworkers_index((vacrel)->pvs) > 0)
+#define ParallelHeapVacuumIsActive(vacrel) \
+ (ParallelVacuumIsActive(vacrel) && parallel_vacuum_get_nworkers_table((vacrel)->pvs) > 0)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -172,6 +187,87 @@ typedef struct LVRelScanState
bool skippedallvis;
} LVRelScanState;
+/*
+ * Struct for information that needs to be shared among parallel vacuum workers
+ */
+typedef struct PHVShared
+{
+ bool aggressive;
+ bool skipwithvm;
+
+ /* The current oldest extant XID/MXID shared by the leader process */
+ TransactionId NewRelfrozenXid;
+ MultiXactId NewRelminMxid;
+
+ /*
+ * Have we skipped any all-visible pages?
+ *
+ * The final value is OR of worker's skippedallvis.
+ */
+ bool skippedallvis;
+
+ /* VACUUM operation's cutoffs for freezing and pruning */
+ struct VacuumCutoffs cutoffs;
+ GlobalVisState vistest;
+
+ /* per-worker scan stats for parallel heap vacuum scan */
+ LVRelScanState worker_scan_state[FLEXIBLE_ARRAY_MEMBER];
+} PHVShared;
+#define SizeOfPHVShared (offsetof(PHVShared, worker_scan_state))
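/*
 * Editorial note: given the flexible array member above, the shared area is
 * sized as in heap_parallel_estimate_shared_memory_size() below, roughly:
 *
 *   Size shared_len = SizeOfPHVShared + sizeof(LVRelScanState) * nworkers;
 *
 * and each worker addresses its own stats slot with
 * &shared->worker_scan_state[ParallelWorkerNumber].
 */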
+
+/* Per-worker scan state for parallel heap vacuum scan */
+typedef struct PHVScanWorkerState
+{
+ bool initialized;
+
+ /* per-worker parallel table scan state */
+ ParallelBlockTableScanWorkerData state;
+
+ /*
+ * True if a parallel vacuum scan worker allocated blocks in state but
+ * might not have scanned all of them. The leader process will take over
+ * scanning the remaining blocks.
+ */
+ bool maybe_have_blocks;
+
+ /* last block number the worker scanned */
+ BlockNumber last_blkno;
+} PHVScanWorkerState;
+
+/* Struct for parallel heap vacuum */
+typedef struct PHVState
+{
+ /* Parallel scan description shared among parallel workers */
+ ParallelBlockTableScanDesc pscandesc;
+
+ /* Shared information */
+ PHVShared *shared;
+
+ /*
+ * Points to the array of per-worker scan states stored in the DSM area.
+ *
+ * During parallel heap scan, each worker allocates some chunks of blocks
+ * to scan in its scan state, and could exit while leaving some chunks
+ * un-scanned if the size of dead_items TIDs is close to overrunning the
+ * available space. We store the scan states in the shared memory area so
+ * that workers can resume the heap scan from the previous point.
+ */
+ PHVScanWorkerState *scanstates;
+
+ /* Assigned per-worker scan state */
+ PHVScanWorkerState *myscanstate;
+
+ /*
+ * All blocks up to this value have been scanned, i.e. the minimum of all
+ * PHVScanWorkerState->last_blkno. This field is updated by
+ * parallel_heap_vacuum_compute_min_scanned_blkno().
+ */
+ BlockNumber min_scanned_blkno;
+
+ /* The number of workers launched for parallel heap vacuum */
+ int nworkers_launched;
+} PHVState;
+
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -183,6 +279,9 @@ typedef struct LVRelState
BufferAccessStrategy bstrategy;
ParallelVacuumState *pvs;
+ /* Parallel heap vacuum state */
+ PHVState *phvstate;
+
/* Aggressive VACUUM? (must set relfrozenxid >= FreezeLimit) */
bool aggressive;
/* Use visibility map to skip? (disabled by DISABLE_PAGE_SKIPPING) */
@@ -223,6 +322,8 @@ typedef struct LVRelState
VacDeadItemsInfo *dead_items_info;
BlockNumber rel_pages; /* total number of pages */
+ BlockNumber next_fsm_block_to_vacuum; /* next block to check for FSM
+ * vacuum */
/* Working state for heap scanning and vacuuming */
LVRelScanState *scan_state;
@@ -254,8 +355,11 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
+static bool do_lazy_scan_heap(LVRelState *vacrel);
static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
bool *all_visible_according_to_vm);
+static bool heap_vac_scan_next_block_parallel(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm);
static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -296,6 +400,11 @@ static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
static void update_relstats_all_indexes(LVRelState *vacrel);
+static void do_parallel_lazy_scan_heap(LVRelState *vacrel);
+static void parallel_heap_vacuum_compute_min_scanned_blkno(LVRelState *vacrel);
+static void parallel_heap_vacuum_gather_scan_results(LVRelState *vacrel);
+static void parallel_heap_complete_unfinished_scan(LVRelState *vacrel);
+
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -432,6 +541,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
Assert(params->index_cleanup == VACOPTVALUE_AUTO);
}
+ vacrel->next_fsm_block_to_vacuum = 0;
+
/* Initialize page counters explicitly (be tidy) */
scan_state = palloc(sizeof(LVRelScanState));
scan_state->scanned_pages = 0;
@@ -452,6 +563,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->scan_state = scan_state;
/* dead_items_alloc allocates vacrel->dead_items later on */
/* Allocate/initialize output statistics state */
vacrel->new_rel_tuples = 0;
vacrel->new_live_tuples = 0;
@@ -861,12 +974,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel)
{
- BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
- next_fsm_block_to_vacuum = 0;
- bool all_visible_according_to_vm;
-
- Buffer vmbuffer = InvalidBuffer;
+ BlockNumber rel_pages = vacrel->rel_pages;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -886,12 +994,93 @@ lazy_scan_heap(LVRelState *vacrel)
vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
- while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
+ /*
+ * Do the actual work. If parallel heap vacuum is active, we scan and
+ * vacuum the heap using parallel workers.
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ do_parallel_lazy_scan_heap(vacrel);
+ else
+ {
+ bool scan_done PG_USED_FOR_ASSERTS_ONLY;
+
+ scan_done = do_lazy_scan_heap(vacrel);
+
+ /* We must have scanned all heap pages */
+ Assert(scan_done);
+ }
+
+ /* report that everything is now scanned */
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, rel_pages);
+
+ /* now we can compute the new value for pg_class.reltuples */
+ vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
+ vacrel->scan_state->scanned_pages,
+ vacrel->scan_state->live_tuples);
+
+ /*
+ * Also compute the total number of surviving heap entries. In the
+ * (unlikely) scenario that new_live_tuples is -1, take it as zero.
+ */
+ vacrel->new_rel_tuples =
+ Max(vacrel->new_live_tuples, 0) + vacrel->scan_state->recently_dead_tuples +
+ vacrel->scan_state->missed_dead_tuples;
+
+ /*
+ * Do index vacuuming (call each index's ambulkdelete routine), then do
+ * related heap vacuuming
+ */
+ if (vacrel->dead_items_info->num_items > 0)
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the remainder of the Free Space Map. We must do this whether or
+ * not there were indexes, and whether or not we bypassed index vacuuming.
+ */
+ if (rel_pages > vacrel->next_fsm_block_to_vacuum)
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ rel_pages);
+
+ /* report all blocks vacuumed */
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, rel_pages);
+
+ /* Do final index cleanup (call each index's amvacuumcleanup routine) */
+ if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
+ lazy_cleanup_all_indexes(vacrel);
+}
+
+/*
+ * Workhorse for lazy_scan_heap().
+ *
+ * Returns true if we processed all blocks; returns false if we exited before
+ * completing the heap scan because the space for dead item TIDs was nearly full.
+ * In the serial heap scan case, this function always returns true. In the
+ * parallel heap scan case, this function is called by both worker processes and
+ * the leader process, and could return false.
+ */
+static bool
+do_lazy_scan_heap(LVRelState *vacrel)
+{
+ bool all_visible_according_to_vm;
+ BlockNumber blkno;
+ Buffer vmbuffer = InvalidBuffer;
+ bool scan_done = true;
+
+ while (true)
{
Buffer buf;
Page page;
bool has_lpdead_items;
bool got_cleanup_lock = false;
+ bool got_blkno;
+
+ /* Get the next block for vacuum to process */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ got_blkno = heap_vac_scan_next_block_parallel(vacrel, &blkno, &all_visible_according_to_vm);
+ else
+ got_blkno = heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm);
+
+ if (!got_blkno)
+ break;
vacrel->scan_state->scanned_pages++;
@@ -911,46 +1100,10 @@ lazy_scan_heap(LVRelState *vacrel)
* one-pass strategy, and the two-pass strategy with the index_cleanup
* param set to 'off'.
*/
- if (vacrel->scan_state->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
+ if (!IsParallelWorker() &&
+ vacrel->scan_state->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
lazy_check_wraparound_failsafe(vacrel);
- /*
- * Consider if we definitely have enough space to process TIDs on page
- * already. If we are close to overrunning the available space for
- * dead_items TIDs, pause and do a cycle of vacuuming before we tackle
- * this page.
- */
- if (TidStoreMemoryUsage(vacrel->dead_items) > vacrel->dead_items_info->max_bytes)
- {
- /*
- * Before beginning index vacuuming, we release any pin we may
- * hold on the visibility map page. This isn't necessary for
- * correctness, but we do it anyway to avoid holding the pin
- * across a lengthy, unrelated operation.
- */
- if (BufferIsValid(vmbuffer))
- {
- ReleaseBuffer(vmbuffer);
- vmbuffer = InvalidBuffer;
- }
-
- /* Perform a round of index and heap vacuuming */
- vacrel->consider_bypass_optimization = false;
- lazy_vacuum(vacrel);
-
- /*
- * Vacuum the Free Space Map to make newly-freed space visible on
- * upper-level FSM pages. Note we have not yet processed blkno.
- */
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
- blkno);
- next_fsm_block_to_vacuum = blkno;
-
- /* Report that we are once again scanning the heap */
- pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_SCAN_HEAP);
- }
-
/*
* Pin the visibility map page in case we need to mark the page
* all-visible. In most cases this will be very cheap, because we'll
@@ -1039,9 +1192,10 @@ lazy_scan_heap(LVRelState *vacrel)
* revisit this page. Since updating the FSM is desirable but not
* absolutely required, that's OK.
*/
- if (vacrel->nindexes == 0
- || !vacrel->do_index_vacuuming
- || !has_lpdead_items)
+ if (!IsParallelWorker() &&
+ (vacrel->nindexes == 0
+ || !vacrel->do_index_vacuuming
+ || !has_lpdead_items))
{
Size freespace = PageGetHeapFreeSpace(page);
@@ -1055,57 +1209,178 @@ lazy_scan_heap(LVRelState *vacrel)
* held the cleanup lock and lazy_scan_prune() was called.
*/
if (got_cleanup_lock && vacrel->nindexes == 0 && has_lpdead_items &&
- blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+ blkno - vacrel->next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
{
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
- blkno);
- next_fsm_block_to_vacuum = blkno;
+ BlockNumber fsm_vac_up_to;
+
+ /*
+ * If parallel heap vacuum scan is active, compute the minimum
+ * block number we scanned so far.
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ {
+ parallel_heap_vacuum_compute_min_scanned_blkno(vacrel);
+ fsm_vac_up_to = vacrel->phvstate->min_scanned_blkno;
+ }
+ else
+ {
+ /* blkno is already processed */
+ fsm_vac_up_to = blkno + 1;
+ }
+
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ fsm_vac_up_to);
+ vacrel->next_fsm_block_to_vacuum = fsm_vac_up_to;
}
}
else
UnlockReleaseBuffer(buf);
+
+ /*
+ * Consider if we definitely have enough space to process TIDs on page
+ * already. If we are close to overrunning the available space for
+ * dead_items TIDs, pause and do a cycle of vacuuming before we tackle
+ * this page.
+ */
+ if (TidStoreMemoryUsage(vacrel->dead_items) > vacrel->dead_items_info->max_bytes)
+ {
+ /*
+ * Before beginning index vacuuming, we release any pin we may
+ * hold on the visibility map page. This isn't necessary for
+ * correctness, but we do it anyway to avoid holding the pin
+ * across a lengthy, unrelated operation.
+ */
+ if (BufferIsValid(vmbuffer))
+ {
+ ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
+ }
+
+ /*
+ * In parallel heap scan, we pause the heap scan without invoking
+ * index and heap vacuuming, and return to the caller with
+ * scan_done being false. The parallel vacuum workers will exit as
+ * their jobs are done. The leader process will wait for all
+ * workers to finish and perform index and heap vacuuming, and
+ * then performs FSM vacuuming as well.
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ {
+ /* Remember the last scanned block */
+ vacrel->phvstate->myscanstate->last_blkno = blkno;
+
+ /* Remember we might have some unprocessed blocks */
+ scan_done = false;
+
+ break;
+ }
+
+ /* Perform a round of index and heap vacuuming */
+ vacrel->consider_bypass_optimization = false;
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the Free Space Map to make newly-freed space visible on
+ * upper-level FSM pages.
+ */
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ blkno + 1);
+ vacrel->next_fsm_block_to_vacuum = blkno + 1;
+
+ /* Report that we are once again scanning the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+ continue;
+ }
}
vacrel->blkno = InvalidBlockNumber;
if (BufferIsValid(vmbuffer))
ReleaseBuffer(vmbuffer);
- /* report that everything is now scanned */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ return scan_done;
+}
- /* now we can compute the new value for pg_class.reltuples */
- vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
- vacrel->scan_state->scanned_pages,
- vacrel->scan_state->live_tuples);
+/*
+ * A parallel scan variant of heap_vac_scan_next_block(). Similar to
+ * heap_vac_scan_next_block(), the block number and visibility status of the next
+ * block to process are set in *blkno and *all_visible_according_to_vm. The return
+ * value is false if there are no further blocks to process.
+ *
+ * In parallel vacuum scan, we don't use the SKIP_PAGES_THRESHOLD optimization.
+ */
+static bool
+heap_vac_scan_next_block_parallel(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm)
+{
+ PHVState *phvstate = vacrel->phvstate;
+ BlockNumber next_block;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 mapbits = 0;
- /*
- * Also compute the total number of surviving heap entries. In the
- * (unlikely) scenario that new_live_tuples is -1, take it as zero.
- */
- vacrel->new_rel_tuples =
- Max(vacrel->new_live_tuples, 0) + vacrel->scan_state->recently_dead_tuples +
- vacrel->scan_state->missed_dead_tuples;
+ Assert(ParallelHeapVacuumIsActive(vacrel));
- /*
- * Do index vacuuming (call each index's ambulkdelete routine), then do
- * related heap vacuuming
- */
- if (vacrel->dead_items_info->num_items > 0)
- lazy_vacuum(vacrel);
+ for (;;)
+ {
+ next_block = table_block_parallelscan_nextpage(vacrel->rel,
+ &(phvstate->myscanstate->state),
+ phvstate->pscandesc);
- /*
- * Vacuum the remainder of the Free Space Map. We must do this whether or
- * not there were indexes, and whether or not we bypassed index vacuuming.
- */
- if (blkno > next_fsm_block_to_vacuum)
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno);
+ /* Have we reached the end of the table? */
+ if (!BlockNumberIsValid(next_block) || next_block >= vacrel->rel_pages)
+ {
+ if (BufferIsValid(vmbuffer))
+ ReleaseBuffer(vmbuffer);
- /* report all blocks vacuumed */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
+ *blkno = vacrel->rel_pages;
+ return false;
+ }
- /* Do final index cleanup (call each index's amvacuumcleanup routine) */
- if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
- lazy_cleanup_all_indexes(vacrel);
+ /* We always treat the last block as unsafe to skip */
+ if (next_block == vacrel->rel_pages - 1)
+ break;
+
+ mapbits = visibilitymap_get_status(vacrel->rel, next_block, &vmbuffer);
+
+ /*
+ * A block is unskippable if it is not all visible according to the
+ * visibility map.
+ */
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ {
+ Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
+ break;
+ }
+
+ /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
+ if (!vacrel->skipwithvm)
+ break;
+
+ /*
+ * Aggressive VACUUM caller can't skip pages just because they are
+ * all-visible.
+ */
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+ if (vacrel->aggressive)
+ break;
+
+ /*
+ * An all-visible block is safe to skip in the non-aggressive case,
+ * but remember for later that we skipped such a block.
+ */
+ vacrel->scan_state->skippedallvis = true;
+ }
+ }
+
+ if (BufferIsValid(vmbuffer))
+ ReleaseBuffer(vmbuffer);
+
+ *blkno = next_block;
+ *all_visible_according_to_vm = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
+
+ return true;
}
/*
@@ -1254,11 +1529,12 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
/*
* Caller must scan the last page to determine whether it has tuples
- * (caller must have the opportunity to set vacrel->nonempty_pages).
- * This rule avoids having lazy_truncate_heap() take access-exclusive
- * lock on rel to attempt a truncation that fails anyway, just because
- * there are tuples on the last page (it is likely that there will be
- * tuples on other nearby pages as well, but those can be skipped).
+ * (caller must have the opportunity to set
+ * vacrel->scan_state->nonempty_pages). This rule avoids having
+ * lazy_truncate_heap() take access-exclusive lock on rel to attempt a
+ * truncation that fails anyway, just because there are tuples on the
+ * last page (it is likely that there will be tuples on other nearby
+ * pages as well, but those can be skipped).
*
* Implement this by always treating the last block as unsafe to skip.
*/
@@ -2117,7 +2393,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
progress_start_val[1] = vacrel->nindexes;
pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val);
- if (!ParallelVacuumIsActive(vacrel))
+ if (!ParallelIndexVacuumIsActive(vacrel))
{
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
@@ -2493,7 +2769,7 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
progress_start_val[1] = vacrel->nindexes;
pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val);
- if (!ParallelVacuumIsActive(vacrel))
+ if (!ParallelIndexVacuumIsActive(vacrel))
{
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
@@ -2943,12 +3219,8 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
autovacuum_work_mem != -1 ?
autovacuum_work_mem : maintenance_work_mem;
- /*
- * Initialize state for a parallel vacuum. As of now, only one worker can
- * be used for an index, so we invoke parallelism only if there are at
- * least two indexes on a table.
- */
- if (nworkers >= 0 && vacrel->nindexes > 1 && vacrel->do_index_vacuuming)
+ /* Initialize state for a parallel vacuum */
+ if (nworkers >= 0)
{
/*
* Since parallel workers cannot access data in temporary tables, we
@@ -2966,11 +3238,20 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
+ {
+ /*
+ * We initialize parallel heap scanning/vacuuming, index
+ * vacuuming, or both, based on the table size and the number of
+ * indexes. Since only one worker can be used per index, we invoke
+ * parallelism for index vacuuming only if there are at least two
+ * indexes on the table.
+ */
vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
vacrel->nindexes, nworkers,
vac_work_mem,
vacrel->verbose ? INFO : DEBUG2,
- vacrel->bstrategy);
+ vacrel->bstrategy, (void *) vacrel);
+ }
/*
* If parallel mode started, dead_items and dead_items_info spaces are
@@ -3010,9 +3291,19 @@ dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *offsets,
};
int64 prog_val[2];
+ /*
+ * Protect both dead_items and dead_items_info from concurrent updates in
+ * parallel heap scan cases.
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ TidStoreLockExclusive(vacrel->dead_items);
+
TidStoreSetBlockOffsets(vacrel->dead_items, blkno, offsets, num_offsets);
vacrel->dead_items_info->num_items += num_offsets;
+ if (ParallelHeapVacuumIsActive(vacrel))
+ TidStoreUnlock(vacrel->dead_items);
+
/* update the progress information */
prog_val[0] = vacrel->dead_items_info->num_items;
prog_val[1] = TidStoreMemoryUsage(vacrel->dead_items);
@@ -3212,6 +3503,448 @@ update_relstats_all_indexes(LVRelState *vacrel)
}
}
+/*
+ * Compute the number of parallel workers for parallel vacuum heap scan.
+ *
+ * The calculation logic is borrowed from compute_parallel_worker().
+ */
+int
+heap_parallel_vacuum_compute_workers(Relation rel, int nrequested)
+{
+ int parallel_workers = 0;
+ int heap_parallel_threshold;
+ int heap_pages;
+
+ if (nrequested == 0)
+ {
+ /*
+ * Select the number of workers based on the log of the size of the
+ * relation. Note that the upper limit of the
+ * min_parallel_table_scan_size GUC is chosen to prevent overflow
+ * here.
+ */
+ heap_parallel_threshold = Max(min_parallel_table_scan_size, 1);
+ heap_pages = RelationGetNumberOfBlocks(rel);
+ while (heap_pages >= (BlockNumber) (heap_parallel_threshold * 3))
+ {
+ parallel_workers++;
+ heap_parallel_threshold *= 3;
+ if (heap_parallel_threshold > INT_MAX / 3)
+ break;
+ }
+ }
+ else
+ parallel_workers = nrequested;
+
+ return parallel_workers;
+}
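/*
 * Editorial worked example (assuming the default min_parallel_table_scan_size
 * of 8MB, i.e. 1024 8kB blocks): the loop above grants 1 worker at
 * >= 3072 blocks (24MB), 2 at >= 9216 blocks (72MB), 3 at >= 27648 blocks
 * (216MB), and so on, tripling the threshold each step, so the worker count
 * grows logarithmically with table size.
 */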
+
+/* Estimate shared memory sizes required for parallel heap vacuum */
+static inline void
+heap_parallel_estimate_shared_memory_size(Relation rel, int nworkers, Size *pscan_len,
+ Size *shared_len, Size *pscanwork_len)
+{
+ Size size = 0;
+
+ size = add_size(size, SizeOfPHVShared);
+ size = add_size(size, mul_size(sizeof(LVRelScanState), nworkers));
+ *shared_len = size;
+
+ *pscan_len = table_block_parallelscan_estimate(rel);
+
+ *pscanwork_len = mul_size(sizeof(PHVScanWorkerState), nworkers);
+}
+
+/*
+ * Compute the amount of space we'll need in the parallel heap vacuum
+ * DSM, and inform pcxt->estimator about our needs.
+ *
+ * nworkers is the number of workers for the table vacuum. Note that it could
+ * be different from pcxt->nworkers, since the latter is the maximum of the
+ * number of workers for table vacuum and index vacuum.
+ */
+void
+heap_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state)
+{
+ Size pscan_len;
+ Size shared_len;
+ Size pscanwork_len;
+
+ heap_parallel_estimate_shared_memory_size(rel, nworkers, &pscan_len,
+ &shared_len, &pscanwork_len);
+
+ /* space for PHVShared */
+ shm_toc_estimate_chunk(&pcxt->estimator, shared_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* space for ParallelBlockTableScanDesc */
+ shm_toc_estimate_chunk(&pcxt->estimator, pscan_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* space for per-worker scan state, PHVScanWorkerState */
+ shm_toc_estimate_chunk(&pcxt->estimator, pscanwork_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/*
+ * Set up shared memory for parallel heap vacuum.
+ */
+void
+heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state)
+{
+ LVRelState *vacrel = (LVRelState *) state;
+ PHVState *phvstate;
+ ParallelBlockTableScanDesc pscan;
+ PHVScanWorkerState *pscanwork;
+ PHVShared *shared;
+ Size pscan_len;
+ Size shared_len;
+ Size pscanwork_len;
+
+ phvstate = (PHVState *) palloc0(sizeof(PHVState));
+ phvstate->min_scanned_blkno = InvalidBlockNumber;
+
+ heap_parallel_estimate_shared_memory_size(rel, nworkers, &pscan_len,
+ &shared_len, &pscanwork_len);
+
+ shared = shm_toc_allocate(pcxt->toc, shared_len);
+
+ /* Prepare the shared information */
+
+ MemSet(shared, 0, shared_len);
+ shared->aggressive = vacrel->aggressive;
+ shared->skipwithvm = vacrel->skipwithvm;
+ shared->cutoffs = vacrel->cutoffs;
+ shared->NewRelfrozenXid = vacrel->scan_state->NewRelfrozenXid;
+ shared->NewRelminMxid = vacrel->scan_state->NewRelminMxid;
+ shared->skippedallvis = vacrel->scan_state->skippedallvis;
+
+ /*
+ * XXX: we copy the contents of vistest to the shared area, but in order
+ * to do that, we need to either expose GlobalVisTest or provide functions
+ * to copy the contents of GlobalVisTest somewhere. Currently we do the
+ * former, but it's not clear that's the best choice.
+ *
+ * An alternative idea is to have each worker determine the cutoff and use
+ * its own vistest. But we need to consider that carefully since parallel
+ * workers would end up having different cutoffs and horizons.
+ */
+ shared->vistest = *vacrel->vistest;
+
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_KEY_SCAN_SHARED, shared);
+
+ phvstate->shared = shared;
+
+ /* prepare the parallel block table scan description */
+ pscan = shm_toc_allocate(pcxt->toc, pscan_len);
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_KEY_SCAN_DESC, pscan);
+
+ /* initialize parallel scan description */
+ table_block_parallelscan_initialize(rel, (ParallelTableScanDesc) pscan);
+
+ /* Disable sync scan to always start from the first block */
+ pscan->base.phs_syncscan = false;
+
+ phvstate->pscandesc = pscan;
+
+ /* prepare the workers' parallel block table scan state */
+ pscanwork = shm_toc_allocate(pcxt->toc, pscanwork_len);
+ MemSet(pscanwork, 0, pscanwork_len);
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_KEY_SCAN_DESC_WORKER, pscanwork);
+ phvstate->scanstates = pscanwork;
+
+ vacrel->phvstate = phvstate;
+}
+
+/*
+ * Main function for parallel heap vacuum workers.
+ */
+void
+heap_parallel_vacuum_worker(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt)
+{
+ LVRelState vacrel = {0};
+ PHVState *phvstate;
+ PHVShared *shared;
+ ParallelBlockTableScanDesc pscandesc;
+ PHVScanWorkerState *scanstate;
+ LVRelScanState *scan_state;
+ ErrorContextCallback errcallback;
+ bool scan_done;
+
+ phvstate = palloc(sizeof(PHVState));
+
+ pscandesc = (ParallelBlockTableScanDesc) shm_toc_lookup(pwcxt->toc,
+ LV_PARALLEL_KEY_SCAN_DESC,
+ false);
+ phvstate->pscandesc = pscandesc;
+
+ shared = (PHVShared *) shm_toc_lookup(pwcxt->toc, LV_PARALLEL_KEY_SCAN_SHARED,
+ false);
+ phvstate->shared = shared;
+
+ scanstate = (PHVScanWorkerState *) shm_toc_lookup(pwcxt->toc,
+ LV_PARALLEL_KEY_SCAN_DESC_WORKER,
+ false);
+
+ phvstate->myscanstate = &(scanstate[ParallelWorkerNumber]);
+ scan_state = &(shared->worker_scan_state[ParallelWorkerNumber]);
+
+ /* Prepare LVRelState */
+ vacrel.rel = rel;
+ vacrel.indrels = parallel_vacuum_get_table_indexes(pvs, &vacrel.nindexes);
+ vacrel.pvs = pvs;
+ vacrel.phvstate = phvstate;
+ vacrel.aggressive = shared->aggressive;
+ vacrel.skipwithvm = shared->skipwithvm;
+ vacrel.cutoffs = shared->cutoffs;
+ vacrel.vistest = &(shared->vistest);
+ vacrel.dead_items = parallel_vacuum_get_dead_items(pvs,
+ &vacrel.dead_items_info);
+ vacrel.rel_pages = RelationGetNumberOfBlocks(rel);
+ vacrel.scan_state = scan_state;
+
+ /* initialize per-worker relation statistics */
+ MemSet(scan_state, 0, sizeof(LVRelScanState));
+
+ /* Set fields necessary for heap scan */
+ vacrel.scan_state->NewRelfrozenXid = shared->NewRelfrozenXid;
+ vacrel.scan_state->NewRelminMxid = shared->NewRelminMxid;
+ vacrel.scan_state->skippedallvis = shared->skippedallvis;
+
+ /* Initialize the per-worker scan state if not yet done */
+ if (!phvstate->myscanstate->initialized)
+ {
+ table_block_parallelscan_startblock_init(rel,
+ &(phvstate->myscanstate->state),
+ phvstate->pscandesc);
+
+ phvstate->myscanstate->last_blkno = InvalidBlockNumber;
+ phvstate->myscanstate->maybe_have_blocks = false;
+ phvstate->myscanstate->initialized = true;
+ }
+
+ /*
+ * Setup error traceback support for ereport() for parallel table vacuum
+ * workers
+ */
+ vacrel.dbname = get_database_name(MyDatabaseId);
+ vacrel.relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ vacrel.relname = pstrdup(RelationGetRelationName(rel));
+ vacrel.indname = NULL;
+ vacrel.phase = VACUUM_ERRCB_PHASE_SCAN_HEAP;
+ errcallback.callback = vacuum_error_callback;
+ errcallback.arg = &vacrel;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ scan_done = do_lazy_scan_heap(&vacrel);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ /*
+ * If the leader or a worker finishes the heap scan because the space for
+ * dead_items TIDs is close to the limit, it might have some allocated blocks
+ * in its scan state. Since this scan state might not be used in the next heap
+ * scan, we remember that it might have some unconsumed blocks so that the
+ * leader can complete the scan after the heap scan phase finishes.
+ */
+ phvstate->myscanstate->maybe_have_blocks = !scan_done;
+}
+
+/*
+ * Complete parallel heap scans that have remaining blocks in their
+ * chunks.
+ */
+static void
+parallel_heap_complete_unfinished_scan(LVRelState *vacrel)
+{
+ int nworkers;
+
+ Assert(!IsParallelWorker());
+
+ nworkers = parallel_vacuum_get_nworkers_table(vacrel->pvs);
+
+ for (int i = 0; i < nworkers; i++)
+ {
+ PHVScanWorkerState *wstate = &(vacrel->phvstate->scanstates[i]);
+ bool scan_done PG_USED_FOR_ASSERTS_ONLY;
+
+ if (!wstate->maybe_have_blocks)
+ continue;
+
+ /* Attach the worker's scan state and do heap scan */
+ vacrel->phvstate->myscanstate = wstate;
+ scan_done = do_lazy_scan_heap(vacrel);
+
+ Assert(scan_done);
+ }
+
+ /*
+ * We don't need to gather the scan results here because the leader's scan
+ * state got updated directly.
+ */
+}
+
+/*
+ * Compute the minimum block number we have scanned so far and update
+ * vacrel->phvstate->min_scanned_blkno.
+ */
+static void
+parallel_heap_vacuum_compute_min_scanned_blkno(LVRelState *vacrel)
+{
+ PHVState *phvstate = vacrel->phvstate;
+
+ Assert(ParallelHeapVacuumIsActive(vacrel));
+
+ /*
+ * We check all worker scan states here to compute the minimum block
+ * number among all scan states.
+ */
+ for (int i = 0; i < phvstate->nworkers_launched; i++)
+ {
+ PHVScanWorkerState *wstate = &(phvstate->scanstates[i]);
+
+ /* Skip if the worker has not initialized its scan state */
+ if (!wstate->initialized)
+ continue;
+
+ if (!BlockNumberIsValid(phvstate->min_scanned_blkno) ||
+ wstate->last_blkno < phvstate->min_scanned_blkno)
+ phvstate->min_scanned_blkno = wstate->last_blkno;
+ }
+}
+
+/* Accumulate each worker's scan results into the leader's */
+static void
+parallel_heap_vacuum_gather_scan_results(LVRelState *vacrel)
+{
+ PHVState *phvstate = vacrel->phvstate;
+
+ Assert(ParallelHeapVacuumIsActive(vacrel));
+ Assert(!IsParallelWorker());
+
+ /* Gather the workers' scan results */
+ for (int i = 0; i < phvstate->nworkers_launched; i++)
+ {
+ LVRelScanState *ss = &(phvstate->shared->worker_scan_state[i]);
+
+ vacrel->scan_state->scanned_pages += ss->scanned_pages;
+ vacrel->scan_state->removed_pages += ss->removed_pages;
+ vacrel->scan_state->vm_new_frozen_pages += ss->vm_new_frozen_pages;
+ vacrel->scan_state->lpdead_item_pages += ss->lpdead_item_pages;
+ vacrel->scan_state->missed_dead_pages += ss->missed_dead_pages;
+ vacrel->scan_state->tuples_deleted += ss->tuples_deleted;
+ vacrel->scan_state->tuples_frozen += ss->tuples_frozen;
+ vacrel->scan_state->lpdead_items += ss->lpdead_items;
+ vacrel->scan_state->live_tuples += ss->live_tuples;
+ vacrel->scan_state->recently_dead_tuples += ss->recently_dead_tuples;
+ vacrel->scan_state->missed_dead_tuples += ss->missed_dead_tuples;
+
+ if (ss->nonempty_pages > vacrel->scan_state->nonempty_pages)
+ vacrel->scan_state->nonempty_pages = ss->nonempty_pages;
+
+ if (TransactionIdPrecedes(ss->NewRelfrozenXid, vacrel->scan_state->NewRelfrozenXid))
+ vacrel->scan_state->NewRelfrozenXid = ss->NewRelfrozenXid;
+
+ if (MultiXactIdPrecedesOrEquals(ss->NewRelminMxid, vacrel->scan_state->NewRelminMxid))
+ vacrel->scan_state->NewRelminMxid = ss->NewRelminMxid;
+
+ if (!vacrel->scan_state->skippedallvis && ss->skippedallvis)
+ vacrel->scan_state->skippedallvis = true;
+ }
+
+ /* Also, compute the minimum block number we scanned so far */
+ parallel_heap_vacuum_compute_min_scanned_blkno(vacrel);
+}
+
+/*
+ * A parallel variant of do_lazy_scan_heap(). The leader process launches parallel
+ * workers to scan the heap in parallel.
+ */
+static void
+do_parallel_lazy_scan_heap(LVRelState *vacrel)
+{
+ PHVScanWorkerState *scanstate;
+
+ Assert(ParallelHeapVacuumIsActive(vacrel));
+ Assert(!IsParallelWorker());
+
+ /* launch parallel workers */
+ vacrel->phvstate->nworkers_launched = parallel_vacuum_table_scan_begin(vacrel->pvs);
+
+ /* initialize the leader's own scan state so it can join as a worker */
+ scanstate = palloc0(sizeof(PHVScanWorkerState));
+ scanstate->last_blkno = InvalidBlockNumber;
+ table_block_parallelscan_startblock_init(vacrel->rel, &(scanstate->state),
+ vacrel->phvstate->pscandesc);
+ vacrel->phvstate->myscanstate = scanstate;
+
+ for (;;)
+ {
+ bool scan_done;
+
+ /*
+ * Scan the table until either we are close to overrunning the
+ * available space for dead_items TIDs or we reach the end of the
+ * table.
+ */
+ scan_done = do_lazy_scan_heap(vacrel);
+
+ /* wait for parallel workers to finish and gather scan results */
+ parallel_vacuum_table_scan_end(vacrel->pvs);
+ parallel_heap_vacuum_gather_scan_results(vacrel);
+
+ /* We reached the end of the table */
+ if (scan_done)
+ break;
+
+ /*
+ * The parallel heap scan paused in the middle of the table because
+ * the space for dead_items TIDs was nearly full. Perform a round of
+ * index and heap vacuuming, and FSM vacuuming.
+ */
+
+ /* Perform a round of index and heap vacuuming */
+ vacrel->consider_bypass_optimization = false;
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the Free Space Map to make newly-freed space visible on
+ * upper-level FSM pages.
+ */
+ if (vacrel->phvstate->min_scanned_blkno > vacrel->next_fsm_block_to_vacuum)
+ {
+ /*
+ * min_scanned_blkno was updated when gathering the workers' scan
+ * results.
+ */
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ vacrel->phvstate->min_scanned_blkno + 1);
+ vacrel->next_fsm_block_to_vacuum = vacrel->phvstate->min_scanned_blkno;
+ }
+
+ /* Report that we are once again scanning the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+ /* Re-launch workers to restart parallel heap scan */
+ vacrel->phvstate->nworkers_launched =
+ parallel_vacuum_table_scan_begin(vacrel->pvs);
+ }
+
+ /*
+ * The parallel heap scan finished, but it's possible that some workers
+ * have allocated blocks but not processed them yet. This can happen, for
+ * example, when workers exit because the space for dead_items TIDs is full
+ * and the leader process launches fewer workers in the next cycle.
+ */
+ parallel_heap_complete_unfinished_scan(vacrel);
+}
+
/*
* Error context callback for errors occurring during vacuum. The error
* context messages for index phases should match the messages set in parallel
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 50dd3d7d14d..3001be84ddf 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -6,15 +6,24 @@
* This file contains routines that are intended to support setting up, using,
* and tearing down a ParallelVacuumState.
*
- * In a parallel vacuum, we perform both index bulk deletion and index cleanup
- * with parallel worker processes. Individual indexes are processed by one
- * vacuum process. ParallelVacuumState contains shared information as well as
- * the memory space for storing dead items allocated in the DSA area. We
- * launch parallel worker processes at the start of parallel index
- * bulk-deletion and index cleanup and once all indexes are processed, the
- * parallel worker processes exit. Each time we process indexes in parallel,
- * the parallel context is re-initialized so that the same DSM can be used for
- * multiple passes of index bulk-deletion and index cleanup.
+ * In a parallel vacuum, we perform the table scan, index bulk deletion and
+ * index cleanup, or all of them with parallel worker processes. Different
+ * numbers of workers are launched for table vacuuming and index processing.
+ * ParallelVacuumState contains shared information as well as the memory space
+ * for storing dead items allocated in the DSA area.
+ *
+ * When initializing a parallel table vacuum scan, we invoke table AM routines for
+ * estimating DSM sizes and initializing DSM memory. Parallel table vacuum
+ * workers invoke the table AM routine for vacuuming the table.
+ *
+ * For processing indexes in parallel, individual indexes are processed by one
+ * vacuum process. We launch parallel worker processes at the start of parallel index
+ * bulk-deletion and index cleanup and once all indexes are processed, the parallel
+ * worker processes exit.
+ *
+ * Each time we process the table or indexes in parallel, the parallel context is
+ * re-initialized so that the same DSM can be used for multiple passes of table vacuum
+ * or index bulk-deletion and index cleanup.
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -28,6 +37,7 @@
#include "access/amapi.h"
#include "access/table.h"
+#include "access/tableam.h"
#include "access/xact.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
@@ -65,6 +75,12 @@ typedef struct PVShared
int elevel;
uint64 queryid;
+ /*
+ * True if the caller wants parallel workers to invoke the vacuum table scan
+ * callback.
+ */
+ bool do_vacuum_table_scan;
+
/*
* Fields for both index vacuum and cleanup.
*
@@ -101,6 +117,13 @@ typedef struct PVShared
*/
pg_atomic_uint32 cost_balance;
+ /*
+ * The number of workers for parallel table scan/vacuuming and index
+ * vacuuming, respectively.
+ */
+ int nworkers_for_table;
+ int nworkers_for_index;
+
/*
* Number of active parallel workers. This is used for computing the
* minimum threshold of the vacuum cost balance before a worker sleeps for
@@ -164,6 +187,9 @@ struct ParallelVacuumState
/* NULL for worker processes */
ParallelContext *pcxt;
+ /* Passed to parallel table scan workers. NULL for leader process */
+ ParallelWorkerContext *pwcxt;
+
/* Parent Heap Relation */
Relation heaprel;
@@ -193,6 +219,9 @@ struct ParallelVacuumState
/* Points to WAL usage area in DSM */
WalUsage *wal_usage;
+ /* How many times has parallel table vacuum scan been called? */
+ int num_table_scans;
+
/*
* False if the index is totally unsuitable target for all parallel
* processing. For example, the index could be <
@@ -224,8 +253,9 @@ struct ParallelVacuumState
PVIndVacStatus status;
};
-static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
- bool *will_parallel_vacuum);
+static void parallel_vacuum_compute_workers(Relation rel, Relation *indrels, int nindexes,
+ int nrequested, int *nworkers_for_table,
+ int *nworkers_for_index, bool *will_parallel_vacuum);
static void parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool vacuum);
static void parallel_vacuum_process_safe_indexes(ParallelVacuumState *pvs);
static void parallel_vacuum_process_unsafe_indexes(ParallelVacuumState *pvs);
@@ -244,7 +274,7 @@ static void parallel_vacuum_error_callback(void *arg);
ParallelVacuumState *
parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
int nrequested_workers, int vac_work_mem,
- int elevel, BufferAccessStrategy bstrategy)
+ int elevel, BufferAccessStrategy bstrategy, void *state)
{
ParallelVacuumState *pvs;
ParallelContext *pcxt;
@@ -258,6 +288,8 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
Size est_shared_len;
int nindexes_mwm = 0;
int parallel_workers = 0;
+ int nworkers_for_table;
+ int nworkers_for_index;
int querylen;
/*
@@ -265,15 +297,17 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
* relation
*/
Assert(nrequested_workers >= 0);
- Assert(nindexes > 0);
/*
* Compute the number of parallel vacuum workers to launch
*/
will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = parallel_vacuum_compute_workers(indrels, nindexes,
- nrequested_workers,
- will_parallel_vacuum);
+ parallel_vacuum_compute_workers(rel, indrels, nindexes, nrequested_workers,
+ &nworkers_for_table, &nworkers_for_index,
+ will_parallel_vacuum);
+
+ parallel_workers = Max(nworkers_for_table, nworkers_for_index);
+
if (parallel_workers <= 0)
{
/* Can't perform vacuum in parallel -- return NULL */
@@ -329,6 +363,10 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
else
querylen = 0; /* keep compiler quiet */
+ /* Estimate AM-specific space for parallel table vacuum */
+ if (nworkers_for_table > 0)
+ table_parallel_vacuum_estimate(rel, pcxt, nworkers_for_table, state);
+
InitializeParallelDSM(pcxt);
/* Prepare index vacuum stats */
@@ -373,6 +411,8 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
shared->relid = RelationGetRelid(rel);
shared->elevel = elevel;
shared->queryid = pgstat_get_my_query_id();
+ shared->nworkers_for_table = nworkers_for_table;
+ shared->nworkers_for_index = nworkers_for_index;
shared->maintenance_work_mem_worker =
(nindexes_mwm > 0) ?
maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
@@ -421,6 +461,10 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
}
+ /* Prepare AM-specific DSM for parallel table vacuum */
+ if (nworkers_for_table > 0)
+ table_parallel_vacuum_initialize(rel, pcxt, nworkers_for_table, state);
+
/* Success -- return parallel vacuum state */
return pvs;
}
@@ -534,33 +578,48 @@ parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tup
}
/*
- * Compute the number of parallel worker processes to request. Both index
- * vacuum and index cleanup can be executed with parallel workers.
- * The index is eligible for parallel vacuum iff its size is greater than
- * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt performance.
+ * Compute the number of parallel worker processes to request for table
+ * vacuum and index vacuum/cleanup.
+ *
+ * For parallel table vacuum, we ask the AM-specific routine to compute the
+ * number of parallel worker processes. The result is set to *nworkers_for_table.
*
- * nrequested is the number of parallel workers that user requested. If
- * nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
+ * For parallel index vacuum, an index is eligible for parallel vacuum iff
+ * its size is greater than min_parallel_index_scan_size, as invoking workers
+ * for very small indexes can hurt performance. nrequested is the number of
+ * parallel workers that the user requested. If nrequested is 0, we compute the
+ * parallel degree based on nindexes, that is the number of indexes that
+ * support parallel vacuum. This function also sets will_parallel_vacuum to
+ * remember indexes that participate in parallel vacuum.
*/
-static int
-parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
- bool *will_parallel_vacuum)
+static void
+parallel_vacuum_compute_workers(Relation rel, Relation *indrels, int nindexes,
+ int nrequested, int *nworkers_for_table,
+ int *nworkers_for_index, bool *will_parallel_vacuum)
{
int nindexes_parallel = 0;
int nindexes_parallel_bulkdel = 0;
int nindexes_parallel_cleanup = 0;
- int parallel_workers;
+ int parallel_workers_table = 0;
+ int parallel_workers_index = 0;
/*
* We don't allow performing parallel operation in standalone backend or
* when parallelism is disabled.
*/
if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
+ {
+ *nworkers_for_table = 0;
+ *nworkers_for_index = 0;
+ return;
+ }
+
+ /*
+ * Compute the number of workers for parallel table scan. Cap by
+ * max_parallel_maintenance_workers.
+ */
+ parallel_workers_table = Min(table_parallel_vacuum_compute_workers(rel, nrequested),
+ max_parallel_maintenance_workers);
/*
* Compute the number of indexes that can participate in parallel vacuum.
@@ -591,17 +650,18 @@ parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
nindexes_parallel--;
/* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
+ if (nindexes_parallel > 0)
+ {
+ /* Compute the parallel degree for parallel index vacuum */
+ parallel_workers_index = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers_index = Min(parallel_workers_index, max_parallel_maintenance_workers);
+ }
- return parallel_workers;
+ *nworkers_for_table = parallel_workers_table;
+ *nworkers_for_index = parallel_workers_index;
}
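/*
 * Editorial example: with nrequested = 0, a table large enough for 2 scan
 * workers and 4 parallel-safe indexes (the leader participates, so one is
 * subtracted) gives nworkers_for_table = 2 and nworkers_for_index = 3. The
 * caller then sizes the parallel context for Max(2, 3) = 3 workers, and each
 * phase relaunches only as many workers as it needs via
 * ReinitializeParallelWorkers().
 */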
/*
@@ -669,8 +729,12 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool vacuum)
/* Setup the shared cost-based vacuum delay and launch workers */
if (nworkers > 0)
{
- /* Reinitialize parallel context to relaunch parallel workers */
- if (pvs->num_index_scans > 0)
+ /*
+ * Reinitialize parallel context to relaunch parallel workers if we
+ * have used the parallel context for either index vacuuming or table
+ * vacuuming.
+ */
+ if (pvs->num_index_scans > 0 || pvs->num_table_scans > 0)
ReinitializeParallelDSM(pvs->pcxt);
/*
@@ -982,6 +1046,146 @@ parallel_vacuum_index_is_parallel_safe(Relation indrel, int num_index_scans,
return true;
}
+/*
+ * Prepare DSM and shared vacuum delays, and launch parallel workers for parallel
+ * table vacuum. Return the number of parallel workers launched.
+ *
+ * The caller must call parallel_vacuum_table_scan_end() to finish the parallel
+ * table vacuum.
+ */
+int
+parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ if (pvs->shared->nworkers_for_table == 0)
+ return 0;
+
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ pvs->shared->do_vacuum_table_scan = true;
+
+ if (pvs->num_table_scans > 0)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ /*
+ * The number of workers might vary between table vacuum and index
+ * processing.
+ */
+ ReinitializeParallelWorkers(pvs->pcxt, pvs->shared->nworkers_for_table);
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have already
+ * accumulated the remaining balance of heap.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+
+ /* Include the worker count for the leader itself */
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+ }
+
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for table processing (planned: %d)",
+ "launched %d parallel vacuum workers for table processing (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, pvs->shared->nworkers_for_table)));
+
+ return pvs->pcxt->nworkers_launched;
+}
+
+/*
+ * Wait for all workers for parallel table vacuum scan, and gather statistics.
+ */
+void
+parallel_vacuum_table_scan_end(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ if (pvs->shared->nworkers_for_table == 0)
+ return;
+
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ /* Decrement the worker count for the leader itself */
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+
+ /*
+ * Carry the shared balance value to heap scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+
+ pvs->shared->do_vacuum_table_scan = false;
+ pvs->num_table_scans++;
+}
+
+/*
+ * Return the array of indexes associated with the given table to be vacuumed.
+ */
+Relation *
+parallel_vacuum_get_table_indexes(ParallelVacuumState *pvs, int *nindexes)
+{
+ *nindexes = pvs->nindexes;
+
+ return pvs->indrels;
+}
+
+/*
+ * Return the number of workers for parallel table vacuum.
+ */
+int
+parallel_vacuum_get_nworkers_table(ParallelVacuumState *pvs)
+{
+ return pvs->shared->nworkers_for_table;
+}
+
+/*
+ * Return the number of workers for parallel index processing.
+ */
+int
+parallel_vacuum_get_nworkers_index(ParallelVacuumState *pvs)
+{
+ return pvs->shared->nworkers_for_index;
+}
+
+/*
+ * A parallel worker invokes the table-AM-specific vacuum scan callback.
+ */
+static void
+parallel_vacuum_process_table(ParallelVacuumState *pvs)
+{
+ Assert(VacuumActiveNWorkers);
+ Assert(pvs->shared->do_vacuum_table_scan);
+
+ /* Increment the active worker before starting the table vacuum */
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ /* Do table vacuum scan */
+ table_parallel_vacuum_relation_worker(pvs->heaprel, pvs, pvs->pwcxt);
+
+ /*
+ * We have completed the table vacuum so decrement the active worker
+ * count.
+ */
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
/*
* Perform work within a launched parallel process.
*
@@ -1033,7 +1237,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
* matched to the leader's one.
*/
vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
if (shared->maintenance_work_mem_worker > 0)
maintenance_work_mem = shared->maintenance_work_mem_worker;
@@ -1064,6 +1267,10 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
pvs.relname = pstrdup(RelationGetRelationName(rel));
pvs.heaprel = rel;
+ pvs.pwcxt = palloc(sizeof(ParallelWorkerContext));
+ pvs.pwcxt->toc = toc;
+ pvs.pwcxt->seg = seg;
+
/* These fields will be filled during index vacuum or cleanup */
pvs.indname = NULL;
pvs.status = PARALLEL_INDVAC_STATUS_INITIAL;
@@ -1081,8 +1288,16 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
/* Prepare to track buffer usage during parallel execution */
InstrStartParallelQuery();
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&pvs);
+ if (pvs.shared->do_vacuum_table_scan)
+ {
+ /* Process table to perform vacuum */
+ parallel_vacuum_process_table(&pvs);
+ }
+ else
+ {
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_process_safe_indexes(&pvs);
+ }
/* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index c769b1aa3ef..c408183425a 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -99,80 +99,6 @@ typedef struct ProcArrayStruct
int pgprocnos[FLEXIBLE_ARRAY_MEMBER];
} ProcArrayStruct;
-/*
- * State for the GlobalVisTest* family of functions. Those functions can
- * e.g. be used to decide if a deleted row can be removed without violating
- * MVCC semantics: If the deleted row's xmax is not considered to be running
- * by anyone, the row can be removed.
- *
- * To avoid slowing down GetSnapshotData(), we don't calculate a precise
- * cutoff XID while building a snapshot (looking at the frequently changing
- * xmins scales badly). Instead we compute two boundaries while building the
- * snapshot:
- *
- * 1) definitely_needed, indicating that rows deleted by XIDs >=
- * definitely_needed are definitely still visible.
- *
- * 2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can
- * definitely be removed
- *
- * When testing an XID that falls in between the two (i.e. XID >= maybe_needed
- * && XID < definitely_needed), the boundaries can be recomputed (using
- * ComputeXidHorizons()) to get a more accurate answer. This is cheaper than
- * maintaining an accurate value all the time.
- *
- * As it is not cheap to compute accurate boundaries, we limit the number of
- * times that happens in short succession. See GlobalVisTestShouldUpdate().
- *
- *
- * There are three backend lifetime instances of this struct, optimized for
- * different types of relations. As e.g. a normal user defined table in one
- * database is inaccessible to backends connected to another database, a test
- * specific to a relation can be more aggressive than a test for a shared
- * relation. Currently we track four different states:
- *
- * 1) GlobalVisSharedRels, which only considers an XID's
- * effects visible-to-everyone if neither snapshots in any database, nor a
- * replication slot's xmin, nor a replication slot's catalog_xmin might
- * still consider XID as running.
- *
- * 2) GlobalVisCatalogRels, which only considers an XID's
- * effects visible-to-everyone if neither snapshots in the current
- * database, nor a replication slot's xmin, nor a replication slot's
- * catalog_xmin might still consider XID as running.
- *
- * I.e. the difference to GlobalVisSharedRels is that
- * snapshot in other databases are ignored.
- *
- * 3) GlobalVisDataRels, which only considers an XID's
- * effects visible-to-everyone if neither snapshots in the current
- * database, nor a replication slot's xmin consider XID as running.
- *
- * I.e. the difference to GlobalVisCatalogRels is that
- * replication slot's catalog_xmin is not taken into account.
- *
- * 4) GlobalVisTempRels, which only considers the current session, as temp
- * tables are not visible to other sessions.
- *
- * GlobalVisTestFor(relation) returns the appropriate state
- * for the relation.
- *
- * The boundaries are FullTransactionIds instead of TransactionIds to avoid
- * wraparound dangers. There e.g. would otherwise exist no procarray state to
- * prevent maybe_needed to become old enough after the GetSnapshotData()
- * call.
- *
- * The typedef is in the header.
- */
-struct GlobalVisState
-{
- /* XIDs >= are considered running by some backend */
- FullTransactionId definitely_needed;
-
- /* XIDs < are not considered to be running by any backend */
- FullTransactionId maybe_needed;
-};
-
/*
* Result of ComputeXidHorizons().
*/
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 04afb1a6a66..740b69d35ef 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -21,6 +21,7 @@
#include "access/skey.h"
#include "access/table.h" /* for backward compatibility */
#include "access/tableam.h"
+#include "commands/vacuum.h"
#include "nodes/lockoptions.h"
#include "nodes/primnodes.h"
#include "storage/bufpage.h"
@@ -401,6 +402,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
+extern int heap_parallel_vacuum_compute_workers(Relation rel, int requested);
+extern void heap_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state);
+extern void heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state);
+extern void heap_parallel_vacuum_worker(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index bb32de11ea0..c4f516dda14 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -20,6 +20,7 @@
#include "access/relscan.h"
#include "access/sdir.h"
#include "access/xact.h"
+#include "commands/vacuum.h"
#include "executor/tuptable.h"
#include "storage/read_stream.h"
#include "utils/rel.h"
@@ -654,6 +655,47 @@ typedef struct TableAmRoutine
struct VacuumParams *params,
BufferAccessStrategy bstrategy);
+ /* ------------------------------------------------------------------------
+ * Callbacks for parallel table vacuum.
+ * ------------------------------------------------------------------------
+ */
+
+ /*
+ * Compute the number of parallel workers for parallel table vacuum. The
+ * function must return 0 to disable parallel table vacuum.
+ */
+ int (*parallel_vacuum_compute_workers) (Relation rel, int requested);
+
+ /*
+ * Estimate the size of shared memory that the parallel table vacuum needs
+ * for the table AM.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_estimate) (Relation rel,
+ ParallelContext *pcxt,
+ int nworkers,
+ void *state);
+
+ /*
+ * Initialize DSM space for parallel table vacuum.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_initialize) (Relation rel,
+ ParallelContext *pctx,
+ int nworkers,
+ void *state);
+
+ /*
+ * This callback is called for parallel table vacuum workers.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_relation_worker) (Relation rel,
+ ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt);
+
/*
* Prepare to analyze block `blockno` of `scan`. The scan has been started
* with table_beginscan_analyze(). See also
@@ -1715,6 +1757,52 @@ table_relation_vacuum(Relation rel, struct VacuumParams *params,
rel->rd_tableam->relation_vacuum(rel, params, bstrategy);
}
+/* ----------------------------------------------------------------------------
+ * Parallel vacuum related functions.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * Return the number of parallel workers for a parallel vacuum scan of this
+ * relation.
+ */
+static inline int
+table_parallel_vacuum_compute_workers(Relation rel, int requested)
+{
+ return rel->rd_tableam->parallel_vacuum_compute_workers(rel, requested);
+}
+
+/*
+ * Estimate the size of shared memory needed for a parallel vacuum scan of
+ * this relation.
+ */
+static inline void
+table_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt, int nworkers,
+ void *state)
+{
+ rel->rd_tableam->parallel_vacuum_estimate(rel, pcxt, nworkers, state);
+}
+
+/*
+ * Initialize shared memory area for a parallel vacuum scan of this relation.
+ */
+static inline void
+table_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt, int nworkers,
+ void *state)
+{
+ rel->rd_tableam->parallel_vacuum_initialize(rel, pcxt, nworkers, state);
+}
+
+/*
+ * Start parallel table vacuuming for this relation.
+ */
+static inline void
+table_parallel_vacuum_relation_worker(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt)
+{
+ rel->rd_tableam->parallel_vacuum_relation_worker(rel, pvs, pwcxt);
+}
+
/*
* Prepare to analyze the next block in the read stream. The scan needs to
* have been started with table_beginscan_analyze(). Note that this routine
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 7613d00e26f..b70e50133fa 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -360,7 +360,8 @@ extern void VacuumUpdateCosts(void);
extern ParallelVacuumState *parallel_vacuum_init(Relation rel, Relation *indrels,
int nindexes, int nrequested_workers,
int vac_work_mem, int elevel,
- BufferAccessStrategy bstrategy);
+ BufferAccessStrategy bstrategy,
+ void *state);
extern void parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats);
extern TidStore *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs,
VacDeadItemsInfo **dead_items_info_p);
@@ -370,6 +371,11 @@ extern void parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs,
extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
long num_table_tuples,
bool estimated_count);
+extern int parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs);
+extern void parallel_vacuum_table_scan_end(ParallelVacuumState *pvs);
+extern int parallel_vacuum_get_nworkers_table(ParallelVacuumState *pvs);
+extern int parallel_vacuum_get_nworkers_index(ParallelVacuumState *pvs);
+extern Relation *parallel_vacuum_get_table_indexes(ParallelVacuumState *pvs, int *nindexes);
extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in commands/analyze.c */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index afc284e9c36..c9d7a39d605 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -17,6 +17,7 @@
#include "utils/relcache.h"
#include "utils/resowner.h"
#include "utils/snapshot.h"
+#include "utils/snapmgr_internal.h"
extern PGDLLIMPORT bool FirstSnapshotSet;
@@ -96,7 +97,6 @@ extern char *ExportSnapshot(Snapshot snapshot);
* These live in procarray.c because they're intimately linked to the
* procarray contents, but thematically they better fit into snapmgr.h.
*/
-typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
diff --git a/src/include/utils/snapmgr_internal.h b/src/include/utils/snapmgr_internal.h
new file mode 100644
index 00000000000..241121872b7
--- /dev/null
+++ b/src/include/utils/snapmgr_internal.h
@@ -0,0 +1,89 @@
+/*-------------------------------------------------------------------------
+ *
+ * snapmgr_internal.h
+ *   Struct declarations for internal use by the snapshot manager.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/snapmgr_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SNAPMGR_INTERNAL_H
+#define SNAPMGR_INTERNAL_H
+
+/*
+ * State for the GlobalVisTest* family of functions. Those functions can
+ * e.g. be used to decide if a deleted row can be removed without violating
+ * MVCC semantics: If the deleted row's xmax is not considered to be running
+ * by anyone, the row can be removed.
+ *
+ * To avoid slowing down GetSnapshotData(), we don't calculate a precise
+ * cutoff XID while building a snapshot (looking at the frequently changing
+ * xmins scales badly). Instead we compute two boundaries while building the
+ * snapshot:
+ *
+ * 1) definitely_needed, indicating that rows deleted by XIDs >=
+ * definitely_needed are definitely still visible.
+ *
+ * 2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can
+ * definitely be removed
+ *
+ * When testing an XID that falls in between the two (i.e. XID >= maybe_needed
+ * && XID < definitely_needed), the boundaries can be recomputed (using
+ * ComputeXidHorizons()) to get a more accurate answer. This is cheaper than
+ * maintaining an accurate value all the time.
+ *
+ * As it is not cheap to compute accurate boundaries, we limit the number of
+ * times that happens in short succession. See GlobalVisTestShouldUpdate().
+ *
+ *
+ * There are three backend lifetime instances of this struct, optimized for
+ * different types of relations. As e.g. a normal user defined table in one
+ * database is inaccessible to backends connected to another database, a test
+ * specific to a relation can be more aggressive than a test for a shared
+ * relation. Currently we track four different states:
+ *
+ * 1) GlobalVisSharedRels, which only considers an XID's
+ * effects visible-to-everyone if neither snapshots in any database, nor a
+ * replication slot's xmin, nor a replication slot's catalog_xmin might
+ * still consider XID as running.
+ *
+ * 2) GlobalVisCatalogRels, which only considers an XID's
+ * effects visible-to-everyone if neither snapshots in the current
+ * database, nor a replication slot's xmin, nor a replication slot's
+ * catalog_xmin might still consider XID as running.
+ *
+ * I.e. the difference to GlobalVisSharedRels is that
+ * snapshot in other databases are ignored.
+ *
+ * 3) GlobalVisDataRels, which only considers an XID's
+ * effects visible-to-everyone if neither snapshots in the current
+ * database, nor a replication slot's xmin consider XID as running.
+ *
+ * I.e. the difference to GlobalVisCatalogRels is that
+ * replication slot's catalog_xmin is not taken into account.
+ *
+ * 4) GlobalVisTempRels, which only considers the current session, as temp
+ * tables are not visible to other sessions.
+ *
+ * GlobalVisTestFor(relation) returns the appropriate state
+ * for the relation.
+ *
+ * The boundaries are FullTransactionIds instead of TransactionIds to avoid
+ * wraparound dangers. There e.g. would otherwise exist no procarray state to
+ * prevent maybe_needed to become old enough after the GetSnapshotData()
+ * call.
+ */
+typedef struct GlobalVisState
+{
+ /* XIDs >= are considered running by some backend */
+ FullTransactionId definitely_needed;
+
+ /* XIDs < are not considered to be running by any backend */
+ FullTransactionId maybe_needed;
+} GlobalVisState;
+
+#endif /* SNAPMGR_INTERNAL_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c4e0477c0d4..a0a0c9faadf 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1841,6 +1841,9 @@ PGresAttValue
PGresParamDesc
PGresult
PGresult_data
+PHVScanWorkerState
+PHVShared
+PHVState
PIO_STATUS_BLOCK
PLAINTREE
PLAssignStmt
--
2.43.5
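For orientation, here is a minimal sketch (not an excerpt from the patch) of how the heap AM could wire the proposed callbacks into its TableAmRoutine, using the functions the patch declares in heapam.h:
```
#include "access/heapam.h"
#include "access/tableam.h"

/*
 * Sketch only: fill in the proposed parallel table vacuum callbacks with
 * the heapam functions declared above; all other members are elided.
 */
static const TableAmRoutine heapam_methods = {
    .type = T_TableAmRoutine,

    /* ... existing callbacks elided ... */
    .relation_vacuum = heap_vacuum_rel,

    /* proposed callbacks for parallel table vacuum */
    .parallel_vacuum_compute_workers = heap_parallel_vacuum_compute_workers,
    .parallel_vacuum_estimate = heap_parallel_vacuum_estimate,
    .parallel_vacuum_initialize = heap_parallel_vacuum_initialize,
    .parallel_vacuum_relation_worker = heap_parallel_vacuum_worker,
};
```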
Attachment: v5-0001-Move-lazy-heap-scanning-related-variables-to-stru.patch
From 56a15d51dab3fdfc0f9b0e902a1bff2b60551b30 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 15 Nov 2024 14:14:13 -0800
Subject: [PATCH v5 1/8] Move lazy heap scanning related variables to struct
LVRelScanState.
---
src/backend/access/heap/vacuumlazy.c | 300 ++++++++++++++-------------
src/tools/pgindent/typedefs.list | 1 +
2 files changed, 157 insertions(+), 144 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f2ca9430581..05406a0bc5a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -131,6 +131,47 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE,
} VacErrPhase;
+/*
+ * Relation statistics collected during heap scanning.
+ */
+typedef struct LVRelScanState
+{
+ BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */
+ BlockNumber removed_pages; /* # pages removed by relation truncation */
+ BlockNumber new_frozen_tuple_pages; /* # pages with newly frozen tuples */
+
+ /* # pages newly set all-visible in the VM */
+ BlockNumber vm_new_visible_pages;
+
+ /*
+ * # pages newly set all-visible and all-frozen in the VM. This is a
+ * subset of vm_new_visible_pages. That is, vm_new_visible_pages includes
+ * all pages set all-visible, but vm_new_visible_frozen_pages includes
+ * only those which were also set all-frozen.
+ */
+ BlockNumber vm_new_visible_frozen_pages;
+
+ /* # all-visible pages newly set all-frozen in the VM */
+ BlockNumber vm_new_frozen_pages;
+
+ BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
+ BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
+ BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+
+ /* Counters that follow are only for scanned_pages */
+ int64 tuples_deleted; /* # deleted from table */
+ int64 tuples_frozen; /* # newly frozen */
+ int64 lpdead_items; /* # deleted from indexes */
+ int64 live_tuples; /* # live tuples remaining */
+ int64 recently_dead_tuples; /* # dead, but not yet removable */
+ int64 missed_dead_tuples; /* # removable, but not removed */
+
+ /* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid. */
+ TransactionId NewRelfrozenXid;
+ MultiXactId NewRelminMxid;
+ bool skippedallvis;
+} LVRelScanState;
+
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -157,10 +198,6 @@ typedef struct LVRelState
/* VACUUM operation's cutoffs for freezing and pruning */
struct VacuumCutoffs cutoffs;
GlobalVisState *vistest;
- /* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
- TransactionId NewRelfrozenXid;
- MultiXactId NewRelminMxid;
- bool skippedallvis;
/* Error reporting state */
char *dbname;
@@ -186,43 +223,18 @@ typedef struct LVRelState
VacDeadItemsInfo *dead_items_info;
BlockNumber rel_pages; /* total number of pages */
- BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */
- BlockNumber removed_pages; /* # pages removed by relation truncation */
- BlockNumber new_frozen_tuple_pages; /* # pages with newly frozen tuples */
-
- /* # pages newly set all-visible in the VM */
- BlockNumber vm_new_visible_pages;
-
- /*
- * # pages newly set all-visible and all-frozen in the VM. This is a
- * subset of vm_new_visible_pages. That is, vm_new_visible_pages includes
- * all pages set all-visible, but vm_new_visible_frozen_pages includes
- * only those which were also set all-frozen.
- */
- BlockNumber vm_new_visible_frozen_pages;
- /* # all-visible pages newly set all-frozen in the VM */
- BlockNumber vm_new_frozen_pages;
+ /* Working state for heap scanning and vacuuming */
+ LVRelScanState *scan_state;
- BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
- BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
- BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
-
- /* Statistics output by us, for table */
- double new_rel_tuples; /* new estimated total # of tuples */
- double new_live_tuples; /* new estimated total # of live tuples */
+ /* New estimated total # of tuples and total # of live tuples */
+ double new_rel_tuples;
+ double new_live_tuples;
/* Statistics output by index AMs */
IndexBulkDeleteResult **indstats;
/* Instrumentation counters */
int num_index_scans;
- /* Counters that follow are only for scanned_pages */
- int64 tuples_deleted; /* # deleted from table */
- int64 tuples_frozen; /* # newly frozen */
- int64 lpdead_items; /* # deleted from indexes */
- int64 live_tuples; /* # live tuples remaining */
- int64 recently_dead_tuples; /* # dead, but not yet removable */
- int64 missed_dead_tuples; /* # removable, but not removed */
/* State maintained by heap_vac_scan_next_block() */
BlockNumber current_block; /* last block returned */
@@ -309,6 +321,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
BufferAccessStrategy bstrategy)
{
LVRelState *vacrel;
+ LVRelScanState *scan_state;
bool verbose,
instrument,
skipwithvm,
@@ -420,12 +433,23 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
}
/* Initialize page counters explicitly (be tidy) */
- vacrel->scanned_pages = 0;
- vacrel->removed_pages = 0;
- vacrel->new_frozen_tuple_pages = 0;
- vacrel->lpdead_item_pages = 0;
- vacrel->missed_dead_pages = 0;
- vacrel->nonempty_pages = 0;
+ scan_state = palloc(sizeof(LVRelScanState));
+ scan_state->scanned_pages = 0;
+ scan_state->removed_pages = 0;
+ scan_state->new_frozen_tuple_pages = 0;
+ scan_state->lpdead_item_pages = 0;
+ scan_state->missed_dead_pages = 0;
+ scan_state->nonempty_pages = 0;
+ scan_state->tuples_deleted = 0;
+ scan_state->tuples_frozen = 0;
+ scan_state->lpdead_items = 0;
+ scan_state->live_tuples = 0;
+ scan_state->recently_dead_tuples = 0;
+ scan_state->missed_dead_tuples = 0;
+ scan_state->vm_new_visible_pages = 0;
+ scan_state->vm_new_visible_frozen_pages = 0;
+ scan_state->vm_new_frozen_pages = 0;
+ vacrel->scan_state = scan_state;
/* dead_items_alloc allocates vacrel->dead_items later on */
/* Allocate/initialize output statistics state */
@@ -434,19 +458,6 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->indstats = (IndexBulkDeleteResult **)
palloc0(vacrel->nindexes * sizeof(IndexBulkDeleteResult *));
- /* Initialize remaining counters (be tidy) */
- vacrel->num_index_scans = 0;
- vacrel->tuples_deleted = 0;
- vacrel->tuples_frozen = 0;
- vacrel->lpdead_items = 0;
- vacrel->live_tuples = 0;
- vacrel->recently_dead_tuples = 0;
- vacrel->missed_dead_tuples = 0;
-
- vacrel->vm_new_visible_pages = 0;
- vacrel->vm_new_visible_frozen_pages = 0;
- vacrel->vm_new_frozen_pages = 0;
-
/*
* Get cutoffs that determine which deleted tuples are considered DEAD,
* not just RECENTLY_DEAD, and which XIDs/MXIDs to freeze. Then determine
@@ -467,9 +478,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
vacrel->vistest = GlobalVisTestFor(rel);
/* Initialize state used to track oldest extant XID/MXID */
- vacrel->NewRelfrozenXid = vacrel->cutoffs.OldestXmin;
- vacrel->NewRelminMxid = vacrel->cutoffs.OldestMxact;
- vacrel->skippedallvis = false;
+ vacrel->scan_state->NewRelfrozenXid = vacrel->cutoffs.OldestXmin;
+ vacrel->scan_state->NewRelminMxid = vacrel->cutoffs.OldestMxact;
+ vacrel->scan_state->skippedallvis = false;
skipwithvm = true;
if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
{
@@ -550,15 +561,15 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* value >= FreezeLimit, and relminmxid to a value >= MultiXactCutoff.
* Non-aggressive VACUUMs may advance them by any amount, or not at all.
*/
- Assert(vacrel->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
+ Assert(vacrel->scan_state->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
TransactionIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.FreezeLimit :
vacrel->cutoffs.relfrozenxid,
- vacrel->NewRelfrozenXid));
- Assert(vacrel->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
+ vacrel->scan_state->NewRelfrozenXid));
+ Assert(vacrel->scan_state->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
MultiXactIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.MultiXactCutoff :
vacrel->cutoffs.relminmxid,
- vacrel->NewRelminMxid));
- if (vacrel->skippedallvis)
+ vacrel->scan_state->NewRelminMxid));
+ if (vacrel->scan_state->skippedallvis)
{
/*
* Must keep original relfrozenxid in a non-aggressive VACUUM that
@@ -566,8 +577,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* values will have missed unfrozen XIDs from the pages we skipped.
*/
Assert(!vacrel->aggressive);
- vacrel->NewRelfrozenXid = InvalidTransactionId;
- vacrel->NewRelminMxid = InvalidMultiXactId;
+ vacrel->scan_state->NewRelfrozenXid = InvalidTransactionId;
+ vacrel->scan_state->NewRelminMxid = InvalidMultiXactId;
}
/*
@@ -588,7 +599,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
*/
vac_update_relstats(rel, new_rel_pages, vacrel->new_live_tuples,
new_rel_allvisible, vacrel->nindexes > 0,
- vacrel->NewRelfrozenXid, vacrel->NewRelminMxid,
+ vacrel->scan_state->NewRelfrozenXid, vacrel->scan_state->NewRelminMxid,
&frozenxid_updated, &minmulti_updated, false);
/*
@@ -604,8 +615,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
pgstat_report_vacuum(RelationGetRelid(rel),
rel->rd_rel->relisshared,
Max(vacrel->new_live_tuples, 0),
- vacrel->recently_dead_tuples +
- vacrel->missed_dead_tuples);
+ vacrel->scan_state->recently_dead_tuples +
+ vacrel->scan_state->missed_dead_tuples);
pgstat_progress_end_command();
if (instrument)
@@ -678,21 +689,21 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->relname,
vacrel->num_index_scans);
appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total)\n"),
- vacrel->removed_pages,
+ vacrel->scan_state->removed_pages,
new_rel_pages,
- vacrel->scanned_pages,
+ vacrel->scan_state->scanned_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->scanned_pages / orig_rel_pages);
+ 100.0 * vacrel->scan_state->scanned_pages / orig_rel_pages);
appendStringInfo(&buf,
_("tuples: %lld removed, %lld remain, %lld are dead but not yet removable\n"),
- (long long) vacrel->tuples_deleted,
+ (long long) vacrel->scan_state->tuples_deleted,
(long long) vacrel->new_rel_tuples,
- (long long) vacrel->recently_dead_tuples);
- if (vacrel->missed_dead_tuples > 0)
+ (long long) vacrel->scan_state->recently_dead_tuples);
+ if (vacrel->scan_state->missed_dead_tuples > 0)
appendStringInfo(&buf,
_("tuples missed: %lld dead from %u pages not removed due to cleanup lock contention\n"),
- (long long) vacrel->missed_dead_tuples,
- vacrel->missed_dead_pages);
+ (long long) vacrel->scan_state->missed_dead_tuples,
+ vacrel->scan_state->missed_dead_pages);
diff = (int32) (ReadNextTransactionId() -
vacrel->cutoffs.OldestXmin);
appendStringInfo(&buf,
@@ -700,33 +711,33 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->cutoffs.OldestXmin, diff);
if (frozenxid_updated)
{
- diff = (int32) (vacrel->NewRelfrozenXid -
+ diff = (int32) (vacrel->scan_state->NewRelfrozenXid -
vacrel->cutoffs.relfrozenxid);
appendStringInfo(&buf,
_("new relfrozenxid: %u, which is %d XIDs ahead of previous value\n"),
- vacrel->NewRelfrozenXid, diff);
+ vacrel->scan_state->NewRelfrozenXid, diff);
}
if (minmulti_updated)
{
- diff = (int32) (vacrel->NewRelminMxid -
+ diff = (int32) (vacrel->scan_state->NewRelminMxid -
vacrel->cutoffs.relminmxid);
appendStringInfo(&buf,
_("new relminmxid: %u, which is %d MXIDs ahead of previous value\n"),
- vacrel->NewRelminMxid, diff);
+ vacrel->scan_state->NewRelminMxid, diff);
}
appendStringInfo(&buf, _("frozen: %u pages from table (%.2f%% of total) had %lld tuples frozen\n"),
- vacrel->new_frozen_tuple_pages,
+ vacrel->scan_state->new_frozen_tuple_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->new_frozen_tuple_pages /
+ 100.0 * vacrel->scan_state->new_frozen_tuple_pages /
orig_rel_pages,
- (long long) vacrel->tuples_frozen);
+ (long long) vacrel->scan_state->tuples_frozen);
appendStringInfo(&buf,
_("visibility map: %u pages set all-visible, %u pages set all-frozen (%u were all-visible)\n"),
- vacrel->vm_new_visible_pages,
- vacrel->vm_new_visible_frozen_pages +
- vacrel->vm_new_frozen_pages,
- vacrel->vm_new_frozen_pages);
+ vacrel->scan_state->vm_new_visible_pages,
+ vacrel->scan_state->vm_new_visible_frozen_pages +
+ vacrel->scan_state->vm_new_frozen_pages,
+ vacrel->scan_state->vm_new_frozen_pages);
if (vacrel->do_index_vacuuming)
{
if (vacrel->nindexes == 0 || vacrel->num_index_scans == 0)
@@ -746,10 +757,10 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
msgfmt = _("%u pages from table (%.2f%% of total) have %lld dead item identifiers\n");
}
appendStringInfo(&buf, msgfmt,
- vacrel->lpdead_item_pages,
+ vacrel->scan_state->lpdead_item_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->lpdead_item_pages / orig_rel_pages,
- (long long) vacrel->lpdead_items);
+ 100.0 * vacrel->scan_state->lpdead_item_pages / orig_rel_pages,
+ (long long) vacrel->scan_state->lpdead_items);
for (int i = 0; i < vacrel->nindexes; i++)
{
IndexBulkDeleteResult *istat = vacrel->indstats[i];
@@ -882,7 +893,7 @@ lazy_scan_heap(LVRelState *vacrel)
bool has_lpdead_items;
bool got_cleanup_lock = false;
- vacrel->scanned_pages++;
+ vacrel->scan_state->scanned_pages++;
/* Report as block scanned, update error traceback information */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
@@ -900,7 +911,7 @@ lazy_scan_heap(LVRelState *vacrel)
* one-pass strategy, and the two-pass strategy with the index_cleanup
* param set to 'off'.
*/
- if (vacrel->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
+ if (vacrel->scan_state->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
lazy_check_wraparound_failsafe(vacrel);
/*
@@ -1064,16 +1075,16 @@ lazy_scan_heap(LVRelState *vacrel)
/* now we can compute the new value for pg_class.reltuples */
vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
- vacrel->scanned_pages,
- vacrel->live_tuples);
+ vacrel->scan_state->scanned_pages,
+ vacrel->scan_state->live_tuples);
/*
* Also compute the total number of surviving heap entries. In the
* (unlikely) scenario that new_live_tuples is -1, take it as zero.
*/
vacrel->new_rel_tuples =
- Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
- vacrel->missed_dead_tuples;
+ Max(vacrel->new_live_tuples, 0) + vacrel->scan_state->recently_dead_tuples +
+ vacrel->scan_state->missed_dead_tuples;
/*
* Do index vacuuming (call each index's ambulkdelete routine), then do
@@ -1110,8 +1121,8 @@ lazy_scan_heap(LVRelState *vacrel)
* there are no further blocks to process.
*
* vacrel is an in/out parameter here. Vacuum options and information about
- * the relation are read. vacrel->skippedallvis is set if we skip a block
- * that's all-visible but not all-frozen, to ensure that we don't update
+ * the relation are read. vacrel->scan_state->skippedallvis is set if we skip
+ * a block that's all-visible but not all-frozen, to ensure that we don't update
* relfrozenxid in that case. vacrel also holds information about the next
* unskippable block, as bookkeeping for this function.
*/
@@ -1170,7 +1181,7 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
{
next_block = vacrel->next_unskippable_block;
if (skipsallvis)
- vacrel->skippedallvis = true;
+ vacrel->scan_state->skippedallvis = true;
}
}
@@ -1414,11 +1425,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
+ vacrel->scan_state->vm_new_visible_pages++;
+ vacrel->scan_state->vm_new_visible_frozen_pages++;
}
else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0)
- vacrel->vm_new_frozen_pages++;
+ vacrel->scan_state->vm_new_frozen_pages++;
}
freespace = PageGetHeapFreeSpace(page);
@@ -1488,10 +1499,11 @@ lazy_scan_prune(LVRelState *vacrel,
heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
- &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
+ &vacrel->scan_state->NewRelfrozenXid,
+ &vacrel->scan_state->NewRelminMxid);
- Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
- Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
+ Assert(MultiXactIdIsValid(vacrel->scan_state->NewRelminMxid));
+ Assert(TransactionIdIsValid(vacrel->scan_state->NewRelfrozenXid));
if (presult.nfrozen > 0)
{
@@ -1501,7 +1513,7 @@ lazy_scan_prune(LVRelState *vacrel,
* frozen tuples (don't confuse that with pages newly set all-frozen
* in VM).
*/
- vacrel->new_frozen_tuple_pages++;
+ vacrel->scan_state->new_frozen_tuple_pages++;
}
/*
@@ -1536,7 +1548,7 @@ lazy_scan_prune(LVRelState *vacrel,
*/
if (presult.lpdead_items > 0)
{
- vacrel->lpdead_item_pages++;
+ vacrel->scan_state->lpdead_item_pages++;
/*
* deadoffsets are collected incrementally in
@@ -1551,15 +1563,15 @@ lazy_scan_prune(LVRelState *vacrel,
}
/* Finally, add page-local counts to whole-VACUUM counts */
- vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += presult.nfrozen;
- vacrel->lpdead_items += presult.lpdead_items;
- vacrel->live_tuples += presult.live_tuples;
- vacrel->recently_dead_tuples += presult.recently_dead_tuples;
+ vacrel->scan_state->tuples_deleted += presult.ndeleted;
+ vacrel->scan_state->tuples_frozen += presult.nfrozen;
+ vacrel->scan_state->lpdead_items += presult.lpdead_items;
+ vacrel->scan_state->live_tuples += presult.live_tuples;
+ vacrel->scan_state->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
if (presult.hastup)
- vacrel->nonempty_pages = blkno + 1;
+ vacrel->scan_state->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
@@ -1608,13 +1620,13 @@ lazy_scan_prune(LVRelState *vacrel,
*/
if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
- vacrel->vm_new_visible_pages++;
+ vacrel->scan_state->vm_new_visible_pages++;
if (presult.all_frozen)
- vacrel->vm_new_visible_frozen_pages++;
+ vacrel->scan_state->vm_new_visible_frozen_pages++;
}
else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
presult.all_frozen)
- vacrel->vm_new_frozen_pages++;
+ vacrel->scan_state->vm_new_frozen_pages++;
}
/*
@@ -1700,8 +1712,8 @@ lazy_scan_prune(LVRelState *vacrel,
*/
if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
+ vacrel->scan_state->vm_new_visible_pages++;
+ vacrel->scan_state->vm_new_visible_frozen_pages++;
}
/*
@@ -1709,7 +1721,7 @@ lazy_scan_prune(LVRelState *vacrel,
* above, so we don't need to test the value of old_vmbits.
*/
else
- vacrel->vm_new_frozen_pages++;
+ vacrel->scan_state->vm_new_frozen_pages++;
}
}
@@ -1748,8 +1760,8 @@ lazy_scan_noprune(LVRelState *vacrel,
missed_dead_tuples;
bool hastup;
HeapTupleHeader tupleheader;
- TransactionId NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- MultiXactId NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ TransactionId NoFreezePageRelfrozenXid = vacrel->scan_state->NewRelfrozenXid;
+ MultiXactId NoFreezePageRelminMxid = vacrel->scan_state->NewRelminMxid;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1876,8 +1888,8 @@ lazy_scan_noprune(LVRelState *vacrel,
* this particular page until the next VACUUM. Remember its details now.
* (lazy_scan_prune expects a clean slate, so we have to do this last.)
*/
- vacrel->NewRelfrozenXid = NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = NoFreezePageRelminMxid;
+ vacrel->scan_state->NewRelfrozenXid = NoFreezePageRelfrozenXid;
+ vacrel->scan_state->NewRelminMxid = NoFreezePageRelminMxid;
/* Save any LP_DEAD items found on the page in dead_items */
if (vacrel->nindexes == 0)
@@ -1904,25 +1916,25 @@ lazy_scan_noprune(LVRelState *vacrel,
* indexes will be deleted during index vacuuming (and then marked
* LP_UNUSED in the heap)
*/
- vacrel->lpdead_item_pages++;
+ vacrel->scan_state->lpdead_item_pages++;
dead_items_add(vacrel, blkno, deadoffsets, lpdead_items);
- vacrel->lpdead_items += lpdead_items;
+ vacrel->scan_state->lpdead_items += lpdead_items;
}
/*
* Finally, add relevant page-local counts to whole-VACUUM counts
*/
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
- vacrel->missed_dead_tuples += missed_dead_tuples;
+ vacrel->scan_state->live_tuples += live_tuples;
+ vacrel->scan_state->recently_dead_tuples += recently_dead_tuples;
+ vacrel->scan_state->missed_dead_tuples += missed_dead_tuples;
if (missed_dead_tuples > 0)
- vacrel->missed_dead_pages++;
+ vacrel->scan_state->missed_dead_pages++;
/* Can't truncate this page */
if (hastup)
- vacrel->nonempty_pages = blkno + 1;
+ vacrel->scan_state->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
*has_lpdead_items = (lpdead_items > 0);
@@ -1951,7 +1963,7 @@ lazy_vacuum(LVRelState *vacrel)
/* Should not end up here with no indexes */
Assert(vacrel->nindexes > 0);
- Assert(vacrel->lpdead_item_pages > 0);
+ Assert(vacrel->scan_state->lpdead_item_pages > 0);
if (!vacrel->do_index_vacuuming)
{
@@ -1985,7 +1997,7 @@ lazy_vacuum(LVRelState *vacrel)
BlockNumber threshold;
Assert(vacrel->num_index_scans == 0);
- Assert(vacrel->lpdead_items == vacrel->dead_items_info->num_items);
+ Assert(vacrel->scan_state->lpdead_items == vacrel->dead_items_info->num_items);
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
@@ -2012,7 +2024,7 @@ lazy_vacuum(LVRelState *vacrel)
* cases then this may need to be reconsidered.
*/
threshold = (double) vacrel->rel_pages * BYPASS_THRESHOLD_PAGES;
- bypass = (vacrel->lpdead_item_pages < threshold &&
+ bypass = (vacrel->scan_state->lpdead_item_pages < threshold &&
(TidStoreMemoryUsage(vacrel->dead_items) < (32L * 1024L * 1024L)));
}
@@ -2150,7 +2162,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
* place).
*/
Assert(vacrel->num_index_scans > 0 ||
- vacrel->dead_items_info->num_items == vacrel->lpdead_items);
+ vacrel->dead_items_info->num_items == vacrel->scan_state->lpdead_items);
Assert(allindexes || VacuumFailsafeActive);
/*
@@ -2259,8 +2271,8 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
* the second heap pass. No more, no less.
*/
Assert(vacrel->num_index_scans > 1 ||
- (vacrel->dead_items_info->num_items == vacrel->lpdead_items &&
- vacuumed_pages == vacrel->lpdead_item_pages));
+ (vacrel->dead_items_info->num_items == vacrel->scan_state->lpdead_items &&
+ vacuumed_pages == vacrel->scan_state->lpdead_item_pages));
ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
@@ -2376,14 +2388,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
*/
if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
- vacrel->vm_new_visible_pages++;
+ vacrel->scan_state->vm_new_visible_pages++;
if (all_frozen)
- vacrel->vm_new_visible_frozen_pages++;
+ vacrel->scan_state->vm_new_visible_frozen_pages++;
}
else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
all_frozen)
- vacrel->vm_new_frozen_pages++;
+ vacrel->scan_state->vm_new_frozen_pages++;
}
/* Revert to the previous phase information for error traceback */
@@ -2459,7 +2471,7 @@ static void
lazy_cleanup_all_indexes(LVRelState *vacrel)
{
double reltuples = vacrel->new_rel_tuples;
- bool estimated_count = vacrel->scanned_pages < vacrel->rel_pages;
+ bool estimated_count = vacrel->scan_state->scanned_pages < vacrel->rel_pages;
const int progress_start_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_INDEXES_TOTAL
@@ -2640,7 +2652,7 @@ should_attempt_truncation(LVRelState *vacrel)
if (!vacrel->do_rel_truncate || VacuumFailsafeActive)
return false;
- possibly_freeable = vacrel->rel_pages - vacrel->nonempty_pages;
+ possibly_freeable = vacrel->rel_pages - vacrel->scan_state->nonempty_pages;
if (possibly_freeable > 0 &&
(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
possibly_freeable >= vacrel->rel_pages / REL_TRUNCATE_FRACTION))
@@ -2666,7 +2678,7 @@ lazy_truncate_heap(LVRelState *vacrel)
/* Update error traceback information one last time */
update_vacuum_error_info(vacrel, NULL, VACUUM_ERRCB_PHASE_TRUNCATE,
- vacrel->nonempty_pages, InvalidOffsetNumber);
+ vacrel->scan_state->nonempty_pages, InvalidOffsetNumber);
/*
* Loop until no more truncating can be done.
@@ -2767,7 +2779,7 @@ lazy_truncate_heap(LVRelState *vacrel)
* without also touching reltuples, since the tuple count wasn't
* changed by the truncation.
*/
- vacrel->removed_pages += orig_rel_pages - new_rel_pages;
+ vacrel->scan_state->removed_pages += orig_rel_pages - new_rel_pages;
vacrel->rel_pages = new_rel_pages;
ereport(vacrel->verbose ? INFO : DEBUG2,
@@ -2775,7 +2787,7 @@ lazy_truncate_heap(LVRelState *vacrel)
vacrel->relname,
orig_rel_pages, new_rel_pages)));
orig_rel_pages = new_rel_pages;
- } while (new_rel_pages > vacrel->nonempty_pages && lock_waiter_detected);
+ } while (new_rel_pages > vacrel->scan_state->nonempty_pages && lock_waiter_detected);
}
/*
@@ -2803,7 +2815,7 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
StaticAssertStmt((PREFETCH_SIZE & (PREFETCH_SIZE - 1)) == 0,
"prefetch size must be power of 2");
prefetchedUntil = InvalidBlockNumber;
- while (blkno > vacrel->nonempty_pages)
+ while (blkno > vacrel->scan_state->nonempty_pages)
{
Buffer buf;
Page page;
@@ -2915,7 +2927,7 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
* pages still are; we need not bother to look at the last known-nonempty
* page.
*/
- return vacrel->nonempty_pages;
+ return vacrel->scan_state->nonempty_pages;
}
/*
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index fbdb932e6b6..c4e0477c0d4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1478,6 +1478,7 @@ LPVOID
LPWSTR
LSEG
LUID
+LVRelScanState
LVRelState
LVSavedErrInfo
LWLock
--
2.43.5
Dear Sawada-san,
Thanks for updating the patch. ISTM that 0001 and 0002 can be applied independently.
Therefore I will first post some comments only for them.
Comments for 0001:
```
+ /* New estimated total # of tuples and total # of live tuples */
```
There is an unnecessary blank.
```
+ scan_state = palloc(sizeof(LVRelScanState));
+ scan_state->scanned_pages = 0;
+ scan_state->removed_pages = 0;
+ scan_state->new_frozen_tuple_pages = 0;
+ scan_state->lpdead_item_pages = 0;
+ scan_state->missed_dead_pages = 0;
+ scan_state->nonempty_pages = 0;
+ scan_state->tuples_deleted = 0;
+ scan_state->tuples_frozen = 0;
+ scan_state->lpdead_items = 0;
+ scan_state->live_tuples = 0;
+ scan_state->recently_dead_tuples = 0;
+ scan_state->missed_dead_tuples = 0;
+ scan_state->vm_new_visible_pages = 0;
+ scan_state->vm_new_visible_frozen_pages = 0;
+ scan_state->vm_new_frozen_pages = 0;
+ vacrel->scan_state = scan_state;
```
Since most of attributes are initialized as zero, can you use palloc0() instead?
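For example (this is what the attached diff does), the whole block could become a single zero-initialized allocation:
```
/* All counters start at zero; the XID/MXID fields are assigned later. */
vacrel->scan_state = palloc0(sizeof(LVRelScanState));
```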
```
- * the relation are read. vacrel->skippedallvis is set if we skip a block
- * that's all-visible but not all-frozen, to ensure that we don't update
+ * the relation are read. vacrel->scan_state->skippedallvis is set if we skip
+ * a block that's all-visible but not all-frozen, to ensure that we don't update
* relfrozenxid in that case. vacrel also holds information about the next
```
A line exceeds 80-char limit.
Comments for 0002:
```
+ /* How many time index vacuuming or cleaning up is executed? */
+ int num_index_scans;
+
```
I feel this is a bit confusing because LVRelState also has "num_index_scans".
How about "num_parallel_index_scans"?
The attached patch contains the above changes.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment: kuroda.diffs
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 61b77af09b..c2fa06b674 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -227,7 +227,7 @@ typedef struct LVRelState
/* Working state for heap scanning and vacuuming */
LVRelScanState *scan_state;
- /* New estimated total # of tuples and total # of live tuples */
+ /* New estimated total # of tuples and total # of live tuples */
double new_rel_tuples;
double new_live_tuples;
/* Statistics output by index AMs */
@@ -321,7 +321,6 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
BufferAccessStrategy bstrategy)
{
LVRelState *vacrel;
- LVRelScanState *scan_state;
bool verbose,
instrument,
skipwithvm,
@@ -433,23 +432,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
}
/* Initialize page counters explicitly (be tidy) */
- scan_state = palloc(sizeof(LVRelScanState));
- scan_state->scanned_pages = 0;
- scan_state->removed_pages = 0;
- scan_state->new_frozen_tuple_pages = 0;
- scan_state->lpdead_item_pages = 0;
- scan_state->missed_dead_pages = 0;
- scan_state->nonempty_pages = 0;
- scan_state->tuples_deleted = 0;
- scan_state->tuples_frozen = 0;
- scan_state->lpdead_items = 0;
- scan_state->live_tuples = 0;
- scan_state->recently_dead_tuples = 0;
- scan_state->missed_dead_tuples = 0;
- scan_state->vm_new_visible_pages = 0;
- scan_state->vm_new_visible_frozen_pages = 0;
- scan_state->vm_new_frozen_pages = 0;
- vacrel->scan_state = scan_state;
+ vacrel->scan_state = palloc0(sizeof(LVRelScanState));
/* dead_items_alloc allocates vacrel->dead_items later on */
/* Allocate/initialize output statistics state */
@@ -1122,9 +1105,9 @@ lazy_scan_heap(LVRelState *vacrel)
*
* vacrel is an in/out parameter here. Vacuum options and information about
* the relation are read. vacrel->scan_state->skippedallvis is set if we skip
- * a block that's all-visible but not all-frozen, to ensure that we don't update
- * relfrozenxid in that case. vacrel also holds information about the next
- * unskippable block, as bookkeeping for this function.
+ * a block that's all-visible but not all-frozen, to ensure that we don't
+ * update relfrozenxid in that case. vacrel also holds information about the
+ * next unskippable block, as bookkeeping for this function.
*/
static bool
heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 50dd3d7d14..11282e98a1 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -201,7 +201,7 @@ struct ParallelVacuumState
bool *will_parallel_vacuum;
/* How many time index vacuuming or cleaning up is executed? */
- int num_index_scans;
+ int num_parallel_index_scans;
/*
* The number of indexes that support parallel index bulk-deletion and
@@ -231,7 +231,7 @@ static void parallel_vacuum_process_safe_indexes(ParallelVacuumState *pvs);
static void parallel_vacuum_process_unsafe_indexes(ParallelVacuumState *pvs);
static void parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
PVIndStats *indstats);
-static bool parallel_vacuum_index_is_parallel_safe(Relation indrel, int num_index_scans,
+static bool parallel_vacuum_index_is_parallel_safe(Relation indrel, int num_parallel_index_scans,
bool vacuum);
static void parallel_vacuum_error_callback(void *arg);
@@ -631,7 +631,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool vacuum)
nworkers = pvs->nindexes_parallel_cleanup;
/* Add conditionally parallel-aware indexes if in the first time call */
- if (pvs->num_index_scans == 0)
+ if (pvs->num_parallel_index_scans == 0)
nworkers += pvs->nindexes_parallel_condcleanup;
}
@@ -659,7 +659,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool vacuum)
indstats->parallel_workers_can_process =
(pvs->will_parallel_vacuum[i] &&
parallel_vacuum_index_is_parallel_safe(pvs->indrels[i],
- pvs->num_index_scans,
+ pvs->num_parallel_index_scans,
vacuum));
}
@@ -670,7 +670,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool vacuum)
if (nworkers > 0)
{
/* Reinitialize parallel context to relaunch parallel workers */
- if (pvs->num_index_scans > 0)
+ if (pvs->num_parallel_index_scans > 0)
ReinitializeParallelDSM(pvs->pcxt);
/*
@@ -766,7 +766,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool vacuum)
}
/* Increment the counter */
- pvs->num_index_scans++;
+ pvs->num_parallel_index_scans++;
}
/*
@@ -951,7 +951,8 @@ parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
* parallel index vacuum or parallel index cleanup.
*/
static bool
-parallel_vacuum_index_is_parallel_safe(Relation indrel, int num_index_scans,
+parallel_vacuum_index_is_parallel_safe(Relation indrel,
+ int num_parallel_index_scans,
bool vacuum)
{
uint8 vacoptions;
@@ -975,7 +976,7 @@ parallel_vacuum_index_is_parallel_safe(Relation indrel, int num_index_scans,
* VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes support
* parallel cleanup conditionally.
*/
- if (num_index_scans > 0 &&
+ if (num_parallel_index_scans > 0 &&
((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
return false;
On 12/19/24 23:05, Masahiko Sawada wrote:
On Sat, Dec 14, 2024 at 1:24 PM Tomas Vondra <tomas@vondra.me> wrote:
On 12/13/24 00:04, Tomas Vondra wrote:
...
The main difference is here:
master / no parallel workers:
pages: 0 removed, 221239 remain, 221239 scanned (100.00% of total)
1 parallel worker:
pages: 0 removed, 221239 remain, 10001 scanned (4.52% of total)
Clearly, with parallel vacuum we scan only a tiny fraction of the pages,
essentially just those with deleted tuples, which is ~1/20 of pages.
That's close to the 15x speedup.

This effect is clearest without indexes, but it does affect even runs
with indexes - having to scan the indexes makes it much less pronounced,
though. However, these indexes are pretty massive (about the same size
as the table) - multiple times larger than the table. Chances are it'd
be clearer on realistic data sets.

So the question is - is this correct? And if yes, why doesn't the
regular (serial) vacuum do that?

There are some more strange things, though. For example, how come the avg
read rate is 0.000 MB/s?

avg read rate: 0.000 MB/s, avg write rate: 525.533 MB/s
It scanned 10k pages, i.e. ~80MB of data in 0.15 seconds. Surely that's
not 0.000 MB/s? I guess it's calculated from buffer misses, and all the
pages are in shared buffers (thanks to the DELETE earlier in that session).

OK, after looking into this a bit more I think the reason is rather
simple - SKIP_PAGES_THRESHOLD.

With serial runs, we end up scanning all pages, because even with an
update every 5000 tuples, that's still only ~25 pages apart, well within
the 32-page window. So we end up skipping no pages, scan and vacuum all
everything.

But parallel runs have this skipping logic disabled, or rather the logic
that switches to sequential scans if the gap is less than 32 pages.

IMHO this raises two questions:

1) Shouldn't parallel runs use SKIP_PAGES_THRESHOLD too, i.e. switch to
sequential scans if the pages are close enough. Maybe there is a reason
for this difference? Workers can reduce the difference between random
and sequential I/O, similarly to prefetching. But that just means the
workers should use a lower threshold, e.g. as

SKIP_PAGES_THRESHOLD / nworkers

or something like that? I don't see this discussed in this thread.
Each parallel heap scan worker allocates a chunk of blocks which is
8192 blocks at maximum, so we would need to use the
SKIP_PAGE_THRESHOLD optimization within the chunk. I agree that we
need to evaluate the differences anyway. Will do the benchmark test
and share the results.
Right. I don't think this really matters for small tables, and for large
tables the chunks should be fairly large (possibly up to 8192 blocks),
in which case we could apply SKIP_PAGE_THRESHOLD just like in the serial
case. There might be differences at boundaries between chunks, but that
seems like a minor / expected detail. I haven't checked if the code
would need to change / how much.
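To illustrate the idea, a rough sketch only - next_unskippable_in_chunk() and the chunk bounds are made-up names, not anything from the patch:
```
/*
 * Rough sketch: apply SKIP_PAGES_THRESHOLD within the chunk of blocks
 * assigned to one worker.  next_unskippable_in_chunk() is a hypothetical
 * helper returning the next non-all-visible block up to chunk_end.
 */
static BlockNumber
next_block_to_scan(BlockNumber blkno, BlockNumber chunk_end)
{
    BlockNumber next_unskippable = next_unskippable_in_chunk(blkno, chunk_end);

    /*
     * If the run of skippable (all-visible) pages is shorter than the
     * threshold, keep reading sequentially, as the serial code does.
     */
    if (next_unskippable - blkno < SKIP_PAGES_THRESHOLD)
        return blkno + 1;

    /* Otherwise jump ahead, but never beyond this worker's chunk. */
    return Min(next_unskippable, chunk_end);
}
```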
2) It seems the current SKIP_PAGES_THRESHOLD is awfully high for good
storage. If I can get an order of magnitude improvement (or more than
that) by disabling the threshold, and just doing random I/O, maybe
there's time to adjust it a bit.

Yeah, you've started a thread for this so let's discuss it there.
OK. FWIW as suggested in the other thread, it doesn't seem to be merely
a question of VACUUM performance, as not skipping pages gives vacuum the
opportunity to do cleanup that would otherwise need to happen later.
If only for this reason, I think it would be good to keep the serial and
parallel vacuum consistent.
regards
--
Tomas Vondra
On Wed, Dec 25, 2024 at 8:52 AM Tomas Vondra <tomas@vondra.me> wrote:
I've not evaluated the SKIP_PAGES_THRESHOLD optimization yet, but I'd like
to share the latest patch set, as cfbot reports some failures. Comments
from Kuroda-san are also incorporated in this version. I'd also like to
share the performance test results I did with the latest patch.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
v6-0004-raidxtree.h-support-shared-iteration.patch
From 15c1688c537764c2ef859ccfc9dd506c12eb970a Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 24 Oct 2024 17:29:51 -0700
Subject: [PATCH v6 4/8] radixtree.h: support shared iteration.
This commit supports a shared iteration operation on a radix tree with
multiple processes. The radix tree must be in shared mode to start a
shared iteration. Parallel workers can attach to the shared iteration
using the iterator handle given by the leader process. As with normal
iteration, the shared iteration is guaranteed to return key-values in
ascending order.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
---
src/include/lib/radixtree.h | 227 +++++++++++++++---
.../modules/test_radixtree/test_radixtree.c | 128 ++++++----
2 files changed, 281 insertions(+), 74 deletions(-)
diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h
index 6432b51a246..bfe4c927fa8 100644
--- a/src/include/lib/radixtree.h
+++ b/src/include/lib/radixtree.h
@@ -136,6 +136,9 @@
* RT_LOCK_SHARE - Lock the radix tree in share mode
* RT_UNLOCK - Unlock the radix tree
* RT_GET_HANDLE - Return the handle of the radix tree
+ * RT_BEGIN_ITERATE_SHARED - Begin iterating in shared mode.
+ * RT_ATTACH_ITERATE_SHARED - Attach to the shared iterator.
+ * RT_GET_ITER_HANDLE - Get the handle of the shared iterator.
*
* Optional Interface
* ---------
@@ -179,6 +182,9 @@
#define RT_ATTACH RT_MAKE_NAME(attach)
#define RT_DETACH RT_MAKE_NAME(detach)
#define RT_GET_HANDLE RT_MAKE_NAME(get_handle)
+#define RT_BEGIN_ITERATE_SHARED RT_MAKE_NAME(begin_iterate_shared)
+#define RT_ATTACH_ITERATE_SHARED RT_MAKE_NAME(attach_iterate_shared)
+#define RT_GET_ITER_HANDLE RT_MAKE_NAME(get_iter_handle)
#define RT_LOCK_EXCLUSIVE RT_MAKE_NAME(lock_exclusive)
#define RT_LOCK_SHARE RT_MAKE_NAME(lock_share)
#define RT_UNLOCK RT_MAKE_NAME(unlock)
@@ -238,15 +244,19 @@
#define RT_SHRINK_NODE_16 RT_MAKE_NAME(shrink_child_16)
#define RT_SHRINK_NODE_48 RT_MAKE_NAME(shrink_child_48)
#define RT_SHRINK_NODE_256 RT_MAKE_NAME(shrink_child_256)
+#define RT_INITIALIZE_ITER RT_MAKE_NAME(initialize_iter)
#define RT_NODE_ITERATE_NEXT RT_MAKE_NAME(node_iterate_next)
#define RT_VERIFY_NODE RT_MAKE_NAME(verify_node)
/* type declarations */
#define RT_RADIX_TREE RT_MAKE_NAME(radix_tree)
#define RT_RADIX_TREE_CONTROL RT_MAKE_NAME(radix_tree_control)
+#define RT_ITER_CONTROL RT_MAKE_NAME(iter_control)
#define RT_ITER RT_MAKE_NAME(iter)
#ifdef RT_SHMEM
#define RT_HANDLE RT_MAKE_NAME(handle)
+#define RT_ITER_CONTROL_SHARED RT_MAKE_NAME(iter_control_shared)
+#define RT_ITER_HANDLE RT_MAKE_NAME(iter_handle)
#endif
#define RT_NODE RT_MAKE_NAME(node)
#define RT_CHILD_PTR RT_MAKE_NAME(child_ptr)
@@ -272,6 +282,7 @@ typedef struct RT_ITER RT_ITER;
#ifdef RT_SHMEM
typedef dsa_pointer RT_HANDLE;
+typedef dsa_pointer RT_ITER_HANDLE;
#endif
#ifdef RT_SHMEM
@@ -282,6 +293,9 @@ RT_SCOPE RT_HANDLE RT_GET_HANDLE(RT_RADIX_TREE * tree);
RT_SCOPE void RT_LOCK_EXCLUSIVE(RT_RADIX_TREE * tree);
RT_SCOPE void RT_LOCK_SHARE(RT_RADIX_TREE * tree);
RT_SCOPE void RT_UNLOCK(RT_RADIX_TREE * tree);
+RT_SCOPE RT_ITER *RT_BEGIN_ITERATE_SHARED(RT_RADIX_TREE * tree);
+RT_SCOPE RT_ITER_HANDLE RT_GET_ITER_HANDLE(RT_ITER * iter);
+RT_SCOPE RT_ITER *RT_ATTACH_ITERATE_SHARED(RT_RADIX_TREE * tree, RT_ITER_HANDLE handle);
#else
RT_SCOPE RT_RADIX_TREE *RT_CREATE(MemoryContext ctx);
#endif
@@ -689,6 +703,7 @@ typedef struct RT_RADIX_TREE_CONTROL
RT_HANDLE handle;
uint32 magic;
LWLock lock;
+ int tranche_id;
#endif
RT_PTR_ALLOC root;
@@ -742,11 +757,9 @@ typedef struct RT_NODE_ITER
int idx;
} RT_NODE_ITER;
-/* state for iterating over the whole radix tree */
-struct RT_ITER
+/* Contains the iteration state data */
+typedef struct RT_ITER_CONTROL
{
- RT_RADIX_TREE *tree;
-
/*
* A stack to track iteration for each level. Level 0 is the lowest (or
* leaf) level
@@ -757,8 +770,36 @@ struct RT_ITER
/* The key constructed during iteration */
uint64 key;
-};
+} RT_ITER_CONTROL;
+
+#ifdef RT_SHMEM
+/* Contains the shared iteration state data */
+typedef struct RT_ITER_CONTROL_SHARED
+{
+ /* Actual shared iteration state data */
+ RT_ITER_CONTROL common;
+
+ /* protect the control data */
+ LWLock lock;
+
+ RT_ITER_HANDLE handle;
+ pg_atomic_uint32 refcnt;
+} RT_ITER_CONTROL_SHARED;
+#endif
+
+/* state for iterating over the whole radix tree */
+struct RT_ITER
+{
+ RT_RADIX_TREE *tree;
+ /* pointing to either local memory or DSA */
+ RT_ITER_CONTROL *ctl;
+
+#ifdef RT_SHMEM
+ /* True if the iterator is for shared iteration */
+ bool shared;
+#endif
+};
/* verification (available only in assert-enabled builds) */
static void RT_VERIFY_NODE(RT_NODE * node);
@@ -1850,6 +1891,7 @@ RT_CREATE(MemoryContext ctx)
tree->ctl = (RT_RADIX_TREE_CONTROL *) dsa_get_address(dsa, dp);
tree->ctl->handle = dp;
tree->ctl->magic = RT_RADIX_TREE_MAGIC;
+ tree->ctl->tranche_id = tranche_id;
LWLockInitialize(&tree->ctl->lock, tranche_id);
#else
tree->ctl = (RT_RADIX_TREE_CONTROL *) palloc0(sizeof(RT_RADIX_TREE_CONTROL));
@@ -1902,6 +1944,9 @@ RT_ATTACH(dsa_area *dsa, RT_HANDLE handle)
dsa_pointer control;
tree = (RT_RADIX_TREE *) palloc0(sizeof(RT_RADIX_TREE));
+ tree->iter_context = AllocSetContextCreate(CurrentMemoryContext,
+ RT_STR(RT_PREFIX) "_radix_tree iter context",
+ ALLOCSET_SMALL_SIZES);
/* Find the control object in shared memory */
control = handle;
@@ -2074,35 +2119,86 @@ RT_FREE(RT_RADIX_TREE * tree)
/***************** ITERATION *****************/
+/* Common routine to initialize the given iterator */
+static void
+RT_INITIALIZE_ITER(RT_RADIX_TREE * tree, RT_ITER * iter)
+{
+ RT_CHILD_PTR root;
+
+ iter->tree = tree;
+
+ Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
+ root.alloc = iter->tree->ctl->root;
+ RT_PTR_SET_LOCAL(tree, &root);
+
+ iter->ctl->top_level = iter->tree->ctl->start_shift / RT_SPAN;
+
+ /* Set the root to start */
+ iter->ctl->cur_level = iter->ctl->top_level;
+ iter->ctl->node_iters[iter->ctl->cur_level].node = root;
+ iter->ctl->node_iters[iter->ctl->cur_level].idx = 0;
+}
+
/*
* Create and return the iterator for the given radix tree.
*
- * Taking a lock in shared mode during the iteration is the caller's
- * responsibility.
+ * Taking a lock on a radix tree in shared mode during the iteration is the
+ * caller's responsibility.
*/
RT_SCOPE RT_ITER *
RT_BEGIN_ITERATE(RT_RADIX_TREE * tree)
{
RT_ITER *iter;
- RT_CHILD_PTR root;
iter = (RT_ITER *) MemoryContextAllocZero(tree->iter_context,
sizeof(RT_ITER));
- iter->tree = tree;
+ iter->ctl = (RT_ITER_CONTROL *) MemoryContextAllocZero(tree->iter_context,
+ sizeof(RT_ITER_CONTROL));
- Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root));
- root.alloc = iter->tree->ctl->root;
- RT_PTR_SET_LOCAL(tree, &root);
+ RT_INITIALIZE_ITER(tree, iter);
- iter->top_level = iter->tree->ctl->start_shift / RT_SPAN;
+#ifdef RT_SHMEM
+ /* we will do non-shared iteration on a shared radix tree */
+ iter->shared = false;
+#endif
- /* Set the root to start */
- iter->cur_level = iter->top_level;
- iter->node_iters[iter->cur_level].node = root;
- iter->node_iters[iter->cur_level].idx = 0;
+ return iter;
+}
+
+#ifdef RT_SHMEM
+/*
+ * Create and return a shared iterator for the given shared radix tree.
+ *
+ * It is the caller's responsibility to hold a lock on the radix tree in
+ * shared mode during the shared iteration, to prevent concurrent writes.
+ */
+RT_SCOPE RT_ITER *
+RT_BEGIN_ITERATE_SHARED(RT_RADIX_TREE * tree)
+{
+ RT_ITER *iter;
+ RT_ITER_CONTROL_SHARED *ctl_shared;
+ dsa_pointer dp;
+
+ /* The radix tree must be in shared mode */
+ Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
+
+ dp = dsa_allocate0(tree->dsa, sizeof(RT_ITER_CONTROL_SHARED));
+ ctl_shared = (RT_ITER_CONTROL_SHARED *) dsa_get_address(tree->dsa, dp);
+ ctl_shared->handle = dp;
+ LWLockInitialize(&ctl_shared->lock, tree->ctl->tranche_id);
+ pg_atomic_init_u32(&ctl_shared->refcnt, 1);
+
+ iter = (RT_ITER *) MemoryContextAllocZero(tree->iter_context,
+ sizeof(RT_ITER));
+
+ iter->ctl = (RT_ITER_CONTROL *) ctl_shared;
+ iter->shared = true;
+
+ RT_INITIALIZE_ITER(tree, iter);
return iter;
}
+#endif
/*
* Scan the inner node and return the next child pointer if one exists, otherwise
@@ -2116,12 +2212,18 @@ RT_NODE_ITERATE_NEXT(RT_ITER * iter, int level)
RT_CHILD_PTR node;
RT_PTR_ALLOC *slot = NULL;
+ node_iter = &(iter->ctl->node_iters[level]);
+ node = node_iter->node;
+
#ifdef RT_SHMEM
- Assert(iter->tree->ctl->magic == RT_RADIX_TREE_MAGIC);
-#endif
- node_iter = &(iter->node_iters[level]);
- node = node_iter->node;
+ /*
+ * Since the iterator is shared, the node's local pointer might have been
+ * set by another backend, so we need to make sure to use our own local
+ * pointer.
+ */
+ if (iter->shared)
+ RT_PTR_SET_LOCAL(iter->tree, &node);
+#endif
Assert(node.local != NULL);
@@ -2194,8 +2296,8 @@ RT_NODE_ITERATE_NEXT(RT_ITER * iter, int level)
}
/* Update the key */
- iter->key &= ~(((uint64) RT_CHUNK_MASK) << (level * RT_SPAN));
- iter->key |= (((uint64) key_chunk) << (level * RT_SPAN));
+ iter->ctl->key &= ~(((uint64) RT_CHUNK_MASK) << (level * RT_SPAN));
+ iter->ctl->key |= (((uint64) key_chunk) << (level * RT_SPAN));
return slot;
}
@@ -2209,18 +2311,29 @@ RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p)
{
RT_PTR_ALLOC *slot = NULL;
- while (iter->cur_level <= iter->top_level)
+#ifdef RT_SHMEM
+ /* Prevent the shared iterator from being updated concurrently */
+ if (iter->shared)
+ LWLockAcquire(&((RT_ITER_CONTROL_SHARED *) iter->ctl)->lock, LW_EXCLUSIVE);
+#endif
+
+ while (iter->ctl->cur_level <= iter->ctl->top_level)
{
RT_CHILD_PTR node;
- slot = RT_NODE_ITERATE_NEXT(iter, iter->cur_level);
+ slot = RT_NODE_ITERATE_NEXT(iter, iter->ctl->cur_level);
- if (iter->cur_level == 0 && slot != NULL)
+ if (iter->ctl->cur_level == 0 && slot != NULL)
{
/* Found a value at the leaf node */
- *key_p = iter->key;
+ *key_p = iter->ctl->key;
node.alloc = *slot;
+#ifdef RT_SHMEM
+ if (iter->shared)
+ LWLockRelease(&((RT_ITER_CONTROL_SHARED *) iter->ctl)->lock);
+#endif
+
if (RT_CHILDPTR_IS_VALUE(*slot))
return (RT_VALUE_TYPE *) slot;
else
@@ -2236,17 +2349,23 @@ RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p)
node.alloc = *slot;
RT_PTR_SET_LOCAL(iter->tree, &node);
- iter->cur_level--;
- iter->node_iters[iter->cur_level].node = node;
- iter->node_iters[iter->cur_level].idx = 0;
+ iter->ctl->cur_level--;
+ iter->ctl->node_iters[iter->ctl->cur_level].node = node;
+ iter->ctl->node_iters[iter->ctl->cur_level].idx = 0;
}
else
{
/* Not found the child slot, move up the tree */
- iter->cur_level++;
+ iter->ctl->cur_level++;
}
+
}
+#ifdef RT_SHMEM
+ if (iter->shared)
+ LWLockRelease(&((RT_ITER_CONTROL_SHARED *) iter->ctl)->lock);
+#endif
+
/* We've visited all nodes, so the iteration finished */
return NULL;
}
@@ -2257,9 +2376,45 @@ RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p)
RT_SCOPE void
RT_END_ITERATE(RT_ITER * iter)
{
+#ifdef RT_SHMEM
+ RT_ITER_CONTROL_SHARED *ctl = (RT_ITER_CONTROL_SHARED *) iter->ctl;
+
+ if (iter->shared &&
+ pg_atomic_sub_fetch_u32(&ctl->refcnt, 1) == 0)
+ dsa_free(iter->tree->dsa, ctl->handle);
+#endif
pfree(iter);
}
+#ifdef RT_SHMEM
+RT_SCOPE RT_ITER_HANDLE
+RT_GET_ITER_HANDLE(RT_ITER * iter)
+{
+ Assert(iter->shared);
+ return ((RT_ITER_CONTROL_SHARED *) iter->ctl)->handle;
+
+}
+
+RT_SCOPE RT_ITER *
+RT_ATTACH_ITERATE_SHARED(RT_RADIX_TREE * tree, RT_ITER_HANDLE handle)
+{
+ RT_ITER *iter;
+ RT_ITER_CONTROL_SHARED *ctl;
+
+ iter = (RT_ITER *) MemoryContextAllocZero(tree->iter_context,
+ sizeof(RT_ITER));
+ iter->tree = tree;
+ ctl = (RT_ITER_CONTROL_SHARED *) dsa_get_address(tree->dsa, handle);
+ iter->ctl = (RT_ITER_CONTROL *) ctl;
+ iter->shared = true;
+
+ /* For every iterator, increase the refcnt by 1 */
+ pg_atomic_add_fetch_u32(&ctl->refcnt, 1);
+
+ return iter;
+}
+#endif
+
/***************** DELETION *****************/
#ifdef RT_USE_DELETE
@@ -2959,7 +3114,11 @@ RT_DUMP_NODE(RT_NODE * node)
#undef RT_PTR_ALLOC
#undef RT_INVALID_PTR_ALLOC
#undef RT_HANDLE
+#undef RT_ITER_HANDLE
+#undef RT_ITER_CONTROL
+#undef RT_ITER_CONTROL_SHARED
#undef RT_ITER
#undef RT_NODE
#undef RT_NODE_ITER
#undef RT_NODE_KIND_4
@@ -2996,6 +3155,11 @@ RT_DUMP_NODE(RT_NODE * node)
#undef RT_LOCK_SHARE
#undef RT_UNLOCK
#undef RT_GET_HANDLE
+#undef RT_BEGIN_ITERATE_SHARED
+#undef RT_ATTACH_ITERATE_SHARED
+#undef RT_GET_ITER_HANDLE
#undef RT_FIND
#undef RT_SET
#undef RT_BEGIN_ITERATE
@@ -3052,5 +3216,6 @@ RT_DUMP_NODE(RT_NODE * node)
#undef RT_SHRINK_NODE_256
#undef RT_NODE_DELETE
#undef RT_NODE_INSERT
+#undef RT_INITIALIZE_ITER
#undef RT_NODE_ITERATE_NEXT
#undef RT_VERIFY_NODE
diff --git a/src/test/modules/test_radixtree/test_radixtree.c b/src/test/modules/test_radixtree/test_radixtree.c
index 8b379567970..3043d0af6a4 100644
--- a/src/test/modules/test_radixtree/test_radixtree.c
+++ b/src/test/modules/test_radixtree/test_radixtree.c
@@ -161,13 +161,87 @@ test_empty(void)
#endif
}
+/* Iteration test for test_basic() */
+static void
+test_iterate_basic(rt_radix_tree *radixtree, uint64 *keys, int children,
+ bool asc, bool shared)
+{
+ rt_iter *iter;
+
+#ifdef TEST_SHARED_RT
+ if (!shared)
+ iter = rt_begin_iterate(radixtree);
+ else
+ iter = rt_begin_iterate_shared(radixtree);
+#else
+ iter = rt_begin_iterate(radixtree);
+#endif
+
+ for (int i = 0; i < children; i++)
+ {
+ uint64 expected;
+ uint64 iterkey;
+ TestValueType *iterval;
+
+ /* iteration is ordered by key, so adjust expected value accordingly */
+ if (asc)
+ expected = keys[i];
+ else
+ expected = keys[children - 1 - i];
+
+ iterval = rt_iterate_next(iter, &iterkey);
+
+ EXPECT_TRUE(iterval != NULL);
+ EXPECT_EQ_U64(iterkey, expected);
+ EXPECT_EQ_U64(*iterval, expected);
+ }
+
+ rt_end_iterate(iter);
+}
+
+/* Iteration test for test_random() */
+static void
+test_iterate_random(rt_radix_tree *radixtree, uint64 *keys, int num_keys,
+ bool shared)
+{
+ rt_iter *iter;
+
+#ifdef TEST_SHARED_RT
+ if (!shared)
+ iter = rt_begin_iterate(radixtree);
+ else
+ iter = rt_begin_iterate_shared(radixtree);
+#else
+ iter = rt_begin_iterate(radixtree);
+#endif
+
+ for (int i = 0; i < num_keys; i++)
+ {
+ uint64 expected;
+ uint64 iterkey;
+ TestValueType *iterval;
+
+ /* skip duplicate keys */
+ if (i < num_keys - 1 && keys[i + 1] == keys[i])
+ continue;
+
+ expected = keys[i];
+ iterval = rt_iterate_next(iter, &iterkey);
+
+ EXPECT_TRUE(iterval != NULL);
+ EXPECT_EQ_U64(iterkey, expected);
+ EXPECT_EQ_U64(*iterval, expected);
+ }
+
+ rt_end_iterate(iter);
+}
+
/* Basic set, find, and delete tests */
static void
test_basic(rt_node_class_test_elem *test_info, int shift, bool asc)
{
MemoryContext radixtree_ctx;
rt_radix_tree *radixtree;
- rt_iter *iter;
uint64 *keys;
int children = test_info->nkeys;
#ifdef TEST_SHARED_RT
@@ -250,28 +324,12 @@ test_basic(rt_node_class_test_elem *test_info, int shift, bool asc)
}
/* test that iteration returns the expected keys and values */
- iter = rt_begin_iterate(radixtree);
-
- for (int i = 0; i < children; i++)
- {
- uint64 expected;
- uint64 iterkey;
- TestValueType *iterval;
-
- /* iteration is ordered by key, so adjust expected value accordingly */
- if (asc)
- expected = keys[i];
- else
- expected = keys[children - 1 - i];
-
- iterval = rt_iterate_next(iter, &iterkey);
-
- EXPECT_TRUE(iterval != NULL);
- EXPECT_EQ_U64(iterkey, expected);
- EXPECT_EQ_U64(*iterval, expected);
- }
+ test_iterate_basic(radixtree, keys, children, asc, false);
- rt_end_iterate(iter);
+#ifdef TEST_SHARED_RT
+ /* test shared-iteration as well */
+ test_iterate_basic(radixtree, keys, children, asc, true);
+#endif
/* delete all keys again */
for (int i = 0; i < children; i++)
@@ -302,7 +360,6 @@ test_random(void)
{
MemoryContext radixtree_ctx;
rt_radix_tree *radixtree;
- rt_iter *iter;
pg_prng_state state;
/* limit memory usage by limiting the key space */
@@ -395,27 +452,12 @@ test_random(void)
}
/* test that iteration returns the expected keys and values */
- iter = rt_begin_iterate(radixtree);
-
- for (int i = 0; i < num_keys; i++)
- {
- uint64 expected;
- uint64 iterkey;
- TestValueType *iterval;
+ test_iterate_random(radixtree, keys, num_keys, false);
- /* skip duplicate keys */
- if (i < num_keys - 1 && keys[i + 1] == keys[i])
- continue;
-
- expected = keys[i];
- iterval = rt_iterate_next(iter, &iterkey);
-
- EXPECT_TRUE(iterval != NULL);
- EXPECT_EQ_U64(iterkey, expected);
- EXPECT_EQ_U64(*iterval, expected);
- }
-
- rt_end_iterate(iter);
+#ifdef TEST_SHARED_RT
+ /* test shared-iteration as well */
+ test_iterate_random(radixtree, keys, num_keys, true);
+#endif
/* reset random number generator for deletion */
pg_prng_seed(&state, seed);
--
2.43.5
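For reference, the intended call sequence of the new shared-iteration API
looks roughly like this (a sketch using the rt_ prefix from
test_radixtree.c; the mechanism that transports the handle to workers,
e.g. a DSM segment, and the share-lock protocol around the iteration are
elided, and do_something() is a hypothetical stand-in for per-key work):

/* Leader: start a shared iteration and publish its handle. */
rt_iter    *iter = rt_begin_iterate_shared(radixtree);
dsa_pointer handle = rt_get_iter_handle(iter);
/* ... pass 'handle' to the workers ... */

/* Worker: attach to the same iteration state. */
rt_iter    *witer = rt_attach_iterate_shared(radixtree, handle);
uint64      key;
TestValueType *val;

/*
 * Each rt_iterate_next() call hands out the globally next key, so all
 * participants collectively consume the key space in ascending order.
 */
while ((val = rt_iterate_next(witer, &key)) != NULL)
    do_something(key, *val);    /* hypothetical per-key work */

/*
 * The shared control struct is refcounted; the last rt_end_iterate()
 * frees it from the DSA area.
 */
rt_end_iterate(witer);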
v6-0006-radixtree.h-Add-RT_NUM_KEY-API-to-get-the-number-.patch
From 3a6062c76a69ebc34117e5f4277ba2e7d2269321 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 13 Dec 2024 16:54:46 -0800
Subject: [PATCH v6 6/8] radixtree.h: Add RT_NUM_KEY API to get the number of
keys.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
src/include/lib/radixtree.h | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h
index bfe4c927fa8..12d8217762e 100644
--- a/src/include/lib/radixtree.h
+++ b/src/include/lib/radixtree.h
@@ -126,6 +126,7 @@
* RT_ITERATE_NEXT - Return next key-value pair, if any
* RT_END_ITERATE - End iteration
* RT_MEMORY_USAGE - Get the memory as measured by space in memory context blocks
+ * RT_NUM_KEYS - Get the number of key-value pairs in radix tree
*
* Interface for Shared Memory
* ---------
@@ -197,6 +198,7 @@
#define RT_DELETE RT_MAKE_NAME(delete)
#endif
#define RT_MEMORY_USAGE RT_MAKE_NAME(memory_usage)
+#define RT_NUM_KEYS RT_MAKE_NAME(num_keys)
#define RT_DUMP_NODE RT_MAKE_NAME(dump_node)
#define RT_STATS RT_MAKE_NAME(stats)
@@ -313,6 +315,7 @@ RT_SCOPE RT_VALUE_TYPE *RT_ITERATE_NEXT(RT_ITER * iter, uint64 *key_p);
RT_SCOPE void RT_END_ITERATE(RT_ITER * iter);
RT_SCOPE uint64 RT_MEMORY_USAGE(RT_RADIX_TREE * tree);
+RT_SCOPE int64 RT_NUM_KEYS(RT_RADIX_TREE * tree);
#ifdef RT_DEBUG
RT_SCOPE void RT_STATS(RT_RADIX_TREE * tree);
@@ -2844,6 +2847,15 @@ RT_MEMORY_USAGE(RT_RADIX_TREE * tree)
return total;
}
+RT_SCOPE int64
+RT_NUM_KEYS(RT_RADIX_TREE * tree)
+{
+#ifdef RT_SHMEM
+ Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC);
+#endif
+ return tree->ctl->num_keys;
+}
+
/*
* Perform some sanity checks on the given node.
*/
@@ -3167,6 +3179,7 @@ RT_DUMP_NODE(RT_NODE * node)
#undef RT_END_ITERATE
#undef RT_DELETE
#undef RT_MEMORY_USAGE
+#undef RT_NUM_KEYS
#undef RT_DUMP_NODE
#undef RT_STATS
--
2.43.5
v6-0005-Support-shared-itereation-on-TidStore.patch
From d9df05156392d9df46d177c2ffaa9bf70974c187 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 24 Oct 2024 17:34:57 -0700
Subject: [PATCH v6 5/8] Support shared iteration on TidStore.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
src/backend/access/common/tidstore.c | 59 ++++++++++++++++++
src/include/access/tidstore.h | 3 +
.../modules/test_tidstore/test_tidstore.c | 62 ++++++++++++++-----
3 files changed, 110 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/common/tidstore.c b/src/backend/access/common/tidstore.c
index 27f20cf1972..399adf4af31 100644
--- a/src/backend/access/common/tidstore.c
+++ b/src/backend/access/common/tidstore.c
@@ -483,6 +483,7 @@ TidStoreBeginIterate(TidStore *ts)
iter = palloc0(sizeof(TidStoreIter));
iter->ts = ts;
+ /* begin iteration on the radix tree */
if (TidStoreIsShared(ts))
iter->tree_iter.shared = shared_ts_begin_iterate(ts->tree.shared);
else
@@ -533,6 +534,56 @@ TidStoreEndIterate(TidStoreIter *iter)
pfree(iter);
}
+/*
+ * Prepare to iterate through a shared TidStore in shared mode. This function
+ * starts an iteration on the given TidStore that parallel workers can join.
+ *
+ * The TidStoreIter struct is created in the caller's memory context, and it
+ * will be freed in TidStoreEndIterate.
+ *
+ * The caller is responsible for locking TidStore until the iteration is
+ * finished.
+ */
+TidStoreIter *
+TidStoreBeginIterateShared(TidStore *ts)
+{
+ TidStoreIter *iter;
+
+ if (!TidStoreIsShared(ts))
+ elog(ERROR, "cannot begin shared iteration on local TidStore");
+
+ iter = palloc0(sizeof(TidStoreIter));
+ iter->ts = ts;
+
+ /* begin the shared iteration on radix tree */
+ iter->tree_iter.shared =
+ (shared_ts_iter *) shared_ts_begin_iterate_shared(ts->tree.shared);
+
+ return iter;
+}
+
+/*
+ * Attach to the shared TidStore iterator. 'iter_handle' is the dsa_pointer
+ * returned by TidStoreGetSharedIterHandle(). The returned object is allocated
+ * in backend-local memory using CurrentMemoryContext.
+ */
+TidStoreIter *
+TidStoreAttachIterateShared(TidStore *ts, dsa_pointer iter_handle)
+{
+ TidStoreIter *iter;
+
+ Assert(TidStoreIsShared(ts));
+
+ iter = palloc0(sizeof(TidStoreIter));
+ iter->ts = ts;
+
+ /* Attach to the shared iterator */
+ iter->tree_iter.shared = shared_ts_attach_iterate_shared(ts->tree.shared,
+ iter_handle);
+
+ return iter;
+}
+
/*
* Return the memory usage of TidStore.
*/
@@ -564,6 +615,14 @@ TidStoreGetHandle(TidStore *ts)
return (dsa_pointer) shared_ts_get_handle(ts->tree.shared);
}
+dsa_pointer
+TidStoreGetSharedIterHandle(TidStoreIter *iter)
+{
+ Assert(TidStoreIsShared(iter->ts));
+
+ return (dsa_pointer) shared_ts_get_iter_handle(iter->tree_iter.shared);
+}
+
/*
* Given a TidStoreIterResult returned by TidStoreIterateNext(), extract the
* offset numbers. Returns the number of offsets filled in, if <=
diff --git a/src/include/access/tidstore.h b/src/include/access/tidstore.h
index 041091df278..c886cef0f7d 100644
--- a/src/include/access/tidstore.h
+++ b/src/include/access/tidstore.h
@@ -37,6 +37,9 @@ extern void TidStoreDetach(TidStore *ts);
extern void TidStoreLockExclusive(TidStore *ts);
extern void TidStoreLockShare(TidStore *ts);
extern void TidStoreUnlock(TidStore *ts);
+extern TidStoreIter *TidStoreBeginIterateShared(TidStore *ts);
+extern TidStoreIter *TidStoreAttachIterateShared(TidStore *ts, dsa_pointer iter_handle);
+extern dsa_pointer TidStoreGetSharedIterHandle(TidStoreIter *iter);
extern void TidStoreDestroy(TidStore *ts);
extern void TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumber *offsets,
int num_offsets);
diff --git a/src/test/modules/test_tidstore/test_tidstore.c b/src/test/modules/test_tidstore/test_tidstore.c
index eb16e0fbfa6..36654cf0110 100644
--- a/src/test/modules/test_tidstore/test_tidstore.c
+++ b/src/test/modules/test_tidstore/test_tidstore.c
@@ -33,6 +33,7 @@ PG_FUNCTION_INFO_V1(test_is_full);
PG_FUNCTION_INFO_V1(test_destroy);
static TidStore *tidstore = NULL;
+static bool tidstore_is_shared;
static size_t tidstore_empty_size;
/* array for verification of some tests */
@@ -107,6 +108,7 @@ test_create(PG_FUNCTION_ARGS)
LWLockRegisterTranche(tranche_id, "test_tidstore");
tidstore = TidStoreCreateShared(tidstore_max_size, tranche_id);
+ tidstore_is_shared = true;
/*
* Remain attached until end of backend or explicitly detached so that
@@ -115,8 +117,11 @@ test_create(PG_FUNCTION_ARGS)
dsa_pin_mapping(TidStoreGetDSA(tidstore));
}
else
+ {
/* VACUUM uses insert only, so we test the other option. */
tidstore = TidStoreCreateLocal(tidstore_max_size, false);
+ tidstore_is_shared = false;
+ }
tidstore_empty_size = TidStoreMemoryUsage(tidstore);
@@ -212,14 +217,42 @@ do_set_block_offsets(PG_FUNCTION_ARGS)
PG_RETURN_INT64(blkno);
}
+/* Collect TIDs stored in the tidstore, in order */
+static void
+check_iteration(TidStore *tidstore, int *num_iter_tids, bool shared_iter)
+{
+ TidStoreIter *iter;
+ TidStoreIterResult *iter_result;
+
+ TidStoreLockShare(tidstore);
+
+ if (shared_iter)
+ iter = TidStoreBeginIterateShared(tidstore);
+ else
+ iter = TidStoreBeginIterate(tidstore);
+
+ while ((iter_result = TidStoreIterateNext(iter)) != NULL)
+ {
+ OffsetNumber offsets[MaxOffsetNumber];
+ int num_offsets;
+
+ num_offsets = TidStoreGetBlockOffsets(iter_result, offsets, lengthof(offsets));
+ Assert(num_offsets <= lengthof(offsets));
+ for (int i = 0; i < num_offsets; i++)
+ ItemPointerSet(&(items.iter_tids[(*num_iter_tids)++]), iter_result->blkno,
+ offsets[i]);
+ }
+
+ TidStoreEndIterate(iter);
+ TidStoreUnlock(tidstore);
+}
+
/*
* Verify TIDs in store against the array.
*/
Datum
check_set_block_offsets(PG_FUNCTION_ARGS)
{
- TidStoreIter *iter;
- TidStoreIterResult *iter_result;
int num_iter_tids = 0;
int num_lookup_tids = 0;
BlockNumber prevblkno = 0;
@@ -261,22 +294,23 @@ check_set_block_offsets(PG_FUNCTION_ARGS)
}
/* Collect TIDs stored in the tidstore, in order */
+ check_iteration(tidstore, &num_iter_tids, false);
- TidStoreLockShare(tidstore);
- iter = TidStoreBeginIterate(tidstore);
- while ((iter_result = TidStoreIterateNext(iter)) != NULL)
+ /* If the tidstore is shared, check the shared-iteration as well */
+ if (tidstore_is_shared)
{
- OffsetNumber offsets[MaxOffsetNumber];
- int num_offsets;
+ int num_iter_tids_shared = 0;
- num_offsets = TidStoreGetBlockOffsets(iter_result, offsets, lengthof(offsets));
- Assert(num_offsets <= lengthof(offsets));
- for (int i = 0; i < num_offsets; i++)
- ItemPointerSet(&(items.iter_tids[num_iter_tids++]), iter_result->blkno,
- offsets[i]);
+ check_iteration(tidstore, &num_iter_tids_shared, true);
+
+ /*
+ * verify that normal iteration and shared iteration returned the
+ * same number of TIDs.
+ */
+ if (num_lookup_tids != num_iter_tids_shared)
+ elog(ERROR, "shared-iteration should have %d TIDs, have %d",
+ items.num_tids, num_iter_tids_shared);
}
- TidStoreEndIterate(iter);
- TidStoreUnlock(tidstore);
/*
* Sort verification and lookup arrays and test that all arrays are the
--
2.43.5
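Putting this together, leader/worker usage of the new TidStore API would
look roughly like the following sketch (modeled on check_iteration()
above; vacuum_block() is a hypothetical stand-in for the per-block work,
and in the real flow the share lock is held for the whole iteration):

/* Leader: lock the store, start the shared iteration, export the handle. */
TidStoreLockShare(ts);
TidStoreIter *iter = TidStoreBeginIterateShared(ts);
dsa_pointer   handle = TidStoreGetSharedIterHandle(iter);
/* ... hand 'handle' to the parallel workers ... */

/*
 * Worker: attach to the same iteration; each block is returned to
 * exactly one participant.
 */
TidStoreIter *witer = TidStoreAttachIterateShared(ts, handle);
TidStoreIterResult *res;

while ((res = TidStoreIterateNext(witer)) != NULL)
{
    OffsetNumber offsets[MaxOffsetNumber];
    int     num_offsets;

    num_offsets = TidStoreGetBlockOffsets(res, offsets, lengthof(offsets));
    vacuum_block(res->blkno, offsets, num_offsets); /* hypothetical */
}

TidStoreEndIterate(witer);  /* the last detach frees the shared iterator */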
v6-0008-Support-parallel-heap-vacuum-during-lazy-vacuum.patch
From 4cc9be274dd46febf446cfc62b275182860d4226 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 24 Oct 2024 17:37:45 -0700
Subject: [PATCH v6 8/8] Support parallel heap vacuum during lazy vacuum.
This commit further extends parallel vacuum to perform the heap vacuum
phase with parallel workers. It leverages the shared TidStore iteration.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
doc/src/sgml/ref/vacuum.sgml | 17 +-
src/backend/access/heap/vacuumlazy.c | 280 +++++++++++++++++++-------
src/backend/commands/vacuumparallel.c | 10 +-
src/include/commands/vacuum.h | 2 +-
4 files changed, 223 insertions(+), 86 deletions(-)
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index aae0bbcd577..104157b5a56 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -278,20 +278,21 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
<term><literal>PARALLEL</literal></term>
<listitem>
<para>
- Perform scanning heap, index vacuum, and index cleanup phases of
- <command>VACUUM</command> in parallel using
+ Perform scanning heap, vacuuming heap, index vacuum, and index cleanup
+ phases of <command>VACUUM</command> in parallel using
<replaceable class="parameter">integer</replaceable> background workers
(for the details of each vacuum phase, please refer to
<xref linkend="vacuum-phases"/>).
</para>
<para>
For heap tables, the number of workers used to perform the scanning
- heap is determined based on the size of table. A table can participate in
- parallel scanning heap if and only if the size of the table is more than
- <xref linkend="guc-min-parallel-table-scan-size"/>. During scanning heap,
- the heap table's blocks will be divided into ranges and shared among the
- cooperating processes. Each worker process will complete the scanning of
- its given range of blocks before requesting an additional range of blocks.
+ heap and vacuuming heap is determined based on the size of the table. A table
+ can participate in parallel scanning heap if and only if the size of the
+ table is more than <xref linkend="guc-min-parallel-table-scan-size"/>.
+ During scanning heap, the heap table's blocks will be divided into ranges
+ and shared among the cooperating processes. Each worker process will
+ complete the scanning of its given range of blocks before requesting an
+ additional range of blocks.
</para>
<para>
The number of workers used to perform parallel index vacuum and index
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6502930258a..4841c7715e3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -20,6 +20,41 @@
* that there only needs to be one call to lazy_vacuum, after the initial pass
* completes.
*
+ * Parallel Vacuum
+ * ----------------
+ * Lazy vacuum on heap tables supports parallel processing for three vacuum
+ * phases: scanning heap, vacuuming indexes, and vacuuming heap. Before the
+ * scanning heap phase, we initialize parallel vacuum state, ParallelVacuumState,
+ * and allocate the TID store in a DSA area if we can use parallel mode for any
+ * of these three phases.
+ *
+ * We may require a different number of parallel vacuum workers for each phase,
+ * depending on factors such as the table size, the number of indexes, and the
+ * number of pages having dead tuples. Parallel workers are launched at the
+ * beginning of each phase and exit at the end of it.
+ *
+ * For scanning the heap table with parallel workers, we utilize the
+ * table_block_parallelscan_xxx facility, which splits the table into several
+ * chunks that parallel workers claim to scan. If the dead_items TID store
+ * comes close to overrunning the available space during the parallel heap
+ * scan, the parallel workers exit and the leader process gathers the scan
+ * results. Then, it performs a round of index and heap vacuuming, which can
+ * also use parallelism. After vacuuming both the indexes and the heap table,
+ * the leader process vacuums the FSM to make newly-freed space visible, and
+ * relaunches parallel workers to resume the heap scan phase. In order to be
+ * able to resume the parallel heap scan from the previous state, the workers'
+ * parallel scan descriptions are stored in shared memory (DSM) and shared
+ * among parallel workers. If the leader launches fewer workers when resuming
+ * the parallel heap scan, some blocks may remain unscanned. The leader
+ * process deals with such blocks serially at the end of the heap scan phase
+ * (see parallel_heap_complete_unfinished_scan()).
+ *
+ * At the beginning of the vacuuming heap phase, the leader launches parallel
+ * workers and initiates the shared iteration on the shared TID store. At the
+ * end of the phase, the leader process waits for all workers to finish and
+ * gathers the workers' results.
+ *
* Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
@@ -172,6 +207,7 @@ typedef struct LVRelScanState
BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+ BlockNumber vacuumed_pages; /* # pages vacuumed in one second pass */
/* Counters that follow are only for scanned_pages */
int64 tuples_deleted; /* # deleted from table */
@@ -205,11 +241,15 @@ typedef struct PHVShared
* The final value is OR of worker's skippedallvis.
*/
bool skippedallvis;
+ bool do_index_vacuuming;
/* VACUUM operation's cutoffs for freezing and pruning */
struct VacuumCutoffs cutoffs;
GlobalVisState vistest;
+ dsa_pointer shared_iter_handle;
+ bool do_heap_vacuum;
+
/* per-worker scan stats for parallel heap vacuum scan */
LVRelScanState worker_scan_state[FLEXIBLE_ARRAY_MEMBER];
} PHVShared;
@@ -257,6 +297,14 @@ typedef struct PHVState
/* Assigned per-worker scan state */
PHVScanWorkerState *myscanstate;
+ /*
+ * The number of parallel workers to launch for parallel heap scanning.
+ * Note that the number of parallel workers for parallel heap vacuuming
+ * could vary but never exceeds num_heapscan_workers. So this also works as
+ * the maximum number of workers for parallel heap scanning and vacuuming.
+ */
+ int num_heapscan_workers;
+
/*
+ * All blocks up to this value have been scanned, i.e. the minimum of all
* PHVScanWorkerState->last_blkno. This field is updated by
@@ -374,6 +422,7 @@ static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
static void lazy_vacuum(LVRelState *vacrel);
static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
static void lazy_vacuum_heap_rel(LVRelState *vacrel);
+static void do_lazy_vacuum_heap_rel(LVRelState *vacrel, TidStoreIter *iter);
static void lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
Buffer buffer, OffsetNumber *deadoffsets,
int num_offsets, Buffer vmbuffer);
@@ -404,6 +453,7 @@ static void do_parallel_lazy_scan_heap(LVRelState *vacrel);
static void parallel_heap_vacuum_compute_min_scanned_blkno(LVRelState *vacrel);
static void parallel_heap_vacuum_gather_scan_results(LVRelState *vacrel);
static void parallel_heap_complete_unfinished_scan(LVRelState *vacrel);
+static int compute_heap_vacuum_parallel_workers(Relation rel, BlockNumber nblocks);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -551,6 +601,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
scan_state->lpdead_item_pages = 0;
scan_state->missed_dead_pages = 0;
scan_state->nonempty_pages = 0;
+ scan_state->vacuumed_pages = 0;
scan_state->tuples_deleted = 0;
scan_state->tuples_frozen = 0;
scan_state->lpdead_items = 0;
@@ -2456,46 +2507,14 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
return allindexes;
}
-/*
- * lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
- *
- * This routine marks LP_DEAD items in vacrel->dead_items as LP_UNUSED. Pages
- * that never had lazy_scan_prune record LP_DEAD items are not visited at all.
- *
- * We may also be able to truncate the line pointer array of the heap pages we
- * visit. If there is a contiguous group of LP_UNUSED items at the end of the
- * array, it can be reclaimed as free space. These LP_UNUSED items usually
- * start out as LP_DEAD items recorded by lazy_scan_prune (we set items from
- * each page to LP_UNUSED, and then consider if it's possible to truncate the
- * page's line pointer array).
- *
- * Note: the reason for doing this as a second pass is we cannot remove the
- * tuples until we've removed their index entries, and we want to process
- * index entry removal in batches as large as possible.
- */
static void
-lazy_vacuum_heap_rel(LVRelState *vacrel)
+do_lazy_vacuum_heap_rel(LVRelState *vacrel, TidStoreIter *iter)
{
- BlockNumber vacuumed_pages = 0;
Buffer vmbuffer = InvalidBuffer;
- LVSavedErrInfo saved_err_info;
- TidStoreIter *iter;
- TidStoreIterResult *iter_result;
-
- Assert(vacrel->do_index_vacuuming);
- Assert(vacrel->do_index_cleanup);
- Assert(vacrel->num_index_scans > 0);
-
- /* Report that we are now vacuuming the heap */
- pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
- /* Update error traceback information */
- update_vacuum_error_info(vacrel, &saved_err_info,
- VACUUM_ERRCB_PHASE_VACUUM_HEAP,
- InvalidBlockNumber, InvalidOffsetNumber);
+ TidStoreIterResult *iter_result;
- iter = TidStoreBeginIterate(vacrel->dead_items);
while ((iter_result = TidStoreIterateNext(iter)) != NULL)
{
BlockNumber blkno;
@@ -2533,26 +2552,106 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
UnlockReleaseBuffer(buf);
RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
- vacuumed_pages++;
+ vacrel->scan_state->vacuumed_pages++;
}
- TidStoreEndIterate(iter);
vacrel->blkno = InvalidBlockNumber;
if (BufferIsValid(vmbuffer))
ReleaseBuffer(vmbuffer);
+}
+
+/*
+ * lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
+ *
+ * This routine marks LP_DEAD items in vacrel->dead_items as LP_UNUSED. Pages
+ * that never had lazy_scan_prune record LP_DEAD items are not visited at all.
+ *
+ * We may also be able to truncate the line pointer array of the heap pages we
+ * visit. If there is a contiguous group of LP_UNUSED items at the end of the
+ * array, it can be reclaimed as free space. These LP_UNUSED items usually
+ * start out as LP_DEAD items recorded by lazy_scan_prune (we set items from
+ * each page to LP_UNUSED, and then consider if it's possible to truncate the
+ * page's line pointer array).
+ *
+ * Note: the reason for doing this as a second pass is we cannot remove the
+ * tuples until we've removed their index entries, and we want to process
+ * index entry removal in batches as large as possible.
+ */
+static void
+lazy_vacuum_heap_rel(LVRelState *vacrel)
+{
+ LVSavedErrInfo saved_err_info;
+ TidStoreIter *iter;
+ int nworkers = 0;
+
+ Assert(vacrel->do_index_vacuuming);
+ Assert(vacrel->do_index_cleanup);
+ Assert(vacrel->num_index_scans > 0);
+
+ /* Report that we are now vacuuming the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
+
+ /* Update error traceback information */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ vacrel->scan_state->vacuumed_pages = 0;
+
+ /* Compute parallel workers required to scan blocks to vacuum */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ nworkers = compute_heap_vacuum_parallel_workers(vacrel->rel,
+ TidStoreNumBlocks(vacrel->dead_items));
+
+ if (nworkers > 0)
+ {
+ PHVState *phvstate = vacrel->phvstate;
+
+ iter = TidStoreBeginIterateShared(vacrel->dead_items);
+
+ /* launch workers */
+ phvstate->shared->do_heap_vacuum = true;
+ phvstate->shared->shared_iter_handle = TidStoreGetSharedIterHandle(iter);
+ phvstate->nworkers_launched = parallel_vacuum_table_scan_begin(vacrel->pvs,
+ nworkers);
+ }
+ else
+ iter = TidStoreBeginIterate(vacrel->dead_items);
+
+ /* do the real work */
+ do_lazy_vacuum_heap_rel(vacrel, iter);
+
+ if (ParallelHeapVacuumIsActive(vacrel) && nworkers > 0)
+ {
+ PHVState *phvstate = vacrel->phvstate;
+
+ parallel_vacuum_table_scan_end(vacrel->pvs);
+
+ /* Gather the heap vacuum statistics that workers collected */
+ for (int i = 0; i < phvstate->nworkers_launched; i++)
+ {
+ LVRelScanState *ss = &(phvstate->shared->worker_scan_state[i]);
+
+ vacrel->scan_state->vacuumed_pages += ss->vacuumed_pages;
+ }
+ }
+
+ TidStoreEndIterate(iter);
+
/*
* We set all LP_DEAD items from the first heap pass to LP_UNUSED during
* the second heap pass. No more, no less.
*/
Assert(vacrel->num_index_scans > 1 ||
(vacrel->dead_items_info->num_items == vacrel->scan_state->lpdead_items &&
- vacuumed_pages == vacrel->scan_state->lpdead_item_pages));
+ vacrel->scan_state->vacuumed_pages == vacrel->scan_state->lpdead_item_pages));
ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
vacrel->relname, (long long) vacrel->dead_items_info->num_items,
- vacuumed_pages)));
+ vacrel->scan_state->vacuumed_pages)));
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3261,6 +3360,11 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
{
vacrel->dead_items = parallel_vacuum_get_dead_items(vacrel->pvs,
&vacrel->dead_items_info);
+
+ if (ParallelHeapVacuumIsActive(vacrel))
+ vacrel->phvstate->num_heapscan_workers =
+ parallel_vacuum_get_nworkers_table(vacrel->pvs);
+
return;
}
}
@@ -3508,37 +3612,41 @@ update_relstats_all_indexes(LVRelState *vacrel)
*
* The calculation logic is borrowed from compute_parallel_worker().
*/
-int
-heap_parallel_vacuum_compute_workers(Relation rel, int nrequested)
+static int
+compute_heap_vacuum_parallel_workers(Relation rel, BlockNumber nblocks)
{
int parallel_workers = 0;
int heap_parallel_threshold;
int heap_pages;
- if (nrequested == 0)
+ /*
+ * Select the number of workers based on the log of the size of the
+ * relation. Note that the upper limit of the min_parallel_table_scan_size
+ * GUC is chosen to prevent overflow here.
+ */
+ heap_parallel_threshold = Max(min_parallel_table_scan_size, 1);
+ heap_pages = BlockNumberIsValid(nblocks) ?
+ nblocks : RelationGetNumberOfBlocks(rel);
+ while (heap_pages >= (BlockNumber) (heap_parallel_threshold * 3))
{
- /*
- * Select the number of workers based on the log of the size of the
- * relation. Note that the upper limit of the
- * min_parallel_table_scan_size GUC is chosen to prevent overflow
- * here.
- */
- heap_parallel_threshold = Max(min_parallel_table_scan_size, 1);
- heap_pages = RelationGetNumberOfBlocks(rel);
- while (heap_pages >= (BlockNumber) (heap_parallel_threshold * 3))
- {
- parallel_workers++;
- heap_parallel_threshold *= 3;
- if (heap_parallel_threshold > INT_MAX / 3)
- break;
- }
+ parallel_workers++;
+ heap_parallel_threshold *= 3;
+ if (heap_parallel_threshold > INT_MAX / 3)
+ break;
}
- else
- parallel_workers = nrequested;
return parallel_workers;
}
+int
+heap_parallel_vacuum_compute_workers(Relation rel, int nrequested)
+{
+ if (nrequested == 0)
+ return compute_heap_vacuum_parallel_workers(rel, InvalidBlockNumber);
+ else
+ return nrequested;
+}
+
/* Estimate shared memory sizes required for parallel heap vacuum */
static inline void
heap_parallel_estimate_shared_memory_size(Relation rel, int nworkers, Size *pscan_len,
@@ -3620,6 +3728,7 @@ heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
shared->NewRelfrozenXid = vacrel->scan_state->NewRelfrozenXid;
shared->NewRelminMxid = vacrel->scan_state->NewRelminMxid;
shared->skippedallvis = vacrel->scan_state->skippedallvis;
+ shared->do_index_vacuuming = vacrel->do_index_vacuuming;
/*
* XXX: we copy the contents of vistest to the shared area, but in order
@@ -3672,7 +3781,6 @@ heap_parallel_vacuum_worker(Relation rel, ParallelVacuumState *pvs,
PHVScanWorkerState *scanstate;
LVRelScanState *scan_state;
ErrorContextCallback errcallback;
- bool scan_done;
phvstate = palloc(sizeof(PHVState));
@@ -3709,10 +3817,11 @@ heap_parallel_vacuum_worker(Relation rel, ParallelVacuumState *pvs,
/* initialize per-worker relation statistics */
MemSet(scan_state, 0, sizeof(LVRelScanState));
- /* Set fields necessary for heap scan */
+ /* Set fields necessary for heap scan and vacuum */
vacrel.scan_state->NewRelfrozenXid = shared->NewRelfrozenXid;
vacrel.scan_state->NewRelminMxid = shared->NewRelminMxid;
vacrel.scan_state->skippedallvis = shared->skippedallvis;
+ vacrel.do_index_vacuuming = shared->do_index_vacuuming;
/* Initialize the per-worker scan state if not yet */
if (!phvstate->myscanstate->initialized)
@@ -3734,25 +3843,44 @@ heap_parallel_vacuum_worker(Relation rel, ParallelVacuumState *pvs,
vacrel.relnamespace = get_database_name(RelationGetNamespace(rel));
vacrel.relname = pstrdup(RelationGetRelationName(rel));
vacrel.indname = NULL;
- vacrel.phase = VACUUM_ERRCB_PHASE_SCAN_HEAP;
errcallback.callback = vacuum_error_callback;
errcallback.arg = &vacrel;
errcallback.previous = error_context_stack;
error_context_stack = &errcallback;
- scan_done = do_lazy_scan_heap(&vacrel);
+ if (shared->do_heap_vacuum)
+ {
+ TidStoreIter *iter;
+
+ iter = TidStoreAttachIterateShared(vacrel.dead_items, shared->shared_iter_handle);
+
+ /* Join parallel heap vacuum */
+ vacrel.phase = VACUUM_ERRCB_PHASE_VACUUM_HEAP;
+ do_lazy_vacuum_heap_rel(&vacrel, iter);
+
+ TidStoreEndIterate(iter);
+ }
+ else
+ {
+ bool scan_done;
+
+ /* Join parallel heap scan */
+ vacrel.phase = VACUUM_ERRCB_PHASE_SCAN_HEAP;
+ scan_done = do_lazy_scan_heap(&vacrel);
+
+ /*
+ * If the leader or a worker finishes the heap scan because the dead_items
+ * TID store is close to the limit, it might have some allocated blocks in
+ * its scan state. Since this scan state might not be used in the next
+ * heap scan, we remember that it might have some unconsumed blocks so
+ * that the leader can complete the scans after the heap scan phase
+ * finishes.
+ */
+ phvstate->myscanstate->maybe_have_blocks = !scan_done;
+ }
/* Pop the error context stack */
error_context_stack = errcallback.previous;
-
- /*
- * If the leader or a worker finishes the heap scan because dead_items
- * TIDs is close to the limit, it might have some allocated blocks in its
- * scan state. Since this scan state might not be used in the next heap
- * scan, we remember that it might have some unconsumed blocks so that the
- * leader complete the scans after the heap scan phase finishes.
- */
- phvstate->myscanstate->maybe_have_blocks = !scan_done;
}
/*
@@ -3874,7 +4002,10 @@ do_parallel_lazy_scan_heap(LVRelState *vacrel)
Assert(!IsParallelWorker());
/* launcher workers */
- vacrel->phvstate->nworkers_launched = parallel_vacuum_table_scan_begin(vacrel->pvs);
+ vacrel->phvstate->shared->do_heap_vacuum = false;
+ vacrel->phvstate->nworkers_launched =
+ parallel_vacuum_table_scan_begin(vacrel->pvs,
+ vacrel->phvstate->num_heapscan_workers);
/* initialize parallel scan description to join as a worker */
scanstate = palloc0(sizeof(PHVScanWorkerState));
@@ -3933,7 +4064,8 @@ do_parallel_lazy_scan_heap(LVRelState *vacrel)
/* Re-launch workers to restart parallel heap scan */
vacrel->phvstate->nworkers_launched =
- parallel_vacuum_table_scan_begin(vacrel->pvs);
+ parallel_vacuum_table_scan_begin(vacrel->pvs,
+ vacrel->phvstate->num_heapscan_workers);
}
/*
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 9f8c8f09576..2a096ed4128 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -1054,8 +1054,10 @@ parallel_vacuum_index_is_parallel_safe(Relation indrel, int num_index_scans,
* table vacuum.
*/
int
-parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs)
+parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs, int nworkers_request)
{
+ int nworkers;
+
Assert(!IsParallelWorker());
if (pvs->shared->nworkers_for_table == 0)
@@ -1069,11 +1071,13 @@ parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs)
if (pvs->num_table_scans > 0)
ReinitializeParallelDSM(pvs->pcxt);
+ nworkers = Min(nworkers_request, pvs->shared->nworkers_for_table);
+
/*
* The number of workers might vary between table vacuum and index
* processing
*/
- ReinitializeParallelWorkers(pvs->pcxt, pvs->shared->nworkers_for_table);
+ ReinitializeParallelWorkers(pvs->pcxt, nworkers);
LaunchParallelWorkers(pvs->pcxt);
if (pvs->pcxt->nworkers_launched > 0)
@@ -1097,7 +1101,7 @@ parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs)
(errmsg(ngettext("launched %d parallel vacuum worker for table processing (planned: %d)",
"launched %d parallel vacuum workers for table processing (planned: %d)",
pvs->pcxt->nworkers_launched),
- pvs->pcxt->nworkers_launched, pvs->shared->nworkers_for_table)));
+ pvs->pcxt->nworkers_launched, nworkers)));
return pvs->pcxt->nworkers_launched;
}
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index d45866d61e5..7bec04395e9 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -371,7 +371,7 @@ extern void parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs,
extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
long num_table_tuples,
bool estimated_count);
-extern int parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs);
+extern int parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs, int nworkers_request);
extern void parallel_vacuum_table_scan_end(ParallelVacuumState *pvs);
extern int parallel_vacuum_get_nworkers_table(ParallelVacuumState *pvs);
extern int parallel_vacuum_get_nworkers_index(ParallelVacuumState *pvs);
--
2.43.5
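As a concrete example of the worker computation: assuming the default
min_parallel_table_scan_size of 8MB (1024 blocks),
compute_heap_vacuum_parallel_workers() gives the 221239-page (~1.7GB)
table from the benchmark above 4 workers for the scan phase, since the
tripling loop fires at thresholds of 3072, 9216, 27648 and 82944 pages
and stops at 248832 > 221239. For the vacuuming heap phase, the same
formula is applied to TidStoreNumBlocks(), i.e. to the number of blocks
that actually contain dead items, so a mostly-clean table gets fewer
(possibly zero) workers for the second pass.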
v6-0007-Add-TidStoreNumBlocks-API-to-get-the-number-of-bl.patch
From 6cd17b504ca51178dcb5a7e03d65b5071b9483e6 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 13 Dec 2024 16:55:52 -0800
Subject: [PATCH v6 7/8] Add TidStoreNumBlocks API to get the number of blocks
in TidStore.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
src/backend/access/common/tidstore.c | 12 ++++++++++++
src/include/access/tidstore.h | 1 +
2 files changed, 13 insertions(+)
diff --git a/src/backend/access/common/tidstore.c b/src/backend/access/common/tidstore.c
index 399adf4af31..c43b3d8ac69 100644
--- a/src/backend/access/common/tidstore.c
+++ b/src/backend/access/common/tidstore.c
@@ -596,6 +596,18 @@ TidStoreMemoryUsage(TidStore *ts)
return local_ts_memory_usage(ts->tree.local);
}
+/*
+ * Return the number of blocks (i.e., keys) stored in the TidStore.
+ */
+BlockNumber
+TidStoreNumBlocks(TidStore *ts)
+{
+ if (TidStoreIsShared(ts))
+ return shared_ts_num_keys(ts->tree.shared);
+ else
+ return local_ts_num_keys(ts->tree.local);
+}
+
/*
* Return the DSA area where the TidStore lives.
*/
diff --git a/src/include/access/tidstore.h b/src/include/access/tidstore.h
index c886cef0f7d..fd739d20da1 100644
--- a/src/include/access/tidstore.h
+++ b/src/include/access/tidstore.h
@@ -51,6 +51,7 @@ extern int TidStoreGetBlockOffsets(TidStoreIterResult *result,
int max_offsets);
extern void TidStoreEndIterate(TidStoreIter *iter);
extern size_t TidStoreMemoryUsage(TidStore *ts);
+extern BlockNumber TidStoreNumBlocks(TidStore *ts);
extern dsa_pointer TidStoreGetHandle(TidStore *ts);
extern dsa_area *TidStoreGetDSA(TidStore *ts);
--
2.43.5
v6-0003-Support-parallel-heap-scan-during-lazy-vacuum.patch
From a4ab09e39033423616064c0da4e6a704fdba03a4 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 1 Jul 2024 15:17:46 +0900
Subject: [PATCH v6 3/8] Support parallel heap scan during lazy vacuum.
Commit 40d964ec99 allowed vacuum command to process indexes in
parallel. This change extends the parallel vacuum to support parallel
heap scan during lazy vacuum.
---
doc/src/sgml/ref/vacuum.sgml | 58 +-
src/backend/access/heap/heapam_handler.c | 6 +
src/backend/access/heap/vacuumlazy.c | 929 ++++++++++++++++++++---
src/backend/commands/vacuumparallel.c | 305 ++++++--
src/backend/storage/ipc/procarray.c | 74 --
src/include/access/heapam.h | 8 +
src/include/access/tableam.h | 88 +++
src/include/commands/vacuum.h | 8 +-
src/include/utils/snapmgr.h | 2 +-
src/include/utils/snapmgr_internal.h | 91 +++
src/tools/pgindent/typedefs.list | 3 +
11 files changed, 1332 insertions(+), 240 deletions(-)
create mode 100644 src/include/utils/snapmgr_internal.h
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 9110938fab6..aae0bbcd577 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -277,27 +277,43 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
<varlistentry>
<term><literal>PARALLEL</literal></term>
<listitem>
- <para>
- Perform index vacuum and index cleanup phases of <command>VACUUM</command>
- in parallel using <replaceable class="parameter">integer</replaceable>
- background workers (for the details of each vacuum phase, please
- refer to <xref linkend="vacuum-phases"/>). The number of workers used
- to perform the operation is equal to the number of indexes on the
- relation that support parallel vacuum which is limited by the number of
- workers specified with <literal>PARALLEL</literal> option if any which is
- further limited by <xref linkend="guc-max-parallel-maintenance-workers"/>.
- An index can participate in parallel vacuum if and only if the size of the
- index is more than <xref linkend="guc-min-parallel-index-scan-size"/>.
- Please note that it is not guaranteed that the number of parallel workers
- specified in <replaceable class="parameter">integer</replaceable> will be
- used during execution. It is possible for a vacuum to run with fewer
- workers than specified, or even with no workers at all. Only one worker
- can be used per index. So parallel workers are launched only when there
- are at least <literal>2</literal> indexes in the table. Workers for
- vacuum are launched before the start of each phase and exit at the end of
- the phase. These behaviors might change in a future release. This
- option can't be used with the <literal>FULL</literal> option.
- </para>
+ <para>
+ Perform scanning heap, index vacuum, and index cleanup phases of
+ <command>VACUUM</command> in parallel using
+ <replaceable class="parameter">integer</replaceable> background workers
+ (for the details of each vacuum phase, please refer to
+ <xref linkend="vacuum-phases"/>).
+ </para>
+ <para>
+ For heap tables, the number of workers used to perform the scanning
+ heap is determined based on the size of the table. A table can participate in
+ parallel scanning heap if and only if the size of the table is more than
+ <xref linkend="guc-min-parallel-table-scan-size"/>. During scanning heap,
+ the heap table's blocks will be divided into ranges and shared among the
+ cooperating processes. Each worker process will complete the scanning of
+ its given range of blocks before requesting an additional range of blocks.
+ </para>
+ <para>
+ The number of workers used to perform parallel index vacuum and index
+ cleanup is equal to the number of indexes on the relation that support
+ parallel vacuum. An index can participate in parallel vacuum if and only
+ if the size of the index is more than <xref linkend="guc-min-parallel-index-scan-size"/>.
+ Only one worker can be used per index. So parallel workers for index vacuum
+ and index cleanup are launched only when there are at least <literal>2</literal>
+ indexes in the table.
+ </para>
+ <para>
+     Workers for vacuum are launched before the start of each phase and exit
+     at the end of the phase. The number of workers for each phase is limited by
+     the number of workers specified with the <literal>PARALLEL</literal> option,
+     if any, and is further limited by <xref linkend="guc-max-parallel-maintenance-workers"/>.
+     Please note that in any parallel vacuum phase, it is not guaranteed that the
+     number of parallel workers specified in <replaceable class="parameter">integer</replaceable>
+     will be used during execution. It is possible for a vacuum to run with fewer
+     workers than specified, or even with no workers at all. These behaviors might
+     change in a future release. This option can't be used with the <literal>FULL</literal>
+     option.
+ </para>
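+    <para>
+     For example, <literal>VACUUM (PARALLEL 4) tbl</literal> requests up to
+     four background workers for each parallel phase of vacuuming
+     <literal>tbl</literal>, subject to the limits described above.
+    </para>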
</listitem>
</varlistentry>
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index e817f8f8f84..9484a2fdb3f 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2656,6 +2656,12 @@ static const TableAmRoutine heapam_methods = {
.relation_copy_data = heapam_relation_copy_data,
.relation_copy_for_cluster = heapam_relation_copy_for_cluster,
.relation_vacuum = heap_vacuum_rel,
+
+ .parallel_vacuum_compute_workers = heap_parallel_vacuum_compute_workers,
+ .parallel_vacuum_estimate = heap_parallel_vacuum_estimate,
+ .parallel_vacuum_initialize = heap_parallel_vacuum_initialize,
+ .parallel_vacuum_relation_worker = heap_parallel_vacuum_worker,
+
.scan_analyze_next_block = heapam_scan_analyze_next_block,
.scan_analyze_next_tuple = heapam_scan_analyze_next_tuple,
.index_build_range_scan = heapam_index_build_range_scan,
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 116c0612ca5..6502930258a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -48,6 +48,7 @@
#include "common/int.h"
#include "executor/instrument.h"
#include "miscadmin.h"
+#include "optimizer/paths.h"
#include "pgstat.h"
#include "portability/instr_time.h"
#include "postmaster/autovacuum.h"
@@ -115,10 +116,24 @@
#define PREFETCH_SIZE ((BlockNumber) 32)
/*
- * Macro to check if we are in a parallel vacuum. If true, we are in the
- * parallel mode and the DSM segment is initialized.
+ * DSM keys for parallel heap vacuum scan. Unlike other parallel execution
+ * code, we don't need to worry about DSM keys conflicting with plan_node_id,
+ * but we do need to avoid conflicting with the DSM keys used in
+ * vacuumparallel.c.
+ */
+#define LV_PARALLEL_KEY_SCAN_SHARED 0xFFFF0001
+#define LV_PARALLEL_KEY_SCAN_DESC 0xFFFF0002
+#define LV_PARALLEL_KEY_SCAN_DESC_WORKER 0xFFFF0003
+
+/*
+ * Macros to check if we are in parallel heap vacuuming, parallel index
+ * vacuuming, or both. If ParallelVacuumIsActive() is true, we are in
+ * parallel mode, meaning that the dead item TIDs are stored in the shared
+ * memory area.
*/
#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
+#define ParallelIndexVacuumIsActive(vacrel) \
+ (ParallelVacuumIsActive(vacrel) && parallel_vacuum_get_nworkers_index((vacrel)->pvs) > 0)
+#define ParallelHeapVacuumIsActive(vacrel) \
+ (ParallelVacuumIsActive(vacrel) && parallel_vacuum_get_nworkers_table((vacrel)->pvs) > 0)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -172,6 +187,87 @@ typedef struct LVRelScanState
bool skippedallvis;
} LVRelScanState;
+/*
+ * Struct for information that needs to be shared among parallel vacuum workers
+ */
+typedef struct PHVShared
+{
+ bool aggressive;
+ bool skipwithvm;
+
+ /* The current oldest extant XID/MXID shared by the leader process */
+ TransactionId NewRelfrozenXid;
+ MultiXactId NewRelminMxid;
+
+ /*
+ * Have we skipped any all-visible pages?
+ *
+	 * The final value is the OR of all workers' skippedallvis.
+ */
+ bool skippedallvis;
+
+ /* VACUUM operation's cutoffs for freezing and pruning */
+ struct VacuumCutoffs cutoffs;
+ GlobalVisState vistest;
+
+ /* per-worker scan stats for parallel heap vacuum scan */
+ LVRelScanState worker_scan_state[FLEXIBLE_ARRAY_MEMBER];
+} PHVShared;
+#define SizeOfPHVShared (offsetof(PHVShared, worker_scan_state))
+
+/* Per-worker scan state for parallel heap vacuum scan */
+typedef struct PHVScanWorkerState
+{
+ bool initialized;
+
+ /* per-worker parallel table scan state */
+ ParallelBlockTableScanWorkerData state;
+
+ /*
+	 * True if a parallel vacuum scan worker allocated blocks in state but
+	 * might not have scanned all of them. The leader process will take over
+	 * scanning these remaining blocks.
+ */
+ bool maybe_have_blocks;
+
+ /* last block number the worker scanned */
+ BlockNumber last_blkno;
+} PHVScanWorkerState;
+
+/* Struct for parallel heap vacuum */
+typedef struct PHVState
+{
+ /* Parallel scan description shared among parallel workers */
+ ParallelBlockTableScanDesc pscandesc;
+
+ /* Shared information */
+ PHVShared *shared;
+
+ /*
+	 * Points to the array of per-worker scan states stored in the DSM area.
+	 *
+	 * During a parallel heap scan, each worker allocates some chunks of
+	 * blocks to scan in its scan state, and could exit while leaving some
+	 * chunks un-scanned if the size of dead_items TIDs is close to
+	 * overrunning the available space. We store the scan states in the
+	 * shared memory area so that workers can resume heap scans from the
+	 * previous point.
+ */
+ PHVScanWorkerState *scanstates;
+
+ /* Assigned per-worker scan state */
+ PHVScanWorkerState *myscanstate;
+
+ /*
+	 * All blocks up to this value have been scanned, i.e. it is the minimum
+	 * of all PHVScanWorkerState->last_blkno. This field is updated by
+ * parallel_heap_vacuum_compute_min_scanned_blkno().
+ */
+ BlockNumber min_scanned_blkno;
+
+ /* The number of workers launched for parallel heap vacuum */
+ int nworkers_launched;
+} PHVState;
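+
+/*
+ * The shared memory for a parallel heap vacuum thus consists of one PHVShared
+ * (with a trailing LVRelScanState per worker), one parallel block table scan
+ * descriptor, and an array of PHVScanWorkerState with one entry per worker;
+ * see heap_parallel_estimate_shared_memory_size() below for the exact sizing.
+ */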
+
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -183,6 +279,9 @@ typedef struct LVRelState
BufferAccessStrategy bstrategy;
ParallelVacuumState *pvs;
+	/* Parallel heap vacuum state */
+ PHVState *phvstate;
+
/* Aggressive VACUUM? (must set relfrozenxid >= FreezeLimit) */
bool aggressive;
/* Use visibility map to skip? (disabled by DISABLE_PAGE_SKIPPING) */
@@ -223,6 +322,8 @@ typedef struct LVRelState
VacDeadItemsInfo *dead_items_info;
BlockNumber rel_pages; /* total number of pages */
+ BlockNumber next_fsm_block_to_vacuum; /* next block to check for FSM
+ * vacuum */
/* Working state for heap scanning and vacuuming */
LVRelScanState *scan_state;
@@ -254,8 +355,11 @@ typedef struct LVSavedErrInfo
/* non-export function prototypes */
static void lazy_scan_heap(LVRelState *vacrel);
+static bool do_lazy_scan_heap(LVRelState *vacrel);
static bool heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
bool *all_visible_according_to_vm);
+static bool heap_vac_scan_next_block_parallel(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm);
static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -296,6 +400,11 @@ static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
static void update_relstats_all_indexes(LVRelState *vacrel);
+static void do_parallel_lazy_scan_heap(LVRelState *vacrel);
+static void parallel_heap_vacuum_compute_min_scanned_blkno(LVRelState *vacrel);
+static void parallel_heap_vacuum_gather_scan_results(LVRelState *vacrel);
+static void parallel_heap_complete_unfinished_scan(LVRelState *vacrel);
+
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -432,6 +541,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
Assert(params->index_cleanup == VACOPTVALUE_AUTO);
}
+ vacrel->next_fsm_block_to_vacuum = 0;
+
/* Initialize page counters explicitly (be tidy) */
scan_state = palloc(sizeof(LVRelScanState));
scan_state->scanned_pages = 0;
@@ -452,6 +563,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->scan_state = scan_state;
/* dead_items_alloc allocates vacrel->dead_items later on */
/* Allocate/initialize output statistics state */
vacrel->new_rel_tuples = 0;
vacrel->new_live_tuples = 0;
@@ -861,12 +974,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel)
{
- BlockNumber rel_pages = vacrel->rel_pages,
- blkno,
- next_fsm_block_to_vacuum = 0;
- bool all_visible_according_to_vm;
-
- Buffer vmbuffer = InvalidBuffer;
+ BlockNumber rel_pages = vacrel->rel_pages;
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -886,12 +994,93 @@ lazy_scan_heap(LVRelState *vacrel)
vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
- while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
+ /*
+ * Do the actual work. If parallel heap vacuum is active, we scan and
+ * vacuum heap using parallel workers.
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ do_parallel_lazy_scan_heap(vacrel);
+ else
+ {
+ bool scan_done PG_USED_FOR_ASSERTS_ONLY;
+
+ scan_done = do_lazy_scan_heap(vacrel);
+
+ /* We must have scanned all heap pages */
+ Assert(scan_done);
+ }
+
+ /* report that everything is now scanned */
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, rel_pages);
+
+ /* now we can compute the new value for pg_class.reltuples */
+ vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
+ vacrel->scan_state->scanned_pages,
+ vacrel->scan_state->live_tuples);
+
+ /*
+ * Also compute the total number of surviving heap entries. In the
+ * (unlikely) scenario that new_live_tuples is -1, take it as zero.
+ */
+ vacrel->new_rel_tuples =
+ Max(vacrel->new_live_tuples, 0) + vacrel->scan_state->recently_dead_tuples +
+ vacrel->scan_state->missed_dead_tuples;
+
+ /*
+ * Do index vacuuming (call each index's ambulkdelete routine), then do
+ * related heap vacuuming
+ */
+ if (vacrel->dead_items_info->num_items > 0)
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the remainder of the Free Space Map. We must do this whether or
+ * not there were indexes, and whether or not we bypassed index vacuuming.
+ */
+ if (rel_pages > vacrel->next_fsm_block_to_vacuum)
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ rel_pages);
+
+ /* report all blocks vacuumed */
+ pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, rel_pages);
+
+ /* Do final index cleanup (call each index's amvacuumcleanup routine) */
+ if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
+ lazy_cleanup_all_indexes(vacrel);
+}
+
+/*
+ * Workhorse for lazy_scan_heap().
+ *
+ * Return true if we processed all blocks; return false if we exited from
+ * this function before completing the heap scan because the space for dead
+ * item TIDs was full. In the serial heap scan case, this function always
+ * returns true. In the parallel heap vacuum case, this function is called by
+ * both worker processes and the leader process, and could return false.
+ */
+static bool
+do_lazy_scan_heap(LVRelState *vacrel)
+{
+ bool all_visible_according_to_vm;
+ BlockNumber blkno;
+ Buffer vmbuffer = InvalidBuffer;
+ bool scan_done = true;
+
+ while (true)
{
Buffer buf;
Page page;
bool has_lpdead_items;
bool got_cleanup_lock = false;
+ bool got_blkno;
+
+ /* Get the next block for vacuum to process */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ got_blkno = heap_vac_scan_next_block_parallel(vacrel, &blkno, &all_visible_according_to_vm);
+ else
+ got_blkno = heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm);
+
+ if (!got_blkno)
+ break;
vacrel->scan_state->scanned_pages++;
@@ -911,46 +1100,10 @@ lazy_scan_heap(LVRelState *vacrel)
* one-pass strategy, and the two-pass strategy with the index_cleanup
* param set to 'off'.
*/
- if (vacrel->scan_state->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
+ if (!IsParallelWorker() &&
+ vacrel->scan_state->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
lazy_check_wraparound_failsafe(vacrel);
- /*
- * Consider if we definitely have enough space to process TIDs on page
- * already. If we are close to overrunning the available space for
- * dead_items TIDs, pause and do a cycle of vacuuming before we tackle
- * this page.
- */
- if (TidStoreMemoryUsage(vacrel->dead_items) > vacrel->dead_items_info->max_bytes)
- {
- /*
- * Before beginning index vacuuming, we release any pin we may
- * hold on the visibility map page. This isn't necessary for
- * correctness, but we do it anyway to avoid holding the pin
- * across a lengthy, unrelated operation.
- */
- if (BufferIsValid(vmbuffer))
- {
- ReleaseBuffer(vmbuffer);
- vmbuffer = InvalidBuffer;
- }
-
- /* Perform a round of index and heap vacuuming */
- vacrel->consider_bypass_optimization = false;
- lazy_vacuum(vacrel);
-
- /*
- * Vacuum the Free Space Map to make newly-freed space visible on
- * upper-level FSM pages. Note we have not yet processed blkno.
- */
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
- blkno);
- next_fsm_block_to_vacuum = blkno;
-
- /* Report that we are once again scanning the heap */
- pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_SCAN_HEAP);
- }
-
/*
* Pin the visibility map page in case we need to mark the page
* all-visible. In most cases this will be very cheap, because we'll
@@ -1039,9 +1192,10 @@ lazy_scan_heap(LVRelState *vacrel)
* revisit this page. Since updating the FSM is desirable but not
* absolutely required, that's OK.
*/
- if (vacrel->nindexes == 0
- || !vacrel->do_index_vacuuming
- || !has_lpdead_items)
+ if (!IsParallelWorker() &&
+ (vacrel->nindexes == 0
+ || !vacrel->do_index_vacuuming
+ || !has_lpdead_items))
{
Size freespace = PageGetHeapFreeSpace(page);
@@ -1055,57 +1209,178 @@ lazy_scan_heap(LVRelState *vacrel)
* held the cleanup lock and lazy_scan_prune() was called.
*/
if (got_cleanup_lock && vacrel->nindexes == 0 && has_lpdead_items &&
- blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+ blkno - vacrel->next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
{
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
- blkno);
- next_fsm_block_to_vacuum = blkno;
+ BlockNumber fsm_vac_up_to;
+
+ /*
+ * If parallel heap vacuum scan is active, compute the minimum
+ * block number we scanned so far.
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ {
+ parallel_heap_vacuum_compute_min_scanned_blkno(vacrel);
+ fsm_vac_up_to = vacrel->phvstate->min_scanned_blkno;
+ }
+ else
+ {
+ /* blkno is already processed */
+ fsm_vac_up_to = blkno + 1;
+ }
+
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ fsm_vac_up_to);
+ vacrel->next_fsm_block_to_vacuum = fsm_vac_up_to;
}
}
else
UnlockReleaseBuffer(buf);
+
+ /*
+ * Consider if we definitely have enough space to process TIDs on page
+ * already. If we are close to overrunning the available space for
+ * dead_items TIDs, pause and do a cycle of vacuuming before we tackle
+ * this page.
+ */
+ if (TidStoreMemoryUsage(vacrel->dead_items) > vacrel->dead_items_info->max_bytes)
+ {
+ /*
+ * Before beginning index vacuuming, we release any pin we may
+ * hold on the visibility map page. This isn't necessary for
+ * correctness, but we do it anyway to avoid holding the pin
+ * across a lengthy, unrelated operation.
+ */
+ if (BufferIsValid(vmbuffer))
+ {
+ ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
+ }
+
+ /*
+			 * In a parallel heap scan, we pause the heap scan without
+			 * invoking index and heap vacuuming, and return to the caller
+			 * with scan_done being false. The parallel vacuum workers will
+			 * exit as their jobs are done. The leader process will wait for
+			 * all workers to finish, perform index and heap vacuuming, and
+			 * then perform FSM vacuuming too.
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ {
+ /* Remember the last scanned block */
+ vacrel->phvstate->myscanstate->last_blkno = blkno;
+
+ /* Remember we might have some unprocessed blocks */
+ scan_done = false;
+
+ break;
+ }
+
+ /* Perform a round of index and heap vacuuming */
+ vacrel->consider_bypass_optimization = false;
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the Free Space Map to make newly-freed space visible on
+ * upper-level FSM pages.
+ */
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ blkno + 1);
+			vacrel->next_fsm_block_to_vacuum = blkno + 1;
+
+ /* Report that we are once again scanning the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+ continue;
+ }
}
vacrel->blkno = InvalidBlockNumber;
if (BufferIsValid(vmbuffer))
ReleaseBuffer(vmbuffer);
- /* report that everything is now scanned */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+ return scan_done;
+}
- /* now we can compute the new value for pg_class.reltuples */
- vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
- vacrel->scan_state->scanned_pages,
- vacrel->scan_state->live_tuples);
+/*
+ * A parallel scan variant of heap_vac_scan_next_block(). Similar to
+ * heap_vac_scan_next_block(), the block number and visibility status of the next
+ * block to process are set in *blkno and *all_visible_according_to_vm. The return
+ * value is false if there are no further blocks to process.
+ *
+ * In parallel vacuum scan, we don't use the SKIP_PAGES_THRESHOLD optimization.
+ */
+static bool
+heap_vac_scan_next_block_parallel(LVRelState *vacrel, BlockNumber *blkno,
+ bool *all_visible_according_to_vm)
+{
+ PHVState *phvstate = vacrel->phvstate;
+ BlockNumber next_block;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 mapbits = 0;
- /*
- * Also compute the total number of surviving heap entries. In the
- * (unlikely) scenario that new_live_tuples is -1, take it as zero.
- */
- vacrel->new_rel_tuples =
- Max(vacrel->new_live_tuples, 0) + vacrel->scan_state->recently_dead_tuples +
- vacrel->scan_state->missed_dead_tuples;
+ Assert(ParallelHeapVacuumIsActive(vacrel));
- /*
- * Do index vacuuming (call each index's ambulkdelete routine), then do
- * related heap vacuuming
- */
- if (vacrel->dead_items_info->num_items > 0)
- lazy_vacuum(vacrel);
+ for (;;)
+ {
+ next_block = table_block_parallelscan_nextpage(vacrel->rel,
+ &(phvstate->myscanstate->state),
+ phvstate->pscandesc);
- /*
- * Vacuum the remainder of the Free Space Map. We must do this whether or
- * not there were indexes, and whether or not we bypassed index vacuuming.
- */
- if (blkno > next_fsm_block_to_vacuum)
- FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno);
+ /* Have we reached the end of the table? */
+ if (!BlockNumberIsValid(next_block) || next_block >= vacrel->rel_pages)
+ {
+ if (BufferIsValid(vmbuffer))
+ ReleaseBuffer(vmbuffer);
- /* report all blocks vacuumed */
- pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
+ *blkno = vacrel->rel_pages;
+ return false;
+ }
- /* Do final index cleanup (call each index's amvacuumcleanup routine) */
- if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
- lazy_cleanup_all_indexes(vacrel);
+ /* We always treat the last block as unsafe to skip */
+ if (next_block == vacrel->rel_pages - 1)
+ break;
+
+ mapbits = visibilitymap_get_status(vacrel->rel, next_block, &vmbuffer);
+
+ /*
+ * A block is unskippable if it is not all visible according to the
+ * visibility map.
+ */
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ {
+ Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
+ break;
+ }
+
+ /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
+ if (!vacrel->skipwithvm)
+ break;
+
+ /*
+ * Aggressive VACUUM caller can't skip pages just because they are
+ * all-visible.
+ */
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0)
+ {
+ if (vacrel->aggressive)
+ break;
+
+ /*
+			 * An all-visible block is safe to skip in the non-aggressive
+			 * case. But remember for later that we skipped such a block.
+ */
+ vacrel->scan_state->skippedallvis = true;
+ }
+ }
+
+ if (BufferIsValid(vmbuffer))
+ ReleaseBuffer(vmbuffer);
+
+ *blkno = next_block;
+ *all_visible_according_to_vm = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
+
+ return true;
}
/*
@@ -1254,11 +1529,12 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
/*
* Caller must scan the last page to determine whether it has tuples
- * (caller must have the opportunity to set vacrel->nonempty_pages).
- * This rule avoids having lazy_truncate_heap() take access-exclusive
- * lock on rel to attempt a truncation that fails anyway, just because
- * there are tuples on the last page (it is likely that there will be
- * tuples on other nearby pages as well, but those can be skipped).
+ * (caller must have the opportunity to set
+ * vacrel->scan_state->nonempty_pages). This rule avoids having
+ * lazy_truncate_heap() take access-exclusive lock on rel to attempt a
+ * truncation that fails anyway, just because there are tuples on the
+ * last page (it is likely that there will be tuples on other nearby
+ * pages as well, but those can be skipped).
*
* Implement this by always treating the last block as unsafe to skip.
*/
@@ -2117,7 +2393,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
progress_start_val[1] = vacrel->nindexes;
pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val);
- if (!ParallelVacuumIsActive(vacrel))
+ if (!ParallelIndexVacuumIsActive(vacrel))
{
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
@@ -2493,7 +2769,7 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
progress_start_val[1] = vacrel->nindexes;
pgstat_progress_update_multi_param(2, progress_start_index, progress_start_val);
- if (!ParallelVacuumIsActive(vacrel))
+ if (!ParallelIndexVacuumIsActive(vacrel))
{
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
@@ -2943,12 +3219,8 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
autovacuum_work_mem != -1 ?
autovacuum_work_mem : maintenance_work_mem;
- /*
- * Initialize state for a parallel vacuum. As of now, only one worker can
- * be used for an index, so we invoke parallelism only if there are at
- * least two indexes on a table.
- */
- if (nworkers >= 0 && vacrel->nindexes > 1 && vacrel->do_index_vacuuming)
+ /* Initialize state for a parallel vacuum */
+ if (nworkers >= 0)
{
/*
* Since parallel workers cannot access data in temporary tables, we
@@ -2966,11 +3238,20 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
+ {
+ /*
+		 * We initialize parallel heap scanning/vacuuming, index vacuuming,
+		 * or both, based on the table size and the number of indexes.
+ * Since only one worker can be used for an index, we will invoke
+ * parallelism for index vacuuming only if there are at least two
+ * indexes on a table.
+ */
vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
vacrel->nindexes, nworkers,
vac_work_mem,
vacrel->verbose ? INFO : DEBUG2,
- vacrel->bstrategy);
+ vacrel->bstrategy, (void *) vacrel);
+ }
/*
* If parallel mode started, dead_items and dead_items_info spaces are
@@ -3010,9 +3291,19 @@ dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *offsets,
};
int64 prog_val[2];
+ /*
+ * Protect both dead_items and dead_items_info from concurrent updates in
+ * parallel heap scan cases.
+ */
+ if (ParallelHeapVacuumIsActive(vacrel))
+ TidStoreLockExclusive(vacrel->dead_items);
+
TidStoreSetBlockOffsets(vacrel->dead_items, blkno, offsets, num_offsets);
vacrel->dead_items_info->num_items += num_offsets;
+ if (ParallelHeapVacuumIsActive(vacrel))
+ TidStoreUnlock(vacrel->dead_items);
+
/* update the progress information */
prog_val[0] = vacrel->dead_items_info->num_items;
prog_val[1] = TidStoreMemoryUsage(vacrel->dead_items);
@@ -3212,6 +3503,448 @@ update_relstats_all_indexes(LVRelState *vacrel)
}
}
+/*
+ * Compute the number of parallel workers for parallel vacuum heap scan.
+ *
+ * The calculation logic is borrowed from compute_parallel_worker().
+ */
+int
+heap_parallel_vacuum_compute_workers(Relation rel, int nrequested)
+{
+ int parallel_workers = 0;
+ int heap_parallel_threshold;
+ int heap_pages;
+
+ if (nrequested == 0)
+ {
+ /*
+ * Select the number of workers based on the log of the size of the
+ * relation. Note that the upper limit of the
+ * min_parallel_table_scan_size GUC is chosen to prevent overflow
+ * here.
+ */
+ heap_parallel_threshold = Max(min_parallel_table_scan_size, 1);
+ heap_pages = RelationGetNumberOfBlocks(rel);
+ while (heap_pages >= (BlockNumber) (heap_parallel_threshold * 3))
+ {
+ parallel_workers++;
+ heap_parallel_threshold *= 3;
+ if (heap_parallel_threshold > INT_MAX / 3)
+ break;
+ }
+ }
+ else
+ parallel_workers = nrequested;
+
+ return parallel_workers;
+}
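+
+/*
+ * For illustration, assuming the default min_parallel_table_scan_size of
+ * 8MB (1024 blocks), heap_parallel_vacuum_compute_workers() selects:
+ *
+ *   heap size >=  24MB (3 * 8MB)   -> 1 worker
+ *   heap size >=  72MB (9 * 8MB)   -> 2 workers
+ *   heap size >= 216MB (27 * 8MB)  -> 3 workers
+ *
+ * i.e. one more worker for every three-fold increase in table size, the same
+ * progression as compute_parallel_worker(). The result is later capped by
+ * max_parallel_maintenance_workers in parallel_vacuum_compute_workers().
+ */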
+
+/* Estimate shared memory sizes required for parallel heap vacuum */
+static inline void
+heap_parallel_estimate_shared_memory_size(Relation rel, int nworkers, Size *pscan_len,
+ Size *shared_len, Size *pscanwork_len)
+{
+ Size size = 0;
+
+ size = add_size(size, SizeOfPHVShared);
+ size = add_size(size, mul_size(sizeof(LVRelScanState), nworkers));
+ *shared_len = size;
+
+ *pscan_len = table_block_parallelscan_estimate(rel);
+
+ *pscanwork_len = mul_size(sizeof(PHVScanWorkerState), nworkers);
+}
+
+/*
+ * Compute the amount of space we'll need in the parallel heap vacuum
+ * DSM, and inform pcxt->estimator about our needs.
+ *
+ * nworkers is the number of workers for the table vacuum. Note that it could
+ * differ from pcxt->nworkers, since the latter is the maximum of the number
+ * of workers for table vacuum and for index vacuum.
+ */
+void
+heap_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state)
+{
+ Size pscan_len;
+ Size shared_len;
+ Size pscanwork_len;
+
+ heap_parallel_estimate_shared_memory_size(rel, nworkers, &pscan_len,
+ &shared_len, &pscanwork_len);
+
+ /* space for PHVShared */
+ shm_toc_estimate_chunk(&pcxt->estimator, shared_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* space for ParallelBlockTableScanDesc */
+ shm_toc_estimate_chunk(&pcxt->estimator, pscan_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* space for per-worker scan state, PHVScanWorkerState */
+ shm_toc_estimate_chunk(&pcxt->estimator, pscanwork_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+}
+
+/*
+ * Set up shared memory for parallel heap vacuum.
+ */
+void
+heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state)
+{
+ LVRelState *vacrel = (LVRelState *) state;
+	PHVState   *phvstate;
+ ParallelBlockTableScanDesc pscan;
+ PHVScanWorkerState *pscanwork;
+ PHVShared *shared;
+ Size pscan_len;
+ Size shared_len;
+ Size pscanwork_len;
+
+ phvstate = (PHVState *) palloc0(sizeof(PHVState));
+ phvstate->min_scanned_blkno = InvalidBlockNumber;
+
+ heap_parallel_estimate_shared_memory_size(rel, nworkers, &pscan_len,
+ &shared_len, &pscanwork_len);
+
+ shared = shm_toc_allocate(pcxt->toc, shared_len);
+
+ /* Prepare the shared information */
+
+ MemSet(shared, 0, shared_len);
+ shared->aggressive = vacrel->aggressive;
+ shared->skipwithvm = vacrel->skipwithvm;
+ shared->cutoffs = vacrel->cutoffs;
+ shared->NewRelfrozenXid = vacrel->scan_state->NewRelfrozenXid;
+ shared->NewRelminMxid = vacrel->scan_state->NewRelminMxid;
+ shared->skippedallvis = vacrel->scan_state->skippedallvis;
+
+ /*
+	 * XXX: we copy the contents of vistest to the shared area, but in order
+	 * to do that, we need to either expose the GlobalVisState struct or
+	 * provide functions to copy its contents somewhere. Currently we do the
+	 * former, but it's not clear that's the best choice.
+	 *
+	 * An alternative idea is to have each worker determine its own cutoff
+	 * and have its own vistest. But we need to consider that carefully,
+	 * since parallel workers would end up having different cutoffs and
+	 * horizons.
+ */
+ shared->vistest = *vacrel->vistest;
+
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_KEY_SCAN_SHARED, shared);
+
+ phvstate->shared = shared;
+
+ /* prepare the parallel block table scan description */
+ pscan = shm_toc_allocate(pcxt->toc, pscan_len);
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_KEY_SCAN_DESC, pscan);
+
+ /* initialize parallel scan description */
+ table_block_parallelscan_initialize(rel, (ParallelTableScanDesc) pscan);
+
+ /* Disable sync scan to always start from the first block */
+ pscan->base.phs_syncscan = false;
+
+ phvstate->pscandesc = pscan;
+
+ /* prepare the workers' parallel block table scan state */
+ pscanwork = shm_toc_allocate(pcxt->toc, pscanwork_len);
+ MemSet(pscanwork, 0, pscanwork_len);
+ shm_toc_insert(pcxt->toc, LV_PARALLEL_KEY_SCAN_DESC_WORKER, pscanwork);
+ phvstate->scanstates = pscanwork;
+
+ vacrel->phvstate = phvstate;
+}
+
+/*
+ * Main function for parallel heap vacuum workers.
+ */
+void
+heap_parallel_vacuum_worker(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt)
+{
+ LVRelState vacrel = {0};
+ PHVState *phvstate;
+ PHVShared *shared;
+ ParallelBlockTableScanDesc pscandesc;
+ PHVScanWorkerState *scanstate;
+ LVRelScanState *scan_state;
+ ErrorContextCallback errcallback;
+ bool scan_done;
+
+ phvstate = palloc(sizeof(PHVState));
+
+ pscandesc = (ParallelBlockTableScanDesc) shm_toc_lookup(pwcxt->toc,
+ LV_PARALLEL_KEY_SCAN_DESC,
+ false);
+ phvstate->pscandesc = pscandesc;
+
+ shared = (PHVShared *) shm_toc_lookup(pwcxt->toc, LV_PARALLEL_KEY_SCAN_SHARED,
+ false);
+ phvstate->shared = shared;
+
+ scanstate = (PHVScanWorkerState *) shm_toc_lookup(pwcxt->toc,
+ LV_PARALLEL_KEY_SCAN_DESC_WORKER,
+ false);
+
+ phvstate->myscanstate = &(scanstate[ParallelWorkerNumber]);
+ scan_state = &(shared->worker_scan_state[ParallelWorkerNumber]);
+
+ /* Prepare LVRelState */
+ vacrel.rel = rel;
+ vacrel.indrels = parallel_vacuum_get_table_indexes(pvs, &vacrel.nindexes);
+ vacrel.pvs = pvs;
+ vacrel.phvstate = phvstate;
+ vacrel.aggressive = shared->aggressive;
+ vacrel.skipwithvm = shared->skipwithvm;
+ vacrel.cutoffs = shared->cutoffs;
+ vacrel.vistest = &(shared->vistest);
+ vacrel.dead_items = parallel_vacuum_get_dead_items(pvs,
+ &vacrel.dead_items_info);
+ vacrel.rel_pages = RelationGetNumberOfBlocks(rel);
+ vacrel.scan_state = scan_state;
+
+ /* initialize per-worker relation statistics */
+ MemSet(scan_state, 0, sizeof(LVRelScanState));
+
+ /* Set fields necessary for heap scan */
+ vacrel.scan_state->NewRelfrozenXid = shared->NewRelfrozenXid;
+ vacrel.scan_state->NewRelminMxid = shared->NewRelminMxid;
+ vacrel.scan_state->skippedallvis = shared->skippedallvis;
+
+ /* Initialize the per-worker scan state if not yet */
+ if (!phvstate->myscanstate->initialized)
+ {
+ table_block_parallelscan_startblock_init(rel,
+ &(phvstate->myscanstate->state),
+ phvstate->pscandesc);
+
+ phvstate->myscanstate->last_blkno = InvalidBlockNumber;
+ phvstate->myscanstate->maybe_have_blocks = false;
+ phvstate->myscanstate->initialized = true;
+ }
+
+ /*
+ * Setup error traceback support for ereport() for parallel table vacuum
+ * workers
+ */
+ vacrel.dbname = get_database_name(MyDatabaseId);
+	vacrel.relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ vacrel.relname = pstrdup(RelationGetRelationName(rel));
+ vacrel.indname = NULL;
+ vacrel.phase = VACUUM_ERRCB_PHASE_SCAN_HEAP;
+ errcallback.callback = vacuum_error_callback;
+ errcallback.arg = &vacrel;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ scan_done = do_lazy_scan_heap(&vacrel);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ /*
+	 * If the leader or a worker finishes the heap scan because the space for
+	 * dead_items TIDs is close to the limit, it might have some allocated
+	 * blocks left in its scan state. Since this scan state might not be used
+	 * in the next heap scan, we remember that it might have some unconsumed
+	 * blocks so that the leader can complete the scans after the heap scan
+	 * phase finishes.
+ */
+ phvstate->myscanstate->maybe_have_blocks = !scan_done;
+}
+
+/*
+ * Complete parallel heap scans that have remaining blocks in their
+ * chunks.
+ */
+static void
+parallel_heap_complete_unfinished_scan(LVRelState *vacrel)
+{
+ int nworkers;
+
+ Assert(!IsParallelWorker());
+
+ nworkers = parallel_vacuum_get_nworkers_table(vacrel->pvs);
+
+ for (int i = 0; i < nworkers; i++)
+ {
+ PHVScanWorkerState *wstate = &(vacrel->phvstate->scanstates[i]);
+ bool scan_done PG_USED_FOR_ASSERTS_ONLY;
+
+ if (!wstate->maybe_have_blocks)
+ continue;
+
+ /* Attach the worker's scan state and do heap scan */
+ vacrel->phvstate->myscanstate = wstate;
+ scan_done = do_lazy_scan_heap(vacrel);
+
+ Assert(scan_done);
+ }
+
+ /*
+ * We don't need to gather the scan results here because the leader's scan
+ * state got updated directly.
+ */
+}
+
+/*
+ * Compute the minimum block number we have scanned so far and update
+ * vacrel->phvstate->min_scanned_blkno.
+ */
+static void
+parallel_heap_vacuum_compute_min_scanned_blkno(LVRelState *vacrel)
+{
+ PHVState *phvstate = vacrel->phvstate;
+
+ Assert(ParallelHeapVacuumIsActive(vacrel));
+
+ /*
+ * We check all worker scan states here to compute the minimum block
+ * number among all scan states.
+ */
+ for (int i = 0; i < phvstate->nworkers_launched; i++)
+ {
+ PHVScanWorkerState *wstate = &(phvstate->scanstates[i]);
+
+		/* Skip if no worker has initialized the scan state */
+ if (!wstate->initialized)
+ continue;
+
+ if (!BlockNumberIsValid(phvstate->min_scanned_blkno) ||
+ wstate->last_blkno < phvstate->min_scanned_blkno)
+ phvstate->min_scanned_blkno = wstate->last_blkno;
+ }
+}
+
+/* Accumulate each worker's scan results into the leader's */
+static void
+parallel_heap_vacuum_gather_scan_results(LVRelState *vacrel)
+{
+ PHVState *phvstate = vacrel->phvstate;
+
+ Assert(ParallelHeapVacuumIsActive(vacrel));
+ Assert(!IsParallelWorker());
+
+ /* Gather the workers' scan results */
+ for (int i = 0; i < phvstate->nworkers_launched; i++)
+ {
+ LVRelScanState *ss = &(phvstate->shared->worker_scan_state[i]);
+
+ vacrel->scan_state->scanned_pages += ss->scanned_pages;
+ vacrel->scan_state->removed_pages += ss->removed_pages;
+ vacrel->scan_state->vm_new_frozen_pages += ss->vm_new_frozen_pages;
+ vacrel->scan_state->lpdead_item_pages += ss->lpdead_item_pages;
+ vacrel->scan_state->missed_dead_pages += ss->missed_dead_pages;
+ vacrel->scan_state->tuples_deleted += ss->tuples_deleted;
+ vacrel->scan_state->tuples_frozen += ss->tuples_frozen;
+ vacrel->scan_state->lpdead_items += ss->lpdead_items;
+ vacrel->scan_state->live_tuples += ss->live_tuples;
+ vacrel->scan_state->recently_dead_tuples += ss->recently_dead_tuples;
+ vacrel->scan_state->missed_dead_tuples += ss->missed_dead_tuples;
+
+		if (ss->nonempty_pages > vacrel->scan_state->nonempty_pages)
+ vacrel->scan_state->nonempty_pages = ss->nonempty_pages;
+
+ if (TransactionIdPrecedes(ss->NewRelfrozenXid, vacrel->scan_state->NewRelfrozenXid))
+ vacrel->scan_state->NewRelfrozenXid = ss->NewRelfrozenXid;
+
+ if (MultiXactIdPrecedesOrEquals(ss->NewRelminMxid, vacrel->scan_state->NewRelminMxid))
+ vacrel->scan_state->NewRelminMxid = ss->NewRelminMxid;
+
+ if (!vacrel->scan_state->skippedallvis && ss->skippedallvis)
+ vacrel->scan_state->skippedallvis = true;
+ }
+
+ /* Also, compute the minimum block number we scanned so far */
+ parallel_heap_vacuum_compute_min_scanned_blkno(vacrel);
+}
+
+/*
+ * A parallel variant of do_lazy_scan_heap(). The leader process launches parallel
+ * workers to scan the heap in parallel.
+ */
+static void
+do_parallel_lazy_scan_heap(LVRelState *vacrel)
+{
+ PHVScanWorkerState *scanstate;
+
+ Assert(ParallelHeapVacuumIsActive(vacrel));
+ Assert(!IsParallelWorker());
+
+	/* launch parallel workers */
+ vacrel->phvstate->nworkers_launched = parallel_vacuum_table_scan_begin(vacrel->pvs);
+
+	/* initialize the leader's scan state so it can join the scan as a worker */
+ scanstate = palloc0(sizeof(PHVScanWorkerState));
+ scanstate->last_blkno = InvalidBlockNumber;
+ table_block_parallelscan_startblock_init(vacrel->rel, &(scanstate->state),
+ vacrel->phvstate->pscandesc);
+ vacrel->phvstate->myscanstate = scanstate;
+
+ for (;;)
+ {
+ bool scan_done;
+
+ /*
+ * Scan the table until either we are close to overrunning the
+ * available space for dead_items TIDs or we reach the end of the
+ * table.
+ */
+ scan_done = do_lazy_scan_heap(vacrel);
+
+ /* wait for parallel workers to finish and gather scan results */
+ parallel_vacuum_table_scan_end(vacrel->pvs);
+ parallel_heap_vacuum_gather_scan_results(vacrel);
+
+		/* We reached the end of the table */
+ if (scan_done)
+ break;
+
+ /*
+		 * The parallel heap scan paused in the middle of the table because
+		 * the space for dead_items TIDs was full. We perform a round of
+		 * index and heap vacuuming, followed by FSM vacuuming.
+ */
+
+ /* Perform a round of index and heap vacuuming */
+ vacrel->consider_bypass_optimization = false;
+ lazy_vacuum(vacrel);
+
+ /*
+ * Vacuum the Free Space Map to make newly-freed space visible on
+ * upper-level FSM pages.
+ */
+ if (vacrel->phvstate->min_scanned_blkno > vacrel->next_fsm_block_to_vacuum)
+ {
+ /*
+ * min_scanned_blkno was updated when gathering the workers' scan
+ * results.
+ */
+ FreeSpaceMapVacuumRange(vacrel->rel, vacrel->next_fsm_block_to_vacuum,
+ vacrel->phvstate->min_scanned_blkno + 1);
+			vacrel->next_fsm_block_to_vacuum = vacrel->phvstate->min_scanned_blkno + 1;
+ }
+
+ /* Report that we are once again scanning the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+ /* Re-launch workers to restart parallel heap scan */
+ vacrel->phvstate->nworkers_launched =
+ parallel_vacuum_table_scan_begin(vacrel->pvs);
+ }
+
+ /*
+	 * The parallel heap scan finished, but it's possible that some workers
+	 * have allocated blocks but not processed them yet. This can happen, for
+	 * example, when workers exit because the space for dead_items TIDs was
+	 * full and the leader process launches fewer workers in the next cycle.
+ */
+ parallel_heap_complete_unfinished_scan(vacrel);
+}
+
/*
* Error context callback for errors occurring during vacuum. The error
* context messages for index phases should match the messages set in parallel
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 08011fde23f..9f8c8f09576 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -6,15 +6,24 @@
* This file contains routines that are intended to support setting up, using,
* and tearing down a ParallelVacuumState.
*
- * In a parallel vacuum, we perform both index bulk deletion and index cleanup
- * with parallel worker processes. Individual indexes are processed by one
- * vacuum process. ParallelVacuumState contains shared information as well as
- * the memory space for storing dead items allocated in the DSA area. We
- * launch parallel worker processes at the start of parallel index
- * bulk-deletion and index cleanup and once all indexes are processed, the
- * parallel worker processes exit. Each time we process indexes in parallel,
- * the parallel context is re-initialized so that the same DSM can be used for
- * multiple passes of index bulk-deletion and index cleanup.
+ * In a parallel vacuum, we perform the table scan, index bulk deletion and
+ * index cleanup, or all of them with parallel worker processes. Different
+ * numbers of workers are launched for table vacuuming and index processing.
+ * ParallelVacuumState contains shared information as well as the memory space
+ * for storing dead items allocated in the DSA area.
+ *
+ * When initializing parallel table vacuum scan, we invoke table AM routines for
+ * estimating DSM sizes and initializing DSM memory. Parallel table vacuum
+ * workers invoke the table AM routine for vacuuming the table.
+ *
+ * For processing indexes in parallel, individual indexes are processed by one
+ * vacuum process. We launch parallel worker processes at the start of parallel index
+ * bulk-deletion and index cleanup and once all indexes are processed, the parallel
+ * worker processes exit.
+ *
+ * Each time we process the table or indexes in parallel, the parallel context
+ * is re-initialized so that the same DSM can be used for multiple passes of
+ * table vacuum or index bulk-deletion and index cleanup.
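+ *
+ * For example, if both the table and its indexes qualify for parallel
+ * processing, the leader first launches workers for the table scan; when the
+ * scan pauses because the space for dead item TIDs fills up, the leader waits
+ * for those workers to exit, re-initializes the same DSM, launches workers
+ * for index bulk-deletion, and then resumes the parallel table scan. This
+ * cycle repeats until the whole table has been processed.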
*
* Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -28,6 +37,7 @@
#include "access/amapi.h"
#include "access/table.h"
+#include "access/tableam.h"
#include "access/xact.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
@@ -65,6 +75,12 @@ typedef struct PVShared
int elevel;
uint64 queryid;
+ /*
+	 * True if the caller wants parallel workers to invoke the vacuum table
+	 * scan callback.
+ */
+ bool do_vacuum_table_scan;
+
/*
* Fields for both index vacuum and cleanup.
*
@@ -101,6 +117,13 @@ typedef struct PVShared
*/
pg_atomic_uint32 cost_balance;
+ /*
+ * The number of workers for parallel table scan/vacuuming and index
+ * vacuuming, respectively.
+ */
+ int nworkers_for_table;
+ int nworkers_for_index;
+
/*
* Number of active parallel workers. This is used for computing the
* minimum threshold of the vacuum cost balance before a worker sleeps for
@@ -164,6 +187,9 @@ struct ParallelVacuumState
/* NULL for worker processes */
ParallelContext *pcxt;
+ /* Passed to parallel table scan workers. NULL for leader process */
+ ParallelWorkerContext *pwcxt;
+
/* Parent Heap Relation */
Relation heaprel;
@@ -193,6 +219,9 @@ struct ParallelVacuumState
/* Points to WAL usage area in DSM */
WalUsage *wal_usage;
+	/* How many times has the parallel table vacuum scan been performed? */
+ int num_table_scans;
+
/*
* False if the index is totally unsuitable target for all parallel
* processing. For example, the index could be <
@@ -224,8 +253,9 @@ struct ParallelVacuumState
PVIndVacStatus status;
};
-static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
- bool *will_parallel_vacuum);
+static void parallel_vacuum_compute_workers(Relation rel, Relation *indrels, int nindexes,
+ int nrequested, int *nworkers_for_table,
+ int *nworkers_for_index, bool *will_parallel_vacuum);
static void parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool vacuum);
static void parallel_vacuum_process_safe_indexes(ParallelVacuumState *pvs);
static void parallel_vacuum_process_unsafe_indexes(ParallelVacuumState *pvs);
@@ -244,7 +274,7 @@ static void parallel_vacuum_error_callback(void *arg);
ParallelVacuumState *
parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
int nrequested_workers, int vac_work_mem,
- int elevel, BufferAccessStrategy bstrategy)
+ int elevel, BufferAccessStrategy bstrategy, void *state)
{
ParallelVacuumState *pvs;
ParallelContext *pcxt;
@@ -258,6 +288,8 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
Size est_shared_len;
int nindexes_mwm = 0;
int parallel_workers = 0;
+ int nworkers_for_table;
+ int nworkers_for_index;
int querylen;
/*
@@ -265,15 +297,17 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
* relation
*/
Assert(nrequested_workers >= 0);
- Assert(nindexes > 0);
/*
* Compute the number of parallel vacuum workers to launch
*/
will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = parallel_vacuum_compute_workers(indrels, nindexes,
- nrequested_workers,
- will_parallel_vacuum);
+ parallel_vacuum_compute_workers(rel, indrels, nindexes, nrequested_workers,
+ &nworkers_for_table, &nworkers_for_index,
+ will_parallel_vacuum);
+
+ parallel_workers = Max(nworkers_for_table, nworkers_for_index);
+
if (parallel_workers <= 0)
{
/* Can't perform vacuum in parallel -- return NULL */
@@ -329,6 +363,10 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
else
querylen = 0; /* keep compiler quiet */
+ /* Estimate AM-specific space for parallel table vacuum */
+ if (nworkers_for_table > 0)
+ table_parallel_vacuum_estimate(rel, pcxt, nworkers_for_table, state);
+
InitializeParallelDSM(pcxt);
/* Prepare index vacuum stats */
@@ -373,6 +411,8 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
shared->relid = RelationGetRelid(rel);
shared->elevel = elevel;
shared->queryid = pgstat_get_my_query_id();
+ shared->nworkers_for_table = nworkers_for_table;
+ shared->nworkers_for_index = nworkers_for_index;
shared->maintenance_work_mem_worker =
(nindexes_mwm > 0) ?
maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
@@ -421,6 +461,10 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
}
+ /* Prepare AM-specific DSM for parallel table vacuum */
+ if (nworkers_for_table > 0)
+ table_parallel_vacuum_initialize(rel, pcxt, nworkers_for_table, state);
+
/* Success -- return parallel vacuum state */
return pvs;
}
@@ -534,33 +578,48 @@ parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tup
}
/*
- * Compute the number of parallel worker processes to request. Both index
- * vacuum and index cleanup can be executed with parallel workers.
- * The index is eligible for parallel vacuum iff its size is greater than
- * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt performance.
+ * Compute the number of parallel worker processes to request for table
+ * vacuum and index vacuum/cleanup.
+ *
+ * For parallel table vacuum, we ask the AM-specific routine to compute the
+ * number of parallel worker processes. The result is set to
+ * *nworkers_for_table.
*
- * nrequested is the number of parallel workers that user requested. If
- * nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
+ * For parallel index vacuum, an index is eligible for parallel vacuum iff
+ * its size is greater than min_parallel_index_scan_size, as invoking workers
+ * for very small indexes can hurt performance. nrequested is the number of
+ * parallel workers that the user requested. If nrequested is 0, we compute
+ * the parallel degree based on nindexes, that is, the number of indexes that
+ * support parallel vacuum. This function also sets will_parallel_vacuum to
+ * remember the indexes that participate in parallel vacuum.
*/
-static int
-parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
- bool *will_parallel_vacuum)
+static void
+parallel_vacuum_compute_workers(Relation rel, Relation *indrels, int nindexes,
+ int nrequested, int *nworkers_for_table,
+ int *nworkers_for_index, bool *will_parallel_vacuum)
{
int nindexes_parallel = 0;
int nindexes_parallel_bulkdel = 0;
int nindexes_parallel_cleanup = 0;
- int parallel_workers;
+ int parallel_workers_table = 0;
+ int parallel_workers_index = 0;
/*
* We don't allow performing parallel operation in standalone backend or
* when parallelism is disabled.
*/
if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
+ {
+ *nworkers_for_table = 0;
+ *nworkers_for_index = 0;
+ return;
+ }
+
+ /*
+ * Compute the number of workers for parallel table scan. Cap by
+ * max_parallel_maintenance_workers.
+ */
+ parallel_workers_table = Min(table_parallel_vacuum_compute_workers(rel, nrequested),
+ max_parallel_maintenance_workers);
/*
* Compute the number of indexes that can participate in parallel vacuum.
@@ -591,17 +650,18 @@ parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
nindexes_parallel--;
/* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
+ if (nindexes_parallel > 0)
+ {
+ /* Compute the parallel degree for parallel index vacuum */
+ parallel_workers_index = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers_index = Min(parallel_workers_index, max_parallel_maintenance_workers);
+ }
- return parallel_workers;
+ *nworkers_for_table = parallel_workers_table;
+ *nworkers_for_index = parallel_workers_index;
}
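+
+/*
+ * For example, with nrequested = 0, a heap large enough for two scan workers,
+ * and three parallel-safe indexes, this function sets *nworkers_for_table = 2
+ * and *nworkers_for_index = 3; parallel_vacuum_init() then sizes the parallel
+ * context for Max(2, 3) = 3 workers and launches the appropriate number for
+ * each phase.
+ */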
/*
@@ -669,8 +729,12 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool vacuum)
/* Setup the shared cost-based vacuum delay and launch workers */
if (nworkers > 0)
{
- /* Reinitialize parallel context to relaunch parallel workers */
- if (pvs->num_index_scans > 0)
+ /*
+ * Reinitialize parallel context to relaunch parallel workers if we
+ * have used the parallel context for either index vacuuming or table
+ * vacuuming.
+ */
+ if (pvs->num_index_scans > 0 || pvs->num_table_scans > 0)
ReinitializeParallelDSM(pvs->pcxt);
/*
@@ -982,6 +1046,146 @@ parallel_vacuum_index_is_parallel_safe(Relation indrel, int num_index_scans,
return true;
}
+/*
+ * Prepare the DSM and the shared cost-based vacuum delay, and launch parallel
+ * workers for parallel table vacuum. Return the number of parallel workers
+ * launched.
+ *
+ * The caller must call parallel_vacuum_table_scan_end() to finish the parallel
+ * table vacuum.
+ */
+int
+parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ if (pvs->shared->nworkers_for_table == 0)
+ return 0;
+
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ pvs->shared->do_vacuum_table_scan = true;
+
+ if (pvs->num_table_scans > 0)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ /*
+ * The number of workers might vary between table vacuum and index
+ * processing
+ */
+ ReinitializeParallelWorkers(pvs->pcxt, pvs->shared->nworkers_for_table);
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have already
+ * accumulated the remaining balance of heap.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+
+ /* Include the worker count for the leader itself */
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+ }
+
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for table processing (planned: %d)",
+ "launched %d parallel vacuum workers for table processing (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, pvs->shared->nworkers_for_table)));
+
+ return pvs->pcxt->nworkers_launched;
+}
+
+/*
+ * Wait for all parallel table vacuum scan workers to finish, and gather
+ * statistics.
+ */
+void
+parallel_vacuum_table_scan_end(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ if (pvs->shared->nworkers_for_table == 0)
+ return;
+
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ /* Decrement the worker count for the leader itself */
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+
+ /*
+ * Carry the shared balance value to heap scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+
+ pvs->shared->do_vacuum_table_scan = false;
+ pvs->num_table_scans++;
+}
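+
+/*
+ * Typical usage by the leader, as in do_parallel_lazy_scan_heap():
+ *
+ *	nlaunched = parallel_vacuum_table_scan_begin(pvs);
+ *	... scan the heap as a participant ...
+ *	parallel_vacuum_table_scan_end(pvs);
+ *
+ * parallel_vacuum_table_scan_begin() and parallel_vacuum_table_scan_end()
+ * must be paired around each pass of the parallel table scan.
+ */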
+
+/*
+ * Return the array of indexes associated with the given table to be vacuumed.
+ */
+Relation *
+parallel_vacuum_get_table_indexes(ParallelVacuumState *pvs, int *nindexes)
+{
+ *nindexes = pvs->nindexes;
+
+ return pvs->indrels;
+}
+
+/*
+ * Return the number of workers for parallel table vacuum.
+ */
+int
+parallel_vacuum_get_nworkers_table(ParallelVacuumState *pvs)
+{
+ return pvs->shared->nworkers_for_table;
+}
+
+/*
+ * Return the number of workers for parallel index processing.
+ */
+int
+parallel_vacuum_get_nworkers_index(ParallelVacuumState *pvs)
+{
+ return pvs->shared->nworkers_for_index;
+}
+
+/*
+ * A parallel worker invokes the table-AM-specific vacuum scan callback.
+ */
+static void
+parallel_vacuum_process_table(ParallelVacuumState *pvs)
+{
+ Assert(VacuumActiveNWorkers);
+ Assert(pvs->shared->do_vacuum_table_scan);
+
+	/* Increment the active worker count before starting the table vacuum */
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ /* Do table vacuum scan */
+ table_parallel_vacuum_relation_worker(pvs->heaprel, pvs, pvs->pwcxt);
+
+ /*
+ * We have completed the table vacuum so decrement the active worker
+ * count.
+ */
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
/*
* Perform work within a launched parallel process.
*
@@ -1033,7 +1237,6 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
* matched to the leader's one.
*/
vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
if (shared->maintenance_work_mem_worker > 0)
maintenance_work_mem = shared->maintenance_work_mem_worker;
@@ -1064,6 +1267,10 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
pvs.relname = pstrdup(RelationGetRelationName(rel));
pvs.heaprel = rel;
+ pvs.pwcxt = palloc(sizeof(ParallelWorkerContext));
+ pvs.pwcxt->toc = toc;
+ pvs.pwcxt->seg = seg;
+
/* These fields will be filled during index vacuum or cleanup */
pvs.indname = NULL;
pvs.status = PARALLEL_INDVAC_STATUS_INITIAL;
@@ -1081,8 +1288,16 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
/* Prepare to track buffer usage during parallel execution */
InstrStartParallelQuery();
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&pvs);
+ if (pvs.shared->do_vacuum_table_scan)
+ {
+ /* Process table to perform vacuum */
+ parallel_vacuum_process_table(&pvs);
+ }
+ else
+ {
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_process_safe_indexes(&pvs);
+ }
/* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 2e54c11f880..4813a07860d 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -99,80 +99,6 @@ typedef struct ProcArrayStruct
int pgprocnos[FLEXIBLE_ARRAY_MEMBER];
} ProcArrayStruct;
-/*
- * State for the GlobalVisTest* family of functions. Those functions can
- * e.g. be used to decide if a deleted row can be removed without violating
- * MVCC semantics: If the deleted row's xmax is not considered to be running
- * by anyone, the row can be removed.
- *
- * To avoid slowing down GetSnapshotData(), we don't calculate a precise
- * cutoff XID while building a snapshot (looking at the frequently changing
- * xmins scales badly). Instead we compute two boundaries while building the
- * snapshot:
- *
- * 1) definitely_needed, indicating that rows deleted by XIDs >=
- * definitely_needed are definitely still visible.
- *
- * 2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can
- * definitely be removed
- *
- * When testing an XID that falls in between the two (i.e. XID >= maybe_needed
- * && XID < definitely_needed), the boundaries can be recomputed (using
- * ComputeXidHorizons()) to get a more accurate answer. This is cheaper than
- * maintaining an accurate value all the time.
- *
- * As it is not cheap to compute accurate boundaries, we limit the number of
- * times that happens in short succession. See GlobalVisTestShouldUpdate().
- *
- *
- * There are three backend lifetime instances of this struct, optimized for
- * different types of relations. As e.g. a normal user defined table in one
- * database is inaccessible to backends connected to another database, a test
- * specific to a relation can be more aggressive than a test for a shared
- * relation. Currently we track four different states:
- *
- * 1) GlobalVisSharedRels, which only considers an XID's
- * effects visible-to-everyone if neither snapshots in any database, nor a
- * replication slot's xmin, nor a replication slot's catalog_xmin might
- * still consider XID as running.
- *
- * 2) GlobalVisCatalogRels, which only considers an XID's
- * effects visible-to-everyone if neither snapshots in the current
- * database, nor a replication slot's xmin, nor a replication slot's
- * catalog_xmin might still consider XID as running.
- *
- * I.e. the difference to GlobalVisSharedRels is that
- * snapshot in other databases are ignored.
- *
- * 3) GlobalVisDataRels, which only considers an XID's
- * effects visible-to-everyone if neither snapshots in the current
- * database, nor a replication slot's xmin consider XID as running.
- *
- * I.e. the difference to GlobalVisCatalogRels is that
- * replication slot's catalog_xmin is not taken into account.
- *
- * 4) GlobalVisTempRels, which only considers the current session, as temp
- * tables are not visible to other sessions.
- *
- * GlobalVisTestFor(relation) returns the appropriate state
- * for the relation.
- *
- * The boundaries are FullTransactionIds instead of TransactionIds to avoid
- * wraparound dangers. There e.g. would otherwise exist no procarray state to
- * prevent maybe_needed to become old enough after the GetSnapshotData()
- * call.
- *
- * The typedef is in the header.
- */
-struct GlobalVisState
-{
- /* XIDs >= are considered running by some backend */
- FullTransactionId definitely_needed;
-
- /* XIDs < are not considered to be running by any backend */
- FullTransactionId maybe_needed;
-};
-
/*
* Result of ComputeXidHorizons().
*/
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 7d06dad83fc..94438eff25c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -21,6 +21,7 @@
#include "access/skey.h"
#include "access/table.h" /* for backward compatibility */
#include "access/tableam.h"
+#include "commands/vacuum.h"
#include "nodes/lockoptions.h"
#include "nodes/primnodes.h"
#include "storage/bufpage.h"
@@ -401,6 +402,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
+extern int heap_parallel_vacuum_compute_workers(Relation rel, int requested);
+extern void heap_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state);
+extern void heap_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt,
+ int nworkers, void *state);
+extern void heap_parallel_vacuum_worker(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 09b9b394e0e..d7d74514a60 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -20,6 +20,7 @@
#include "access/relscan.h"
#include "access/sdir.h"
#include "access/xact.h"
+#include "commands/vacuum.h"
#include "executor/tuptable.h"
#include "storage/read_stream.h"
#include "utils/rel.h"
@@ -654,6 +655,47 @@ typedef struct TableAmRoutine
struct VacuumParams *params,
BufferAccessStrategy bstrategy);
+ /* ------------------------------------------------------------------------
+ * Callbacks for parallel table vacuum.
+ * ------------------------------------------------------------------------
+ */
+
+ /*
+ * Compute the number of parallel workers for parallel table vacuum. The
+ * function must return 0 to disable parallel table vacuum.
+ */
+ int (*parallel_vacuum_compute_workers) (Relation rel, int requested);
+
+ /*
+ * Estimate the amount of shared memory that the table AM needs for a
+ * parallel table vacuum.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_estimate) (Relation rel,
+ ParallelContext *pcxt,
+ int nworkers,
+ void *state);
+
+ /*
+ * Initialize DSM space for parallel table vacuum.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_initialize) (Relation rel,
+ ParallelContext *pctx,
+ int nworkers,
+ void *state);
+
+ /*
+ * This callback is called in each parallel table vacuum worker process.
+ *
+ * Not called if parallel table vacuum is disabled.
+ */
+ void (*parallel_vacuum_relation_worker) (Relation rel,
+ ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt);
+
/*
* Prepare to analyze block `blockno` of `scan`. The scan has been started
* with table_beginscan_analyze(). See also
@@ -1715,6 +1757,52 @@ table_relation_vacuum(Relation rel, struct VacuumParams *params,
rel->rd_tableam->relation_vacuum(rel, params, bstrategy);
}
+/* ----------------------------------------------------------------------------
+ * Parallel vacuum related functions.
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * Return the number of parallel workers for a parallel vacuum scan of this
+ * relation.
+ */
+static inline int
+table_parallel_vacuum_compute_workers(Relation rel, int requested)
+{
+ return rel->rd_tableam->parallel_vacuum_compute_workers(rel, requested);
+}
+
+/*
+ * Estimate the size of shared memory needed for a parallel vacuum scan of
+ * this relation.
+ */
+static inline void
+table_parallel_vacuum_estimate(Relation rel, ParallelContext *pcxt, int nworkers,
+ void *state)
+{
+ rel->rd_tableam->parallel_vacuum_estimate(rel, pcxt, nworkers, state);
+}
+
+/*
+ * Initialize shared memory area for a parallel vacuum scan of this relation.
+ */
+static inline void
+table_parallel_vacuum_initialize(Relation rel, ParallelContext *pcxt, int nworkers,
+ void *state)
+{
+ rel->rd_tableam->parallel_vacuum_initialize(rel, pcxt, nworkers, state);
+}
+
+/*
+ * Perform parallel vacuuming of this relation in a parallel worker.
+ */
+static inline void
+table_parallel_vacuum_relation_worker(Relation rel, ParallelVacuumState *pvs,
+ ParallelWorkerContext *pwcxt)
+{
+ rel->rd_tableam->parallel_vacuum_relation_worker(rel, pvs, pwcxt);
+}
+
/*
* Prepare to analyze the next block in the read stream. The scan needs to
* have been started with table_beginscan_analyze(). Note that this routine
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index e7b7753b691..d45866d61e5 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -360,7 +360,8 @@ extern void VacuumUpdateCosts(void);
extern ParallelVacuumState *parallel_vacuum_init(Relation rel, Relation *indrels,
int nindexes, int nrequested_workers,
int vac_work_mem, int elevel,
- BufferAccessStrategy bstrategy);
+ BufferAccessStrategy bstrategy,
+ void *state);
extern void parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats);
extern TidStore *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs,
VacDeadItemsInfo **dead_items_info_p);
@@ -370,6 +371,11 @@ extern void parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs,
extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
long num_table_tuples,
bool estimated_count);
+extern int parallel_vacuum_table_scan_begin(ParallelVacuumState *pvs);
+extern void parallel_vacuum_table_scan_end(ParallelVacuumState *pvs);
+extern int parallel_vacuum_get_nworkers_table(ParallelVacuumState *pvs);
+extern int parallel_vacuum_get_nworkers_index(ParallelVacuumState *pvs);
+extern Relation *parallel_vacuum_get_table_indexes(ParallelVacuumState *pvs, int *nindexes);
extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in commands/analyze.c */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index d346be71642..3b6fb603544 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -17,6 +17,7 @@
#include "utils/relcache.h"
#include "utils/resowner.h"
#include "utils/snapshot.h"
+#include "utils/snapmgr_internal.h"
extern PGDLLIMPORT bool FirstSnapshotSet;
@@ -95,7 +96,6 @@ extern char *ExportSnapshot(Snapshot snapshot);
* These live in procarray.c because they're intimately linked to the
* procarray contents, but thematically they better fit into snapmgr.h.
*/
-typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
diff --git a/src/include/utils/snapmgr_internal.h b/src/include/utils/snapmgr_internal.h
new file mode 100644
index 00000000000..4363adf7f62
--- /dev/null
+++ b/src/include/utils/snapmgr_internal.h
@@ -0,0 +1,91 @@
+/*-------------------------------------------------------------------------
+ *
+ * snapmgr_internal.h
+ * This file contains declarations of structs for snapshot manager
+ * for internal use.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/utils/snapmgr_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SNAPMGR_INTERNAL_H
+#define SNAPMGR_INTERNAL_H
+
+#include "access/transam.h"
+
+/*
+ * State for the GlobalVisTest* family of functions. Those functions can
+ * e.g. be used to decide if a deleted row can be removed without violating
+ * MVCC semantics: If the deleted row's xmax is not considered to be running
+ * by anyone, the row can be removed.
+ *
+ * To avoid slowing down GetSnapshotData(), we don't calculate a precise
+ * cutoff XID while building a snapshot (looking at the frequently changing
+ * xmins scales badly). Instead we compute two boundaries while building the
+ * snapshot:
+ *
+ * 1) definitely_needed, indicating that rows deleted by XIDs >=
+ * definitely_needed are definitely still visible.
+ *
+ * 2) maybe_needed, indicating that rows deleted by XIDs < maybe_needed can
+ * definitely be removed
+ *
+ * When testing an XID that falls in between the two (i.e. XID >= maybe_needed
+ * && XID < definitely_needed), the boundaries can be recomputed (using
+ * ComputeXidHorizons()) to get a more accurate answer. This is cheaper than
+ * maintaining an accurate value all the time.
+ *
+ * As it is not cheap to compute accurate boundaries, we limit the number of
+ * times that happens in short succession. See GlobalVisTestShouldUpdate().
+ *
+ *
+ * There are three backend lifetime instances of this struct, optimized for
+ * different types of relations. As e.g. a normal user defined table in one
+ * database is inaccessible to backends connected to another database, a test
+ * specific to a relation can be more aggressive than a test for a shared
+ * relation. Currently we track four different states:
+ *
+ * 1) GlobalVisSharedRels, which only considers an XID's
+ * effects visible-to-everyone if neither snapshots in any database, nor a
+ * replication slot's xmin, nor a replication slot's catalog_xmin might
+ * still consider XID as running.
+ *
+ * 2) GlobalVisCatalogRels, which only considers an XID's
+ * effects visible-to-everyone if neither snapshots in the current
+ * database, nor a replication slot's xmin, nor a replication slot's
+ * catalog_xmin might still consider XID as running.
+ *
+ * I.e. the difference to GlobalVisSharedRels is that
+ * snapshot in other databases are ignored.
+ *
+ * 3) GlobalVisDataRels, which only considers an XID's
+ * effects visible-to-everyone if neither snapshots in the current
+ * database, nor a replication slot's xmin consider XID as running.
+ *
+ * I.e. the difference to GlobalVisCatalogRels is that
+ * replication slot's catalog_xmin is not taken into account.
+ *
+ * 4) GlobalVisTempRels, which only considers the current session, as temp
+ * tables are not visible to other sessions.
+ *
+ * GlobalVisTestFor(relation) returns the appropriate state
+ * for the relation.
+ *
+ * The boundaries are FullTransactionIds instead of TransactionIds to avoid
+ * wraparound dangers. There e.g. would otherwise exist no procarray state to
+ * prevent maybe_needed to become old enough after the GetSnapshotData()
+ * call.
+ */
+typedef struct GlobalVisState
+{
+ /* XIDs >= are considered running by some backend */
+ FullTransactionId definitely_needed;
+
+ /* XIDs < are not considered to be running by any backend */
+ FullTransactionId maybe_needed;
+} GlobalVisState;
+
+#endif /* SNAPMGR_INTERNAL_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 80202d4a824..ede0da49ce0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1842,6 +1842,9 @@ PGresAttValue
PGresParamDesc
PGresult
PGresult_data
+PHVScanWorkerState
+PHVShared
+PHVState
PIO_STATUS_BLOCK
PLAINTREE
PLAssignStmt
--
2.43.5
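As a rough illustration of how a table AM is expected to use the new API,
here is a minimal sketch (not part of the patch) that wires the functions
declared in the heapam.h hunk into a TableAmRoutine; only the callback and
function names taken from the hunks above are real, everything else is
elided:

#include "access/heapam.h"
#include "access/tableam.h"

/*
 * Illustrative sketch only, not patch code: a table AM opts into the
 * parallel table vacuum API by filling in the four new callbacks. Per
 * the tableam.h comment above, parallel_vacuum_compute_workers must
 * return 0 to disable parallel table vacuum.
 */
static const TableAmRoutine heapam_methods = {
	/* ... all existing callbacks elided ... */
	.parallel_vacuum_compute_workers = heap_parallel_vacuum_compute_workers,
	.parallel_vacuum_estimate = heap_parallel_vacuum_estimate,
	.parallel_vacuum_initialize = heap_parallel_vacuum_initialize,
	.parallel_vacuum_relation_worker = heap_parallel_vacuum_worker,
};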
v6-0002-Remember-the-number-of-times-parallel-index-vacuu.patch
From 26028c5b9838dc3c1b688d23bc1285c455f4409f Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 13 Dec 2024 15:54:32 -0800
Subject: [PATCH v6 2/8] Remember the number of times parallel index
vacuuming/cleanup is executed in ParallelVacuumState.
Previously, the caller could pass an arbitrary value for
'num_index_scans' to the parallel index vacuuming and cleanup APIs,
which didn't make sense: the caller had to carefully count how many
times it executed index vacuuming or cleanup, or else reinitializing
the parallel DSM would go wrong.
This commit changes the parallel vacuum APIs so that
ParallelVacuumState itself keeps the num_index_scans counter and
reinitializes the parallel DSM based on it.
An upcoming patch for parallel table scan will do a similar thing.
Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
src/backend/access/heap/vacuumlazy.c | 4 +---
src/backend/commands/vacuumparallel.c | 27 +++++++++++++++------------
src/include/commands/vacuum.h | 4 +---
3 files changed, 17 insertions(+), 18 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 75cd67395f4..116c0612ca5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2143,8 +2143,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, old_live_tuples,
- vacrel->num_index_scans);
+ parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, old_live_tuples);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2514,7 +2513,6 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
{
/* Outsource everything to parallel variant */
parallel_vacuum_cleanup_all_indexes(vacrel->pvs, reltuples,
- vacrel->num_index_scans,
estimated_count);
}
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 0d92e694d6a..08011fde23f 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -200,6 +200,9 @@ struct ParallelVacuumState
*/
bool *will_parallel_vacuum;
+ /* How many times has index vacuuming or cleanup been executed? */
+ int num_index_scans;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -223,8 +226,7 @@ struct ParallelVacuumState
static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
bool *will_parallel_vacuum);
-static void parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scans,
- bool vacuum);
+static void parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool vacuum);
static void parallel_vacuum_process_safe_indexes(ParallelVacuumState *pvs);
static void parallel_vacuum_process_unsafe_indexes(ParallelVacuumState *pvs);
static void parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
@@ -497,8 +499,7 @@ parallel_vacuum_reset_dead_items(ParallelVacuumState *pvs)
* Do parallel index bulk-deletion with parallel workers.
*/
void
-parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tuples,
- int num_index_scans)
+parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tuples)
{
Assert(!IsParallelWorker());
@@ -509,7 +510,7 @@ parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tup
pvs->shared->reltuples = num_table_tuples;
pvs->shared->estimated_count = true;
- parallel_vacuum_process_all_indexes(pvs, num_index_scans, true);
+ parallel_vacuum_process_all_indexes(pvs, true);
}
/*
@@ -517,7 +518,7 @@ parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tup
*/
void
parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tuples,
- int num_index_scans, bool estimated_count)
+ bool estimated_count)
{
Assert(!IsParallelWorker());
@@ -529,7 +530,7 @@ parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tup
pvs->shared->reltuples = num_table_tuples;
pvs->shared->estimated_count = estimated_count;
- parallel_vacuum_process_all_indexes(pvs, num_index_scans, false);
+ parallel_vacuum_process_all_indexes(pvs, false);
}
/*
@@ -608,8 +609,7 @@ parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
* must be used by the parallel vacuum leader process.
*/
static void
-parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scans,
- bool vacuum)
+parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool vacuum)
{
int nworkers;
PVIndVacStatus new_status;
@@ -631,7 +631,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
nworkers = pvs->nindexes_parallel_cleanup;
/* Add conditionally parallel-aware indexes if in the first time call */
- if (num_index_scans == 0)
+ if (pvs->num_index_scans == 0)
nworkers += pvs->nindexes_parallel_condcleanup;
}
@@ -659,7 +659,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
indstats->parallel_workers_can_process =
(pvs->will_parallel_vacuum[i] &&
parallel_vacuum_index_is_parallel_safe(pvs->indrels[i],
- num_index_scans,
+ pvs->num_index_scans,
vacuum));
}
@@ -670,7 +670,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
if (nworkers > 0)
{
/* Reinitialize parallel context to relaunch parallel workers */
- if (num_index_scans > 0)
+ if (pvs->num_index_scans > 0)
ReinitializeParallelDSM(pvs->pcxt);
/*
@@ -764,6 +764,9 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
VacuumSharedCostBalance = NULL;
VacuumActiveNWorkers = NULL;
}
+
+ /* Increment the counter */
+ pvs->num_index_scans++;
}
/*
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 12d0b61950d..e7b7753b691 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -366,11 +366,9 @@ extern TidStore *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs,
VacDeadItemsInfo **dead_items_info_p);
extern void parallel_vacuum_reset_dead_items(ParallelVacuumState *pvs);
extern void parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs,
- long num_table_tuples,
- int num_index_scans);
+ long num_table_tuples);
extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
long num_table_tuples,
- int num_index_scans,
bool estimated_count);
extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
--
2.43.5
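The caller-side effect is easiest to see as a sketch of the leader's flow
with the post-patch signatures (illustrative only, not actual
vacuumlazy.c code; the loop condition and variable names are
hypothetical):

/*
 * ParallelVacuumState now tracks num_index_scans internally, so the
 * leader can run any number of index-vacuum rounds without counting
 * them itself to keep ReinitializeParallelDSM() correct.
 */
while (more_heap_to_scan)		/* hypothetical loop condition */
{
	/* ... scan heap pages, accumulating dead_items ... */
	parallel_vacuum_bulkdel_all_indexes(pvs, old_live_tuples);
	/* ... second heap pass marks the LP_DEAD items LP_UNUSED ... */
}
parallel_vacuum_cleanup_all_indexes(pvs, reltuples, estimated_count);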
v6-0001-Move-lazy-heap-scanning-related-variables-to-stru.patch
From b18a0b8d0e4ccd5dda981527adecacfc14ce91c3 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 15 Nov 2024 14:14:13 -0800
Subject: [PATCH v6 1/8] Move lazy heap scanning related variables to struct
LVRelScanState.
---
src/backend/access/heap/vacuumlazy.c | 304 ++++++++++++++-------------
src/tools/pgindent/typedefs.list | 1 +
2 files changed, 159 insertions(+), 146 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 09fab08b8e1..75cd67395f4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -131,6 +131,47 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE,
} VacErrPhase;
+/*
+ * Relation statistics collected during heap scanning.
+ */
+typedef struct LVRelScanState
+{
+ BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */
+ BlockNumber removed_pages; /* # pages removed by relation truncation */
+ BlockNumber new_frozen_tuple_pages; /* # pages with newly frozen tuples */
+
+ /* # pages newly set all-visible in the VM */
+ BlockNumber vm_new_visible_pages;
+
+ /*
+ * # pages newly set all-visible and all-frozen in the VM. This is a
+ * subset of vm_new_visible_pages. That is, vm_new_visible_pages includes
+ * all pages set all-visible, but vm_new_visible_frozen_pages includes
+ * only those which were also set all-frozen.
+ */
+ BlockNumber vm_new_visible_frozen_pages;
+
+ /* # all-visible pages newly set all-frozen in the VM */
+ BlockNumber vm_new_frozen_pages;
+
+ BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
+ BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
+ BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+
+ /* Counters that follow are only for scanned_pages */
+ int64 tuples_deleted; /* # deleted from table */
+ int64 tuples_frozen; /* # newly frozen */
+ int64 lpdead_items; /* # deleted from indexes */
+ int64 live_tuples; /* # live tuples remaining */
+ int64 recently_dead_tuples; /* # dead, but not yet removable */
+ int64 missed_dead_tuples; /* # removable, but not removed */
+
+ /* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid. */
+ TransactionId NewRelfrozenXid;
+ MultiXactId NewRelminMxid;
+ bool skippedallvis;
+} LVRelScanState;
+
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -157,10 +198,6 @@ typedef struct LVRelState
/* VACUUM operation's cutoffs for freezing and pruning */
struct VacuumCutoffs cutoffs;
GlobalVisState *vistest;
- /* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
- TransactionId NewRelfrozenXid;
- MultiXactId NewRelminMxid;
- bool skippedallvis;
/* Error reporting state */
char *dbname;
@@ -186,43 +223,18 @@ typedef struct LVRelState
VacDeadItemsInfo *dead_items_info;
BlockNumber rel_pages; /* total number of pages */
- BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */
- BlockNumber removed_pages; /* # pages removed by relation truncation */
- BlockNumber new_frozen_tuple_pages; /* # pages with newly frozen tuples */
-
- /* # pages newly set all-visible in the VM */
- BlockNumber vm_new_visible_pages;
-
- /*
- * # pages newly set all-visible and all-frozen in the VM. This is a
- * subset of vm_new_visible_pages. That is, vm_new_visible_pages includes
- * all pages set all-visible, but vm_new_visible_frozen_pages includes
- * only those which were also set all-frozen.
- */
- BlockNumber vm_new_visible_frozen_pages;
- /* # all-visible pages newly set all-frozen in the VM */
- BlockNumber vm_new_frozen_pages;
+ /* Working state for heap scanning and vacuuming */
+ LVRelScanState *scan_state;
- BlockNumber lpdead_item_pages; /* # pages with LP_DEAD items */
- BlockNumber missed_dead_pages; /* # pages with missed dead tuples */
- BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
-
- /* Statistics output by us, for table */
- double new_rel_tuples; /* new estimated total # of tuples */
- double new_live_tuples; /* new estimated total # of live tuples */
+ /* New estimated total # of tuples and total # of live tuples */
+ double new_rel_tuples;
+ double new_live_tuples;
/* Statistics output by index AMs */
IndexBulkDeleteResult **indstats;
/* Instrumentation counters */
int num_index_scans;
- /* Counters that follow are only for scanned_pages */
- int64 tuples_deleted; /* # deleted from table */
- int64 tuples_frozen; /* # newly frozen */
- int64 lpdead_items; /* # deleted from indexes */
- int64 live_tuples; /* # live tuples remaining */
- int64 recently_dead_tuples; /* # dead, but not yet removable */
- int64 missed_dead_tuples; /* # removable, but not removed */
/* State maintained by heap_vac_scan_next_block() */
BlockNumber current_block; /* last block returned */
@@ -309,6 +321,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
BufferAccessStrategy bstrategy)
{
LVRelState *vacrel;
+ LVRelScanState *scan_state;
bool verbose,
instrument,
skipwithvm,
@@ -420,12 +433,23 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
}
/* Initialize page counters explicitly (be tidy) */
- vacrel->scanned_pages = 0;
- vacrel->removed_pages = 0;
- vacrel->new_frozen_tuple_pages = 0;
- vacrel->lpdead_item_pages = 0;
- vacrel->missed_dead_pages = 0;
- vacrel->nonempty_pages = 0;
+ scan_state = palloc(sizeof(LVRelScanState));
+ scan_state->scanned_pages = 0;
+ scan_state->removed_pages = 0;
+ scan_state->new_frozen_tuple_pages = 0;
+ scan_state->lpdead_item_pages = 0;
+ scan_state->missed_dead_pages = 0;
+ scan_state->nonempty_pages = 0;
+ scan_state->tuples_deleted = 0;
+ scan_state->tuples_frozen = 0;
+ scan_state->lpdead_items = 0;
+ scan_state->live_tuples = 0;
+ scan_state->recently_dead_tuples = 0;
+ scan_state->missed_dead_tuples = 0;
+ scan_state->vm_new_visible_pages = 0;
+ scan_state->vm_new_visible_frozen_pages = 0;
+ scan_state->vm_new_frozen_pages = 0;
+ vacrel->scan_state = scan_state;
/* dead_items_alloc allocates vacrel->dead_items later on */
/* Allocate/initialize output statistics state */
@@ -434,19 +458,6 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->indstats = (IndexBulkDeleteResult **)
palloc0(vacrel->nindexes * sizeof(IndexBulkDeleteResult *));
- /* Initialize remaining counters (be tidy) */
- vacrel->num_index_scans = 0;
- vacrel->tuples_deleted = 0;
- vacrel->tuples_frozen = 0;
- vacrel->lpdead_items = 0;
- vacrel->live_tuples = 0;
- vacrel->recently_dead_tuples = 0;
- vacrel->missed_dead_tuples = 0;
-
- vacrel->vm_new_visible_pages = 0;
- vacrel->vm_new_visible_frozen_pages = 0;
- vacrel->vm_new_frozen_pages = 0;
-
/*
* Get cutoffs that determine which deleted tuples are considered DEAD,
* not just RECENTLY_DEAD, and which XIDs/MXIDs to freeze. Then determine
@@ -467,9 +478,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
vacrel->vistest = GlobalVisTestFor(rel);
/* Initialize state used to track oldest extant XID/MXID */
- vacrel->NewRelfrozenXid = vacrel->cutoffs.OldestXmin;
- vacrel->NewRelminMxid = vacrel->cutoffs.OldestMxact;
- vacrel->skippedallvis = false;
+ vacrel->scan_state->NewRelfrozenXid = vacrel->cutoffs.OldestXmin;
+ vacrel->scan_state->NewRelminMxid = vacrel->cutoffs.OldestMxact;
+ vacrel->scan_state->skippedallvis = false;
skipwithvm = true;
if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
{
@@ -550,15 +561,15 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* value >= FreezeLimit, and relminmxid to a value >= MultiXactCutoff.
* Non-aggressive VACUUMs may advance them by any amount, or not at all.
*/
- Assert(vacrel->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
+ Assert(vacrel->scan_state->NewRelfrozenXid == vacrel->cutoffs.OldestXmin ||
TransactionIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.FreezeLimit :
vacrel->cutoffs.relfrozenxid,
- vacrel->NewRelfrozenXid));
- Assert(vacrel->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
+ vacrel->scan_state->NewRelfrozenXid));
+ Assert(vacrel->scan_state->NewRelminMxid == vacrel->cutoffs.OldestMxact ||
MultiXactIdPrecedesOrEquals(vacrel->aggressive ? vacrel->cutoffs.MultiXactCutoff :
vacrel->cutoffs.relminmxid,
- vacrel->NewRelminMxid));
- if (vacrel->skippedallvis)
+ vacrel->scan_state->NewRelminMxid));
+ if (vacrel->scan_state->skippedallvis)
{
/*
* Must keep original relfrozenxid in a non-aggressive VACUUM that
@@ -566,8 +577,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* values will have missed unfrozen XIDs from the pages we skipped.
*/
Assert(!vacrel->aggressive);
- vacrel->NewRelfrozenXid = InvalidTransactionId;
- vacrel->NewRelminMxid = InvalidMultiXactId;
+ vacrel->scan_state->NewRelfrozenXid = InvalidTransactionId;
+ vacrel->scan_state->NewRelminMxid = InvalidMultiXactId;
}
/*
@@ -588,7 +599,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
*/
vac_update_relstats(rel, new_rel_pages, vacrel->new_live_tuples,
new_rel_allvisible, vacrel->nindexes > 0,
- vacrel->NewRelfrozenXid, vacrel->NewRelminMxid,
+ vacrel->scan_state->NewRelfrozenXid, vacrel->scan_state->NewRelminMxid,
&frozenxid_updated, &minmulti_updated, false);
/*
@@ -604,8 +615,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
pgstat_report_vacuum(RelationGetRelid(rel),
rel->rd_rel->relisshared,
Max(vacrel->new_live_tuples, 0),
- vacrel->recently_dead_tuples +
- vacrel->missed_dead_tuples);
+ vacrel->scan_state->recently_dead_tuples +
+ vacrel->scan_state->missed_dead_tuples);
pgstat_progress_end_command();
if (instrument)
@@ -678,21 +689,21 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->relname,
vacrel->num_index_scans);
appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total)\n"),
- vacrel->removed_pages,
+ vacrel->scan_state->removed_pages,
new_rel_pages,
- vacrel->scanned_pages,
+ vacrel->scan_state->scanned_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->scanned_pages / orig_rel_pages);
+ 100.0 * vacrel->scan_state->scanned_pages / orig_rel_pages);
appendStringInfo(&buf,
_("tuples: %lld removed, %lld remain, %lld are dead but not yet removable\n"),
- (long long) vacrel->tuples_deleted,
+ (long long) vacrel->scan_state->tuples_deleted,
(long long) vacrel->new_rel_tuples,
- (long long) vacrel->recently_dead_tuples);
- if (vacrel->missed_dead_tuples > 0)
+ (long long) vacrel->scan_state->recently_dead_tuples);
+ if (vacrel->scan_state->missed_dead_tuples > 0)
appendStringInfo(&buf,
_("tuples missed: %lld dead from %u pages not removed due to cleanup lock contention\n"),
- (long long) vacrel->missed_dead_tuples,
- vacrel->missed_dead_pages);
+ (long long) vacrel->scan_state->missed_dead_tuples,
+ vacrel->scan_state->missed_dead_pages);
diff = (int32) (ReadNextTransactionId() -
vacrel->cutoffs.OldestXmin);
appendStringInfo(&buf,
@@ -700,33 +711,33 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
vacrel->cutoffs.OldestXmin, diff);
if (frozenxid_updated)
{
- diff = (int32) (vacrel->NewRelfrozenXid -
+ diff = (int32) (vacrel->scan_state->NewRelfrozenXid -
vacrel->cutoffs.relfrozenxid);
appendStringInfo(&buf,
_("new relfrozenxid: %u, which is %d XIDs ahead of previous value\n"),
- vacrel->NewRelfrozenXid, diff);
+ vacrel->scan_state->NewRelfrozenXid, diff);
}
if (minmulti_updated)
{
- diff = (int32) (vacrel->NewRelminMxid -
+ diff = (int32) (vacrel->scan_state->NewRelminMxid -
vacrel->cutoffs.relminmxid);
appendStringInfo(&buf,
_("new relminmxid: %u, which is %d MXIDs ahead of previous value\n"),
- vacrel->NewRelminMxid, diff);
+ vacrel->scan_state->NewRelminMxid, diff);
}
appendStringInfo(&buf, _("frozen: %u pages from table (%.2f%% of total) had %lld tuples frozen\n"),
- vacrel->new_frozen_tuple_pages,
+ vacrel->scan_state->new_frozen_tuple_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->new_frozen_tuple_pages /
+ 100.0 * vacrel->scan_state->new_frozen_tuple_pages /
orig_rel_pages,
- (long long) vacrel->tuples_frozen);
+ (long long) vacrel->scan_state->tuples_frozen);
appendStringInfo(&buf,
_("visibility map: %u pages set all-visible, %u pages set all-frozen (%u were all-visible)\n"),
- vacrel->vm_new_visible_pages,
- vacrel->vm_new_visible_frozen_pages +
- vacrel->vm_new_frozen_pages,
- vacrel->vm_new_frozen_pages);
+ vacrel->scan_state->vm_new_visible_pages,
+ vacrel->scan_state->vm_new_visible_frozen_pages +
+ vacrel->scan_state->vm_new_frozen_pages,
+ vacrel->scan_state->vm_new_frozen_pages);
if (vacrel->do_index_vacuuming)
{
if (vacrel->nindexes == 0 || vacrel->num_index_scans == 0)
@@ -746,10 +757,10 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
msgfmt = _("%u pages from table (%.2f%% of total) have %lld dead item identifiers\n");
}
appendStringInfo(&buf, msgfmt,
- vacrel->lpdead_item_pages,
+ vacrel->scan_state->lpdead_item_pages,
orig_rel_pages == 0 ? 100.0 :
- 100.0 * vacrel->lpdead_item_pages / orig_rel_pages,
- (long long) vacrel->lpdead_items);
+ 100.0 * vacrel->scan_state->lpdead_item_pages / orig_rel_pages,
+ (long long) vacrel->scan_state->lpdead_items);
for (int i = 0; i < vacrel->nindexes; i++)
{
IndexBulkDeleteResult *istat = vacrel->indstats[i];
@@ -882,7 +893,7 @@ lazy_scan_heap(LVRelState *vacrel)
bool has_lpdead_items;
bool got_cleanup_lock = false;
- vacrel->scanned_pages++;
+ vacrel->scan_state->scanned_pages++;
/* Report as block scanned, update error traceback information */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
@@ -900,7 +911,7 @@ lazy_scan_heap(LVRelState *vacrel)
* one-pass strategy, and the two-pass strategy with the index_cleanup
* param set to 'off'.
*/
- if (vacrel->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
+ if (vacrel->scan_state->scanned_pages % FAILSAFE_EVERY_PAGES == 0)
lazy_check_wraparound_failsafe(vacrel);
/*
@@ -1064,16 +1075,16 @@ lazy_scan_heap(LVRelState *vacrel)
/* now we can compute the new value for pg_class.reltuples */
vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, rel_pages,
- vacrel->scanned_pages,
- vacrel->live_tuples);
+ vacrel->scan_state->scanned_pages,
+ vacrel->scan_state->live_tuples);
/*
* Also compute the total number of surviving heap entries. In the
* (unlikely) scenario that new_live_tuples is -1, take it as zero.
*/
vacrel->new_rel_tuples =
- Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
- vacrel->missed_dead_tuples;
+ Max(vacrel->new_live_tuples, 0) + vacrel->scan_state->recently_dead_tuples +
+ vacrel->scan_state->missed_dead_tuples;
/*
* Do index vacuuming (call each index's ambulkdelete routine), then do
@@ -1110,10 +1121,10 @@ lazy_scan_heap(LVRelState *vacrel)
* there are no further blocks to process.
*
* vacrel is an in/out parameter here. Vacuum options and information about
- * the relation are read. vacrel->skippedallvis is set if we skip a block
- * that's all-visible but not all-frozen, to ensure that we don't update
- * relfrozenxid in that case. vacrel also holds information about the next
- * unskippable block, as bookkeeping for this function.
+ * the relation are read. vacrel->scan_state->skippedallvis is set if we skip
+ * a block that's all-visible but not all-frozen, to ensure that we don't
+ * update relfrozenxid in that case. vacrel also holds information about the
+ * next unskippable block, as bookkeeping for this function.
*/
static bool
heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
@@ -1170,7 +1181,7 @@ heap_vac_scan_next_block(LVRelState *vacrel, BlockNumber *blkno,
{
next_block = vacrel->next_unskippable_block;
if (skipsallvis)
- vacrel->skippedallvis = true;
+ vacrel->scan_state->skippedallvis = true;
}
}
@@ -1414,11 +1425,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
+ vacrel->scan_state->vm_new_visible_pages++;
+ vacrel->scan_state->vm_new_visible_frozen_pages++;
}
else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0)
- vacrel->vm_new_frozen_pages++;
+ vacrel->scan_state->vm_new_frozen_pages++;
}
freespace = PageGetHeapFreeSpace(page);
@@ -1488,10 +1499,11 @@ lazy_scan_prune(LVRelState *vacrel,
heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
&vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
&vacrel->offnum,
- &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
+ &vacrel->scan_state->NewRelfrozenXid,
+ &vacrel->scan_state->NewRelminMxid);
- Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
- Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
+ Assert(MultiXactIdIsValid(vacrel->scan_state->NewRelminMxid));
+ Assert(TransactionIdIsValid(vacrel->scan_state->NewRelfrozenXid));
if (presult.nfrozen > 0)
{
@@ -1501,7 +1513,7 @@ lazy_scan_prune(LVRelState *vacrel,
* frozen tuples (don't confuse that with pages newly set all-frozen
* in VM).
*/
- vacrel->new_frozen_tuple_pages++;
+ vacrel->scan_state->new_frozen_tuple_pages++;
}
/*
@@ -1536,7 +1548,7 @@ lazy_scan_prune(LVRelState *vacrel,
*/
if (presult.lpdead_items > 0)
{
- vacrel->lpdead_item_pages++;
+ vacrel->scan_state->lpdead_item_pages++;
/*
* deadoffsets are collected incrementally in
@@ -1551,15 +1563,15 @@ lazy_scan_prune(LVRelState *vacrel,
}
/* Finally, add page-local counts to whole-VACUUM counts */
- vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += presult.nfrozen;
- vacrel->lpdead_items += presult.lpdead_items;
- vacrel->live_tuples += presult.live_tuples;
- vacrel->recently_dead_tuples += presult.recently_dead_tuples;
+ vacrel->scan_state->tuples_deleted += presult.ndeleted;
+ vacrel->scan_state->tuples_frozen += presult.nfrozen;
+ vacrel->scan_state->lpdead_items += presult.lpdead_items;
+ vacrel->scan_state->live_tuples += presult.live_tuples;
+ vacrel->scan_state->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
if (presult.hastup)
- vacrel->nonempty_pages = blkno + 1;
+ vacrel->scan_state->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
@@ -1608,13 +1620,13 @@ lazy_scan_prune(LVRelState *vacrel,
*/
if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
- vacrel->vm_new_visible_pages++;
+ vacrel->scan_state->vm_new_visible_pages++;
if (presult.all_frozen)
- vacrel->vm_new_visible_frozen_pages++;
+ vacrel->scan_state->vm_new_visible_frozen_pages++;
}
else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
presult.all_frozen)
- vacrel->vm_new_frozen_pages++;
+ vacrel->scan_state->vm_new_frozen_pages++;
}
/*
@@ -1700,8 +1712,8 @@ lazy_scan_prune(LVRelState *vacrel,
*/
if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
+ vacrel->scan_state->vm_new_visible_pages++;
+ vacrel->scan_state->vm_new_visible_frozen_pages++;
}
/*
@@ -1709,7 +1721,7 @@ lazy_scan_prune(LVRelState *vacrel,
* above, so we don't need to test the value of old_vmbits.
*/
else
- vacrel->vm_new_frozen_pages++;
+ vacrel->scan_state->vm_new_frozen_pages++;
}
}
@@ -1748,8 +1760,8 @@ lazy_scan_noprune(LVRelState *vacrel,
missed_dead_tuples;
bool hastup;
HeapTupleHeader tupleheader;
- TransactionId NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- MultiXactId NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ TransactionId NoFreezePageRelfrozenXid = vacrel->scan_state->NewRelfrozenXid;
+ MultiXactId NoFreezePageRelminMxid = vacrel->scan_state->NewRelminMxid;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1876,8 +1888,8 @@ lazy_scan_noprune(LVRelState *vacrel,
* this particular page until the next VACUUM. Remember its details now.
* (lazy_scan_prune expects a clean slate, so we have to do this last.)
*/
- vacrel->NewRelfrozenXid = NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = NoFreezePageRelminMxid;
+ vacrel->scan_state->NewRelfrozenXid = NoFreezePageRelfrozenXid;
+ vacrel->scan_state->NewRelminMxid = NoFreezePageRelminMxid;
/* Save any LP_DEAD items found on the page in dead_items */
if (vacrel->nindexes == 0)
@@ -1904,25 +1916,25 @@ lazy_scan_noprune(LVRelState *vacrel,
* indexes will be deleted during index vacuuming (and then marked
* LP_UNUSED in the heap)
*/
- vacrel->lpdead_item_pages++;
+ vacrel->scan_state->lpdead_item_pages++;
dead_items_add(vacrel, blkno, deadoffsets, lpdead_items);
- vacrel->lpdead_items += lpdead_items;
+ vacrel->scan_state->lpdead_items += lpdead_items;
}
/*
* Finally, add relevant page-local counts to whole-VACUUM counts
*/
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
- vacrel->missed_dead_tuples += missed_dead_tuples;
+ vacrel->scan_state->live_tuples += live_tuples;
+ vacrel->scan_state->recently_dead_tuples += recently_dead_tuples;
+ vacrel->scan_state->missed_dead_tuples += missed_dead_tuples;
if (missed_dead_tuples > 0)
- vacrel->missed_dead_pages++;
+ vacrel->scan_state->missed_dead_pages++;
/* Can't truncate this page */
if (hastup)
- vacrel->nonempty_pages = blkno + 1;
+ vacrel->scan_state->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
*has_lpdead_items = (lpdead_items > 0);
@@ -1951,7 +1963,7 @@ lazy_vacuum(LVRelState *vacrel)
/* Should not end up here with no indexes */
Assert(vacrel->nindexes > 0);
- Assert(vacrel->lpdead_item_pages > 0);
+ Assert(vacrel->scan_state->lpdead_item_pages > 0);
if (!vacrel->do_index_vacuuming)
{
@@ -1985,7 +1997,7 @@ lazy_vacuum(LVRelState *vacrel)
BlockNumber threshold;
Assert(vacrel->num_index_scans == 0);
- Assert(vacrel->lpdead_items == vacrel->dead_items_info->num_items);
+ Assert(vacrel->scan_state->lpdead_items == vacrel->dead_items_info->num_items);
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
@@ -2012,7 +2024,7 @@ lazy_vacuum(LVRelState *vacrel)
* cases then this may need to be reconsidered.
*/
threshold = (double) vacrel->rel_pages * BYPASS_THRESHOLD_PAGES;
- bypass = (vacrel->lpdead_item_pages < threshold &&
+ bypass = (vacrel->scan_state->lpdead_item_pages < threshold &&
(TidStoreMemoryUsage(vacrel->dead_items) < (32L * 1024L * 1024L)));
}
@@ -2150,7 +2162,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
* place).
*/
Assert(vacrel->num_index_scans > 0 ||
- vacrel->dead_items_info->num_items == vacrel->lpdead_items);
+ vacrel->dead_items_info->num_items == vacrel->scan_state->lpdead_items);
Assert(allindexes || VacuumFailsafeActive);
/*
@@ -2259,8 +2271,8 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
* the second heap pass. No more, no less.
*/
Assert(vacrel->num_index_scans > 1 ||
- (vacrel->dead_items_info->num_items == vacrel->lpdead_items &&
- vacuumed_pages == vacrel->lpdead_item_pages));
+ (vacrel->dead_items_info->num_items == vacrel->scan_state->lpdead_items &&
+ vacuumed_pages == vacrel->scan_state->lpdead_item_pages));
ereport(DEBUG2,
(errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
@@ -2376,14 +2388,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
*/
if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
- vacrel->vm_new_visible_pages++;
+ vacrel->scan_state->vm_new_visible_pages++;
if (all_frozen)
- vacrel->vm_new_visible_frozen_pages++;
+ vacrel->scan_state->vm_new_visible_frozen_pages++;
}
else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
all_frozen)
- vacrel->vm_new_frozen_pages++;
+ vacrel->scan_state->vm_new_frozen_pages++;
}
/* Revert to the previous phase information for error traceback */
@@ -2459,7 +2471,7 @@ static void
lazy_cleanup_all_indexes(LVRelState *vacrel)
{
double reltuples = vacrel->new_rel_tuples;
- bool estimated_count = vacrel->scanned_pages < vacrel->rel_pages;
+ bool estimated_count = vacrel->scan_state->scanned_pages < vacrel->rel_pages;
const int progress_start_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_INDEXES_TOTAL
@@ -2640,7 +2652,7 @@ should_attempt_truncation(LVRelState *vacrel)
if (!vacrel->do_rel_truncate || VacuumFailsafeActive)
return false;
- possibly_freeable = vacrel->rel_pages - vacrel->nonempty_pages;
+ possibly_freeable = vacrel->rel_pages - vacrel->scan_state->nonempty_pages;
if (possibly_freeable > 0 &&
(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
possibly_freeable >= vacrel->rel_pages / REL_TRUNCATE_FRACTION))
@@ -2666,7 +2678,7 @@ lazy_truncate_heap(LVRelState *vacrel)
/* Update error traceback information one last time */
update_vacuum_error_info(vacrel, NULL, VACUUM_ERRCB_PHASE_TRUNCATE,
- vacrel->nonempty_pages, InvalidOffsetNumber);
+ vacrel->scan_state->nonempty_pages, InvalidOffsetNumber);
/*
* Loop until no more truncating can be done.
@@ -2767,7 +2779,7 @@ lazy_truncate_heap(LVRelState *vacrel)
* without also touching reltuples, since the tuple count wasn't
* changed by the truncation.
*/
- vacrel->removed_pages += orig_rel_pages - new_rel_pages;
+ vacrel->scan_state->removed_pages += orig_rel_pages - new_rel_pages;
vacrel->rel_pages = new_rel_pages;
ereport(vacrel->verbose ? INFO : DEBUG2,
@@ -2775,7 +2787,7 @@ lazy_truncate_heap(LVRelState *vacrel)
vacrel->relname,
orig_rel_pages, new_rel_pages)));
orig_rel_pages = new_rel_pages;
- } while (new_rel_pages > vacrel->nonempty_pages && lock_waiter_detected);
+ } while (new_rel_pages > vacrel->scan_state->nonempty_pages && lock_waiter_detected);
}
/*
@@ -2803,7 +2815,7 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
StaticAssertStmt((PREFETCH_SIZE & (PREFETCH_SIZE - 1)) == 0,
"prefetch size must be power of 2");
prefetchedUntil = InvalidBlockNumber;
- while (blkno > vacrel->nonempty_pages)
+ while (blkno > vacrel->scan_state->nonempty_pages)
{
Buffer buf;
Page page;
@@ -2915,7 +2927,7 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
* pages still are; we need not bother to look at the last known-nonempty
* page.
*/
- return vacrel->nonempty_pages;
+ return vacrel->scan_state->nonempty_pages;
}
/*
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e1c4f913f84..80202d4a824 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1479,6 +1479,7 @@ LPVOID
LPWSTR
LSEG
LUID
+LVRelScanState
LVRelState
LVSavedErrInfo
LWLock
--
2.43.5
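Grouping these counters into one separately allocated struct also makes it
straightforward to keep one instance per process and combine them
afterwards; a hypothetical merge helper (not in this patch, field names
taken from the struct it adds) could look like:

/*
 * Hypothetical, for illustration only: fold one LVRelScanState into
 * another, e.g. a worker's counters into the leader's.
 */
static void
lv_scan_state_merge(LVRelScanState *dst, const LVRelScanState *src)
{
	dst->scanned_pages += src->scanned_pages;
	dst->lpdead_item_pages += src->lpdead_item_pages;
	dst->missed_dead_pages += src->missed_dead_pages;
	dst->tuples_deleted += src->tuples_deleted;
	dst->tuples_frozen += src->tuples_frozen;
	dst->lpdead_items += src->lpdead_items;
	dst->live_tuples += src->live_tuples;
	dst->recently_dead_tuples += src->recently_dead_tuples;
	dst->missed_dead_tuples += src->missed_dead_tuples;

	/* keep the oldest extant XID/MXID seen by any process */
	if (TransactionIdPrecedes(src->NewRelfrozenXid, dst->NewRelfrozenXid))
		dst->NewRelfrozenXid = src->NewRelfrozenXid;
	if (MultiXactIdPrecedes(src->NewRelminMxid, dst->NewRelminMxid))
		dst->NewRelminMxid = src->NewRelminMxid;
	dst->skippedallvis |= src->skippedallvis;

	/* nonempty_pages is "last nonempty page + 1", so take the maximum */
	dst->nonempty_pages = Max(dst->nonempty_pages, src->nonempty_pages);
}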
On Fri, Jan 3, 2025 at 3:38 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Dec 25, 2024 at 8:52 AM Tomas Vondra <tomas@vondra.me> wrote:
On 12/19/24 23:05, Masahiko Sawada wrote:
On Sat, Dec 14, 2024 at 1:24 PM Tomas Vondra <tomas@vondra.me> wrote:
On 12/13/24 00:04, Tomas Vondra wrote:
...
The main difference is here:
master / no parallel workers:
pages: 0 removed, 221239 remain, 221239 scanned (100.00% of total)
1 parallel worker:
pages: 0 removed, 221239 remain, 10001 scanned (4.52% of total)
Clearly, with parallel vacuum we scan only a tiny fraction of the pages,
essentially just those with deleted tuples, which is ~1/20 of pages.
That's close to the 15x speedup.

This effect is clearest without indexes, but it does affect even runs
with indexes - having to scan the indexes makes it much less pronounced,
though. However, these indexes are pretty massive (about the same size
as the table) - multiple times larger than the table. Chances are it'd
be clearer on realistic data sets.

So the question is - is this correct? And if yes, why doesn't the
regular (serial) vacuum do that?

There are some more strange things, though. For example, how come the avg
read rate is 0.000 MB/s?

avg read rate: 0.000 MB/s, avg write rate: 525.533 MB/s
It scanned 10k pages, i.e. ~80MB of data in 0.15 seconds. Surely that's
not 0.000 MB/s? I guess it's calculated from buffer misses, and all the
pages are in shared buffers (thanks to the DELETE earlier in that session).

OK, after looking into this a bit more I think the reason is rather
simple - SKIP_PAGES_THRESHOLD.

With serial runs, we end up scanning all pages, because even with an
update every 5000 tuples, that's still only ~25 pages apart, well within
the 32-page window. So we end up skipping no pages, and scan and vacuum
everything.

But parallel runs have this skipping logic disabled, or rather the logic
that switches to sequential scans if the gap is less than 32 pages.

IMHO this raises two questions:
1) Shouldn't parallel runs use SKIP_PAGES_THRESHOLD too, i.e. switch to
sequential scans if the pages are close enough? Maybe there is a reason
for this difference? Workers can reduce the difference between random
and sequential I/O, similarly to prefetching. But that just means the
workers should use a lower threshold, e.g. as

SKIP_PAGES_THRESHOLD / nworkers

or something like that? I don't see this discussed in this thread.
Each parallel heap scan worker allocates a chunk of blocks, which is
8192 blocks at maximum, so we would need to use the SKIP_PAGES_THRESHOLD
optimization within the chunk. I agree that we need to evaluate the
differences anyway. Will do the benchmark test and share the results.

Right. I don't think this really matters for small tables, and for large
tables the chunks should be fairly large (possibly up to 8192 blocks),
in which case we could apply SKIP_PAGES_THRESHOLD just like in the serial
case. There might be differences at boundaries between chunks, but that
seems like a minor / expected detail. I haven't checked if the code
would need to change / how much.
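To make the within-chunk idea concrete, the skip decision could look
roughly like this (a sketch under the assumptions stated in the comments,
not the patch's actual code; blkno, next_unskippable and chunk_end are
hypothetical parameters):

/*
 * Sketch: decide whether a worker may skip ahead. Assumes the worker
 * scans a contiguous chunk ending at chunk_end (exclusive), that
 * next_unskippable is the next block the visibility map forces us to
 * read, and that nworkers >= 1.
 */
static bool
worker_should_skip_range(BlockNumber blkno, BlockNumber next_unskippable,
						 BlockNumber chunk_end, int nworkers)
{
	/* one idea from this thread: scale the threshold by worker count */
	BlockNumber	threshold = SKIP_PAGES_THRESHOLD / nworkers;

	/* never let a skip decision cross the worker's chunk boundary */
	BlockNumber	skippable = Min(next_unskippable, chunk_end) - blkno;

	/* skip only if the run of skippable pages is long enough */
	return skippable >= threshold;
}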
2) It seems the current SKIP_PAGES_THRESHOLD is awfully high for good
storage. If I can get an order of magnitude improvement (or more than
that) by disabling the threshold, and just doing random I/O, maybe it's
time to adjust it a bit.

Yeah, you've started a thread for this so let's discuss it there.
OK. FWIW as suggested in the other thread, it doesn't seem to be merely
a question of VACUUM performance, as not skipping pages gives vacuum the
opportunity to do cleanup that would otherwise need to happen later.

If only for this reason, I think it would be good to keep the serial and
parallel vacuum consistent.

I've not evaluated the SKIP_PAGES_THRESHOLD optimization yet, but I'd like
to share the latest patch set as cfbot reports some failures. Comments
from Kuroda-san are also incorporated in this version. Also, I'd like
to share the performance test results I did with the latest patch.
I've implemented the SKIP_PAGES_THRESHOLD optimization in parallel heap
scan and attached the updated patch set. I've also attached the
performance test results comparing the v6 and v7 patch sets. I don't
see big differences across the test cases, but the v7 patch performs
slightly better.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments:
parallel_heap_vacuum_v6_v7.pdf (performance test results comparing the v6 and v7 patch sets)