doc review for parallel vacuum

Started by Justin Pryzby, almost 6 years ago; 14 messages
#1 Justin Pryzby
pryzby@telsasoft.com
2 attachment(s)

Original, long thread
/messages/by-id/CAA4eK1+nw1FBK3_sDnW+7kB+x4qbDJqetgqwYW8k2xv82RZ+Kw@mail.gmail.com

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index ab1b8c2398..140637983a 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -237,15 +237,15 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <term><literal>PARALLEL</literal></term>
     <listitem>
      <para>
-      Perform vacuum index and cleanup index phases of <command>VACUUM</command>
+      Perform vacuum index and index cleanup phases of <command>VACUUM</command>
       in parallel using <replaceable class="parameter">integer</replaceable>
-      background workers (for the detail of each vacuum phases, please
+      background workers (for the detail of each vacuum phase, please
       refer to <xref linkend="vacuum-phases"/>).  If the
-      <literal>PARALLEL</literal> option is omitted, then
-      <command>VACUUM</command> decides the number of workers based on number
-      of indexes that support parallel vacuum operation on the relation which
-      is further limited by <xref linkend="guc-max-parallel-workers-maintenance"/>.
-      The index can participate in a parallel vacuum if and only if the size
+      <literal>PARALLEL</literal> option is omitted, then the number of workers
+      is determined based on the number of indexes that support parallel vacuum
+      operation on the relation, and is further limited by <xref
+      linkend="guc-max-parallel-workers-maintenance"/>.
+      An index can participate in parallel vacuum if and only if the size
       of the index is more than <xref linkend="guc-min-parallel-index-scan-size"/>.
       Please note that it is not guaranteed that the number of parallel workers
       specified in <replaceable class="parameter">integer</replaceable> will
@@ -253,7 +253,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
       workers than specified, or even with no workers at all.  Only one worker
       can be used per index.  So parallel workers are launched only when there
       are at least <literal>2</literal> indexes in the table.  Workers for
-      vacuum launches before starting each phase and exit at the end of
+      vacuum are launched before the start of each phase and terminate at the end of
       the phase.  These behaviors might change in a future release.  This
       option can't be used with the <literal>FULL</literal> option.
      </para>
@@ -372,7 +372,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <command>VACUUM</command> causes a substantial increase in I/O traffic,
     which might cause poor performance for other active sessions.  Therefore,
     it is sometimes advisable to use the cost-based vacuum delay feature.  For
-    parallel vacuum, each worker sleeps proportional to the work done by that
+    parallel vacuum, each worker sleeps proportionally to the work done by that
     worker.  See <xref linkend="runtime-config-resource-vacuum-cost"/> for
     details.
    </para>
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e017db4446..0ab0bea312 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -194,7 +194,7 @@ typedef struct LVShared
 	 * live tuples in the index vacuum case or the new live tuples in the
 	 * index cleanup case.
 	 *
-	 * estimated_count is true if the reltuples is an estimated value.
+	 * estimated_count is true if reltuples is an estimated value.
 	 */
 	double		reltuples;
 	bool		estimated_count;
@@ -757,7 +757,7 @@ skip_blocks(Relation onerel, VacuumParams *params, BlockNumber *next_unskippable
  *		to reclaim dead line pointers.
  *
  *		If the table has at least two indexes, we execute both index vacuum
- *		and index cleanup with parallel workers unless the parallel vacuum is
+ *		and index cleanup with parallel workers unless parallel vacuum is
  *		disabled.  In a parallel vacuum, we enter parallel mode and then
  *		create both the parallel context and the DSM segment before starting
  *		heap scan so that we can record dead tuples to the DSM segment.  All
@@ -836,7 +836,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 	/*
-	 * Initialize the state for a parallel vacuum.  As of now, only one worker
+	 * Initialize state for a parallel vacuum.  As of now, only one worker
 	 * can be used for an index, so we invoke parallelism only if there are at
 	 * least two indexes on a table.
 	 */
@@ -864,7 +864,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	}
 	/*
-	 * Allocate the space for dead tuples in case the parallel vacuum is not
+	 * Allocate the space for dead tuples in case parallel vacuum is not
 	 * initialized.
 	 */
 	if (!ParallelVacuumIsActive(lps))
@@ -2111,7 +2111,7 @@ parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
 		shared_indstats = get_indstats(lvshared, idx);
 		/*
-		 * Skip processing indexes that doesn't participate in parallel
+		 * Skip processing indexes that don't participate in parallel
 		 * operation
 		 */
 		if (shared_indstats == NULL ||
@@ -2223,7 +2223,7 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 		shared_indstats->updated = true;
 		/*
-		 * Now that the stats[idx] points to the DSM segment, we don't need
+		 * Now that stats[idx] points to the DSM segment, we don't need
 		 * the locally allocated results.
 		 */
 		pfree(*stats);
@@ -2329,7 +2329,7 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
  *	lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
  *
  *		reltuples is the number of heap tuples and estimated_count is true
- *		if the reltuples is an estimated value.
+ *		if reltuples is an estimated value.
  */
 static void
 lazy_cleanup_index(Relation indrel,
@@ -2916,9 +2916,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 /*
  * Compute the number of parallel worker processes to request.  Both index
  * vacuum and index cleanup can be executed with parallel workers.  The index
- * is eligible for parallel vacuum iff it's size is greater than
+ * is eligible for parallel vacuum iff its size is greater than
  * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt the performance.
+ * can hurt performance.
  *
  * nrequested is the number of parallel workers that user requested.  If
  * nrequested is 0, we compute the parallel degree based on nindexes, that is
@@ -2937,7 +2937,7 @@ compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested,
 	int			i;
 	/*
-	 * We don't allow to perform parallel operation in standalone backend or
+	 * We don't allow performing parallel operation in standalone backend or
 	 * when parallelism is disabled.
 	 */
 	if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
@@ -3010,7 +3010,7 @@ prepare_index_statistics(LVShared *lvshared, bool *can_parallel_vacuum,
 }
 /*
- * Update index statistics in pg_class if the statistics is accurate.
+ * Update index statistics in pg_class if the statistics are accurate.
  */
 static void
 update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,
@@ -3181,7 +3181,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
 /*
  * Destroy the parallel context, and end parallel mode.
  *
- * Since writes are not allowed during the parallel mode, so we copy the
+ * Since writes are not allowed during parallel mode, copy the
  * updated index statistics from DSM in local memory and then later use that
  * to update the index statistics.  One might think that we can exit from
  * parallel mode, update the index statistics and then destroy parallel
@@ -3288,7 +3288,7 @@ skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared)
  * Perform work within a launched parallel process.
  *
  * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report the progress information.
+ * we don't need to report progress information.
  */
 void
 parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index df06e7d174..ac348b312c 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -493,7 +493,7 @@ ReinitializeParallelDSM(ParallelContext *pcxt)
 /*
  * Reinitialize parallel workers for a parallel context such that we could
- * launch the different number of workers.  This is required for cases where
+ * launch a different number of workers.  This is required for cases where
  * we need to reuse the same DSM segment, but the number of workers can
  * vary from run-to-run.
  */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index d625d17bf4..76d33b1ba2 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2034,23 +2034,23 @@ vacuum_delay_point(void)
 /*
  * Computes the vacuum delay for parallel workers.
  *
- * The basic idea of a cost-based vacuum delay for parallel vacuum is to allow
- * each worker to sleep proportional to the work done by it.  We achieve this
+ * The basic idea of a cost-based delay for parallel vacuum is to force
+ * each worker to sleep in proportion to the share of work it's done.  We achieve this
  * by allowing all parallel vacuum workers including the leader process to
  * have a shared view of cost related parameters (mainly VacuumCostBalance).
- * We allow each worker to update it as and when it has incurred any cost and
+ * We allow each worker to update it AS AND WHEN it has incurred any cost and
  * then based on that decide whether it needs to sleep.  We compute the time
  * to sleep for a worker based on the cost it has incurred
  * (VacuumCostBalanceLocal) and then reduce the VacuumSharedCostBalance by
- * that amount.  This avoids letting the workers sleep who have done less or
- * no I/O as compared to other workers and therefore can ensure that workers
- * who are doing more I/O got throttled more.
+ * that amount.  This avoids putting to sleep those workers which have done less
+ * I/O than other workers and therefore ensure that workers
+ * which are doing more I/O got throttled more.
  *
- * We allow any worker to sleep only if it has performed the I/O above a
+ * We force a worker to sleep only if it has performed I/O above a
  * certain threshold, which is calculated based on the number of active
  * workers (VacuumActiveNWorkers), and the overall cost balance is more than
- * VacuumCostLimit set by the system.  The testing reveals that we achieve
- * the required throttling if we allow a worker that has done more than 50%
+ * VacuumCostLimit set by the system.  Testing reveals that we achieve
+ * the required throttling if we force a worker that has done more than 50%
  * of its share of work to sleep.
  */
 static double
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 2779bea5c9..a4cd721400 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -225,7 +225,7 @@ typedef struct VacuumParams
 	/*
 	 * The number of parallel vacuum workers.  0 by default which means choose
-	 * based on the number of indexes.  -1 indicates a parallel vacuum is
+	 * based on the number of indexes.  -1 indicates parallel vacuum is
 	 * disabled.
 	 */
 	int			nworkers;
-- 
2.17.0
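The doc hunk above describes how the worker count is chosen when the <literal>PARALLEL</literal> option is omitted: one worker per index whose size exceeds min_parallel_index_scan_size, capped by max_parallel_maintenance_workers, and only when the table has at least two indexes. As a rough illustration of that rule only (a hypothetical standalone helper, not PostgreSQL's actual compute_parallel_vacuum_workers()):

```c
#include <assert.h>

/*
 * Simplified sketch of the worker-count rule described in the doc text
 * above.  An index is eligible iff its size exceeds the scan-size
 * threshold, the result is capped by the maintenance-workers limit,
 * and parallelism is used only when the table has at least two indexes.
 * Hypothetical helper for illustration, not the real PostgreSQL code.
 */
static int
sketch_parallel_workers(const long *index_sizes, int nindexes,
						long min_parallel_index_scan_size,
						int max_parallel_maintenance_workers)
{
	int			eligible = 0;

	for (int i = 0; i < nindexes; i++)
		if (index_sizes[i] > min_parallel_index_scan_size)
			eligible++;

	/* Parallel workers are launched only with at least 2 indexes. */
	if (nindexes < 2 || eligible == 0)
		return 0;

	/* One worker per eligible index, further limited by the GUC. */
	return eligible < max_parallel_maintenance_workers
		? eligible : max_parallel_maintenance_workers;
}
```

So a three-index table with one tiny index would get two workers by default, fewer if max_parallel_maintenance_workers is lower.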

Attachments:

v1-0001-docs-review-for-parallel-vacuum.patch (text/x-diff; charset=us-ascii)
From aec387f1c5e405d504ade077a20db7b6ff6c3835 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 19 Jan 2020 22:35:01 -0600
Subject: [PATCH v1 1/2] docs review for parallel vacuum

---
 doc/src/sgml/ref/vacuum.sgml | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index ab1b8c2398..140637983a 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -237,15 +237,15 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <term><literal>PARALLEL</literal></term>
     <listitem>
      <para>
-      Perform vacuum index and cleanup index phases of <command>VACUUM</command>
+      Perform vacuum index and index cleanup phases of <command>VACUUM</command>
       in parallel using <replaceable class="parameter">integer</replaceable>
-      background workers (for the detail of each vacuum phases, please
+      background workers (for the detail of each vacuum phase, please
       refer to <xref linkend="vacuum-phases"/>).  If the
-      <literal>PARALLEL</literal> option is omitted, then
-      <command>VACUUM</command> decides the number of workers based on number
-      of indexes that support parallel vacuum operation on the relation which
-      is further limited by <xref linkend="guc-max-parallel-workers-maintenance"/>.
-      The index can participate in a parallel vacuum if and only if the size
+      <literal>PARALLEL</literal> option is omitted, then the number of workers
+      is determined based on the number of indexes that support parallel vacuum
+      operation on the relation, and is further limited by <xref
+      linkend="guc-max-parallel-workers-maintenance"/>.
+      An index can participate in parallel vacuum if and only if the size
       of the index is more than <xref linkend="guc-min-parallel-index-scan-size"/>.
       Please note that it is not guaranteed that the number of parallel workers
       specified in <replaceable class="parameter">integer</replaceable> will
@@ -253,7 +253,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
       workers than specified, or even with no workers at all.  Only one worker
       can be used per index.  So parallel workers are launched only when there
       are at least <literal>2</literal> indexes in the table.  Workers for
-      vacuum launches before starting each phase and exit at the end of
+      vacuum are launched before the start of each phase and terminate at the end of
       the phase.  These behaviors might change in a future release.  This
       option can't be used with the <literal>FULL</literal> option.
      </para>
@@ -372,7 +372,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <command>VACUUM</command> causes a substantial increase in I/O traffic,
     which might cause poor performance for other active sessions.  Therefore,
     it is sometimes advisable to use the cost-based vacuum delay feature.  For
-    parallel vacuum, each worker sleeps proportional to the work done by that
+    parallel vacuum, each worker sleeps proportionally to the work done by that
     worker.  See <xref linkend="runtime-config-resource-vacuum-cost"/> for
     details.
    </para>
-- 
2.17.0

v1-0002-Review-comments-for-parallel-vacuum.patch (text/x-diff; charset=us-ascii)
From 6ee0aca17afdd365a4b3d51a9c0ad152761b7576 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Mon, 20 Jan 2020 00:18:44 -0600
Subject: [PATCH v1 2/2] Review comments for parallel vacuum

---
 src/backend/access/heap/vacuumlazy.c  | 26 +++++++++++++-------------
 src/backend/access/transam/parallel.c |  2 +-
 src/backend/commands/vacuum.c         | 18 +++++++++---------
 src/include/commands/vacuum.h         |  2 +-
 4 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e017db4446..0ab0bea312 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -194,7 +194,7 @@ typedef struct LVShared
 	 * live tuples in the index vacuum case or the new live tuples in the
 	 * index cleanup case.
 	 *
-	 * estimated_count is true if the reltuples is an estimated value.
+	 * estimated_count is true if reltuples is an estimated value.
 	 */
 	double		reltuples;
 	bool		estimated_count;
@@ -757,7 +757,7 @@ skip_blocks(Relation onerel, VacuumParams *params, BlockNumber *next_unskippable
  *		to reclaim dead line pointers.
  *
  *		If the table has at least two indexes, we execute both index vacuum
- *		and index cleanup with parallel workers unless the parallel vacuum is
+ *		and index cleanup with parallel workers unless parallel vacuum is
  *		disabled.  In a parallel vacuum, we enter parallel mode and then
  *		create both the parallel context and the DSM segment before starting
  *		heap scan so that we can record dead tuples to the DSM segment.  All
@@ -836,7 +836,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 
 	/*
-	 * Initialize the state for a parallel vacuum.  As of now, only one worker
+	 * Initialize state for a parallel vacuum.  As of now, only one worker
 	 * can be used for an index, so we invoke parallelism only if there are at
 	 * least two indexes on a table.
 	 */
@@ -864,7 +864,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	}
 
 	/*
-	 * Allocate the space for dead tuples in case the parallel vacuum is not
+	 * Allocate the space for dead tuples in case parallel vacuum is not
 	 * initialized.
 	 */
 	if (!ParallelVacuumIsActive(lps))
@@ -2111,7 +2111,7 @@ parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
 		shared_indstats = get_indstats(lvshared, idx);
 
 		/*
-		 * Skip processing indexes that doesn't participate in parallel
+		 * Skip processing indexes that don't participate in parallel
 		 * operation
 		 */
 		if (shared_indstats == NULL ||
@@ -2223,7 +2223,7 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 		shared_indstats->updated = true;
 
 		/*
-		 * Now that the stats[idx] points to the DSM segment, we don't need
+		 * Now that stats[idx] points to the DSM segment, we don't need
 		 * the locally allocated results.
 		 */
 		pfree(*stats);
@@ -2329,7 +2329,7 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
  *	lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
  *
  *		reltuples is the number of heap tuples and estimated_count is true
- *		if the reltuples is an estimated value.
+ *		if reltuples is an estimated value.
  */
 static void
 lazy_cleanup_index(Relation indrel,
@@ -2916,9 +2916,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 /*
  * Compute the number of parallel worker processes to request.  Both index
  * vacuum and index cleanup can be executed with parallel workers.  The index
- * is eligible for parallel vacuum iff it's size is greater than
+ * is eligible for parallel vacuum iff its size is greater than
  * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt the performance.
+ * can hurt performance.
  *
  * nrequested is the number of parallel workers that user requested.  If
  * nrequested is 0, we compute the parallel degree based on nindexes, that is
@@ -2937,7 +2937,7 @@ compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested,
 	int			i;
 
 	/*
-	 * We don't allow to perform parallel operation in standalone backend or
+	 * We don't allow performing parallel operation in standalone backend or
 	 * when parallelism is disabled.
 	 */
 	if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
@@ -3010,7 +3010,7 @@ prepare_index_statistics(LVShared *lvshared, bool *can_parallel_vacuum,
 }
 
 /*
- * Update index statistics in pg_class if the statistics is accurate.
+ * Update index statistics in pg_class if the statistics are accurate.
  */
 static void
 update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,
@@ -3181,7 +3181,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
 /*
  * Destroy the parallel context, and end parallel mode.
  *
- * Since writes are not allowed during the parallel mode, so we copy the
+ * Since writes are not allowed during parallel mode, copy the
  * updated index statistics from DSM in local memory and then later use that
  * to update the index statistics.  One might think that we can exit from
  * parallel mode, update the index statistics and then destroy parallel
@@ -3288,7 +3288,7 @@ skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared)
  * Perform work within a launched parallel process.
  *
  * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report the progress information.
+ * we don't need to report progress information.
  */
 void
 parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index df06e7d174..ac348b312c 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -493,7 +493,7 @@ ReinitializeParallelDSM(ParallelContext *pcxt)
 
 /*
  * Reinitialize parallel workers for a parallel context such that we could
- * launch the different number of workers.  This is required for cases where
+ * launch a different number of workers.  This is required for cases where
  * we need to reuse the same DSM segment, but the number of workers can
  * vary from run-to-run.
  */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index d625d17bf4..76d33b1ba2 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2034,23 +2034,23 @@ vacuum_delay_point(void)
 /*
  * Computes the vacuum delay for parallel workers.
  *
- * The basic idea of a cost-based vacuum delay for parallel vacuum is to allow
- * each worker to sleep proportional to the work done by it.  We achieve this
+ * The basic idea of a cost-based delay for parallel vacuum is to force
+ * each worker to sleep in proportion to the share of work it's done.  We achieve this
  * by allowing all parallel vacuum workers including the leader process to
  * have a shared view of cost related parameters (mainly VacuumCostBalance).
- * We allow each worker to update it as and when it has incurred any cost and
+ * We allow each worker to update it AS AND WHEN it has incurred any cost and
  * then based on that decide whether it needs to sleep.  We compute the time
  * to sleep for a worker based on the cost it has incurred
  * (VacuumCostBalanceLocal) and then reduce the VacuumSharedCostBalance by
- * that amount.  This avoids letting the workers sleep who have done less or
- * no I/O as compared to other workers and therefore can ensure that workers
- * who are doing more I/O got throttled more.
+ * that amount.  This avoids putting to sleep those workers which have done less
+ * I/O than other workers and therefore ensure that workers
+ * which are doing more I/O got throttled more.
  *
- * We allow any worker to sleep only if it has performed the I/O above a
+ * We force a worker to sleep only if it has performed I/O above a
  * certain threshold, which is calculated based on the number of active
  * workers (VacuumActiveNWorkers), and the overall cost balance is more than
- * VacuumCostLimit set by the system.  The testing reveals that we achieve
- * the required throttling if we allow a worker that has done more than 50%
+ * VacuumCostLimit set by the system.  Testing reveals that we achieve
+ * the required throttling if we force a worker that has done more than 50%
  * of its share of work to sleep.
  */
 static double
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 2779bea5c9..a4cd721400 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -225,7 +225,7 @@ typedef struct VacuumParams
 
 	/*
 	 * The number of parallel vacuum workers.  0 by default which means choose
-	 * based on the number of indexes.  -1 indicates a parallel vacuum is
+	 * based on the number of indexes.  -1 indicates parallel vacuum is
 	 * disabled.
 	 */
 	int			nworkers;
-- 
2.17.0

#2 Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#1)
Re: doc review for parallel vacuum

On Sun, Mar 22, 2020 at 7:48 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

Original, long thread
/messages/by-id/CAA4eK1+nw1FBK3_sDnW+7kB+x4qbDJqetgqwYW8k2xv82RZ+Kw@mail.gmail.com

I have comments/questions on the patches:
doc changes
-------------------------
1.
      <para>
-      Perform vacuum index and cleanup index phases of
<command>VACUUM</command>
+      Perform vacuum index and index cleanup phases of
<command>VACUUM</command>

Why is the proposed text an improvement over what is already there?

2.
If the
-      <literal>PARALLEL</literal> option is omitted, then
-      <command>VACUUM</command> decides the number of workers based on number
-      of indexes that support parallel vacuum operation on the relation which
-      is further limited by <xref
linkend="guc-max-parallel-workers-maintenance"/>.
-      The index can participate in a parallel vacuum if and only if the size
+      <literal>PARALLEL</literal> option is omitted, then the number of workers
+      is determined based on the number of indexes that support parallel vacuum
+      operation on the relation, and is further limited by <xref
+      linkend="guc-max-parallel-workers-maintenance"/>.
+      An index can participate in parallel vacuum if and only if the size
       of the index is more than <xref
linkend="guc-min-parallel-index-scan-size"/>.

Here, it is not clear to me why the proposed text is better than what
we already have.

3.
..
-      vacuum launches before starting each phase and exit at the end of
+      vacuum are launched before the start of each phase and
terminate at the end of

It is better to use 'exit' instead of 'terminate', as we are not
forcing the workers to end; rather, we wait for them to exit.

Patch changes
-------------------------
1.
- * and index cleanup with parallel workers unless the parallel vacuum is
+ * and index cleanup with parallel workers unless parallel vacuum is

As per my understanding 'parallel vacuum' is a noun phrase, so we
should have a determiner before it.

2.
- * Since writes are not allowed during the parallel mode, so we copy the
+ * Since writes are not allowed during parallel mode, copy the

Similar to above, I think here also we should have a determiner before
'parallel mode'.

3.
- * The basic idea of a cost-based vacuum delay for parallel vacuum is to allow
- * each worker to sleep proportional to the work done by it.  We achieve this
+ * The basic idea of a cost-based delay for parallel vacuum is to force
+ * each worker to sleep in proportion to the share of work it's done.
We achieve this

I am not sure if it is an improvement to use 'to force' instead of 'to
allow', because there is another criterion as well that decides whether
the worker will sleep or not. I am also not sure if the second change
("share of work it's") in this sentence is a clear improvement.

4.
- * We allow each worker to update it as and when it has incurred any cost and
+ * We allow each worker to update it AS AND WHEN it has incurred any cost and

I don't see why it is necessary to make this part bold. We are using
it at one other place in the code in the way it is used here.

5.
- * We allow any worker to sleep only if it has performed the I/O above a
+ * We force a worker to sleep only if it has performed I/O above a
  * certain threshold

Hmm, again I am not sure if we should use 'force' here instead of 'allow'.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
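Items 3-5 above all concern the comment wording for the parallel cost-based delay: workers share a cost balance, and a worker sleeps only when the shared balance exceeds VacuumCostLimit and that worker has done more than 50% of its per-worker share, sleeping in proportion to its own incurred cost. A single-threaded sketch of that rule (illustrative names and plain pointers; the real code operates on atomics in shared memory):

```c
#include <assert.h>

/*
 * Simplified model of the throttling rule under discussion.  A worker
 * sleeps only when the shared balance exceeds the cost limit AND the
 * worker has done more than half of its per-worker share of the work;
 * the sleep is proportional to the cost that worker itself incurred,
 * which is then paid back into the shared balance.  Hypothetical
 * sketch, not PostgreSQL's actual compute_parallel_delay().
 */
static double
sketch_parallel_delay(int *shared_balance, int *local_balance,
					  int nactive_workers, int cost_limit,
					  double cost_delay_msec)
{
	double		msec = 0;

	if (*shared_balance > cost_limit &&
		*local_balance > 0.5 * ((double) cost_limit / nactive_workers))
	{
		/* Sleep in proportion to this worker's own cost... */
		msec = cost_delay_msec * (double) *local_balance / cost_limit;
		/* ...and subtract that cost from the shared balance. */
		*shared_balance -= *local_balance;
		*local_balance = 0;
	}
	return msec;
}
```

Under this model a worker that has done little or no I/O keeps working while heavier workers get throttled, which is the behavior the comment tries to describe.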

#3 Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#2)
1 attachment(s)
Re: doc review for parallel vacuum

On Mon, Mar 23, 2020 at 10:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, Mar 22, 2020 at 7:48 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

Original, long thread
/messages/by-id/CAA4eK1+nw1FBK3_sDnW+7kB+x4qbDJqetgqwYW8k2xv82RZ+Kw@mail.gmail.com

I have comments/questions on the patches:
doc changes
-------------------------
1.
<para>
-      Perform vacuum index and cleanup index phases of
<command>VACUUM</command>
+      Perform vacuum index and index cleanup phases of
<command>VACUUM</command>

Why is the proposed text an improvement over what is already there?

I have kept the existing text as it is for this case.

2.
If the
-      <literal>PARALLEL</literal> option is omitted, then
-      <command>VACUUM</command> decides the number of workers based on number
-      of indexes that support parallel vacuum operation on the relation which
-      is further limited by <xref
linkend="guc-max-parallel-workers-maintenance"/>.
-      The index can participate in a parallel vacuum if and only if the size
+      <literal>PARALLEL</literal> option is omitted, then the number of workers
+      is determined based on the number of indexes that support parallel vacuum
+      operation on the relation, and is further limited by <xref
+      linkend="guc-max-parallel-workers-maintenance"/>.
+      An index can participate in parallel vacuum if and only if the size
of the index is more than <xref
linkend="guc-min-parallel-index-scan-size"/>.

Here, it is not clear to me why the proposed text is better than what
we already have.

Changed as per your suggestion.

3.
..
-      vacuum launches before starting each phase and exit at the end of
+      vacuum are launched before the start of each phase and
terminate at the end of

It is better to use 'exit' instead of 'terminate', as we are not
forcing the workers to end; rather, we wait for them to exit.

I have used 'exit' instead of 'terminate' here.

Patch changes
-------------------------
1.
- * and index cleanup with parallel workers unless the parallel vacuum is
+ * and index cleanup with parallel workers unless parallel vacuum is

As per my understanding 'parallel vacuum' is a noun phrase, so we
should have a determiner before it.

2.
- * Since writes are not allowed during the parallel mode, so we copy the
+ * Since writes are not allowed during parallel mode, copy the

Similar to above, I think here also we should have a determiner before
'parallel mode'.

Changed as per your suggestion.

3.
- * The basic idea of a cost-based vacuum delay for parallel vacuum is to allow
- * each worker to sleep proportional to the work done by it.  We achieve this
+ * The basic idea of a cost-based delay for parallel vacuum is to force
+ * each worker to sleep in proportion to the share of work it's done.
We achieve this

I am not sure if it is an improvement to use 'to force' instead of 'to
allow', because there is another criterion as well that decides whether
the worker will sleep or not. I am also not sure if the second change
("share of work it's") in this sentence is a clear improvement.

I have used 'to allow' in the above text; otherwise, I accepted your suggestions.

4.
- * We allow each worker to update it as and when it has incurred any cost and
+ * We allow each worker to update it AS AND WHEN it has incurred any cost and

I don't see why it is necessary to make this part bold. We are using
it at one other place in the code in the way it is used here.

Kept the existing text as it is.

5.
- * We allow any worker to sleep only if it has performed the I/O above a
+ * We force a worker to sleep only if it has performed I/O above a
* certain threshold

Hmm, again I am not sure if we should use 'force' here instead of 'allow'.

Kept the usage of 'allow'.

I have combined both of your patches. Let me know if you have any
more suggestions; otherwise, I am planning to push this tomorrow.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

0001-Comments-and-doc-fixes-for-commit-40d964ec99.patch (application/octet-stream)
From 8e1c1306e97608aaa9b677c90be6a6c6f36ed457 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Tue, 7 Apr 2020 09:48:28 +0530
Subject: [PATCH] Comments and doc fixes for commit 40d964ec99.

Reported-by: Justin Pryzby
Author: Justin Pryzby, with few changes by me
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/20200322021801.GB2563@telsasoft.com
---
 doc/src/sgml/ref/vacuum.sgml          | 16 ++++++++--------
 src/backend/access/heap/vacuumlazy.c  | 30 +++++++++++++++---------------
 src/backend/access/transam/parallel.c |  2 +-
 src/backend/commands/vacuum.c         | 20 ++++++++++----------
 src/include/commands/vacuum.h         |  2 +-
 5 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 846056a..b7e0a8a 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -234,13 +234,13 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
      <para>
       Perform vacuum index and cleanup index phases of <command>VACUUM</command>
       in parallel using <replaceable class="parameter">integer</replaceable>
-      background workers (for the detail of each vacuum phases, please
+      background workers (for the detail of each vacuum phase, please
       refer to <xref linkend="vacuum-phases"/>).  If the
-      <literal>PARALLEL</literal> option is omitted, then
-      <command>VACUUM</command> decides the number of workers based on number
-      of indexes that support parallel vacuum operation on the relation which
-      is further limited by <xref linkend="guc-max-parallel-workers-maintenance"/>.
-      The index can participate in a parallel vacuum if and only if the size
+      <literal>PARALLEL</literal> option is omitted, then the number of workers
+      is determined based on the number of indexes that support parallel vacuum
+      operation on the relation, and is further limited by <xref
+      linkend="guc-max-parallel-workers-maintenance"/>.
+      An index can participate in parallel vacuum if and only if the size
       of the index is more than <xref linkend="guc-min-parallel-index-scan-size"/>.
       Please note that it is not guaranteed that the number of parallel workers
       specified in <replaceable class="parameter">integer</replaceable> will
@@ -248,7 +248,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
       workers than specified, or even with no workers at all.  Only one worker
       can be used per index.  So parallel workers are launched only when there
       are at least <literal>2</literal> indexes in the table.  Workers for
-      vacuum launches before starting each phase and exit at the end of
+      vacuum are launched before the start of each phase and exit at the end of
       the phase.  These behaviors might change in a future release.  This
       option can't be used with the <literal>FULL</literal> option.
      </para>
@@ -367,7 +367,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <command>VACUUM</command> causes a substantial increase in I/O traffic,
     which might cause poor performance for other active sessions.  Therefore,
     it is sometimes advisable to use the cost-based vacuum delay feature.  For
-    parallel vacuum, each worker sleeps proportional to the work done by that
+    parallel vacuum, each worker sleeps proportionally to the work done by that
     worker.  See <xref linkend="runtime-config-resource-vacuum-cost"/> for
     details.
    </para>
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f3382d3..a757560 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -208,7 +208,7 @@ typedef struct LVShared
 	 * live tuples in the index vacuum case or the new live tuples in the
 	 * index cleanup case.
 	 *
-	 * estimated_count is true if the reltuples is an estimated value.
+	 * estimated_count is true if reltuples is an estimated value.
 	 */
 	double		reltuples;
 	bool		estimated_count;
@@ -732,7 +732,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
  *		to reclaim dead line pointers.
  *
  *		If the table has at least two indexes, we execute both index vacuum
- *		and index cleanup with parallel workers unless the parallel vacuum is
+ *		and index cleanup with parallel workers unless parallel vacuum is
  *		disabled.  In a parallel vacuum, we enter parallel mode and then
  *		create both the parallel context and the DSM segment before starting
  *		heap scan so that we can record dead tuples to the DSM segment.  All
@@ -809,8 +809,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 
 	/*
-	 * Initialize the state for a parallel vacuum.  As of now, only one worker
-	 * can be used for an index, so we invoke parallelism only if there are at
+	 * Initialize state for a parallel vacuum.  As of now, only one worker can
+	 * be used for an index, so we invoke parallelism only if there are at
 	 * least two indexes on a table.
 	 */
 	if (params->nworkers >= 0 && vacrelstats->useindex && nindexes > 1)
@@ -837,7 +837,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	}
 
 	/*
-	 * Allocate the space for dead tuples in case the parallel vacuum is not
+	 * Allocate the space for dead tuples in case parallel vacuum is not
 	 * initialized.
 	 */
 	if (!ParallelVacuumIsActive(lps))
@@ -2215,7 +2215,7 @@ parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
 		shared_indstats = get_indstats(lvshared, idx);
 
 		/*
-		 * Skip processing indexes that doesn't participate in parallel
+		 * Skip processing indexes that don't participate in parallel
 		 * operation
 		 */
 		if (shared_indstats == NULL ||
@@ -2328,8 +2328,8 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 		shared_indstats->updated = true;
 
 		/*
-		 * Now that the stats[idx] points to the DSM segment, we don't need
-		 * the locally allocated results.
+		 * Now that stats[idx] points to the DSM segment, we don't need the
+		 * locally allocated results.
 		 */
 		pfree(*stats);
 		*stats = bulkdelete_res;
@@ -2449,7 +2449,7 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
  *	lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
  *
  *		reltuples is the number of heap tuples and estimated_count is true
- *		if the reltuples is an estimated value.
+ *		if reltuples is an estimated value.
  */
 static void
 lazy_cleanup_index(Relation indrel,
@@ -3050,9 +3050,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 /*
  * Compute the number of parallel worker processes to request.  Both index
  * vacuum and index cleanup can be executed with parallel workers.  The index
- * is eligible for parallel vacuum iff it's size is greater than
+ * is eligible for parallel vacuum iff its size is greater than
  * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt the performance.
+ * can hurt performance.
  *
  * nrequested is the number of parallel workers that user requested.  If
  * nrequested is 0, we compute the parallel degree based on nindexes, that is
@@ -3071,7 +3071,7 @@ compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested,
 	int			i;
 
 	/*
-	 * We don't allow to perform parallel operation in standalone backend or
+	 * We don't allow performing parallel operation in standalone backend or
 	 * when parallelism is disabled.
 	 */
 	if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
@@ -3144,7 +3144,7 @@ prepare_index_statistics(LVShared *lvshared, bool *can_parallel_vacuum,
 }
 
 /*
- * Update index statistics in pg_class if the statistics is accurate.
+ * Update index statistics in pg_class if the statistics are accurate.
  */
 static void
 update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,
@@ -3345,7 +3345,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
 /*
  * Destroy the parallel context, and end parallel mode.
  *
- * Since writes are not allowed during the parallel mode, so we copy the
+ * Since writes are not allowed during parallel mode, copy the
  * updated index statistics from DSM in local memory and then later use that
  * to update the index statistics.  One might think that we can exit from
  * parallel mode, update the index statistics and then destroy parallel
@@ -3452,7 +3452,7 @@ skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared)
  * Perform work within a launched parallel process.
  *
  * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report the progress information.
+ * we don't need to report progress information.
  */
 void
 parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 29057f3..14a8690 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -505,7 +505,7 @@ ReinitializeParallelDSM(ParallelContext *pcxt)
 
 /*
  * Reinitialize parallel workers for a parallel context such that we could
- * launch the different number of workers.  This is required for cases where
+ * launch a different number of workers.  This is required for cases where
  * we need to reuse the same DSM segment, but the number of workers can
  * vary from run-to-run.
  */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 3a89f8f..495ac23 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2036,23 +2036,23 @@ vacuum_delay_point(void)
 /*
  * Computes the vacuum delay for parallel workers.
  *
- * The basic idea of a cost-based vacuum delay for parallel vacuum is to allow
- * each worker to sleep proportional to the work done by it.  We achieve this
+ * The basic idea of a cost-based delay for parallel vacuum is to allow each
+ * worker to sleep in proportion to the share of work it's done.  We achieve this
  * by allowing all parallel vacuum workers including the leader process to
  * have a shared view of cost related parameters (mainly VacuumCostBalance).
  * We allow each worker to update it as and when it has incurred any cost and
  * then based on that decide whether it needs to sleep.  We compute the time
  * to sleep for a worker based on the cost it has incurred
  * (VacuumCostBalanceLocal) and then reduce the VacuumSharedCostBalance by
- * that amount.  This avoids letting the workers sleep who have done less or
- * no I/O as compared to other workers and therefore can ensure that workers
- * who are doing more I/O got throttled more.
+ * that amount.  This avoids putting to sleep those workers which have done less
+ * I/O than other workers and therefore ensure that workers
+ * which are doing more I/O got throttled more.
  *
- * We allow any worker to sleep only if it has performed the I/O above a
- * certain threshold, which is calculated based on the number of active
- * workers (VacuumActiveNWorkers), and the overall cost balance is more than
- * VacuumCostLimit set by the system.  The testing reveals that we achieve
- * the required throttling if we allow a worker that has done more than 50%
+ * We allow a worker to sleep only if it has performed I/O above a certain
+ * threshold, which is calculated based on the number of active workers
+ * (VacuumActiveNWorkers), and the overall cost balance is more than
+ * VacuumCostLimit set by the system.  Testing reveals that we achieve
+ * the required throttling if we force a worker that has done more than 50%
  * of its share of work to sleep.
  */
 static double
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 2779bea..a4cd721 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -225,7 +225,7 @@ typedef struct VacuumParams
 
 	/*
 	 * The number of parallel vacuum workers.  0 by default which means choose
-	 * based on the number of indexes.  -1 indicates a parallel vacuum is
+	 * based on the number of indexes.  -1 indicates parallel vacuum is
 	 * disabled.
 	 */
 	int			nworkers;
-- 
1.8.3.1

#4Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#3)
2 attachment(s)
Re: doc review for parallel vacuum

On Tue, Apr 07, 2020 at 09:57:46AM +0530, Amit Kapila wrote:

On Mon, Mar 23, 2020 at 10:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, Mar 22, 2020 at 7:48 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

Original, long thread
/messages/by-id/CAA4eK1+nw1FBK3_sDnW+7kB+x4qbDJqetgqwYW8k2xv82RZ+Kw@mail.gmail.com

I have comments/questions on the patches:
doc changes
-------------------------
1.
<para>
-      Perform vacuum index and cleanup index phases of <command>VACUUM</command>
+      Perform vacuum index and index cleanup phases of <command>VACUUM</command>

Why is the proposed text an improvement over what is already there?

I have kept the existing text as it is for this case.

Probably it should say "index vacuum and index cleanup", which is also what the
comment in vacuumlazy.c says.

2.
If the
-      <literal>PARALLEL</literal> option is omitted, then
-      <command>VACUUM</command> decides the number of workers based on number
-      of indexes that support parallel vacuum operation on the relation which
-      is further limited by <xref
linkend="guc-max-parallel-workers-maintenance"/>.
-      The index can participate in a parallel vacuum if and only if the size
+      <literal>PARALLEL</literal> option is omitted, then the number of workers
+      is determined based on the number of indexes that support parallel vacuum
+      operation on the relation, and is further limited by <xref
+      linkend="guc-max-parallel-workers-maintenance"/>.
+      An index can participate in parallel vacuum if and only if the size
of the index is more than <xref
linkend="guc-min-parallel-index-scan-size"/>.

Here, it is not clear to me why the proposed text is better than what
we already have?

Changed as per your suggestion.

To answer your question:

"VACUUM decides" doesn't sound consistent with docs.

"based on {+the+} number"
=> here, you're veritably missing an article...

"relation which" technically means that it is the *relation* that "is further
limited by"...
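For what it's worth, the worker-count behavior that doc paragraph describes can be sketched as follows. This is only a rough model of compute_parallel_vacuum_workers() in vacuumlazy.c: the real function also checks each index AM's parallel-vacuum support and whether we are running under the postmaster, and the plain-int sizes here are a simplification.

```c
#include <assert.h>

/*
 * Rough model of how the number of parallel vacuum workers is chosen.
 * Illustrative only; see compute_parallel_vacuum_workers() in
 * vacuumlazy.c for the real implementation.
 */
static int
parallel_vacuum_workers_model(const int *index_sizes, int nindexes,
							  int nrequested, int min_index_scan_size,
							  int max_maintenance_workers)
{
	int			nindexes_parallel = 0;
	int			parallel_workers;

	/* Parallelism disabled entirely. */
	if (max_maintenance_workers == 0)
		return 0;

	/* An index is eligible iff its size exceeds the threshold. */
	for (int i = 0; i < nindexes; i++)
		if (index_sizes[i] > min_index_scan_size)
			nindexes_parallel++;

	/* The leader process vacuums one index itself. */
	nindexes_parallel--;
	if (nindexes_parallel <= 0)
		return 0;

	/* Honor an explicit PARALLEL N request, else use the index count... */
	if (nrequested > 0 && nrequested < nindexes_parallel)
		parallel_workers = nrequested;
	else
		parallel_workers = nindexes_parallel;

	/* ...and cap by max_parallel_maintenance_workers either way. */
	if (parallel_workers > max_maintenance_workers)
		parallel_workers = max_maintenance_workers;

	return parallel_workers;
}
```

Note that it is the worker count, not the relation, that max_parallel_maintenance_workers limits, which is exactly the ambiguity the rewording removes.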

Patch changes
-------------------------
1.
- * and index cleanup with parallel workers unless the parallel vacuum is
+ * and index cleanup with parallel workers unless parallel vacuum is

As per my understanding, 'parallel vacuum' is a noun phrase, so we
should have a determiner before it.

Changed as per your suggestion.

Thanks (I think you meant an "article").

- * We allow each worker to update it as and when it has incurred any cost and
+ * We allow each worker to update it AS AND WHEN it has incurred any cost and

I don't see why it is necessary to make this part bold? We are using
it at one other place in the code in the way it is used here.

Kept the existing text as it is.

I meant this as a question. I'm not sure what "as and when" means;
did you mean "if and when"?

I have combined both of your patches. Let me know if you have any
more suggestions; otherwise, I am planning to push this tomorrow.

In the meantime, I found some more issues, so I rebased my changes on top of
your patch for your review.

--
Justin

Attachments:

v3-0001-Comments-and-doc-fixes-for-commit-40d964ec99.patch (text/x-diff; charset=us-ascii)
From 3551aba7445b46d60d2199d068c0e06a5828c5c3 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Tue, 7 Apr 2020 09:48:28 +0530
Subject: [PATCH v3 1/2] Comments and doc fixes for commit 40d964ec99.

Reported-by: Justin Pryzby
Author: Justin Pryzby, with few changes by me
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/20200322021801.GB2563@telsasoft.com
---
 doc/src/sgml/ref/vacuum.sgml          | 16 +++++++-------
 src/backend/access/heap/vacuumlazy.c  | 30 +++++++++++++--------------
 src/backend/access/transam/parallel.c |  2 +-
 src/backend/commands/vacuum.c         | 20 +++++++++---------
 src/include/commands/vacuum.h         |  2 +-
 5 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 846056a353..b7e0a8af6b 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -234,13 +234,13 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
      <para>
       Perform vacuum index and cleanup index phases of <command>VACUUM</command>
       in parallel using <replaceable class="parameter">integer</replaceable>
-      background workers (for the detail of each vacuum phases, please
+      background workers (for the detail of each vacuum phase, please
       refer to <xref linkend="vacuum-phases"/>).  If the
-      <literal>PARALLEL</literal> option is omitted, then
-      <command>VACUUM</command> decides the number of workers based on number
-      of indexes that support parallel vacuum operation on the relation which
-      is further limited by <xref linkend="guc-max-parallel-workers-maintenance"/>.
-      The index can participate in a parallel vacuum if and only if the size
+      <literal>PARALLEL</literal> option is omitted, then the number of workers
+      is determined based on the number of indexes that support parallel vacuum
+      operation on the relation, and is further limited by <xref
+      linkend="guc-max-parallel-workers-maintenance"/>.
+      An index can participate in parallel vacuum if and only if the size
       of the index is more than <xref linkend="guc-min-parallel-index-scan-size"/>.
       Please note that it is not guaranteed that the number of parallel workers
       specified in <replaceable class="parameter">integer</replaceable> will
@@ -248,7 +248,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
       workers than specified, or even with no workers at all.  Only one worker
       can be used per index.  So parallel workers are launched only when there
       are at least <literal>2</literal> indexes in the table.  Workers for
-      vacuum launches before starting each phase and exit at the end of
+      vacuum are launched before the start of each phase and exit at the end of
       the phase.  These behaviors might change in a future release.  This
       option can't be used with the <literal>FULL</literal> option.
      </para>
@@ -367,7 +367,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <command>VACUUM</command> causes a substantial increase in I/O traffic,
     which might cause poor performance for other active sessions.  Therefore,
     it is sometimes advisable to use the cost-based vacuum delay feature.  For
-    parallel vacuum, each worker sleeps proportional to the work done by that
+    parallel vacuum, each worker sleeps proportionally to the work done by that
     worker.  See <xref linkend="runtime-config-resource-vacuum-cost"/> for
     details.
    </para>
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f3382d37a4..a757560996 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -208,7 +208,7 @@ typedef struct LVShared
 	 * live tuples in the index vacuum case or the new live tuples in the
 	 * index cleanup case.
 	 *
-	 * estimated_count is true if the reltuples is an estimated value.
+	 * estimated_count is true if reltuples is an estimated value.
 	 */
 	double		reltuples;
 	bool		estimated_count;
@@ -732,7 +732,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
  *		to reclaim dead line pointers.
  *
  *		If the table has at least two indexes, we execute both index vacuum
- *		and index cleanup with parallel workers unless the parallel vacuum is
+ *		and index cleanup with parallel workers unless parallel vacuum is
  *		disabled.  In a parallel vacuum, we enter parallel mode and then
  *		create both the parallel context and the DSM segment before starting
  *		heap scan so that we can record dead tuples to the DSM segment.  All
@@ -809,8 +809,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 
 	/*
-	 * Initialize the state for a parallel vacuum.  As of now, only one worker
-	 * can be used for an index, so we invoke parallelism only if there are at
+	 * Initialize state for a parallel vacuum.  As of now, only one worker can
+	 * be used for an index, so we invoke parallelism only if there are at
 	 * least two indexes on a table.
 	 */
 	if (params->nworkers >= 0 && vacrelstats->useindex && nindexes > 1)
@@ -837,7 +837,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	}
 
 	/*
-	 * Allocate the space for dead tuples in case the parallel vacuum is not
+	 * Allocate the space for dead tuples in case parallel vacuum is not
 	 * initialized.
 	 */
 	if (!ParallelVacuumIsActive(lps))
@@ -2215,7 +2215,7 @@ parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
 		shared_indstats = get_indstats(lvshared, idx);
 
 		/*
-		 * Skip processing indexes that doesn't participate in parallel
+		 * Skip processing indexes that don't participate in parallel
 		 * operation
 		 */
 		if (shared_indstats == NULL ||
@@ -2328,8 +2328,8 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 		shared_indstats->updated = true;
 
 		/*
-		 * Now that the stats[idx] points to the DSM segment, we don't need
-		 * the locally allocated results.
+		 * Now that stats[idx] points to the DSM segment, we don't need the
+		 * locally allocated results.
 		 */
 		pfree(*stats);
 		*stats = bulkdelete_res;
@@ -2449,7 +2449,7 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
  *	lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
  *
  *		reltuples is the number of heap tuples and estimated_count is true
- *		if the reltuples is an estimated value.
+ *		if reltuples is an estimated value.
  */
 static void
 lazy_cleanup_index(Relation indrel,
@@ -3050,9 +3050,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 /*
  * Compute the number of parallel worker processes to request.  Both index
  * vacuum and index cleanup can be executed with parallel workers.  The index
- * is eligible for parallel vacuum iff it's size is greater than
+ * is eligible for parallel vacuum iff its size is greater than
  * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt the performance.
+ * can hurt performance.
  *
  * nrequested is the number of parallel workers that user requested.  If
  * nrequested is 0, we compute the parallel degree based on nindexes, that is
@@ -3071,7 +3071,7 @@ compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested,
 	int			i;
 
 	/*
-	 * We don't allow to perform parallel operation in standalone backend or
+	 * We don't allow performing parallel operation in standalone backend or
 	 * when parallelism is disabled.
 	 */
 	if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
@@ -3144,7 +3144,7 @@ prepare_index_statistics(LVShared *lvshared, bool *can_parallel_vacuum,
 }
 
 /*
- * Update index statistics in pg_class if the statistics is accurate.
+ * Update index statistics in pg_class if the statistics are accurate.
  */
 static void
 update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,
@@ -3345,7 +3345,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
 /*
  * Destroy the parallel context, and end parallel mode.
  *
- * Since writes are not allowed during the parallel mode, so we copy the
+ * Since writes are not allowed during parallel mode, copy the
  * updated index statistics from DSM in local memory and then later use that
  * to update the index statistics.  One might think that we can exit from
  * parallel mode, update the index statistics and then destroy parallel
@@ -3452,7 +3452,7 @@ skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared)
  * Perform work within a launched parallel process.
  *
  * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report the progress information.
+ * we don't need to report progress information.
  */
 void
 parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 29057f389e..14a8690019 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -505,7 +505,7 @@ ReinitializeParallelDSM(ParallelContext *pcxt)
 
 /*
  * Reinitialize parallel workers for a parallel context such that we could
- * launch the different number of workers.  This is required for cases where
+ * launch a different number of workers.  This is required for cases where
  * we need to reuse the same DSM segment, but the number of workers can
  * vary from run-to-run.
  */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 3a89f8fe1e..495ac23a26 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2036,23 +2036,23 @@ vacuum_delay_point(void)
 /*
  * Computes the vacuum delay for parallel workers.
  *
- * The basic idea of a cost-based vacuum delay for parallel vacuum is to allow
- * each worker to sleep proportional to the work done by it.  We achieve this
+ * The basic idea of a cost-based delay for parallel vacuum is to allow each
+ * worker to sleep in proportion to the share of work it's done.  We achieve this
  * by allowing all parallel vacuum workers including the leader process to
  * have a shared view of cost related parameters (mainly VacuumCostBalance).
  * We allow each worker to update it as and when it has incurred any cost and
  * then based on that decide whether it needs to sleep.  We compute the time
  * to sleep for a worker based on the cost it has incurred
  * (VacuumCostBalanceLocal) and then reduce the VacuumSharedCostBalance by
- * that amount.  This avoids letting the workers sleep who have done less or
- * no I/O as compared to other workers and therefore can ensure that workers
- * who are doing more I/O got throttled more.
+ * that amount.  This avoids putting to sleep those workers which have done less
+ * I/O than other workers and therefore ensure that workers
+ * which are doing more I/O got throttled more.
  *
- * We allow any worker to sleep only if it has performed the I/O above a
- * certain threshold, which is calculated based on the number of active
- * workers (VacuumActiveNWorkers), and the overall cost balance is more than
- * VacuumCostLimit set by the system.  The testing reveals that we achieve
- * the required throttling if we allow a worker that has done more than 50%
+ * We allow a worker to sleep only if it has performed I/O above a certain
+ * threshold, which is calculated based on the number of active workers
+ * (VacuumActiveNWorkers), and the overall cost balance is more than
+ * VacuumCostLimit set by the system.  Testing reveals that we achieve
+ * the required throttling if we force a worker that has done more than 50%
  * of its share of work to sleep.
  */
 static double
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 2779bea5c9..a4cd721400 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -225,7 +225,7 @@ typedef struct VacuumParams
 
 	/*
 	 * The number of parallel vacuum workers.  0 by default which means choose
-	 * based on the number of indexes.  -1 indicates a parallel vacuum is
+	 * based on the number of indexes.  -1 indicates parallel vacuum is
 	 * disabled.
 	 */
 	int			nworkers;
-- 
2.17.0

v3-0002-docs-comments-review-for-parallel-vacuum.patch (text/x-diff; charset=us-ascii)
From 3c6314b72d9323dd1554deadedf1c57fe105ad6b Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 19 Jan 2020 22:35:01 -0600
Subject: [PATCH v3 2/2] docs/comments review for parallel vacuum

---
 doc/src/sgml/ref/vacuum.sgml         | 12 ++++++------
 src/backend/access/heap/vacuumlazy.c | 22 +++++++++++-----------
 src/backend/commands/vacuum.c        |  2 +-
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index b7e0a8af6b..5dc0787459 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -232,9 +232,9 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <term><literal>PARALLEL</literal></term>
     <listitem>
      <para>
-      Perform vacuum index and cleanup index phases of <command>VACUUM</command>
+      Perform index vacuum and index cleanup phases of <command>VACUUM</command>
       in parallel using <replaceable class="parameter">integer</replaceable>
-      background workers (for the detail of each vacuum phase, please
+      background workers (for the details of each vacuum phase, please
       refer to <xref linkend="vacuum-phases"/>).  If the
       <literal>PARALLEL</literal> option is omitted, then the number of workers
       is determined based on the number of indexes that support parallel vacuum
@@ -358,16 +358,16 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
    </para>
 
    <para>
-     The <option>PARALLEL</option> option is used only for vacuum purpose.
-     Even if this option is specified with <option>ANALYZE</option> option
-     it does not affect <option>ANALYZE</option>.
+     The <option>PARALLEL</option> option is used only for vacuum operations.
+     If specified along with <option>ANALYZE</option>, the behavior during
+     <literal>ANALYZE</literal> is unchanged.
    </para>
 
    <para>
     <command>VACUUM</command> causes a substantial increase in I/O traffic,
     which might cause poor performance for other active sessions.  Therefore,
     it is sometimes advisable to use the cost-based vacuum delay feature.  For
-    parallel vacuum, each worker sleeps proportionally to the work done by that
+    parallel vacuum, each worker sleeps in proportion to the work done by that
     worker.  See <xref linkend="runtime-config-resource-vacuum-cost"/> for
     details.
    </para>
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a757560996..dfa57df7b6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -232,8 +232,8 @@ typedef struct LVShared
 
 	/*
 	 * Number of active parallel workers.  This is used for computing the
-	 * minimum threshold of the vacuum cost balance for a worker to go for the
-	 * delay.
+	 * minimum threshold of the vacuum cost balance before a worker sleeps
+	 * for cost-based delay.
 	 */
 	pg_atomic_uint32 active_nworkers;
 
@@ -2312,12 +2312,12 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 
 	/*
 	 * Copy the index bulk-deletion result returned from ambulkdelete and
-	 * amvacuumcleanup to the DSM segment if it's the first time to get it
-	 * from them, because they allocate it locally and it's possible that an
-	 * index will be vacuumed by the different vacuum process at the next
-	 * time.  The copying of the result normally happens only after the first
-	 * time of index vacuuming.  From the second time, we pass the result on
-	 * the DSM segment so that they then update it directly.
+	 * amvacuumcleanup to the DSM segment if it's the first time to get a result
+	 * from a worker, because workers allocate BulkDeleteResults locally, and it's possible that an
+	 * index will be vacuumed by a different vacuum process the next
+	 * time.  Copying the result normally happens only the first
+	 * time an index is vacuumed.  For any additional vacuums, we pass the result on
+	 * the DSM segment so that workers then update it directly.
 	 *
 	 * Since all vacuum workers write the bulk-deletion result at different
 	 * slots we can write them without locking.
@@ -3138,7 +3138,7 @@ prepare_index_statistics(LVShared *lvshared, bool *can_parallel_vacuum,
 		if (!can_parallel_vacuum[i])
 			continue;
 
-		/* Set NOT NULL as this index do support parallelism */
+		/* Set NOT NULL as this index does support parallelism */
 		lvshared->bitmap[i >> 3] |= 1 << (i & 0x07);
 	}
 }
@@ -3174,7 +3174,7 @@ update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,
 
 /*
  * This function prepares and returns parallel vacuum state if we can launch
- * even one worker.  This function is responsible to enter parallel mode,
+ * even one worker.  This function is responsible for entering parallel mode,
  * create a parallel context, and then initialize the DSM segment.
  */
 static LVParallelState *
@@ -3346,7 +3346,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
  * Destroy the parallel context, and end parallel mode.
  *
  * Since writes are not allowed during parallel mode, copy the
- * updated index statistics from DSM in local memory and then later use that
+ * updated index statistics from DSM into local memory and then later use that
  * to update the index statistics.  One might think that we can exit from
  * parallel mode, update the index statistics and then destroy parallel
  * context, but that won't be safe (see ExitParallelMode).
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 495ac23a26..351d5215a9 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2040,7 +2040,7 @@ vacuum_delay_point(void)
  * worker to sleep in proportion to the share of work it's done.  We achieve this
  * by allowing all parallel vacuum workers including the leader process to
  * have a shared view of cost related parameters (mainly VacuumCostBalance).
- * We allow each worker to update it as and when it has incurred any cost and
+ * We allow each worker to update it AS AND WHEN it has incurred any cost and
  * then based on that decide whether it needs to sleep.  We compute the time
  * to sleep for a worker based on the cost it has incurred
  * (VacuumCostBalanceLocal) and then reduce the VacuumSharedCostBalance by
-- 
2.17.0
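
As context for the doc text being edited above, the option reads roughly like this in practice. This is a hypothetical session; the table name and worker count are invented for illustration:

```sql
-- Request up to 4 workers for the index vacuum and index cleanup
-- phases (only one worker is ever used per index):
VACUUM (PARALLEL 4) pgbench_accounts;

-- With the option omitted, the server picks the worker count from the
-- number of parallel-safe indexes, capped by
-- max_parallel_maintenance_workers:
VACUUM pgbench_accounts;

-- PARALLEL cannot be combined with FULL:
VACUUM (PARALLEL 2, FULL) pgbench_accounts;  -- ERROR
```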

#5 Amit Kapila
#5 Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#4)
Re: doc review for parallel vacuum

On Tue, Apr 7, 2020 at 10:25 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Tue, Apr 07, 2020 at 09:57:46AM +0530, Amit Kapila wrote:

On Mon, Mar 23, 2020 at 10:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, Mar 22, 2020 at 7:48 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

Original, long thread
/messages/by-id/CAA4eK1+nw1FBK3_sDnW+7kB+x4qbDJqetgqwYW8k2xv82RZ+Kw@mail.gmail.com

I have comments/questions on the patches:
doc changes
-------------------------
1.
<para>
-      Perform vacuum index and cleanup index phases of <command>VACUUM</command>
+      Perform vacuum index and index cleanup phases of <command>VACUUM</command>

Why is the proposed text improvement over what is already there?

I have kept the existing text as it is for this case.

Probably it should say "index vacuum and index cleanup", which is also what the
comment in vacuumlazy.c says.

Okay, that makes sense.

- * We allow each worker to update it as and when it has incurred any cost and
+ * We allow each worker to update it AS AND WHEN it has incurred any cost and

I don't see why it is necessary to make this part bold? We are using
it at one other place in the code in the way it is used here.

Kept the existing text as it is.

I meant this as a question. I'm not sure what "as and when means" ? "If and
when" ?

It means the "at the time when" worker performed any I/O. This has
been used at some other place in code as well.

I have combined both of your patches. Let me know if you have any
more suggestions, otherwise, I am planning to push this tomorrow.

In the meantime, I found some more issues, so I rebased on top of your patch so
you can review it.

-     The <option>PARALLEL</option> option is used only for vacuum purpose.
-     Even if this option is specified with <option>ANALYZE</option> option
-     it does not affect <option>ANALYZE</option>.
+     The <option>PARALLEL</option> option is used only for vacuum operations.
+     If specified along with <option>ANALYZE</option>, the behavior during
+     <literal>ANALYZE</literal> is unchanged.

I think the proposed text makes the above text unclear especially "the
behavior during ANALYZE is unchanged.". Basically this option has
nothing to do with the behavior of vacuum or analyze. I think we
should be more specific as the current text.

 * Copy the index bulk-deletion result returned from ambulkdelete and
- * amvacuumcleanup to the DSM segment if it's the first time to get it
- * from them, because they allocate it locally and it's possible that an
- * index will be vacuumed by the different vacuum process at the next
- * time.  The copying of the result normally happens only after the first
- * time of index vacuuming.  From the second time, we pass the result on
- * the DSM segment so that they then update it directly.
+ * amvacuumcleanup to the DSM segment if it's the first time to get a result
+ * from a worker, because workers allocate BulkDeleteResults locally,
and it's possible that an
+ * index will be vacuumed by a different vacuum process the next
+ * time.

This can be done by the leader backend as well, so we can't use
workers terminology here. Also, I don't see the need to mention
BulkDeleteResults. I will include some changes from this text.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#6 Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Justin Pryzby (#4)
Re: doc review for parallel vacuum

On Tue, 7 Apr 2020 at 13:55, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Tue, Apr 07, 2020 at 09:57:46AM +0530, Amit Kapila wrote:

On Mon, Mar 23, 2020 at 10:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, Mar 22, 2020 at 7:48 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

Original, long thread
/messages/by-id/CAA4eK1+nw1FBK3_sDnW+7kB+x4qbDJqetgqwYW8k2xv82RZ+Kw@mail.gmail.com

I have comments/questions on the patches:
doc changes
-------------------------
1.
<para>
-      Perform vacuum index and cleanup index phases of <command>VACUUM</command>
+      Perform vacuum index and index cleanup phases of <command>VACUUM</command>

Why is the proposed text improvement over what is already there?

I have kept the existing text as it is for this case.

Probably it should say "index vacuum and index cleanup", which is also what the
comment in vacuumlazy.c says.

2.
If the
-      <literal>PARALLEL</literal> option is omitted, then
-      <command>VACUUM</command> decides the number of workers based on number
-      of indexes that support parallel vacuum operation on the relation which
-      is further limited by <xref
linkend="guc-max-parallel-workers-maintenance"/>.
-      The index can participate in a parallel vacuum if and only if the size
+      <literal>PARALLEL</literal> option is omitted, then the number of workers
+      is determined based on the number of indexes that support parallel vacuum
+      operation on the relation, and is further limited by <xref
+      linkend="guc-max-parallel-workers-maintenance"/>.
+      An index can participate in parallel vacuum if and only if the size
of the index is more than <xref
linkend="guc-min-parallel-index-scan-size"/>.

Here, it is not clear to me why the proposed text is better than what
we already have?

Changed as per your suggestion.

To answer your question:

"VACUUM decides" doesn't sound consistent with docs.

"based on {+the+} number"
=> here, you're veritably missing an article...

"relation which" technically means that the *relation* is "is further limited
by"...

Patch changes
-------------------------
1.
- * and index cleanup with parallel workers unless the parallel vacuum is
+ * and index cleanup with parallel workers unless parallel vacuum is

As per my understanding 'parallel vacuum' is a noun phrase, so we
should have a determiner before it.

Changed as per your suggestion.

Thanks (I think you meant an "article").

- * We allow each worker to update it as and when it has incurred any cost and
+ * We allow each worker to update it AS AND WHEN it has incurred any cost and

I don't see why it is necessary to make this part bold? We are using
it at one other place in the code in the way it is used here.

Kept the existing text as it is.

I meant this as a question. I'm not sure what "as and when means" ? "If and
when" ?

I have combined both of your patches. Let me know if you have any
more suggestions, otherwise, I am planning to push this tomorrow.

In the meantime, I found some more issues, so I rebased on top of your patch so
you can review it.

I don't have comments on your change other than the comments Amit
already sent. Thank you for reviewing this part!

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7 Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#6)
1 attachment(s)
Re: doc review for parallel vacuum

On Wed, Apr 8, 2020 at 12:49 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Tue, 7 Apr 2020 at 13:55, Justin Pryzby <pryzby@telsasoft.com> wrote:

I don't have comments on your change other than the comments Amit
already sent. Thank you for reviewing this part!

I have made the modifications as per my comments. What do you think
about the attached?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v4-0001-Comments-and-doc-fixes-for-commit-40d964ec99.patch (application/octet-stream)
From d0a2fdc10cb0fedbddb40c1a19bf1b330dfc5264 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 10 Apr 2020 12:50:08 +0530
Subject: [PATCH v4] Comments and doc fixes for commit 40d964ec99.

Reported-by: Justin Pryzby
Author: Justin Pryzby, with few changes by me
Reviewed-by: Amit Kapila and Sawada Masahiko
Discussion: https://postgr.es/m/20200322021801.GB2563@telsasoft.com
---
 doc/src/sgml/ref/vacuum.sgml          | 18 ++++++-------
 src/backend/access/heap/vacuumlazy.c  | 49 ++++++++++++++++++-----------------
 src/backend/access/transam/parallel.c |  2 +-
 src/backend/commands/vacuum.c         | 20 +++++++-------
 src/include/commands/vacuum.h         |  2 +-
 5 files changed, 46 insertions(+), 45 deletions(-)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 846056a..b7b8fa5 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -232,15 +232,15 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <term><literal>PARALLEL</literal></term>
     <listitem>
      <para>
-      Perform vacuum index and cleanup index phases of <command>VACUUM</command>
+      Perform index vacuum and index cleanup phases of <command>VACUUM</command>
       in parallel using <replaceable class="parameter">integer</replaceable>
-      background workers (for the detail of each vacuum phases, please
+      background workers (for the details of each vacuum phase, please
       refer to <xref linkend="vacuum-phases"/>).  If the
-      <literal>PARALLEL</literal> option is omitted, then
-      <command>VACUUM</command> decides the number of workers based on number
-      of indexes that support parallel vacuum operation on the relation which
-      is further limited by <xref linkend="guc-max-parallel-workers-maintenance"/>.
-      The index can participate in a parallel vacuum if and only if the size
+      <literal>PARALLEL</literal> option is omitted, then the number of workers
+      is determined based on the number of indexes that support parallel vacuum
+      operation on the relation, and is further limited by <xref
+      linkend="guc-max-parallel-workers-maintenance"/>.
+      An index can participate in parallel vacuum if and only if the size
       of the index is more than <xref linkend="guc-min-parallel-index-scan-size"/>.
       Please note that it is not guaranteed that the number of parallel workers
       specified in <replaceable class="parameter">integer</replaceable> will
@@ -248,7 +248,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
       workers than specified, or even with no workers at all.  Only one worker
       can be used per index.  So parallel workers are launched only when there
       are at least <literal>2</literal> indexes in the table.  Workers for
-      vacuum launches before starting each phase and exit at the end of
+      vacuum are launched before the start of each phase and exit at the end of
       the phase.  These behaviors might change in a future release.  This
       option can't be used with the <literal>FULL</literal> option.
      </para>
@@ -367,7 +367,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <command>VACUUM</command> causes a substantial increase in I/O traffic,
     which might cause poor performance for other active sessions.  Therefore,
     it is sometimes advisable to use the cost-based vacuum delay feature.  For
-    parallel vacuum, each worker sleeps proportional to the work done by that
+    parallel vacuum, each worker sleeps in proportion to the work done by that
     worker.  See <xref linkend="runtime-config-resource-vacuum-cost"/> for
     details.
    </para>
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f3382d3..7f9f86a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -208,7 +208,7 @@ typedef struct LVShared
 	 * live tuples in the index vacuum case or the new live tuples in the
 	 * index cleanup case.
 	 *
-	 * estimated_count is true if the reltuples is an estimated value.
+	 * estimated_count is true if reltuples is an estimated value.
 	 */
 	double		reltuples;
 	bool		estimated_count;
@@ -232,8 +232,8 @@ typedef struct LVShared
 
 	/*
 	 * Number of active parallel workers.  This is used for computing the
-	 * minimum threshold of the vacuum cost balance for a worker to go for the
-	 * delay.
+	 * minimum threshold of the vacuum cost balance before a worker sleeps for
+	 * cost-based delay.
 	 */
 	pg_atomic_uint32 active_nworkers;
 
@@ -732,7 +732,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
  *		to reclaim dead line pointers.
  *
  *		If the table has at least two indexes, we execute both index vacuum
- *		and index cleanup with parallel workers unless the parallel vacuum is
+ *		and index cleanup with parallel workers unless parallel vacuum is
  *		disabled.  In a parallel vacuum, we enter parallel mode and then
  *		create both the parallel context and the DSM segment before starting
  *		heap scan so that we can record dead tuples to the DSM segment.  All
@@ -809,8 +809,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 
 	/*
-	 * Initialize the state for a parallel vacuum.  As of now, only one worker
-	 * can be used for an index, so we invoke parallelism only if there are at
+	 * Initialize state for a parallel vacuum.  As of now, only one worker can
+	 * be used for an index, so we invoke parallelism only if there are at
 	 * least two indexes on a table.
 	 */
 	if (params->nworkers >= 0 && vacrelstats->useindex && nindexes > 1)
@@ -837,7 +837,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	}
 
 	/*
-	 * Allocate the space for dead tuples in case the parallel vacuum is not
+	 * Allocate the space for dead tuples in case parallel vacuum is not
 	 * initialized.
 	 */
 	if (!ParallelVacuumIsActive(lps))
@@ -2215,7 +2215,7 @@ parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
 		shared_indstats = get_indstats(lvshared, idx);
 
 		/*
-		 * Skip processing indexes that doesn't participate in parallel
+		 * Skip processing indexes that don't participate in parallel
 		 * operation
 		 */
 		if (shared_indstats == NULL ||
@@ -2313,11 +2313,12 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 	/*
 	 * Copy the index bulk-deletion result returned from ambulkdelete and
 	 * amvacuumcleanup to the DSM segment if it's the first time to get it
-	 * from them, because they allocate it locally and it's possible that an
+	 * from them because they allocate it locally and it's possible that an
 	 * index will be vacuumed by the different vacuum process at the next
-	 * time.  The copying of the result normally happens only after the first
-	 * time of index vacuuming.  From the second time, we pass the result on
-	 * the DSM segment so that they then update it directly.
+	 * time.  Copying the result normally happens only the first time an index
+	 * is vacuumed.  For any additional vacuum pass, we directly point to the
+	 * result on the DSM segment and pass it to vacuum index APIs so that
+	 * workers can update it directly.
 	 *
 	 * Since all vacuum workers write the bulk-deletion result at different
 	 * slots we can write them without locking.
@@ -2328,8 +2329,8 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 		shared_indstats->updated = true;
 
 		/*
-		 * Now that the stats[idx] points to the DSM segment, we don't need
-		 * the locally allocated results.
+		 * Now that stats[idx] points to the DSM segment, we don't need the
+		 * locally allocated results.
 		 */
 		pfree(*stats);
 		*stats = bulkdelete_res;
@@ -2449,7 +2450,7 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
  *	lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
  *
  *		reltuples is the number of heap tuples and estimated_count is true
- *		if the reltuples is an estimated value.
+ *		if reltuples is an estimated value.
  */
 static void
 lazy_cleanup_index(Relation indrel,
@@ -3050,9 +3051,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 /*
  * Compute the number of parallel worker processes to request.  Both index
  * vacuum and index cleanup can be executed with parallel workers.  The index
- * is eligible for parallel vacuum iff it's size is greater than
+ * is eligible for parallel vacuum iff its size is greater than
  * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt the performance.
+ * can hurt performance.
  *
  * nrequested is the number of parallel workers that user requested.  If
  * nrequested is 0, we compute the parallel degree based on nindexes, that is
@@ -3071,7 +3072,7 @@ compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested,
 	int			i;
 
 	/*
-	 * We don't allow to perform parallel operation in standalone backend or
+	 * We don't allow performing parallel operation in standalone backend or
 	 * when parallelism is disabled.
 	 */
 	if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
@@ -3138,13 +3139,13 @@ prepare_index_statistics(LVShared *lvshared, bool *can_parallel_vacuum,
 		if (!can_parallel_vacuum[i])
 			continue;
 
-		/* Set NOT NULL as this index do support parallelism */
+		/* Set NOT NULL as this index does support parallelism */
 		lvshared->bitmap[i >> 3] |= 1 << (i & 0x07);
 	}
 }
 
 /*
- * Update index statistics in pg_class if the statistics is accurate.
+ * Update index statistics in pg_class if the statistics are accurate.
  */
 static void
 update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,
@@ -3174,7 +3175,7 @@ update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,
 
 /*
  * This function prepares and returns parallel vacuum state if we can launch
- * even one worker.  This function is responsible to enter parallel mode,
+ * even one worker.  This function is responsible for entering parallel mode,
  * create a parallel context, and then initialize the DSM segment.
  */
 static LVParallelState *
@@ -3345,8 +3346,8 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
 /*
  * Destroy the parallel context, and end parallel mode.
  *
- * Since writes are not allowed during the parallel mode, so we copy the
- * updated index statistics from DSM in local memory and then later use that
+ * Since writes are not allowed during parallel mode, copy the
+ * updated index statistics from DSM into local memory and then later use that
  * to update the index statistics.  One might think that we can exit from
  * parallel mode, update the index statistics and then destroy parallel
  * context, but that won't be safe (see ExitParallelMode).
@@ -3452,7 +3453,7 @@ skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared)
  * Perform work within a launched parallel process.
  *
  * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report the progress information.
+ * we don't need to report progress information.
  */
 void
 parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 29057f3..14a8690 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -505,7 +505,7 @@ ReinitializeParallelDSM(ParallelContext *pcxt)
 
 /*
  * Reinitialize parallel workers for a parallel context such that we could
- * launch the different number of workers.  This is required for cases where
+ * launch a different number of workers.  This is required for cases where
  * we need to reuse the same DSM segment, but the number of workers can
  * vary from run-to-run.
  */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 3a89f8f..495ac23 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2036,23 +2036,23 @@ vacuum_delay_point(void)
 /*
  * Computes the vacuum delay for parallel workers.
  *
- * The basic idea of a cost-based vacuum delay for parallel vacuum is to allow
- * each worker to sleep proportional to the work done by it.  We achieve this
+ * The basic idea of a cost-based delay for parallel vacuum is to allow each
+ * worker to sleep in proportion to the share of work it's done.  We achieve this
  * by allowing all parallel vacuum workers including the leader process to
  * have a shared view of cost related parameters (mainly VacuumCostBalance).
  * We allow each worker to update it as and when it has incurred any cost and
  * then based on that decide whether it needs to sleep.  We compute the time
  * to sleep for a worker based on the cost it has incurred
  * (VacuumCostBalanceLocal) and then reduce the VacuumSharedCostBalance by
- * that amount.  This avoids letting the workers sleep who have done less or
- * no I/O as compared to other workers and therefore can ensure that workers
- * who are doing more I/O got throttled more.
+ * that amount.  This avoids putting to sleep those workers which have done less
+ * I/O than other workers and therefore ensure that workers
+ * which are doing more I/O got throttled more.
  *
- * We allow any worker to sleep only if it has performed the I/O above a
- * certain threshold, which is calculated based on the number of active
- * workers (VacuumActiveNWorkers), and the overall cost balance is more than
- * VacuumCostLimit set by the system.  The testing reveals that we achieve
- * the required throttling if we allow a worker that has done more than 50%
+ * We allow a worker to sleep only if it has performed I/O above a certain
+ * threshold, which is calculated based on the number of active workers
+ * (VacuumActiveNWorkers), and the overall cost balance is more than
+ * VacuumCostLimit set by the system.  Testing reveals that we achieve
+ * the required throttling if we force a worker that has done more than 50%
  * of its share of work to sleep.
  */
 static double
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 2779bea..a4cd721 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -225,7 +225,7 @@ typedef struct VacuumParams
 
 	/*
 	 * The number of parallel vacuum workers.  0 by default which means choose
-	 * based on the number of indexes.  -1 indicates a parallel vacuum is
+	 * based on the number of indexes.  -1 indicates parallel vacuum is
 	 * disabled.
 	 */
 	int			nworkers;
-- 
1.8.3.1
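
The cost-based delay comment reworked in vacuum.c above can be summarized with a small model. This is a toy sketch of the heuristic only: the constant, function names, and return shape below are invented for illustration, not PostgreSQL's actual code (which lives in `compute_parallel_delay()` in vacuum.c):

```python
# Toy model of the parallel vacuum cost-based delay heuristic described
# in the vacuum.c comment above. Variable names mirror the C globals
# (VacuumCostBalanceLocal, VacuumSharedCostBalance); values are illustrative.

VACUUM_COST_LIMIT = 200  # stand-in for the system's vacuum_cost_limit


def should_sleep(local_balance, shared_balance, active_workers):
    """A worker considers sleeping only once it has done more than 50%
    of its share of the work, and the overall shared cost balance has
    exceeded the cost limit."""
    share_threshold = 0.5 * (VACUUM_COST_LIMIT / active_workers)
    return (local_balance > share_threshold
            and shared_balance >= VACUUM_COST_LIMIT)


def sleep_and_settle(local_balance, shared_balance, cost_delay_ms):
    """Sleep time is proportional to the cost this worker itself incurred,
    and the shared balance is then reduced by that amount, so workers that
    did little or no I/O are not throttled for work done by others.
    Returns (sleep_ms, new_shared_balance, new_local_balance)."""
    sleep_ms = cost_delay_ms * local_balance / VACUUM_COST_LIMIT
    return sleep_ms, shared_balance - local_balance, 0
```

The point the comment makes is visible in the model: the worker that accumulated the larger local balance both sleeps longer and gives back more of the shared balance.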

#8 Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Amit Kapila (#7)
Re: doc review for parallel vacuum

On Fri, 10 Apr 2020 at 16:26, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 8, 2020 at 12:49 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Tue, 7 Apr 2020 at 13:55, Justin Pryzby <pryzby@telsasoft.com> wrote:

I don't have comments on your change other than the comments Amit
already sent. Thank you for reviewing this part!

I have made the modifications as per my comments. What do you think
about the attached?

Thank you for updating the patch! Looks good to me.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9 Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#7)
Re: doc review for parallel vacuum

On Fri, Apr 10, 2020 at 12:56:08PM +0530, Amit Kapila wrote:

On Wed, Apr 8, 2020 at 12:49 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:

On Tue, 7 Apr 2020 at 13:55, Justin Pryzby <pryzby@telsasoft.com> wrote:

I don't have comments on your change other than the comments Amit
already sent. Thank you for reviewing this part!

I have made the modifications as per my comments. What do you think
about the attached?

Couple more changes (in bold):

- The <option>PARALLEL</option> option is used only for vacuum PURPOSES.
- Even if this option is specified with THE <option>ANALYZE</option> option

Also, this part still doesn't read well:

- * amvacuumcleanup to the DSM segment if it's the first time to get it?
- * from them? because they? allocate it locally and it's possible that an
- * index will be vacuumed by the different vacuum process at the next

If you change "it" and "them" and "it" and say "*a* different", then it'll be
ok.

--
Justin

#10 Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#9)
Re: doc review for parallel vacuum

On Fri, Apr 10, 2020 at 7:16 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

Also, this part still doesn't read well:

- * amvacuumcleanup to the DSM segment if it's the first time to get it?
- * from them? because they? allocate it locally and it's possible that an
- * index will be vacuumed by the different vacuum process at the next

If you change "it" and "them" and "it" and say "*a* different", then it'll be
ok.

I am not sure if I follow how exactly you want to change it but still
let me know what you think about if we change it like: "Copy the index
bulk-deletion result returned from ambulkdelete and amvacuumcleanup to
the DSM segment if it's the first time because they allocate locally
and it's possible that an index will be vacuumed by the different
vacuum process at the next time."

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#11 Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#10)
Re: doc review for parallel vacuum

On Mon, Apr 13, 2020 at 10:44:42AM +0530, Amit Kapila wrote:

On Fri, Apr 10, 2020 at 7:16 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

Also, this part still doesn't read well:

- * amvacuumcleanup to the DSM segment if it's the first time to get it?
- * from them? because they? allocate it locally and it's possible that an
- * index will be vacuumed by the different vacuum process at the next

If you change "it" and "them" and "it" and say "*a* different", then it'll be
ok.

I am not sure if I follow how exactly you want to change it but still
let me know what you think about if we change it like: "Copy the index
bulk-deletion result returned from ambulkdelete and amvacuumcleanup to
the DSM segment if it's the first time because they allocate locally
and it's possible that an index will be vacuumed by the different
vacuum process at the next time."

I changed "the" to "a" and removed "at":

|Copy the index
|bulk-deletion result returned from ambulkdelete and amvacuumcleanup to
|the DSM segment if it's the first time [???] because they allocate locally
|and it's possible that an index will be vacuumed by a different
|vacuum process the next time."

Is it correct to say: "..if it's the first iteration" and "different process on
the next iteration" ? Or "cycle" ?

--
Justin

#12 Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#11)
1 attachment(s)
Re: doc review for parallel vacuum

On Mon, Apr 13, 2020 at 2:00 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

|Copy the index
|bulk-deletion result returned from ambulkdelete and amvacuumcleanup to
|the DSM segment if it's the first time [???] because they allocate locally
|and it's possible that an index will be vacuumed by a different
|vacuum process the next time."

Is it correct to say: "..if it's the first iteration" and "different process on
the next iteration" ? Or "cycle" ?

"cycle" sounds better. I have changed the patch as per your latest
comments. Let me know what you think?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v5-0001-Comments-and-doc-fixes-for-commit-40d964ec99.patch (application/octet-stream)
From 0cb92923babf71f965237bc33a771936baf8e574 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Mon, 13 Apr 2020 15:18:27 +0530
Subject: [PATCH v5] Comments and doc fixes for commit 40d964ec99.

Reported-by: Justin Pryzby
Author: Justin Pryzby, with few changes by me
Reviewed-by: Amit Kapila and Sawada Masahiko
Discussion: https://postgr.es/m/20200322021801.GB2563@telsasoft.com
---
 doc/src/sgml/ref/vacuum.sgml          | 22 +++++++--------
 src/backend/access/heap/vacuumlazy.c  | 52 +++++++++++++++++------------------
 src/backend/access/transam/parallel.c |  2 +-
 src/backend/commands/vacuum.c         | 20 +++++++-------
 src/include/commands/vacuum.h         |  2 +-
 5 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 846056a..d92db46 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -232,15 +232,15 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <term><literal>PARALLEL</literal></term>
     <listitem>
      <para>
-      Perform vacuum index and cleanup index phases of <command>VACUUM</command>
+      Perform index vacuum and index cleanup phases of <command>VACUUM</command>
       in parallel using <replaceable class="parameter">integer</replaceable>
-      background workers (for the detail of each vacuum phases, please
+      background workers (for the details of each vacuum phase, please
       refer to <xref linkend="vacuum-phases"/>).  If the
-      <literal>PARALLEL</literal> option is omitted, then
-      <command>VACUUM</command> decides the number of workers based on number
-      of indexes that support parallel vacuum operation on the relation which
-      is further limited by <xref linkend="guc-max-parallel-workers-maintenance"/>.
-      The index can participate in a parallel vacuum if and only if the size
+      <literal>PARALLEL</literal> option is omitted, then the number of workers
+      is determined based on the number of indexes that support parallel vacuum
+      operation on the relation, and is further limited by <xref
+      linkend="guc-max-parallel-workers-maintenance"/>.
+      An index can participate in parallel vacuum if and only if the size
       of the index is more than <xref linkend="guc-min-parallel-index-scan-size"/>.
       Please note that it is not guaranteed that the number of parallel workers
       specified in <replaceable class="parameter">integer</replaceable> will
@@ -248,7 +248,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
       workers than specified, or even with no workers at all.  Only one worker
       can be used per index.  So parallel workers are launched only when there
       are at least <literal>2</literal> indexes in the table.  Workers for
-      vacuum launches before starting each phase and exit at the end of
+      vacuum are launched before the start of each phase and exit at the end of
       the phase.  These behaviors might change in a future release.  This
       option can't be used with the <literal>FULL</literal> option.
      </para>
@@ -358,8 +358,8 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
    </para>
 
    <para>
-     The <option>PARALLEL</option> option is used only for vacuum purpose.
-     Even if this option is specified with <option>ANALYZE</option> option
+     The <option>PARALLEL</option> option is used only for vacuum purposes.
+     Even if this option is specified with the <option>ANALYZE</option> option
      it does not affect <option>ANALYZE</option>.
    </para>
 
@@ -367,7 +367,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <command>VACUUM</command> causes a substantial increase in I/O traffic,
     which might cause poor performance for other active sessions.  Therefore,
     it is sometimes advisable to use the cost-based vacuum delay feature.  For
-    parallel vacuum, each worker sleeps proportional to the work done by that
+    parallel vacuum, each worker sleeps in proportion to the work done by that
     worker.  See <xref linkend="runtime-config-resource-vacuum-cost"/> for
     details.
    </para>
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f3382d3..68dff11 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -208,7 +208,7 @@ typedef struct LVShared
 	 * live tuples in the index vacuum case or the new live tuples in the
 	 * index cleanup case.
 	 *
-	 * estimated_count is true if the reltuples is an estimated value.
+	 * estimated_count is true if reltuples is an estimated value.
 	 */
 	double		reltuples;
 	bool		estimated_count;
@@ -232,8 +232,8 @@ typedef struct LVShared
 
 	/*
 	 * Number of active parallel workers.  This is used for computing the
-	 * minimum threshold of the vacuum cost balance for a worker to go for the
-	 * delay.
+	 * minimum threshold of the vacuum cost balance before a worker sleeps for
+	 * cost-based delay.
 	 */
 	pg_atomic_uint32 active_nworkers;
 
@@ -732,7 +732,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
  *		to reclaim dead line pointers.
  *
  *		If the table has at least two indexes, we execute both index vacuum
- *		and index cleanup with parallel workers unless the parallel vacuum is
+ *		and index cleanup with parallel workers unless parallel vacuum is
  *		disabled.  In a parallel vacuum, we enter parallel mode and then
  *		create both the parallel context and the DSM segment before starting
  *		heap scan so that we can record dead tuples to the DSM segment.  All
@@ -809,8 +809,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 
 	/*
-	 * Initialize the state for a parallel vacuum.  As of now, only one worker
-	 * can be used for an index, so we invoke parallelism only if there are at
+	 * Initialize state for a parallel vacuum.  As of now, only one worker can
+	 * be used for an index, so we invoke parallelism only if there are at
 	 * least two indexes on a table.
 	 */
 	if (params->nworkers >= 0 && vacrelstats->useindex && nindexes > 1)
@@ -837,7 +837,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	}
 
 	/*
-	 * Allocate the space for dead tuples in case the parallel vacuum is not
+	 * Allocate the space for dead tuples in case parallel vacuum is not
 	 * initialized.
 	 */
 	if (!ParallelVacuumIsActive(lps))
@@ -2215,7 +2215,7 @@ parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
 		shared_indstats = get_indstats(lvshared, idx);
 
 		/*
-		 * Skip processing indexes that doesn't participate in parallel
+		 * Skip processing indexes that don't participate in parallel
 		 * operation
 		 */
 		if (shared_indstats == NULL ||
@@ -2312,12 +2312,12 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 
 	/*
 	 * Copy the index bulk-deletion result returned from ambulkdelete and
-	 * amvacuumcleanup to the DSM segment if it's the first time to get it
-	 * from them, because they allocate it locally and it's possible that an
-	 * index will be vacuumed by the different vacuum process at the next
-	 * time.  The copying of the result normally happens only after the first
-	 * time of index vacuuming.  From the second time, we pass the result on
-	 * the DSM segment so that they then update it directly.
+	 * amvacuumcleanup to the DSM segment if it's the first cycle because they
+	 * allocate locally and it's possible that an index will be vacuumed by a
+	 * different vacuum process the next cycle.  Copying the result normally
+	 * happens only the first time an index is vacuumed.  For any additional
+	 * vacuum pass, we directly point to the result on the DSM segment and
+	 * pass it to vacuum index APIs so that workers can update it directly.
 	 *
 	 * Since all vacuum workers write the bulk-deletion result at different
 	 * slots we can write them without locking.
@@ -2328,8 +2328,8 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 		shared_indstats->updated = true;
 
 		/*
-		 * Now that the stats[idx] points to the DSM segment, we don't need
-		 * the locally allocated results.
+		 * Now that stats[idx] points to the DSM segment, we don't need the
+		 * locally allocated results.
 		 */
 		pfree(*stats);
 		*stats = bulkdelete_res;
@@ -2449,7 +2449,7 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
  *	lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
  *
  *		reltuples is the number of heap tuples and estimated_count is true
- *		if the reltuples is an estimated value.
+ *		if reltuples is an estimated value.
  */
 static void
 lazy_cleanup_index(Relation indrel,
@@ -3050,9 +3050,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 /*
  * Compute the number of parallel worker processes to request.  Both index
  * vacuum and index cleanup can be executed with parallel workers.  The index
- * is eligible for parallel vacuum iff it's size is greater than
+ * is eligible for parallel vacuum iff its size is greater than
  * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt the performance.
+ * can hurt performance.
  *
  * nrequested is the number of parallel workers that user requested.  If
  * nrequested is 0, we compute the parallel degree based on nindexes, that is
@@ -3071,7 +3071,7 @@ compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested,
 	int			i;
 
 	/*
-	 * We don't allow to perform parallel operation in standalone backend or
+	 * We don't allow performing parallel operation in standalone backend or
 	 * when parallelism is disabled.
 	 */
 	if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
@@ -3138,13 +3138,13 @@ prepare_index_statistics(LVShared *lvshared, bool *can_parallel_vacuum,
 		if (!can_parallel_vacuum[i])
 			continue;
 
-		/* Set NOT NULL as this index do support parallelism */
+		/* Set NOT NULL as this index does support parallelism */
 		lvshared->bitmap[i >> 3] |= 1 << (i & 0x07);
 	}
 }
 
 /*
- * Update index statistics in pg_class if the statistics is accurate.
+ * Update index statistics in pg_class if the statistics are accurate.
  */
 static void
 update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,
@@ -3174,7 +3174,7 @@ update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,
 
 /*
  * This function prepares and returns parallel vacuum state if we can launch
- * even one worker.  This function is responsible to enter parallel mode,
+ * even one worker.  This function is responsible for entering parallel mode,
  * create a parallel context, and then initialize the DSM segment.
  */
 static LVParallelState *
@@ -3345,8 +3345,8 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
 /*
  * Destroy the parallel context, and end parallel mode.
  *
- * Since writes are not allowed during the parallel mode, so we copy the
- * updated index statistics from DSM in local memory and then later use that
+ * Since writes are not allowed during parallel mode, copy the
+ * updated index statistics from DSM into local memory and then later use that
  * to update the index statistics.  One might think that we can exit from
  * parallel mode, update the index statistics and then destroy parallel
  * context, but that won't be safe (see ExitParallelMode).
@@ -3452,7 +3452,7 @@ skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared)
  * Perform work within a launched parallel process.
  *
  * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report the progress information.
+ * we don't need to report progress information.
  */
 void
 parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 29057f3..14a8690 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -505,7 +505,7 @@ ReinitializeParallelDSM(ParallelContext *pcxt)
 
 /*
  * Reinitialize parallel workers for a parallel context such that we could
- * launch the different number of workers.  This is required for cases where
+ * launch a different number of workers.  This is required for cases where
  * we need to reuse the same DSM segment, but the number of workers can
  * vary from run-to-run.
  */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 3a89f8f..495ac23 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2036,23 +2036,23 @@ vacuum_delay_point(void)
 /*
  * Computes the vacuum delay for parallel workers.
  *
- * The basic idea of a cost-based vacuum delay for parallel vacuum is to allow
- * each worker to sleep proportional to the work done by it.  We achieve this
+ * The basic idea of a cost-based delay for parallel vacuum is to allow each
+ * worker to sleep in proportion to the share of work it's done.  We achieve this
  * by allowing all parallel vacuum workers including the leader process to
  * have a shared view of cost related parameters (mainly VacuumCostBalance).
  * We allow each worker to update it as and when it has incurred any cost and
  * then based on that decide whether it needs to sleep.  We compute the time
  * to sleep for a worker based on the cost it has incurred
  * (VacuumCostBalanceLocal) and then reduce the VacuumSharedCostBalance by
- * that amount.  This avoids letting the workers sleep who have done less or
- * no I/O as compared to other workers and therefore can ensure that workers
- * who are doing more I/O got throttled more.
+	 * that amount.  This avoids putting to sleep those workers which have done
+	 * less I/O than other workers, and therefore ensures that workers which
+	 * are doing more I/O get throttled more.
  *
- * We allow any worker to sleep only if it has performed the I/O above a
- * certain threshold, which is calculated based on the number of active
- * workers (VacuumActiveNWorkers), and the overall cost balance is more than
- * VacuumCostLimit set by the system.  The testing reveals that we achieve
- * the required throttling if we allow a worker that has done more than 50%
+ * We allow a worker to sleep only if it has performed I/O above a certain
+ * threshold, which is calculated based on the number of active workers
+ * (VacuumActiveNWorkers), and the overall cost balance is more than
+ * VacuumCostLimit set by the system.  Testing reveals that we achieve
+ * the required throttling if we force a worker that has done more than 50%
  * of its share of work to sleep.
  */
 static double
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 2779bea..a4cd721 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -225,7 +225,7 @@ typedef struct VacuumParams
 
 	/*
 	 * The number of parallel vacuum workers.  0 by default which means choose
-	 * based on the number of indexes.  -1 indicates a parallel vacuum is
+	 * based on the number of indexes.  -1 indicates parallel vacuum is
 	 * disabled.
 	 */
 	int			nworkers;
-- 
1.8.3.1
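The compute_parallel_delay() comment reworded in the vacuum.c hunk above can be made concrete with a rough, hypothetical sketch. This is standalone C, not PostgreSQL source; the names below are stand-ins for the real globals (VacuumSharedCostBalance, VacuumActiveNWorkers, VacuumCostLimit, VacuumCostDelay), and the real code updates the shared balance atomically across processes:

```c
/* Hypothetical sketch of the shared-cost-balance scheme described in the
 * comment: every worker charges its I/O cost to a shared balance, but a
 * worker sleeps only once the shared balance exceeds the cost limit AND
 * that worker has done more than 50% of its per-worker share of the work. */

static int shared_cost_balance = 0; /* cost accumulated by all workers */
static int active_nworkers = 4;     /* workers currently participating */
static int vacuum_cost_limit = 200; /* balance at which someone must sleep */
static int vacuum_cost_delay = 2;   /* ms of sleep per full cost limit */

/*
 * Charge 'cost' units of I/O to this worker and return how many
 * milliseconds it should sleep.  The sleep is proportional to the work
 * this particular worker has done (its local balance), and that amount
 * is then subtracted from the shared balance, so workers doing more I/O
 * get throttled more while nearly idle workers are not put to sleep.
 */
static double
charge_and_maybe_sleep(int *local_balance, int cost)
{
    double msec = 0.0;

    shared_cost_balance += cost;    /* atomic in the real implementation */
    *local_balance += cost;

    if (shared_cost_balance > vacuum_cost_limit &&
        *local_balance > 0.5 * ((double) vacuum_cost_limit / active_nworkers))
    {
        /* sleep in proportion to this worker's own share of the cost ... */
        msec = vacuum_cost_delay * (double) *local_balance / vacuum_cost_limit;
        /* ... and give that much balance back to the shared pool */
        shared_cost_balance -= *local_balance;
        *local_balance = 0;
    }
    return msec;
}
```

With these example numbers the per-worker threshold is 0.5 * 200 / 4 = 25 cost units, so a worker that has contributed little I/O returns 0 even when the shared balance is over the limit, while a busy worker sleeps and resets its local balance.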

#13 Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#12)
Re: doc review for parallel vacuum

On Mon, Apr 13, 2020 at 03:22:06PM +0530, Amit Kapila wrote:

On Mon, Apr 13, 2020 at 2:00 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

|Copy the index
|bulk-deletion result returned from ambulkdelete and amvacuumcleanup to
|the DSM segment if it's the first time [???] because they allocate locally
|and it's possible that an index will be vacuumed by a different
|vacuum process the next time."

Is it correct to say: "..if it's the first iteration" and "different process on
the next iteration" ? Or "cycle" ?

"cycle" sounds better. I have changed the patch as per your latest
comments. Let me know what you think?

Looks good. One more change:

[-Even if-]{+If+} this option is specified with the <option>ANALYZE</option> [-option-]{+option,+}

Remove "even" and add comma.

Thanks,
--
Justin

#14 Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#13)
Re: doc review for parallel vacuum

On Tue, Apr 14, 2020 at 2:54 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

Looks good. One more change:

[-Even if-]{+If+} this option is specified with the <option>ANALYZE</option> [-option-]{+option,+}

Remove "even" and add comma.

Pushed after making this change.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com