doc: expand note about pg_upgrade's --jobs option

Started by Nathan Bossart 10 months ago, 7 messages
#1Nathan Bossart
nathandbossart@gmail.com
1 attachment(s)

Magnus noted to me off-list that the "et cetera" in the following sentence
in pg_upgrade's docs is doing quite a bit of heavy lifting:

The --jobs option allows multiple CPU cores to be used for
copying/linking of files, dumping and restoring database schemas in
parallel, etc.; a good place to start is the maximum of the number of
CPU cores and tablespaces.

I added the "et cetera" in commit 40e2e5e92b to cover the many follow-up
commits that parallelized various pg_upgrade tasks. I was initially
worried that trying to list all the parallelized stuff would be too
verbose, but looking again, I think all the changes can be grouped into
"gathering cluster information" and "performing cluster checks." The
attached patch replaces the "et cetera" with those two general categories.

--
nathan

Attachments:

v1-0001-doc-Expand-note-about-pg_upgrade-s-jobs-option.patch (text/plain; charset=us-ascii)
From 23c5a41c0b7a61433b1cd0e6315b9c4bbc536608 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Tue, 4 Mar 2025 11:52:45 -0600
Subject: [PATCH v1 1/1] doc: Expand note about pg_upgrade's --jobs option.

Commit 40e2e5e92b and several follow-up commits parallelized many
pg_upgrade tasks but did not add much detail to the documentation.
This commit expands the existing note for --jobs to include the
general categories of newly-parallelized tasks.

Reported-by: Magnus Hagander <magnus@hagander.net>
Discussion: ???
---
 doc/src/sgml/ref/pgupgrade.sgml | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7bdd85c5cff..18c71355085 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -534,8 +534,9 @@ NET STOP postgresql-&majorversion;
 
     <para>
      The <option>--jobs</option> option allows multiple CPU cores to be used
-     for copying/linking of files, dumping and restoring database schemas
-     in parallel, etc.;  a good place to start is the maximum of the number of
+     for copying/linking of files, dumping and restoring database schemas,
+     gathering cluster information, and performing cluster checks;
+     a good place to start is the maximum of the number of
      CPU cores and tablespaces.  This option can dramatically reduce the
      time to upgrade a multi-database server running on a multiprocessor
      machine.
-- 
2.39.5 (Apple Git-154)

#2Daniel Gustafsson
daniel@yesql.se
In reply to: Nathan Bossart (#1)
Re: doc: expand note about pg_upgrade's --jobs option

On 4 Mar 2025, at 19:08, Nathan Bossart <nathandbossart@gmail.com> wrote:

The attached patch replaces the "et cetera" with those two general categories.

LGTM.

--
Daniel Gustafsson

#3Magnus Hagander
magnus@hagander.net
In reply to: Daniel Gustafsson (#2)
Re: doc: expand note about pg_upgrade's --jobs option

On Wed, Mar 5, 2025 at 11:00 AM Daniel Gustafsson <daniel@yesql.se> wrote:

On 4 Mar 2025, at 19:08, Nathan Bossart <nathandbossart@gmail.com> wrote:

The attached patch replaces the "et cetera" with those two general categories.

LGTM.

Another option that I think would also work is to just cut down the details
to just "The <option>--jobs</option> option allows multiple CPU cores to be
used".

I think this is also slightly confusing, but maybe that's a
non-native-english thing: "a good place to start is the maximum of the
number of CPU cores and tablespaces.". Am I supposed to set it to
max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#4Nathan Bossart
nathandbossart@gmail.com
In reply to: Magnus Hagander (#3)
Re: doc: expand note about pg_upgrade's --jobs option

On Wed, Mar 05, 2025 at 01:52:40PM +0100, Magnus Hagander wrote:

Another option that I think would also work is to just cut down the details
to just "The <option>--jobs</option> option allows multiple CPU cores to be
used".

That's fine with me. It's probably not particularly actionable
information, anyway. If anything, IMHO we should make it clear to users
that the parallelization is per-database (except for file transfer, which
is per-tablespace). If you've just got one big database in the default
tablespace, --jobs won't help.
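
The point about per-database and per-tablespace parallelism can be sketched as a rough upper bound. This is a simplified model based only on the description above, not pg_upgrade's actual scheduling logic; the function name and dictionary keys are hypothetical.

```python
def effective_workers(jobs: int, n_databases: int, n_tablespaces: int) -> dict[str, int]:
    """Rough upper bound on how many workers can be busy at once.

    Assumption (from the thread): most tasks are parallelized per database,
    while file transfer is parallelized per tablespace.
    """
    if jobs < 1 or n_databases < 1 or n_tablespaces < 1:
        raise ValueError("all arguments must be at least 1")
    return {
        # Per-database tasks cannot use more workers than there are databases.
        "per_database_tasks": min(jobs, n_databases),
        # File transfer cannot use more workers than there are tablespaces.
        "file_transfer": min(jobs, n_tablespaces),
    }


# One big database in the default tablespace: extra jobs stay idle.
single = effective_workers(jobs=8, n_databases=1, n_tablespaces=1)
# Many databases, one tablespace: only the per-database tasks benefit.
many_dbs = effective_workers(jobs=8, n_databases=50, n_tablespaces=1)
```

Under this model, the single-database case never gets more than one busy worker, whatever value --jobs is given.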

I think this is also slightly confusing, but maybe that's a
non-native-english thing: "a good place to start is the maximum of the
number of CPU cores and tablespaces.". Am I supposed to set it to
max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?

I've always read it to mean the former. But I'm not sure that's great
advice. If you have 8 cores and 100 tablespaces, does it make sense to use
--jobs=100? Ordinarily, I'd suggest the number of cores as the starting
point.
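
For illustration, here is how the candidate starting values compare for the 8-core, 100-tablespace case. This is only a sketch of the arithmetic being discussed; the function name and keys are hypothetical, and none of this is computed by pg_upgrade itself.

```python
import os


def jobs_candidates(cpu_cores: int, n_tablespaces: int) -> dict[str, int]:
    """Return the starting values for --jobs discussed in the thread."""
    return {
        # Reading 1 of the old wording: max(cpucores, ntablespaces).
        "max_of_both": max(cpu_cores, n_tablespaces),
        # Reading 2 of the old wording: cpucores + ntablespaces.
        "sum_of_both": cpu_cores + n_tablespaces,
        # The alternative suggested here: just the number of cores.
        "cores_only": cpu_cores,
    }


example = jobs_candidates(cpu_cores=8, n_tablespaces=100)

# On a real machine the core count could come from os.cpu_count(),
# which may return None, hence the fallback to 1.
local_cores = os.cpu_count() or 1
```

With 8 cores and 100 tablespaces, the two readings of the old wording give 100 and 108, while the cores-only suggestion gives 8.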

--
nathan

#5Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#4)
1 attachment(s)
Re: doc: expand note about pg_upgrade's --jobs option

On Wed, Mar 05, 2025 at 09:35:27AM -0600, Nathan Bossart wrote:

On Wed, Mar 05, 2025 at 01:52:40PM +0100, Magnus Hagander wrote:

Another option that I think would also work is to just cut down the details
to just "The <option>--jobs</option> option allows multiple CPU cores to be
used".

That's fine with me. It's probably not particularly actionable
information, anyway. If anything, IMHO we should make it clear to users
that the parallelization is per-database (except for file transfer, which
is per-tablespace). If you've just got one big database in the default
tablespace, --jobs won't help.

I think this is also slightly confusing, but maybe that's a
non-native-english thing: "a good place to start is the maximum of the
number of CPU cores and tablespaces.". Am I supposed to set it to
max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?

I've always read it to mean the former. But I'm not sure that's great
advice. If you have 8 cores and 100 tablespaces, does it make sense to use
--jobs=100? Ordinarily, I'd suggest the number of cores as the starting
point.

Here's another attempt at the patch based on the latest discussion.

--
nathan

Attachments:

v2-0001-doc-Adjust-note-about-pg_upgrade-s-jobs-option.patch (text/plain; charset=us-ascii)
From e7f8633672530fb06dac72271cda429ad873a640 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Wed, 5 Mar 2025 10:19:27 -0600
Subject: [PATCH v2 1/1] doc: Adjust note about pg_upgrade's --jobs option.

Reported-by: Magnus Hagander <magnus@hagander.net>
Reviewed-by: ???
Discussion: https://postgr.es/m/Z8dBn_5iGLNuYiPo%40nathan
---
 doc/src/sgml/ref/pgupgrade.sgml | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 6ca20f19ec2..10911d81174 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -565,12 +565,11 @@ NET STOP postgresql-&majorversion;
     </para>
 
     <para>
-     The <option>--jobs</option> option allows multiple CPU cores to be used
-     for copying/linking of files, dumping and restoring database schemas
-     in parallel, etc.;  a good place to start is the maximum of the number of
-     CPU cores and tablespaces.  This option can dramatically reduce the
-     time to upgrade a multi-database server running on a multiprocessor
-     machine.
+     Setting <option>--jobs</option> to 2 or higher allows pg_upgrade to
+     process multiple databases and tablespaces in parallel.  A good starting
+     point is the number of CPU cores on the machine.  This option can
+     substantially reduce the upgrade time for multi-database and
+     multi-tablespace servers.
     </para>
 
     <para>
-- 
2.39.5 (Apple Git-154)

#6Magnus Hagander
magnus@hagander.net
In reply to: Nathan Bossart (#5)
Re: doc: expand note about pg_upgrade's --jobs option

On Wed, Mar 5, 2025 at 5:28 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:

On Wed, Mar 05, 2025 at 09:35:27AM -0600, Nathan Bossart wrote:

On Wed, Mar 05, 2025 at 01:52:40PM +0100, Magnus Hagander wrote:

Another option that I think would also work is to just cut down the details
to just "The <option>--jobs</option> option allows multiple CPU cores to be
used".

That's fine with me. It's probably not particularly actionable
information, anyway. If anything, IMHO we should make it clear to users
that the parallelization is per-database (except for file transfer, which
is per-tablespace). If you've just got one big database in the default
tablespace, --jobs won't help.

I think this is also slightly confusing, but maybe that's a
non-native-english thing: "a good place to start is the maximum of the
number of CPU cores and tablespaces.". Am I supposed to set it to
max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?

I've always read it to mean the former. But I'm not sure that's great
advice. If you have 8 cores and 100 tablespaces, does it make sense to use
--jobs=100? Ordinarily, I'd suggest the number of cores as the starting
point.

Here's another attempt at the patch based on the latest discussion.

LGTM!

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#7Nathan Bossart
nathandbossart@gmail.com
In reply to: Magnus Hagander (#6)
Re: doc: expand note about pg_upgrade's --jobs option

On Sat, Mar 08, 2025 at 01:43:52AM +0100, Magnus Hagander wrote:

LGTM!

Thanks, committed.

--
nathan