doc: expand note about pg_upgrade's --jobs option
Magnus noted to me off-list that the "et cetera" in the following sentence
in pg_upgrade's docs is doing quite a bit of heavy lifting:
The --jobs option allows multiple CPU cores to be used for
copying/linking of files, dumping and restoring database schemas in
parallel, etc.; a good place to start is the maximum of the number of
CPU cores and tablespaces.
I added the "et cetera" in commit 40e2e5e92b to cover the many follow-up
commits that parallelized various pg_upgrade tasks. I was initially
worried that trying to list all the parallelized stuff would be too
verbose, but looking again, I think all the changes can be grouped into
"gathering cluster information" and "performing cluster checks." The
attached patch replaces the "et cetera" with those two general categories.
--
nathan
Attachments:
v1-0001-doc-Expand-note-about-pg_upgrade-s-jobs-option.patch (text/plain; charset=us-ascii, +3 -3)
On 4 Mar 2025, at 19:08, Nathan Bossart <nathandbossart@gmail.com> wrote:
The attached patch replaces the "et cetera" with those two general categories.
LGTM.
--
Daniel Gustafsson
On Wed, Mar 5, 2025 at 11:00 AM Daniel Gustafsson <daniel@yesql.se> wrote:
On 4 Mar 2025, at 19:08, Nathan Bossart <nathandbossart@gmail.com> wrote:
The attached patch replaces the "et cetera" with those two general
categories.
LGTM.
Another option that I think would also work is to just cut down the details
to just "The <option>--jobs</option> option allows multiple CPU cores to be
used".
I think this is also slightly confusing, but maybe that's a
non-native-english thing: "a good place to start is the maximum of the
number of CPU cores and tablespaces.". Am I supposed to set it to
max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Wed, Mar 05, 2025 at 01:52:40PM +0100, Magnus Hagander wrote:
Another option that I think would also work is to just cut down the details
to just "The <option>--jobs</option> option allows multiple CPU cores to be
used".
That's fine with me. It's probably not particularly actionable
information, anyway. If anything, IMHO we should make it clear to users
that the parallelization is per-database (except for file transfer, which
is per-tablespace). If you've just got one big database in the default
tablespace, --jobs won't help.
I think this is also slightly confusing, but maybe that's a
non-native-english thing: "a good place to start is the maximum of the
number of CPU cores and tablespaces.". Am I supposed to set it to
max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?
I've always read it to mean the former. But I'm not sure that's great
advice. If you have 8 cores and 100 tablespaces, does it make sense to use
--jobs=100? Ordinarily, I'd suggest the number of cores as the starting
point.
--
nathan
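To make the arithmetic in the exchange above concrete, here is a minimal sketch of the candidate starting points for --jobs and of how much parallelism a given value could actually use. It is an illustration only, not pg_upgrade code: the function names are hypothetical, and the second function assumes a simplified model in which per-database tasks are capped by the number of databases and file transfer is capped by the number of tablespaces, as described in the message above.

```python
import os


def jobs_starting_points(cpu_cores: int, tablespaces: int) -> dict[str, int]:
    """Return the candidate --jobs values discussed in the thread."""
    return {
        # The reading Nathan says he has always assumed.
        "max_of_cores_and_tablespaces": max(cpu_cores, tablespaces),
        # The alternative reading Magnus asks about.
        "cores_plus_tablespaces": cpu_cores + tablespaces,
        # The starting point Nathan would ordinarily suggest.
        "cores_only": cpu_cores,
    }


def useful_parallelism(jobs: int, databases: int, tablespaces: int) -> dict[str, int]:
    """Simplified model (an assumption, not pg_upgrade's actual logic):
    per-database tasks cannot use more workers than there are databases,
    and file transfer cannot use more workers than there are tablespaces."""
    return {
        "per_database_tasks": min(jobs, databases),
        "file_transfer": min(jobs, tablespaces),
    }


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to 1.
    cores = os.cpu_count() or 1
    print(jobs_starting_points(cores, tablespaces=1))
    # The 8-core, 100-tablespace case raised in the thread.
    print(jobs_starting_points(8, 100))
    # One big database in the default tablespace: extra jobs go unused.
    print(useful_parallelism(jobs=8, databases=1, tablespaces=1))
```

Under this model, the 8-core, 100-tablespace case yields 100 for the "maximum" reading and 8 for the cores-only suggestion, and the single-database case shows no gain from a larger --jobs value.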
On Wed, Mar 05, 2025 at 09:35:27AM -0600, Nathan Bossart wrote:
On Wed, Mar 05, 2025 at 01:52:40PM +0100, Magnus Hagander wrote:
Another option that I think would also work is to just cut down the details
to just "The <option>--jobs</option> option allows multiple CPU cores to be
used".
That's fine with me. It's probably not particularly actionable
information, anyway. If anything, IMHO we should make it clear to users
that the parallelization is per-database (except for file transfer, which
is per-tablespace). If you've just got one big database in the default
tablespace, --jobs won't help.
I think this is also slightly confusing, but maybe that's a
non-native-english thing: "a good place to start is the maximum of the
number of CPU cores and tablespaces.". Am I supposed to set it to
max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?
I've always read it to mean the former. But I'm not sure that's great
advice. If you have 8 cores and 100 tablespaces, does it make sense to use
--jobs=100? Ordinarily, I'd suggest the number of cores as the starting
point.
Here's another attempt at the patch based on the latest discussion.
--
nathan
Attachments:
v2-0001-doc-Adjust-note-about-pg_upgrade-s-jobs-option.patch (text/plain; charset=us-ascii, +5 -7)
On Wed, Mar 5, 2025 at 5:28 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Wed, Mar 05, 2025 at 09:35:27AM -0600, Nathan Bossart wrote:
On Wed, Mar 05, 2025 at 01:52:40PM +0100, Magnus Hagander wrote:
Another option that I think would also work is to just cut down the details
to just "The <option>--jobs</option> option allows multiple CPU cores to be
used".
That's fine with me. It's probably not particularly actionable
information, anyway. If anything, IMHO we should make it clear to users
that the parallelization is per-database (except for file transfer, which
is per-tablespace). If you've just got one big database in the default
tablespace, --jobs won't help.
I think this is also slightly confusing, but maybe that's a
non-native-english thing: "a good place to start is the maximum of the
number of CPU cores and tablespaces.". Am I supposed to set it to
max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?
I've always read it to mean the former. But I'm not sure that's great
advice. If you have 8 cores and 100 tablespaces, does it make sense to use
--jobs=100? Ordinarily, I'd suggest the number of cores as the starting
point.
Here's another attempt at the patch based on the latest discussion.
LGTM!
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>