optimizing number of workers
Having just received a shiny new dual-CPU machine to use as a PostgreSQL
server, I'm trying to make a reasonable effort to configure it correctly.
The hardware has 128 cores, and I am running a VM with Red Hat 9 and
PostgreSQL 16.9.
In postgresql.conf I have:
max_worker_processes = 90 # (change requires restart)
max_parallel_workers_per_gather = 72 # gsh 26 oct 2022
max_parallel_maintenance_workers = 72 # gsh 12 jun 2025
max_parallel_workers = 72 # gsh 12 jun 2025
max_logical_replication_workers = 72 # gsh 12 jun 2025
max_sync_workers_per_subscription = 72 # gsh 12 jun 2025
autovacuum_max_workers = 12 # max number of autovacuum subprocesses
When I do a simple count of a large table (large being 1.8 billion rows), I
get about 10 workers used.
prod_v1_0_0_rc1=# explain (analyze, buffers) select count(*) from gaiadr3.gaia_source;
                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=14379796.81..14379796.82 rows=1 width=8) (actual time=16702.806..16705.479 rows=1 loops=1)
   Buffers: shared hit=2507481
   ->  Gather  (cost=14379795.78..14379796.79 rows=10 width=8) (actual time=16702.513..16705.470 rows=11 loops=1)
         Workers Planned: 10
         Workers Launched: 10
         Buffers: shared hit=2507481
         ->  Partial Aggregate  (cost=14379785.78..14379785.79 rows=1 width=8) (actual time=16691.820..16691.821 rows=1 loops=11)
               Buffers: shared hit=2507481
               ->  Parallel Index Only Scan using gaia_source_nest128 on gaia_source  (cost=0.58..13926632.85 rows=181261171 width=0) (actual time=0.025..9559.644 rows=164700888 loops=11)
                     Heap Fetches: 0
                     Buffers: shared hit=2507481
 Planning:
   Buffers: shared hit=163
 Planning Time: 14.898 ms
 Execution Time: 16705.592 ms
Postgres has chosen to use only a small fraction of the CPUs I have on my
machine. Given that the query returns an answer in about 8 seconds, it may be
that PostgreSQL has allocated the proper number of workers. But if I wanted
to tweak some config parameters to see whether using more workers would give
me an answer faster, I don't see any obvious knobs to turn. Are there
parameters that I can adjust to see if I can increase throughput? Would
adjusting parallel_setup_cost or parallel_tuple_cost be likely to help?
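For concreteness, the sort of per-session experiment I have in mind looks
like this (the values are only placeholders to illustrate the idea, not
settings I have tested):

SET max_parallel_workers_per_gather = 32;  -- allow the planner to consider more workers
SET parallel_setup_cost = 100;             -- default 1000; lower makes parallel plans look cheaper
SET parallel_tuple_cost = 0.01;            -- default 0.1
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM gaiadr3.gaia_source;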
On Fri, Jul 11, 2025 at 2:11 PM Weck, Luis <luis.weck@pismo.io> wrote:
I believe you can decrease min_parallel_table_scan_size (default is 8MB) and
min_parallel_index_scan_size (default is 512kB). The number of workers also
depends on a multiple of these settings.
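Something along these lines, per session, just to see the effect (the
defaults are noted in the comments; the lowered values are only examples):

SHOW min_parallel_table_scan_size;         -- 8MB by default
SHOW min_parallel_index_scan_size;         -- 512kB by default
SET min_parallel_table_scan_size = '1MB';  -- smaller threshold, more workers considered
SET min_parallel_index_scan_size = '64kB';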
Setting those values to zero (not something I'd want to do in production)
changes the number of workers from 10 to 13. At least it's something, but if
anyone knows where discussions about how to use large numbers of CPUs in
PostgreSQL are being held, I'd appreciate learning about it.
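For the record, the experiment was essentially this (again, not a setting
I'd leave in place):

SET min_parallel_table_scan_size = 0;
SET min_parallel_index_scan_size = 0;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM gaiadr3.gaia_source;
-- Workers Planned goes from 10 to 13 here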
Greg
On Mon, Jul 14, 2025 at 2:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Greg Hennessy <greg.hennessy@gmail.com> writes:
> Postgres has chosen to use only a small fraction of the CPUs I have on
> my machine. Given that the query returns an answer in about 8 seconds, it
> may be that PostgreSQL has allocated the proper number of workers. But if
> I wanted to tweak some config parameters to see whether using more workers
> would give me an answer faster, I don't see any obvious knobs to turn.
> Are there parameters that I can adjust to see if I can increase
> throughput? Would adjusting parallel_setup_cost or parallel_tuple_cost
> be likely to help?
See the bit about
* Select the number of workers based on the log of the size of
* the relation. This probably needs to be a good deal more
* sophisticated, but we need something here for now. Note that
in compute_parallel_worker(). You can move things at the margins by
changing min_parallel_table_scan_size, but that logarithmic behavior
will constrain the number of workers pretty quickly. You'd have to
change that code to assign a whole bunch of workers to one scan.
(No, I don't know why it's done like that. There might be related
discussion in our archives, but finding it could be difficult.)
regards, tom lane
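To make that scaling concrete, here is my rough approximation of
compute_parallel_worker() in SQL (the real code works in 8kB blocks, caps the
result at max_parallel_workers_per_gather, and, as far as I can tell, takes
the smaller of the heap and index estimates for an index scan):

SELECT 1 + floor(log(3, pg_relation_size('gaiadr3.gaia_source')::numeric
                        / (8 * 1024 * 1024)))  AS workers_from_heap,   -- min_parallel_table_scan_size = 8MB
       1 + floor(log(3, pg_relation_size('gaiadr3.gaia_source_nest128')::numeric
                        / (512 * 1024)))       AS workers_from_index;  -- min_parallel_index_scan_size = 512kB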
if I "alter table allwise set (parallel_workers = 64);" then I can get 64
workers. I wonder if the code
to check the rel_parallel_workers do deal with the default algorithm not
allocating sufficient
parallel_workers.
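For the archives, the override itself (using the allwise table from my test;
RESET puts the default heuristic back):

ALTER TABLE allwise SET (parallel_workers = 64);  -- still capped by max_parallel_workers_per_gather
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM allwise;
ALTER TABLE allwise RESET (parallel_workers);     -- back to the log-based default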