Automatically sizing the IO worker pool
It's hard to know how to set io_workers=3. If it's too small,
io_method=worker's small submission queue overflows and it silently
falls back to synchronous IO. If it's too high, it generates a lot of
pointless wakeups and scheduling overhead, which might be considered
an independent problem or not, but having the right size pool
certainly mitigates it. Here's a patch to replace that GUC with:
io_min_workers=1
io_max_workers=8
io_worker_idle_timeout=60s
io_worker_launch_interval=500ms
It grows the pool when a backlog is detected (better ideas for this
logic welcome), and lets idle workers time out. IO jobs were already
concentrated into the lowest numbered workers, partly because that
seemed to have marginally better latency than anything else tried so
far due to latch collapsing with lucky timing, and partly in
anticipation of this.
The patch also reduces bogus wakeups somewhat by being more
cautious about fanout. That could probably be improved a lot more and
needs more research. It's quite tricky to figure out how to suppress
wakeups without throwing potential concurrency away.
The first couple of patches are independent of this topic, and might
be potential cleanups/fixes for master/v18. The last is a simple
latency test.
Ideas, testing, flames etc welcome.
Attachments:
0001-aio-Regularize-io_method-worker-naming-conventions.patch (+30 -31)
0002-aio-Remove-IO-worker-ID-references-from-postmaster.c.patch (+14 -15)
0003-aio-Try-repeatedly-to-give-batched-IOs-to-workers.patch (+27 -4)
0004-aio-Adjust-IO-worker-pool-size-automatically.patch (+541 -122)
0005-XXX-read_buffer_loop.patch (+63 -1)
On 12/4/25 18:59, Thomas Munro wrote:
It's hard to know how to set io_workers=3.
Hmmm.... enable the below behaviour if "io_workers=auto" (default)?
Sometimes being able to set this kind of parameter manually helps
tremendously with specific workloads... :S
[snip]
Here's a patch to replace that GUC with:
io_min_workers=1
io_max_workers=8
io_worker_idle_timeout=60s
io_worker_launch_interval=500ms
Great as defaults / backwards compat with io_workers=auto. Sounds more
user-friendly to me, at least....
[snip]
Ideas, testing, flames etc welcome.
Logic seems sound, if a bit daunting for inexperienced users --- well,
maybe just a bit more than it is now, but ISTM evolution should try and
flatten novices' learning curve, right?
Just .02€, though.
Thanks,
--
Parkinson's Law: Work expands to fill the time allotted to it.
On Mon, Apr 14, 2025 at 5:45 AM Jose Luis Tallon
<jltallon@adv-solutions.net> wrote:
On 12/4/25 18:59, Thomas Munro wrote:
It's hard to know how to set io_workers=3.
Hmmm.... enable the below behaviour if "io_workers=auto" (default) ?
Why not just delete io_workers? If you really want a fixed number,
you can set io_min_workers==io_max_workers.
What should io_max_workers default to? I guess it could be pretty
large without much danger, but I'm not sure. If it's a small value,
an overloaded storage system goes through two stages: first it fills
the queue up with a backlog of requests until it overflows because the
configured maximum of workers isn't keeping up, and then new
submissions start falling back to synchronous IO, sort of jumping
ahead of the queued backlog, but also stalling if the real reason is
that the storage itself isn't keeping up. Whether it'd be better for
the IO worker pool to balloon all the way up to 32 processes (an
internal limit) if required to try to avoid that with default
settings, I'm not entirely sure. Maybe? Why not at least try to get
all the concurrency possible, before falling back to synchronous?
Queued but not running IOs seem to be strictly worse than queued but
not even trying to run. I'd be interested to hear people's thoughts
and experiences actually trying different kinds of workloads on
different kinds of storage. Whether adding more concurrency actually
helps or just generates a lot of useless new processes before the
backpressure kicks in depends on why it's not keeping up, eg hitting
IOPS, throughput or concurrency limits in the storage. In later work
I hope we can make higher levels smarter about understanding whether
requesting more concurrency helps or hurts with feedback (that's quite
a hard problem that some of my colleagues have been looking into), but
the simpler question here seems to be: should this fairly low level
system-wide setting ship with a default that includes any preconceived
assumptions about that?
It's superficially like max_parallel_workers, which ships with a
default of 8, and that's basically where I plucked that 8 from in the
current patch for lack of a serious idea to propose yet. But it's
also more complex than CPU: you know how many cores you have and you
know things about your workload, but even really small "on the metal"
systems probably have a lot more concurrent I/O capacity -- perhaps
depending on the type of operation! (and so far we only have reads) --
than CPU cores. Especially once you completely abandon the idea that
anyone runs databases on spinning rust in modern times, even on low
end systems, which I think we've more or less agreed to assume these
days with related changes such as the recent *_io_concurrency default
change (1->16). It's actually pretty hard to drive a laptop up to
needing more than half a dozen or a dozen or so workers with this
patch, especially without debug_io_direct=data, ie with fast
double-buffered I/O, but cloud environments may also be where most
databases run these days, and low-end cloud configurations have
arbitrary made-up limits that may be pretty low, so it all depends....
I really don't know, but one idea is that we could leave it as open as
possible, and let users worry about that with higher-level settings
and the query concurrency they choose to generate...
io_method=io_uring is effectively open, so why should io_method=worker
be any different by default? Just some thoughts. I'm not sure.
On Sun, Apr 13, 2025 at 04:59:54AM GMT, Thomas Munro wrote:
It's hard to know how to set io_workers=3. If it's too small,
io_method=worker's small submission queue overflows and it silently
falls back to synchronous IO. If it's too high, it generates a lot of
pointless wakeups and scheduling overhead, which might be considered
an independent problem or not, but having the right size pool
certainly mitigates it. Here's a patch to replace that GUC with:
io_min_workers=1
io_max_workers=8
io_worker_idle_timeout=60s
io_worker_launch_interval=500ms
It grows the pool when a backlog is detected (better ideas for this
logic welcome), and lets idle workers time out.
I like the idea. In fact, I've been pondering something like a
"smart" configuration for quite some time, and I'm convinced that a
similar approach needs to be applied to many performance-related GUCs.
Idle timeout and launch interval serving as a measure of sensitivity
makes sense to me, but growing the pool when a backlog (queue_depth >
nworkers, so even the slightest backlog?) is detected seems somewhat
arbitrary. From what I understand, the pool's growth velocity is
constant and does not depend on worker demand (i.e. queue_depth)? It
may sound fancy, but I've got the impression it should be possible to
apply what's called a "low-pass filter" in control theory (sort of a
transfer function with an exponential decay) to smooth out the demand
and adjust the worker pool based on that.
As a side note, it might be far-fetched, but there are instruments in
queueing theory to figure out how many workers are needed to guarantee
a certain low queueing probability, but for that one needs to have an
average arrival rate (in our case, the average number of IO operations
dispatched to workers) and an average service rate (the average number
of IO operations performed by workers).
Hi,
On Sun, Apr 13, 2025 at 04:59:54AM GMT, Thomas Munro wrote:
It's hard to know how to set io_workers=3. If it's too small,
io_method=worker's small submission queue overflows and it silently
falls back to synchronous IO. If it's too high, it generates a lot of
pointless wakeups and scheduling overhead, which might be considered
an independent problem or not, but having the right size pool
certainly mitigates it. Here's a patch to replace that GUC with:
io_min_workers=1
io_max_workers=8
io_worker_idle_timeout=60s
io_worker_launch_interval=500ms
It grows the pool when a backlog is detected (better ideas for this
logic welcome), and lets idle workers time out.
I also like the idea. Can we set:
io_workers = 3
io_max_workers = cpu/4
io_workers_oversubscribe = 3 (range 1-8)
io_workers * io_workers_oversubscribe <= io_max_workers
On Sun, May 25, 2025 at 7:20 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
On Sun, Apr 13, 2025 at 04:59:54AM GMT, Thomas Munro wrote:
It's hard to know how to set io_workers=3. If it's too small,
io_method=worker's small submission queue overflows and it silently
falls back to synchronous IO. If it's too high, it generates a lot of
pointless wakeups and scheduling overhead, which might be considered
an independent problem or not, but having the right size pool
certainly mitigates it. Here's a patch to replace that GUC with:
io_min_workers=1
io_max_workers=8
io_worker_idle_timeout=60s
io_worker_launch_interval=500ms
It grows the pool when a backlog is detected (better ideas for this
logic welcome), and lets idle workers time out.
I like the idea. In fact, I've been pondering something like a
"smart" configuration for quite some time, and I'm convinced that a
similar approach needs to be applied to many performance-related GUCs.
Idle timeout and launch interval serving as a measure of sensitivity
makes sense to me, but growing the pool when a backlog (queue_depth >
nworkers, so even the slightest backlog?) is detected seems somewhat
arbitrary. From what I understand, the pool's growth velocity is
constant and does not depend on worker demand (i.e. queue_depth)? It
may sound fancy, but I've got the impression it should be possible to
apply what's called a "low-pass filter" in control theory (sort of a
transfer function with an exponential decay) to smooth out the demand
and adjust the worker pool based on that.
As a side note, it might be far-fetched, but there are instruments in
queueing theory to figure out how many workers are needed to guarantee
a certain low queueing probability, but for that one needs to have an
average arrival rate (in our case, the average number of IO operations
dispatched to workers) and an average service rate (the average number
of IO operations performed by workers).
Hi Dmitry,
Thanks for looking, and yeah these are definitely the right sort of
questions. I will be both unsurprised and delighted if someone can
bring some more science to this problem. I did read up on Erlang's
formula C ("This formula is used to determine the number of agents or
customer service representatives needed to staff a call centre, for a
specified desired probability of queuing" according to Wikipedia) and
a bunch of related textbook stuff. And yeah I had a bunch of
exponential moving averages of various values using scaled fixed point
arithmetic (just a bunch of shifts and adds) to smooth inputs, in
various attempts. But ... I'm not even sure if we can say that our
I/O arrivals have a Poisson distribution, since they are not all
independent. I tried more things too, while I was still unsure what I
should even be optimising for. My current answer to that is: low
latency with low variance, as seen with io_uring.
In this version I went back to basics and built something that looks
more like the controls of a classic process/thread pool (think Apache)
or connection pool (think JDBC), with a couple of additions based on
intuition: (1) a launch interval, which acts as a bit of damping
against overshooting on brief bursts that are too far apart, and (2)
the queue length > workers * k as a simple way to determine that
latency is being introduced by not having enough workers. Perhaps
there is a good way to compute an adaptive value for k with some fancy
theories, but k=1 seems to have *some* basis: that's the lowest number
at which the pool is too small and *certainly* introducing latency, but
any lower constant is harder to defend because we don't know how many
workers are already awake and about to consume tasks. Something from
queuing theory might provide an adaptive value, but in the end, I
figured we really just want to know if the queue is growing ie in
danger of overflowing (note: the queue is small! 64, and not
currently changeable, more on that later, and the overflow behaviour
is synchronous I/O as back-pressure). You seem to be suggesting that
k=1 sounds too low, not too high, but there is that separate
time-based defence against overshoot in response to rare bursts.
You could get more certainty about jobs already about to be consumed
by a worker that is about to dequeue, by doing a lot more
bookkeeping: assigning them to workers on submission (separate states,
separate queues, various other ideas I guess). But everything I tried
like that caused latency or latency variance to go up, because it
missed out on the chance for another worker to pick it up sooner
opportunistically. This arrangement has the most stable and
predictable pool size and lowest avg latency and stddev(latency) I
have managed to come up with so far. That said, we have plenty of
time to experiment with better ideas if you want to give it a shot or
propose concrete ideas, given that I missed v18 :-)
About control theory... yeah. That's an interesting bag of tricks.
FWIW Melanie and (more recently) I have looked into textbook control
algorithms at a higher level of the I/O stack (and Melanie gave a talk
about other applications in eg VACUUM at pgconf.dev). In
read_stream.c, where I/O demand is created, we've been trying to set
the desired I/O concurrency level and thus lookahead distance with
adaptive feedback. We've tried a lot of stuff. I hope we can share
some concept patches some time soon, well, maybe in this cycle. Some
interesting recent experiments produced graphs that look a lot like
the ones in the book "Feedback Control for Computer Systems" (an easy
software-person book I found for people without an engineering/control
theory background where the problems match our world more closely, cf
typical texts that are about controlling motors and other mechanical
stuff...). Experimental goals are: find the smallest concurrent
I/O request level (and thus lookahead distance and thus speculative
work done and buffers pinned) that keeps the I/O stall probability
near zero (and keep adapting, since other queries and applications are
sharing system I/O queues), and if that's not even possible, find the
highest concurrent I/O request level that doesn't cause extra latency
due to queuing in lower levels (I/O workers, kernel, ..., disks).
That second part is quite hard. In other words, if higher levels own
that problem and bring the adaptivity, then perhaps io_method=worker
can get away with being quite stupid. Just a thought...
BTW I would like to push 0001 and 0002 to master/18. They are not
behaviour changes, they just fix up a bunch of inconsistent (0001) and
misleading (0002) variable naming and comments to reflect reality (in
AIO v1 the postmaster used to assign those I/O worker numbers, now the
postmaster has its own array of slots to track them that is *not*
aligned with the ID numbers/slots in shared memory ie those numbers
you see in the ps status, so that's bound to confuse people
maintaining this code). I just happened to notice that when working
on this dynamic worker pool stuff. If there are no objections I will
go ahead and do that soon.
On Mon, May 26, 2025, 8:01 AM Thomas Munro <thomas.munro@gmail.com> wrote:
But ... I'm not even sure if we can say that our
I/O arrivals have a Poisson distribution, since they are not all
independent.
Yeah, a good point, one has to be careful with assumptions about
distribution -- from what I've read, many processes in computer systems
are better described by a Pareto distribution. But the beauty of
queueing theory is that many results are independent of the
distribution (not sure about dependencies though).
In this version I went back to basics and built something that looks
more like the controls of a classic process/thread pool (think Apache)
or connection pool (think JDBC), with a couple of additions based on
intuition: (1) a launch interval, which acts as a bit of damping
against overshooting on brief bursts that are too far apart, and (2)
the queue length > workers * k as a simple way to determine that
latency is being introduced by not having enough workers. Perhaps
there is a good way to compute an adaptive value for k with some fancy
theories, but k=1 seems to have *some* basis: that's the lowest number
at which the pool is too small and *certainly* introducing latency, but
any lower constant is harder to defend because we don't know how many
workers are already awake and about to consume tasks. Something from
queuing theory might provide an adaptive value, but in the end, I
figured we really just want to know if the queue is growing ie in
danger of overflowing (note: the queue is small! 64, and not
currently changeable, more on that later, and the overflow behaviour
is synchronous I/O as back-pressure). You seem to be suggesting that
k=1 sounds too low, not too high, but there is that separate
time-based defence against overshoot in response to rare bursts.
I probably should have started by saying that I find the current
approach reasonable, and I'm only curious whether there is more to get
out of it. I haven't benchmarked the patch yet (I plan to get to it
when I'm back), and can imagine practical considerations significantly
impacting any potential solution.
About control theory... yeah. That's an interesting bag of tricks.
FWIW Melanie and (more recently) I have looked into textbook control
algorithms at a higher level of the I/O stack (and Melanie gave a talk
about other applications in eg VACUUM at pgconf.dev). In
read_stream.c, where I/O demand is created, we've been trying to set
the desired I/O concurrency level and thus lookahead distance with
adaptive feedback. We've tried a lot of stuff. I hope we can share
some concept patches some time soon, well, maybe in this cycle. Some
interesting recent experiments produced graphs that look a lot like
the ones in the book "Feedback Control for Computer Systems" (an easy
software-person book I found for people without an engineering/control
theory background where the problems match our world more closely, cf
typical texts that are about controlling motors and other mechanical
stuff...). Experimental goals are: find the smallest concurrent
I/O request level (and thus lookahead distance and thus speculative
work done and buffers pinned) that keeps the I/O stall probability
near zero (and keep adapting, since other queries and applications are
sharing system I/O queues), and if that's not even possible, find the
highest concurrent I/O request level that doesn't cause extra latency
due to queuing in lower levels (I/O workers, kernel, ..., disks).
That second part is quite hard. In other words, if higher levels own
that problem and bring the adaptivity, then perhaps io_method=worker
can get away with being quite stupid. Just a thought...
Looking forward to it. And thanks for the reminder about the talk; I've
wanted to watch it for a long time, but somehow haven't managed to yet.
On Wed, May 28, 2025 at 5:55 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
I probably should have started by saying that I find the current approach reasonable, and I'm only curious whether there is more to get out of it. I haven't benchmarked the patch yet (I plan to get to it when I'm back), and can imagine practical considerations significantly impacting any potential solution.
Here's a rebase.
Attachments:
v2-0001-aio-Try-repeatedly-to-give-batched-IOs-to-workers.patch (+27 -4)
v2-0002-aio-Adjust-IO-worker-pool-size-automatically.patch (+535 -123)
On Sat, Jul 12, 2025 at 05:08:29PM +1200, Thomas Munro wrote:
On Wed, May 28, 2025 at 5:55 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
I probably should have started by saying that I find the current
approach reasonable, and I'm only curious whether there is more to get
out of it. I haven't benchmarked the patch yet (I plan to get to it
when I'm back), and can imagine practical considerations
significantly impacting any potential solution.
Here's a rebase.
Thanks. I was experimenting with this approach, and realized there
aren't many metrics exposed about workers and the IO queue so far.
Since the worker pool growth is based on the queue size and workers try
to share the load uniformly, it makes sense to have a system view to
show those numbers, let's say a system view for worker handles and a
function to get the current queue size? E.g. worker load in my testing
varied quite a bit, see the "Load distribution between workers" graph,
which shows a quick profiling run including currently running io
workers.
Regarding the worker pool growth approach, it sounds reasonable to me.
With a static number of workers one needs to somehow find a number
suitable for all types of workload, whereas with this patch one only
needs to fiddle with the launch interval to handle possible spikes. It
would be interesting to investigate how this approach would react to
different dynamics of the queue size. I've plotted one "spike" scenario
in "Worker pool size response to queue depth", where there is a pretty
artificial burst of IO, making the queue size look like a step
function. If I understand the patch implementation correctly, it would
respond linearly over time (green line); one could also think about
applying a first-order Butterworth low-pass filter to respond quicker
but still smoothly (orange line).
But in reality the queue size would of course be much more volatile
even on stable workloads, like in "Queue depth over time" (one can see
general oscillation, as well as different modes, e.g. where data is in
the page cache vs where it isn't). Even more, there is a feedback loop
where an increasing number of workers accelerates the queue size
decrease -- based on [1], the system utilization for M/M/k depends on
the arrival rate, processing rate and number of processors, where,
pretty intuitively, more processors reduce utilization. But alas, as
you've mentioned, this result exists for the Poisson distribution only.
Btw, I assume something similar could be done to other methods as well?
I'm not up to date on io uring, can one change the ring depth on the
fly?
As a side note, I was trying to experiment with this patch using
dm-mapper's delay feature to introduce an arbitrary large io latency and
see how the io queue is growing. But strangely enough, even though the
pure io latency was high, the queue growth was smaller than e.g. on a
real hardware under the same conditions without any artificial delay. Is
there anything obvious I'm missing that could have explained that?
[1]: Harchol-Balter, Mor. Performance Modeling and Design of Computer
Systems: Queueing Theory in Action. Cambridge University Press, 2013.
On Wed, Jul 30, 2025 at 10:15 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
Thanks. I was experimenting with this approach, and realized there isn't
much metrics exposed about workers and the IO queue so far. Since the
Hmm. You can almost infer the depth from the pg_aios view. All IOs
in use are visible there, and the SUBMITTED ones are all either in the
queue, currently being executed by a worker, or being executed
synchronously by a regular backend because the queue was full and in
that case it just falls back to synchronous execution. Perhaps we
just need to be able to distinguish those three cases in that view.
For the synchronous-in-submitter overflow case, I think f_sync should
really show 't', and I'll post a patch for that shortly. For
"currently executing in a worker", I wonder if we could have an "info"
column that queries a new optional callback
pgaio_iomethod_ops->get_info(ioh) where worker mode could return
"worker 3", or something like that.
worker pool growth is based on the queue size and workers try to share
the load uniformly, it makes to have a system view to show those
Actually it's not uniform: it tries to wake up the lowest numbered
worker that advertises itself as idle, in that little bitmap of idle
workers. So if you look in htop you'll see that worker 0 is the most
busy, then worker 1, etc. Only if they are all quite busy does it
become almost uniform, which probably implies you've hit
io_max_workers and should probably set it higher (or without this
patch, you should probably just increase io_workers manually, assuming
your I/O hardware can take more).
Originally I made it like that to give higher numbered workers a
chance to time out (anticipating this patch). Later I found another
reason to do it that way:
When I tried uniform distribution using atomic_fetch_add(&distributor,
1) % nworkers to select the worker to wake up, avg(latency) and
stddev(latency) were both higher for simple tests like the one
attached to the first message, when running several copies of it
concurrently. The concentrate-into-lowest-numbers design benefits
from latch collapsing and allows the busier workers to avoid going
back to sleep when they could immediately pick up a new job. I didn't
change that in this patch, though I did tweak the "fan out" logic a
bit, after some experimentation on several machines where I realised
the code in master/18 is a bit over enthusiastic about that and has a
higher spurious wakeup ratio (something this patch actually measures
and tries to reduce).
Here is one of my less successful attempts to do a round-robin system
that tries to adjust the pool size with more engineering, but it was
consistently worse on those latency statistics compared to this
approach, and wasn't even as good at finding a good pool size, so
eventually I realised that it was a dead end and my original
work-concentrating concept was better:
https://github.com/macdice/postgres/tree/io-worker-pool
FWIW the patch in this branch is in this public branch:
https://github.com/macdice/postgres/tree/io-worker-pool-3
Regarding the worker pool growth approach, it sounds reasonable to me.
Great to hear. I wonder what other kinds of testing we should do to
validate this, but I am feeling quite confident about this patch and
thinking it should probably go in sooner rather than later.
With static number of workers one needs to somehow find a number
suitable for all types of workload, where with this patch one needs only
to fiddle with the launch interval to handle possible spikes. It would
be interesting to investigate, how this approach would react to
different dynamics of the queue size. I've plotted one "spike" scenario
in the "Worker pool size response to queue depth", where there is a
pretty artificial burst of IO, making the queue size look like a step
function. If I understand the patch implementation correctly, it would
respond linearly over time (green line), one could also think about
applying a first order butterworth low pass filter to respond quicker
but still smooth (orange line).
Interesting.
There is only one kind of smoothing in the patch currently, relating
to the pool size going down. It models spurious latch wakeups in an
exponentially decaying ratio of wakeups:work. That's the only way I
could find to deal with the inherent sloppiness of the wakeup
mechanism with a shared queue: when you wake the lowest numbered idle
worker as of some moment in time, it might lose the race against an
even lower numbered worker that finishes its current job and steals
the new job. When workers steal jobs, latency decreases, which is
good, so instead of preventing it I eventually figured out that we
should measure it, smooth it, and use it to limit wakeup propagation.
I wonder if that naturally produces curves a bit like your butterworth
line when it's going down already, but I'm not sure.
As for the curve on the way up, hmm, I'm not sure. Yes, it goes up
linearly and is limited by the launch delay, but I was thinking of
that only as the way it grows when the *variation* in workload changes
over a long time frame. In other words, maybe it's not so important
how exactly it grows, it's more important that it achieves a steady
state that can handle the oscillations and spikes in your workload.
The idle timeout creates that steady state by holding the current pool
size for quite a while, so that it can handle your quieter and busier
moments immediately without having to adjust the pool size.
In that other failed attempt I tried to model that more explicitly,
with "active" workers and "spare" workers, with the active set sized
for average demand with uniform wakeups and the spare set sized for
some number of standard deviations that are woken up only when the
queue is high, but I could never really make it work well...
But in reality the queue size would of course be much more volatile even
on stable workloads, like in "Queue depth over time" (one can see
general oscillation, as well as different modes, e.g. where data is in
the page cache vs where it isn't). Even more, there is a feedback where
an increasing number of workers would accelerate the queue size decrease --
based on [1] the system utilization for M/M/k depends on the arrival
rate, processing rate and number of processors, where pretty intuitively
more processors reduce utilization. But alas, as you've mentioned this
result exists for Poisson distribution only.
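For reference, the M/M/k utilization alluded to in [1] reduces to a one-liner, arrival rate divided by total service capacity; illustrative only, and subject to the same Poisson caveat:

```c
#include <assert.h>

/*
 * Utilization of an M/M/k queue: rho = lambda / (k * mu), where lambda is
 * the IO arrival rate, mu the per-worker completion rate, and k the pool
 * size.  More workers (larger k) reduce utilization for a fixed workload.
 */
static double
mmk_utilization(double lambda, double mu, int k)
{
    return lambda / (k * mu);
}
```

A value above 1.0 means the queue grows without bound; adding workers pushes it back under 1.0.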
Btw, I assume something similar could be done to other methods as well?
I'm not up to date on io uring, can one change the ring depth on the
fly?
Each backend's io_uring submission queue is configured at startup and
not changeable later, but it is sized for the maximum possible number
that each backend can submit, io_max_concurrency, which corresponds to
the backend's portion of the array of PgAioHandle objects that is
fixed. I suppose you could say that each backend's submission queue
can't overflow at that level, because it's perfectly sized and not
shared with other backends, or to put it another way, the equivalent
of overflow is we won't try to submit more IOs than that.
Worker mode has a shared submission queue, but falls back to
synchronous execution if it's full, which is a bit weird as it makes
your IOs jump the queue in a sense, and that is a good reason to want
this patch so that the pool can try to find the size that avoids that
instead of leaving the user in the dark.
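The overflow behaviour described here can be sketched as follows; the queue size, names and layout are illustrative, not the actual method_worker.c code:

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_SIZE 64           /* made-up bound for illustration */

static int queue[QUEUE_SIZE];
static int head;                /* consume position */
static int tail;                /* insert position; head == tail means empty */
static int nsync;               /* IOs that fell back to synchronous execution */

static bool
queue_push(int io)
{
    int next = (tail + 1) % QUEUE_SIZE;

    if (next == head)
        return false;           /* full (one slot sacrificed as sentinel) */
    queue[tail] = io;
    tail = next;
    return true;
}

static void
submit_io(int io)
{
    if (!queue_push(io))
        nsync++;                /* overflow: the submitter runs the IO itself */
}
```

The synchronous fallback is exactly the "queue jumping" mentioned above: the overflowing IO completes before the 63 IOs still sitting in the queue.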
As for the equivalent of pool sizing inside io_uring (and maybe other
AIO systems in other kernels), hmm.... in the absolute best cases
worker threads can be skipped completely, eg for direct I/O queued
straight to the device, but when used, I guess they have pretty
different economics. A kernel can start a thread just by allocating a
bit of memory and sticking it in a queue, and can also wake them (move
them to a different scheduler queue) cheaply, but we have to fork a
giant process that has to open all the files and build up its caches
etc. So I think they just start threads on demand immediately on need
without damping, with some kind of short grace period just to avoid
those smaller costs being repeated. I'm no expert on those internal
details, but our worker system clearly needs all this damping and
steady state discovery heuristics due to the higher overheads and
sloppy wakeups.
Thinking more about our comparatively heavyweight I/O workers, there
must also be affinity opportunities. If you somehow tended to use the
same workers for a given database in a cluster with multiple active
databases, then workers might accumulate fewer open file descriptors
and SMgrRelation cache objects. If you had per-NUMA node pools and
queues then you might be able to reduce contention, and maybe also
cache line ping-pong on buffer headers considering that the submitter
dirties the header, then the worker does (in the completion callback),
and then the submitter accesses it again. I haven't investigated
that.
As a side note, I was trying to experiment with this patch using
dm-mapper's delay feature to introduce an arbitrary large io latency and
see how the io queue is growing. But strangely enough, even though the
pure io latency was high, the queue growth was smaller than e.g. on
real hardware under the same conditions without any artificial delay. Is
there anything obvious I'm missing that could have explained that?
Could it be alternating full and almost empty due to method_worker.c's
fallback to synchronous on overflow, which slows the submission down,
or something like that, and then you're plotting an average depth that
is lower than you expected? With the patch I'll share shortly to make
pg_aios show a useful f_sync value it might be more obvious...
About dm-mapper delays, I actually found it useful to hack up worker
mode itself to simulate storage behaviours, for example swamped local
disks or cloud storage with deep queues and no back pressure but
artificial IOPS and bandwidth caps, etc. I was thinking about
developing some proper settings to help with that kind of research:
debug_io_worker_queue_size (changeable at runtime),
debug_io_max_worker_queue_size (allocated at startup),
debug_io_worker_{latency,bandwidth,iops} to introduce calculated
sleeps, and debug_io_worker_overflow_policy=synchronous|wait so that
you can disable the synchronous fallback that confuses matters.
That'd be more convenient, portable and flexible than dm-mapper tricks
I guess. I'd been imagining that as a tool to investigate higher
level work on feedback control for read_stream.c as mentioned, but
come to think of it, it could also be useful to understand things
about the worker pool itself. That's vapourware though, for myself I
just used dirty hacks last time I was working on that stuff. In other
words, patches are most welcome if you're interested in that kind of
thing. I am a bit tied up with multithreading at the moment and time
grows short. I will come back to that problem in a little while and
that patch is on my list as part of the infrastructure needed to prove
things about the I/O stream feedback work I hope to share later...
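As a sketch of what those proposed (and so far vapourware) debug_io_worker_{latency,iops,bandwidth} settings could compute per IO, one simple policy is to take the largest delay implied by any enabled cap; everything here is a hypothetical illustration:

```c
#include <assert.h>

/*
 * Compute an artificial sleep for one IO.  A cap of 0 means "disabled".
 * latency_us: fixed per-IO latency; iops_cap: max IOs per second;
 * bandwidth_bytes_per_s: max throughput for io_size_bytes-sized IOs.
 */
static long long
simulated_sleep_us(long long latency_us, long long iops_cap,
                   long long bandwidth_bytes_per_s, long long io_size_bytes)
{
    long long sleep_us = latency_us;

    if (iops_cap > 0 && 1000000 / iops_cap > sleep_us)
        sleep_us = 1000000 / iops_cap;  /* minimum spacing between IOs */
    if (bandwidth_bytes_per_s > 0)
    {
        long long transfer_us = 1000000 * io_size_bytes / bandwidth_bytes_per_s;

        if (transfer_us > sleep_us)
            sleep_us = transfer_us;     /* time to move the bytes */
    }
    return sleep_us;
}
```

A real implementation would presumably also need shared token buckets so the caps apply across the whole pool rather than per worker.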
Here is a rebase. I would like to push these shortly if there are no
objections. I propose 8 as the default upper limit.
Attachments:
v3-0001-aio-Simplify-pgaio_worker_submit.patchtext/x-patch; charset=US-ASCII; name=v3-0001-aio-Simplify-pgaio_worker_submit.patchDownload+5-16
v3-0002-aio-Improve-I-O-worker-behavior-on-full-queue.patchtext/x-patch; charset=US-ASCII; name=v3-0002-aio-Improve-I-O-worker-behavior-on-full-queue.patchDownload+21-4
v3-0003-aio-Adjust-I-O-worker-pool-size-automatically.patchtext/x-patch; charset=US-ASCII; name=v3-0003-aio-Adjust-I-O-worker-pool-size-automatically.patchDownload+608-133
On Sat, Mar 28, 2026 at 10:31:36PM +1300, Thomas Munro wrote:
Here is a rebase. I would like to push these shortly if there are no
objections. I propose 8 as the default upper limit.
Yes, please! Just wanted to ask about the status of this patch.
Here's an updated patch. It's mostly just rebased over the recent
firehose, but with lots of comments and a few names (hopefully)
improved. There is one code change to highlight though:
maybe_start_io_workers() knows when it's not allowed to create new
workers, an interesting case being FatalError before we have started
the new world. The previous coding of DetermineSleepTime() didn't
know about that, so it could return 0 (don't sleep), and then the
postmaster could busy-wait for restart progress. Maybe there were
other cases like that, but in general DetermineSleepTime() and
maybe_start_io_workers() really need to be 100% in agreement. So I
have moved that knowledge into a new function
maybe_start_io_workers_scheduled_at(). Both DetermineSleepTime() and
maybe_start_io_workers() call that so there is a single source of
truth.
I think I got confused about that because it's not that obvious why
the existing code doesn't test FatalError.
I thought of a slightly bigger refactoring that might deconfuse
DetermineSleepTime() a bit more. Probably material for the next
cycle, but basically the idea is to stop using a bunch of different
conditions and different units of time and convert the whole thing to
a simple find-the-lowest-time function. I kept that separate.
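The single-source-of-truth pattern can be sketched like this, with invented names standing in for maybe_start_io_workers_scheduled_at() and its two callers:

```c
#include <assert.h>
#include <stdbool.h>

typedef long long TimestampTz;  /* stand-in for the real type */

static TimestampTz next_launch_time;    /* 0 means nothing scheduled */

/* Single source of truth: when is the next worker launch due? */
static TimestampTz
scheduled_at(void)
{
    return next_launch_time;
}

/* The sleep computation consults the schedule... */
static int
determine_sleep_ms(TimestampTz now)
{
    TimestampTz at = scheduled_at();

    if (at == 0)
        return -1;              /* nothing scheduled: sleep indefinitely */
    return at > now ? (int) (at - now) : 0;
}

/* ...and so does the code deciding whether a launch is actually due. */
static bool
launch_due(TimestampTz now)
{
    TimestampTz at = scheduled_at();

    return at != 0 && at <= now;
}
```

Because both callers ask the same function, a sleep of 0 can only be returned when a launch really is due, which is the busy-wait hazard being fixed.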
I'll post a new version of the patch that was v3-0002 separately.
Attachments:
v4-0002-Refactor-the-postmaster-s-periodic-job-scheduling.patchtext/x-patch; charset=US-ASCII; name=v4-0002-Refactor-the-postmaster-s-periodic-job-scheduling.patchDownload+125-123
v4-0001-aio-Adjust-I-O-worker-pool-size-automatically.patchtext/x-patch; charset=US-ASCII; name=v4-0001-aio-Adjust-I-O-worker-pool-size-automatically.patchDownload+659-146
Hi,
On 2026-04-07 03:02:52 +1200, Thomas Munro wrote:
Here's an updated patch. It's mostly just rebased over the recent
firehose, but with lots of comments and a few names (hopefully)
improved. There is one code change to highlight though:
maybe_start_io_workers() knows when it's not allowed to create new
workers, an interesting case being FatalError before we have started
the new world.
*worker, I assume?
The previous coding of DetermineSleepTime() didn't
know about that, so it could return 0 (don't sleep), and then the
postmaster could busy-wait for restart progress.
In master or the prior version of your patch?
Maybe there were
other cases like that, but in general DetermineSleepTime() and
maybe_start_io_workers() really need to be 100% in agreement. So I
have moved that knowledge into a new function
maybe_start_io_workers_scheduled_at(). Both DetermineSleepTime() and
maybe_start_io_workers() call that so there is a single source of
truth.I think I got confused about that because it's not that obvious why
the existing code doesn't test FatalError.I thought of a slightly bigger refactoring that might deconfuse
DetermineSleepTime() a bit more. Probably material for the next
cycle, but basically the idea is to stop using a bunch of different
conditions and different units of time and convert the whole thing to
a simple find-the-lowest-time function. I kept that separate.
I'll post a new version of the patch that was v3-0002 separately.
From 6c5d16a15add62c68bb7f9c7b6a1e3bde1f406d8 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 22 Mar 2025 00:36:49 +1300
Subject: [PATCH v4 1/2] aio: Adjust I/O worker pool size automatically.

The size of the I/O worker pool used to implement io_method=worker was
previously controlled by the io_workers setting, defaulting to 3. It
was hard to know how to tune it effectively. It is now replaced with:
io_min_workers=1
io_max_workers=8 (up to 32)
io_worker_idle_timeout=60s
io_worker_launch_interval=100ms
I'm a bit concerned about defaulting to io_min_workers=1. That means in an
intermittent workload, there will be no IO concurrency for short running but
IO intensive queries, while having the dispatch overhead to the worker. It
can still be a win if the query is CPU intensive, but far from all are.
I'd therefore argue that the minimum ought to be at least 2.
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6f13e8f40a0..c42564500c6 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1555,14 +1558,13 @@ checkControlFile(void)
 static int
 DetermineSleepTime(void)
 {
-	TimestampTz next_wakeup = 0;
+	TimestampTz next_wakeup;

 	/*
-	 * Normal case: either there are no background workers at all, or we're in
-	 * a shutdown sequence (during which we ignore bgworkers altogether).
+	 * If an ImmediateShutdown or a crash restart has set a SIGKILL timeout,
+	 * ignore everything else and wait for that.
 	 */
-	if (Shutdown > NoShutdown ||
-		(!StartWorkerNeeded && !HaveCrashedWorker))
+	if (Shutdown >= ImmediateShutdown || FatalError)
 	{
 		if (AbortStartTime != 0)
 		{
@@ -1582,14 +1584,16 @@ DetermineSleepTime(void)

 			return seconds * 1000;
 		}
-		else
-			return 60 * 1000;
 	}

-	if (StartWorkerNeeded)
+	/* Time of next maybe_start_io_workers() call, or 0 for none. */
+	next_wakeup = maybe_start_io_workers_scheduled_at();
+
+	/* Ignore bgworkers during shutdown. */
+	if (StartWorkerNeeded && Shutdown == NoShutdown)
 		return 0;
Why is the maybe_start_io_workers_scheduled_at() thing before the return 0
here?
-	if (HaveCrashedWorker)
+	if (HaveCrashedWorker && Shutdown == NoShutdown)
 	{
 		dlist_mutable_iter iter;
@@ -3797,6 +3811,15 @@ process_pm_pmsignal(void)
StartWorkerNeeded = true;
 	}

+	/* Process IO worker start requests. */
+	if (CheckPostmasterSignal(PMSIGNAL_IO_WORKER_GROW))
+	{
+		/*
+		 * No local flag, as the state is exposed through pgaio_worker_*()
+		 * functions. This signal is received on potentially actionable level
+		 * changes, so that maybe_start_io_workers() will run.
+		 */
+	}
+
 	/* Process background worker state changes. */
 	if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE))
 	{
Absolute nitpick - the different blocks so far have been separated by an empty
line.
+	/* Only proceed if a "grow" request is pending from existing workers. */
+	if (!pgaio_worker_test_grow())
+		return 0;
So this accesses shared memory from postmaster. I think this amount of access
is safe enough that that's ok. You'd have to somehow have corrupted
postmaster's copy of io_worker_control, or unmapped the shared memory it is
pointed to, for that to cause a crash. The first shouldn't be an issue, the
latter would be quite the confusion for the state machine.
+/*
+ * Start I/O workers if required.  Used at startup, to respond to change of
+ * the io_min_workers GUC, when asked to start a new one due to submission
+ * queue backlog, and after workers terminate in response to errors (by
+ * starting "replacement" workers).
+ */
+static void
+maybe_start_io_workers(void)
+{
+	TimestampTz scheduled_at;

-	/* Not enough running? */
-	while (io_worker_count < io_workers)
+	while ((scheduled_at = maybe_start_io_workers_scheduled_at()) != 0)
 	{
+		TimestampTz now = GetCurrentTimestamp();
 		PMChild    *child;
 		int			i;

+		Assert(pmState < PM_WAIT_IO_WORKERS);
+
+		/* Still waiting for the scheduled time? */
+		if (scheduled_at > now)
+			break;
+
+		/* Clear the grow request flag if it is set. */
+		pgaio_worker_clear_grow();
+
+		/*
+		 * Compute next launch time relative to the previous value, so that
+		 * time spent on the postmaster's other duties don't result in an
+		 * inaccurate launch interval.
+		 */
+		io_worker_launch_next_time =
+			TimestampTzPlusMilliseconds(io_worker_launch_next_time,
+										io_worker_launch_interval);
+
+		/*
+		 * If that's already in the past, the interval is either impossibly
+		 * short or we received no requests for new workers for a period.
+		 * Compute a new future time relative to the last launch time instead.
+		 */
+		if (io_worker_launch_next_time <= now)
+			io_worker_launch_next_time =
+				TimestampTzPlusMilliseconds(io_worker_launch_last_time,
+											io_worker_launch_interval);
Did you intend to use TimestampTzPlusMilliseconds(now, ...) here? Or did you
want to have this if after the next line:
+ io_worker_launch_last_time = now;
+
Because otherwise I don't understand how this is intended to work.
 		/* find unused entry in io_worker_children array */
 		for (i = 0; i < MAX_IO_WORKERS; ++i)
 		{
@@ -4454,20 +4539,14 @@ maybe_adjust_io_workers(void)
 			++io_worker_count;
 		}
 		else
-			break;				/* try again next time */
-	}
-
-	/* Too many running? */
-	if (io_worker_count > io_workers)
-	{
-		/* ask the IO worker in the highest slot to exit */
-		for (int i = MAX_IO_WORKERS - 1; i >= 0; --i)
 		{
-			if (io_worker_children[i] != NULL)
-			{
-				kill(io_worker_children[i]->pid, SIGUSR2);
-				break;
-			}
+			/*
+			 * Fork failure: we'll try again after the launch interval
+			 * expires, or be called again without delay if we don't yet have
+			 * io_min_workers.  Don't loop here though, the postmaster has
+			 * other duties.
+			 */
+			break;
 		}
 	}
 }
Reading just this part of the diff I am wondering what is responsible for
reducing the number of workers below the max after a config change. I assume
it's done in the workers, but it might be worth putting a comment here noting
that.
+/* Debugging support: show current IO and wakeups:ios statistics in ps. */
+/* #define PGAIO_WORKER_SHOW_PS_INFO */

 typedef struct PgAioWorkerSubmissionQueue
{
@@ -63,13 +67,34 @@ typedef struct PgAioWorkerSubmissionQueue

 typedef struct PgAioWorkerSlot
 {
-	Latch	   *latch;
-	bool		in_use;
+	ProcNumber	proc_number;
 } PgAioWorkerSlot;

+/*
+ * Sets of worker IDs are held in a simple bitmap, accessed through functions
+ * that provide a more readable abstraction.  If we wanted to support more
+ * workers than that, the contention on the single queue would surely get too
+ * high, so we might want to consider multiple pools instead of widening this.
+ */
+typedef uint64 PgAioWorkerSet;
+#define PGAIO_WORKER_SET_BITS (sizeof(PgAioWorkerSet) * CHAR_BIT)
+
+static_assert(PGAIO_WORKER_SET_BITS >= MAX_IO_WORKERS, "too small");
+
 typedef struct PgAioWorkerControl
 {
-	uint64		idle_worker_mask;
+	/* Seen by postmaster */
+	volatile bool grow;
What's that volatile intending to do here? It avoids the needs for some
compiler barriers, but it's not clear to me those would be needed here anyway.
And it doesn't imply memory ordering, which I'm not sure is entirely wise
here. I'd probably just plop a full memory barrier in the few relevant
places, easier to reason about that way, and it can't matter given the
infrequency of access. I'd say we should just use a proper atomic, but right
now I don't think we do that in postmaster.
+	/* Protected by AioWorkerSubmissionQueueLock. */
+	PgAioWorkerSet idle_worker_set;
+
+	/* Protected by AioWorkerControlLock. */
+	PgAioWorkerSet worker_set;
+	int			nworkers;
+
+	/* Protected by AioWorkerControlLock. */
 	PgAioWorkerSlot workers[FLEXIBLE_ARRAY_MEMBER];
 } PgAioWorkerControl;

@@ -91,15 +116,103 @@ const IoMethodOps pgaio_worker_ops = {
+static bool
+pgaio_worker_set_is_empty(PgAioWorkerSet *set)
+{
+	return *set == 0;
+}
+
+static PgAioWorkerSet
+pgaio_worker_set_singleton(int worker)
+{
+	return UINT64_C(1) << worker;
+}
I guess an assert about `worker` being small enough wouldn't hurt.
+static void
+pgaio_worker_set_fill(PgAioWorkerSet *set)
+{
+	*set = UINT64_MAX >> (PGAIO_WORKER_SET_BITS - MAX_IO_WORKERS);
+}
What does "_fill" really mean? Just that all valid bits are set? Why wouldn't
it be _all() or _full()?
+static int
+pgaio_worker_set_get_highest(PgAioWorkerSet *set)
+{
+	Assert(!pgaio_worker_set_is_empty(set));
+	return pg_leftmost_one_pos64(*set);
+}
"worker_set_get*" reads quite awkwardly. Maybe just going for
pgaio_workerset_* would help?
Or maybe just name it PgAioWset/pgaio_wset_ or such?
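For readers following along, the set operations under discussion reduce to standard bit tricks; here's a standalone sketch using GCC/Clang builtins in place of pg_bitutils.h, with illustrative names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t WorkerSet;     /* one bit per worker ID */

static bool set_is_empty(WorkerSet s) { return s == 0; }
static void set_insert(WorkerSet *s, int w) { *s |= UINT64_C(1) << w; }
static void set_remove(WorkerSet *s, int w) { *s &= ~(UINT64_C(1) << w); }

/* Highest member: position of the leftmost one bit. */
static int
set_highest(WorkerSet s)
{
    return 63 - __builtin_clzll(s);
}

/* Lowest member, removed from the set as it is returned. */
static int
set_pop_lowest(WorkerSet *s)
{
    int w = __builtin_ctzll(*s);

    *s &= *s - 1;               /* clear the lowest set bit */
    return w;
}
```

All of these are O(1), which is what makes a 64-bit bitmap attractive for a pool capped well below 64 workers.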
+static void
+pgaio_worker_grow(bool grow)
+{
+	/*
+	 * This is called from sites that don't hold AioWorkerControlLock, but
+	 * these values change infrequently and an up-to-date value is not
+	 * required for this heuristic purpose.
+	 */
Is it actually useful to do this while not holding the control lock? Ah, I
see, this is due to the split of submission and control lock.
+	if (!grow)
+	{
+		/* Avoid dirtying memory if not already set. */
+		if (io_worker_control->grow)
+			io_worker_control->grow = false;
Hm. pgaio_worker_grow(grow=false) is a bit odd. And this is basically a copy
of pgaio_worker_cancel_grow() - I realize that's intended for postmaster, but
somehow it's a bit odd.
Maybe just name it pgaio_worker_set_grow()?
+/*
+ * Called by the postmaster to check if a new worker is needed.
+ */
+bool
+pgaio_worker_test_grow(void)
+{
+	return io_worker_control && io_worker_control->grow;
+}
+
+/*
+ * Called by the postmaster to clear the grow flag.
+ */
+void
+pgaio_worker_clear_grow(void)
+{
+	if (io_worker_control)
+		io_worker_control->grow = false;
+}
Maybe we should add _pm_ in there to make it clearer that they're not for
general use?
@@ -226,8 +413,7 @@ pgaio_worker_submit(uint16 num_staged_ios, PgAioHandle **staged_ios)
 {
 	PgAioHandle **synchronous_ios = NULL;
 	int			nsync = 0;
-	Latch	   *wakeup = NULL;
-	int			worker;
+	int			worker = -1;

 	Assert(num_staged_ios <= PGAIO_SUBMIT_BATCH_SIZE);
@@ -252,19 +438,15 @@ pgaio_worker_submit(uint16 num_staged_ios, PgAioHandle **staged_ios)
break;
 		}

-		if (wakeup == NULL)
-		{
-			/* Choose an idle worker to wake up if we haven't already. */
-			worker = pgaio_worker_choose_idle();
-			if (worker >= 0)
-				wakeup = io_worker_control->workers[worker].latch;
-
-			pgaio_debug_io(DEBUG4, staged_ios[i],
-						   "choosing worker %d",
-						   worker);
-		}
+		/* Choose one worker to wake for this batch. */
+		if (worker == -1)
+			worker = pgaio_worker_choose_idle(0);
 	}
If we only want to do this once per "batch", why not just do it outside the
num_staged_ios loop?
@@ -295,14 +474,27 @@ pgaio_worker_submit(uint16 num_staged_ios, PgAioHandle **staged_ios)
 static void
 pgaio_worker_die(int code, Datum arg)
 {
-	LWLockAcquire(AioWorkerSubmissionQueueLock, LW_EXCLUSIVE);
-	Assert(io_worker_control->workers[MyIoWorkerId].in_use);
-	Assert(io_worker_control->workers[MyIoWorkerId].latch == MyLatch);
+	PgAioWorkerSet notify_set;

-	io_worker_control->idle_worker_mask &= ~(UINT64_C(1) << MyIoWorkerId);
-	io_worker_control->workers[MyIoWorkerId].in_use = false;
-	io_worker_control->workers[MyIoWorkerId].latch = NULL;
+	LWLockAcquire(AioWorkerSubmissionQueueLock, LW_EXCLUSIVE);
+	pgaio_worker_set_remove(&io_worker_control->idle_worker_set, MyIoWorkerId);
 	LWLockRelease(AioWorkerSubmissionQueueLock);
+
+	LWLockAcquire(AioWorkerControlLock, LW_EXCLUSIVE);
+	Assert(io_worker_control->workers[MyIoWorkerId].proc_number == MyProcNumber);
+	io_worker_control->workers[MyIoWorkerId].proc_number = INVALID_PROC_NUMBER;
+	Assert(pgaio_worker_set_contains(&io_worker_control->worker_set, MyIoWorkerId));
+	pgaio_worker_set_remove(&io_worker_control->worker_set, MyIoWorkerId);
+	notify_set = io_worker_control->worker_set;
+	Assert(io_worker_control->nworkers > 0);
+	io_worker_control->nworkers--;
+	Assert(pgaio_worker_set_count(&io_worker_control->worker_set) ==
+		   io_worker_control->nworkers);
+	LWLockRelease(AioWorkerControlLock);
+
+	/* Notify other workers on pool change. */
Why are we notifying them on pool changes?
+	while (!pgaio_worker_set_is_empty(&notify_set))
+		pgaio_worker_wake(pgaio_worker_set_pop_lowest(&notify_set));
I did already wonder further up if pgaio_worker_wake() should just receive a
worker_set as an argument.
@@ -312,33 +504,34 @@ pgaio_worker_die(int code, Datum arg)
 static void
 pgaio_worker_register(void)
 {
+	PgAioWorkerSet free_worker_set;
+	PgAioWorkerSet old_worker_set;
+
 	MyIoWorkerId = -1;

-	/*
-	 * XXX: This could do with more fine-grained locking. But it's also not
-	 * very common for the number of workers to change at the moment...
-	 */
-	LWLockAcquire(AioWorkerSubmissionQueueLock, LW_EXCLUSIVE);
+	LWLockAcquire(AioWorkerControlLock, LW_EXCLUSIVE);
I guess it could be useful to assert that nworkers is small enough before
doing anything.
+	pgaio_worker_set_fill(&free_worker_set);
+	pgaio_worker_set_subtract(&free_worker_set, &io_worker_control->worker_set);
+	if (!pgaio_worker_set_is_empty(&free_worker_set))
+		MyIoWorkerId = pgaio_worker_set_get_lowest(&free_worker_set);
+	if (MyIoWorkerId == -1)
+		elog(ERROR, "couldn't find a free worker ID");
I'd probably add a comment saying "/* find lowest unused worker ID */" or
such, that was more immediately obvious in the old code.
+/*
+ * Check if this backend is allowed to time out, and thus should use a
+ * non-infinite sleep time.  Only the highest-numbered worker is allowed to
+ * time out, and only if the pool is above io_min_workers.  Serializing
+ * timeouts keeps IDs in a range 0..N without gaps, and avoids undershooting
+ * io_min_workers.
But it's ok if a lower numbered worker errors out, right? There will be a
temporary gap, but we will start a new worker for it? Does that happen even
if there's a shrink of the set of required workers at the same time as a lower
numbered worker errors out?
@@ -439,10 +666,9 @@ IoWorkerMain(const void *startup_data, size_t startup_data_len)
 	while (!ShutdownRequestPending)
 	{
 		uint32		io_index;
-		Latch	   *latches[IO_WORKER_WAKEUP_FANOUT];
-		int			nlatches = 0;
-		int			nwakeups = 0;
-		int			worker;
+		int			worker = -1;
+		int			queue_depth = 0;
+		bool		grow = false;

 		/*
 		 * Try to get a job to do.
@@ -453,38 +679,64 @@ IoWorkerMain(const void *startup_data, size_t startup_data_len)
 		LWLockAcquire(AioWorkerSubmissionQueueLock, LW_EXCLUSIVE);
 		if ((io_index = pgaio_worker_submission_queue_consume()) == -1)
 		{
-			/*
-			 * Nothing to do. Mark self idle.
-			 *
-			 * XXX: Invent some kind of back pressure to reduce useless
-			 * wakeups?
-			 */
-			io_worker_control->idle_worker_mask |= (UINT64_C(1) << MyIoWorkerId);
+			/* Nothing to do. Mark self idle. */
+			pgaio_worker_set_insert(&io_worker_control->idle_worker_set,
+									MyIoWorkerId);
 		}
 		else
 		{
 			/* Got one. Clear idle flag. */
-			io_worker_control->idle_worker_mask &= ~(UINT64_C(1) << MyIoWorkerId);
+			pgaio_worker_set_remove(&io_worker_control->idle_worker_set,
+									MyIoWorkerId);
Wonder if we should keep track of whether we marked ourselves idle to avoid
needing to do that. But that would be a separate optimization really.
+			/*
+			 * See if we should wake up a higher numbered peer.  Only do that
+			 * if this worker is not receiving spurious wakeups itself.
The "not receiving spurious wakeups" condition is wakeups < ios?
I think both 'wakeups" and "ios" are a bit too generically named. Based on the
names I have no idea what this heuristic might be.
+			 * This heuristic tries to discover the useful wakeup propagation
+			 * chain length when IOs are very fast and workers wake up to find
+			 * that all IOs have already been taken.
+			 *
+			 * If we chose not to wake a worker when we ideally should have,
+			 * the ratio will soon be corrected.
+			 */
+			if (wakeups <= ios)
+			{
+				queue_depth = pgaio_worker_submission_queue_depth();
+				if (queue_depth > 0)
+				{
+					worker = pgaio_worker_choose_idle(MyIoWorkerId + 1);
Is it a problem that we are passing an ID that's potentially bigger than the
biggest legal worker ID? It's probably fine as long as MAX_WORKERS is 32 and
the bitmap is a 64bit integer, but ...
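One way the start index can be made harmless, sketched here (illustrative, not the actual pgaio_worker_choose_idle()): mask off the bits below the start and guard the shift, so an out-of-range start simply yields an empty candidate set:

```c
#include <assert.h>
#include <stdint.h>

/* Lowest idle worker with ID >= start, or -1 if none. */
static int
choose_idle_from(uint64_t idle_set, int start)
{
    uint64_t candidates;

    if (start >= 64)
        return -1;              /* shifting by >= the width is undefined in C */
    candidates = idle_set & (~UINT64_C(0) << start);
    if (candidates == 0)
        return -1;
    return __builtin_ctzll(candidates); /* position of lowest candidate bit */
}
```

The explicit range check matters: without it, `MyIoWorkerId + 1 == 64` would hit undefined shift behaviour rather than just returning "no candidates".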
+					/*
+					 * If there were no idle higher numbered peers and there
+					 * are more than enough IOs queued for me and all lower
+					 * numbered peers, then try to start a new worker.
+					 */
+					if (worker == -1 && queue_depth > MyIoWorkerId)
+						grow = true;
+				}
We probably shouldn't request growth when already at the cap? That could
generate a *lot* of pmsignal traffic, I think?
I don't have an immediate intuitive understanding of why the submission queue
depth is a good measure here.
If there are 10 workers that are busy 100% of the time, and the submission
queue is usually 6 deep with not-being-worked-on IOs, why do we not want to
start more workers?
It actually seems to work - but I don't actually understand why.
ninja install-test-files
io_max_workers=32
debug_io_direct=data
effective_io_concurrency=16
shared_buffers=5GB
pgbench -i -q -s 100 --fillfactor=30
CREATE EXTENSION IF NOT EXISTS test_aio;
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
DROP TABLE IF EXISTS pattern_random_pgbench;
CREATE TABLE pattern_random_pgbench AS SELECT ARRAY(SELECT random(0, pg_relation_size('pgbench_accounts')/8192 - 1)::int4 FROM generate_series(1, pg_relation_size('pgbench_accounts')/8192)) AS pattern;
My test is:
SET effective_io_concurrency = 20;
SELECT pg_buffercache_evict_relation('pgbench_accounts');
SELECT read_stream_for_blocks('pgbench_accounts', pattern) FROM pattern_random_pgbench LIMIT 1;
We end up with ~24-28 workers, even though we never have more than 20 IOs in
flight. Not entirely sure why. I guess it's just that after doing an IO the
worker needs to mark itself idle etc?
if (io_index != -1)
{
 			PgAioHandle *ioh = NULL;

+			/* Cancel timeout and update wakeup:work ratio. */
+			idle_timeout_abs = 0;
+			if (++ios == PGAIO_WORKER_STATS_MAX)
+			{
+				wakeups /= 2;
+				ios /= 2;
+			}
/* Saturation for counters used to estimate wakeup:work ratio. */
#define PGAIO_WORKER_STATS_MAX 4
STATS_MAX sounds like it's just about some reporting or such.
ioh = &pgaio_ctl->io_handles[io_index];
error_ioh = ioh;
errcallback.arg = ioh;
@@ -537,6 +789,14 @@ IoWorkerMain(const void *startup_data, size_t startup_data_len)
}
 #endif

+#ifdef PGAIO_WORKER_SHOW_PS_INFO
+			sprintf(cmd, "%d: [%s] %s",
+					MyIoWorkerId,
+					pgaio_io_get_op_name(ioh),
+					pgaio_io_get_target_description(ioh));
+			set_ps_display(cmd);
+#endif
Note that this leaks memory. See the target_description comment:
/*
* Return a stringified description of the IO's target.
*
* The string is localized and allocated in the current memory context.
*/
 			/*
 			 * We don't expect this to ever fail with ERROR or FATAL, no need
 			 * to keep error_ioh set to the IO.
@@ -550,8 +810,75 @@ IoWorkerMain(const void *startup_data, size_t startup_data_len)
 		}
 		else
 		{
-			WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, -1,
-					  WAIT_EVENT_IO_WORKER_MAIN);
+			int			timeout_ms;
+
+			/* Cancel new worker request if pending. */
+			pgaio_worker_grow(false);
That seems to happen very frequently.
+			/* Compute the remaining allowed idle time. */
+			if (io_worker_idle_timeout == -1)
+			{
+				/* Never time out. */
+				timeout_ms = -1;
+			}
+			else
+			{
+				TimestampTz now = GetCurrentTimestamp();
+
+				/* If the GUC changes, reset timer. */
+				if (idle_timeout_abs != 0 &&
+					io_worker_idle_timeout != timeout_guc_used)
+					idle_timeout_abs = 0;
+
+				/* On first sleep, compute absolute timeout. */
+				if (idle_timeout_abs == 0)
+				{
+					idle_timeout_abs =
+						TimestampTzPlusMilliseconds(now,
+													io_worker_idle_timeout);
+					timeout_guc_used = io_worker_idle_timeout;
+				}
+
+				/*
+				 * All workers maintain the absolute timeout value, but only
+				 * the highest worker can actually time out and only if
+				 * io_min_workers is satisfied.  All others wait only for
+				 * explicit wakeups caused by queue insertion, wakeup
+				 * propagation, change of pool size (possibly promoting one to
+				 * new highest) or GUC reload.
+				 */
+				if (pgaio_worker_can_timeout())
+					timeout_ms =
+						TimestampDifferenceMilliseconds(now,
+														idle_timeout_abs);
+				else
+					timeout_ms = -1;
Hm. This way you get very rapid worker pool reductions. Configured
io_worker_idle_timeout=1s, started a bunch of work and observed the worker
count after the work finishes:
Mon 06 Apr 2026 02:08:28 PM EDT (every 1s)
count
32
(1 row)
Mon 06 Apr 2026 02:08:29 PM EDT (every 1s)
count
32
(1 row)
Mon 06 Apr 2026 02:08:30 PM EDT (every 1s)
count
1
(1 row)
Mon 06 Apr 2026 02:08:31 PM EDT (every 1s)
count
1
(1 row)
Of course this is a ridiculously low setting, but it does seem like starting
the timeout even when not the highest numbered worker will lead to a lot of
quick yoyoing.
Greetings,
Andres Freund
On Tue, Apr 7, 2026 at 6:14 AM Andres Freund <andres@anarazel.de> wrote:
On 2026-04-07 03:02:52 +1200, Thomas Munro wrote:
Here's an updated patch. It's mostly just rebased over the recent
firehose, but with lots of comments and a few names (hopefully)
improved. There is one code change to highlight though:
maybe_start_io_workers() knows when it's not allowed to create new
workers, an interesting case being FatalError before we have started
the new world.
*worker, I assume?
Thanks for the review and testing!
I meant the new world when "we're already starting up again", as in
this pre-existing code from master:
/*
* Don't start new workers if we're in the shutdown phase of a crash
* restart. But we *do* need to start if we're already starting up again.
*/
if (FatalError && pmState >= PM_STOP_BACKENDS)
return;
The previous coding of DetermineSleepTime() didn't
know about that, so it could return 0 (don't sleep), and then the
postmaster could busy-wait for restart progress.
In master or the prior version of your patch?
master
This code that checks AbortStartTime and overrides the sleep time.
But it wouldn't be entered if FatalError is true but StartWorkerNeeded
or HaveCrashedWorker also happens to be true. Maybe that's OK but I
found it odd.
Maybe there were
other cases like that, but in general DetermineSleepTime() and
maybe_start_io_workers() really need to be 100% in agreement. So I
have moved that knowledge into a new function
maybe_start_io_workers_scheduled_at(). Both DetermineSleepTime() and
maybe_start_io_workers() call that so there is a single source of
truth.
I think I got confused about that because it's not that obvious why
the existing code doesn't test FatalError.
I thought of a slightly bigger refactoring that might deconfuse
DetermineSleepTime() a bit more. Probably material for the next
cycle, but basically the idea is to stop using a bunch of different
conditions and different units of time and convert the whole thing to
a simple find-the-lowest-time function. I kept that separate.
I'll post a new version of the patch that was v3-0002 separately.
From 6c5d16a15add62c68bb7f9c7b6a1e3bde1f406d8 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 22 Mar 2025 00:36:49 +1300
Subject: [PATCH v4 1/2] aio: Adjust I/O worker pool size automatically.

The size of the I/O worker pool used to implement io_method=worker was
previously controlled by the io_workers setting, defaulting to 3. It
was hard to know how to tune it effectively. It is now replaced with:
io_min_workers=1
io_max_workers=8 (up to 32)
io_worker_idle_timeout=60s
io_worker_launch_interval=100msI'm a bit concerned about defaulting to io_min_workers=1. That means in an
intermittent workload, there will be no IO concurrency for short running but
IO intensive queries, while having the dispatch overhead to the worker. It
can still be a win if the query is CPU intensive, but far from all are.I'd therefore argue that the minimum ought to be at least 2.
WFM.
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6f13e8f40a0..c42564500c6 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1555,14 +1558,13 @@ checkControlFile(void)
 static int
 DetermineSleepTime(void)
 {
-    TimestampTz next_wakeup = 0;
+    TimestampTz next_wakeup;

     /*
-     * Normal case: either there are no background workers at all, or we're in
-     * a shutdown sequence (during which we ignore bgworkers altogether).
+     * If an ImmediateShutdown or a crash restart has set a SIGKILL timeout,
+     * ignore everything else and wait for that.
      */
-    if (Shutdown > NoShutdown ||
-        (!StartWorkerNeeded && !HaveCrashedWorker))
+    if (Shutdown >= ImmediateShutdown || FatalError)
     {
         if (AbortStartTime != 0)
         {
@@ -1582,14 +1584,16 @@ DetermineSleepTime(void)
             return seconds * 1000;
         }
-        else
-            return 60 * 1000;
     }

-    if (StartWorkerNeeded)
+    /* Time of next maybe_start_io_workers() call, or 0 for none. */
+    next_wakeup = maybe_start_io_workers_scheduled_at();
+
+    /* Ignore bgworkers during shutdown. */
+    if (StartWorkerNeeded && Shutdown == NoShutdown)
         return 0;

Why is the maybe_start_io_workers_scheduled_at() thing before the return 0
here?

Seems OK? I mean sure I would like to make this whole function more
uniform in structure, see my second patch, but...
-    if (HaveCrashedWorker)
+    if (HaveCrashedWorker && Shutdown == NoShutdown)
     {
         dlist_mutable_iter iter;

@@ -3797,6 +3811,15 @@ process_pm_pmsignal(void)
         StartWorkerNeeded = true;
     }

+    /* Process IO worker start requests. */
+    if (CheckPostmasterSignal(PMSIGNAL_IO_WORKER_GROW))
+    {
+        /*
+         * No local flag, as the state is exposed through pgaio_worker_*()
+         * functions. This signal is received on potentially actionable level
+         * changes, so that maybe_start_io_workers() will run.
+         */
+    }
     /* Process background worker state changes. */
     if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE))
     {

Absolute nitpick - the different blocks so far have been separated by an empty
line.

Fixed.
+    /* Only proceed if a "grow" request is pending from existing workers. */
+    if (!pgaio_worker_test_grow())
+        return 0;

So this accesses shared memory from postmaster. I think this amount of access
is safe enough that that's ok. You'd have to somehow have corrupted
postmaster's copy of io_worker_control, or unmapped the shared memory it is
pointed to, for that to cause a crash. The first shouldn't be an issue, the
latter would be quite the confusion of the state machine.

Cool.
+/*
+ * Start I/O workers if required. Used at startup, to respond to change of
+ * the io_min_workers GUC, when asked to start a new one due to submission
+ * queue backlog, and after workers terminate in response to errors (by
+ * starting "replacement" workers).
+ */
+static void
+maybe_start_io_workers(void)
+{
+    TimestampTz scheduled_at;

-    /* Not enough running? */
-    while (io_worker_count < io_workers)
+    while ((scheduled_at = maybe_start_io_workers_scheduled_at()) != 0)
     {
+        TimestampTz now = GetCurrentTimestamp();
         PMChild    *child;
         int         i;

+        Assert(pmState < PM_WAIT_IO_WORKERS);
+
+        /* Still waiting for the scheduled time? */
+        if (scheduled_at > now)
+            break;
+
+        /* Clear the grow request flag if it is set. */
+        pgaio_worker_clear_grow();
+
+        /*
+         * Compute next launch time relative to the previous value, so that
+         * time spent on the postmaster's other duties don't result in an
+         * inaccurate launch interval.
+         */
+        io_worker_launch_next_time =
+            TimestampTzPlusMilliseconds(io_worker_launch_next_time,
+                                        io_worker_launch_interval);
+
+        /*
+         * If that's already in the past, the interval is either impossibly
+         * short or we received no requests for new workers for a period.
+         * Compute a new future time relative to the last launch time instead.
+         */
+        if (io_worker_launch_next_time <= now)
+            io_worker_launch_next_time =
+                TimestampTzPlusMilliseconds(io_worker_launch_last_time,
+                                            io_worker_launch_interval);

Did you intend to use TimestampTzPlusMilliseconds(now, ...) here? Or did you
want to have this if after the next line:

+        io_worker_launch_last_time = now;
+

Because otherwise I don't understand how this is intended to work.

I can't remember why I did it like that. Changed.
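For what it's worth, the corrected scheduling can be sketched like this (illustrative function and types; plain microsecond integers stand in for TimestampTz): advance from the previous scheduled time to keep a steady cadence, but rebase on the current launch when the schedule has fallen behind.

```c
#include <stdint.h>

typedef int64_t TimestampTz;    /* microseconds */

TimestampTz
advance_launch_time(TimestampTz scheduled, TimestampTz now,
                    TimestampTz interval_us)
{
    TimestampTz next = scheduled + interval_us;

    /*
     * If the schedule has fallen behind (a quiet period, or an impossibly
     * short interval), rebase on the current launch instead, so that we
     * don't "catch up" with a burst of launches.
     */
    if (next <= now)
        next = now + interval_us;
    return next;
}
```

Back-to-back launches keep their cadence (the postmaster's other duties don't stretch the interval), while a long idle gap resets it rather than accumulating launch credit.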
         /* find unused entry in io_worker_children array */
         for (i = 0; i < MAX_IO_WORKERS; ++i)
         {
@@ -4454,20 +4539,14 @@ maybe_adjust_io_workers(void)
             ++io_worker_count;
         }
         else
-            break;              /* try again next time */
-    }
-
-    /* Too many running? */
-    if (io_worker_count > io_workers)
-    {
-        /* ask the IO worker in the highest slot to exit */
-        for (int i = MAX_IO_WORKERS - 1; i >= 0; --i)
         {
-            if (io_worker_children[i] != NULL)
-            {
-                kill(io_worker_children[i]->pid, SIGUSR2);
-                break;
-            }
+            /*
+             * Fork failure: we'll try again after the launch interval
+             * expires, or be called again without delay if we don't yet have
+             * io_min_workers. Don't loop here though, the postmaster has
+             * other duties.
+             */
+            break;
         }
     }
 }

Reading just this part of the diff I am wondering what is responsible for
reducing the number of workers below the max after a config change. I assume
it's done in the workers, but it might be worth putting a comment here noting
that.

Done.
+/* Debugging support: show current IO and wakeups:ios statistics in ps. */
+/* #define PGAIO_WORKER_SHOW_PS_INFO */

 typedef struct PgAioWorkerSubmissionQueue
 {
@@ -63,13 +67,34 @@ typedef struct PgAioWorkerSubmissionQueue

 typedef struct PgAioWorkerSlot
 {
-    Latch      *latch;
-    bool        in_use;
+    ProcNumber  proc_number;
 } PgAioWorkerSlot;

+/*
+ * Sets of worker IDs are held in a simple bitmap, accessed through functions
+ * that provide a more readable abstraction. If we wanted to support more
+ * workers than that, the contention on the single queue would surely get too
+ * high, so we might want to consider multiple pools instead of widening this.
+ */
+typedef uint64 PgAioWorkerSet;

+#define PGAIO_WORKER_SET_BITS (sizeof(PgAioWorkerSet) * CHAR_BIT)
+
+static_assert(PGAIO_WORKER_SET_BITS >= MAX_IO_WORKERS, "too small");
+
 typedef struct PgAioWorkerControl
 {
-    uint64      idle_worker_mask;
+    /* Seen by postmaster */
+    volatile bool grow;

What's that volatile intending to do here? It avoids the need for some
compiler barriers, but it's not clear to me those would be needed here anyway.
And it doesn't imply memory ordering, which I'm not sure is entirely wise
here. I'd probably just plop a full memory barrier in the few relevant
places, easier to reason about that way, and it can't matter given the
infrequency of access. I'd say we should just use a proper atomic, but right
now I don't think we do that in postmaster.

Changed to full memory barrier.
+    /* Protected by AioWorkerSubmissionQueueLock. */
+    PgAioWorkerSet idle_worker_set;
+
+    /* Protected by AioWorkerControlLock. */
+    PgAioWorkerSet worker_set;
+    int         nworkers;
+
+    /* Protected by AioWorkerControlLock. */
     PgAioWorkerSlot workers[FLEXIBLE_ARRAY_MEMBER];
 } PgAioWorkerControl;

@@ -91,15 +116,103 @@ const IoMethodOps pgaio_worker_ops = {

+static bool
+pgaio_worker_set_is_empty(PgAioWorkerSet *set)
+{
+    return *set == 0;
+}
+
+static PgAioWorkerSet
+pgaio_worker_set_singleton(int worker)
+{
+    return UINT64_C(1) << worker;
+}

I guess an assert about `worker` being small enough wouldn't hurt.

Done.
+static void
+pgaio_worker_set_fill(PgAioWorkerSet *set)
+{
+    *set = UINT64_MAX >> (PGAIO_WORKER_SET_BITS - MAX_IO_WORKERS);
+}

What does "_fill" really mean? Just that all valid bits are set? Why wouldn't
it be _all() or _full()?

I guess I got that from sigset_t... Trying pgaio_workerset_all().
+static int
+pgaio_worker_set_get_highest(PgAioWorkerSet *set)
+{
+    Assert(!pgaio_worker_set_is_empty(set));
+    return pg_leftmost_one_pos64(*set);
+}

"worker_set_get*" reads quite awkwardly. Maybe just going for
pgaio_workerset_* would help?

Or maybe just name it PgAioWset/pgaio_wset_ or such?

OK let's try "workerset".
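As a sanity check of the bitmap abstraction being discussed, here's a standalone sketch of the core set operations (illustrative stand-ins only; GCC/Clang builtins take the place of pg_bitutils.h helpers like pg_leftmost_one_pos64):

```c
#include <stdint.h>
#include <assert.h>

/* Illustrative stand-in for the patch's PgAioWorkerSet. */
typedef uint64_t PgAioWorkerSet;

#define MAX_IO_WORKERS 32

PgAioWorkerSet
workerset_all(void)
{
    /* All valid worker IDs set (cf. the _fill/_all discussion above). */
    return UINT64_MAX >> (64 - MAX_IO_WORKERS);
}

int
workerset_lowest(PgAioWorkerSet set)
{
    assert(set != 0);
    return __builtin_ctzll(set);        /* index of lowest set bit */
}

int
workerset_highest(PgAioWorkerSet set)
{
    assert(set != 0);
    return 63 - __builtin_clzll(set);   /* index of highest set bit */
}

int
workerset_pop_lowest(PgAioWorkerSet *set)
{
    int     worker = workerset_lowest(*set);

    *set &= *set - 1;                   /* clear lowest set bit */
    return worker;
}
```

A pop-lowest loop over a snapshot of the set is then all that's needed to visit every member, which is the shape the die-time notification ends up using.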
+static void
+pgaio_worker_grow(bool grow)
+{
+    /*
+     * This is called from sites that don't hold AioWorkerControlLock, but
+     * these values change infrequently and an up-to-date value is not
+     * required for this heuristic purpose.
+     */

Is it actually useful to do this while not holding the control lock? Ah, I
see, this is due to the split of submission and control lock.

Yeah, actually that comment is just confusing. Removed. It's pretty
clear that this flag has the usual sort of postmaster request flag
semantics and tolerates a bit of fuzziness.
+    if (!grow)
+    {
+        /* Avoid dirtying memory if not already set. */
+        if (io_worker_control->grow)
+            io_worker_control->grow = false;

Hm. pgaio_worker_grow(grow=false) is a bit odd. And this is basically a copy
of pgaio_worker_cancel_grow() - I realize that's intended for postmaster, but
somehow it's a bit odd.

Hmm, right.

Maybe just name it pgaio_worker_set_grow()?

OK how about:

pgaio_worker_request_grow()
pgaio_worker_cancel_grow()
+/*
+ * Called by the postmaster to check if a new worker is needed.
+ */
+bool
+pgaio_worker_test_grow(void)
+{
+    return io_worker_control && io_worker_control->grow;
+}
+
+/*
+ * Called by the postmaster to clear the grow flag.
+ */
+void
+pgaio_worker_clear_grow(void)
+{
+    if (io_worker_control)
+        io_worker_control->grow = false;
+}

Maybe we should add _pm_ in there to make it clearer that they're not for
general use?

Done.
@@ -226,8 +413,7 @@ pgaio_worker_submit(uint16 num_staged_ios, PgAioHandle **staged_ios)
 {
     PgAioHandle **synchronous_ios = NULL;
     int         nsync = 0;
-    Latch      *wakeup = NULL;
-    int         worker;
+    int         worker = -1;

     Assert(num_staged_ios <= PGAIO_SUBMIT_BATCH_SIZE);

@@ -252,19 +438,15 @@ pgaio_worker_submit(uint16 num_staged_ios, PgAioHandle **staged_ios)
             break;
         }

-        if (wakeup == NULL)
-        {
-            /* Choose an idle worker to wake up if we haven't already. */
-            worker = pgaio_worker_choose_idle();
-            if (worker >= 0)
-                wakeup = io_worker_control->workers[worker].latch;
-
-            pgaio_debug_io(DEBUG4, staged_ios[i],
-                           "choosing worker %d",
-                           worker);
-        }
+        /* Choose one worker to wake for this batch. */
+        if (worker == -1)
+            worker = pgaio_worker_choose_idle(0);
     }

If we only want to do this once per "batch", why not just do it outside the
num_staged_ios loop?

Two steps: pgaio_worker_choose_idle() must be done while holding the
queue lock (will probably finish up revising this in future work on
removing locks...). pgaio_worker_wake() is called outside the loop,
after releasing the lock.
@@ -295,14 +474,27 @@ pgaio_worker_submit(uint16 num_staged_ios, PgAioHandle **staged_ios)
 static void
 pgaio_worker_die(int code, Datum arg)
 {
-    LWLockAcquire(AioWorkerSubmissionQueueLock, LW_EXCLUSIVE);
-    Assert(io_worker_control->workers[MyIoWorkerId].in_use);
-    Assert(io_worker_control->workers[MyIoWorkerId].latch == MyLatch);
+    PgAioWorkerSet notify_set;

-    io_worker_control->idle_worker_mask &= ~(UINT64_C(1) << MyIoWorkerId);
-    io_worker_control->workers[MyIoWorkerId].in_use = false;
-    io_worker_control->workers[MyIoWorkerId].latch = NULL;
+    LWLockAcquire(AioWorkerSubmissionQueueLock, LW_EXCLUSIVE);
+    pgaio_worker_set_remove(&io_worker_control->idle_worker_set, MyIoWorkerId);
     LWLockRelease(AioWorkerSubmissionQueueLock);
+
+    LWLockAcquire(AioWorkerControlLock, LW_EXCLUSIVE);
+    Assert(io_worker_control->workers[MyIoWorkerId].proc_number == MyProcNumber);
+    io_worker_control->workers[MyIoWorkerId].proc_number = INVALID_PROC_NUMBER;
+    Assert(pgaio_worker_set_contains(&io_worker_control->worker_set, MyIoWorkerId));
+    pgaio_worker_set_remove(&io_worker_control->worker_set, MyIoWorkerId);
+    notify_set = io_worker_control->worker_set;
+    Assert(io_worker_control->nworkers > 0);
+    io_worker_control->nworkers--;
+    Assert(pgaio_worker_set_count(&io_worker_control->worker_set) ==
+           io_worker_control->nworkers);
+    LWLockRelease(AioWorkerControlLock);
+
+    /* Notify other workers on pool change. */

Why are we notifying them on pool changes?

Comments added to explain. It closes a wakeup-loss race (imagine if
you consumed a wakeup while you were exiting due to timeout; no one
else would wake up, which I fixed with this big hammer).
+    while (!pgaio_worker_set_is_empty(&notify_set))
+        pgaio_worker_wake(pgaio_worker_set_pop_lowest(&notify_set));

I did already wonder further up if pgaio_worker_wake() should just receive a
worker_set as an argument.

I have added pgaio_workerset_wake().
@@ -312,33 +504,34 @@ pgaio_worker_die(int code, Datum arg)
 static void
 pgaio_worker_register(void)
 {
+    PgAioWorkerSet free_worker_set;
+    PgAioWorkerSet old_worker_set;
+
     MyIoWorkerId = -1;

-    /*
-     * XXX: This could do with more fine-grained locking. But it's also not
-     * very common for the number of workers to change at the moment...
-     */
-    LWLockAcquire(AioWorkerSubmissionQueueLock, LW_EXCLUSIVE);
+    LWLockAcquire(AioWorkerControlLock, LW_EXCLUSIVE);

I guess it could be useful to assert that nworkers is small enough before
doing anything.

OK.
+    pgaio_worker_set_fill(&free_worker_set);
+    pgaio_worker_set_subtract(&free_worker_set, &io_worker_control->worker_set);
+    if (!pgaio_worker_set_is_empty(&free_worker_set))
+        MyIoWorkerId = pgaio_worker_set_get_lowest(&free_worker_set);
+    if (MyIoWorkerId == -1)
+        elog(ERROR, "couldn't find a free worker ID");

I'd probably add a comment saying "/* find lowest unused worker ID */" or
such, that was more immediately obvious in the old code.

Done.
+/*
+ * Check if this backend is allowed to time out, and thus should use a
+ * non-infinite sleep time. Only the highest-numbered worker is allowed to
+ * time out, and only if the pool is above io_min_workers. Serializing
+ * timeouts keeps IDs in a range 0..N without gaps, and avoids undershooting
+ * io_min_workers.

But it's ok if a lower numbered worker errors out, right? There will be a
temporary gap, but we will start a new worker for it?

Yes, it is OK for there to be gaps.
If any worker errors out, it will be replaced when reaped if we fell
below io_min_workers, and otherwise replaced via the usual means, ie
once the backlog detection and the launch delay allow it. I did have
a version that always replaced *every* worker with exit code 1
immediately, but I started wondering if we really want persistent
errors to turn into high speed fork() loops. I'm still not sure TBH.
We don't expect workers to error out, so it means something is already
pretty screwed up and you might appreciate the rate limiting?
I have an always-replace patch somewhere, as I've vacillated on that
point a couple of times. I will post a separate fixup for
consideration.
Does that happen even
if there's a shrink of the set of required workers at the same time as a lower
numbered worker errors out?
If a worker errors out (exit code 1) and an idle worker timed out
(exit code 0), then it's no different: if the new count dropped below
io_min_workers, we start a worker immediately after reaping the process.
Otherwise we let the normal algorithm decide to start a new worker
if/when required.
@@ -439,10 +666,9 @@ IoWorkerMain(const void *startup_data, size_t startup_data_len)
     while (!ShutdownRequestPending)
     {
         uint32      io_index;
-        Latch      *latches[IO_WORKER_WAKEUP_FANOUT];
-        int         nlatches = 0;
-        int         nwakeups = 0;
-        int         worker;
+        int         worker = -1;
+        int         queue_depth = 0;
+        bool        grow = false;

         /*
          * Try to get a job to do.
@@ -453,38 +679,64 @@ IoWorkerMain(const void *startup_data, size_t startup_data_len)
         LWLockAcquire(AioWorkerSubmissionQueueLock, LW_EXCLUSIVE);
         if ((io_index = pgaio_worker_submission_queue_consume()) == -1)
         {
-            /*
-             * Nothing to do. Mark self idle.
-             *
-             * XXX: Invent some kind of back pressure to reduce useless
-             * wakeups?
-             */
-            io_worker_control->idle_worker_mask |= (UINT64_C(1) << MyIoWorkerId);
+            /* Nothing to do. Mark self idle. */
+            pgaio_worker_set_insert(&io_worker_control->idle_worker_set,
+                                    MyIoWorkerId);
         }
         else
         {
             /* Got one. Clear idle flag. */
-            io_worker_control->idle_worker_mask &= ~(UINT64_C(1) << MyIoWorkerId);
+            pgaio_worker_set_remove(&io_worker_control->idle_worker_set,
+                                    MyIoWorkerId);

Wonder if we should keep track of whether we marked ourselves idle to avoid
needing to do that. But that would be a separate optimization really.

Fair point. OK.
+            /*
+             * See if we should wake up a higher numbered peer. Only do that
+             * if this worker is not receiving spurious wakeups itself.

The "not receiving spurious wakeups" condition is wakeups < ios?

Yes, see new comment near PGAIO_WORKER_WAKEUP_RATIO_SATURATE.

I think both "wakeups" and "ios" are a bit too generically named. Based on the
names I have no idea what this heuristic might be.

I have struggled to name them. Does wakeup_count and io_count help?
+             * This heuristic tries to discover the useful wakeup propagation
+             * chain length when IOs are very fast and workers wake up to find
+             * that all IOs have already been taken.
+             *
+             * If we chose not to wake a worker when we ideally should have,
+             * the ratio will soon be corrected.
+             */
+            if (wakeups <= ios)
             {
+                queue_depth = pgaio_worker_submission_queue_depth();
+                if (queue_depth > 0)
+                {
+                    worker = pgaio_worker_choose_idle(MyIoWorkerId + 1);

Is it a problem that we are passing an ID that's potentially bigger than the
biggest legal worker ID? It's probably fine as long as MAX_WORKERS is 32 and
the bitmap is a 64bit integer, but ...

Oof. Fixed.
+                    /*
+                     * If there were no idle higher numbered peers and there
+                     * are more than enough IOs queued for me and all lower
+                     * numbered peers, then try to start a new worker.
+                     */
+                    if (worker == -1 && queue_depth > MyIoWorkerId)
+                        grow = true;
+                }

We probably shouldn't request growth when already at the cap? That could
generate a *lot* of pmsignal traffic, I think?

No, we only set it if it isn't already set (like a latch), and only
send a pmsignal when we set it (like a latch), and the postmaster only
clears it if it can start a worker (unlike a latch). That applies in
general, not just when we hit the cap of io_max_workers: while the
postmaster is waiting for the launch interval to expire, it will leave the
flag set, suppressed for 100ms or whatever, and in the special
case of io_max_workers, for as long as the count remains that high.
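The latch-like protocol described here can be condensed into a sketch (illustrative names; the shared-memory flag and SendPostmasterSignal() are stubbed with plain globals): workers write and signal only on a false-to-true edge, and the postmaster clears the flag only when it can actually launch.

```c
#include <stdbool.h>

static bool grow_flag;          /* stand-in for io_worker_control->grow */
static int  pmsignals_sent;     /* counts stand-in SendPostmasterSignal() */

void
request_grow(void)
{
    if (!grow_flag)             /* avoid dirtying memory / re-signalling */
    {
        grow_flag = true;
        pmsignals_sent++;       /* edge-triggered signal to postmaster */
    }
}

/* Postmaster side: consume the request only if a launch is possible now. */
bool
pm_consume_grow(bool can_launch_now)
{
    if (grow_flag && can_launch_now)
    {
        grow_flag = false;
        return true;
    }
    return false;               /* flag (if set) stays pending */
}
```

While the launch interval hasn't expired, or the pool is at io_max_workers, the flag simply stays set and no further signals are generated, which is the wakeup-suppression property being claimed above.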
I don't have an immediate intuitive understanding of why the submission queue
depth is a good measure here.

If there are 10 workers that are busy 100% of the time, and the submission
queue is usually 6 deep with not-being-worked-on IOs, why do we not want to
start more workers?

It actually seems to work - but I don't actually understand why.
I should have made it clearer that that's a secondary condition. The
primary condition is: a worker wanted to wake another worker, but
found that none were idle. Unfortunately the whole system is a bit
too asynchronous for that to be a reliable cue on its own. So, I also
check if the queue appears to be (1) obviously growing: that's clearly
too long and must be introducing latency, or (2) varying "too much".
Which I detect in exactly the same way.
Imagine a histogram that looks like this:
LOG: depth 00: 7898
LOG: depth 01: 1630
LOG: depth 02: 308
LOG: depth 03: 93
LOG: depth 04: 40
LOG: depth 05: 19
LOG: depth 06: 6
LOG: depth 07: 4
LOG: depth 08: 0
LOG: depth 09: 1
LOG: depth 10: 1
LOG: depth 11: 0
LOG: depth 12: 0
LOG: depth 13: 0
If you're failing to find idle workers to wake up AND our magic
threshold is hit by something in that long tail, then it'll call for
backup. Of course I'm totally sidestepping a lot of queueing theory
maths and just saying "I'd better be able to find an idle worker when
I want to" and if not, "there had better not be any outliers that
reach this far".
I've written a longer explanation in a long comment. Including a
little challenge for someone to do better with real science and maths.
I hope it's a bit clearer at least.
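To make the two-part condition above concrete, here's a condensed sketch of the grow decision (illustrative function, not the patch's code): only call for backup when (a) we wanted to wake an idle peer and found none, and (b) the queue is deeper than this worker's ID, i.e. deeper than the number of lower-numbered peers that could plausibly drain it.

```c
#include <stdbool.h>

/*
 * my_worker_id: caller's ID; lower-numbered peers are 0..my_worker_id-1.
 * found_idle_peer: pgaio_worker_choose_idle() succeeded (stubbed as a bool).
 * queue_depth: current submission queue depth.
 */
bool
should_request_grow(int my_worker_id, bool found_idle_peer, int queue_depth)
{
    if (found_idle_peer)
        return false;           /* an idle worker can absorb the backlog */

    /* No idle peers: grow only if the tail outruns the lower-numbered pool. */
    return queue_depth > my_worker_id;
}
```

So worker 2 seeing 3 queued IOs with nobody idle requests growth, but worker 4 seeing the same 3 IOs does not, since workers 0..3 should be able to cover them.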
ninja install-test-files

io_max_workers=32
debug_io_direct=data
effective_io_concurrency=16
shared_buffers=5GB

pgbench -i -q -s 100 --fillfactor=30

CREATE EXTENSION IF NOT EXISTS test_aio;
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
DROP TABLE IF EXISTS pattern_random_pgbench;
CREATE TABLE pattern_random_pgbench AS SELECT ARRAY(SELECT random(0, pg_relation_size('pgbench_accounts')/8192 - 1)::int4 FROM generate_series(1, pg_relation_size('pgbench_accounts')/8192)) AS pattern;

My test is:

SET effective_io_concurrency = 20;
SELECT pg_buffercache_evict_relation('pgbench_accounts');
SELECT read_stream_for_blocks('pgbench_accounts', pattern) FROM pattern_random_pgbench LIMIT 1;

We end up with ~24-28 workers, even though we never have more than 20 IOs in
flight. Not entirely sure why. I guess it's just that after doing an IO the
worker needs to mark itself idle etc?
Yep. It would be nice to make it a bit more accurate in later cycles.
It tends to overprovision rather than under, since it thinks all other
workers are busy. That information is a bit racy. In this version
I've made a small improvement: it uses nworkers directly, under the
big new comment, instead of an unnecessarily complicated
approximation.
        if (io_index != -1)
        {
            PgAioHandle *ioh = NULL;

+            /* Cancel timeout and update wakeup:work ratio. */
+            idle_timeout_abs = 0;
+            if (++ios == PGAIO_WORKER_STATS_MAX)
+            {
+                wakeups /= 2;
+                ios /= 2;
+            }

/* Saturation for counters used to estimate wakeup:work ratio. */
#define PGAIO_WORKER_STATS_MAX 4

STATS_MAX sounds like it's just about some reporting or such.

I have renamed it to PGAIO_WORKER_RATIO_MAX and written a big comment
at the top to explain what it's for.
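The halving trick being renamed here is worth spelling out: both counters are halved whenever either reaches the cap, which preserves their ratio while exponentially decaying old history. A standalone sketch (the cap name follows the new naming from this discussion; the counter names are the proposed wakeup_count/io_count):

```c
#define PGAIO_WORKER_RATIO_MAX 4    /* saturation cap from the discussion */

static int wakeup_count;            /* wakeups received */
static int io_count;                /* IOs actually found and executed */

static void
decay_if_saturated(void)
{
    /* Halve both counters together: keeps the ratio, forgets old history. */
    if (wakeup_count == PGAIO_WORKER_RATIO_MAX ||
        io_count == PGAIO_WORKER_RATIO_MAX)
    {
        wakeup_count /= 2;
        io_count /= 2;
    }
}

void
count_io(void)
{
    ++io_count;
    decay_if_saturated();
}

void
count_wakeup(void)
{
    ++wakeup_count;
    decay_if_saturated();
}
```

The `wakeups <= ios` test in the worker loop then enables wakeup propagation only while wakeups have mostly been paying off; a burst of spurious wakeups quickly flips the ratio and suppresses fanout.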
ioh = &pgaio_ctl->io_handles[io_index];
error_ioh = ioh;
errcallback.arg = ioh;
@@ -537,6 +789,14 @@ IoWorkerMain(const void *startup_data, size_t startup_data_len)
}
#endif

+#ifdef PGAIO_WORKER_SHOW_PS_INFO
+            sprintf(cmd, "%d: [%s] %s",
+                    MyIoWorkerId,
+                    pgaio_io_get_op_name(ioh),
+                    pgaio_io_get_target_description(ioh));
+            set_ps_display(cmd);
+#endif

Note that this leaks memory. See the target_description comment:
/*
* Return a stringified description of the IO's target.
*
* The string is localized and allocated in the current memory context.
*/
Fixed.
            /*
             * We don't expect this to ever fail with ERROR or FATAL, no need
             * to keep error_ioh set to the IO.
@@ -550,8 +810,75 @@ IoWorkerMain(const void *startup_data, size_t startup_data_len)
         }
         else
         {
-            WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, -1,
-                      WAIT_EVENT_IO_WORKER_MAIN);
+            int         timeout_ms;
+
+            /* Cancel new worker request if pending. */
+            pgaio_worker_grow(false);

That seems to happen very frequently.

Yeah, but it doesn't write to memory after someone else does it. This
again is part of the strategy for preventing excess workers from being
created, if I've found the queue to be empty.
+            /*
+             * All workers maintain the absolute timeout value, but only
+             * the highest worker can actually time out and only if
+             * io_min_workers is satisfied. All others wait only for
+             * explicit wakeups caused by queue insertion, wakeup
+             * propagation, change of pool size (possibly promoting one to
+             * new highest) or GUC reload.
+             */
+            if (pgaio_worker_can_timeout())
+                timeout_ms =
+                    TimestampDifferenceMilliseconds(now,
+                                                    idle_timeout_abs);
+            else
+                timeout_ms = -1;

Hm. This way you get very rapid worker pool reductions. Configured
io_worker_idle_timeout=1s, started a bunch of work and observed the worker
count after the work finishes:

Mon 06 Apr 2026 02:08:28 PM EDT (every 1s)

 count
    32
(1 row)

Mon 06 Apr 2026 02:08:29 PM EDT (every 1s)

 count
    32
(1 row)

Mon 06 Apr 2026 02:08:30 PM EDT (every 1s)

 count
     1
(1 row)

Mon 06 Apr 2026 02:08:31 PM EDT (every 1s)

 count
     1
(1 row)

Of course this is a ridiculously low setting, but it does seem like starting
the timeout even when not the highest numbered worker will lead to a lot of
quick yoyoing.
I have changed it so that after one worker times out, the next one
begins its timeout count from 0. (This is one of the reasons for that
"notify the whole pool when I exit" thing.)
Attachments:
v5-0001-aio-Adjust-I-O-worker-pool-size-automatically.patchtext/x-patch; charset=US-ASCII; name=v5-0001-aio-Adjust-I-O-worker-pool-size-automatically.patchDownload+751-146
Hi,
On 2026-04-07 22:39:37 +1200, Thomas Munro wrote:
@@ -1582,14 +1584,16 @@ DetermineSleepTime(void)
             return seconds * 1000;
         }
-        else
-            return 60 * 1000;
     }

-    if (StartWorkerNeeded)
+    /* Time of next maybe_start_io_workers() call, or 0 for none. */
+    next_wakeup = maybe_start_io_workers_scheduled_at();
+
+    /* Ignore bgworkers during shutdown. */
+    if (StartWorkerNeeded && Shutdown == NoShutdown)
         return 0;

Why is the maybe_start_io_workers_scheduled_at() thing before the return 0
here?

Seems OK? I mean sure I would like to make this whole function more
uniform in structure, see my second patch, but...

It's ok, there just doesn't seem to be a point in doing it before that if,
rather than just after...
+static int
+pgaio_worker_set_get_highest(PgAioWorkerSet *set)
+{
+    Assert(!pgaio_worker_set_is_empty(set));
+    return pg_leftmost_one_pos64(*set);
+}

"worker_set_get*" reads quite awkwardly. Maybe just going for
pgaio_workerset_* would help?

Or maybe just name it PgAioWset/pgaio_wset_ or such?

OK let's try "workerset".
Looks better.
Maybe just name it pgaio_worker_set_grow()?
OK how about:
pgaio_worker_request_grow()
pgaio_worker_cancel_grow()
WFM.
@@ -252,19 +438,15 @@ pgaio_worker_submit(uint16 num_staged_ios, PgAioHandle **staged_ios)
             break;
         }

-        if (wakeup == NULL)
-        {
-            /* Choose an idle worker to wake up if we haven't already. */
-            worker = pgaio_worker_choose_idle();
-            if (worker >= 0)
-                wakeup = io_worker_control->workers[worker].latch;
-
-            pgaio_debug_io(DEBUG4, staged_ios[i],
-                           "choosing worker %d",
-                           worker);
-        }
+        /* Choose one worker to wake for this batch. */
+        if (worker == -1)
+            worker = pgaio_worker_choose_idle(0);
     }

If we only want to do this once per "batch", why not just do it outside the
num_staged_ios loop?

Two steps: pgaio_worker_choose_idle() must be done while holding the
queue lock (will probably finish up revising this in future work on
removing locks...). pgaio_worker_wake() is called outside the loop,
after releasing the lock.
I just meant doing it outside the for loop.
for (int i = 0; i < num_staged_ios; ++i)
{
Assert(!pgaio_worker_needs_synchronous_execution(staged_ios[i]));
if (!pgaio_worker_submission_queue_insert(staged_ios[i]))
{
/*
* Do the rest synchronously. If the queue is full, give up
* and do the rest synchronously. We're holding an exclusive
* lock on the queue so nothing can consume entries.
*/
synchronous_ios = &staged_ios[i];
nsync = (num_staged_ios - i);
break;
}
/* Choose one worker to wake for this batch. */
if (worker == -1)
worker = pgaio_worker_choose_idle(-1);
}
The if (worker == -1) is done for every to be submitted IO. If there are no
idle workers, we'd redo the pgaio_worker_choose_idle() every time. ISTM it
should just be:
for (int i = 0; i < num_staged_ios; ++i)
{
Assert(!pgaio_worker_needs_synchronous_execution(staged_ios[i]));
if (!pgaio_worker_submission_queue_insert(staged_ios[i]))
{
/*
* Do the rest synchronously. If the queue is full, give up
* and do the rest synchronously. We're holding an exclusive
* lock on the queue so nothing can consume entries.
*/
synchronous_ios = &staged_ios[i];
nsync = (num_staged_ios - i);
break;
}
}
/* Choose one worker to wake for this batch. */
if (worker == -1)
worker = pgaio_worker_choose_idle(-1);
@@ -295,14 +474,27 @@ pgaio_worker_submit(uint16 num_staged_ios, PgAioHandle **staged_ios)
 static void
 pgaio_worker_die(int code, Datum arg)
 {
[...]
+    /* Notify other workers on pool change. */

Why are we notifying them on pool changes?

Comments added to explain. It closes a wakeup-loss race (imagine if
you consumed a wakeup while you were exiting due to timeout; no one
else would wake up, which I fixed with this big hammer).

Thanks, looks a lot clearer now.
+/*
+ * Check if this backend is allowed to time out, and thus should use a
+ * non-infinite sleep time. Only the highest-numbered worker is allowed to
+ * time out, and only if the pool is above io_min_workers. Serializing
+ * timeouts keeps IDs in a range 0..N without gaps, and avoids undershooting
+ * io_min_workers.

But it's ok if a lower numbered worker errors out, right? There will be a
temporary gap, but we will start a new worker for it?

Yes, it is OK for there to be gaps.
If any worker errors out, it will be replaced when reaped if we fell
below io_min_workers, and otherwise replaced via the usual means, ie
once the backlog detection and the launch delay allow it. I did have
a version that always replaced *every* worker with exit code 1
immediately, but I started wondering if we really want persistent
errors to turn into high speed fork() loops. I'm still not sure TBH.
We don't expect workers to error out, so it means something is already
pretty screwed up and you might appreciate the rate limiting?
Yea, I think it's saner not to do that.
I think both "wakeups" and "ios" are a bit too generically named. Based on the
names I have no idea what this heuristic might be.

I have struggled to name them. Does wakeup_count and io_count help?

hist_wakeups, hist_ios?
+                    /*
+                     * If there were no idle higher numbered peers and there
+                     * are more than enough IOs queued for me and all lower
+                     * numbered peers, then try to start a new worker.
+                     */
+                    if (worker == -1 && queue_depth > MyIoWorkerId)
+                        grow = true;
+                }

We probably shouldn't request growth when already at the cap? That could
generate a *lot* of pmsignal traffic, I think?

No, we only set it if it isn't already set (like a latch), and only
send a pmsignal when we set it (like a latch), and the postmaster only
clears it if it can start a worker (unlike a latch). That applies in
general, not just when we hit the cap of io_max_workers: while the
postmaster is waiting for the launch interval to expire, it will leave the
flag set, suppressed for 100ms or whatever, and in the special
case of io_max_workers, for as long as the count remains that high.
I'm quite certain that's not how it actually ended up working with the prior
version and the benchmark I showed, there indeed were a lot of requests to
postmaster. I think it's because pgaio_worker_cancel_grow() (forgot the old
name already) very frequently clears the flag, just for it to be immediately
set again.
Yep, still happens, does require the max to be smaller than 32 though.
While a lot of IO is happening, no new connections being started, and with
1781562 being postmaster's pid:
perf stat --no-inherit -p 1781562 -e raw_syscalls:sys_enter -r 0 sleep 1
Performance counter stats for process id '1781562':
2,790 raw_syscalls:sys_enter
1.001872667 seconds time elapsed
2,814 raw_syscalls:sys_enter
1.001983049 seconds time elapsed
3,036 raw_syscalls:sys_enter
1.001705850 seconds time elapsed
2,982 raw_syscalls:sys_enter
1.001881364 seconds time elapsed
I think it may need a timestamp in the shared state to not allow another
postmaster wake until some time has elapsed, or something.
I should have made it clearer that that's a secondary condition. The
primary condition is: a worker wanted to wake another worker, but
found that none were idle. Unfortunately the whole system is a bit
too asynchronous for that to be a reliable cue on its own. So, I also
check if the queue appears to be (1) obviously growing: that's clearly
too long and must be introducing latency, or (2) varying "too much".
Which I detect in exactly the same way.

Imagine a histogram that looks like this:
LOG: depth 00: 7898
LOG: depth 01: 1630
LOG: depth 02: 308
LOG: depth 03: 93
LOG: depth 04: 40
LOG: depth 05: 19
LOG: depth 06: 6
LOG: depth 07: 4
LOG: depth 08: 0
LOG: depth 09: 1
LOG: depth 10: 1
LOG: depth 11: 0
LOG: depth 12: 0
LOG: depth 13: 0

If you're failing to find idle workers to wake up AND our magic
threshold is hit by something in that long tail, then it'll call for
backup. Of course I'm totally sidestepping a lot of queueing theory
maths and just saying "I'd better be able to find an idle worker when
I want to" and if not, "there had better not be any outliers that
reach this far".

I've written a longer explanation in a long comment. Including a
little challenge for someone to do better with real science and maths.
I hope it's a bit clearer at least.
Definitely good to have that comment. Have to ponder it for a bit.
ninja install-test-files
io_max_workers=32
debug_io_direct=data
effective_io_concurrency=16
shared_buffers=5GB

pgbench -i -q -s 100 --fillfactor=30
CREATE EXTENSION IF NOT EXISTS test_aio;
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
DROP TABLE IF EXISTS pattern_random_pgbench;
CREATE TABLE pattern_random_pgbench AS SELECT ARRAY(SELECT random(0, pg_relation_size('pgbench_accounts')/8192 - 1)::int4 FROM generate_series(1, pg_relation_size('pgbench_accounts')/8192)) AS pattern;

My test is:
SET effective_io_concurrency = 20;
SELECT pg_buffercache_evict_relation('pgbench_accounts');
SELECT read_stream_for_blocks('pgbench_accounts', pattern) FROM pattern_random_pgbench LIMIT 1;

We end up with ~24-28 workers, even though we never have more than 20 IOs in
flight. Not entirely sure why. I guess it's just that after doing an IO the
worker needs to mark itself idle etc?

Yep. It would be nice to make it a bit more accurate in later cycles.
It tends to overprovision rather than under, since it thinks all other
workers are busy.
I think that's the right direction to err into.
That information is a bit racy.
Yea, I think that's fine.
Hm. This way you get very rapid worker pool reductions. Configured
io_worker_idle_timeout=1s, started a bunch of work and observed the worker
count after the work finishes:
...
Of course this is a ridiculously low setting, but it does seem like starting
the timeout even when not the highest numbered worker will lead to a lot of
quick yoyoing.

I have changed it so that after one worker times out, the next one
begins its timeout count from 0. (This is one of the reasons for that
"notify the whole pool when I exit" thing.)
That looks much better in a quick test.
I've not again looked through the details, but based on a relatively short
experiment, the one problematic thing I see is the frequent postmaster
requests.
Greetings,
Andres Freund
On Wed, Apr 8, 2026 at 7:01 AM Andres Freund <andres@anarazel.de> wrote:
The if (worker == -1) is done for every to-be-submitted IO. If there are no
idle workers, we'd redo the pgaio_worker_choose_idle() every time. ISTM it
should just be:

for (int i = 0; i < num_staged_ios; ++i)
{
Assert(!pgaio_worker_needs_synchronous_execution(staged_ios[i]));
if (!pgaio_worker_submission_queue_insert(staged_ios[i]))
{
/*
 * The queue is full, so give up and do the rest
 * synchronously. We're holding an exclusive lock on the
 * queue so nothing can consume entries.
 */
synchronous_ios = &staged_ios[i];
nsync = (num_staged_ios - i);
break;
}
}

/* Choose one worker to wake for this batch. */
if (worker == -1)
worker = pgaio_worker_choose_idle(-1);
Well I didn't want to wake a worker if we'd failed to enqueue
anything. Ahh, I could put it there and test nsync. Or I guess I
could just do it anyway. Considering that.
I think both "wakeups" and "ios" are a bit too generically named. Based on the
names I have no idea what this heuristic might be.

I have struggled to name them. Does wakeup_count and io_count help?
hist_wakeups, hist_ios?
Thanks, that's a good name.
No, we only set it if it isn't already set (like a latch), and only
send a pmsignal when we set it (like a latch), and the postmaster only
clears it if it can start a worker (unlike a latch). That applies in
general, not just when we hit the cap of io_max_workers: while the
postmaster is waiting for the launch interval to expire, it will leave the
flag set, suppressed for 100ms or whatever, and then in the special
case of io_max_workers, for as long as the count remains that high.

I'm quite certain that's not how it actually ended up working with the prior
version and the benchmark I showed, there indeed were a lot of requests to
postmaster. I think it's because pgaio_worker_cancel_grow() (forgot the old
name already) very frequently clears the flag, just for it to be immediately
set again.

Yep, still happens, does require the max to be smaller than 32 though.
While a lot of IO is happening, no new connections being started, and with
1781562 being postmaster's pid:

perf stat --no-inherit -p 1781562 -e raw_syscalls:sys_enter -r 0 sleep 1
Performance counter stats for process id '1781562':
2,790 raw_syscalls:sys_enter
1.001872667 seconds time elapsed
2,814 raw_syscalls:sys_enter
1.001983049 seconds time elapsed
3,036 raw_syscalls:sys_enter
1.001705850 seconds time elapsed
2,982 raw_syscalls:sys_enter
1.001881364 seconds time elapsed
I think it may need a timestamp in the shared state to not allow another
postmaster wake until some time has elapsed, or something.
Hnng. Studying...
I should have made it clearer that that's a secondary condition. The
primary condition is: a worker wanted to wake another worker, but
found that none were idle. Unfortunately the whole system is a bit
too asynchronous for that to be a reliable cue on its own. So, I also
check if the queue appears to be (1) obviously growing: that's clearly
too long and must be introducing latency, or (2) varying "too much".
Which I detect in exactly the same way.

Imagine a histogram that looks like this:
LOG: depth 00: 7898
LOG: depth 01: 1630
LOG: depth 02: 308
LOG: depth 03: 93
LOG: depth 04: 40
LOG: depth 05: 19
LOG: depth 06: 6
LOG: depth 07: 4
LOG: depth 08: 0
LOG: depth 09: 1
LOG: depth 10: 1
LOG: depth 11: 0
LOG: depth 12: 0
LOG: depth 13: 0

If you're failing to find idle workers to wake up AND our magic
threshold is hit by something in that long tail, then it'll call for
backup. Of course I'm totally sidestepping a lot of queueing theory
maths and just saying "I'd better be able to find an idle worker when
I want to" and if not, "there had better not be any outliers that
reach this far".

I've written a longer explanation in a long comment. Including a
little challenge for someone to do better with real science and maths.
I hope it's a bit clearer at least.

Definitely good to have that comment. Have to ponder it for a bit.
Let me try again.
Our goal is simple: process every IO immediately. We have immediate
feedback that is simple: there's an IO in the queue and there is no
idle worker. The only action we can take is simple: add one more
worker. So we don't need to suffer through the maths required to
figure out the ideal k for our M/G/k queue system (I think that's what
we have?) or any of the inputs it would require*. The problem is
that on its own, the test triggered far too easily because a worker
that is not marked idle might in fact be just about to pick up that IO
on the one hand, and because there might be rare
spikes/clustering on the other, so I cooled it off a bit by
additionally testing if the queue appears to be growing or spiking
beyond some threshold. I think it's OK to let the queue grow a bit
before we are triggered anyway, so the precise value used doesn't seem
too critical. Someone might be able to come up with a more defensible
value, but in the end I just wanted a value that isn't triggered by
the outliers I see in real systems that are keeping up. We could tune
it lower and overshoot more, but this setting seems to work pretty
well. It doesn't seem likely that a real system could achieve a
steady state that is introducing latency but isn't increasing over
time, and pool size adjustments are bound to lag anyway.
* It's probably quite hard for call centres to figure out the number
of agents required to make you wait for a certain length of time, but
it's easy to know if you had to wait and you wish they had more!
I've not again looked through the details, but based on a relatively short
experiment, the one problematic thing I see is the frequent postmaster
requests.
Looking into that...
Hi,
On 2026-04-08 11:18:51 +1200, Thomas Munro wrote:
On Wed, Apr 8, 2026 at 7:01 AM Andres Freund <andres@anarazel.de> wrote:
The if (worker == -1) is done for every to-be-submitted IO. If there are no
idle workers, we'd redo the pgaio_worker_choose_idle() every time. ISTM it
should just be:

for (int i = 0; i < num_staged_ios; ++i)
{
Assert(!pgaio_worker_needs_synchronous_execution(staged_ios[i]));
if (!pgaio_worker_submission_queue_insert(staged_ios[i]))
{
/*
 * The queue is full, so give up and do the rest
 * synchronously. We're holding an exclusive lock on the
 * queue so nothing can consume entries.
 */
synchronous_ios = &staged_ios[i];
nsync = (num_staged_ios - i);
break;
}
}

/* Choose one worker to wake for this batch. */
if (worker == -1)
worker = pgaio_worker_choose_idle(-1);

Well I didn't want to wake a worker if we'd failed to enqueue
anything.
I think it's worth waking up workers if there are idle ones and the queue is
full?
No, we only set it if it isn't already set (like a latch), and only
send a pmsignal when we set it (like a latch), and the postmaster only
clears it if it can start a worker (unlike a latch). That applies in
general, not just when we hit the cap of io_max_workers: while the
postmaster is waiting for the launch interval to expire, it will leave the
flag set, suppressed for 100ms or whatever, and then in the special
case of io_max_workers, for as long as the count remains that high.

I'm quite certain that's not how it actually ended up working with the prior
version and the benchmark I showed, there indeed were a lot of requests to
postmaster. I think it's because pgaio_worker_cancel_grow() (forgot the old
name already) very frequently clears the flag, just for it to be immediately
set again.

Yep, still happens, does require the max to be smaller than 32 though.
While a lot of IO is happening, no new connections being started, and with
1781562 being postmaster's pid:

perf stat --no-inherit -p 1781562 -e raw_syscalls:sys_enter -r 0 sleep 1
2,982 raw_syscalls:sys_enter
1.001881364 seconds time elapsed
I think it may need a timestamp in the shared state to not allow another
postmaster wake until some time has elapsed, or something.

Hnng. Studying...
I suspect the primary reason is that pgaio_worker_request_grow() is triggered
even when io_worker_control->nworkers is >= io_max_workers.
I suspect there's also pingpong between submission not finding any workers
idle, requesting growth, and workers being idle for a short period, then the
same thing starting again.
Seems like there should be two fields. One saying "notify postmaster again"
and one "postmaster start a worker". The former would only be cleared by
postmaster after the timeout.
Our goal is simple: process every IO immediately. We have immediate
feedback that is simple: there's an IO in the queue and there is no
idle worker. The only action we can take is simple: add one more
worker. So we don't need to suffer through the maths required to
figure out the ideal k for our M/G/k queue system (I think that's what
we have?) or any of the inputs it would require*. The problem is
that on its own, the test triggered far too easily because a worker
that is not marked idle might in fact be just about to pick up that IO
Is that case really concerning? As long as you have some rate limiting about
the start rate, starting another worker when there are no idle workers seems
harmless? Afaict it's fairly self limiting.
on the one hand, and because there might be rare
spikes/clustering on the other, so I cooled it off a bit by
additionally testing if the queue appears to be growing or spiking
beyond some threshold. I think it's OK to let the queue grow a bit
before we are triggered anyway, so the precise value used doesn't seem
too critical. Someone might be able to come up with a more defensible
value, but in the end I just wanted a value that isn't triggered by
the outliers I see in real systems that are keeping up. We could tune
it lower and overshoot more, but this setting seems to work pretty
well. It doesn't seem likely that a real system could achieve a
steady state that is introducing latency but isn't increasing over
time, and pool size adjustments are bound to lag anyway.
Yea, I don't think the precise logic matters that much as long as we ramp up
reasonably fast without being crazy and ramp down a bit slower.
Greetings,
Andres Freund
I changed pgaio_worker_request_grow() not to bother the postmaster
unless nworkers < io_max_workers.
I move that code you wanted outside the loop and did:
/* Choose one worker to wake for this batch. */
if (nsync < num_staged_ios)
worker = pgaio_worker_choose_idle(-1);
I took your suggestion for the names hist_wakeups and hist_ios.
For the location of the following line, I preferred not to separate
the pre-existing tests of StartWorkerNeeded and HaveCrashedWorker,
since they belong together as bgworker concerns.
next_wakeup = maybe_start_io_workers_scheduled_at();
I think I've run out of reasons not to commit this, unless your
pondering of the grow-trigger heuristics revealed a problem?