Invalid memory access in pg_stat_get_subscription
Hello hackers,
While exploring some code in the logical replication worker
implementation, I noticed that we're accessing invalid memory while
traversing LogicalRepCtx->workers[i].
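For reference, the structure in question looks roughly like this in
src/backend/replication/logical/launcher.c (a simplified sketch; the
field list is abridged):

    typedef struct LogicalRepCtxStruct
    {
        /* Supervisor process. */
        pid_t       launcher_pid;

        /* Background workers, sized by max_logical_replication_workers. */
        LogicalRepWorker workers[FLEXIBLE_ARRAY_MEMBER];
    } LogicalRepCtxStruct;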
For the above structure, ApplyLauncherShmemSize allocates room for
exactly max_logical_replication_workers LogicalRepWorker entries. But
the for loop runs up to index max_logical_replication_workers
inclusive, reading one element past the end of the array, which can
result in random crashes.
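To illustrate the mismatch, here is a simplified sketch of the sizing
function next to the loop bound (abridged from launcher.c):

    Size
    ApplyLauncherShmemSize(void)
    {
        Size        size;

        size = MAXALIGN(sizeof(LogicalRepCtxStruct));
        size = add_size(size, mul_size(max_logical_replication_workers,
                                       sizeof(LogicalRepWorker)));
        return size;
    }

    /* ...while pg_stat_get_subscription iterates one slot too far: */
    for (i = 0; i <= max_logical_replication_workers; i++)
        ...    /* workers[max_logical_replication_workers] is past the end */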
Please find the patch that fixes the issue. I'm not sure whether we
should add a regression test for the same.
--
Thanks & Regards,
Kuntal Ghosh
Attachments:
0001-Fix-terminating-condition-while-fetching-the-replica.patch
From f8efb6e98bd99e544542e0060d621109ef2bdefb Mon Sep 17 00:00:00 2001
From: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Date: Tue, 7 Jun 2022 22:24:55 +0530
Subject: [PATCH] Fix terminating condition while fetching the replication
worker status
While fetching the replication worker status in
pg_stat_get_subscription, we're accessing invalid memory, resulting in
random crashes.
---
src/backend/replication/logical/launcher.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index bd5f78cf9a..2bdab53e19 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -935,7 +935,7 @@ pg_stat_get_subscription(PG_FUNCTION_ARGS)
 	/* Make sure we get consistent view of the workers. */
 	LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
 
-	for (i = 0; i <= max_logical_replication_workers; i++)
+	for (i = 0; i < max_logical_replication_workers; i++)
 	{
 		/* for each row */
 		Datum		values[PG_STAT_GET_SUBSCRIPTION_COLS];
--
2.27.0
Kuntal Ghosh <kuntalghosh.2007@gmail.com> writes:
> While exploring some code in the logical replication worker
> implementation, I noticed that we're accessing invalid memory while
> traversing LogicalRepCtx->workers[i].
>
> For the above structure, ApplyLauncherShmemSize allocates room for
> exactly max_logical_replication_workers LogicalRepWorker entries. But
> the for loop runs up to index max_logical_replication_workers
> inclusive, reading one element past the end of the array, which can
> result in random crashes.
I concur that that's a bug, but eyeing the code, it seems like an
actual crash would be improbable. Have you seen one? Can you
reproduce it?
> Please find the patch that fixes the issue. I'm not sure whether we
> should add a regression test for the same.
How would you make a stable regression test for that?
regards, tom lane
Hello Tom,
On Wed, Jun 8, 2022 at 12:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kuntal Ghosh <kuntalghosh.2007@gmail.com> writes:
>> While exploring some code in the logical replication worker
>> implementation, I noticed that we're accessing invalid memory while
>> traversing LogicalRepCtx->workers[i].
>>
>> For the above structure, ApplyLauncherShmemSize allocates room for
>> exactly max_logical_replication_workers LogicalRepWorker entries.
>> But the for loop runs up to index max_logical_replication_workers
>> inclusive, reading one element past the end of the array, which can
>> result in random crashes.
>
> I concur that that's a bug, but eyeing the code, it seems like an
> actual crash would be improbable. Have you seen one? Can you
> reproduce it?
Thank you for looking into it. Unfortunately, I'm not able to
reproduce the crash at will, but I did see one while executing the
function. The crash occurred at the following line:

    if (!worker.proc || !IsBackendPid(worker.proc->pid))

    (gdb) p worker.proc
    $6 = (PGPROC *) 0x2bf0b9

The worker.proc pointer was referencing an invalid memory location.
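For context, the loop body copies each worker slot before inspecting
it, roughly like this (simplified from pg_stat_get_subscription). On
the out-of-bounds iteration the memcpy reads past the end of the
workers[] array, so worker.proc can come back as a non-NULL garbage
pointer, and the worker.proc->pid dereference then faults:

    LogicalRepWorker worker;

    memcpy(&worker, &LogicalRepCtx->workers[i],
           sizeof(LogicalRepWorker));

    /* With i == max_logical_replication_workers, worker holds garbage;
     * a non-NULL garbage worker.proc makes this dereference crash. */
    if (!worker.proc || !IsBackendPid(worker.proc->pid))
        continue;

Since the stray slot usually lies within the same shared memory
segment, the read normally just returns junk rather than faulting,
which would explain why the crash is hard to reproduce.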
--
Thanks & Regards,
Kuntal Ghosh