Memory leak of SMgrRelation object on standby

Started by Jingtang Zhang5 months ago9 messages
#1Jingtang Zhang
mrdrivingduck@gmail.com
1 attachment(s)

Hi~ hackers

Back to v17, commit 21d9c3ee gave SMgrRelation a well-defined lifetime, and
smgrclose nolonger removes SMgrRelation object from the hashtable, leaving
the work to smgrdestroyall. But I find a place that relies on the removing
behavior previously, but is still calling smgrclose.

Startup process of standby will redo table dropping with DropRelationFiles,
using smgrdounlinkall to drop buffers and unlink physical files, and then
uses smgrclose to destroy the SMgrRelation object. I think it should use
smgrdestroy here, or the object memory will be leaked.

With concurrent clients, the following pgbench script will produce the
memory leak of a standby startup process easily. Entries will be entered
into the hashtable but never removed.

pgbench -f bench.sql -n -c 32 -j 32 -T 600

```sql
DROP TABLE IF EXISTS tbl:client_id;
CREATE TABLE tbl:client_id (id int);
```

The attached patch export smgrdestroy as a public function, and use it in
DropRelationFiles.


Regards, Jingtang

Attachments:

v1-0001-Fix-SMgrRelation-object-memory-leak-in-DropRelationF.patchapplication/octet-stream; name=v1-0001-Fix-SMgrRelation-object-memory-leak-in-DropRelationF.patch; x-unix-mode=0644Download
From 3d0dcea619b72a2409bb6747ded5a16f7428631d Mon Sep 17 00:00:00 2001
From: Jingtang Zhang <mrdrivingduck@gmail.com>
Date: Fri, 15 Aug 2025 18:11:02 +0800
Subject: [PATCH] Fix SMgrRelation object memory leak in DropRelationFiles

to be done
---
 src/backend/storage/smgr/md.c   | 2 +-
 src/backend/storage/smgr/smgr.c | 3 +--
 src/include/storage/smgr.h      | 1 +
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 2ccb0faceb5..f9d65cb5b65 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1607,7 +1607,7 @@ DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
 	smgrdounlinkall(srels, ndelrels, isRedo);
 
 	for (i = 0; i < ndelrels; i++)
-		smgrclose(srels[i]);
+		smgrdestroy(srels[i]);
 	pfree(srels);
 }
 
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index bce37a36d51..a646c2ef725 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -163,7 +163,6 @@ static dlist_head unpinned_relns;
 
 /* local function prototypes */
 static void smgrshutdown(int code, Datum arg);
-static void smgrdestroy(SMgrRelation reln);
 
 static void smgr_aio_reopen(PgAioHandle *ioh);
 static char *smgr_aio_describe_identity(const PgAioTargetData *sd);
@@ -319,7 +318,7 @@ smgrunpin(SMgrRelation reln)
 /*
  * smgrdestroy() -- Delete an SMgrRelation object.
  */
-static void
+void
 smgrdestroy(SMgrRelation reln)
 {
 	ForkNumber	forknum;
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index 3964d9334b3..3b07b3ceea1 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -82,6 +82,7 @@ extern bool smgrexists(SMgrRelation reln, ForkNumber forknum);
 extern void smgrpin(SMgrRelation reln);
 extern void smgrunpin(SMgrRelation reln);
 extern void smgrclose(SMgrRelation reln);
+extern void smgrdestroy(SMgrRelation reln);
 extern void smgrdestroyall(void);
 extern void smgrrelease(SMgrRelation reln);
 extern void smgrreleaseall(void);
-- 
2.39.3

#2Thomas Munro
thomas.munro@gmail.com
In reply to: Jingtang Zhang (#1)
Re: Memory leak of SMgrRelation object on standby

On Sat, Aug 16, 2025 at 12:50 AM Jingtang Zhang <mrdrivingduck@gmail.com> wrote:

Back to v17, commit 21d9c3ee gave SMgrRelation a well-defined lifetime, and
smgrclose nolonger removes SMgrRelation object from the hashtable, leaving
the work to smgrdestroyall. But I find a place that relies on the removing
behavior previously, but is still calling smgrclose.

Thanks for the report! Replying here rather than the pgsql-bugs
thread because your patch is here.

Startup process of standby will redo table dropping with DropRelationFiles,
using smgrdounlinkall to drop buffers and unlink physical files, and then
uses smgrclose to destroy the SMgrRelation object. I think it should use
smgrdestroy here, or the object memory will be leaked.

DropRelationFiles() is also called by FinishPreparedTransaction(). At
first I thought that might be a problem too, but looking a bit more
closely and trying it out... if a prepared transaction dropped a
table, then it called RelationDropStorage(), RelationCloseSmgr(),
smgrunpin(), and I can't immediately think of any way to repin it
while the relation is locked, so you can't break the assertion about
that in smgrdestroy(), where we make sure there are no Relation
objects with dangling references. It's already left unpinned or never
pinned and later destroyed in both DROP; PREPARE TRANSACTION; ...
COMMIT PREPARED; and CREATE; PREPARE TRANSACTION; ... ROLLBACK
PREPARED sequences, with different details.

Now I'm left wondering if two-phase commit should do this explicitly
or not. For the isRedo case it seems clear, it was the intention of
21d9c3ee to destroy it on commit/abort, which must be here I think.
The following description from the commit message *probably* also fits
the two-phase case, even though it already doesn't leak without it.
It would be good to get a comment explaining the new smgrdestroy()
call, and point to where our tests exercise all relevant cases. Hmm,
it's not immediately obvious how to introspect the cached state to
verify that the memory isn't leaked.

Guarantee that the object won't be destroyed until the end of the
current transaction, or in recovery, the commit/abort record that
destroys the underlying storage.

#3Jingtang Zhang
mrdrivingduck@gmail.com
In reply to: Thomas Munro (#2)
1 attachment(s)
Re: Memory leak of SMgrRelation object on standby

Hi~

Thanks for looking.

DropRelationFiles() is also called by FinishPreparedTransaction(). At
first I thought that might be a problem too, but looking a bit more
closely and trying it out... if a prepared transaction dropped a
table, then it called RelationDropStorage(), RelationCloseSmgr(),
smgrunpin(), and I can't immediately think of any way to repin it
while the relation is locked, so you can't break the assertion about
that in smgrdestroy(), where we make sure there are no Relation
objects with dangling references. It's already left unpinned or never
pinned and later destroyed in both DROP; PREPARE TRANSACTION; ...
COMMIT PREPARED; and CREATE; PREPARE TRANSACTION; ... ROLLBACK
PREPARED sequences, with different details.

Another call of DropRelationFiles seems to have been handled by AtEOXact_SMgr,
so there is no leak w/o patch. I have pgbench'ed in one connection for 10 min
with CREATE; PREPARE TRANSACTION; ... ROLLBACK PREPARED; sequence and there
is no rising of RES.

Now I'm left wondering if two-phase commit should do this explicitly
or not. For the isRedo case it seems clear, it was the intention of
21d9c3ee to destroy it on commit/abort, which must be here I think.
The following description from the commit message *probably* also fits
the two-phase case, even though it already doesn't leak without it.
It would be good to get a comment explaining the new smgrdestroy()
call, and point to where our tests exercise all relevant cases.

Explicit call seems to be a duplication if AtEOXact_SMgr can already handle?
Should be do it only for redo?

Some comment has been added to smgrdestroy() to define when it should be
called inside v2 patch.

it's not immediately obvious how to introspect the cached state to
verify that the memory isn't leaked.

It sure is...The way I'm using is benching for a long time to see if RES is
rising…


Regards, Jingtang

Attachments:

v2-0001-Fix-SMgrRelation-object-memory-leak-in-DropRelationF.patchapplication/octet-stream; name=v2-0001-Fix-SMgrRelation-object-memory-leak-in-DropRelationF.patch; x-unix-mode=0444Download
From 7bfbc9de2f0142584089c6778a09ccb7fe1bc4e9 Mon Sep 17 00:00:00 2001
From: Jingtang Zhang <mrdrivingduck@gmail.com>
Date: Tue, 19 Aug 2025 11:13:54 +0800
Subject: [PATCH] Fix SMgrRelation object memory leak in DropRelationFiles

SMgrRelation object would stay alive until end of a transaction, or by
explicit smgrdestroy* call outside of a transaction by background
processes or WAL redo, after the underlying storage is gone. Currently,
the object is closed but not destroyed during WAL redo, causing memory
leak by those already destroyed relations. Fixed by explicit destroy the
objects at the end of DropRelationFiles.
---
 src/backend/storage/smgr/md.c   |  2 +-
 src/backend/storage/smgr/smgr.c | 10 ++++++++--
 src/include/storage/smgr.h      |  1 +
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 2ccb0faceb5..f9d65cb5b65 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1607,7 +1607,7 @@ DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
 	smgrdounlinkall(srels, ndelrels, isRedo);
 
 	for (i = 0; i < ndelrels; i++)
-		smgrclose(srels[i]);
+		smgrdestroy(srels[i]);
 	pfree(srels);
 }
 
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index bce37a36d51..f30b8957bd5 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -163,7 +163,6 @@ static dlist_head unpinned_relns;
 
 /* local function prototypes */
 static void smgrshutdown(int code, Datum arg);
-static void smgrdestroy(SMgrRelation reln);
 
 static void smgr_aio_reopen(PgAioHandle *ioh);
 static char *smgr_aio_describe_identity(const PgAioTargetData *sd);
@@ -318,8 +317,15 @@ smgrunpin(SMgrRelation reln)
 
 /*
  * smgrdestroy() -- Delete an SMgrRelation object.
+ *
+ * Remove the object out of hashtable. We must ensure there is no dangling
+ * reference to the object by this time.
+ *
+ * Called at the end of a transaction through AtEOXact_SMgr(). Also called
+ * outside of a transaction through smgrdestroyall() or directly itself,
+ * for background processes and WAL redo.
  */
-static void
+void
 smgrdestroy(SMgrRelation reln)
 {
 	ForkNumber	forknum;
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index 3964d9334b3..3b07b3ceea1 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -82,6 +82,7 @@ extern bool smgrexists(SMgrRelation reln, ForkNumber forknum);
 extern void smgrpin(SMgrRelation reln);
 extern void smgrunpin(SMgrRelation reln);
 extern void smgrclose(SMgrRelation reln);
+extern void smgrdestroy(SMgrRelation reln);
 extern void smgrdestroyall(void);
 extern void smgrrelease(SMgrRelation reln);
 extern void smgrreleaseall(void);
-- 
2.39.3

#4Jingtang Zhang
mrdrivingduck@gmail.com
In reply to: Thomas Munro (#2)
Re: Memory leak of SMgrRelation object on standby

Hi~

On Sat, Aug 16, 2025 at 12:50 AM Jingtang Zhang <mrdrivingduck@gmail.com> wrote:

Back to v17, commit 21d9c3ee gave SMgrRelation a well-defined lifetime, and
smgrclose nolonger removes SMgrRelation object from the hashtable, leaving
the work to smgrdestroyall. But I find a place that relies on the removing
behavior previously, but is still calling smgrclose.

Also, in this situation, should startup process be treated as a background
worker similar to bgwriter/checkpointer and call smgrdestroyall in some
period? Even if startup process begins to call smgrdestroy inside
DropRelationFiles, suppose, there are a lot of transactions keep creating
tables on primary, the startup process of standby will open and create but
do not have any chance to destroy a SMgrRelation object, so the memory
will always grow. It seems to be true even if smgrclose is responsible for
destroying the object previously, because I can't find any smgrclose during
WAL recovery, except for DROP DATABASE which is rarely used in production.


Regards, Jingtang

#5邱宇航
iamqyh@gmail.com
In reply to: Jingtang Zhang (#4)
Re: Memory leak of SMgrRelation object on standby

2025年8月21日 23:07,Jingtang Zhang <mrdrivingduck@gmail.com> 写道:

Also, in this situation, should startup process be treated as a background
worker similar to bgwriter/checkpointer and call smgrdestroyall in some
period?

Agree with that. Maybe we can call smgrdestroyall in startup process when
replaying CHECKPOINT records, just like bgwriter/checkpointer, which free
all smgr objects after any checkpoint.

Best regards,
Yuhang Qiu

#6Jingtang Zhang
mrdrivingduck@gmail.com
In reply to: 邱宇航 (#5)
1 attachment(s)
Re: Memory leak of SMgrRelation object on standby

Hi~

Agree with that. Maybe we can call smgrdestroyall in startup process when
replaying CHECKPOINT records, just like bgwriter/checkpointer, which free
all smgr objects after any checkpoint.

That seems reasonable, in that case a startup process would behave just the
same as bgwriter or checkpointer.

I purpose a patch which calls smgrdestroyall() when redo each
XLOG_CHECKPOINT_ONLINE, so that it can keep the same frequency of calling
smgrdestroyall() as background processes on primary. I don't call it for
XLOG_CHECKPOINT_SHUTDOWN because the process is about to exit so that the
memory will go soon, and don't call it for XLOG_CHECKPOINT_REDO because it
seems to be a place holder only.


Regards, Jingtang

Attachments:

v3-0001-Fix-SMgrRelation-object-memory-leak-during-startup-r.patchapplication/octet-stream; name=v3-0001-Fix-SMgrRelation-object-memory-leak-during-startup-r.patch; x-unix-mode=0644Download
From 605b8a383e54e9c9282d129f49bbe7c8a9f4f04e Mon Sep 17 00:00:00 2001
From: mrdrivingduck <mrdrivingduck@gmail.com>
Date: Mon, 25 Aug 2025 23:46:06 +0800
Subject: [PATCH] Fix SMgrRelation object memory leak during startup redo

---
 src/backend/access/transam/xlog.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7ffb2179151..cadd390db39 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8438,6 +8438,14 @@ xlog_redo(XLogReaderState *record)
 							checkPoint.ThisTimeLineID, replayTLI)));
 
 		RecoveryRestartPoint(&checkPoint, record);
+
+		/*
+		 * After any checkpoint, free all smgr objects.  Otherwise we
+		 * would never do so for dropped relations, as the startup does
+		 * not process shared invalidation messages or call
+		 * AtEOXact_SMgr().
+		 */
+		smgrdestroyall();
 	}
 	else if (info == XLOG_OVERWRITE_CONTRECORD)
 	{
-- 
2.39.5 (Apple Git-154)

#7Jingtang Zhang
mrdrivingduck@gmail.com
In reply to: Jingtang Zhang (#6)
1 attachment(s)
Re: Memory leak of SMgrRelation object on standby

Hi~

I purpose a patch which calls smgrdestroyall() when redo each
XLOG_CHECKPOINT_ONLINE, so that it can keep the same frequency of calling
smgrdestroyall() as background processes on primary. I don't call it for
XLOG_CHECKPOINT_SHUTDOWN because the process is about to exit so that the
memory will go soon, and don't call it for XLOG_CHECKPOINT_REDO because it
seems to be a place holder only.

Oops. When redo XLOG_CHECKPOINT_SHUTDOWN, smgrdestroyall should also be
called, since the startup may not exit on standby.

The patch is updated.


Regards, Jingtang

Attachments:

v4-0001-Fix-SMgrRelation-object-memory-leak-during-startup-r.patchapplication/octet-stream; name=v4-0001-Fix-SMgrRelation-object-memory-leak-during-startup-r.patch; x-unix-mode=0644Download
From 9144987b609767e6fec75aff5dfb64d797a8ed52 Mon Sep 17 00:00:00 2001
From: Jingtang Zhang <mrdrivingduck@gmail.com>
Date: Tue, 26 Aug 2025 11:54:29 +0800
Subject: [PATCH] Fix SMgrRelation object memory leak during startup redo

---
 src/backend/access/transam/xlog.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7ffb2179151..7817255616f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8385,6 +8385,14 @@ xlog_redo(XLogReaderState *record)
 							checkPoint.ThisTimeLineID, replayTLI)));
 
 		RecoveryRestartPoint(&checkPoint, record);
+
+		/*
+		 * After any checkpoint, free all smgr objects.  Otherwise we
+		 * would never do so for dropped relations, as the startup does
+		 * not process shared invalidation messages or call
+		 * AtEOXact_SMgr().
+		 */
+		smgrdestroyall();
 	}
 	else if (info == XLOG_CHECKPOINT_ONLINE)
 	{
@@ -8438,6 +8446,14 @@ xlog_redo(XLogReaderState *record)
 							checkPoint.ThisTimeLineID, replayTLI)));
 
 		RecoveryRestartPoint(&checkPoint, record);
+
+		/*
+		 * After any checkpoint, free all smgr objects.  Otherwise we
+		 * would never do so for dropped relations, as the startup does
+		 * not process shared invalidation messages or call
+		 * AtEOXact_SMgr().
+		 */
+		smgrdestroyall();
 	}
 	else if (info == XLOG_OVERWRITE_CONTRECORD)
 	{
-- 
2.39.5 (Apple Git-154)

#8邱宇航
iamqyh@gmail.com
In reply to: Jingtang Zhang (#7)
Re: Memory leak of SMgrRelation object on standby

2025年8月26日 11:59,Jingtang Zhang <mrdrivingduck@gmail.com> 写道:

Hi~

I purpose a patch which calls smgrdestroyall() when redo each
XLOG_CHECKPOINT_ONLINE, so that it can keep the same frequency of calling
smgrdestroyall() as background processes on primary. I don't call it for
XLOG_CHECKPOINT_SHUTDOWN because the process is about to exit so that the
memory will go soon, and don't call it for XLOG_CHECKPOINT_REDO because it
seems to be a place holder only.

Oops. When redo XLOG_CHECKPOINT_SHUTDOWN, smgrdestroyall should also be
called, since the startup may not exit on standby.

The patch is updated.


Regards, Jingtang

<v4-0001-Fix-SMgrRelation-object-memory-leak-during-startup-r.patch>

LGTM.

Best regards,
Yuhang Qiu

#9Michael Paquier
michael@paquier.xyz
In reply to: 邱宇航 (#8)
Re: Memory leak of SMgrRelation object on standby

On Tue, Sep 09, 2025 at 11:58:51AM +0800, 邱宇航 wrote:

Oops. When redo XLOG_CHECKPOINT_SHUTDOWN, smgrdestroyall should also be
called, since the startup may not exit on standby.

The patch is updated.

True that the situation sucks for the startup process, bloating its
memory. That's hard to reach, still for long-running startup
processes, which is a common thing, that's rather bad.

LGTM.

Hmm. I was playing a bit with the startup process and, after planting
a few calls to hash_get_num_entries(SMgrRelationHash) the bloat is
measurable. On wraparound, it would mean that the hash table could
point to past entries in this context.

I can get behind the patch and the proposal of forcing a cleanup each
time a checkpoint record is replayed, outside of
RecoveryRestartPoint(), so I'll see about applying and backpatching
that. Thanks for the report.
--
Michael