WaitEventSet resource leakage

Started by Tom Lane almost 3 years ago, 7 messages
#1Tom Lane
tgl@sss.pgh.pa.us

In [1] I wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

The following script:
[ leaks a file descriptor per error ]

Yeah, at least on platforms where WaitEventSets own kernel file
descriptors. I don't think it's postgres_fdw's fault though,
but that of ExecAppendAsyncEventWait, which is ignoring the
possibility of failing partway through. It looks like it'd be
sufficient to add a PG_CATCH or PG_FINALLY block there to make
sure the WaitEventSet is disposed of properly --- fortunately,
it doesn't need to have any longer lifespan than that one
function.

After further thought that seems like a pretty ad-hoc solution.
We probably can do no better in the back branches, but shouldn't
we start treating WaitEventSets as ResourceOwner-managed resources?
Otherwise, transient WaitEventSets are going to be a permanent
source of headaches.

regards, tom lane

[1]: /messages/by-id/423731.1678381075@sss.pgh.pa.us

#2Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Tom Lane (#1)
1 attachment(s)
Re: WaitEventSet resource leakage

(Alexander just reminded me of this off-list)

On 09/03/2023 20:51, Tom Lane wrote:

In [1] I wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

The following script:
[ leaks a file descriptor per error ]

Yeah, at least on platforms where WaitEventSets own kernel file
descriptors. I don't think it's postgres_fdw's fault though,
but that of ExecAppendAsyncEventWait, which is ignoring the
possibility of failing partway through. It looks like it'd be
sufficient to add a PG_CATCH or PG_FINALLY block there to make
sure the WaitEventSet is disposed of properly --- fortunately,
it doesn't need to have any longer lifespan than that one
function.

Here's a patch to do that, for the back branches.
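As a rough illustration of the cleanup pattern being discussed, here is a minimal standalone sketch that uses plain setjmp/longjmp in place of PostgreSQL's PG_TRY/PG_FINALLY macros. Every name in it (DemoEventSet, demo_throw, demo_wait, and so on) is hypothetical and not part of PostgreSQL. Unlike PG_FINALLY, which re-throws the error after cleanup, this sketch simply reports the failure through its return value.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a WaitEventSet that owns a kernel descriptor. */
typedef struct DemoEventSet
{
	bool		fd_open;
} DemoEventSet;

int			demo_open_fds = 0;			/* simulated descriptors in use */
jmp_buf    *demo_error_stack = NULL;	/* innermost error handler, if any */

static void
demo_create(DemoEventSet *set)
{
	set->fd_open = true;
	demo_open_fds++;
}

static void
demo_free(DemoEventSet *set)
{
	if (set->fd_open)
	{
		set->fd_open = false;
		demo_open_fds--;
	}
}

/* Simulated error report: jump to the innermost handler. */
static void
demo_throw(void)
{
	assert(demo_error_stack != NULL);
	longjmp(*demo_error_stack, 1);
}

/*
 * Create a set, do work that may fail partway through, and always release
 * the set.  Returns true on success, false if an error was caught.
 */
bool
demo_wait(bool fail_partway)
{
	DemoEventSet set;
	jmp_buf		local_buf;
	jmp_buf    *save_stack = demo_error_stack;
	volatile bool failed = false;

	demo_create(&set);

	if (setjmp(local_buf) == 0)
	{
		/* protected region: errors thrown here land in the else branch */
		demo_error_stack = &local_buf;
		if (fail_partway)
			demo_throw();
	}
	else
		failed = true;

	/* "finally" part: runs on both the normal and the error path */
	demo_error_stack = save_stack;
	demo_free(&set);

	return !failed;
}
```

Calling demo_wait(true) and demo_wait(false) should both leave demo_open_fds at zero. The sketch deliberately avoids returning from inside the protected region, since that would skip restoring demo_error_stack.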

After further thought that seems like a pretty ad-hoc solution.
We probably can do no better in the back branches, but shouldn't
we start treating WaitEventSets as ResourceOwner-managed resources?
Otherwise, transient WaitEventSets are going to be a permanent
source of headaches.

Agreed. The current signature of CreateWaitEventSet is:

WaitEventSet *
CreateWaitEventSet(MemoryContext context, int nevents)

Passing MemoryContext makes little sense when the WaitEventSet also
holds file descriptors. With anything other than TopMemoryContext, you
need to arrange for proper cleanup with PG_TRY-PG_CATCH or by avoiding
ereport() calls. And once you've arranged for cleanup, the memory context
doesn't matter much anymore.

Let's change it so that it's always allocated in TopMemoryContext, but
pass a ResourceOwner instead:

WaitEventSet *
CreateWaitEventSet(ResourceOwner owner, int nevents)

And use owner == NULL to mean session lifetime.
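To make the proposed ownership rule concrete, here is a small standalone sketch of the idea: a set is either registered with an owner that releases it in bulk, or created with a NULL owner and kept until freed explicitly. This is not the PostgreSQL ResourceOwner API; DemoOwner, DemoSet, and the demo_* functions are hypothetical names, and the fixed-size tracking array is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAX_TRACKED 8

typedef struct DemoOwner DemoOwner;

typedef struct DemoSet
{
	DemoOwner  *owner;			/* NULL means session lifetime */
} DemoSet;

struct DemoOwner
{
	DemoSet    *tracked[DEMO_MAX_TRACKED];
	int			ntracked;
};

int			demo_live_sets = 0; /* sets currently allocated */

/* Create a set; with a non-NULL owner, the owner remembers it. */
DemoSet *
demo_create_set(DemoOwner *owner)
{
	DemoSet    *set;

	/* Check for a free slot first, so registration cannot fail later. */
	if (owner != NULL && owner->ntracked >= DEMO_MAX_TRACKED)
		return NULL;

	set = malloc(sizeof(DemoSet));
	if (set == NULL)
		return NULL;

	set->owner = owner;
	if (owner != NULL)
		owner->tracked[owner->ntracked++] = set;
	demo_live_sets++;
	return set;
}

/* Free a set explicitly; the owner, if any, forgets it first. */
void
demo_free_set(DemoSet *set)
{
	DemoOwner  *owner = set->owner;

	if (owner != NULL)
	{
		for (int i = 0; i < owner->ntracked; i++)
		{
			if (owner->tracked[i] == set)
			{
				/* fill the hole with the last tracked entry */
				owner->tracked[i] = owner->tracked[--owner->ntracked];
				break;
			}
		}
	}
	free(set);
	demo_live_sets--;
}

/* Release everything the owner still tracks, e.g. on error cleanup. */
void
demo_release_owner(DemoOwner *owner)
{
	while (owner->ntracked > 0)
		demo_free_set(owner->tracked[owner->ntracked - 1]);
}
```

With this shape, a caller that errors out before reaching demo_free_set() leaks nothing as long as demo_release_owner() runs during cleanup, while sets created with a NULL owner are untouched by that cleanup.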

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v1-0001-Fix-resource-leak-when-a-FDW-s-ForeignAsyncReques.patch (text/x-patch; charset=UTF-8)
From b9ea609855b838369cddb33e4045ac91603dd726 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 15 Nov 2023 23:44:56 +0100
Subject: [PATCH v1 1/1] Fix resource leak when a FDW's ForeignAsyncRequest
 function fails

If an error is thrown after calling CreateWaitEventSet(), the memory
of a WaitEventSet is free'd as it's allocated in the short-lived
memory context, but the file descriptor (on epoll- or kqueue-based
systems) or handles (on Windows) that it contains are leaked. Use
PG_TRY-FINALLY to ensure it gets freed.

In passing, fix misleading comment on what the 'nevents' argument
to WaitEventSetWait means.

The added test doesn't check for leaking resources, so it passed even
before this commit. But at least it covers the code path.

Report by Alexander Lakhin, analysis and suggestion for the fix by
Tom Lane.

Discussion: https://www.postgresql.org/message-id/17828-122da8cba23236be@postgresql.org
Discussion: https://www.postgresql.org/message-id/472235.1678387869@sss.pgh.pa.us
---
 .../postgres_fdw/expected/postgres_fdw.out    |  7 ++
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  6 ++
 src/backend/executor/nodeAppend.c             | 66 +++++++++++--------
 3 files changed, 50 insertions(+), 29 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 64bcc66b8d..22cae37a1e 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -10809,6 +10809,13 @@ SELECT * FROM result_tbl ORDER BY a;
 (2 rows)
 
 DELETE FROM result_tbl;
+-- Test error handling, if accessing one of the foreign partitions errors out
+CREATE FOREIGN TABLE async_p_broken PARTITION OF async_pt FOR VALUES FROM (10000) TO (10001)
+  SERVER loopback OPTIONS (table_name 'non_existent_table');
+SELECT * FROM async_pt;
+ERROR:  relation "public.non_existent_table" does not exist
+CONTEXT:  remote SQL command: SELECT a, b, c FROM public.non_existent_table
+DROP FOREIGN TABLE async_p_broken;
 -- Check case where multiple partitions use the same connection
 CREATE TABLE base_tbl3 (a int, b int, c text);
 CREATE FOREIGN TABLE async_p3 PARTITION OF async_pt FOR VALUES FROM (3000) TO (4000)
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2d14eeadb5..075da4ff86 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3607,6 +3607,12 @@ INSERT INTO result_tbl SELECT a, b, 'AAA' || c FROM async_pt WHERE b === 505;
 SELECT * FROM result_tbl ORDER BY a;
 DELETE FROM result_tbl;
 
+-- Test error handling, if accessing one of the foreign partitions errors out
+CREATE FOREIGN TABLE async_p_broken PARTITION OF async_pt FOR VALUES FROM (10000) TO (10001)
+  SERVER loopback OPTIONS (table_name 'non_existent_table');
+SELECT * FROM async_pt;
+DROP FOREIGN TABLE async_p_broken;
+
 -- Check case where multiple partitions use the same connection
 CREATE TABLE base_tbl3 (a int, b int, c text);
 CREATE FOREIGN TABLE async_p3 PARTITION OF async_pt FOR VALUES FROM (3000) TO (4000)
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..99818d3ebc 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -1025,43 +1025,51 @@ ExecAppendAsyncEventWait(AppendState *node)
 	/* We should never be called when there are no valid async subplans. */
 	Assert(node->as_nasyncremain > 0);
 
+	Assert(node->as_eventset == NULL);
 	node->as_eventset = CreateWaitEventSet(CurrentMemoryContext, nevents);
-	AddWaitEventToSet(node->as_eventset, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
-					  NULL, NULL);
-
-	/* Give each waiting subplan a chance to add an event. */
-	i = -1;
-	while ((i = bms_next_member(node->as_asyncplans, i)) >= 0)
+	PG_TRY();
 	{
-		AsyncRequest *areq = node->as_asyncrequests[i];
+		AddWaitEventToSet(node->as_eventset, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
+						  NULL, NULL);
 
-		if (areq->callback_pending)
-			ExecAsyncConfigureWait(areq);
-	}
+		/* Give each waiting subplan a chance to add an event. */
+		i = -1;
+		while ((i = bms_next_member(node->as_asyncplans, i)) >= 0)
+		{
+			AsyncRequest *areq = node->as_asyncrequests[i];
 
-	/*
-	 * No need for further processing if there are no configured events other
-	 * than the postmaster death event.
-	 */
-	if (GetNumRegisteredWaitEvents(node->as_eventset) == 1)
+			if (areq->callback_pending)
+				ExecAsyncConfigureWait(areq);
+		}
+
+		/*
+		 * No need for further processing if there are no configured events
+		 * other than the postmaster death event.
+		 */
+		if (GetNumRegisteredWaitEvents(node->as_eventset) == 1)
+		{
+			FreeWaitEventSet(node->as_eventset);
+			node->as_eventset = NULL;
+			return;
+		}
+
+		/* Return at most EVENT_BUFFER_SIZE events in one call. */
+		if (nevents > EVENT_BUFFER_SIZE)
+			nevents = EVENT_BUFFER_SIZE;
+
+		/*
+		 * If the timeout is -1, wait until at least one event occurs.  If the
+		 * timeout is 0, poll for events, but do not wait at all.
+		 */
+		noccurred = WaitEventSetWait(node->as_eventset, timeout, occurred_event,
+									 nevents, WAIT_EVENT_APPEND_READY);
+	}
+	PG_FINALLY();
 	{
 		FreeWaitEventSet(node->as_eventset);
 		node->as_eventset = NULL;
-		return;
 	}
-
-	/* We wait on at most EVENT_BUFFER_SIZE events. */
-	if (nevents > EVENT_BUFFER_SIZE)
-		nevents = EVENT_BUFFER_SIZE;
-
-	/*
-	 * If the timeout is -1, wait until at least one event occurs.  If the
-	 * timeout is 0, poll for events, but do not wait at all.
-	 */
-	noccurred = WaitEventSetWait(node->as_eventset, timeout, occurred_event,
-								 nevents, WAIT_EVENT_APPEND_READY);
-	FreeWaitEventSet(node->as_eventset);
-	node->as_eventset = NULL;
+	PG_END_TRY();
 	if (noccurred == 0)
 		return;
 
-- 
2.39.2

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#2)
Re: WaitEventSet resource leakage

Heikki Linnakangas <hlinnaka@iki.fi> writes:

On 09/03/2023 20:51, Tom Lane wrote:

After further thought that seems like a pretty ad-hoc solution.
We probably can do no better in the back branches, but shouldn't
we start treating WaitEventSets as ResourceOwner-managed resources?
Otherwise, transient WaitEventSets are going to be a permanent
source of headaches.

Let's change it so that it's always allocated in TopMemoryContext, but
pass a ResourceOwner instead:
WaitEventSet *
CreateWaitEventSet(ResourceOwner owner, int nevents)
And use owner == NULL to mean session lifetime.

WFM. (I didn't study your back-branch patch.)

regards, tom lane

#4Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Tom Lane (#3)
1 attachment(s)
Re: WaitEventSet resource leakage

On 16/11/2023 01:08, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

On 09/03/2023 20:51, Tom Lane wrote:

After further thought that seems like a pretty ad-hoc solution.
We probably can do no better in the back branches, but shouldn't
we start treating WaitEventSets as ResourceOwner-managed resources?
Otherwise, transient WaitEventSets are going to be a permanent
source of headaches.

Let's change it so that it's always allocated in TopMemoryContext, but
pass a ResourceOwner instead:
WaitEventSet *
CreateWaitEventSet(ResourceOwner owner, int nevents)
And use owner == NULL to mean session lifetime.

WFM. (I didn't study your back-branch patch.)

And here is a patch to implement that on master.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v1-0001-Use-ResourceOwner-to-track-WaitEventSets.patch (text/x-patch; charset=UTF-8)
From cc88af75011208fc7e9a2bba6b27e437edfab952 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 16 Nov 2023 11:16:58 +0100
Subject: [PATCH v1 1/1] Use ResourceOwner to track WaitEventSets.

A WaitEventSet holds file descriptors or event handles (on Windows).
If FreeWaitEventSet is not called, those fds or handles are leaked.
Use ResourceOwners to track WaitEventSets, to clean those up
automatically on error.

This was a live bug in async Append nodes, if a FDW's
ForeignAsyncRequest function failed. (In back branches, I will apply a
more localized fix for that based on PG_TRY-PG_FINALLY.) The added
test doesn't check for leaking resources, so it passed even before
this commit. But at least it covers the code path.

In passing, fix misleading comment on what the 'nevents' argument
to WaitEventSetWait means.

Report by Alexander Lakhin, analysis and suggestion for the fix by
Tom Lane. Fixes bug #17828.

Discussion: https://www.postgresql.org/message-id/472235.1678387869@sss.pgh.pa.us
---
 .../postgres_fdw/expected/postgres_fdw.out    |  7 +++
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  6 ++
 src/backend/executor/nodeAppend.c             |  5 +-
 src/backend/libpq/pqcomm.c                    |  2 +-
 src/backend/postmaster/postmaster.c           |  2 +-
 src/backend/postmaster/syslogger.c            |  2 +-
 src/backend/storage/ipc/latch.c               | 63 +++++++++++++++++--
 src/include/storage/latch.h                   |  4 +-
 src/include/utils/resowner.h                  |  1 +
 9 files changed, 82 insertions(+), 10 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 64bcc66b8d..22cae37a1e 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -10809,6 +10809,13 @@ SELECT * FROM result_tbl ORDER BY a;
 (2 rows)
 
 DELETE FROM result_tbl;
+-- Test error handling, if accessing one of the foreign partitions errors out
+CREATE FOREIGN TABLE async_p_broken PARTITION OF async_pt FOR VALUES FROM (10000) TO (10001)
+  SERVER loopback OPTIONS (table_name 'non_existent_table');
+SELECT * FROM async_pt;
+ERROR:  relation "public.non_existent_table" does not exist
+CONTEXT:  remote SQL command: SELECT a, b, c FROM public.non_existent_table
+DROP FOREIGN TABLE async_p_broken;
 -- Check case where multiple partitions use the same connection
 CREATE TABLE base_tbl3 (a int, b int, c text);
 CREATE FOREIGN TABLE async_p3 PARTITION OF async_pt FOR VALUES FROM (3000) TO (4000)
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2d14eeadb5..075da4ff86 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3607,6 +3607,12 @@ INSERT INTO result_tbl SELECT a, b, 'AAA' || c FROM async_pt WHERE b === 505;
 SELECT * FROM result_tbl ORDER BY a;
 DELETE FROM result_tbl;
 
+-- Test error handling, if accessing one of the foreign partitions errors out
+CREATE FOREIGN TABLE async_p_broken PARTITION OF async_pt FOR VALUES FROM (10000) TO (10001)
+  SERVER loopback OPTIONS (table_name 'non_existent_table');
+SELECT * FROM async_pt;
+DROP FOREIGN TABLE async_p_broken;
+
 -- Check case where multiple partitions use the same connection
 CREATE TABLE base_tbl3 (a int, b int, c text);
 CREATE FOREIGN TABLE async_p3 PARTITION OF async_pt FOR VALUES FROM (3000) TO (4000)
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..af8e37205f 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -1025,7 +1025,8 @@ ExecAppendAsyncEventWait(AppendState *node)
 	/* We should never be called when there are no valid async subplans. */
 	Assert(node->as_nasyncremain > 0);
 
-	node->as_eventset = CreateWaitEventSet(CurrentMemoryContext, nevents);
+	Assert(node->as_eventset == NULL);
+	node->as_eventset = CreateWaitEventSet(CurrentResourceOwner, nevents);
 	AddWaitEventToSet(node->as_eventset, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
 					  NULL, NULL);
 
@@ -1050,7 +1051,7 @@ ExecAppendAsyncEventWait(AppendState *node)
 		return;
 	}
 
-	/* We wait on at most EVENT_BUFFER_SIZE events. */
+	/* Return at most EVENT_BUFFER_SIZE events in one call. */
 	if (nevents > EVENT_BUFFER_SIZE)
 		nevents = EVENT_BUFFER_SIZE;
 
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 522584e597..2802efc63f 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -207,7 +207,7 @@ pq_init(void)
 		elog(FATAL, "fcntl(F_SETFD) failed on socket: %m");
 #endif
 
-	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, FeBeWaitSetNEvents);
+	FeBeWaitSet = CreateWaitEventSet(NULL, FeBeWaitSetNEvents);
 	socket_pos = AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE,
 								   MyProcPort->sock, NULL, NULL);
 	latch_pos = AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, PGINVALID_SOCKET,
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 7b6b613c4a..7a5cd06c5c 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1695,7 +1695,7 @@ ConfigurePostmasterWaitSet(bool accept_connections)
 		FreeWaitEventSet(pm_wait_set);
 	pm_wait_set = NULL;
 
-	pm_wait_set = CreateWaitEventSet(CurrentMemoryContext,
+	pm_wait_set = CreateWaitEventSet(NULL,
 									 accept_connections ? (1 + NumListenSockets) : 1);
 	AddWaitEventToSet(pm_wait_set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch,
 					  NULL);
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 858a2f6b2b..96dd03d9e0 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -311,7 +311,7 @@ SysLoggerMain(int argc, char *argv[])
 	 * syslog pipe, which implies that all other backends have exited
 	 * (including the postmaster).
 	 */
-	wes = CreateWaitEventSet(CurrentMemoryContext, 2);
+	wes = CreateWaitEventSet(NULL, 2);
 	AddWaitEventToSet(wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
 #ifndef WIN32
 	AddWaitEventToSet(wes, WL_SOCKET_READABLE, syslogPipe[0], NULL, NULL);
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 2fd386a4ed..b5c6b1e9b2 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -62,6 +62,7 @@
 #include "storage/pmsignal.h"
 #include "storage/shmem.h"
 #include "utils/memutils.h"
+#include "utils/resowner.h"
 
 /*
  * Select the fd readiness primitive to use. Normally the "most modern"
@@ -101,6 +102,8 @@
 /* typedef in latch.h */
 struct WaitEventSet
 {
+	ResourceOwner owner;
+
 	int			nevents;		/* number of registered events */
 	int			nevents_space;	/* maximum number of events in this set */
 
@@ -195,6 +198,30 @@ static void WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event);
 static inline int WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 										WaitEvent *occurred_events, int nevents);
 
+/* ResourceOwner support to hold WaitEventSets */
+static void ResOwnerReleaseWaitEventSet(Datum res);
+
+static const ResourceOwnerDesc wait_event_set_resowner_desc =
+{
+	.name = "WaitEventSet",
+	.release_phase = RESOURCE_RELEASE_AFTER_LOCKS,
+	.release_priority = RELEASE_PRIO_WAITEVENTSETS,
+	.ReleaseResource = ResOwnerReleaseWaitEventSet,
+	.DebugPrint = NULL
+};
+
+static inline void
+ResourceOwnerRememberWaitEventSet(ResourceOwner owner, WaitEventSet *set)
+{
+	ResourceOwnerRemember(owner, PointerGetDatum(set), &wait_event_set_resowner_desc);
+}
+static inline void
+ResourceOwnerForgetWaitEventSet(ResourceOwner owner, WaitEventSet *set)
+{
+	ResourceOwnerForget(owner, PointerGetDatum(set), &wait_event_set_resowner_desc);
+}
+
+
 /*
  * Initialize the process-local latch infrastructure.
  *
@@ -323,7 +350,7 @@ InitializeLatchWaitSet(void)
 	Assert(LatchWaitSet == NULL);
 
 	/* Set up the WaitEventSet used by WaitLatch(). */
-	LatchWaitSet = CreateWaitEventSet(TopMemoryContext, 2);
+	LatchWaitSet = CreateWaitEventSet(NULL, 2);
 	latch_pos = AddWaitEventToSet(LatchWaitSet, WL_LATCH_SET, PGINVALID_SOCKET,
 								  MyLatch, NULL);
 	if (IsUnderPostmaster)
@@ -541,7 +568,7 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 	int			ret = 0;
 	int			rc;
 	WaitEvent	event;
-	WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, 3);
+	WaitEventSet *set = CreateWaitEventSet(CurrentResourceOwner, 3);
 
 	if (wakeEvents & WL_TIMEOUT)
 		Assert(timeout >= 0);
@@ -716,9 +743,12 @@ ResetLatch(Latch *latch)
  *
  * These events can then be efficiently waited upon together, using
  * WaitEventSetWait().
+ *
+ * The WaitEventSet is tracked by the given 'resowner'.  Use NULL for session
+ * lifetime.
  */
 WaitEventSet *
-CreateWaitEventSet(MemoryContext context, int nevents)
+CreateWaitEventSet(ResourceOwner resowner, int nevents)
 {
 	WaitEventSet *set;
 	char	   *data;
@@ -744,7 +774,10 @@ CreateWaitEventSet(MemoryContext context, int nevents)
 	sz += MAXALIGN(sizeof(HANDLE) * (nevents + 1));
 #endif
 
-	data = (char *) MemoryContextAllocZero(context, sz);
+	if (resowner != NULL)
+		ResourceOwnerEnlarge(resowner);
+
+	data = (char *) MemoryContextAllocZero(TopMemoryContext, sz);
 
 	set = (WaitEventSet *) data;
 	data += MAXALIGN(sizeof(WaitEventSet));
@@ -770,6 +803,12 @@ CreateWaitEventSet(MemoryContext context, int nevents)
 	set->nevents_space = nevents;
 	set->exit_on_postmaster_death = false;
 
+	if (resowner != NULL)
+	{
+		ResourceOwnerRememberWaitEventSet(resowner, set);
+		set->owner = resowner;
+	}
+
 #if defined(WAIT_USE_EPOLL)
 	if (!AcquireExternalFD())
 	{
@@ -834,6 +873,12 @@ CreateWaitEventSet(MemoryContext context, int nevents)
 void
 FreeWaitEventSet(WaitEventSet *set)
 {
+	if (set->owner)
+	{
+		ResourceOwnerForgetWaitEventSet(set->owner, set);
+		set->owner = NULL;
+	}
+
 #if defined(WAIT_USE_EPOLL)
 	close(set->epoll_fd);
 	ReleaseExternalFD();
@@ -2300,3 +2345,13 @@ drain(void)
 }
 
 #endif
+
+static void
+ResOwnerReleaseWaitEventSet(Datum res)
+{
+	WaitEventSet *set = (WaitEventSet *) DatumGetPointer(res);
+
+	Assert(set->owner != NULL);
+	set->owner = NULL;
+	FreeWaitEventSet(set);
+}
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 99cc47874a..9efc33add8 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -102,6 +102,8 @@
 
 #include <signal.h>
 
+#include "utils/resowner.h"
+
 /*
  * Latch structure should be treated as opaque and only accessed through
  * the public functions. It is defined here to allow embedding Latches as
@@ -173,7 +175,7 @@ extern void SetLatch(Latch *latch);
 extern void ResetLatch(Latch *latch);
 extern void ShutdownLatchSupport(void);
 
-extern WaitEventSet *CreateWaitEventSet(MemoryContext context, int nevents);
+extern WaitEventSet *CreateWaitEventSet(ResourceOwner resowner, int nevents);
 extern void FreeWaitEventSet(WaitEventSet *set);
 extern void FreeWaitEventSetAfterFork(WaitEventSet *set);
 extern int	AddWaitEventToSet(WaitEventSet *set, uint32 events, pgsocket fd,
diff --git a/src/include/utils/resowner.h b/src/include/utils/resowner.h
index 0735480214..ddbf19d8da 100644
--- a/src/include/utils/resowner.h
+++ b/src/include/utils/resowner.h
@@ -74,6 +74,7 @@ typedef uint32 ResourceReleasePriority;
 #define RELEASE_PRIO_TUPDESC_REFS			400
 #define RELEASE_PRIO_SNAPSHOT_REFS			500
 #define RELEASE_PRIO_FILES					600
+#define RELEASE_PRIO_WAITEVENTSETS			700
 
 /* 0 is considered invalid */
 #define RELEASE_PRIO_FIRST					1
-- 
2.39.2

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Heikki Linnakangas (#4)
Re: WaitEventSet resource leakage

On Fri, Nov 17, 2023 at 12:22 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 16/11/2023 01:08, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

On 09/03/2023 20:51, Tom Lane wrote:

After further thought that seems like a pretty ad-hoc solution.
We probably can do no better in the back branches, but shouldn't
we start treating WaitEventSets as ResourceOwner-managed resources?
Otherwise, transient WaitEventSets are going to be a permanent
source of headaches.

Let's change it so that it's always allocated in TopMemoryContext, but
pass a ResourceOwner instead:
WaitEventSet *
CreateWaitEventSet(ResourceOwner owner, int nevents)
And use owner == NULL to mean session lifetime.

WFM. (I didn't study your back-branch patch.)

And here is a patch to implement that on master.

Rationale and code look good to me.

cfbot warns about WAIT_USE_WIN32:

[10:12:54.375] latch.c:889:2: error: ISO C90 forbids mixed
declarations and code [-Werror=declaration-after-statement]

Let's see...

WaitEvent *cur_event;

for (cur_event = set->events;

Maybe:

for (WaitEvent *cur_event = set->events;
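The suggested change can be sketched in isolation as follows. The structs and the function are hypothetical stand-ins, not the real latch.c code; the point is only that once a statement precedes the loop, a separate declaration of cur_event at that spot would trip -Wdeclaration-after-statement, whereas a loop-scoped declaration does not.

```c
#include <stddef.h>

/* Hypothetical stand-ins for WaitEvent and WaitEventSet. */
typedef struct DemoEvent
{
	int			fd;
} DemoEvent;

typedef struct DemoSet
{
	DemoEvent  *events;
	int			nevents;
	int			owner_tracked;
} DemoSet;

/* Count events with a valid (non-negative) descriptor. */
int
demo_count_valid(DemoSet *set)
{
	int			count = 0;

	/* A statement now comes before the loop... */
	if (set->owner_tracked)
		set->owner_tracked = 0;

	/*
	 * ...so writing "DemoEvent *cur_event;" here would be a declaration
	 * after a statement.  Declaring it in the for-loop header avoids that.
	 */
	for (DemoEvent *cur_event = set->events;
		 cur_event < set->events + set->nevents;
		 cur_event++)
	{
		if (cur_event->fd >= 0)
			count++;
	}

	return count;
}
```

The alternative would be to hoist the declaration to the top of the function, above the new statement; the loop-scoped form keeps the variable's scope as narrow as possible.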

#6Alexander Lakhin
exclusion@gmail.com
In reply to: Thomas Munro (#5)
Re: WaitEventSet resource leakage

20.11.2023 00:09, Thomas Munro wrote:

On Fri, Nov 17, 2023 at 12:22 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

And here is a patch to implement that on master.

Rationale and code look good to me.

I can also confirm that the patches proposed (for master and back branches)
eliminate WES leakage as expected.

Thanks for the fix!

Maybe you would find it appropriate to add the comment
/* Convenience wrappers over ResourceOwnerRemember/Forget */
above ResourceOwnerRememberWaitEventSet
just as it's added above ResourceOwnerRememberRelationRef,
ResourceOwnerRememberDSM, ResourceOwnerRememberFile, ...

(As a side note, this fix doesn't resolve issue #17828 completely,
because that large a number of handles might also be consumed
legitimately.)

Best regards,
Alexander

#7Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Alexander Lakhin (#6)
Re: WaitEventSet resource leakage

On 22/11/2023 15:00, Alexander Lakhin wrote:

I can also confirm that the patches proposed (for master and back branches)
eliminate WES leakage as expected.

Thanks for the fix!

Maybe you would find it appropriate to add the comment
/* Convenience wrappers over ResourceOwnerRemember/Forget */
above ResourceOwnerRememberWaitEventSet
just as it's added above ResourceOwnerRememberRelationRef,
ResourceOwnerRememberDSM, ResourceOwnerRememberFile, ...

Added that and fixed the Windows warning that Thomas pointed out. Pushed
the ResourceOwner version to master, and the PG_TRY-PG_FINALLY version to
14-16.

Thank you!

(As a side note, this fix doesn't resolve issue #17828 completely,
because that large a number of handles might also be consumed
legitimately.)

:-(

--
Heikki Linnakangas
Neon (https://neon.tech)