Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

Started by Mikhail Kot4 months ago16 messages
#1Mikhail Kot
mikhail.kot@databricks.com
1 attachment(s)

Hi,

I've encountered the following segmentation fault lately. It happens when
Postgres is experiencing high memory pressure. There are multiple OOM errors in
the log as well.

Core was generated by `postgres: neondb_owner neondb ::1(46658) BIND
'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 pg_atomic_read_u32_impl (ptr=0x8) at
../../../../src/include/port/atomics/generic.h:48
#1 pg_atomic_read_u32 (ptr=0x8) at ../../../../src/include/port/atomics.h:239
#2 LWLockAttemptLock (lock=lock@entry=0x4,
mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:821
#3 0x000056446bce129f in LWLockConditionalAcquire (lock=0x4,
mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:1386
#4 0x000056446bd0bacf in pgstat_lock_entry
(entry_ref=entry_ref@entry=0x56446d9f4340, nowait=nowait@entry=true)
at pgstat_shmem.c:625
#5 0x000056446bd0a3c9 in pgstat_relation_flush_cb
(entry_ref=0x56446d9f4340, nowait=<optimized out>) at
pgstat_relation.c:794
#6 0x000056446bd069f5 in pgstat_flush_pending_entries
(nowait=<optimized out>) at pgstat.c:1217
#7 pgstat_report_stat (force=<optimized out>, force@entry=false) at
pgstat.c:658
#8 0x000056446bcf16c1 in PostgresMain (dbname=<optimized out>,
username=<optimized out>) at postgres.c:4623
#9 0x000056446bc716b3 in BackendRun (port=<optimized out>,
port=<optimized out>) at postmaster.c:4465
#10 BackendStartup (port=<optimized out>) at postmaster.c:4193
#11 ServerLoop () at postmaster.c:1782
#12 0x000056446bc726ea in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x56446cd803b0) at postmaster.c:1466
#13 0x000056446b9d5a00 in main (argc=3, argv=0x56446cd803b0) at main.c:238

The error originates from pgstat_shmem.c file where shhashent is left in
half-initialized state if pgstat_init_entry(), calling dsa_allocate0(), errors
out with OOM. Then shhashent causes a segmentation fault on access. I propose a
patch which solves this issue. The patch is for main branch, but the code is
nearly identical in Postgres 13-17 so I suggest backporting it to other
supported versions.

The patch changes pgstat_init_entry()'s behaviour, returning NULL if memory
allocation failed. It also adds sanity checks to routines accepting arguments
returned by pgstat_init_entry().

Reproducing this behaviour is tricky, because under OOM Postgres doesn't
necessarily reach the condition where specific dsa_allocate0() call errors.

Attachments:

0001-fix-sshashent-access-segfault-when-it-s-half-initial.patchapplication/octet-stream; name=0001-fix-sshashent-access-segfault-when-it-s-half-initial.patchDownload
From 295352adb54faf36fd730922ada032a4bbaaa941 Mon Sep 17 00:00:00 2001
From: Mikhail Kot <to@myrrc.dev>
Date: Tue, 2 Sep 2025 21:02:04 +0100
Subject: [PATCH] fix sshashent access segfault when it's half initialized due
 to OOM

---
 src/backend/utils/activity/pgstat_database.c     |  3 +++
 src/backend/utils/activity/pgstat_function.c     |  3 +++
 src/backend/utils/activity/pgstat_relation.c     |  3 +++
 src/backend/utils/activity/pgstat_shmem.c        | 13 ++++++++++---
 src/backend/utils/activity/pgstat_subscription.c |  3 +++
 5 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/activity/pgstat_database.c b/src/backend/utils/activity/pgstat_database.c
index b31f20d41bc..71e897d684b 100644
--- a/src/backend/utils/activity/pgstat_database.c
+++ b/src/backend/utils/activity/pgstat_database.c
@@ -430,6 +430,9 @@ pgstat_database_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	PgStatShared_Database *sharedent;
 	PgStat_StatDBEntry *pendingent;
 
+	if (!entry_ref)
+		return false;
+
 	pendingent = (PgStat_StatDBEntry *) entry_ref->pending;
 	sharedent = (PgStatShared_Database *) entry_ref->shared_stats;
 
diff --git a/src/backend/utils/activity/pgstat_function.c b/src/backend/utils/activity/pgstat_function.c
index 6214f93d36e..02916b80cb2 100644
--- a/src/backend/utils/activity/pgstat_function.c
+++ b/src/backend/utils/activity/pgstat_function.c
@@ -195,6 +195,9 @@ pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	PgStat_FunctionCounts *localent;
 	PgStatShared_Function *shfuncent;
 
+	if (!entry_ref)
+		return false;
+
 	localent = (PgStat_FunctionCounts *) entry_ref->pending;
 	shfuncent = (PgStatShared_Function *) entry_ref->shared_stats;
 
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 69df741cbf6..20dfa137ba6 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -821,6 +821,9 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	PgStat_StatTabEntry *tabentry;	/* table entry of shared stats */
 	PgStat_StatDBEntry *dbentry;	/* pending database entry */
 
+	if (!entry_ref)
+		return false;
+
 	dboid = entry_ref->shared_entry->key.dboid;
 	lstats = (PgStat_TableStatus *) entry_ref->pending;
 	shtabstats = (PgStatShared_Relation *) entry_ref->shared_stats;
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index 62de3474453..76b12b0b1b4 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -311,7 +311,12 @@ pgstat_init_entry(PgStat_Kind kind,
 	pg_atomic_init_u32(&shhashent->generation, 0);
 	shhashent->dropped = false;
 
-	chunk = dsa_allocate0(pgStatLocal.dsa, pgstat_get_kind_info(kind)->shared_size);
+	chunk = dsa_allocate_extended(pgStatLocal.dsa,
+								  pgstat_get_kind_info(kind)->shared_size,
+								  DSA_ALLOC_ZERO | DSA_ALLOC_NO_OOM);
+	if (!chunk)
+		return NULL;
+
 	shheader = dsa_get_address(pgStatLocal.dsa, chunk);
 	shheader->magic = 0xdeadbeef;
 
@@ -508,7 +513,9 @@ pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, uint64 objid, bool create,
 		shhashent = dshash_find_or_insert(pgStatLocal.shared_hash, &key, &shfound);
 		if (!shfound)
 		{
-			shheader = pgstat_init_entry(kind, shhashent);
+			if ((shheader = pgstat_init_entry(kind, shhashent)) == NULL)
+				return NULL;
+
 			pgstat_acquire_entry_ref(entry_ref, shhashent, shheader);
 
 			if (created_entry != NULL)
@@ -699,7 +706,7 @@ pgstat_get_entry_ref_locked(PgStat_Kind kind, Oid dboid, uint64 objid,
 	entry_ref = pgstat_get_entry_ref(kind, dboid, objid, true, NULL);
 
 	/* lock the shared entry to protect the content, skip if failed */
-	if (!pgstat_lock_entry(entry_ref, nowait))
+	if (!entry_ref || !pgstat_lock_entry(entry_ref, nowait))
 		return NULL;
 
 	return entry_ref;
diff --git a/src/backend/utils/activity/pgstat_subscription.c b/src/backend/utils/activity/pgstat_subscription.c
index f9a1c831a07..209a37607e4 100644
--- a/src/backend/utils/activity/pgstat_subscription.c
+++ b/src/backend/utils/activity/pgstat_subscription.c
@@ -105,6 +105,9 @@ pgstat_subscription_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	PgStat_BackendSubEntry *localent;
 	PgStatShared_Subscription *shsubent;
 
+	if (!entry_ref)
+		return false;
+
 	localent = (PgStat_BackendSubEntry *) entry_ref->pending;
 	shsubent = (PgStatShared_Subscription *) entry_ref->shared_stats;
 
-- 
2.47.2

#2Michael Paquier
michael@paquier.xyz
In reply to: Mikhail Kot (#1)
Re: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

On Tue, Sep 02, 2025 at 09:09:54PM +0100, Mikhail Kot wrote:

The error originates from pgstat_shmem.c file where shhashent is left in
half-initialized state if pgstat_init_entry(), calling dsa_allocate0(), errors
out with OOM. Then shhashent causes a segmentation fault on access. I propose a
patch which solves this issue. The patch is for main branch, but the code is
nearly identical in Postgres 13-17 so I suggest backporting it to other
supported versions.

Ugh. Support for pgstats in shared memory has been added in v15, so
v13 and v14 are out of scope, aren't they?

The patch changes pgstat_init_entry()'s behaviour, returning NULL if memory
allocation failed. It also adds sanity checks to routines accepting arguments
returned by pgstat_init_entry().

@@ -430,6 +430,9 @@ pgstat_database_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
PgStatShared_Database *sharedent;
PgStat_StatDBEntry *pendingent;

+ if (!entry_ref)
+ return false;

The flush callbacks are by design expected to return false *if and
only if* the flush of the stats is *not forced* and they could not be
flushed due to lock contention. This addition means that when
pgstat_report_stat() uses its "force" mode, then it may randomly
behave incorrectly and would fail to fulfill its design contract.

Reproducing this behaviour is tricky, because under OOM Postgres doesn't
necessarily reach the condition where specific dsa_allocate0() call errors.

Deterministic testing would not be complicated, one can use an
injection point that does a stack manipulation with
IS_INJECTION_POINT_ATTACHED() or just return an error, but I don't see
much value in doing that here as long as a fix is local to
pgstat_init_entry().

Requiring that all the callers of pgstat_init_entry() need to cope
with a potential OOM error path seems like a recipe that could be
easily forgotten, even if we redesign the flush callbacks to be able
to know about a more complex error state, which means to rewrite the
code to return an enum state made of three possible values for OOM,
success and lock contention.

Anyway, couldn't we flip the order of the operations in
pgstat_init_entry() so as we do first an allocation and avoid any
inconsistency in the shared state? Or we could use DSA_ALLOC_NO_OOM,
and put back the shared state in a consistent state before issuing an
error if we find that the result of the allocation is NULL. Requiring
an error on allocation seems more important to me here, we do that for
palloc() and I don't see why we should not just do the same here for
this DSA usage in the pgstats code.
--
Michael

#3Steven Niu
niushiji@gmail.com
In reply to: Mikhail Kot (#1)
回复: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

I found there are many cases of following pattern:

ptr_1 = dsa_allocate();
ptr_2 = dsa_get_address(xxx, ptr_1);
ptr_2->yyy = zzz;

Inside dsa_get_address(dsa_area *area, dsa_pointer dp):

      /* Convert InvalidDsaPointer to NULL. */
      if (!DsaPointerIsValid(dp))
            return NULL;

So unless dsa_allocate() can ensure never returns InvalidDsaPointer, there is risk of SegV.
In fact the function dsa_allocate() does return InvalidDsaPointer in some cases.

So, maybe should we add pointer check in all places where dsa_get_address is called. Comments?

________________________________
发件人: Mikhail Kot <mikhail.kot@databricks.com>
已发送: 2025 年 9 月 03 日 星期三 04:09
收件人: pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>
抄送: to@myrrc.dev <to@myrrc.dev>
主题: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

Hi,

I've encountered the following segmentation fault lately. It happens when
Postgres is experiencing high memory pressure. There are multiple OOM errors in
the log as well.

Core was generated by `postgres: neondb_owner neondb ::1(46658) BIND
'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 pg_atomic_read_u32_impl (ptr=0x8) at
../../../../src/include/port/atomics/generic.h:48
#1 pg_atomic_read_u32 (ptr=0x8) at ../../../../src/include/port/atomics.h:239
#2 LWLockAttemptLock (lock=lock@entry=0x4,
mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:821
#3 0x000056446bce129f in LWLockConditionalAcquire (lock=0x4,
mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:1386
#4 0x000056446bd0bacf in pgstat_lock_entry
(entry_ref=entry_ref@entry=0x56446d9f4340, nowait=nowait@entry=true)
at pgstat_shmem.c:625
#5 0x000056446bd0a3c9 in pgstat_relation_flush_cb
(entry_ref=0x56446d9f4340, nowait=<optimized out>) at
pgstat_relation.c:794
#6 0x000056446bd069f5 in pgstat_flush_pending_entries
(nowait=<optimized out>) at pgstat.c:1217
#7 pgstat_report_stat (force=<optimized out>, force@entry=false) at
pgstat.c:658
#8 0x000056446bcf16c1 in PostgresMain (dbname=<optimized out>,
username=<optimized out>) at postgres.c:4623
#9 0x000056446bc716b3 in BackendRun (port=<optimized out>,
port=<optimized out>) at postmaster.c:4465
#10 BackendStartup (port=<optimized out>) at postmaster.c:4193
#11 ServerLoop () at postmaster.c:1782
#12 0x000056446bc726ea in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x56446cd803b0) at postmaster.c:1466
#13 0x000056446b9d5a00 in main (argc=3, argv=0x56446cd803b0) at main.c:238

The error originates from pgstat_shmem.c file where shhashent is left in
half-initialized state if pgstat_init_entry(), calling dsa_allocate0(), errors
out with OOM. Then shhashent causes a segmentation fault on access. I propose a
patch which solves this issue. The patch is for main branch, but the code is
nearly identical in Postgres 13-17 so I suggest backporting it to other
supported versions.

The patch changes pgstat_init_entry()'s behaviour, returning NULL if memory
allocation failed. It also adds sanity checks to routines accepting arguments
returned by pgstat_init_entry().

Reproducing this behaviour is tricky, because under OOM Postgres doesn't
necessarily reach the condition where specific dsa_allocate0() call errors.

#4Michael Paquier
michael@paquier.xyz
In reply to: Steven Niu (#3)
Re: 回复: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

On Wed, Sep 03, 2025 at 07:22:00AM +0000, Steven Niu wrote:

So unless dsa_allocate() can ensure never returns InvalidDsaPointer,
there is risk of SegV. In fact the function dsa_allocate() does
return InvalidDsaPointer in some cases.

So, maybe should we add pointer check in all places where dsa_get_address is called. Comments?

dsa_allocate() calls dsa_allocate_extended() without DSA_ALLOC_NO_OOM,
hence on allocation failure the code does a ereport(ERROR). It would
be a problem to not check the result if DSA_ALLOC_NO_OOM is used.

The problem dealt with here is different, as far as I understand: we
set some data in shared memory without considering that the DSA
allocation could fail, leaving shared memory in an inconsistent state
when an allocation failure occurs. The problem is not in the
allocation failure in itself, but in the dependency that we have
between the state in shared memory and the allocation attempt for this
pgstats code path.
--
Michael

#5Steven Niu
niushiji@gmail.com
In reply to: Michael Paquier (#4)
回复: 回复: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

________________________________________
发件人: Michael Paquier
已发送: 2025 年 9 月 03 日 星期三 17:43
收件人: Steven Niu
抄送: Mikhail Kot; pgsql-hackers@lists.postgresql.org; to@myrrc.dev
主题: Re: 回复: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

On Wed, Sep 03, 2025 at 07:22:00AM +0000, Steven Niu wrote:

So unless dsa_allocate() can ensure never returns InvalidDsaPointer,
there is risk of SegV. In fact the function dsa_allocate() does
return InvalidDsaPointer in some cases.

So, maybe should we add pointer check in all places where dsa_get_address is called. Comments?

dsa_allocate() calls dsa_allocate_extended() without DSA_ALLOC_NO_OOM,
hence on allocation failure the code does a ereport(ERROR). It would
be a problem to not check the result if DSA_ALLOC_NO_OOM is used.

Thanks, Michael, you are correct.

The problem dealt with here is different, as far as I understand: we
set some data in shared memory without considering that the DSA
allocation could fail, leaving shared memory in an inconsistent state
when an allocation failure occurs. The problem is not in the
allocation failure in itself, but in the dependency that we have
between the state in shared memory and the allocation attempt for this
pgstats code path.
--
Michael

#6Ranier Vilela
ranier.vf@gmail.com
In reply to: Mikhail Kot (#1)
Re: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

Em qua., 3 de set. de 2025 às 03:34, Mikhail Kot <mikhail.kot@databricks.com>
escreveu:

Hi,

I've encountered the following segmentation fault lately. It happens when
Postgres is experiencing high memory pressure. There are multiple OOM
errors in
the log as well.

Core was generated by `postgres: neondb_owner neondb ::1(46658) BIND
'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 pg_atomic_read_u32_impl (ptr=0x8) at
../../../../src/include/port/atomics/generic.h:48
#1 pg_atomic_read_u32 (ptr=0x8) at
../../../../src/include/port/atomics.h:239
#2 LWLockAttemptLock (lock=lock@entry=0x4,
mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:821
#3 0x000056446bce129f in LWLockConditionalAcquire (lock=0x4,
mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:1386
#4 0x000056446bd0bacf in pgstat_lock_entry
(entry_ref=entry_ref@entry=0x56446d9f4340, nowait=nowait@entry=true)
at pgstat_shmem.c:625
#5 0x000056446bd0a3c9 in pgstat_relation_flush_cb
(entry_ref=0x56446d9f4340, nowait=<optimized out>) at
pgstat_relation.c:794
#6 0x000056446bd069f5 in pgstat_flush_pending_entries
(nowait=<optimized out>) at pgstat.c:1217
#7 pgstat_report_stat (force=<optimized out>, force@entry=false) at
pgstat.c:658
#8 0x000056446bcf16c1 in PostgresMain (dbname=<optimized out>,
username=<optimized out>) at postgres.c:4623
#9 0x000056446bc716b3 in BackendRun (port=<optimized out>,
port=<optimized out>) at postmaster.c:4465
#10 BackendStartup (port=<optimized out>) at postmaster.c:4193
#11 ServerLoop () at postmaster.c:1782
#12 0x000056446bc726ea in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x56446cd803b0) at postmaster.c:1466
#13 0x000056446b9d5a00 in main (argc=3, argv=0x56446cd803b0) at main.c:238

The error originates from pgstat_shmem.c file where shhashent is left in
half-initialized state if pgstat_init_entry(), calling dsa_allocate0(),
errors
out with OOM. Then shhashent causes a segmentation fault on access. I
propose a
patch which solves this issue. The patch is for main branch, but the code
is
nearly identical in Postgres 13-17 so I suggest backporting it to other
supported versions.

The patch changes pgstat_init_entry()'s behaviour, returning NULL if memory
allocation failed.

I'm wondering if it wouldn't be better to raise elog(ERROR), and avoid
many checks for this NULL.

best regards,
Ranier Vilela

#7Mikhail Kot
mikhail.kot@databricks.com
In reply to: Ranier Vilela (#6)
1 attachment(s)
Re: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

Hi Michael, Steven, and Ranier,

Anyway, couldn't we flip the order of the operations in

pgstat_init_entry() so as we do first an allocation and avoid any inconsistency
in the shared state?

The issue is not only in pgstat_init_entry(). Currently it errors on OOM but
this doesn't prevent us from calling pgstat_lock_entry() through
pgstat_get_entry_ref() which accesses a non-initialized lock.

Here's the second version of the patch. Now we remove inserted hash entry
on OOM which would prevent accessing the entry

Attachments:

0002-fix-sshashent-access-segfault-when-it-s-half-initial.patchapplication/octet-stream; name=0002-fix-sshashent-access-segfault-when-it-s-half-initial.patchDownload
From 53bdecfeb44e6f6dcc7f7fcc20d1569f4d01285b Mon Sep 17 00:00:00 2001
From: Mikhail Kot <to@myrrc.dev>
Date: Wed, 3 Sep 2025 22:34:46 +0100
Subject: [PATCH] remove pgstate hash entry if pgstat_init_entry errors with
 OOM

---
 src/backend/utils/activity/pgstat_shmem.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index 62de3474453..fffa1e56941 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -508,7 +508,16 @@ pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, uint64 objid, bool create,
 		shhashent = dshash_find_or_insert(pgStatLocal.shared_hash, &key, &shfound);
 		if (!shfound)
 		{
-			shheader = pgstat_init_entry(kind, shhashent);
+			PG_TRY();
+			{
+				shheader = pgstat_init_entry(kind, shhashent);
+			}
+			PG_CATCH();
+			{
+				dshash_delete_entry(pgStatLocal.shared_hash, shhashent);
+				PG_RE_THROW();
+			}
+			PG_END_TRY();
 			pgstat_acquire_entry_ref(entry_ref, shhashent, shheader);
 
 			if (created_entry != NULL)
-- 
2.47.2

#8Steven Niu
niushiji@gmail.com
In reply to: Mikhail Kot (#7)
回复: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

Hi, Mikhail,

If pgstat_init_entry() errors on OOM, what would it returns to shheader, NULL?
That would bring trouble to dshash_delete_entry().

Thanks,
Steven

________________________________
发件人: Mikhail Kot <mikhail.kot@databricks.com>
已发送: 2025 年 9 月 04 日 星期四 05:39
收件人: pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>
抄送: to@myrrc.dev <to@myrrc.dev>
主题: Re: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

Hi Michael, Steven, and Ranier,

Anyway, couldn't we flip the order of the operations in

pgstat_init_entry() so as we do first an allocation and avoid any inconsistency
in the shared state?

The issue is not only in pgstat_init_entry(). Currently it errors on OOM but
this doesn't prevent us from calling pgstat_lock_entry() through
pgstat_get_entry_ref() which accesses a non-initialized lock.

Here's the second version of the patch. Now we remove inserted hash entry
on OOM which would prevent accessing the entry

#9Michael Paquier
michael@paquier.xyz
In reply to: Steven Niu (#8)
Re: 回复: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

On Thu, Sep 04, 2025 at 02:31:34AM +0000, Steven Niu wrote:

If pgstat_init_entry() errors on OOM, what would it returns to shheader, NULL?
That would bring trouble to dshash_delete_entry().

Based on the proposal of patch 0002, the code would throw an error
after cleaning up the shared memory state. The generation and
refcount number assigned inside pgstat_init_entry() would not matter
as well because the entry created by dshash_find_or_insert() would be
entirely gone. So I am not sure what's the point you are trying to
make here.
--
Michael

#10Michael Paquier
michael@paquier.xyz
In reply to: Mikhail Kot (#7)
2 attachment(s)
Re: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

On Wed, Sep 03, 2025 at 10:39:04PM +0100, Mikhail Kot wrote:

The issue is not only in pgstat_init_entry(). Currently it errors on OOM but
this doesn't prevent us from calling pgstat_lock_entry() through
pgstat_get_entry_ref() which accesses a non-initialized lock.

Spent more time on that, finally. So your argument is that this leads
to an inconsistent state in the hash table because the dshash API is
designed to force a new entry creation if it cannot be found.

-            shheader = pgstat_init_entry(kind, shhashent);
+            PG_TRY();
+            {
+                shheader = pgstat_init_entry(kind, shhashent);
+            }
+            PG_CATCH();
+            {
+                dshash_delete_entry(pgStatLocal.shared_hash, shhashent);
+                PG_RE_THROW();
+            }
+            PG_END_TRY();

Hmm. There are a couple of extra errors that can be reached, through
get_segment_by_index() or dsa_get_address() for example, but these
point to scenarios that should never happen or programming errors when
using DSAs.

Here's the second version of the patch. Now we remove inserted hash entry
on OOM which would prevent accessing the entry

There are only two callers of pgstat_init_entry(), so I am wondering
if a better long-term thing would be to track this behavior in
pgstat_init_entry(), and let the callers deal with the cleanup
consequences, rather than have the callers of pgstat_init_entry()
guess that they may need to do something with a TRY/CATCH block. I
doubt that the number of callers of pgstat_init_entry() will raise,
but people like doing fancy things.

In pgstat_read_statsfile(), an interesting side effect is the
possibility to issue a soft error. I have never seen anybody complain
about an OOM from the startup process when reading the stats file,
TBH, still prioritizing availability is an interesting take when
reading the stats file when facing a DSA allocation failure.

In order to reproduce one failure pattern, you can use the attached
0002, then use this sequence to emulate the OOM path and the dshash
table inconsistency (install src/test/modules/injection_points first):
create extension injection_points;
select injection_points_attach('pgstat-init-entry-oom', 'notice');
-- SQL query able to create fresh pgstats entry
-- ERROR with patch 0001, crash on HEAD.

Note that none of that seems worth a backpatch, we have an history of
treating unlikely-going-to-happen errors like OOMs as HEAD-only
improvements. This one is of the same class.
--
Michael

Attachments:

v3-0001-Fix-OOM-failure-handling-with-pgstats-entry-creat.patchtext/x-diff; charset=us-asciiDownload
From 6fe33b6446f1c57ca7570f742f39f844e36a7c23 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 4 Sep 2025 16:27:51 +0900
Subject: [PATCH v3 1/2] Fix OOM failure handling with pgstats entry creations

---
 src/backend/utils/activity/pgstat.c       |  7 ++++++
 src/backend/utils/activity/pgstat_shmem.c | 27 ++++++++++++++++++++++-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index ffb5b8cce344..8cba6e9cdf8c 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -1975,6 +1975,13 @@ pgstat_read_statsfile(void)
 
 					header = pgstat_init_entry(key.kind, p);
 					dshash_release_lock(pgStatLocal.shared_hash, p);
+					if (header == NULL)
+					{
+						elog(WARNING, "could not allocate entry %u/%u/%" PRIu64 " of type %c",
+							 key.kind, key.dboid,
+							 key.objid, t);
+						goto error;
+					}
 
 					if (!read_chunk(fpin,
 									pgstat_get_entry_data(key.kind, header),
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index 62de34744536..8f8b57e8ee8a 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -289,6 +289,12 @@ pgstat_detach_shmem(void)
  * ------------------------------------------------------------
  */
 
+/*
+ * Initialize entry newly-created.
+ *
+ * Returns NULL in the event of an allocation failure, giving the possibility
+ * for callers to deal with any cleanup actions.
+ */
 PgStatShared_Common *
 pgstat_init_entry(PgStat_Kind kind,
 				  PgStatShared_HashEntry *shhashent)
@@ -311,7 +317,12 @@ pgstat_init_entry(PgStat_Kind kind,
 	pg_atomic_init_u32(&shhashent->generation, 0);
 	shhashent->dropped = false;
 
-	chunk = dsa_allocate0(pgStatLocal.dsa, pgstat_get_kind_info(kind)->shared_size);
+	chunk = dsa_allocate_extended(pgStatLocal.dsa,
+								  pgstat_get_kind_info(kind)->shared_size,
+								  DSA_ALLOC_ZERO | DSA_ALLOC_NO_OOM);
+	if (chunk == InvalidDsaPointer)
+		return NULL;
+
 	shheader = dsa_get_address(pgStatLocal.dsa, chunk);
 	shheader->magic = 0xdeadbeef;
 
@@ -509,6 +520,20 @@ pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, uint64 objid, bool create,
 		if (!shfound)
 		{
 			shheader = pgstat_init_entry(kind, shhashent);
+			if (shheader == NULL)
+			{
+				/*
+				 * Failed the allocation of a new entry, so clean up the shared
+				 * hashtable before giving up.
+				 */
+				dshash_delete_entry(pgStatLocal.shared_hash, shhashent);
+
+				ereport(ERROR,
+						(errcode(ERRCODE_OUT_OF_MEMORY),
+						 errmsg("out of memory"),
+						 errdetail("Failed while allocating entry %u/%u/%" PRIu64 " of kind %u.",
+								   key.kind, key.dboid, key.objid, kind)));
+			}
 			pgstat_acquire_entry_ref(entry_ref, shhashent, shheader);
 
 			if (created_entry != NULL)
-- 
2.51.0

v3-0002-Add-injection-point-for-OOM-failure-emulation.patchtext/x-diff; charset=us-asciiDownload
From 08178c5820364143bc039717a4ac98a27c0a4cdb Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 4 Sep 2025 16:19:03 +0900
Subject: [PATCH v3 2/2] Add injection point for OOM failure emulation

---
 src/backend/utils/activity/pgstat_shmem.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index 8f8b57e8ee8a..c48a52db1e88 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -14,6 +14,7 @@
 
 #include "pgstat.h"
 #include "storage/shmem.h"
+#include "utils/injection_point.h"
 #include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
 
@@ -317,6 +318,9 @@ pgstat_init_entry(PgStat_Kind kind,
 	pg_atomic_init_u32(&shhashent->generation, 0);
 	shhashent->dropped = false;
 
+	if (IS_INJECTION_POINT_ATTACHED("pgstat-init-entry-oom"))
+		return NULL;
+
 	chunk = dsa_allocate_extended(pgStatLocal.dsa,
 								  pgstat_get_kind_info(kind)->shared_size,
 								  DSA_ALLOC_ZERO | DSA_ALLOC_NO_OOM);
-- 
2.51.0

#11Steven Niu
niushiji@gmail.com
In reply to: Michael Paquier (#9)
Re: 回复: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

_______________________________________
From: Michael Paquier
Sent: Thursday, September 04, 2025 14:30
To: Steven Niu
Cc: Mikhail Kot; pgsql-hackers@lists.postgresql.org; to@myrrc.dev
Subject: Re: 回复: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

On Thu, Sep 04, 2025 at 02:31:34AM +0000, Steven Niu wrote:

If pgstat_init_entry() errors on OOM, what would it returns to shheader, NULL?
That would bring trouble to dshash_delete_entry().

Based on the proposal of patch 0002, the code would throw an error
after cleaning up the shared memory state. The generation and
refcount number assigned inside pgstat_init_entry() would not matter
as well because the entry created by dshash_find_or_insert() would be
entirely gone. So I am not sure what's the point you are trying to
make here.
--
Michael

Sorry, I made a mistake. I should say:
"If pgstat_init_entry() errors on OOM, the local variable shheader may be NULL. This would bring trouble to pgstat_acquire_entry_ref() in the line 30 of patch 002".

#12Rider
oceanustz@gmail.com
In reply to: Steven Niu (#11)
Re: 回复: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

Hi Steven,

I think when an error is thrown within the PG_TRY block, the assignment to
shheader is interrupted and never completes. As a result, shheader retains
its initial, NULL value.

And, the PG_RE_THROW() within the PG_CATCH block causes a non-local jump,
immediately aborting the current execution path to handle the error at a
higher level. This guarantees that the code following PG_END_TRY is
unreachable in the error scenario.

Steven Niu <niushiji@gmail.com> 于2025年9月4日周四 15:38写道:

Show quoted text

_______________________________________
From: Michael Paquier
Sent: Thursday, September 04, 2025 14:30
To: Steven Niu
Cc: Mikhail Kot; pgsql-hackers@lists.postgresql.org; to@myrrc.dev
Subject: Re: 回复: Fix segfault while accessing half-initialized hash table
in pgstat_shmem.c

On Thu, Sep 04, 2025 at 02:31:34AM +0000, Steven Niu wrote:

If pgstat_init_entry() errors on OOM, what would it returns to shheader,

NULL?

That would bring trouble to dshash_delete_entry().

Based on the proposal of patch 0002, the code would throw an error
after cleaning up the shared memory state. The generation and
refcount number assigned inside pgstat_init_entry() would not matter
as well because the entry created by dshash_find_or_insert() would be
entirely gone. So I am not sure what's the point you are trying to
make here.
--
Michael

Sorry, I made a mistake. I should say:
"If pgstat_init_entry() errors on OOM, the local variable shheader may be
NULL. This would bring trouble to pgstat_acquire_entry_ref() in the line 30
of patch 002".

#13Mikhail Kot
mikhail.kot@databricks.com
In reply to: Rider (#12)
Re: 回复: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

Hi,

Do you want me to update the patch based on your suggestion on fault
injection, or update the try/catch to the callers as discussed, or
it's good to be included in Postgres?

#14Michael Paquier
michael@paquier.xyz
In reply to: Mikhail Kot (#13)
Re: 回复: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

On Fri, Sep 05, 2025 at 09:46:55PM +0100, Mikhail Kot wrote:

Do you want me to update the patch based on your suggestion on fault
injection, or update the try/catch to the callers as discussed, or
it's good to be included in Postgres?

I would prefer the approach of letting the callers deal with the error
handling, by making the callers of pgstat_init_entry() be aware of the
problem based on the result returned, as posted at:
/messages/by-id/aLlAym4DHW4PM8Gg@paquier.xyz

What I do not like in my patch is the change in
pgstat_read_statsfile(). I have wondered about the availability
argument but there's a risk of discarding the stats based on the
memory pressure, which is different from the current pattern where we
rely entirely on the state of the on-disk file for corruption. So it
should be changed to generate an ERROR, with an errdetail() similar to
pgstat_get_entry_ref() but consistent in style to the other messages
in pgstat_read_statsfile().

The injection point business is useful for testing, but I don't see a
point in including something in the patch, because the code changes
influence the test outputs.

A last thing that I was not able to spend much time on is how much it
is possible to mess up with the shared memory state. If it is worse
than I suspected initially, where an OOM in a first session can cause
crashes because of incorrect dshash entries in shmem depending on the
stats kind fetched, a backpatch will be required, indeed. The change
is not really invasive, so that's OK on this side.

If you can send an updated patch version among these lines, that would
be of course helpful for me. I should be able to get back to the
problem on Monday my time.
--
Michael

#15Michael Paquier
michael@paquier.xyz
In reply to: Rider (#12)
Re: 回复: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

On Thu, Sep 04, 2025 at 03:49:19PM +0800, Rider wrote:

And, the PG_RE_THROW() within the PG_CATCH block causes a non-local jump,
immediately aborting the current execution path to handle the error at a
higher level. This guarantees that the code following PG_END_TRY is
unreachable in the error scenario.

Please see details in utils/elog.h, if you want to study this area of
the code of course. There is a large portion about volatile variables
and compiler expectations which is also very interested to know about.
And that's useful if you write your own extension code, outside of the
core Postgres code.
--
Michael

#16Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#14)
Re: 回复: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c

On Sat, Sep 06, 2025 at 10:01:24AM +0900, Michael Paquier wrote:

A last thing that I was not able to spend much time on is how much it
is possible to mess up with the shared memory state. If it is worse
than I suspected initially, where an OOM in a first session can cause
crashes because of incorrect dshash entries in shmem depending on the
stats kind fetched, a backpatch will be required, indeed. The change
is not really invasive, so that's OK on this side.

OK, I have played a bit more with all that, corrupting the shared
hashtable of pgstats. At the end, I have used a version close to what
I have sent previously that changes pgstat_init_entry() to return NULL
on OOM with dsa_allocate_extended(), as any other possible errors that
could happen in this call involve elog(ERROR) and not-reachable cases.

pgstat_read_statsfile() has been changed to raise an ERROR instead,
which is what we did previously, giving priority to the on-disk stats
when the environment is under memory pressure at startup. The patch
has required a few tweaks in the back-branches, nothing huge.

Thanks for the report!
--
Michael