Crash related to Shared Memory

Started by Reshmithaa · 23 days ago · 7 messages
#1 Reshmithaa
reshmithaa.b@zohocorp.com

Hi,

We have developed a PostgreSQL extension and are currently encountering intermittent crashes related to shared memory on PostgreSQL 17.7. The issue does not occur every time, and we have not been able to reliably reproduce it in our local environment. We have attached the relevant PostgreSQL logs and backtraces below for reference.

We also noticed a recent commit addressing a DSM-related issue:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1d0fc2499

Could you please confirm whether this change can resolve the type of crash we are encountering?

Thanks & Regards,

Reshmithaa B

Member Technical Staff

ZOHO Corporation

Attachments:

Trace2.txt (text/plain; charset=us-ascii)
Trace1.txt (text/plain; charset=us-ascii)
#2 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Reshmithaa (#1)
Re: Crash related to Shared Memory

Hi,

On 2/15/26 09:53, Reshmithaa wrote:

Hi,

We have developed a PostgreSQL extension and are currently encountering
intermittent crashes related to shared memory on PostgreSQL 17.7. The
issue does not occur every time, and we have not been able to reliably
reproduce it in our local environment. We have attached the relevant
PostgreSQL logs and backtraces below for reference.

We also noticed a recent commit addressing a DSM-related issue:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1d0fc2499

Could you please confirm whether this change can resolve the type of
crash we are encountering?

Maybe, but it's hard to say based on the logs you provided.

In the commit message you linked, Robert speculated this should not be
happening in core Postgres, but that extensions might trip over it.
What extensions are you using? Especially any third-party extensions?

It's interesting that both traces end with a segfault, and in both cases
there are "strange" errors immediately before it. In particular, the
first trace shows

[568804]: STATEMENT: ANALYZE tab2_temp
[568804]: STATEMENT: ANALYZE tab2_temp
[568804]: STATEMENT: ANALYZE tab2_temp
[568804]: STATEMENT: ANALYZE tab2_temp

for the process that then crashes with a segfault. The second trace
unfortunately does not show what happened to PID 2037344 before it
crashed, which would be very interesting to know. Was it an OOM too?

But there are a couple other suspicious errors from other PIDs, like

[2136038]: LOCATION: get_segment_by_index, dsa.c:1781
been freed
[2136038]: LOCATION: get_segment_by_index, dsa.c:1781

I don't think this should be happening in core code, at least I don't
recall seeing anything like that recently.

I wonder if the cleanup after an OOM could lead to the crash, because
the error cleanup destroys the short-lived context in a way Robert did
not envision in the commit message. But I haven't tried it, and it's
just speculation.

FWIW even if the commit (and upgrading to 17.8) fixes the crash, I don't
think that'll fix the other errors shown in the traces. You should
probably look into that.

regards

--
Tomas Vondra

#3 Reshmithaa
reshmithaa.b@zohocorp.com
In reply to: Tomas Vondra (#2)
Re: Crash related to Shared Memory

Hi Tomas Vondra,

What extensions are you using? Especially any third-party extensions?

We are working on our own columnar extension and have modified the PostgreSQL source code to enable pgstat tracking for foreign tables and to support autoanalyze and autovacuum on foreign tables.
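
For context, the shape of the change is roughly the following (an illustrative sketch, not our actual code: the function name is hypothetical, and pgstat_report_analyze() is the core entry point, whose exact signature should be checked against the 17.x sources):

```c
/* Hypothetical sketch: an FDW analyze path reporting results into the
 * shared-memory stats system.  "columnar_report_foreign_analyze" is an
 * illustrative name.  Requires PostgreSQL server headers:
 *   #include "postgres.h"
 *   #include "pgstat.h"
 */
static void
columnar_report_foreign_analyze(Relation rel,
                                double livetuples,
                                double deadtuples)
{
    /* pgstat_report_analyze() updates the relation's entry in the
     * shared stats dshash/DSA, the same machinery the
     * get_segment_by_index errors in the traces come from. */
    pgstat_report_analyze(rel,
                          (PgStat_Counter) livetuples,
                          (PgStat_Counter) deadtuples,
                          false);   /* resetcounter */
}
```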

From PostgreSQL 15 onward, the cumulative statistics system was moved to shared memory. We started observing this issue in production after upgrading from PostgreSQL 14 to PostgreSQL 17.2 / 17.7.

Could this DSA issue be related to pgstat?

The second trace unfortunately does not show what happened to PID 2037344
before it crashes, which would be very interesting to know. Was it an OOM too?

The process with PID 2037344 did not terminate due to an OOM; instead it ended with a segfault.

Regards
Reshmithaa B


#4 Reshmithaa
reshmithaa.b@zohocorp.com
In reply to: Reshmithaa (#3)
Re: Crash related to Shared Memory

Hi,

The second trace unfortunately does not show what happened to PID 2037344
before it crashes, which would be very interesting to know. Was it an OOM too?

Kindly find the attachment below.

Regards,

Reshmithaa B


Attachments:

Trace2.txt (text/plain; charset=us-ascii)
#5 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Reshmithaa (#3)
Re: Crash related to Shared Memory

On 2/16/26 09:42, Reshmithaa wrote:

Hi Tomas Vondra,

What extensions are you using? Especially any third-party extensions?

We are working on our own columnar extension and have modified the
PostgreSQL source code to enable pgstat tracking for foreign tables and
to support autoanalyze and autovacuum on foreign tables.

Then it matches the explanation in Robert's commit message. Chances are
that commit will solve the crash, but the other errors in your traces
are likely to remain.

From PostgreSQL 15 onward, the cumulative statistics system was moved to
shared memory. We started observing this issue in production after
upgrading from PostgreSQL 14 to PostgreSQL 17.2 / 17.7.

Could this DSA issue be related to pgstat?

You literally have the traces full of error messages like this:

[568804]: STATEMENT: ANALYZE tab2_temp
[568804]: STATEMENT: ANALYZE tab2_temp
[568804]: STATEMENT: ANALYZE tab2_temp
[568804]: STATEMENT: ANALYZE tab2_temp

That's clearly related to pgstat. My bet is that's a bug in your
extension, not sure where.

The second trace unfortunately does not show what happened to PID 2037344
before it crashes, which would be very interesting to know. Was it an OOM too?

The process with PID 2037344 did not terminate due to an OOM; instead it
ended with a segfault.

The question is what happened to the process before it segfaults.

cheers
--
Tomas Vondra

#6 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Reshmithaa (#4)
Re: Crash related to Shared Memory

On 2/16/26 10:17, Reshmithaa wrote:

Hi,

The second trace unfortunately does not show what happened to PID 2037344
before it crashes, which would be very interesting to know. Was it an OOM too?

Kindly find the attachment below.

I have no idea what

LOG: 00000: PRIMITIVE NOT USED: map_pg_backend_pid
LOCATION: LogPrimFunctions, primitiveMgr.cpp:39
STATEMENT: select pg_backend_pid()

means; that's clearly something in your fork/extension.

regards

--
Tomas Vondra

#7 Reshmithaa
reshmithaa.b@zohocorp.com
In reply to: Tomas Vondra (#6)
Re: Crash related to Shared Memory

Hi,

Thanks for your reply; we will look into it.

Regards,
Reshmithaa B
