Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY

Started by Ana Almeida · 18 days ago · 6 messages · bugs
#1 Ana Almeida
Ana.Almeida@timestamp.pt

Hello,

While running REINDEX TABLE CONCURRENTLY on a table's indexes in a PostgreSQL 17.7 database, we got the following error:
2026-03-17 18:19:55.244 WAT [2261667] LOG: server process (PID 2382873) was terminated by signal 11: Segmentation fault
2026-03-17 18:19:55.244 WAT [2261667] DETAIL: Failed process was running: REINDEX TABLE CONCURRENTLY sibs.purchases;
2026-03-17 18:19:55.244 WAT [2261667] LOG: terminating any other active server processes
2026-03-17 18:19:55.257 WAT [2261667] LOG: all server processes terminated; reinitializing
2026-03-17 18:19:55.354 WAT [2382972] LOG: database system was interrupted; last known up at 2026-03-17 18:18:58 WAT
2026-03-17 18:19:55.449 WAT [2382972] LOG: database system was not properly shut down; automatic recovery in progress
2026-03-17 18:19:55.457 WAT [2382972] LOG: redo starts at 310/8142BEC0
2026-03-17 18:19:56.352 WAT [2382972] LOG: invalid record length at 310/988BAA18: expected at least 24, got 0
2026-03-17 18:19:56.352 WAT [2382972] LOG: redo done at 310/988BA9E0 system usage: CPU: user: 0.28 s, system: 0.34 s, elapsed: 0.89 s
2026-03-17 18:19:56.360 WAT [2382973] LOG: checkpoint starting: end-of-recovery immediate wait

This is the coredump associated with the error:
PID: 2382873 (postgres)
UID: 26 (postgres)
GID: 26 (postgres)
Signal: 11 (SEGV)
Timestamp: Tue 2026-03-17 18:19:53 WAT (16h ago)
Command Line: postgres: postgres easysms_restore_250226 [local] REINDEX
Executable: /usr/pgsql-17/bin/postgres
Control Group: /system.slice/postgresql-17.service
Unit: postgresql-17.service
Slice: system.slice
Boot ID: 0a323ad20fcf403791a46425b08245f6
Machine ID: 9db3f77eadaf498bab9ee45abb0ba721
Hostname: cerstidbtstc
Storage: /var/lib/systemd/coredump/core.postgres.26.0a323ad20fcf403791a46425b08245f6.2382873.1773767993000000.lz4
Message: Process 2382873 (postgres) of user 26 dumped core.

Stack trace of thread 2382873:
#0 0x00000000005d67a8 validate_index_callback (postgres)
#1 0x00000000005738bd btvacuumpage (postgres)
#2 0x0000000000573d8a btvacuumscan (postgres)
#3 0x0000000000573f00 btbulkdelete (postgres)
#4 0x00000000005d9873 validate_index (postgres)
#5 0x000000000066b583 ReindexRelationConcurrently.isra.1 (postgres)
#6 0x000000000066c784 ExecReindex (postgres)
#7 0x0000000000870d2f ProcessUtilitySlow.isra.5 (postgres)
#8 0x000000000086fa60 standard_ProcessUtility (postgres)
#9 0x000000000086e4cf PortalRunUtility (postgres)
#10 0x000000000086e603 PortalRunMulti (postgres)
#11 0x000000000086eb5b PortalRun (postgres)
#12 0x000000000086adab exec_simple_query (postgres)
#13 0x000000000086c598 PostgresMain (postgres)
#14 0x00000000008672b5 BackendMain (postgres)
#15 0x00000000007d850f postmaster_child_launch (postgres)
#16 0x00000000007dbe5c ServerLoop (postgres)
#17 0x00000000007dda78 PostmasterMain (postgres)
#18 0x000000000050f843 main (postgres)
#19 0x00007f91323277c3 __libc_start_main (libc.so.6)
#20 0x000000000050fe1e _start (postgres)

We were able to run the REINDEX TABLE CONCURRENTLY successfully multiple times, so the crash does not happen on every execution of the REINDEX.

Regards,

Ana Almeida
Consultant
AMS - Advanced Managed Services
_____________________________________________________________________________________________________________________________________________

Head Office Praça de Alvalade 6, 11 F, 1700-036 Lisboa
Oporto Office R. Dominguez Alvarez 44 - Esc 2.04, 4150-801 Porto
E-mail ana.almeida@timestamp.pt<mailto:ana.almeida@timestamp.pt>
Website www.timestampgroup.com<http://www.timestampgroup.com/&gt;
Phone +351 213 504 870
Mobile +351 910 234 477

_____________________________________________________________________________________________________________________________________________


#2 Jim Jones
jim.jones@uni-muenster.de
In reply to: Ana Almeida (#1)
Re: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY

Hi Ana

On 18/03/2026 10:54, Ana Almeida wrote:

2026-03-17 18:19:55.244 WAT [2261667] LOG:  server process (PID 2382873) was terminated by signal 11: Segmentation fault

2026-03-17 18:19:55.244 WAT [2261667] DETAIL:  Failed process was running: REINDEX TABLE CONCURRENTLY sibs.purchases;

[...]

I was unable to reproduce the bug. Could you share a bit more data on
the table and indexes that caused the system crash?

Best, Jim

#3 Ana Almeida
Ana.Almeida@timestamp.pt
In reply to: Jim Jones (#2)
RE: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY

Hello Jim,

I didn’t notice that the error message showed the schema and table name. For confidentiality reasons, could you please avoid sharing the schema and table name if this is published as a bug?

Here is the information:

                                              Table "myschema.mytable"
       Column       |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 id                 | bigint                      |           | not null |         | plain    |             |              |
 axxxxxx            | character varying(32)      |           | not null |         | extended |             |              |
 bxx                | text                        |           | not null |         | extended |             |              |
 cxxxxxxx           | text                        |           | not null |         | extended |             |              |
 dxxxxxxxx          | text                        |           |          |         | extended |             |              |
 lag_val            | text                        |           |          |         | extended |             |              |
 exxxxxxxxxx        | text                        |           |          |         | extended |             |              |
 fxxxxxxxxxxxxx     | text                        |           |          |         | extended |             |              |
 gxxxxxxxxxxxx      | text                        |           |          |         | extended |             |              |
 hxxxxxx            | numeric                     |           | not null |         | main     |             |              |
 ixxxxxxxxxxxxxx    | numeric                     |           |          |         | main     |             |              |
 jxxxxxxxxxxxxxx    | numeric                     |           |          |         | main     |             |              |
 kxxxxxx            | integer                     |           |          |         | plain    |             |              |
 lxxxxxxxxxxxx      | integer                     |           | not null |         | plain    |             |              |
 mxxxxxxxxxxxxxx    | timestamp without time zone |           |          |         | plain    |             |              |
 nxxxxxxxxxxxxx     | timestamp without time zone |           |          |         | plain    |             |              |
 oxxxxxxxxxxxx      | timestamp without time zone |           |          |         | plain    |             |              |
 pxxxxxxxxxxx       | timestamp without time zone |           | not null |         | plain    |             |              |
 qr_mydb_id         | bigint                      |           |          |         | plain    |             |              |
 qxxxxxx            | character varying(100)      |           |          |         | extended |             |              |
Indexes:
    "mytable_pkey" PRIMARY KEY, btree (id)
    "idx_lag_val" btree (lag_val)
    "idx_mytable_qr_mydb" btree (qr_mydb_id)
Foreign-key constraints:
    "fk__mytable__qr_mydb" FOREIGN KEY (qr_mydb_id) REFERENCES myschema.qr_mydb(id)
Access method: heap
Options: autovacuum_enabled=true, toast.autovacuum_enabled=true

One more note: the same REINDEX command previously failed with the error below. The database didn’t crash that time, but the REINDEX failed. After that, we recreated the table.

ERROR: could not open file "base/179146/184526.4" (target block 808464432): previous segment is only 99572 blocks

We haven’t been able to reproduce the errors again.

Regards,

Ana Almeida


#4 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Ana Almeida (#3)
Re: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY

On 3/18/26 15:54, Ana Almeida wrote:

[...]

Just another note, before we also had the error below in the same
reindex command. The database didn’t crash when that error happened but
the reindex failed. After that, we recreated the table.

 

ERROR:  could not open file "base/179146/184526.4" (target block
808464432): previous segment is only 99572 blocks

So what was the sequence of events, exactly? You got this "could not
open file" error during REINDEX CONCURRENTLY, you recreated the table
and then it crashed on some later REINDEX CONCURRENTLY?

How did you recreate the table? Did you reload it from a backup or
something else?

We haven’t been able to reproduce the errors again.

That suggests it might have been some sort of data corruption, but it's
just a guess. Have you checked the server log if there are any messages
suggesting e.g. storage / memory issues or something like that?

Per the backtrace you shared in the previous message, the segfault
happened here:

#0 0x00000000005d67a8 validate_index_callback (postgres)
#1 0x00000000005738bd btvacuumpage (postgres)
#2 0x0000000000573d8a btvacuumscan (postgres)
#3 0x0000000000573f00 btbulkdelete (postgres)
...

That is very heavily exercised code, so I'm somewhat skeptical a bug
there would go unnoticed for very long. It's possible, of course. But
validate_index_callback doesn't do all that much - it just writes the
TID value to a tuplesort / temporary file.

It seems you have the core saved in a file:

Storage: /var/lib/systemd/coredump/core.postgres.26.0a32...

Can you try inspecting the core and getting a better backtrace using
gdb? It might tell us if there's a bogus pointer or something like
that. Or maybe not - chances are the compiler optimized some of the
variables away - but it's worth a try.

regards

--
Tomas Vondra

#5 Ana Almeida
Ana.Almeida@timestamp.pt
In reply to: Tomas Vondra (#4)
RE: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY

Hello Tomas,

We used GDB and generated a full backtrace, which we have attached for reference. The relevant frames are:

#0  BlockIdGetBlockNumber (blockId=0x7f915c757010, blockId=0x7f915c757010) at ../../../src/include/postgres.h:405
No locals.
#1  ItemPointerGetBlockNumberNoCheck (pointer=<optimized out>) at ../../../src/include/storage/itemptr.h:95
No locals.
#2  ItemPointerGetBlockNumber (pointer=<optimized out>) at ../../../src/include/storage/itemptr.h:106
No locals.
#3  itemptr_encode (itemptr=<optimized out>) at ../../../src/include/catalog/index.h:191
        block = <error reading variable block (Cannot access memory at address 0x7f915c757010)>
        offset = <error reading variable offset (Cannot access memory at address 0x7f915c757014)>
        encoded = <optimized out>
        block = <optimized out>
        offset = <optimized out>
        encoded = <optimized out>
#4  validate_index_callback (itemptr=0x7f915c757010, opaque=0x7fff98611030) at index.c:3425
        state = 0x7fff98611030
        encoded = <error reading variable encoded (Cannot access memory at address 0x7f915c757010)>

The errors occurred in a test database that was created using pg_restore from a pg_dump of another database.

We executed DELETE, VACUUM, and REINDEX commands multiple times. During one of the executions, the REINDEX operation failed with the "could not open file" error. After that, we dropped and recreated the database and repeated the tests. In one of the later executions, the same REINDEX command resulted in a segmentation fault, which caused the database to crash. When the REINDEX command failed, it left behind temporary index copies with the _ccnew suffix. We manually dropped these indexes and re-ran the REINDEX command, which then completed successfully.

We also attempted to reproduce the issue by increasing certain parameters, such as maintenance_work_mem and max_parallel_maintenance_workers, suspecting it might be related to server resource constraints. However, we have not yet been able to reproduce the error.

Regards,

Ana Almeida


Attachments:

bt_core_postgresql.txt (text/plain)
#6 Michael Paquier
michael@paquier.xyz
In reply to: Ana Almeida (#5)
Re: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY

On Fri, Mar 20, 2026 at 12:26:36PM +0000, Ana Almeida wrote:

We executed DELETE, VACUUM, and REINDEX commands multiple
times. During one of the executions, the REINDEX operation failed
with the error: "could not open file". After this, we dropped and
recreated the database and repeated the tests. In one of the
executions, the same REINDEX command resulted in segmentation fault,
which caused the database to crash. When the REINDEX command failed,
it left behind temporary index copies with the _ccnew suffix. We
manually dropped these indexes and re-ran the REINDEX command, which
then completed successfully.

I am afraid this is one of those cases where a self-contained workload
able to reproduce the issue, even at a very low rate, would be super
useful before we could categorize it as a backend core issue. I could
buy that there is an in-core problem, but it's hard to justify the
time investment based on an assumption that we may have a problem.

FWIW, I have not seen such problematic error patterns lately, so it's
hard to say. I'll check around and see if there is some data to dig
into; perhaps something matches a portion of your problematic
patterns, or has a similar smell.
--
Michael