Segmentation fault - PostgreSQL 17.0

Started by Ľuboslav Špilák · over 1 year ago · 17 messages · bugs
#1Ľuboslav Špilák
lspilak@microstep-hdo.sk

Hello.

I am trying PostgreSQL 17 partitioned tables and checking BRIN indexes, and I get a Segmentation fault error.
We upgraded the database from the last PostgreSQL 12 release to 17.0 using the pg_upgrade binary.
Everything seems to be OK; only this select is a problem.
select * from brin_page_items(
get_raw_page('test1_table_2022q3_timeseries_id_time_idx',2),
'test1_table_2022q3_timeseries_id_time_idx'
)

Welcome to Ubuntu 20.04.6 LTS (GNU/Linux 5.4.0-200-generic x86_64)
PostgreSQL 17.0 (Ubuntu 17.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit

2024-11-08 18:12:20.861 CET [12350] LOG: starting PostgreSQL 17.0 (Ubuntu 17.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
2024-11-08 18:12:20.864 CET [12350] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-11-08 18:12:20.867 CET [12350] LOG: could not create IPv6 socket for address "::": Address family not supported by protocol
2024-11-08 18:12:20.868 CET [12350] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-11-08 18:12:20.878 CET [12357] LOG: database system was shut down at 2024-11-08 18:12:19 CET
2024-11-08 18:12:20.890 CET [12350] LOG: database system is ready to accept connections
2024-11-08 18:12:41.055 CET [12350] LOG: server process (PID 12376) was terminated by signal 11: Segmentation fault
2024-11-08 18:12:41.055 CET [12350] DETAIL: Failed process was running: select *
from brin_page_items(
get_raw_page('test1_table_2022q3_timeseries_id_time_idx',2),
'test1_table_2022q3_timeseries_id_time_idx'
)
2024-11-08 18:12:41.055 CET [12350] LOG: terminating any other active server processes
2024-11-08 18:12:41.058 CET [12350] LOG: all server processes terminated; reinitializing
2024-11-08 18:12:41.276 CET [12379] LOG: database system was interrupted; last known up at 2024-11-08 18:12:20 CET
2024-11-08 18:12:41.293 CET [12382] postgres@xtimeseries FATAL: the database system is in recovery mode
2024-11-08 18:12:41.319 CET [12383] postgres@xtimeseries FATAL: the database system is in recovery mode
2024-11-08 18:12:41.346 CET [12384] postgres@xtimeseries FATAL: the database system is in recovery mode
2024-11-08 18:12:41.364 CET [12379] LOG: database system was not properly shut down; automatic recovery in progress
2024-11-08 18:12:41.371 CET [12379] LOG: redo starts at FE49/7A5FBCD0
2024-11-08 18:12:41.371 CET [12379] LOG: invalid record length at FE49/7A5FBD08: expected at least 24, got 0
2024-11-08 18:12:41.371 CET [12379] LOG: redo done at FE49/7A5FBCD0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-11-08 18:12:41.371 CET [12385] postgres@xtimeseries FATAL: the database system is in recovery mode
2024-11-08 18:12:41.384 CET [12380] LOG: checkpoint starting: end-of-recovery immediate wait
2024-11-08 18:12:41.401 CET [12386] postgres@xtimeseries FATAL: the database system is not yet accepting connections
2024-11-08 18:12:41.401 CET [12386] postgres@xtimeseries DETAIL: Consistent recovery state has not been yet reached.
2024-11-08 18:12:41.408 CET [12380] LOG: checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.017 s, sync=0.002 s, total=0.026 s; sync files=3, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=FE49/7A5FBD08, redo lsn=FE49/7A5FBD08
2024-11-08 18:12:41.413 CET [12350] LOG: database system is ready to accept connections

What could be wrong?
Thank You.

Best regards, Ľubo

________________________________

Textom tejto emailovej správy odosielateľ nesľubuje ani neuzatvára za spoločnosť MicroStep – HDO s.r.o. žiadnu zmluvu, nakoľko naša spoločnosť uzatvára každú zmluvu výlučne v písomnej forme. Ak Vám bol tento e-mail zaslaný omylom, prosím upozornite odosielateľa a tento e-mail odstráňte.

The sender of this e-mail message does not promise nor shall conclude any contract on the behalf of the company MicroStep HDO s.r.o. as our company enters into any contract exclusively in writing. If you have been sent this email in error, please notify the sender and delete this email.

#2Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Ľuboslav Špilák (#1)
Re: Segmentation fault - PostgreSQL 17.0

On 11/8/24 18:22, Ľuboslav Špilák wrote:

[...]

What could be wrong?

Hard to say, really. It would be interesting to see the backtrace from
the crash.

Considering you're able to trigger the issue easily, it shouldn't be too
difficult to attach GDB to a backend before running the query.
Alternatively, you can enable core files, and generate the backtrace
from that.

Presumably the index is a simple BRIN minmax index? Or what opclass does
it use? Any special parameters? Is the index working otherwise,
producing correct results?

regards

--
Tomas Vondra

#3Ľuboslav Špilák
lspilak@microstep-hdo.sk
In reply to: Tomas Vondra (#2)
Re: Segmentation fault - PostgreSQL 17.0

Ahoj/Hello.

On the migrated db:
In the postgres or public schema (I'm not sure now) I created a table with one int8 column - cas (unixtime).
Then I created a BRIN index on that column (by cas/unixtime).
Inserted only one row.
Then I vacuumed the table.

I want to check the BRIN index with these functions:

brin_metapage_info .. ok
brin_revmap_data .. ok
brin_page_items .. sigsegv

This happens repeatedly on my migrated db.

On Monday I could try creating a new cluster / empty database and trying the same again.

I must google how to:
"attach GDB to a backend before running the query.
Alternatively, you can enable core files, and generate the backtrace"

Does this function work correctly in your PG17 db?

Thank You. Lubo.


#4Peter Geoghegan
pg@bowt.ie
In reply to: Tomas Vondra (#2)
Re: Segmentation fault - PostgreSQL 17.0

On Sat, Nov 9, 2024 at 7:01 AM Tomas Vondra <tomas@vondra.me> wrote:

Considering you're able to trigger the issue easily, it shouldn't be too
difficult to attach GDB to a backend before running the query.
Alternatively, you can enable core files, and generate the backtrace
from that.

This query involves the use of a pageinspect function that accepts a
raw page image. There are some sanity checks of the page, but those
are quite lightweight. It's really not that hard to imagine it
segfaulting from a page image that passes those checks by mistake, but
is nevertheless not a valid BRIN page.

In any case this should be easy to debug: save the page image that the
function segfaults on, verify that it doesn't contain confidential
information, and then post it here. See:

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#contrib.2Fpageinspect_page_dump
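Before posting, a dumped page image can be given a quick offline sanity check. A minimal sketch in Python, assuming the default 8 kB BLCKSZ and the little-endian PageHeaderData layout from src/include/storage/bufpage.h (the file name is the one used later in this thread):

```python
import struct

def parse_page_header(page: bytes) -> dict:
    """Decode the 24-byte PageHeaderData at the start of a raw page."""
    if len(page) != 8192:  # default BLCKSZ
        raise ValueError(f"expected 8192 bytes, got {len(page)}")
    (lsn_hi, lsn_lo, checksum, flags,
     lower, upper, special, pagesize_version,
     prune_xid) = struct.unpack_from("<IIHHHHHHI", page, 0)
    return {
        "lsn": f"{lsn_hi:X}/{lsn_lo:08X}",
        "checksum": checksum,
        "flags": flags,
        "lower": lower,       # end of the line pointer array
        "upper": upper,       # start of tuple data
        "special": special,   # offset of the special space (BRIN stores its page type here)
        "pagesize": pagesize_version & 0xFF00,
        "version": pagesize_version & 0x00FF,
        "prune_xid": prune_xid,
    }

# Usage against a dump file:
# with open("dump_block_2.page", "rb") as f:
#     print(parse_page_header(f.read()))
```

A structurally plausible page should report version 4, pagesize 8192, and lower <= upper <= special <= 8192; anything else means the dump is not a valid page image to begin with.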

--
Peter Geoghegan

#5Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Ľuboslav Špilák (#3)
Re: Segmentation fault - PostgreSQL 17.0

On 11/9/24 14:02, Ľuboslav Špilák wrote:

Ahoj/Hello.

On migrated db.
In postgres or public schema (im not sure now) I created the table with
one column int8 - cas (unixtime)
Then I create index brin on that column (by cas/unixtime).
Insert only one row.
Then I Vacuumed table.

So is this a tiny single-row table? Did you create it on PG17, or before
running pg_upgrade?

I want to check the BRIN index with these functions:

brin_metapage_info .. ok
brin_revmap_data .. ok
brin_page_items .. sigsegv

This is done repeatedly on my migrated db.

On Monday I could try create new cluster / empty database and try the
same again.

I must google it to know how:
"attach GDB to a backend before running the query.
Alternatively, you can enable core files, and generate the backtrace "

There are wiki pages [1] and [2] with instructions how to do this. But
in short, connect to the DB, get PID using

SELECT pg_backend_pid();

attach gdb to that backend

gdb -p $PID

Hit 'c' to continue running the program, and run the crashing query in
the client. The gdb session will interrupt on the segfault, and you'll
be able to get backtrace by 'bt'.
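The core-file route mentioned earlier can be sketched like this on Ubuntu (the postgres binary path is an assumption for the pgdg packages; adjust for your install):

```shell
# Allow core dumps in the shell that restarts the server,
# then check where the kernel will write them.
ulimit -c unlimited
ulimit -c                           # prints "unlimited" once the limit is lifted
cat /proc/sys/kernel/core_pattern   # on stock Ubuntu this is often piped to apport

# After reproducing the crash, open the core file with the matching binary:
#   gdb /usr/lib/postgresql/17/bin/postgres /path/to/core
#   (gdb) bt full
```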

Does this function work correctly in your PG17 db?

It works for me, yes. This is what I tried:

create table t (a bigint);
insert into t values (1);
create index on t using brin (a);

select * from brin_metapage_info(get_raw_page('t_a_idx', 0));
   magic    | version | pagesperrange | lastrevmappage
------------+---------+---------------+----------------
 0xA8109CFA |       1 |           128 |              1
(1 row)

select * from brin_revmap_data(get_raw_page('t_a_idx', 1));
pages
-------
(2,1)
(0,0)
(0,0)
...

select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx');
 itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty |  value
------------+--------+--------+----------+----------+-------------+-------+----------
          1 |      0 |      1 | f        | f        | f           | f     | {1 .. 1}
(1 row)

But this is just a very simple test.

regards

--
Tomas Vondra

#6Ľuboslav Špilák
lspilak@microstep-hdo.sk
In reply to: Tomas Vondra (#5)
Re: Segmentation fault - PostgreSQL 17.0

Hello.

After pg_upgrade there were 200 timeseries tables in the xtimeseries database.
Each is about 2 GB with two indexes: one B-tree and a second BRIN index, the BRIN on two columns (timeseries_id, time).

I created a copy of one table and tried changing it to a partitioned table, partitioned by time, with one quarter per partition.

Analyzed and vacuumed the table.

Then I figured out that it sigsegvs on the brin_page_items function repeatedly.

So THEN I tried creating a new test table on PG17. I don't know now whether I created the table in a different database or only in a different schema, but it was in the same db cluster.
A test table with one column and one BRIN index on that column. Inserted only one row. I vacuumed this test table. I tried the sequence of three BRIN functions again to check the BRIN index contents. The third function, brin_page_items, caused a sigsegv again.

Thank you.
Best regards,
Lubo


#7Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Ľuboslav Špilák (#6)
Re: Segmentation fault - PostgreSQL 17.0

On 11/9/24 17:35, Ľuboslav Špilák wrote:

[...]

I'm a bit confused about exactly which cases fail. But if you're observing crashes even on tables/indexes created on PG17 after the upgrade, it's unlikely to be related to the upgrade.

Please, get the backtrace when you have access to the system.

Is there anything special about the system? Which repository are you
using for postgres packages?

regards

--
Tomas Vondra

#8Ľuboslav Špilák
lspilak@microstep-hdo.sk
In reply to: Peter Geoghegan (#4)
Re: Segmentation fault - PostgreSQL 17.0

Hello.

I am sending you the dump file from command:
      Postgres@hdoppxendb1:~$ PGOPTIONS="-c search_path=\"XEN_TS\"" psql -XAt -d "xtimeseries" -c "SELECT encode(get_raw_page('test_idxbrin', 2),'base64')" | base64 -d > dump_block_2.page

The steps for preparing table and index are:

CREATE TABLE test (
      cas int8 NULL
);

CREATE INDEX test_idxbrin ON test USING brin (cas) WITH (pages_per_range='32');

insert into test values (123);

analyse test;

vacuum test;

CREATE extension pageinspect;

SELECT brin_page_type(get_raw_page('test_idxbrin', 0));

select * from "XEN_TS".brin_metapage_info(get_raw_page('test_idxbrin',0));

select * from brin_revmap_data(get_raw_page('test_idxbrin',1)) limit 1000;


select *
from brin_page_items(
get_raw_page('test_idxbrin',2),
'test_idxbrin'
);

The last select returns this error:

SQL Error [57P03]: FATAL: the database system is not yet accepting connections
Detail: Consistent recovery state has not been yet reached.

I am working on getting the backtrace.

Thank You.

Best regards, Lubo


Attachments:

image.png (image/png)
dump_block_2.page (application/octet-stream)
#9Ľuboslav Špilák
lspilak@microstep-hdo.sk
In reply to: Ľuboslav Špilák (#8)
Re: Segmentation fault - PostgreSQL 17.0

Hello.

After creating a new database cluster (5433) on PostgreSQL 17 there was no problem with calling the function
      select * from brin_page_items(
       get_raw_page(

On the pg_upgraded cluster I got this backtrace on the sigsegv. Is this helpful, or do I need to include any more information?

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00005627752205d5 in heap_compute_data_size (tupleDesc=tupleDesc@entry=0x5627a1df38c0, values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
at ./build/../src/backend/access/common/heaptuple.c:234
234 ./build/../src/backend/access/common/heaptuple.c: No such file or directory.
(gdb) bt
#0 0x00005627752205d5 in heap_compute_data_size (tupleDesc=tupleDesc@entry=0x5627a1df38c0, values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
at ./build/../src/backend/access/common/heaptuple.c:234
#1 0x0000562775221e4f in heap_form_minimal_tuple (tupleDescriptor=0x5627a1df38c0, values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
at ./build/../src/backend/access/common/heaptuple.c:1492
#2 0x00005627756f0e45 in tuplestore_putvalues (state=0x5627a1df3cc8, tdesc=<optimized out>, values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
at ./build/../src/backend/utils/sort/tuplestore.c:756
#3 0x00007fc7e2d0d9eb in brin_page_items (fcinfo=<optimized out>) at ./build/../contrib/pageinspect/brinfuncs.c:300
#4 0x00005627753d435c in ExecMakeTableFunctionResult (setexpr=0x5627a1df9370, econtext=0x5627a1df9258, argContext=<optimized out>, expectedDesc=0x5627a1dfa4e0, randomAccess=false)
at ./build/../src/backend/executor/execSRF.c:234
#5 0x00005627753e527a in FunctionNext (node=node@entry=0x5627a1df9050) at ./build/../src/backend/executor/nodeFunctionscan.c:93
#6 0x00005627753d4df9 in ExecScanFetch (recheckMtd=0x5627753e4f50 <FunctionRecheck>, accessMtd=0x5627753e4f80 <FunctionNext>, node=0x5627a1df9050)
at ./build/../src/backend/executor/execScan.c:131
#7 ExecScan (node=0x5627a1df9050, accessMtd=0x5627753e4f80 <FunctionNext>, recheckMtd=0x5627753e4f50 <FunctionRecheck>) at ./build/../src/backend/executor/execScan.c:180
#8 0x00005627753cb7bb in ExecProcNode (node=0x5627a1df9050) at ./build/../src/include/executor/executor.h:274
#9 ExecutePlan (execute_once=<optimized out>, dest=0x5627a1c89478, direction=<optimized out>, numberTuples=200, sendTuples=<optimized out>, operation=CMD_SELECT,
use_parallel_mode=<optimized out>, planstate=0x5627a1df9050, estate=0x5627a1df8e38) at ./build/../src/backend/executor/execMain.c:1648
#10 standard_ExecutorRun (queryDesc=0x5627a1cdeef0, direction=<optimized out>, count=200, execute_once=<optimized out>) at ./build/../src/backend/executor/execMain.c:365
#11 0x000056277557966e in PortalRunSelect (portal=0x5627a1d43188, forward=<optimized out>, count=200, dest=<optimized out>) at ./build/../src/backend/tcop/pquery.c:924
#12 0x000056277557a9b6 in PortalRun (portal=portal@entry=0x5627a1d43188, count=count@entry=200, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=false, dest=dest@entry=0x5627a1c89478,
altdest=altdest@entry=0x5627a1c89478, qc=0x7fff4744aa60) at ./build/../src/backend/tcop/pquery.c:768
#13 0x000056277557817e in exec_execute_message (max_rows=200, portal_name=0x5627a1c88fe8 "") at ./build/../src/backend/tcop/postgres.c:2255
#14 PostgresMain (dbname=<optimized out>, username=<optimized out>) at ./build/../src/backend/tcop/postgres.c:4834
#15 0x0000562775573423 in BackendMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ./build/../src/backend/tcop/backend_startup.c:105
#16 0x00005627754e366e in postmaster_child_launch (child_type=child_type@entry=B_BACKEND, startup_data=startup_data@entry=0x7fff4744ad70 "", startup_data_len=startup_data_len@entry=4,
client_sock=client_sock@entry=0x7fff4744ad90) at ./build/../src/backend/postmaster/launch_backend.c:277
#17 0x00005627754e7229 in BackendStartup (client_sock=0x7fff4744ad90) at ./build/../src/backend/postmaster/postmaster.c:3593
#18 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1674
#19 0x00005627754e8dbd in PostmasterMain (argc=<optimized out>, argv=0x5627a1c82f10) at ./build/../src/backend/postmaster/postmaster.c:1372
#20 0x0000562775212df0 in main (argc=5, argv=0x5627a1c82f10) at ./build/../src/backend/main/main.c:197
(gdb)

Best regards, Lubo

________________________________
From: Ľuboslav Špilák <lspilak@microstep-hdo.sk>
Sent: Monday, 11 November 2024 09:25
To: Peter Geoghegan <pg@bowt.ie>; Tomas Vondra <tomas@vondra.me>
Cc: pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: Re: Segmentation fault - PostgreSQL 17.0

Hello.

I am sending you the dump file from command:
      Postgres@hdoppxendb1:~$ PGOPTIONS="-c search_path=\"XEN_TS\"" psql -XAt -d "xtimeseries" -c "SELECT encode(get_raw_page('test_idxbrin', 2),'base64')" | base64 -d > dump_block_2.page

The steps for preparing table and index are:

CREATE TABLE test (
      cas int8 NULL
);

CREATE INDEX test_idxbrin ON test USING brin (cas) WITH (pages_per_range='32');

insert into test values (123)

analyse test

vacuum test

CREATE extension pageinspect;

SELECT brin_page_type(get_raw_page('test_idxbrin', 0));

select * from "XEN_TS".brin_metapage_info(get_raw_page('test_idxbrin',0));

select * from brin_revmap_data(get_raw_page('test_idxbrin',1)) limit 1000;

      [cid:8ee2db51-07e6-4d71-a134-5a6a5954a9d7]

select *
from brin_page_items(
get_raw_page('test_idxbrin',2),
'test_idxbrin'
);

Last select returns this error:

SQL Error [57P03]: FATAL: the database system is not yet accepting connections
Detail: Consistent recovery state has not been yet reached.

I am working on getting the backtrace.

Thank You.

Best regards, Lubo

________________________________
From: Peter Geoghegan <pg@bowt.ie>
Sent: Saturday, 9 November 2024 16:53
To: Tomas Vondra <tomas@vondra.me>
Cc: Ľuboslav Špilák <lspilak@microstep-hdo.sk>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: Re: Segmentation fault - PostgreSQL 17.0

On Sat, Nov 9, 2024 at 7:01 AM Tomas Vondra <tomas@vondra.me> wrote:

Considering you're able to trigger the issue easily, it shouldn't be too
difficult to attach GDB to a backend before running the query.
Alternatively, you can enable core files, and generate the backtrace
from that.

This query involves the use of a pageinspect function that accepts a
raw page image. There are some sanity checks of the page, but those
are quite lightweight. It's really not that hard to imagine it
segfaulting from a page image that passes those checks by mistake, but
is nevertheless not a valid BRIN page.

In any case this should be easy to debug: save the page image that the
function segfaults on, verify that it doesn't contain confidential
information, and then post it here. See:

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#contrib.2Fpageinspect_page_dump

--
Peter Geoghegan
________________________________

The sender of this e-mail message does not promise nor shall conclude any contract on the behalf of the company MicroStep HDO s.r.o. as our company enters into any contract exclusively in writing. If you have been sent this email in error, please notify the sender and delete this email.

Attachments: image.png
#10Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Ľuboslav Špilák (#9)
Re: Segmentation fault - PostgreSQL 17.0

On 11/11/24 10:30, Ľuboslav Špilák wrote:

Hello.

After creating a new database cluster (5433) on PostgreSQL 17 there was no
problem calling the function
      select * from brin_page_items(
                get_raw_page(

On the pg_upgraded cluster I got this backtrace on sigsegv. Is this
helpful or do I need to include any more information?

Could you maybe try on a completely new 17.0 cluster, not one that went
through pg_upgrade? I don't think pg_upgrade should cause anything like
this, but it'd be good to conclusively rule that out by reproducing the
issue on a fresh cluster.

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00005627752205d5 in heap_compute_data_size
(tupleDesc=tupleDesc@entry=0x5627a1df38c0,
values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
    at ./build/../src/backend/access/common/heaptuple.c:234
234     ./build/../src/backend/access/common/heaptuple.c: No such file
or directory.
(gdb) bt
#0  0x00005627752205d5 in heap_compute_data_size
(tupleDesc=tupleDesc@entry=0x5627a1df38c0,
values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
    at ./build/../src/backend/access/common/heaptuple.c:234

This is ... weird. heap_compute_data_size literally didn't change for
the last 9 years, so it's the same for 12 and 17. Line 234 is this:

Size
heap_compute_data_size(TupleDesc tupleDesc,
                       const Datum *values,
                       const bool *isnull)
{
    Size        data_length = 0;
    int         i;
    int         numberOfAttributes = tupleDesc->natts;

    for (i = 0; i < numberOfAttributes; i++)
    {
        Datum       val;
        Form_pg_attribute atti;

        if (isnull[i])
            continue;

        val = values[i];
        atti = TupleDescAttr(tupleDesc, i);

        if (ATT_IS_PACKABLE(atti) &&
            VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))

I wonder which of the conditions triggers the segfault: the one accessing
the attribute info (atti), or the one checking the pointer. It has to be
the first, because we're dealing with int8, which is not a varlena type
and hence not packable. So my guess would be that atti is a bogus pointer
into garbage.

Could you please print variables "i", "numberOfAttributes" and then also
the contents of tupleDesc and atti?

print i
print numberOfAttributes
print *tupleDesc
print *atti

regards

--
Tomas Vondra

#11Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Ľuboslav Špilák (#8)
Re: Segmentation fault - PostgreSQL 17.0

On 11/11/24 09:25, Ľuboslav Špilák wrote:

Hello.

I am sending you the dump file from command:
      Postgres@hdoppxendb1:~$ PGOPTIONS="-c search_path=\"XEN_TS\"" psql -XAt -d "xtimeseries" -c "SELECT encode(get_raw_page('test_idxbrin', 2),'base64')" | base64 -d > dump_block_2.page

The steps for preparing table and index are:

CREATE TABLE test (
      cas int8 NULL
);

CREATE INDEX test_idxbrin ON test USING brin (cas) WITH (pages_per_range='32');

It took me a while to get this working. It was failing for me with

ERROR: column "cas" does not exist

because the spaces in CREATE TABLE are actually not regular spaces, but
"EN SPACES" (U+2002), which we just consider not-whitespace, and include
them in the column name.

Presumably it's been added by the mail client.
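
Incidentally, such invisible characters in identifiers can be spotted from the catalogs. A diagnostic sketch (the table name "test" is from the repro above; an EN SPACE shows up as the UTF-8 byte sequence e2 80 82):

```sql
-- Show each column name of table "test" alongside its raw UTF-8 bytes,
-- making any non-ASCII "whitespace" inside the identifier visible.
SELECT attname, convert_to(attname::text, 'UTF8') AS raw_bytes
FROM pg_attribute
WHERE attrelid = 'test'::regclass
  AND attnum > 0
  AND NOT attisdropped;
```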

insert into test values (123)

analyse test

vacuum test

CREATE extension pageinspect;

SELECT brin_page_type(get_raw_page('test_idxbrin', 0));

select * from "XEN_TS".brin_metapage_info(get_raw_page('test_idxbrin',0));

select * from brin_revmap_data(get_raw_page('test_idxbrin',1)) limit 1000;

      

select *
from brin_page_items(
  get_raw_page('test_idxbrin',2),
  'test_idxbrin'
);

Last select returns this error:

SQL Error [57P03]: FATAL: the database system is not yet accepting
connections
  Detail: Consistent recovery state has not been yet reached.

I am working on getting the backtrace.

Well, all of this works just fine for me :-( I even tried on a cluster
that went through the same PG12 -> PG17 pg_upgrade, but all of that
works. Even reading the page works fine:

test=# select lo_import('/tmp/dump_block_2.page');
lo_import
-----------
16443
(1 row)

test=# select * from brin_page_items(lo_get(16443), 'test_idxbrin');
 itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty |    value
------------+--------+--------+----------+----------+-------------+-------+--------------
          1 |      0 |      1 | f        | f        | f           | f     | {123 .. 123}
(1 row)

Not sure what's going on. Can you maybe share which exact Ubuntu version
and packages you use?

Is there anything special about the system? Do you use extensions?

regards

--
Tomas Vondra

#12Ľuboslav Špilák
lspilak@microstep-hdo.sk
In reply to: Tomas Vondra (#10)
Re: Segmentation fault - PostgreSQL 17.0

Hello.

Could you maybe try on a completely new 17.0 cluster, not one that went
through pg_upgrade? I don't think pg_upgrade should cause anything like
this, but it'd be good to conclusively rule that out by reproducing the
issue on a fresh cluster.

We can't reproduce the problem on a completely new 17.0 cluster.

Program received signal SIGSEGV, Segmentation fault.
0x00005627752205d5 in heap_compute_data_size (tupleDesc=tupleDesc@entry=0x5627a1e1eea0, values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
at ./build/../src/backend/access/common/heaptuple.c:234
234 ./build/../src/backend/access/common/heaptuple.c: No such file or directory.
(gdb) print i
$1 = 6
(gdb) print numberOfAttributes
$2 = <optimized out>
(gdb) print *tupleDesc
$3 = {natts = 7, tdtypeid = 2249, tdtypmod = 0, tdrefcount = -1, constr = 0x0, attrs = 0x5627a1e1eeb8}
(gdb) print *atti
$4 = {attrelid = 0, attname = {data = "value", '\000' <repeats 58 times>}, atttypid = 25, attlen = -1, attnum = 7, attcacheoff = -1, atttypmod = -1, attndims = 0, attbyval = false,
attalign = 105 'i', attstorage = 120 'x', attcompression = 0 '\000', attnotnull = false, atthasdef = false, atthasmissing = false, attidentity = 0 '\000', attgenerated = 0 '\000',
attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 100}
(gdb) print val
$5 = 0
(gdb) print values[0]
$6 = 1
(gdb) print values[1]
$7 = 0
(gdb) print values[2]
$8 = 1
(gdb) print values[3]
$9 = 0
(gdb) print values[4]
$10 = 0
(gdb) print values[5]
$11 = 0
(gdb) print values[6]
$12 = 0
(gdb) print values[7]
$13 = 94728219153600
(gdb) bt
#0 0x00005627752205d5 in heap_compute_data_size (tupleDesc=tupleDesc@entry=0x5627a1e1eea0, values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
at ./build/../src/backend/access/common/heaptuple.c:234
#1 0x0000562775221e4f in heap_form_minimal_tuple (tupleDescriptor=0x5627a1e1eea0, values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
at ./build/../src/backend/access/common/heaptuple.c:1492
#2 0x00005627756f0e45 in tuplestore_putvalues (state=0x5627a1e1f2a8, tdesc=<optimized out>, values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
at ./build/../src/backend/utils/sort/tuplestore.c:756
#3 0x00007fc7e2d0d9eb in brin_page_items (fcinfo=<optimized out>) at ./build/../contrib/pageinspect/brinfuncs.c:300
#4 0x00005627753d435c in ExecMakeTableFunctionResult (setexpr=0x5627a1e1ba40, econtext=0x5627a1e1b928, argContext=<optimized out>, expectedDesc=0x5627a1e1cbb0, randomAccess=false)
at ./build/../src/backend/executor/execSRF.c:234
#5 0x00005627753e527a in FunctionNext (node=node@entry=0x5627a1e1b720) at ./build/../src/backend/executor/nodeFunctionscan.c:93
#6 0x00005627753d4df9 in ExecScanFetch (recheckMtd=0x5627753e4f50 <FunctionRecheck>, accessMtd=0x5627753e4f80 <FunctionNext>, node=0x5627a1e1b720)
at ./build/../src/backend/executor/execScan.c:131
#7 ExecScan (node=0x5627a1e1b720, accessMtd=0x5627753e4f80 <FunctionNext>, recheckMtd=0x5627753e4f50 <FunctionRecheck>) at ./build/../src/backend/executor/execScan.c:180
#8 0x00005627753cb7bb in ExecProcNode (node=0x5627a1e1b720) at ./build/../src/include/executor/executor.h:274
#9 ExecutePlan (execute_once=<optimized out>, dest=0x5627a1c89478, direction=<optimized out>, numberTuples=200, sendTuples=<optimized out>, operation=CMD_SELECT,
use_parallel_mode=<optimized out>, planstate=0x5627a1e1b720, estate=0x5627a1e1b508) at ./build/../src/backend/executor/execMain.c:1648
#10 standard_ExecutorRun (queryDesc=0x5627a1da2d20, direction=<optimized out>, count=200, execute_once=<optimized out>) at ./build/../src/backend/executor/execMain.c:365
#11 0x000056277557966e in PortalRunSelect (portal=0x5627a1d43188, forward=<optimized out>, count=200, dest=<optimized out>) at ./build/../src/backend/tcop/pquery.c:924
#12 0x000056277557a9b6 in PortalRun (portal=0x5627a1d43188, count=200, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x5627a1c89478, altdest=0x5627a1c89478, qc=0x7fff4744aa60)
at ./build/../src/backend/tcop/pquery.c:768
#13 0x000056277557817e in PostgresMain () at ./build/../src/backend/tcop/postgres.c:2255
#14 0x0000562775573423 in BackendMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ./build/../src/backend/tcop/backend_startup.c:105
#15 0x00005627754e366e in postmaster_child_launch (child_type=child_type@entry=B_BACKEND, startup_data=startup_data@entry=0x7fff4744ad70 "", startup_data_len=startup_data_len@entry=4,
client_sock=client_sock@entry=0x7fff4744ad90) at ./build/../src/backend/postmaster/launch_backend.c:277
#16 0x00005627754e7229 in BackendStartup (client_sock=0x7fff4744ad90) at ./build/../src/backend/postmaster/postmaster.c:3593
#17 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1674
#18 0x00005627754e8dbd in PostmasterMain (argc=<optimized out>, argv=0x5627a1c82f10) at ./build/../src/backend/postmaster/postmaster.c:1372
#19 0x0000562775212df0 in main (argc=5, argv=0x5627a1c82f10) at ./build/../src/backend/main/main.c:197

Ubuntu version:

[Mon Nov 11](09:58)# cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Extensions:

[inline screenshot: list of installed extensions]

For packages, I attached the file apt-list-installed.txt.

Thank you.

Best regards, Lubo


Attachments: image.png, apt-list-installed.txt
#13Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Ľuboslav Špilák (#12)
Re: Segmentation fault - PostgreSQL 17.0

On 11/11/24 15:22, Ľuboslav Špilák wrote:

Hello.

Could you maybe try on a completely new 17.0 cluster, not one that went
through pg_upgrade? I don't think pg_upgrade should cause anything like
this, but it'd be good to conclusively rule that out by reproducing the
issue on a fresh cluster.

We can't reproduce the problem on a completely new 17.0 cluster.

Program received signal SIGSEGV, Segmentation fault.
0x00005627752205d5 in heap_compute_data_size
(tupleDesc=tupleDesc@entry=0x5627a1e1eea0,
values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
    at ./build/../src/backend/access/common/heaptuple.c:234
234     ./build/../src/backend/access/common/heaptuple.c: No such file
or directory.
(gdb) print i
$1 = 6
(gdb) print numberOfAttributes
$2 = <optimized out>
(gdb) print *tupleDesc
$3 = {natts = 7, tdtypeid = 2249, tdtypmod = 0, tdrefcount = -1, constr = 0x0, attrs = 0x5627a1e1eeb8}
(gdb) print *atti
$4 = {attrelid = 0, attname = {data = "value", '\000' <repeats 58 times>}, atttypid = 25, attlen = -1, attnum = 7, attcacheoff = -1, atttypmod = -1, attndims = 0, attbyval = false,
  attalign = 105 'i', attstorage = 120 'x', attcompression = 0 '\000', attnotnull = false, atthasdef = false, atthasmissing = false, attidentity = 0 '\000', attgenerated = 0 '\000',
  attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 100}

OK, this is really weird - the index you created clearly has just 1
attribute, but this descriptor says there are 7. Which means it likely
accesses garbage outside the actual BRIN tuple - not surprising it
crashes on that.

That tuple descriptor however looks sane, so my guess is you actually
have multiple indexes with the same relname in different schemas, and
this finds the wrong one first. That would also explain why it only
happens on an upgraded cluster - the new one won't have the other
indexes, of course.

What does

SELECT * FROM pg_class WHERE relname = 'test_idxbrin';

say? My bet is it'll return multiple rows, one of which will have 7
attributes.
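
To also see at a glance which schema each match lives in, a variant using the standard catalogs (a diagnostic sketch; indnatts is the number of index columns):

```sql
SELECT n.nspname AS schema, c.oid, c.relname, i.indnatts
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_index i ON i.indexrelid = c.oid
WHERE c.relname = 'test_idxbrin';
```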

If this is the case, it's not a bug - as Peter explained, there are some
basic sanity checks, but there's not enough info to check everything. If
you pass a page as bytea with a mismatching index, segfault is expected
(even if unfortunate). It's a power tool - if you hold it wrong, you may
get injured.

One solution is to use the fully qualified name of the index, including
the schema. Or to always set the search_path.
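
Both workarounds can be sketched like this (the "XEN_TS" schema name is taken from earlier in the thread; adjust to your setup):

```sql
-- Option 1: schema-qualify the index name in both arguments
SELECT *
FROM brin_page_items(
    get_raw_page('"XEN_TS".test_idxbrin', 2),
    '"XEN_TS".test_idxbrin'
);

-- Option 2: pin the search_path for the session instead
SET search_path = "XEN_TS";
```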

regards

--
Tomas Vondra

#14Ľuboslav Špilák
lspilak@microstep-hdo.sk
In reply to: Tomas Vondra (#13)
Re: Segmentation fault - PostgreSQL 17.0

Hello.

I had a similarly created table in a different schema, so there were indeed 2 rows in that select (the 2nd one was created to test the problem), but even after removing one of them the problem still persists.

select * from pg_class where relname='test_idxbrin';
"oid","relname","relnamespace","reltype","reloftype","relowner","relam","relfilenode","reltablespace","relpages","reltuples","relallvisible","reltoastrelid","relhasindex","relisshared","relpersistence","relkind","relnatts","relchecks","relhasrules","relhastriggers","relhassubclass","relrowsecurity","relforcerowsecurity","relispopulated","relreplident","relispartition","relrewrite","relfrozenxid","relminmxid","relacl","reloptions","relpartbound"
1128187015,test_idxbrin,2200,0,0,10,3580,1128187015,0,3,0.0,0,0,false,false,p,i,1,0,false,false,false,false,false,true,n,false,0,"0","0",,{pages_per_range=32},
1128178819,test_idxbrin,16830,0,0,10,3580,1128178819,0,3,0.0,0,0,false,false,p,i,1,0,false,false,false,false,false,true,n,false,0,"0","0",,{pages_per_range=32},

So we removed one of the tables with this index and now this select returned one row

select * from pg_class where relname='test_idxbrin';
"oid","relname","relnamespace","reltype","reloftype","relowner","relam","relfilenode","reltablespace","relpages","reltuples","relallvisible","reltoastrelid","relhasindex","relisshared","relpersistence","relkind","relnatts","relchecks","relhasrules","relhastriggers","relhassubclass","relrowsecurity","relforcerowsecurity","relispopulated","relreplident","relispartition","relrewrite","relfrozenxid","relminmxid","relacl","reloptions","relpartbound"
1128178819,test_idxbrin,16830,0,0,10,3580,1128178819,0,3,0.0,0,0,false,false,p,i,1,0,false,false,false,false,false,true,n,false,0,"0","0",,{pages_per_range=32},

Then we called the problematic function again and it crashed.

Program received signal SIGSEGV, Segmentation fault.
0x00005627752205d5 in heap_compute_data_size (tupleDesc=tupleDesc@entry=0x5627a1db6a50, values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
at ./build/../src/backend/access/common/heaptuple.c:234
234 ./build/../src/backend/access/common/heaptuple.c: No such file or directory.
(gdb) bt
#0 0x00005627752205d5 in heap_compute_data_size (tupleDesc=tupleDesc@entry=0x5627a1db6a50, values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
at ./build/../src/backend/access/common/heaptuple.c:234
#1 0x0000562775221e4f in heap_form_minimal_tuple (tupleDescriptor=0x5627a1db6a50, values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
at ./build/../src/backend/access/common/heaptuple.c:1492
#2 0x00005627756f0e45 in tuplestore_putvalues (state=0x5627a1db6e58, tdesc=<optimized out>, values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
at ./build/../src/backend/utils/sort/tuplestore.c:756
#3 0x00007fc7e2d8e9eb in brin_page_items (fcinfo=<optimized out>) at ./build/../contrib/pageinspect/brinfuncs.c:300
#4 0x00005627753d435c in ExecMakeTableFunctionResult (setexpr=0x5627a1dac480, econtext=0x5627a1dac368, argContext=<optimized out>, expectedDesc=0x5627a1dad5f0, randomAccess=false)
at ./build/../src/backend/executor/execSRF.c:234
#5 0x00005627753e527a in FunctionNext (node=node@entry=0x5627a1dac160) at ./build/../src/backend/executor/nodeFunctionscan.c:93
#6 0x00005627753d4df9 in ExecScanFetch (recheckMtd=0x5627753e4f50 <FunctionRecheck>, accessMtd=0x5627753e4f80 <FunctionNext>, node=0x5627a1dac160)
at ./build/../src/backend/executor/execScan.c:131
#7 ExecScan (node=0x5627a1dac160, accessMtd=0x5627753e4f80 <FunctionNext>, recheckMtd=0x5627753e4f50 <FunctionRecheck>) at ./build/../src/backend/executor/execScan.c:180
#8 0x00005627753cb7bb in ExecProcNode (node=0x5627a1dac160) at ./build/../src/include/executor/executor.h:274
#9 ExecutePlan (execute_once=<optimized out>, dest=0x5627a1c89478, direction=<optimized out>, numberTuples=200, sendTuples=<optimized out>, operation=CMD_SELECT,
use_parallel_mode=<optimized out>, planstate=0x5627a1dac160, estate=0x5627a1dabf48) at ./build/../src/backend/executor/execMain.c:1648
#10 standard_ExecutorRun (queryDesc=0x5627a1cdf700, direction=<optimized out>, count=200, execute_once=<optimized out>) at ./build/../src/backend/executor/execMain.c:365
#11 0x000056277557966e in PortalRunSelect (portal=0x5627a1d43188, forward=<optimized out>, count=200, dest=<optimized out>) at ./build/../src/backend/tcop/pquery.c:924
#12 0x000056277557a9b6 in PortalRun (portal=0x5627a1d43188, count=200, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x5627a1c89478, altdest=0x5627a1c89478,
qc=0x7fff4744aa60) at ./build/../src/backend/tcop/pquery.c:768
#13 0x000056277557817e in PostgresMain () at ./build/../src/backend/tcop/postgres.c:2255
#14 0x0000562775573423 in BackendMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ./build/../src/backend/tcop/backend_startup.c:105
#15 0x00005627754e366e in postmaster_child_launch (child_type=child_type@entry=B_BACKEND, startup_data=startup_data@entry=0x7fff4744ad70 "",
startup_data_len=startup_data_len@entry=4, client_sock=client_sock@entry=0x7fff4744ad90) at ./build/../src/backend/postmaster/launch_backend.c:277
#16 0x00005627754e7229 in BackendStartup (client_sock=0x7fff4744ad90) at ./build/../src/backend/postmaster/postmaster.c:3593
#17 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1674
#18 0x00005627754e8dbd in PostmasterMain (argc=<optimized out>, argv=0x5627a1c82f10) at ./build/../src/backend/postmaster/postmaster.c:1372
#19 0x0000562775212df0 in main (argc=5, argv=0x5627a1c82f10) at ./build/../src/backend/main/main.c:197
(gdb) print i
$1 = 6
(gdb) print numberOfAttributes
$2 = <optimized out>
(gdb) print *tupleDesc
$3 = {natts = 7, tdtypeid = 2249, tdtypmod = 0, tdrefcount = -1, constr = 0x0, attrs = 0x5627a1db6a68}
(gdb) print *atti
$4 = {attrelid = 0, attname = {data = "value", '\000' <repeats 58 times>}, atttypid = 25, attlen = -1, attnum = 7, attcacheoff = -1, atttypmod = -1, attndims = 0,
attbyval = false, attalign = 105 'i', attstorage = 120 'x', attcompression = 0 '\000', attnotnull = false, atthasdef = false, atthasmissing = false, attidentity = 0 '\000',
attgenerated = 0 '\000', attisdropped = false, attislocal = true, attinhcount = 0, attcollation = 100}
(gdb) print val
$5 = 0
(gdb) print values[7]
$6 = 94728219299776
(gdb)

The whole cluster was pg_upgraded from PG 12 to PG 17 with two databases (postgres and xtimeseries). I tried it again: I created test tables with a unique BRIN index name, and only the xtimeseries database has the problem - SIGSEGV.
[inline screenshot]

Do you have any other idea what may cause this problem?
Thank you,

Best regards, Lubo


Attachments: image.png
#15Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Ľuboslav Špilák (#14)
Re: Segmentation fault - PostgreSQL 17.0

On 11/11/24 16:20, Ľuboslav Špilák wrote:

Hello.

I had a similarly created table in a different schema, so there were
indeed 2 rows in that select (the 2nd one was created to test the
problem), but even after removing one of them the problem still
persists.

select * from pg_class where relname='test_idxbrin';
"oid","relname","relnamespace","reltype","reloftype","relowner","relam","relfilenode","reltablespace","relpages","reltuples","relallvisible","reltoastrelid","relhasindex","relisshared","relpersistence","relkind","relnatts","relchecks","relhasrules","relhastriggers","relhassubclass","relrowsecurity","relforcerowsecurity","relispopulated","relreplident","relispartition","relrewrite","relfrozenxid","relminmxid","relacl","reloptions","relpartbound"
1128187015,test_idxbrin,2200,0,0,10,3580,1128187015,0,3,0.0,0,0,false,false,p,i,1,0,false,false,false,false,false,true,n,false,0,"0","0",,{pages_per_range=32},
1128178819,test_idxbrin,16830,0,0,10,3580,1128178819,0,3,0.0,0,0,false,false,p,i,1,0,false,false,false,false,false,true,n,false,0,"0","0",,{pages_per_range=32},

So we removed one of the tables with this index and now this select
returned one row

select * from pg_class where relname='test_idxbrin';
"oid","relname","relnamespace","reltype","reloftype","relowner","relam","relfilenode","reltablespace","relpages","reltuples","relallvisible","reltoastrelid","relhasindex","relisshared","relpersistence","relkind","relnatts","relchecks","relhasrules","relhastriggers","relhassubclass","relrowsecurity","relforcerowsecurity","relispopulated","relreplident","relispartition","relrewrite","relfrozenxid","relminmxid","relacl","reloptions","relpartbound"
1128178819,test_idxbrin,16830,0,0,10,3580,1128178819,0,3,0.0,0,0,false,false,p,i,1,0,false,false,false,false,false,true,n,false,0,"0","0",,{pages_per_range=32},

Then we called the problematic function again and it crashed.

Ah, I see. I've been looking at this assuming the descriptor is for the
index, when in fact it's for the result, which actually has more
attributes (so my comment about the index having just 1 attribute was
misguided).

But now I noticed an interesting thing - if I print the descriptor in
heap_compute_data_size, I get this:

(gdb) p *tupleDesc
$1 = {natts = 8, tdtypeid = 2249, tdtypmod = 0, tdrefcount = -1, constr
= 0x0, attrs = 0xb2d29b0}

There's 8 attributes, not 7 (which is what you get).

Well, the reason is likely pretty simple - I'd bet you have pageinspect
at version 1.11 (or older), which didn't know about empty ranges.
Version 1.12 added the "empty" output column, and the C code dutifully
fills it. But the tuple descriptor is derived from the SQL function
signature, which doesn't have that attribute - so one of the values ends
up interpreted as a pointer (0 = false), and that just segfaults.

If you do \dx (or select * from pg_extension), what version do you get
for pageinspect? And if you do "\df brin_page_items", does it have
"empty" as one of the output arguments?

You can try "alter extension pageinspect update" to update the function
signatures, etc. That should make the segfault go away.
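
For example (a sketch; check the actual installed version with \dx first):

```sql
-- What version is currently installed?
SELECT extname, extversion FROM pg_extension WHERE extname = 'pageinspect';

-- Update the SQL-level objects to match the installed shared library
ALTER EXTENSION pageinspect UPDATE;

-- brin_page_items should now list "empty" among its output columns
-- (in psql: \df brin_page_items)
```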

I can reproduce this by installing pageinspect 1.11 and running the
brin_page_items() query. What a stupid bug, I should have thought about
this when adding the "empty" field.
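
A minimal way to reproduce that mismatch on a throwaway database (a sketch; assumes the test table and test_idxbrin index from earlier in the thread already exist):

```sql
DROP EXTENSION IF EXISTS pageinspect;
CREATE EXTENSION pageinspect VERSION '1.11';  -- signature without the "empty" column

-- On a 17.x server this crashes the backend: the 1.12 C code fills
-- 8 output values, but the 1.11 SQL signature declares only 7 columns.
SELECT * FROM brin_page_items(get_raw_page('test_idxbrin', 2), 'test_idxbrin');
```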

Thanks for the report!

regards

--
Tomas Vondra

#16Ľuboslav Špilák
lspilak@microstep-hdo.sk
In reply to: Tomas Vondra (#15)
Re: Segmentation fault - PostgreSQL 17.0

Hello.

Yes, we have the old version 1.7, as I sent before:

[Image]

I will try your recommendations tomorrow.
Thank you very much for your help.

Best regards, Lubo
________________________________
From: Tomas Vondra <tomas@vondra.me>
Sent: Monday, November 11, 2024 5:22:13 PM
To: Ľuboslav Špilák <lspilak@microstep-hdo.sk>; Peter Geoghegan <pg@bowt.ie>
Cc: pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: Re: Segmentation fault - PostgreSQL 17.0

On 11/11/24 16:20, Ľuboslav Špilák wrote:

Hello.

I had a similarly created table in a different schema, so there were
truly 2 rows in the given select (the 2nd one was created to test
the problem); even after removing one of them the problem still
persists.

select * from pg_class where relname='test_idxbrin';
"oid","relname","relnamespace","reltype","reloftype","relowner","relam","relfilenode","reltablespace","relpages","reltuples","relallvisible","reltoastrelid","relhasindex","relisshared","relpersistence","relkind","relnatts","relchecks","relhasrules","relhastriggers","relhassubclass","relrowsecurity","relforcerowsecurity","relispopulated","relreplident","relispartition","relrewrite","relfrozenxid","relminmxid","relacl","reloptions","relpartbound"
1128187015,test_idxbrin,2200,0,0,10,3580,1128187015,0,3,0.0,0,0,false,false,p,i,1,0,false,false,false,false,false,true,n,false,0,"0","0",,{pages_per_range=32},
1128178819,test_idxbrin,16830,0,0,10,3580,1128178819,0,3,0.0,0,0,false,false,p,i,1,0,false,false,false,false,false,true,n,false,0,"0","0",,{pages_per_range=32},

So we removed one of the tables with this index and now this select
returned one row

select * from pg_class where relname='test_idxbrin';
"oid","relname","relnamespace","reltype","reloftype","relowner","relam","relfilenode","reltablespace","relpages","reltuples","relallvisible","reltoastrelid","relhasindex","relisshared","relpersistence","relkind","relnatts","relchecks","relhasrules","relhastriggers","relhassubclass","relrowsecurity","relforcerowsecurity","relispopulated","relreplident","relispartition","relrewrite","relfrozenxid","relminmxid","relacl","reloptions","relpartbound"
1128178819,test_idxbrin,16830,0,0,10,3580,1128178819,0,3,0.0,0,0,false,false,p,i,1,0,false,false,false,false,false,true,n,false,0,"0","0",,{pages_per_range=32},


________________________________

The sender of this e-mail message does not promise nor shall conclude any contract on the behalf of the company MicroStep HDO s.r.o. as our company enters into any contract exclusively in writing. If you have been sent this email in error, please notify the sender and delete this email.

Attachments: 1731345003605.jpeg (image/jpeg)
#17Ľuboslav Špilák
lspilak@microstep-hdo.sk
In reply to: Ľuboslav Špilák (#16)
Re: Segmentation fault - PostgreSQL 17.0

Hello.

Updating pageinspect helped. The function is not crashing anymore.

Before updating pageinspect:

[Tue Nov 12](09:06)# su postgres
postgres@hdoppxendb1:/home/ladmin$ psql -d xtimeseries
psql (17.0 (Ubuntu 17.0-1.pgdg20.04+1))
Type "help" for help.

xtimeseries=# \conninfo
You are connected to database "xtimeseries" as user "postgres" via socket in "/var/run/postgresql" at port "5432".
xtimeseries=# \df brin_page_items
List of functions
Schema | Name | Result data type | Argument data types | Type
--------+------+------------------+---------------------+------
(0 rows)

xtimeseries=# select * from pg_extension;
oid | extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
-------+-------------+----------+--------------+----------------+------------+-----------+--------------
13515 | plpgsql | 10 | 11 | f | 1.0 | |
16831 | pg_repack | 10 | 16830 | f | 1.5.1 | |
16833 | pageinspect | 10 | 16830 | t | 1.7 | |
(3 rows)

After updating pageinspect:

[Tue Nov 12](09:11)# su postgres
postgres@hdoppxendb1:/home/ladmin$ psql -d xtimeseries
psql (17.0 (Ubuntu 17.0-1.pgdg20.04+1))
Type "help" for help.

xtimeseries=# \dx
List of installed extensions
Name | Version | Schema | Description
-------------+---------+------------+--------------------------------------------------------------
pageinspect | 1.12 | XEN_TS | inspect the contents of database pages at a low level
pg_repack | 1.5.1 | XEN_TS | Reorganize tables in PostgreSQL databases with minimal locks
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(3 rows)

xtimeseries=# \df brin_page_items
List of functions
Schema | Name | Result data type | Argument data types | Type
--------+------+------------------+---------------------+------
(0 rows)

xtimeseries=#

Thank you very much.

Best regards, Lubo