BUG #18974: Postgresql repeatable crash after pg_upgrade from 15 to 17.5 version in postgresql_fdw queries
The following bug has been logged on the website:
Bug reference: 18974
Logged by: Maxim Boguk
Email address: maxim.boguk@gmail.com
PostgreSQL version: 17.5
Operating system: Ubuntu
Description:
Repeatable PostgreSQL crash after pg_upgrade from version 15 to 17.5, in
postgres_fdw queries that time out (via query_timeout)
Backtrace data from core file:
Core was generated by `postgres: 17/main: **.app **_data [local] SELECT '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __strcmp_evex () at ../sysdeps/x86_64/multiarch/strcmp-evex.S:314
warning: 314 ../sysdeps/x86_64/multiarch/strcmp-evex.S: No such file or directory
(gdb) bt
#0 __strcmp_evex () at ../sysdeps/x86_64/multiarch/strcmp-evex.S:314
#1 0x0000780820fd2df7 in emitHostIdentityInfo (conn=0x5d89b19f6d80,
host_addr=0x7ffd34c50c70 "10.100.103.4") at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/interfaces/libpq/fe-connect.c:2128
#2 0x0000780820fd8a0f in PQconnectPoll (conn=conn@entry=0x5d89b19f6d80) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/interfaces/libpq/fe-connect.c:3038
#3 0x0000780820fda44d in pqConnectDBStart (conn=0x5d89b19f6d80) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/interfaces/libpq/fe-connect.c:2446
#4 0x0000780820fda4e2 in PQcancelStart
(cancelConn=cancelConn@entry=0x5d89b19f6d80) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/interfaces/libpq/fe-cancel.c:198
#5 0x000078082102401d in libpqsrv_cancel (conn=conn@entry=0x5d89b1785870,
endtime=endtime@entry=804720445571683) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/include/libpq/libpq-be-fe-helpers.h:399
#6 0x00007808210244ae in pgfdw_cancel_query_begin
(conn=conn@entry=0x5d89b1785870, endtime=endtime@entry=804720445571683) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../contrib/postgres_fdw/connection.c:1353
#7 0x0000780821027f5e in pgfdw_cancel_query (conn=0x5d89b1785870) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../contrib/postgres_fdw/connection.c:1336
#8 pgfdw_abort_cleanup (entry=entry@entry=0x5d89b17dad78,
toplevel=toplevel@entry=true) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../contrib/postgres_fdw/connection.c:1666
#9 0x000078082102874d in pgfdw_xact_callback (event=XACT_EVENT_ABORT,
arg=<optimized out>) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../contrib/postgres_fdw/connection.c:1044
#10 0x00005d89b06cb51a in CallXactCallbacks (event=XACT_EVENT_ABORT) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/backend/access/transam/xact.c:3796
#11 AbortTransaction () at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/backend/access/transam/xact.c:2903
#12 0x00005d89b06ccd38 in AbortCurrentTransactionInternal () at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/backend/access/transam/xact.c:3515
#13 AbortCurrentTransaction () at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/backend/access/transam/xact.c:3393
#14 0x00005d89b0a0e40f in PostgresMain (dbname=<optimized out>,
username=<optimized out>) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/backend/tcop/postgres.c:4482
#15 0x00005d89b0a04eff in BackendMain (startup_data=<optimized out>,
startup_data_len=<optimized out>) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/backend/tcop/backend_startup.c:105
#16 0x00005d89b0965376 in postmaster_child_launch (child_type=B_BACKEND,
startup_data=0x7ffd34c518d0 "", startup_data_len=4,
client_sock=0x7ffd34c518f0) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/backend/postmaster/launch_backend.c:277
#17 0x00005d89b0bfe911 in postmaster_child_launch
(client_sock=0x7ffd34c518f0, startup_data_len=4, startup_data=0x7ffd34c518d0
"", child_type=B_BACKEND) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/backend/postmaster/postmaster.c:3558
#18 BackendStartup (client_sock=0x7ffd34c518f0) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/backend/postmaster/postmaster.c:3594
#19 ServerLoop.isra.0 () at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/backend/postmaster/postmaster.c:1676
#20 0x00005d89b0970965 in PostmasterMain (argc=<optimized out>,
argv=<optimized out>) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/backend/postmaster/postmaster.c:1374
#21 0x00005d89b0616d2d in main (argc=5, argv=0x5d89b16525d0) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/backend/main/main.c:199
Regards,
Maxim
On Wed, Jul 2, 2025 at 1:09 AM PG Bug reporting form <noreply@postgresql.org>
wrote:
#1 0x0000780820fd2df7 in emitHostIdentityInfo (conn=0x5d89b19f6d80,
host_addr=0x7ffd34c50c70 "10.100.103.4") at
(gdb) print ((PGconn *)0x5d89b19f6d80)->connhost[0]
$14 = {type = CHT_HOST_NAME, host = 0x0, hostaddr = 0x5d89b19c53b0
"10.100.103.4", port = 0x5d89b19c5390 "6503", password = 0x0}
As a result, displayed_host = conn->connhost[conn->whichhost].host = 0x0,
and the crash occurs at the line strcmp(displayed_host, host_addr) != 0.
related FDW definition:
FDW options | ( dbname '****', hostaddr '10.100.103.4', port
'6503')
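To make the failure mode concrete, here is a minimal sketch of it. The types and function names below (host_entry, pick_displayed_host, displayed_host_differs) are simplified stand-ins invented for illustration, not the libpq source; the sketch assumes only what the gdb output above shows: the entry is tagged CHT_HOST_NAME while host is NULL and hostaddr is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the libpq types quoted in this thread. */
typedef enum
{
	CHT_HOST_NAME,
	CHT_HOST_ADDRESS,
	CHT_UNIX_SOCKET
} host_type;

typedef struct
{
	host_type	type;
	const char *host;		/* NULL when only hostaddr was configured */
	const char *hostaddr;
} host_entry;

/* Choose the string to display, driven purely by the type tag. */
const char *
pick_displayed_host(const host_entry *h)
{
	if (h->type == CHT_HOST_ADDRESS)
		return h->hostaddr;
	return h->host;			/* NULL if the tag is wrong for this entry */
}

/*
 * Hypothetical NULL-tolerant comparison (not part of libpq): reports
 * whether the displayed host differs from the resolved address.
 */
bool
displayed_host_differs(const host_entry *h, const char *host_addr)
{
	const char *displayed = pick_displayed_host(h);

	assert(host_addr != NULL);
	if (displayed == NULL)
		return true;		/* a bare strcmp() would dereference NULL here */
	return strcmp(displayed, host_addr) != 0;
}
```

With an entry of {CHT_HOST_NAME, NULL, "10.100.103.4"}, pick_displayed_host() returns NULL, which is the value that reaches strcmp() in the backtrace. The NULL check only illustrates where the dereference happens; it is not a proposed fix.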
--
Maxim Boguk
Senior Postgresql DBA
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
On Wed, Jul 2, 2025 at 3:03 AM Maxim Boguk <maxim.boguk@gmail.com> wrote:
related part of backtrace:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __strcmp_evex () at ../sysdeps/x86_64/multiarch/strcmp-evex.S:314
(gdb) bt
#0 __strcmp_evex () at ../sysdeps/x86_64/multiarch/strcmp-evex.S:314
#1 0x0000780820fd2df7 in emitHostIdentityInfo (conn=0x5d89b19f6d80,
host_addr=0x7ffd34c50c70 "10.100.103.4") at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/interfaces/libpq/fe-connect.c:2128
#2 0x0000780820fd8a0f in PQconnectPoll (conn=conn@entry=0x5d89b19f6d80) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/interfaces/libpq/fe-connect.c:3038
#3 0x0000780820fda44d in pqConnectDBStart (conn=0x5d89b19f6d80) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/interfaces/libpq/fe-connect.c:2446
#4 0x0000780820fda4e2 in PQcancelStart
(cancelConn=cancelConn@entry=0x5d89b19f6d80)
at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/interfaces/libpq/fe-cancel.c:198
#5 0x000078082102401d in libpqsrv_cancel (conn=conn@entry=0x5d89b1785870,
endtime=endtime@entry=804720445571683) at
/usr/src/postgresql-17-17.5-1.pgdg24.04+1/build/../src/include/libpq/libpq-be-fe-helpers.h:399
In libpqsrv_cancel, conn has:
(gdb) print ((PGconn *)0x5d89b1785870)->connhost[0]
$14 = {type = CHT_HOST_ADDRESS, host = 0x0, hostaddr = 0x5d89b16cde00
"10.100.103.4", port = 0x5d89b16cddc0 "6503", password = 0x0}
The new connection in PQcancelStart already has the wrong type:
(gdb) print ((PGconn *)0x5d89b19f6d80)->connhost[0]
$15 = {type = CHT_HOST_NAME, host = 0x0, hostaddr = 0x5d89b19c53b0
"10.100.103.4", port = 0x5d89b19c5390 "6503", password = 0x0}
On Wed, Jul 2, 2025 at 9:35 AM Maxim Boguk <maxim.boguk@gmail.com> wrote:
As I understand it, the problem is in PQcancelCreate(), which completely
ignores the type field of the connhost structure.
As a result, the new connection gets type = 0, which maps to the first
value of the enum:
typedef enum pg_conn_host_type
{
	CHT_HOST_NAME,
	CHT_HOST_ADDRESS,
	CHT_UNIX_SOCKET
} pg_conn_host_type;
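A small sketch of why an uncopied type ends up as CHT_HOST_NAME. It assumes the new host entry starts out zero-filled (for example, allocated with calloc); demo_conn_host and demo_new_host are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum pg_conn_host_type
{
	CHT_HOST_NAME,			/* first enumerator, so its value is 0 */
	CHT_HOST_ADDRESS,
	CHT_UNIX_SOCKET
} pg_conn_host_type;

/* Reduced stand-in for a host entry (hypothetical, not the libpq struct). */
typedef struct demo_conn_host
{
	pg_conn_host_type type;
	char	   *host;
	char	   *hostaddr;
} demo_conn_host;

/*
 * Allocate a zero-filled entry. Any field that is never assigned
 * afterwards keeps its zero value; for the type field that value
 * is CHT_HOST_NAME.
 */
demo_conn_host *
demo_new_host(void)
{
	return calloc(1, sizeof(demo_conn_host));
}
```

So an entry whose hostaddr is copied but whose type is left alone looks exactly like the $15 output above: type = CHT_HOST_NAME with host = 0x0.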
Hello
Yeah, I think there is a missing copy of the type field:
--- a/src/interfaces/libpq/fe-cancel.c
+++ b/src/interfaces/libpq/fe-cancel.c
@@ -119,6 +119,7 @@ PQcancelCreate(PGconn *conn)
 		goto oom_error;
 
 	originalHost = conn->connhost[conn->whichhost];
+	cancelConn->connhost[0].type = originalHost.type;
 	if (originalHost.host)
 	{
 		cancelConn->connhost[0].host = strdup(originalHost.host);
The other fields of the pg_conn_host structure are copied further down in the code; only type is not.
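For illustration, the copy pattern can be sketched in isolation like this. It is a standalone sketch under assumptions, not the actual PQcancelCreate() code: demo_host, demo_dup, demo_free_host and demo_copy_host are hypothetical names, and the struct is reduced to the fields shown in the gdb output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef enum
{
	DEMO_HOST_NAME,
	DEMO_HOST_ADDRESS,
	DEMO_UNIX_SOCKET
} demo_host_type;

typedef struct
{
	demo_host_type type;
	char	   *host;
	char	   *hostaddr;
	char	   *port;
} demo_host;

/* Duplicate s if it is non-NULL; clear *ok when allocation fails. */
char *
demo_dup(const char *s, bool *ok)
{
	char	   *copy;
	size_t		len;

	if (s == NULL)
		return NULL;
	len = strlen(s) + 1;
	copy = malloc(len);
	if (copy == NULL)
		*ok = false;
	else
		memcpy(copy, s, len);
	return copy;
}

/* Release the strings owned by h and reset every field to zero. */
void
demo_free_host(demo_host *h)
{
	free(h->host);
	free(h->hostaddr);
	free(h->port);
	memset(h, 0, sizeof(*h));
}

/* Copy every field, including the type tag; returns false on OOM. */
bool
demo_copy_host(demo_host *dst, const demo_host *src)
{
	bool		ok = true;

	assert(dst != NULL && src != NULL);
	dst->type = src->type;	/* the assignment the patch adds */
	dst->host = demo_dup(src->host, &ok);
	dst->hostaddr = demo_dup(src->hostaddr, &ok);
	dst->port = demo_dup(src->port, &ok);
	if (!ok)
		demo_free_host(dst);
	return ok;
}
```

Copying the tag together with the strings keeps the copied entry self-consistent, so code that later selects host or hostaddr based on type sees the same combination as the original connection.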
regards, Sergei
Sergei Kornilov <sk@zsrv.org> writes:
Yeah, I think there is a missing copy of the type field:
Good catch!
For the archives: it's easy to reproduce this crash by modifying
the postgres_fdw regression tests, along the lines of
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index e534b40de3c..883dc669deb 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -8,7 +8,7 @@ CREATE SERVER testserver1 FOREIGN DATA WRAPPER postgres_fdw;
 DO $d$
     BEGIN
         EXECUTE $$CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
-            OPTIONS (dbname '$$||current_database()||$$',
+            OPTIONS (hostaddr '127.0.0.1', dbname '$$||current_database()||$$',
                      port '$$||current_setting('port')||$$'
             )$$;
         EXECUTE $$CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgres_fdw
I'll see to fixing this.
regards, tom lane