PostgreSQL 15.5 stops processing user queries

Started by Andrey Zhidenkovover 2 years ago3 messagesgeneral
Jump to latest
#1Andrey Zhidenkov
pensnarik@gmail.com

Hello all,

We have encountered an issue with our PostgreSQL 15.5 installation. The
problem is that
PostgreSQL server periodically falls into a state when it accepts new
connections but
doesn't execute any queries. The session which runs a query hangs and
cannot be terminated
via SIGINT (even if it's just "SELECT 1") - a corresponding linux process
has state "S" and it's
not terminated even if the PostgreSQL master process is stopped. No matter
how we connect to
the database - both TCP and unix socket sessions hang but it seems that
existing sessions can
still execute queries (once we succeeded to connect using pgBouncer which
seemed to use an
existing connection to the database).

Here is a stack trace from gdb for one of the frozen sessions:

(gdb) bt 10
#0 0x00007f6d31dbd378 in poll () from /lib64/libc.so.6
#1 0x00007f6d3286aee1 in pqSocketCheck.part.2 () from
/usr/pgsql-15/lib/libpq.so.5
#2 0x00007f6d3286b054 in pqWaitTimed () from /usr/pgsql-15/lib/libpq.so.5
#3 0x00007f6d32867848 in PQgetResult () from /usr/pgsql-15/lib/libpq.so.5
#4 0x0000000000411320 in ExecQueryAndProcessResults
(query=query@entry=0x23a68b0
"select 1;", elapsed_msec=elapsed_msec@entry=0x7ffc2b5840a8,
svpt_gone_p=svpt_gone_p@entry=0x7ffc2b5840a7,
is_watch=is_watch@entry=false, opt=opt@entry=0x0,
printQueryFout=printQueryFout@entry=0x0) at common.c:1426
#5 0x000000000040feb9 in SendQuery (query=0x23a68b0 "select 1;") at
common.c:1117
#6 0x000000000040627b in main (argc=<optimized out>, argv=<optimized out>)
at startup.c:384

We're using glibc-2.28-236.0.1.el8.7.x86_64 on this machine and PostgreSQL
15.5:

postgres=# select version();
version

---------------------------------------------------------------------------------------------------------
PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0
20210514 (Red Hat 8.5.0-20), 64-bit
(1 row)

We've tried to recreate a cluster from scratch using a logical dump on new
hardware but it didn't
help though. And unfortunately we could not reproduce the issue, it looks
like it occurs randomly and
when it happens only PostgreSQL restart helps. Also we have number of
machines that run the
same version of PostgreSQL but we have the problem only with one cluster so
maybe it somehow
related to queries that are specific to this cluster.

We also run patroni 2.1.4 on this cluster, for reference (I'm not sure if
it can be related). We checked PostgreSQL logs,
of course - there are no any messages that could be related to the issue as
well.

We will really appreciate any help, thanks!

--
With best regards, Andrei Zhidenkov.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrey Zhidenkov (#1)
Re: PostgreSQL 15.5 stops processing user queries

Andrey Zhidenkov <pensnarik@gmail.com> writes:

We have encountered an issue with our PostgreSQL 15.5 installation. The
problem is that
PostgreSQL server periodically falls into a state when it accepts new
connections but
doesn't execute any queries. The session which runs a query hangs and
cannot be terminated
via SIGINT (even if it's just "SELECT 1") - a corresponding linux process
has state "S" and it's
not terminated even if the PostgreSQL master process is stopped. No matter
how we connect to
the database - both TCP and unix socket sessions hang but it seems that
existing sessions can
still execute queries (once we succeeded to connect using pgBouncer which
seemed to use an
existing connection to the database).

Try using an existing session to capture pg_stat_activity and pg_locks
information about the stuck session(s).

Here is a stack trace from gdb for one of the frozen sessions:

This appears to be a trace of a psql process waiting for a query
result. It won't teach you much about what the server is doing.

regards, tom lane

#3Andrey Zhidenkov
pensnarik@gmail.com
In reply to: Tom Lane (#2)
Re: PostgreSQL 15.5 stops processing user queries

Sorry, wrong stack trace. Here is a correct one:

(gdb) bt
#0 0x00007f9acca1bdf6 in do_futex_wait.constprop () from
/lib64/libpthread.so.0
#1 0x00007f9acca1bee8 in __new_sem_wait_slow.constprop.0 () from
/lib64/libpthread.so.0
#2 0x00000000007815a2 in PGSemaphoreLock ()
#3 0x00000000007fca6c in LWLockAcquire ()
#4 0x00007f9ac5c385e6 in pgsm_store () from
/usr/pgsql-15/lib/pg_stat_monitor.so
#5 0x00007f9ac5c39a6b in pgsm_ExecutorEnd () from
/usr/pgsql-15/lib/pg_stat_monitor.so
#6 0x0000000000658d91 in PortalCleanup ()
#7 0x000000000095b264 in PortalDrop ()
#8 0x000000000080c6df in exec_simple_query ()
#9 0x000000000080de22 in PostgresMain ()
#10 0x000000000078db80 in ServerLoop ()
#11 0x000000000078eb34 in PostmasterMain ()
#12 0x000000000050474d in main ()

It’s seems to be a problem in pg_stat_monitor extension. The similar issue
is described here (in Chinese):
https://blog.csdn.net/qq_43687755/article/details/117592635

On Thu, 21 Dec 2023, 18:32 Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Andrey Zhidenkov <pensnarik@gmail.com> writes:

We have encountered an issue with our PostgreSQL 15.5 installation. The
problem is that
PostgreSQL server periodically falls into a state when it accepts new
connections but
doesn't execute any queries. The session which runs a query hangs and
cannot be terminated
via SIGINT (even if it's just "SELECT 1") - a corresponding linux process
has state "S" and it's
not terminated even if the PostgreSQL master process is stopped. No

matter

how we connect to
the database - both TCP and unix socket sessions hang but it seems that
existing sessions can
still execute queries (once we succeeded to connect using pgBouncer which
seemed to use an
existing connection to the database).

Try using an existing session to capture pg_stat_activity and pg_locks
information about the stuck session(s).

Here is a stack trace from gdb for one of the frozen sessions:

This appears to be a trace of a psql process waiting for a query
result. It won't teach you much about what the server is doing.

regards, tom lane