BUG #18966: Invalid SQL queries hang indefinitely until server restart
The following bug has been logged on the website:
Bug reference: 18966
Logged by: Jérémy S
Email address: jeremy.spriet@gmail.com
PostgreSQL version: 16.9
Operating system: Docker debian
Description:
Hello,
I’m running PostgreSQL 16.9 in Docker (PostgreSQL 16.9 (Debian
16.9-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc 12.2.0, 64-bit),
and intermittently observe that invalid SQL queries hang indefinitely
instead of immediately returning an error.
Valid queries continue to work normally and return results right away.
I usually issue statements via a Python client library, but when the issue
occurs, I see the same blocking behavior connecting directly with psql—the
session simply stalls and never reports syntax errors, “relation does not
exist,” or any other error.
Only a full restart of the Docker container unblocks all sessions.
This has happened 2–3 times over the past six months, but I haven’t been
able to reproduce it on demand.
When a query hangs, even running SELECT pg_cancel_backend(pid); or SELECT
pg_terminate_backend(pid); does not unblock it. Only restarting the
container helps.
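For context, both functions take the target backend's pid, typically read
from pg_stat_activity first (a generic sketch, not from the report; the pid
value 12345 is illustrative):

```sql
-- Find the hung backend's pid, then ask the server to cancel or terminate it.
SELECT pid, state, query FROM pg_stat_activity WHERE state = 'active';

SELECT pg_cancel_backend(12345);     -- SIGINT: cancel the current query only
SELECT pg_terminate_backend(12345);  -- SIGTERM: end the whole session
```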
Does anyone have ideas on what could be causing this behavior?
My postgresql.conf:
listen_addresses = '*'
max_prepared_transactions = 0
shared_buffers = 2000MB
temp_buffers = 8MB
max_connections = 5010
superuser_reserved_connections = 2
shared_preload_libraries = 'timescaledb,pg_cron,pg_stat_statements'
cron.database_name = 'db1'
pg_stat_statements.max = 5000
pg_stat_statements.track = top
Thanks!
PG Bug reporting form <noreply@postgresql.org> writes:
> I’m running PostgreSQL 16.9 in Docker (PostgreSQL 16.9 (Debian
> 16.9-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc 12.2.0, 64-bit),
> and intermittently observe that invalid SQL queries hang indefinitely
> instead of immediately returning an error.
> Valid queries continue to work normally and return results right away.
What exactly is "invalid" about the troublesome queries? What does
pg_stat_activity show them as doing? When the problem is occurring,
can you run EXPLAIN (*not* EXPLAIN ANALYZE) on them, or does that
hang too?
It's not very difficult to write SQL queries that would run for
unreasonable lengths of time, eg by leaving out a critical join
restriction. I'm wondering if something like that is happening
here. You've not provided nearly enough info to do more than
speculate, however.
> When a query hangs, even running: SELECT pg_cancel_backend(pid); or SELECT
> pg_terminate_backend(pid); does not unblock it.
Hmm, that's unusual (and a strike against my theory above).
Again, pg_stat_activity output would be helpful here.
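For reference, the diagnostics requested above might look like this (a
generic sketch using the stock pg_stat_activity view; the EXPLAIN target is
a placeholder for the problematic statement):

```sql
-- What are the non-idle backends doing, and what are they waiting on?
SELECT pid, state, wait_event_type, wait_event,
       xact_start, query_start, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_start;

-- Does planning alone hang?  EXPLAIN only plans the query, it does not
-- execute it, so it distinguishes a planner hang from an executor hang.
EXPLAIN SELECT ...;  -- substitute the problematic query here
```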
> My postgresql.conf:
> listen_addresses = '*'
> max_prepared_transactions = 0
> shared_buffers = 2000MB
> temp_buffers = 8MB
> max_connections = 5010
That max_connections value seems fairly insane. It doesn't look like
you've provisioned a setup that is anywhere near capable of supporting
so many sessions. I'd knock it down by at least a factor of 10.
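As a quick sanity check (a sketch, not from the thread), the configured
limit can be compared against actual usage before knocking it down:

```sql
-- Compare the configured connection limit with the number of live backends.
SELECT current_setting('max_connections') AS max_connections,
       count(*) AS current_backends
FROM pg_stat_activity;
```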
> shared_preload_libraries = 'timescaledb,pg_cron,pg_stat_statements'
I don't suppose you can take timescaledb out of the mix to see
if it's contributing to this.
regards, tom lane
On Tue, 2025-06-24 at 15:09 -0400, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> > I’m running PostgreSQL 16.9 in Docker (PostgreSQL 16.9 (Debian
> > 16.9-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc 12.2.0, 64-bit),
> > and intermittently observe that invalid SQL queries hang indefinitely
> > instead of immediately returning an error.
> > Valid queries continue to work normally and return results right away.
> You've not provided nearly enough info to do more than
> speculate, however.
Yes. One speculation that occurs to me is that the control group settings
of your container are such that a process that exceeds the memory quota is
suspended until enough memory becomes free.
Yours,
Laurenz Albe
> What exactly is "invalid" about the troublesome queries? What does
> pg_stat_activity show them as doing? When the problem is occurring,
> can you run EXPLAIN (*not* EXPLAIN ANALYZE) on them, or does that
> hang too?
Yes, they hang when I run EXPLAIN on them, too.
Here is the output of 'select * from pg_stat_activity', in CSV format,
captured while the problem was occurring:
datid,datname,pid,leader_pid,usesysid,usename,application_name,client_addr,client_hostname,client_port,backend_start,xact_start,query_start,state_change,wait_event_type,wait_event,state,backend_xid,backend_xmin,query_id,query,backend_type
16384,mydb,61,,10,role_superadmin,pg_cron scheduler,,,,2025-06-25
08:15:40.365377+00,,,,Extension,Extension,,,,,,pg_cron launcher
,,60,,,,,,,,2025-06-25
08:15:40.365266+00,,,,Activity,AutoVacuumMain,,,,,,autovacuum launcher
,,63,,10,role_superadmin,,,,,2025-06-25
08:15:40.366251+00,,,,Activity,LogicalLauncherMain,,,,,,logical replication
launcher
,,62,,10,role_superadmin,TimescaleDB Background Worker
Launcher,,,,2025-06-25
08:15:40.365899+00,,,,Extension,Extension,,,,,,TimescaleDB Background
Worker Launcher
16384,mydb,69,,10,role_superadmin,,127.0.0.1,,49760,2025-06-25
08:15:40.518269+00,,2025-06-25 08:15:40.539926+00,2025-06-25
08:15:40.539959+00,Client,ClientRead,idle,,,-8817007929064609608,SELECT
pg_advisory_unlock($1),client backend
16384,mydb,168,,16386,role_admin,,,,-1,2025-06-25
08:37:46.868834+00,2025-06-25 08:44:11.558469+00,2025-06-25
08:44:11.558469+00,2025-06-25
08:44:11.558472+00,,,active,,,3749380189022910195,ROLLBACK,client backend
5,postgres,78,,16385,role_monitoring,,127.0.0.1,,49766,2025-06-25
08:15:42.344303+00,,2025-06-25 13:08:42.355812+00,2025-06-25
13:08:42.355985+00,Client,ClientRead,idle,,,4452344162736673220,SELECT *
FROM pg_stat_database_conflicts;,client backend
5,postgres,79,,16385,role_monitoring,,127.0.0.1,,49770,2025-06-25
08:15:42.345051+00,,2025-06-25 13:08:42.34076+00,2025-06-25
13:08:42.347132+00,Client,ClientRead,idle,,,8133081727313365291,"SELECT
pg_database.datname
,pg_database_size(pg_database.datname)
FROM pg_database;",client backend
16384,mydb,207,,16386,role_admin,,,,-1,2025-06-25
08:55:00.037971+00,2025-06-25 08:55:15.378432+00,2025-06-25
08:55:15.378432+00,2025-06-25
08:55:15.378433+00,,,active,,,2064869707185898531,COMMIT,client backend
16384,mydb,719,,16386,role_admin,,,,-1,2025-06-25
11:17:57.910472+00,2025-06-25 11:17:57.932789+00,2025-06-25
11:17:57.94765+00,2025-06-25
11:17:57.947651+00,,,active,,323299,1708853975061214229,SELECT
pg_cancel_backend(0);,client backend
16384,mydb,320,,16386,role_admin,,,,-1,2025-06-25
09:44:53.568006+00,2025-06-25 09:44:53.597946+00,2025-06-25
09:44:53.939852+00,2025-06-25 09:44:53.939856+00,,,active,,323299,,"WITH
infos_base AS (
SELECT DISTINCT aca, region, departement, libelle_nature
FROM contexte.comptage
WHERE uai = 'S105806'
LIMIT 1
),
derniere_annee AS (
SELECT MAX(annee_sc) AS annee
FROM contexte.comptage
WHERE uai = 'S105806'
)
-- uai_base
SELECT 'uai_base' AS type, e.uai AS id,
e.type_etab,
e.libelle_nature,
e.annee_sc,
SUM(e.effectif) AS nb_peoples
FROM contexte.comptage e
JOIN derniere_annee da ON e.annee_sc = da.annee
WHERE e.uai = 'S105806'
AND e.libelle_nature = (SELECT libelle_nature FROM infos_base LIMIT 1)
GROUP BY e.uai, e.type_etab, e.libelle_nature, e.annee_sc
UNION
-- uai_comp
SELECT 'uai_comp' AS type, e.uai AS id,
e.type_etab,
e.libelle_nature,
e.annee_sc,
SUM(e.effectif) AS nb_peoples
FROM contexte.comptage e
JOIN dernier",client backend
16384,mydb,321,,16386,role_admin,,,,-1,2025-06-25
09:45:01.856679+00,2025-06-25 09:47:47.979227+00,2025-06-25
09:47:48.046105+00,2025-06-25 09:47:48.046109+00,,,active,,323299,,"SELECT
'uai_base' AS type, e.uai AS id,
e.type_etab,
e.libelle_nature,
e.annee_sc,
SUM(e.nb_peoples) AS nb_peoples
FROM contexte.comptage e
JOIN derniere_annee da ON e.annee_sc = da.annee
WHERE e.uai = 'S105806'
AND e.libelle_nature = (SELECT libelle_nature FROM infos_base LIMIT 1)
GROUP BY e.uai, e.type_etab, e.libelle_nature, e.annee_sc
LIMIT 1001",client backend
16384,mydb,781,,16386,role_admin,,,,-1,2025-06-25
11:46:45.062895+00,2025-06-25 11:46:45.082754+00,2025-06-25
11:46:45.096873+00,2025-06-25
11:46:45.096874+00,,,active,,,6650430479887907887,BEGIN,client backend
16384,mydb,789,,16386,role_admin,,,,-1,2025-06-25
11:50:10.987069+00,2025-06-25 11:50:15.903804+00,2025-06-25
11:50:15.903804+00,2025-06-25
11:50:15.903807+00,,,active,,,2064869707185898531,COMMIT,client backend
16384,mydb,914,,10,role_superadmin,psql,,,-1,2025-06-25
12:29:28.685433+00,2025-06-25 12:31:49.087451+00,2025-06-25
12:31:49.087451+00,2025-06-25 12:31:49.087454+00,,,active,,323329,,select *
from bad_table;,client backend
,,53,,,,,,,,2025-06-25
08:15:40.338819+00,,,,Activity,BgWriterHibernate,,,,,,background writer
,,52,,,,,,,,2025-06-25 08:15:40.337776+00,,,,,,,,,,,checkpointer
,,59,,,,,,,,2025-06-25
08:15:40.364229+00,,,,Activity,WalWriterMain,,,,,,walwriter
Also, when I checked the locks, I saw this:
SELECT pid, locktype, mode, relation::regclass AS table,
       virtualtransaction, granted
FROM pg_locks
WHERE relation IS NOT NULL
ORDER BY relation, mode;

 pid | locktype |      mode       |       table       | virtualtransaction | granted
-----+----------+-----------------+-------------------+--------------------+---------
 914 | relation | AccessShareLock | pg_locks          | 18/17              | t
 321 | relation | AccessShareLock | contexte.comptage | 12/43              | t
 320 | relation | AccessShareLock | contexte.comptage | 11/157             | t
(3 rows)
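All three locks shown are granted, so nothing here is waiting on a
heavyweight lock. A sketch (not from the thread) that would surface
lock-based blocking directly, if any existed, using pg_blocking_pids():

```sql
-- For each active backend, list any pids blocking it on heavyweight locks.
-- An empty array means the backend is not lock-blocked.
SELECT pid, pg_blocking_pids(pid) AS blocked_by,
       wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state = 'active';
```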
> I don't suppose you can take timescaledb out of the mix to see
> if it's contributing to this.
I will check, but it is not possible to do so easily.
> Yes. One speculation that occurs to me is that the control group settings
> of your container are such that a process that exceeds the memory quota is
> suspended until enough memory becomes free.
When the problem occurred, the container was using about 1GB of its 4GB RAM
limit. However, as I kept running queries, memory usage climbed to the full
4GB.
Jérémy