Enquiry about long-running queries

Started by Ashish Mukherjee5 months ago6 messagesgeneral
Jump to latest
#1Ashish Mukherjee
ashish.mukherjee@gmail.com

Hello,

I have a query like this showing up on my production database -

s05=> SELECT pid, user, usename, application_name, client_addr,
client_hostname, client_port, datname, now() - query_start as "runtime",
state, wait_event_type, wait_event,
substr(query, 0, 100)
FROM pg_stat_activity
WHERE now() - query_start > '5 minutes'::interval and state = 'active'
ORDER BY runtime DESC;
pid | user | usename | application_name | client_addr |
client_hostname | client_port | datname | runtime
| state | wait_event_type | wait_event |
substr
-------+--------+-----------------+----------------+--------------------------------------------------------------------
356274 | s05 | s05 | scandir | 192.168.64.61 |
| 44098 | s05 | 9 days 18:45:37
.65577 | active | IPC | ParallelFinish | select scac_code from
scac where supported_by_smc = true

The query when run from psql prompt finishes in a jiffy, so query
performance/cost is not the problem. Also, when I try to kill the query
through pg_terminate_backend or pg_cancel_backend, it does not get killed.

I am wondering what could be the root cause of this problem and how it
could be addressed. Any pointers would be appreciated.

Regards,
Ashish

#2Ron
ronljohnsonjr@gmail.com
In reply to: Ashish Mukherjee (#1)
Re: Enquiry about long-running queries

On Fri, Nov 7, 2025 at 8:49 AM Ashish Mukherjee <ashish.mukherjee@gmail.com>
wrote:

Hello,

I have a query like this showing up on my production database -

s05=> SELECT pid, user, usename, application_name, client_addr,
client_hostname, client_port, datname, now() - query_start as "runtime",
state, wait_event_type, wait_event,
substr(query, 0, 100)
FROM pg_stat_activity
WHERE now() - query_start > '5 minutes'::interval and state = 'active'
ORDER BY runtime DESC;
pid | user | usename | application_name | client_addr |
client_hostname | client_port | datname | runtime
| state | wait_event_type | wait_event |
substr

-------+--------+-----------------+----------------+--------------------------------------------------------------------
356274 | s05 | s05 | scandir | 192.168.64.61 |
| 44098 | s05 | 9 days 18:45:37
.65577 | active | IPC | ParallelFinish | select scac_code from
scac where supported_by_smc = true

The query when run from psql prompt finishes in a jiffy, so query
performance/cost is not the problem. Also, when I try to kill the query
through pg_terminate_backend or pg_cancel_backend, it does not get killed.

If it runs in a jiffy, *do you have time* to kill it before it finishes?

I am wondering what could be the root cause of this problem and how it
could be addressed. Any pointers would be appreciated.

What problem? The query existing, or not being able to kill a query that
finishes before you have time to kill it?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

#3Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Ashish Mukherjee (#1)
Re: Enquiry about long-running queries

On Fri, 2025-11-07 at 19:18 +0530, Ashish Mukherjee wrote:

I have a query like this showing up on my production database - 

s05=> SELECT pid, user, usename, application_name, client_addr, client_hostname, client_port, datname, now() - query_start as "runtime",
state, wait_event_type, wait_event,
substr(query, 0, 100)
 FROM  pg_stat_activity
 WHERE now() - query_start > '5 minutes'::interval and state = 'active'
 ORDER BY runtime DESC;
   pid   | user | usename | application_name |  client_addr  | client_hostname | client_port | datname |        runtime
       | state  | wait_event_type |   wait_event   |                                               substr
-------+--------+-----------------+----------------+--------------------------------------------------------------------
  356274 | s05  | s05     | scandir          | 192.168.64.61 |                 |       44098 | s05     | 9 days 18:45:37
.65577 | active | IPC             | ParallelFinish | select scac_code from scac where supported_by_smc = true

The query when run from psql prompt finishes in a jiffy, so query performance/cost is not the problem.
Also, when I try to kill the query through pg_terminate_backend or pg_cancel_backend, it does not get killed.

I am wondering what could be the root cause of this problem and how it could be addressed.  Any pointers would be appreciated.

That is strange. The wait event means that the backend is waiting for parallel
workers to finish. But any existing parallel worker processes would also have
to show up in the query result.

On what operating system does PostgreSQL run?

What exact version is it?

You could try to "strace" the backend process (or use an equivalent tool, if
you are not on Linux) and see if the process issues any system calls.
To see what's going on, you'd have to attach to the backend process with a
debugger and take a stack trace.

Yours,
Laurenz Albe

#4Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Ashish Mukherjee (#1)
Re: Enquiry about long-running queries

On 11/7/25 05:48, Ashish Mukherjee wrote:

Hello,

I have a query like this showing up on my production database -

The query when run from psql prompt finishes in a jiffy, so query

What client is the stuck query being run from?

Regards,
Ashish

--
Adrian Klaver
adrian.klaver@aklaver.com

#5Ashish Mukherjee
ashish.mukherjee@gmail.com
In reply to: Laurenz Albe (#3)
Re: Enquiry about long-running queries

pgsql 17.5 on Red Hat Enterprise Linux release 8.10 (Ootpa)

The query is run from one of our php applications.

On Fri, Nov 7, 2025 at 8:59 PM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

Show quoted text

On Fri, 2025-11-07 at 19:18 +0530, Ashish Mukherjee wrote:

I have a query like this showing up on my production database -

s05=> SELECT pid, user, usename, application_name, client_addr,

client_hostname, client_port, datname, now() - query_start as "runtime",

state, wait_event_type, wait_event,
substr(query, 0, 100)
FROM pg_stat_activity
WHERE now() - query_start > '5 minutes'::interval and state = 'active'
ORDER BY runtime DESC;
pid | user | usename | application_name | client_addr |

client_hostname | client_port | datname | runtime

| state | wait_event_type | wait_event |

substr

-------+--------+-----------------+----------------+--------------------------------------------------------------------

356274 | s05 | s05 | scandir | 192.168.64.61 |

| 44098 | s05 | 9 days 18:45:37

.65577 | active | IPC | ParallelFinish | select scac_code

from scac where supported_by_smc = true

The query when run from psql prompt finishes in a jiffy, so query

performance/cost is not the problem.

Also, when I try to kill the query through pg_terminate_backend or

pg_cancel_backend, it does not get killed.

I am wondering what could be the root cause of this problem and how it

could be addressed. Any pointers would be appreciated.

That is strange. The wait event means that the backend is waiting for
parallel
workers to finish. But any existing parallel worker processes would also
have
to show up in the query result.

On what operating system does PostgreSQL run?

What exact version is it?

You could try to "strace" the backend process (or use an equivalent tool,
if
you are not on Linux) and see if the process issues any system calls.
To see what's going on, you'd have to attach to the backend process with a
debugger and take a stack trace.

Yours,
Laurenz Albe

#6Greg Sabino Mullane
greg@turnstep.com
In reply to: Ashish Mukherjee (#5)
Re: Enquiry about long-running queries

Wow, 9 days is a long time. The fact that it is not responding to
pg_terminate_backend means it's not even calling CHECK_FOR_INTERRUPTS
inside the WaitForParallelWorkersToFinish loop, so something is seriously
wrong. If the PID is still there, please do what Laurenz suggested:

strace -tp 356274

Is the problem reproducible, and has it happened more than once?

Cheers,
Greg