Weird long-running query
I'm using PostgreSQL 9.4.17 on CentOS 7.
I check the running queries with the following SQL:
SELECT
    procpid,
    start,
    now() - start AS lap,
    current_query
FROM
    (SELECT
        backendid,
        pg_stat_get_backend_pid(S.backendid) AS procpid,
        pg_stat_get_backend_activity_start(S.backendid) AS start,
        pg_stat_get_backend_activity(S.backendid) AS current_query
     FROM
        (SELECT pg_stat_get_backend_idset() AS backendid) AS S
    ) AS S
WHERE
    current_query <> '<IDLE>'
ORDER BY
    lap DESC;
Then I found a query that has been running for some days (and is still running):
procpid | 32638
start | 2019-11-25 16:29:29.529318+08
lap | 21 days 18:24:54.707369
current_query | DEALLOCATE pdo_stmt_00000388
I tried to kill it with SELECT pg_cancel_backend(32638), but it had no
effect.
What is this query, and what should I do about it?
Best Wishes
Kaijiang
On Tue, 17 Dec 2019 at 11:45, Kaijiang Chen <chenkaijiang@gmail.com>
wrote:
I think this result is misleading - that query has probably already finished.
You should use the condition
WHERE state <> 'idle';
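For reference, on 9.2 and later the same listing can be written directly against the pg_stat_activity view; a minimal sketch (column names as in the 9.4 view):

```sql
-- Non-idle backends, longest-running first (PostgreSQL 9.2+).
-- pg_stat_activity already exposes pid, query_start and state,
-- so the internal pg_stat_get_backend_*() functions are unnecessary.
SELECT
    pid,
    query_start,
    now() - query_start AS lap,
    state,
    query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY lap DESC NULLS LAST;
```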
Regards
Pavel
Kaijiang Chen <chenkaijiang@gmail.com> writes:
Don't know where you got this query from, but it's wrong for any PG
version more recent than (I think) 9.1. We don't use "<IDLE>" as an
indicator of idle sessions anymore; rather, those can be identified
by having state = 'idle'. What's in the query column for such a session
is its last query.
Then, I found a SQL that has run for some days (and still running):
procpid | 32638
start | 2019-11-25 16:29:29.529318+08
lap | 21 days 18:24:54.707369
current_query | DEALLOCATE pdo_stmt_00000388
It's not running. That was the last query it ran, back in November :-(
You could zap the session with pg_terminate_backend(), but
pg_cancel_backend() is not going to have any effect because there's
no active query.
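Put together, a sketch of what that looks like in practice (the pid comes from the output above; terminating a backend requires superuser or the same role as the target session):

```sql
-- Confirm the backend really is idle before killing it.
SELECT pid, state, state_change
FROM pg_stat_activity
WHERE pid = 32638;

-- pg_cancel_backend() only interrupts a running query; for an idle
-- session the whole backend has to be terminated instead.
SELECT pg_terminate_backend(32638);  -- returns true if the signal was sent
```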
regards, tom lane
I think I should also report it as a bug since, logically, this query shouldn't still exist.
On Wed, Dec 18, 2019 at 1:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hello,
I ran into this kernel message, and I cannot log into the Linux system anymore:
Dec 17 23:01:50 hq-pg kernel: sh (6563): drop_caches: 1
Dec 17 23:02:30 hq-pg kernel: INFO: task sync:6573 blocked for more than 120 seconds.
Dec 17 23:02:30 hq-pg kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 17 23:02:30 hq-pg kernel: sync            D ffff965ebabd1040     0  6573   6572 0x00000080
Dec 17 23:02:30 hq-pg kernel: Call Trace:
Dec 17 23:02:30 hq-pg kernel: [<ffffffffa48760a0>] ? generic_write_sync+0x70/0x70
After some googling, I guess the problem is that the I/O speed is low while the insert requests are coming in too quickly, so PostgreSQL puts the data into cache first and then the kernel calls sync.
I know I can queue the requests so that PostgreSQL does not accept requests that would grow the system cache.
But is there any way I can tell PostgreSQL to handle only, say, 20000 records per second or 4 MB per second, and not accept inserts faster than that?
For me, having PostgreSQL simply wait is much better than the current behavior.
Any help will be much appreciated.
Thanks,
James
On Wed, Dec 18, 2019 at 11:25:32AM +0800, Kaijiang Chen wrote:
It's not a bug. Most likely this backend is not doing anything.
You're using the old way of checking whether a backend is working -
current_query <> '<IDLE>'.
Check: select * from pg_stat_activity where pid = 32638
Most likely you'll see state = 'idle'.
In such cases, the query column just shows the last executed query, not a
currently running one.
Also - WHY are you calling the internal pg_* functions directly, instead of
using the pg_stat_activity view?
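Along the same lines, a sketch of a pg_stat_activity query that shows how long each idle session has been sitting there (state_change records when the backend last changed state, i.e. when its last query finished):

```sql
-- Idle sessions ordered by how long they have been idle.
SELECT
    pid,
    usename,
    now() - state_change AS idle_for,
    query AS last_query
FROM pg_stat_activity
WHERE state = 'idle'
ORDER BY idle_for DESC;
```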
Best regards,
depesz
Thanks!
I learned that SQL from the web; the pg views should indeed be better.
BTW, I got a similar result (I still see that process) with "select * from
pg_stat_activity":
backend_start | 2019-11-25 16:27:05.103901+08
xact_start |
query_start | 2019-11-25 16:29:29.529318+08
state_change | 2019-11-25 16:29:29.529344+08
waiting | f
state | idle
backend_xid |
backend_xmin |
query | DEALLOCATE pdo_stmt_00000388
Looks not very nice :-)
On Wed, Dec 18, 2019 at 10:06 PM hubert depesz lubaczewski <depesz@depesz.com> wrote:
On Thu, Dec 19, 2019 at 12:14:26AM +0800, Kaijiang Chen wrote:
BTW, I got a similar result (still see that process) with "select * from
pg_stat_activity":
...
state           | idle
...
query           | DEALLOCATE pdo_stmt_00000388
Looks not very nice :-)
not sure what you mean by not nice.
As you can clearly see the backend is *NOT* running anything (state is
idle).
Value in "query" column is simply last query that it ran. It *finished*
running this query at 2019-11-25 16:29:29.529344+08.
So your app is keeping connection open. It's not Pg problem or a bug.
Best regards,
depesz
hubert depesz lubaczewski <depesz@depesz.com> writes:
On Thu, Dec 19, 2019 at 12:14:26AM +0800, Kaijiang Chen wrote:
BTW, I got the similar result (still see that proc) with "select * from
pg_stat_activity":
...
state | idle
...
query | DEALLOCATE pdo_stmt_00000388
Looks not very nice :-)
not sure what you mean by not nice.
That's a feature not a bug (and yes, the behavior is documented).
People requested that the view continue to display the last query
of an idle session. IIRC, the main argument was that otherwise
it's hard to tell apart a bunch of idle sessions.
If you don't like it, you can always do something like
case when state = 'idle' then null else query end
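Folded into a monitoring query, that might look like the following sketch, where idle sessions show NULL instead of their stale last query:

```sql
SELECT
    pid,
    state,
    -- blank out the stale "last query" of idle sessions
    CASE WHEN state = 'idle' THEN NULL ELSE query END AS query
FROM pg_stat_activity
ORDER BY query_start;
```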
regards, tom lane
On Wed, 18 Dec 2019 17:53:26 +0800
"James(王旭)" <wangxu@gu360.com> wrote:
There isn't one magic-bullet solution for this. It may be that you can
tune Linux, PG, or the filesystem to handle the load more
gracefully, or that you just need more hardware. Streaming inserts might
be better batched and handled via synchronous ETL than pushed in at
random; at that point you can control the resources.
One approach might be tighter timeouts on the server or client, which
will leave the queries failing when the queue gets too high. That
frees up resources on the server, at the obvious expense of having
transactions roll back. On the other hand, you can end up with
timeouts so tight that you end up thrashing, which doesn't help the
problem.
Catch from this end is that without more information on the system
you are dealing with there isn't any quick-and-dirty fix.
I'd suggest looking over:
<https://duckduckgo.com/?q=linux+postgres+tuning&t=ffab&ia=web>
for suggestions and seeing which ones work or don't. If you have
more specific questions on the parameters or how to evaluate the
stats PG is keeping feel free to ask them here, but you will need
to be specific as to the stats and situation in which they were
acquired so that people have enough context to give you a reasonable
answer.
--
Steven Lembark 3646 Flora Place
Workhorse Computing St. Louis, MO 63110
lembark@wrkhors.com +1 888 359 3508
On Wed, Dec 18, 2019 at 3:53 AM James(王旭) <wangxu@gu360.com> wrote:
This is more a problem with the o/s than with postgres itself.
synchronous_commit is one influential parameter that can possibly help
mitigate the issue with some safety tradeoffs (read the docs). For
linux, one possible place to look is tuning dirty_background_ratio and
related parameters. The idea is you want the o/s to be more
aggressive about syncing to reduce the impact of i/o storm; basically
you are trading off some burst performance for consistency of
performance. Another place to look is checkpoint behavior. Do some
searches; there is a ton of information about this on the net.
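For the PostgreSQL-side knobs mentioned above, a sketch using ALTER SYSTEM (available since 9.4); the values are illustrative, not recommendations, and the synchronous_commit change trades a small window of durability for smoother commit latency, so read the docs first. The Linux dirty_background_ratio side is set with sysctl, not from SQL.

```sql
-- Asynchronous commit: transactions committed just before a crash may
-- be lost (up to ~3 * wal_writer_delay), but the database stays consistent.
ALTER SYSTEM SET synchronous_commit = 'off';

-- Spread checkpoint writes over more of the checkpoint interval
-- instead of issuing them in a burst.
ALTER SYSTEM SET checkpoint_completion_target = '0.9';

SELECT pg_reload_conf();  -- both settings take effect without a restart
```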
merlin
On Wed, Dec 18, 2019 at 4:53 AM James(王旭) <wangxu@gu360.com> wrote:
Hello,
I encountered this kernel message, and I cannot log into the Linux
system anymore:
...
After some googling I guess the problem is that the IO speed is low, while
the insert requests are coming in too quickly, so PG put these into cache
first and then the kernel called sync.
Could you expand on what you found in the googling, with links? I've never
seen these in my kernel log, and I don't know what they mean other than the
obvious that it is something to do with IO. Also, what kernel and file
system are you using?
I know I can queue the requests, so that POSTGRES will not accept these
requests which will result in an increase in system cache.
But is there any way I can tell POSTGRES that you can only handle 20000
records per second, or 4M per second, and not accept inserts faster than
that? For me, POSTGRES just waiting is much better than the current behavior.
I don't believe there is a setting within PostgreSQL to do this.
There was a proposal for a throttle on WAL generation back in February, but
with no recent discussion or (visible) progress:
/messages/by-id/2B42AB02-03FC-406B-B92B-18DED2D8D491@anarazel.de
I think the real answer here is to get a better IO system, or maybe a better
kernel. Otherwise, once you find a painful workaround for one symptom you
will just smack into another one.
Cheers,
Jeff