weird long time query

Started by Kaijiang Chen over 6 years ago · 13 messages · bugs, general
#1 Kaijiang Chen
chenkaijiang@gmail.com

I'm using postgres 9.4.17 on centos 7.
I check the running queries with the following SQL:
SELECT
procpid,
start,
now() - start AS lap,
current_query
FROM
(SELECT
backendid,
pg_stat_get_backend_pid(S.backendid) AS procpid,
pg_stat_get_backend_activity_start(S.backendid) AS start,
pg_stat_get_backend_activity(S.backendid) AS current_query
FROM
(SELECT pg_stat_get_backend_idset() AS backendid) AS S
) AS S
WHERE
current_query <> '<IDLE>'
ORDER BY
lap DESC;

Then I found a SQL query that has been running for some days (and is still running):
procpid | 32638
start | 2019-11-25 16:29:29.529318+08
lap | 21 days 18:24:54.707369
current_query | DEALLOCATE pdo_stmt_00000388

I tried to kill it with SELECT pg_cancel_backend(32638), but it has no
effect.

What's this query and what shall I do for it?

Best Wishes
Kaijiang

#2 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Kaijiang Chen (#1)
Re: weird long time query

On Tue, Dec 17, 2019 at 11:45 Kaijiang Chen <chenkaijiang@gmail.com>
wrote:

> I'm using postgres 9.4.17 on centos 7.
> I check the running queries with the following SQL:
> SELECT
> procpid,
> start,
> now() - start AS lap,
> current_query
> FROM
> (SELECT
> backendid,
> pg_stat_get_backend_pid(S.backendid) AS procpid,
> pg_stat_get_backend_activity_start(S.backendid) AS start,
> pg_stat_get_backend_activity(S.backendid) AS current_query
> FROM
> (SELECT pg_stat_get_backend_idset() AS backendid) AS S
> ) AS S
> WHERE
> current_query <> '<IDLE>'
> ORDER BY
> lap DESC;

I think this result is misleading - that query has probably already
finished.

You should use the constraint

WHERE state <> 'idle';

Regards

Pavel

> Then, I found a SQL that has run for some days (and still running):
> procpid | 32638
> start | 2019-11-25 16:29:29.529318+08
> lap | 21 days 18:24:54.707369
> current_query | DEALLOCATE pdo_stmt_00000388
>
> I tried to kill it with: SELECT pg_cancel_backend(32638) but it takes no
> effects.
>
> What's this query and what shall I do for it?
>
> Best Wishes
> Kaijiang

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kaijiang Chen (#1)
Re: weird long time query

Kaijiang Chen <chenkaijiang@gmail.com> writes:

> I'm using postgres 9.4.17 on centos 7.
> I check the running queries with the following SQL:
> SELECT
> procpid,
> start,
> now() - start AS lap,
> current_query
> FROM
> (SELECT
> backendid,
> pg_stat_get_backend_pid(S.backendid) AS procpid,
> pg_stat_get_backend_activity_start(S.backendid) AS start,
> pg_stat_get_backend_activity(S.backendid) AS current_query
> FROM
> (SELECT pg_stat_get_backend_idset() AS backendid) AS S
> ) AS S
> WHERE
> current_query <> '<IDLE>'
> ORDER BY
> lap DESC;

Don't know where you got this query from, but it's wrong for any PG
version more recent than (I think) 9.1. We don't use "<IDLE>" as an
indicator of idle sessions anymore; rather, those can be identified
by having state = 'idle'. What's in the query column for such a session
is its last query.

> Then, I found a SQL that has run for some days (and still running):
> procpid | 32638
> start | 2019-11-25 16:29:29.529318+08
> lap | 21 days 18:24:54.707369
> current_query | DEALLOCATE pdo_stmt_00000388

It's not running. That was the last query it ran, back in November :-(
You could zap the session with pg_terminate_backend(), but
pg_cancel_backend() is not going to have any effect because there's
no active query.
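For reference, the monitoring query rewritten along these lines, as a sketch against the pg_stat_activity view (PostgreSQL 9.2+; column names as in the 9.4 docs):

```sql
-- Active (non-idle) backends, longest-running first.
SELECT pid,
       query_start,
       now() - query_start AS lap,
       state,
       query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY lap DESC NULLS LAST;

-- To drop the idle session itself, terminate (not cancel) the backend:
-- SELECT pg_terminate_backend(32638);
```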

regards, tom lane

#4 Kaijiang Chen
chenkaijiang@gmail.com
In reply to: Tom Lane (#3)
Re: weird long time query

I think I should also report it as a bug since logically, it couldn't exist.


#5 Kaijiang Chen
chenkaijiang@gmail.com
In reply to: Kaijiang Chen (#1)
Fwd: weird long time query

I'm using postgres 9.4.17 on centos 7.
I check the running queries with the following SQL:
SELECT
procpid,
start,
now() - start AS lap,
current_query
FROM
(SELECT
backendid,
pg_stat_get_backend_pid(S.backendid) AS procpid,
pg_stat_get_backend_activity_start(S.backendid) AS start,
pg_stat_get_backend_activity(S.backendid) AS current_query
FROM
(SELECT pg_stat_get_backend_idset() AS backendid) AS S
) AS S
WHERE
current_query <> '<IDLE>'
ORDER BY
lap DESC;

Then I found a SQL query that has been running for some days (and is still running):
procpid | 32638
start | 2019-11-25 16:29:29.529318+08
lap | 21 days 18:24:54.707369
current_query | DEALLOCATE pdo_stmt_00000388

I tried to kill it with SELECT pg_cancel_backend(32638), but it has no
effect.

What's this query and what shall I do for it?

I think it is a bug since logically, this query should be gone.

Best Wishes
Kaijiang

#6 James(王旭)
wangxu@gu360.com
In reply to: Kaijiang Chen (#4)
How to prevent POSTGRES killing linux system from accepting too much inserts?

Hello, I ran into this kernel message, and I cannot log in to the Linux system anymore:

Dec 17 23:01:50 hq-pg kernel: sh (6563): drop_caches: 1
Dec 17 23:02:30 hq-pg kernel: INFO: task sync:6573 blocked for more than 120 seconds.
Dec 17 23:02:30 hq-pg kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 17 23:02:30 hq-pg kernel: sync            D ffff965ebabd1040     0  6573   6572 0x00000080
Dec 17 23:02:30 hq-pg kernel: Call Trace:
Dec 17 23:02:30 hq-pg kernel: [<ffffffffa48760a0>] ? generic_write_sync+0x70/0x70
After some googling I guess the problem is that IO speed is low while the
insert requests are coming in too quickly, so PG puts them into cache first
and then the kernel calls sync. I know I can queue the requests so that
POSTGRES will not accept requests that would increase the system cache. But
is there any way I can tell POSTGRES: you can only handle 20000 records per
second, or 4M per second, please don't accept inserts faster than that? For
me, POSTGRES just waiting is much better than the current behavior.
Any help will be much appreciated.

Thanks,James

#7 hubert depesz lubaczewski
depesz@depesz.com
In reply to: Kaijiang Chen (#5)
Re: Fwd: weird long time query

On Wed, Dec 18, 2019 at 11:25:32AM +0800, Kaijiang Chen wrote:

> I'm using postgres 9.4.17 on centos 7.
> I check the running queries with the following SQL:
> SELECT
> procpid,
> start,
> now() - start AS lap,
> current_query
> FROM
> (SELECT
> backendid,
> pg_stat_get_backend_pid(S.backendid) AS procpid,
> pg_stat_get_backend_activity_start(S.backendid) AS start,
> pg_stat_get_backend_activity(S.backendid) AS current_query
> FROM
> (SELECT pg_stat_get_backend_idset() AS backendid) AS S
> ) AS S
> WHERE
> current_query <> '<IDLE>'
> ORDER BY
> lap DESC;
>
> Then, I found a SQL that has run for some days (and still running):
> procpid | 32638
> start | 2019-11-25 16:29:29.529318+08
> lap | 21 days 18:24:54.707369
> current_query | DEALLOCATE pdo_stmt_00000388
>
> I tried to kill it with: SELECT pg_cancel_backend(32638) but it takes no
> effects.
>
> What's this query and what shall I do for it?
>
> I think it is a bug since logically, this query should be gone.

It's not a bug. Most likely this backend is not doing anything.

You're using the old way of checking whether a backend is working -
current_query <> '<IDLE>'.

Check: select * from pg_stat_activity where pid = 32638

Most likely you'll see state = 'idle'

In such cases, query just shows last executed query, not currently
running one.

Also - WHY are you calling internal pg* functions directly, instead of
using pg_stat_activity view?

Best regards,

depesz

#8 Kaijiang Chen
chenkaijiang@gmail.com
In reply to: hubert depesz lubaczewski (#7)
Re: Fwd: weird long time query

Thanks!
I learned the SQL from the web. The pg_stat_activity view should indeed be better.
BTW, I got the similar result (still see that proc) with "select * from
pg_stat_activity":

backend_start | 2019-11-25 16:27:05.103901+08
xact_start |
query_start | 2019-11-25 16:29:29.529318+08
state_change | 2019-11-25 16:29:29.529344+08
waiting | f
state | idle
backend_xid |
backend_xmin |
query | DEALLOCATE pdo_stmt_00000388

Looks not very nice :-)


#9 hubert depesz lubaczewski
depesz@depesz.com
In reply to: Kaijiang Chen (#8)
Re: Fwd: weird long time query

On Thu, Dec 19, 2019 at 12:14:26AM +0800, Kaijiang Chen wrote:

> Thanks!
> I learn the SQL from the web. pg views should be better.
> BTW, I got the similar result (still see that proc) with "select * from
> pg_stat_activity":
>
> backend_start | 2019-11-25 16:27:05.103901+08
> xact_start |
> query_start | 2019-11-25 16:29:29.529318+08
> state_change | 2019-11-25 16:29:29.529344+08
> waiting | f
> state | idle
> backend_xid |
> backend_xmin |
> query | DEALLOCATE pdo_stmt_00000388
>
> Looks not very nice :-)

not sure what you mean by not nice.

As you can clearly see the backend is *NOT* running anything (state is
idle).

Value in "query" column is simply last query that it ran. It *finished*
running this query at 2019-11-25 16:29:29.529344+08.

So your app is keeping connection open. It's not Pg problem or a bug.
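A quick way to spot such lingering connections, sketched against pg_stat_activity (the one-hour threshold is just an example):

```sql
-- Sessions idle for over an hour, with the last query each one ran.
SELECT pid,
       usename,
       now() - state_change AS idle_for,
       query AS last_query
FROM pg_stat_activity
WHERE state = 'idle'
  AND state_change < now() - interval '1 hour'
ORDER BY idle_for DESC;
```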

Best regards,

depesz

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: hubert depesz lubaczewski (#9)
Re: Fwd: weird long time query

hubert depesz lubaczewski <depesz@depesz.com> writes:

> On Thu, Dec 19, 2019 at 12:14:26AM +0800, Kaijiang Chen wrote:
>> BTW, I got the similar result (still see that proc) with "select * from
>> pg_stat_activity":
>> ...
>> state | idle
>> ...
>> query | DEALLOCATE pdo_stmt_00000388
>>
>> Looks not very nice :-)
>
> not sure what you mean by not nice.

That's a feature not a bug (and yes, the behavior is documented).
People requested that the view continue to display the last query
of an idle session. IIRC, the main argument was that otherwise
it's hard to tell apart a bunch of idle sessions.

If you don't like it, you can always do something like

case when state = 'idle' then null else query end

regards, tom lane

#11 Steven Lembark
lembark@wrkhors.com
In reply to: James(王旭) (#6)
Re: How to prevent POSTGRES killing linux system from accepting too much inserts?

On Wed, 18 Dec 2019 17:53:26 +0800
"James(王旭)" <wangxu@gu360.com> wrote:

> Hello, I encountered this kernel message, and I cannot login into
> the Linux system anymore:
>
> Dec 17 23:01:50 hq-pg kernel: sh (6563): drop_caches: 1
> Dec 17 23:02:30 hq-pg kernel: INFO: task sync:6573 blocked for more than 120 seconds.
> Dec 17 23:02:30 hq-pg kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Dec 17 23:02:30 hq-pg kernel: sync            D ffff965ebabd1040     0  6573   6572 0x00000080
> Dec 17 23:02:30 hq-pg kernel: Call Trace:
> Dec 17 23:02:30 hq-pg kernel: [<ffffffffa48760a0>] ? generic_write_sync+0x70/0x70
>
> After some google I guess it's the problem that IO speed is low, while
> the insert requests are coming too much quickly. So PG put these into
> cache first then kernel called sync. I know I can queue the requests, so
> that POSTGRES will not accept these requests which will result in an
> increase in system cache. But is there any way I can tell POSTGRES, that
> you can only handle 20000 records per second, or 4M per second, please
> don't accept inserts more than that speed. For me, POSTGRES just waiting
> is much better than current behavior. Any help will be much appreciated.

There isn't one magic-bullet solution for this. It may be that you can
tune Linux, PG, or the filesystem to handle the load more
gracefully; or that you just need more hardware. Streaming inserts might
be better batched and handled via synchronous ETL than pushed in at
random; at that point you can control the resources.

One approach might be tighter timeouts on the server or client, which
will leave the queries failing when the queue gets too high. That
frees up resources on the server, at the obvious expense of having
transactions roll back. On the other hand, you can end up with
timeouts so tight that you end up thrashing, which doesn't help the
problem.
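One concrete form of the tighter-timeout idea is statement_timeout, which can be set per role or per session (the role name and the 5s value below are made up for illustration, not recommendations):

```sql
-- Abort any statement that runs longer than 5 seconds for this
-- (hypothetical) application role.
ALTER ROLE app_user SET statement_timeout = '5s';

-- Or just for the current session, from the client:
SET statement_timeout = '5s';
```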

The catch from this end is that without more information on the system
you are dealing with there isn't any quick-and-dirty fix.

I'd suggest looking over:

<https://duckduckgo.com/?q=linux+postgres+tuning&t=ffab&ia=web>

for suggestions and seeing which ones work or don't. If you have
more specific questions on the parameters or how to evaluate the
stats PG is keeping feel free to ask them here, but you will need
to be specific as to the stats and situation in which they were
acquired so that people have enough context to give you a reasonable
answer.

--
Steven Lembark 3646 Flora Place
Workhorse Computing St. Louis, MO 63110
lembark@wrkhors.com +1 888 359 3508

#12 Merlin Moncure
mmoncure@gmail.com
In reply to: James(王旭) (#6)
Re: How to prevent POSTGRES killing linux system from accepting too much inserts?

On Wed, Dec 18, 2019 at 3:53 AM James(王旭) <wangxu@gu360.com> wrote:

> Hello,
>
> I encountered into this kernel message, and I cannot login into the Linux system anymore:
>
> Dec 17 23:01:50 hq-pg kernel: sh (6563): drop_caches: 1
> Dec 17 23:02:30 hq-pg kernel: INFO: task sync:6573 blocked for more than 120 seconds.
> Dec 17 23:02:30 hq-pg kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Dec 17 23:02:30 hq-pg kernel: sync D ffff965ebabd1040 0 6573 6572 0x00000080
> Dec 17 23:02:30 hq-pg kernel: Call Trace:
> Dec 17 23:02:30 hq-pg kernel: [<ffffffffa48760a0>] ? generic_write_sync+0x70/0x70
>
> After some google I guess it's the problem that IO speed is low, while the insert requests are coming too much quickly. So PG put these into cache first then kernel called sync.
>
> I know I can queue the requests, so that POSTGRES will not accept these requests which will result in an increase in system cache.
>
> But is there any way I can tell POSTGRES, that you can only handle 20000 records per second, or 4M per second, please don't accept inserts more than that speed.
>
> For me, POSTGRES just waiting is much better than current behavior.
>
> Any help will be much appreciated.

This is more a problem with the o/s than with postgres itself.

synchronous_commit is one influential parameter that can possibly help
mitigate the issue with some safety tradeoffs (read the docs). For
linux, one possible place to look is tuning dirty_background_ratio and
related parameters. The idea is you want the o/s to be more
aggressive about syncing to reduce the impact of i/o storm; basically
you are trading off some burst performance for consistency of
performance. Another place to look is checkpoint behavior. Do some
searches, there is tons of information about this on the net.
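For example, synchronous_commit can be changed cluster-wide with ALTER SYSTEM (available since 9.4); the setting below is a placeholder to evaluate, not a recommendation, since with synchronous_commit off recently committed transactions can be lost on a crash:

```sql
-- PostgreSQL side: smoother commit latency at a durability cost.
ALTER SYSTEM SET synchronous_commit = 'off';
SELECT pg_reload_conf();
```

On the Linux side, the writeback knobs Merlin mentions are set via sysctl, e.g. `sysctl -w vm.dirty_background_ratio=5` (the value 5 is likewise just an illustrative starting point).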

merlin

#13 Jeff Janes
jeff.janes@gmail.com
In reply to: James(王旭) (#6)
Re: How to prevent POSTGRES killing linux system from accepting too much inserts?

On Wed, Dec 18, 2019 at 4:53 AM James(王旭) <wangxu@gu360.com> wrote:

> Hello,
>
> I encountered into this kernel message, and I cannot login into the Linux
> system anymore:
>
> Dec 17 23:01:50 hq-pg kernel: sh (6563): drop_caches: 1
> Dec 17 23:02:30 hq-pg kernel: INFO: task sync:6573 blocked for more than 120 seconds.
> Dec 17 23:02:30 hq-pg kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Dec 17 23:02:30 hq-pg kernel: sync D ffff965ebabd1040 0 6573 6572 0x00000080
> Dec 17 23:02:30 hq-pg kernel: Call Trace:
> Dec 17 23:02:30 hq-pg kernel: [<ffffffffa48760a0>] ? generic_write_sync+0x70/0x70
>
> After some google I guess it's the problem that IO speed is low, while
> the insert requests are coming too much quickly. So PG put these into cache
> first then kernel called sync.

Could you expand on what you found in the googling, with links? I've never
seen these in my kernel log, and I don't know what they mean other than the
obvious that it is something to do with IO. Also, what kernel and file
system are you using?


> I know I can queue the requests, so that POSTGRES will not accept these
> requests which will result in an increase in system cache.
>
> But is there any way I can tell POSTGRES, that you can only handle 20000
> records per second, or 4M per second, please don't accept inserts more than
> that speed.
>
> For me, POSTGRES just waiting is much better than current behavior.

I don't believe there is a setting from within PostgreSQL to do this.

There was a proposal for a throttle on WAL generation back in February, but
with no recent discussion or (visible) progress:

/messages/by-id/2B42AB02-03FC-406B-B92B-18DED2D8D491@anarazel.de

I think the real answer here is to get a better IO system, or maybe a better
kernel. Otherwise, once you find a painful workaround for one symptom you
will just smack into another one.

Cheers,

Jeff
