backend hangs at sendto() and can't be terminated

Started by Jovalmost 13 years ago10 messagesgeneral

amutu@amutu.com

almost 13 years ago

one of our pg user send a select * from 10 million table without limit from
psql，before get the return data，he realize the mistake and quite from
psql.but after 2 hours,the sql still alive:

pg_terminate_backend return t but the backend still there.

strace the pid show the process hang at sendto() function call:
[postgres@xxx ~]$ strace -tv -p 8243
Process 8243 attached - interrupt to quit
17:02:26 sendto(10, "70804\0\0\0\f{2012070804}D\0\0\0`\0\5\0\0\0\1"...,
8152, 0, NULL, 0^C <unfinished ...>
Process 8243 detached

os is centos 6 x86-64,pg version is 9.1.9.

the process still there,how can I help to debug the problem?

Jov
blog: http:amutu.com/blog <http://amutu.com/blog>

Bill Mitchell

bill@publicrelay.com

almost 13 years ago

In reply to: Jov (#1)

Re: backend hangs at sendto() and can't be terminated

You can do select pg_cancel_backend(8243);

and that should terminate that process that is sending, but still leave
your postgres server healthy.

regards,
Bill

Show quoted text

On 7/8/13 5:31 AM, Jov wrote:

one of our pg user send a select * from 10 million table without limit
from psql,before get the return data,he realize the mistake and quite
from psql.but after 2 hours,the sql still alive:

postgres=# SELECT * from pg_stat_activity where procpid = 8243;
-[ RECORD 1 ]----+---------------------------------
datid | 758972
datname | xxx
procpid | 8243
usesysid | 661846
usename | test
application_name | psql
client_addr | 10.136.4.90
client_hostname |
client_port | 6382
backend_start | 2013-07-08 14:11:00.942293+08
xact_start | 2013-07-08 14:31:11.157681+08
query_start | 2013-07-08 14:31:11.157681+08
waiting | f
current_query | select * from yyyy;

pg_terminate_backend return t but the backend still there.

strace the pid show the process hang at sendto() function call:
[postgres@xxx ~]$ strace -tv -p 8243
Process 8243 attached - interrupt to quit
17:02:26 sendto(10,
"70804\0\0\0\f{2012070804}D\0\0\0`\0\5\0\0\0\1"..., 8152, 0, NULL, 0^C
<unfinished ...>
Process 8243 detached

os is centos 6 x86-64,pg version is 9.1.9.

the process still there,how can I help to debug the problem?

Jov
blog: http:amutu.com/blog <http://amutu.com/blog>

Jov

amutu@amutu.com

almost 13 years ago

In reply to: Bill Mitchell (#2)

Re: backend hangs at sendto() and can't be terminated

we do select pg_cancel_backend(8243) several times,but the backend still
hang there.

Jov
blog: http:amutu.com/blog <http://amutu.com/blog>

2013/7/8 Bill Mitchell <bill@publicrelay.com>

Show quoted text

You can do select pg_cancel_backend(8243);

and that should terminate that process that is sending, but still leave
your postgres server healthy.

regards,
Bill

On 7/8/13 5:31 AM, Jov wrote:

one of our pg user send a select * from 10 million table without limit
from psql，before get the return data，he realize the mistake and quite from
psql.but after 2 hours,the sql still alive:

postgres=# SELECT * from pg_stat_activity where procpid = 8243;
-[ RECORD 1 ]----+---------------------------------
datid | 758972
datname | xxx
procpid | 8243
usesysid | 661846
usename | test
application_name | psql
client_addr | 10.136.4.90
client_hostname |
client_port | 6382
backend_start | 2013-07-08 14:11:00.942293+08
xact_start | 2013-07-08 14:31:11.157681+08
query_start | 2013-07-08 14:31:11.157681+08
waiting | f
current_query | select * from yyyy;

pg_terminate_backend return t but the backend still there.

strace the pid show the process hang at sendto() function call:
[postgres@xxx ~]$ strace -tv -p 8243
Process 8243 attached - interrupt to quit
17:02:26 sendto(10, "70804\0\0\0\f{2012070804}D\0\0\0`\0\5\0\0\0\1"...,
8152, 0, NULL, 0^C <unfinished ...>
Process 8243 detached

os is centos 6 x86-64,pg version is 9.1.9.

the process still there,how can I help to debug the problem?

Jov
blog: http:amutu.com/blog <http://amutu.com/blog>

Bill Mitchell

bill@publicrelay.com

almost 13 years ago

In reply to: Jov (#3)

Re: backend hangs at sendto() and can't be terminated

Hmm..
In that case, I think that select pg_terminate_backend() might be in order?

http://www.postgresql.org/docs/9.1/static/functions-admin.html

regards,
Bill

Show quoted text

On 7/8/13 5:46 AM, Jov wrote:

we do select pg_cancel_backend(8243) several times,but the backend
still hang there.

Jov
blog: http:amutu.com/blog <http://amutu.com/blog>

2013/7/8 Bill Mitchell <bill@publicrelay.com
<mailto:bill@publicrelay.com>>

You can do select pg_cancel_backend(8243);

and that should terminate that process that is sending, but still
leave your postgres server healthy.

regards,
Bill

On 7/8/13 5:31 AM, Jov wrote:

one of our pg user send a select * from 10 million table without
limit from psql，before get the return data，he realize the
mistake and quite from psql.but after 2 hours,the sql still alive:

postgres=# SELECT * from pg_stat_activity where procpid = 8243;
-[ RECORD 1 ]----+---------------------------------
datid | 758972
datname | xxx
procpid | 8243
usesysid | 661846
usename | test
application_name | psql
client_addr | 10.136.4.90
client_hostname |
client_port | 6382
backend_start | 2013-07-08 14:11:00.942293+08
xact_start | 2013-07-08 14:31:11.157681+08
query_start | 2013-07-08 14:31:11.157681+08
waiting | f
current_query | select * from yyyy;

pg_terminate_backend return t but the backend still there.

strace the pid show the process hang at sendto() function call:
[postgres@xxx ~]$ strace -tv -p 8243
Process 8243 attached - interrupt to quit
17:02:26 sendto(10,
"70804\0\0\0\f{2012070804}D\0\0\0`\0\5\0\0\0\1"..., 8152, 0,
NULL, 0^C <unfinished ...>
Process 8243 detached

os is centos 6 x86-64,pg version is 9.1.9.

the process still there,how can I help to debug the problem?

Jov
blog: http:amutu.com/blog <http://amutu.com/blog>

Jov

amutu@amutu.com

almost 13 years ago

In reply to: Bill Mitchell (#4)

Re: backend hangs at sendto() and can't be terminated

my first post already try the pg_terminate_backend but failed:
pg_terminate_backend return t but the backend still there.

Jov
blog: http:amutu.com/blog <http://amutu.com/blog>

2013/7/8 Bill Mitchell <bill@publicrelay.com>

Show quoted text

Hmm..
In that case, I think that select pg_terminate_backend() might be in order?

http://www.postgresql.org/docs/9.1/static/functions-admin.html

regards,
Bill

On 7/8/13 5:46 AM, Jov wrote:

we do select pg_cancel_backend(8243) several times,but the backend still
hang there.

Jov
blog: http:amutu.com/blog <http://amutu.com/blog>

2013/7/8 Bill Mitchell <bill@publicrelay.com>

You can do select pg_cancel_backend(8243);

and that should terminate that process that is sending, but still leave
your postgres server healthy.

regards,
Bill

On 7/8/13 5:31 AM, Jov wrote:

one of our pg user send a select * from 10 million table without limit
from psql，before get the return data，he realize the mistake and quite from
psql.but after 2 hours,the sql still alive:

postgres=# SELECT * from pg_stat_activity where procpid = 8243;
-[ RECORD 1 ]----+---------------------------------
datid | 758972
datname | xxx
procpid | 8243
usesysid | 661846
usename | test
application_name | psql
client_addr | 10.136.4.90
client_hostname |
client_port | 6382
backend_start | 2013-07-08 14:11:00.942293+08
xact_start | 2013-07-08 14:31:11.157681+08
query_start | 2013-07-08 14:31:11.157681+08
waiting | f
current_query | select * from yyyy;

pg_terminate_backend return t but the backend still there.

strace the pid show the process hang at sendto() function call:
[postgres@xxx ~]$ strace -tv -p 8243
Process 8243 attached - interrupt to quit
17:02:26 sendto(10, "70804\0\0\0\f{2012070804}D\0\0\0`\0\5\0\0\0\1"...,
8152, 0, NULL, 0^C <unfinished ...>
Process 8243 detached

os is centos 6 x86-64,pg version is 9.1.9.

the process still there,how can I help to debug the problem?

Jov
blog: http:amutu.com/blog <http://amutu.com/blog>

Merlin Moncure

mmoncure@gmail.com

almost 13 years ago

In reply to: Jov (#5)

Re: backend hangs at sendto() and can't be terminated

On Mon, Jul 8, 2013 at 4:56 AM, Jov <amutu@amutu.com> wrote:

my first post already try the pg_terminate_backend but failed:
pg_terminate_backend return t but the backend still there.

possibly a kernel problem?

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Tom Lane

tgl@sss.pgh.pa.us

almost 13 years ago

In reply to: Merlin Moncure (#6)

Re: backend hangs at sendto() and can't be terminated

Merlin Moncure <mmoncure@gmail.com> writes:

On Mon, Jul 8, 2013 at 4:56 AM, Jov <amutu@amutu.com> wrote:

my first post already try the pg_terminate_backend but failed:
pg_terminate_backend return t but the backend still there.

possibly a kernel problem?

The backend will keep trying to send data until the kernel informs it
the connection is lost. (Anything else would be a bad idea.) So the
real question here is why it's taking so long for the TCP stack to
decide that the client is gone. I'm wondering what exactly you did
to kill the psql session. Most ordinary ways of killing a process
should result in closure of whatever connections it had open.

If you'd lost network connectivity to the client, a TCP timeout on the
order of an hour wouldn't be surprising. (If you feel this is too long,
you can fool with the TCP keepalive parameters.) But it seems unlikely
that that's what's happening here.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Jov

amutu@amutu.com

almost 13 years ago

In reply to: Tom Lane (#7)

Re: backend hangs at sendto() and can't be terminated

n
etstat show nothing about the socket of the process,so I think the TCP
timeout took effect.so it is really wired.

Jov
blog: http:amutu.com/blog <http://amutu.com/blog>

2013/7/8 Tom Lane <tgl@sss.pgh.pa.us>

Show quoted text

Merlin Moncure <mmoncure@gmail.com> writes:

On Mon, Jul 8, 2013 at 4:56 AM, Jov <amutu@amutu.com> wrote:

my first post already try the pg_terminate_backend but failed:
pg_terminate_backend return t but the backend still there.

possibly a kernel problem?

The backend will keep trying to send data until the kernel informs it
the connection is lost. (Anything else would be a bad idea.) So the
real question here is why it's taking so long for the TCP stack to
decide that the client is gone. I'm wondering what exactly you did
to kill the psql session. Most ordinary ways of killing a process
should result in closure of whatever connections it had open.

If you'd lost network connectivity to the client, a TCP timeout on the
order of an hour wouldn't be surprising. (If you feel this is too long,
you can fool with the TCP keepalive parameters.) But it seems unlikely
that that's what's happening here.

regards, tom lane

Steven Schlansker

steven@likeness.com

almost 13 years ago

In reply to: Jov (#8)

Re: backend hangs at sendto() and can't be terminated

On Jul 8, 2013, at 6:48 AM, Jov <amutu@amutu.com> wrote:

netstat show nothing about the socket of the process,so I think the TCP timeout took effect.so it is really wired.

Jov
blog: http:amutu.com/blog

2013/7/8 Tom Lane <tgl@sss.pgh.pa.us>
Merlin Moncure <mmoncure@gmail.com> writes:

On Mon, Jul 8, 2013 at 4:56 AM, Jov <amutu@amutu.com> wrote:

my first post already try the pg_terminate_backend but failed:
pg_terminate_backend return t but the backend still there.

possibly a kernel problem?

The backend will keep trying to send data until the kernel informs it
the connection is lost. (Anything else would be a bad idea.) So the
real question here is why it's taking so long for the TCP stack to
decide that the client is gone. I'm wondering what exactly you did
to kill the psql session. Most ordinary ways of killing a process
should result in closure of whatever connections it had open.

If you'd lost network connectivity to the client, a TCP timeout on the
order of an hour wouldn't be surprising. (If you feel this is too long,
you can fool with the TCP keepalive parameters.) But it seems unlikely
that that's what's happening here.

Interestingly enough, I am seeing what may (or then again, may not) be a related problem.

I have a backend process stuck in a "recvfrom" --
[root@prd-db2a ~]# strace -tv -p 24402
Process 24402 attached - interrupt to quit
00:02:00 recvfrom(10,

postgres 24402 0.0 10.7 3505628 2639032 ? Ss Jul01 0:21 postgres: event event 10.29.62.21(39485) idle

It is a psql process that I launched from the command line 10 days ago:

steven 24401 0.0 0.0 166824 2532 pts/2 T Jul01 0:00 psql -U event -h prd-db2a.nessops.net -c delete from event where event_id in (select event_id from event where payload is null limit 100000);

-------+------------------+-------------------------------+---------+-------+----------------------------------------------------------------------------------------
--------------
24402 | psql | 2013-07-01 21:03:27.417039+00 | f | idle | delete from event where event_id in (select event_id from event where payload is null l
imit 100000);
(1 row)

I invoked it with -c, which supposedly makes it exit when the single command is finished. Many similar queries have been run, and I'd say they run for a half hour on average.

But the process has been alive for 10 days now, not blocked, but just idle. It is connected over TCP from the local box (although not through the loopback interface, through a 10.x interface) How does this make any sense?

The command seems to be immune to pg_cancel_backend, but pg_terminate_backend did manage to kill it.

If this problem is not related, please tell me to shove off, and I will not pollute this thread further. But I hope some of this information is useful.

I am running PG 9.2.4, CentOS kernel 2.6.32.360.

Best,
Steven

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10