idle in transaction query makes server unresponsive

Started by Scot Kreienkampover 13 years ago11 messagesgeneral
Jump to latest
#1Scot Kreienkamp
SKreien@la-z-boy.com

Hi everyone,

I have a problem that I've been struggling with for quite some time. Every once in a while I will get a connection that goes to idle in transaction on an in-house programmed application that connects with JDBC. That happens fairly regularly and the programmers are trying to clean that up, but sometimes the idle in transaction connection makes the PG server entirely unresponsive. I'm not getting connection refused, nothing. All connections existing or new, JDBC or psql, just hang. I've already got full query logging on to try to catch the problem query or connection so I can give the developers somewhere to look to resolve their issue with the application, but since queries are logged with runtimes I'm assuming they are only logged after they are complete. And since it's idle in transaction it never completes so it never gets logged. Our application is connecting as an unprivileged user named rmstomcat, and the database is limited to 400 connections out of 512. I'm not running out of connections as I've got reserved connections set, and even connecting as user postgres with psql the connection just hangs. The server doesn't appear to be running out of memory when this happens and nothing is printed in the log. The only thing that resolves it is doing a kill on the PID of any idle in transaction connections existing at the time causing them to roll back. Then everything else picks up right where it left off and works again.

Can anyone give me any hints about why PG becomes unresponsive? Or how to fix it so it doesn't?

My server is 9.1.2 right now. I will be upgrading to the latest 9.1 series soon, but until 9.2 can be run through our development/testing cycle I can't upgrade to 9.2. That will take about 6-10 months.

Thanks!

Scot Kreienkamp

This message is intended only for the individual or entity to which it is addressed. It may contain privileged, confidential information which is exempt from disclosure under applicable laws. If you are not the intended recipient, please note that you are strictly prohibited from disseminating or distributing this information (other than to the intended recipient) or copying this information. If you have received this communication in error, please notify us immediately by e-mail or by telephone at the above number. Thank you.

#2John R Pierce
pierce@hogranch.com
In reply to: Scot Kreienkamp (#1)
Re: idle in transaction query makes server unresponsive

On 09/25/12 12:23 PM, Scot Kreienkamp wrote:

I have a problem that I've been struggling with for quite some time.
Every once in a while I will get a connection that goes to idle in
transaction on an in-house programmed application that connects with
JDBC. That happens fairly regularly and the programmers are trying to
clean that up, but sometimes the idle in transaction connection makes
the PG server entirely unresponsive. I'm not getting connection
refused, nothing. All connections existing or new, JDBC or psql, just
hang. I've already got full query logging on to try to catch the
problem query or connection so I can give the developers somewhere to
look to resolve their issue with the application, but since queries
are logged with runtimes I'm assuming they are only logged after they
are complete. And since it's idle in transaction it never completes
so it never gets logged. Our application is connecting as an
unprivileged user named rmstomcat, and the database is limited to 400
connections out of 512. I'm not running out of connections as I've
got reserved connections set, and even connecting as user postgres
with psql the connection just hangs. The server doesn't appear to be
running out of memory when this happens and nothing is printed in the
log. The only thing that resolves it is doing a kill on the PID of
any idle in transaction connections existing at the time causing them
to roll back. Then everything else picks up right where it left off
and works again.

Can anyone give me any hints about why PG becomes unresponsive? Or
how to fix it so it doesn't?

that is a LOT of connections. you likely should be limiting that with
a connection pooler, and configuring your application to ...

1) get connection from pool
2) execute transaction
3) release connection to pool

then configure the pool to stall the requester when some sane number of
connections has been reached, like no more than 2-3X the number of CPU
cores or hardware threads you have. you'll likely get better overall
throughput.

if you have jobs that execute long running queries for reporting etc,
have those use a seperate smaller pool.

re: your logging.... <idle in transaction> means that connection has no
query running but started a transaction. there's no pending query on
that connection. these are normally only a concern when they go on
for a long time, say 10 minutes or more. however, if that transaction
has gotten locks on resources, and is then sitting on its thumbs doing
nothing, OTHER connections likely will block. join pg_stat_activity
with pg_locks to find out what all is going on..

--
john r pierce N 37, W 122
santa cruz ca mid-left coast

#3Scot Kreienkamp
SKreien@la-z-boy.com
In reply to: John R Pierce (#2)
Re: idle in transaction query makes server unresponsive

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] On Behalf Of John R Pierce
Sent: Tuesday, September 25, 2012 3:53 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] idle in transaction query makes server unresponsive

On 09/25/12 12:23 PM, Scot Kreienkamp wrote:

I have a problem that I've been struggling with for quite some time.
Every once in a while I will get a connection that goes to idle in
transaction on an in-house programmed application that connects with
JDBC. That happens fairly regularly and the programmers are trying to
clean that up, but sometimes the idle in transaction connection makes
the PG server entirely unresponsive. I'm not getting connection
refused, nothing. All connections existing or new, JDBC or psql, just
hang. I've already got full query logging on to try to catch the
problem query or connection so I can give the developers somewhere to
look to resolve their issue with the application, but since queries
are logged with runtimes I'm assuming they are only logged after they
are complete. And since it's idle in transaction it never completes
so it never gets logged. Our application is connecting as an
unprivileged user named rmstomcat, and the database is limited to 400
connections out of 512. I'm not running out of connections as I've
got reserved connections set, and even connecting as user postgres
with psql the connection just hangs. The server doesn't appear to be
running out of memory when this happens and nothing is printed in the
log. The only thing that resolves it is doing a kill on the PID of
any idle in transaction connections existing at the time causing them
to roll back. Then everything else picks up right where it left off
and works again.

Can anyone give me any hints about why PG becomes unresponsive? Or
how to fix it so it doesn't?

that is a LOT of connections. you likely should be limiting that with
a connection pooler, and configuring your application to ...

1) get connection from pool
2) execute transaction
3) release connection to pool

then configure the pool to stall the requester when some sane number of
connections has been reached, like no more than 2-3X the number of CPU
cores or hardware threads you have. you'll likely get better overall
throughput.

if you have jobs that execute long running queries for reporting etc,
have those use a seperate smaller pool.

re: your logging.... <idle in transaction> means that connection has no
query running but started a transaction. there's no pending query on
that connection. these are normally only a concern when they go on
for a long time, say 10 minutes or more. however, if that transaction
has gotten locks on resources, and is then sitting on its thumbs doing
nothing, OTHER connections likely will block. join pg_stat_activity
with pg_locks to find out what all is going on..

[Scot Kreienkamp]

Hi John,

The application is using a pooler and generally runs around 100 connections, but I've seen it as high as 200 during the day for normal use. It's on a large server; 64 cores total and about 500 gigs of memory. That's one of the reasons I left it at 512 connections. The idle in transaction connections are getting locks and then going idle in transaction causing the queries to be waiting in that database. That I can understand. My problem is that I can't run a query to see what exactly it's doing because the entire Postgres server is unresponsive. I can't even use psql to connect to the postgres user database as user postgres so I can query pg_stat_activity, that hangs also until I kill the idle in transaction query PID. That's what my dilemma is. The server hardware itself is not being stressed when that's happening though, so it doesn't appear to be a resource problem, but I can't check because I can't see what PG is doing.

The problem is how do I investigate this when PG is entirely unresponsive? Why is it becoming unresponsive, and how do I prevent the PG server from becoming unresponsive?

Thanks!

This message is intended only for the individual or entity to which it is addressed. It may contain privileged, confidential information which is exempt from disclosure under applicable laws. If you are not the intended recipient, please note that you are strictly prohibited from disseminating or distributing this information (other than to the intended recipient) or copying this information. If you have received this communication in error, please notify us immediately by e-mail or by telephone at the above number. Thank you.

#4John R Pierce
pierce@hogranch.com
In reply to: Scot Kreienkamp (#3)
Re: idle in transaction query makes server unresponsive

On 09/25/12 1:35 PM, Scot Kreienkamp wrote:

The problem is how do I investigate this when PG is entirely unresponsive? Why is it becoming unresponsive, and how do I prevent the PG server from becoming unresponsive?

I think I'd push that 9.1.latest upgrade ASAP, and then see if this
problem continues. been a pile of critical fixes since 9.1.2, to whit...
http://www.postgresql.org/docs/current/static/release-9-1-3.html
http://www.postgresql.org/docs/current/static/release-9-1-4.html
http://www.postgresql.org/docs/current/static/release-9-1-5.html
http://www.postgresql.org/docs/current/static/release-9-1-6.html

--
john r pierce N 37, W 122
santa cruz ca mid-left coast

#5Thomas Kellerer
spam_eater@gmx.net
In reply to: Scot Kreienkamp (#3)
Re: idle in transaction query makes server unresponsive

Scot Kreienkamp wrote on 25.09.2012 22:35:

The application is using a pooler and generally runs around 100
connections, but I've seen it as high as 200 during the day for
normal use. It's on a large server; 64 cores total and about 500
gigs of memory. That's one of the reasons I left it at 512
connections.

We had several web applications where performance was *improved*
by configuring the connection pool have a a lot less connections.

There is a threshold where too many connections
will simply flood the server. Lowering the number of processes
fighting for resource makes each process faster.

You might want to give it a try.

#6Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Scot Kreienkamp (#1)
Re: idle in transaction query makes server unresponsive

Scot Kreienkamp wrote:

I have a problem that I've been struggling with for quite some time.

Every once in a while I will get

a connection that goes to idle in transaction on an in-house

programmed application that connects with

JDBC. That happens fairly regularly and the programmers are trying to

clean that up, but sometimes

the idle in transaction connection makes the PG server entirely

unresponsive. I'm not getting

connection refused, nothing. All connections existing or new, JDBC or

psql, just hang. I've already

got full query logging on to try to catch the problem query or

connection so I can give the developers

somewhere to look to resolve their issue with the application, but

since queries are logged with

runtimes I'm assuming they are only logged after they are complete.

And since it's idle in

transaction it never completes so it never gets logged. Our

application is connecting as an

unprivileged user named rmstomcat, and the database is limited to 400

connections out of 512. I'm not

running out of connections as I've got reserved connections set, and

even connecting as user postgres

with psql the connection just hangs. The server doesn't appear to be

running out of memory when this

happens and nothing is printed in the log. The only thing that

resolves it is doing a kill on the PID

of any idle in transaction connections existing at the time causing

them to roll back. Then

everything else picks up right where it left off and works again.

Can anyone give me any hints about why PG becomes unresponsive? Or

how to fix it so it doesn't?

My server is 9.1.2 right now. I will be upgrading to the latest 9.1

series soon, but until 9.2 can be

run through our development/testing cycle I can't upgrade to 9.2.

That will take about 6-10 months.

Yes, see if upgrading to 9.1 makes the problem disappear.

It is surprising that you cannot even start new connections.

You could try to "strace" the postmaster during a connection attempt
and see what happens. Maybe that helps to spot the place where
things go wrong.

Yours,
Laurenz Albe

#7Scot Kreienkamp
SKreien@la-z-boy.com
In reply to: Thomas Kellerer (#5)
Re: idle in transaction query makes server unresponsive

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] On Behalf Of Thomas Kellerer
Sent: Tuesday, September 25, 2012 5:25 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] idle in transaction query makes server unresponsive

Scot Kreienkamp wrote on 25.09.2012 22:35:

The application is using a pooler and generally runs around 100
connections, but I've seen it as high as 200 during the day for
normal use. It's on a large server; 64 cores total and about 500
gigs of memory. That's one of the reasons I left it at 512
connections.

We had several web applications where performance was *improved*
by configuring the connection pool have a a lot less connections.

There is a threshold where too many connections
will simply flood the server. Lowering the number of processes
fighting for resource makes each process faster.

You might want to give it a try.

[Scot Kreienkamp]
Can I get the same effect by limiting the number of logons for the user that my application is connecting as to 250? Or do I need to lower the number in postgresql.conf? I'd rather go with the first option because I can adjust it live without editing and reloading config files.

This message is intended only for the individual or entity to which it is addressed. It may contain privileged, confidential information which is exempt from disclosure under applicable laws. If you are not the intended recipient, please note that you are strictly prohibited from disseminating or distributing this information (other than to the intended recipient) or copying this information. If you have received this communication in error, please notify us immediately by e-mail or by telephone at the above number. Thank you.

#8Scot Kreienkamp
SKreien@la-z-boy.com
In reply to: Laurenz Albe (#6)
Re: idle in transaction query makes server unresponsive

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] On Behalf Of Albe Laurenz
Sent: Wednesday, September 26, 2012 5:15 AM
To: Scot Kreienkamp; pgsql-general@postgresql.org
Subject: Re: [GENERAL] idle in transaction query makes server unresponsive

Scot Kreienkamp wrote:

I have a problem that I've been struggling with for quite some time.

Every once in a while I will get

a connection that goes to idle in transaction on an in-house

programmed application that connects with

JDBC. That happens fairly regularly and the programmers are trying to

clean that up, but sometimes

the idle in transaction connection makes the PG server entirely

unresponsive. I'm not getting

connection refused, nothing. All connections existing or new, JDBC or

psql, just hang. I've already

got full query logging on to try to catch the problem query or

connection so I can give the developers

somewhere to look to resolve their issue with the application, but

since queries are logged with

runtimes I'm assuming they are only logged after they are complete.

And since it's idle in

transaction it never completes so it never gets logged. Our

application is connecting as an

unprivileged user named rmstomcat, and the database is limited to 400

connections out of 512. I'm not

running out of connections as I've got reserved connections set, and

even connecting as user postgres

with psql the connection just hangs. The server doesn't appear to be

running out of memory when this

happens and nothing is printed in the log. The only thing that

resolves it is doing a kill on the PID

of any idle in transaction connections existing at the time causing

them to roll back. Then

everything else picks up right where it left off and works again.

Can anyone give me any hints about why PG becomes unresponsive? Or

how to fix it so it doesn't?

My server is 9.1.2 right now. I will be upgrading to the latest 9.1

series soon, but until 9.2 can be

run through our development/testing cycle I can't upgrade to 9.2.

That will take about 6-10 months.

Yes, see if upgrading to 9.1 makes the problem disappear.

It is surprising that you cannot even start new connections.

You could try to "strace" the postmaster during a connection attempt
and see what happens. Maybe that helps to spot the place where
things go wrong.

Yours,
Laurenz Albe

[Scot Kreienkamp]
I'm willing to give it a try, but I've never done it before. What do I need to do?

This message is intended only for the individual or entity to which it is addressed. It may contain privileged, confidential information which is exempt from disclosure under applicable laws. If you are not the intended recipient, please note that you are strictly prohibited from disseminating or distributing this information (other than to the intended recipient) or copying this information. If you have received this communication in error, please notify us immediately by e-mail or by telephone at the above number. Thank you.

#9Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Scot Kreienkamp (#8)
Re: idle in transaction query makes server unresponsive

Scot Kreienkamp wrote:

You could try to "strace" the postmaster during a connection attempt
and see what happens. Maybe that helps to spot the place where
things go wrong.

[Scot Kreienkamp]
I'm willing to give it a try, but I've never done it before. What do

I need to do?

"man strace" is your friend. Assuming you are on Linux.

Find out the process ID of the postmaster process
(the parent of the other PostgreSQL processes),
then try

strace -f -o /some/large/directory/tracefile -p process_id

The connect to PostgreSQL and see what gets logged.

Press Ctrl+C to stop strace.

Yours,
Laurenz Albe

#10Chris Travers
chris.travers@gmail.com
In reply to: Laurenz Albe (#9)
Re: idle in transaction query makes server unresponsive

If it is truly idle in a transaction, maybe it has locks that are holding
up other transactions?

Locks are usually held until commit time, except advisory locks iirc but
those have to be explicitly checked, so if you don't know if you are using
them you probably aren't.

Long-running transactions are generally a problem particularly when you
have idle blocks of space between them. Try shortening your transactions
and see if that helps.

Best Wishes,
Chris Travers

#11Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Chris Travers (#10)
Re: idle in transaction query makes server unresponsive

Chris Travers wrote:

If it is truly idle in a transaction, maybe it has locks that are

holding up other transactions?

Locks are usually held until commit time, except advisory locks iirc

but those have to be explicitly

checked, so if you don't know if you are using them you probably

aren't.

Yes, but locks that block a new connection from succeeding?

Yours,
Laurenz Albe