Backend with closed connection at 99% CPU

Started by Guy Thornley over 21 years ago, 2 messages, bugs
#1 Guy Thornley
guy@esphion.com

First, I better let you know that we have an in-house patch already on our
postgres, so this may be our breakage. It only started happening recently,
though, and our patch is quite old, so it is very unlikely.

I thought I'd ask here anyway, in case this was a known bug that was fixed
already. I couldn't see anything in the release notes, however.

Postgres 7.4.1. (Yes I know, we _should_ upgrade).

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
27583 postgres  15   0  163m 163m 159m R 97.2 16.2 14:36.01 postmaster

As the subject says, it is spinning at 99% CPU. Memory consumption does not
appear to be increasing.

This backend has recently lost its client connection, and it appeared after
I shut down a bunch of JDBC connections:

uke19:~# netstat -np | grep 27583
tcp 1 0 127.0.0.1:5432 127.0.0.1:35175 CLOSE_WAIT 27583/postgres

This happened 3 times last week, too. Never 'idle in transaction' until last
night, however, when it managed to lose its connection while 'idle in
transaction'. This left some things locked, and I had to kill it.

I only noticed this problem after messing with some settings in the
configuration file:

max_connections = 88
superuser_reserved_connections = 4
wal_buffers = 544

Around the same time, I changed the Java code to close down the database
connections properly, calling conn.close() on pg connections. We are using
'postgresql-jdbc3.jar' that is in the 'libpgjava' Debian package.

You can even have a backtrace, how about that:

(gdb) attach 27583
Attaching to process 27583
0x0811cf40 in enlargeStringInfo ()

(gdb) bt
#0 0x0811cf40 in enlargeStringInfo ()
#1 0x081249b8 in pq_getmessage ()
#2 0x0817bdfe in HandleFunctionRequest ()
#3 0x0817bfda in HandleFunctionRequest ()
#4 0x0817eacc in PostgresMain ()
#5 0x0815877b in ClosePostmasterPorts ()
#6 0x08158163 in ClosePostmasterPorts ()
#7 0x08156658 in PostmasterMain ()
#8 0x08155ce4 in PostmasterMain ()
#9 0x08125cb6 in main ()
#10 0x4026eda6 in __libc_start_main () from /lib/libc.so.6

I can get a stack data dump from gdb if requested, and I'll leave this
attached to gdb for now. I'll probably need to restart postgres soon (to try
some more settings) so I don't want to leave it attached _too_ long.. ;)

.Guy

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Guy Thornley (#1)
Re: Backend with closed connection at 99% CPU

Guy Thornley <guy@esphion.com> writes:

> Postgres 7.4.1. (Yes I know, we _should_ upgrade).

Yup.

> As the subject says, it is spinning at 99% CPU. ...
> This backend has recently lost its client connection, ...
> You can even have a backtrace, how about that:
>
> (gdb) bt
> #0 0x0811cf40 in enlargeStringInfo ()
> #1 0x081249b8 in pq_getmessage ()
> #2 0x0817bdfe in HandleFunctionRequest ()
> #3 0x0817bfda in HandleFunctionRequest ()
> #4 0x0817eacc in PostgresMain ()

I'm betting this is this bug:

2004-05-11 16:07 tgl

* src/backend/lib/stringinfo.c (REL7_4_STABLE): Add tests to
enlargeStringInfo() to avoid possible buffer-overrun or
infinite-loop problems if a bogus data length is passed.

Somehow the dying client injected a few bogus bytes into the
communication channel, and managed to trigger the infinite-loop
variant of this bug.

regards, tom lane
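
Illustrative note on the "infinite-loop variant" mentioned in #2: a buffer
that grows by repeated doubling can spin forever if the requested length is
bogus, because the doubled size may wrap around and stop making progress.
The sketch below is not the PostgreSQL source. The function name, the
SKETCH_MAX_ALLOC limit, and the use of size_t are all assumptions made for
illustration. It only shows the general kind of up-front test the commit
message describes: reject an impossible length before entering the loop, and
clamp the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap on a single allocation; the value is an assumption. */
#define SKETCH_MAX_ALLOC ((size_t) 0x3fffffff)

/*
 * Compute a capacity able to hold `len` existing bytes, `needed` new bytes,
 * and one terminator byte, starting from the current capacity `cap`.
 * Returns 0 and stores the result in *new_cap, or -1 if the request is bogus.
 */
int
sketch_grow_capacity(size_t cap, size_t len, size_t needed, size_t *new_cap)
{
    /* Invariants this sketch assumes about the existing buffer. */
    assert(cap > 0 && len < cap && cap <= SKETCH_MAX_ALLOC);

    /*
     * Reject lengths that can never fit BEFORE doing arithmetic on them.
     * Without this test, a huge `needed` could make the loop below wrap
     * around and never terminate.
     */
    if (needed >= SKETCH_MAX_ALLOC - len)
        return -1;

    size_t total = needed + len + 1;    /* cannot overflow after the test */

    if (total <= cap)
    {
        *new_cap = cap;                 /* already large enough */
        return 0;
    }

    /* Double until large enough; total <= SKETCH_MAX_ALLOC bounds the loop. */
    size_t n = cap;
    while (n < total)
        n *= 2;

    /* Clamp so the final size never exceeds the allocation limit. */
    if (n > SKETCH_MAX_ALLOC)
        n = SKETCH_MAX_ALLOC;

    *new_cap = n;
    return 0;
}
```

With a 1024-byte buffer holding 1000 bytes, asking for 100 more yields a
capacity of 2048, while a garbage length such as (size_t) -1 is refused
immediately instead of being fed into the doubling loop.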