Pgsql taking a *lot* of CPU time (unkillable).

Started by Berteun Damman over 21 years ago, 5 messages, general
#1Berteun Damman
berteun@gmail.com

Hello,

I'm currently running PostgreSQL 7.4.6 under NetBSD 2.0 (Release), but
with a custom kernel. I can start it, and it performs normally, i.e. I
can access my databases and such. Now I'm primarily using it with the
GnuCash PostgreSQL backend.

After I've finished using it and left it to itself for a while, it
starts to consume all CPU time for no apparent reason (it is not
doing anything).

I started it thusly:
/usr/pkg/bin/postmaster -i -D /usr/pkg/pgsql/data/

And the following output appeared:

LOG: database system was interrupted at 2005-01-14 13:52:42 CET
LOG: checkpoint record is at 0/2329AA0
LOG: redo record is at 0/2329AA0; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 46264; next OID: 142900
LOG: database system was not properly shut down; automatic recovery in progress
LOG: record with zero length at 0/2329AE0
LOG: redo is not required
LOG: database system is ready

ps auxww | grep pgsql shows:
pgsql 15786 94.8 0.3 4380 568 p2 R+ 4:13PM 5:13.13 /usr/pkg/bin/postmaster -i -D /usr/pkg/pgsql/data/ (postgres)
pgsql 24309 0.0 0.0 5368 4 p2 IW+ 4:13PM 0:00.01 postmaster: stats buffer process (postgres)
pgsql 25177 0.0 0.0 4420 4 p2 IW+ 4:13PM 0:00.01 postmaster: stats collector process (postgres)
pgsql 29008 0.0 0.0 0 0 p2 ZW+ - 0:00.00 (postgres)

Top gives:
15786 pgsql 64 0 4380K 568K RUN 5:56 93.80% 93.80% postgres

At that point the program does not respond to kill or to Ctrl+C on the
command line; I have to kill it with kill -9.

I've tried running it with a higher debug level, but this does not give
any useful information. While performing queries with GnuCash, a lot of
query information is shown, of course, but after quitting GnuCash and
leaving the postmaster to itself, only a sequence of the following is
shown:

DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: child process (PID 24738) exited with exit code 0

With varying PIDs. After a while this stops, and everything hangs as
described above. The only other thing worth noting is that it does not
seem to happen when running with -d 5 (but I'm not really sure).

As said, I'm running NetBSD 2.0, with my own kernel, on the i386
platform. I hope this gives someone enough information to make a guess
about the cause, although I realise the problem is quite vague.

Berteun

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Berteun Damman (#1)
Re: Pgsql taking a *lot* of CPU time (unkillable).

Berteun Damman <berteun@gmail.com> writes:

After I've finished using it and left it to itself for a while, it
starts to consume all CPU time for no apparent reason (it is not
doing anything).

Would you attach to the process with a debugger and get a stack trace?

$ gdb /usr/pkg/bin/postgres PID-of-process
gdb> bt
gdb> q

Probably should repeat this a few times to get a clear sense of where
it's looping.
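
The repeated sampling suggested above could be scripted roughly as follows. This is only a sketch: it assumes a gdb recent enough to support the -batch and -ex options (with older builds the commands would have to be typed interactively as shown), and the function name sample_backtraces, its arguments, and the GDB override variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: take several backtraces of one running process.
# GDB can be overridden in the environment; it defaults to gdb on PATH.
GDB=${GDB:-gdb}

# Usage: sample_backtraces BINARY PID [COUNT] [DELAY_SECONDS]
# (hypothetical helper, not part of PostgreSQL or gdb)
sample_backtraces() {
    binary=$1
    pid=$2
    count=${3:-5}
    delay=${4:-2}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i of $count ==="
        # Attach to PID, run "bt", then exit; -batch makes gdb quit
        # once the commands given with -ex have run.
        "$GDB" -batch -ex bt "$binary" "$pid" 2>&1
        i=$((i + 1))
        # Pause between samples, but not after the last one.
        if [ "$i" -le "$count" ]; then
            sleep "$delay"
        fi
    done
}
```

For example, `sample_backtraces /usr/pkg/bin/postgres 15786 12 5` would take twelve samples, five seconds apart, from the spinning process shown earlier. Frames that recur across samples point at the loop.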

regards, tom lane

#3Berteun Damman
berteun@gmail.com
In reply to: Tom Lane (#2)
Re: Pgsql taking a *lot* of CPU time (unkillable).

On Sat, 15 Jan 2005 13:15:36 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Would you attach to the process with a debugger and get a stack trace?

$ gdb /usr/pkg/bin/postgres PID-of-process
gdb> bt
gdb> q

Probably should repeat this a few times to get a clear sense of where
it's looping.

I think it has a locking problem:
#0 0x483bbb2e in pthread__lock_ras_end () from /usr/lib/libpthread.so.0
Error accessing memory address 0x483bbb26: Operation not permitted.

And the other time:
#0 0x483bbb31 in pthread__lock_ras_end () from /usr/lib/libpthread.so.0
And again the same memory access error.

Does this indicate an error in NetBSD's pthreading library?

Berteun

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Berteun Damman (#3)
Re: Pgsql taking a *lot* of CPU time (unkillable).

Berteun Damman <berteun@gmail.com> writes:

On Sat, 15 Jan 2005 13:15:36 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Would you attach to the process with a debugger and get a stack trace?

I think it has a locking problem:
#0 0x483bbb2e in pthread__lock_ras_end () from /usr/lib/libpthread.so.0
Error accessing memory address 0x483bbb26: Operation not permitted.

And the other time:
#0 0x483bbb31 in pthread__lock_ras_end () from /usr/lib/libpthread.so.0
And again the same memory access error.

Does this indicate an error in NetBSD's pthreading library?

Not necessarily --- it just means that gdb is confused and can't find
the stacked return addresses :-(. One thing to check is whether you
have the most up-to-date available version of gdb. Also, I'd suggest
trying it a dozen or two times in hopes of catching it when it's not
inside libpthread.

Another trick I've sometimes had success with is to kill the process in
such a way that it produces a core dump (kill -ABRT should do this),
and then gdb the core dump file instead of the live process. gdb seems
to handle that a bit differently and sometimes you can get a stack trace
one way when you couldn't get it the other way.
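
A rough sketch of that core-dump route is below. Several things are assumed rather than taken from this thread: the core size limit (ulimit -c) must not have been zero in the environment that started the postmaster, the caller has to supply the path where the core file will appear because that varies by system, gdb must support -batch and -ex, and the helper name backtrace_from_core is hypothetical. Note that this terminates the process.

```shell
#!/bin/sh
# Sketch: force a core dump with SIGABRT, then backtrace the core file.
# GDB can be overridden in the environment; it defaults to gdb on PATH.
GDB=${GDB:-gdb}

# Usage: backtrace_from_core BINARY PID CORE_FILE
# (hypothetical helper; CORE_FILE is where the caller expects the dump)
backtrace_from_core() {
    binary=$1
    pid=$2
    core=$3
    # The default action for SIGABRT is to terminate and dump core.
    kill -ABRT "$pid" || return 1
    # Give the kernel up to about ten seconds to write the core file.
    tries=0
    while [ ! -s "$core" ] && [ "$tries" -lt 10 ]; do
        sleep 1
        tries=$((tries + 1))
    done
    if [ ! -s "$core" ]; then
        echo "no core file found at $core" >&2
        return 1
    fi
    # Open the core (not the live process) and print the backtrace.
    "$GDB" -batch -ex bt "$binary" "$core"
}
```

On NetBSD the core file is usually named after the program (something like postgres.core) and written to the process's working directory, but that is an assumption worth checking locally before relying on it.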

If none of that works, I'd suggest asking for help from the NetBSD
hackers; they may know some special way of finding out the call stack.
But we aren't going to be able to get far if we can't figure out what
it's doing.

regards, tom lane

#5Berteun Damman
berteun@gmail.com
In reply to: Berteun Damman (#1)
Re: Pgsql taking a *lot* of CPU time (unkillable).

On Sat, 15 Jan 2005 16:25:34 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:

You don't need to reproduce the bug from scratch each time. What I
meant was, once it seems to be spinning, repeatedly attach to it with
gdb and see if you can get a backtrace. If not, just quit gdb and try
again.

Oh, I was unclear there. The problem is that the process apparently
gets killed by gdb; at any rate, it no longer runs after I've attached
gdb.

I'll continue this on the NetBSD mailing list.

Berteun