cvs tip - stats buffer process consuming 100% cpu
I just noticed that the stats buffer process is consuming 100% cpu as
soon as a backend is started, and continues after that backend is ended:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15150 postgres  25   0 27004  948  508 S 99.9  0.0   0:30.97 postmaster

# ps -ef |grep 15150
postgres 15150 15143 78 11:29 pts/3 00:00:38 postgres: stats buffer process
postgres 15151 15150  0 11:29 pts/3 00:00:00 postgres: stats collector process
(gdb) bt
#0 0x000000383b8c2633 in __select_nocancel () from /lib64/libc.so.6
#1 0x000000000055e896 in PgstatBufferMain (argc=Variable "argc" is not available.
) at pgstat.c:1921
#2 0x000000000055f73b in pgstat_start () at pgstat.c:614
#3 0x0000000000562fda in reaper (postgres_signal_arg=Variable "postgres_signal_arg" is not available.
) at postmaster.c:2175
#4 <signal handler called>
#5 0x000000383b8c2633 in __select_nocancel () from /lib64/libc.so.6
#6 0x0000000000560d0f in ServerLoop () at postmaster.c:1180
#7 0x0000000000562443 in PostmasterMain (argc=7, argv=0x88df20) at postmaster.c:943
#8 0x00000000005217fe in main (argc=7, argv=0x88df20) at main.c:263
I noticed a recent discussion on the stats collector -- is this related
to a recent change?
Joe
Interesting. Here is the patch I just applied:
The only guess I have is that select() is modifying the timeout
structure on return, but I didn't think it did that, does it?
Googling shows Linux does modify the structure (see bottom):
so I will fix the code accordingly. Patch attached and applied.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Attachments:
/bjm/diff (text/plain, +9 -8)
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> The only guess I have is that select() is modifying the timeout
> structure on return, but I didn't think it did that, does it?
You shouldn't assume so; I think it does on some platforms. The Single
Unix Spec says
    On successful completion, the object pointed to by the timeout
    argument may be modified.
regards, tom lane
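
The portability point raised in this thread can be illustrated with a small sketch. The applied patch is not shown above, so this is not the actual pgstat.c change; the function name wait_rounds and its parameters are hypothetical. The assumption is a loop that uses select() purely as a timed wait. Because select() may modify the timeout (Linux leaves the unslept time in it, which is zero after a full timeout), the struct timeval is set again before every call. If it were filled in only once before the loop, later calls could return immediately and the loop would spin, which would match the 100% CPU symptom reported here.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper, not taken from pgstat.c: call select() `rounds`
 * times with no descriptors, waiting up to timeout_ms milliseconds each
 * time.  Returns how many of those calls ended by timing out.
 */
int
wait_rounds(int rounds, long timeout_ms)
{
    int timed_out = 0;
    int i;

    for (i = 0; i < rounds; i++)
    {
        struct timeval tv;

        /*
         * Re-initialize the timeout on every iteration.  The Single Unix
         * Spec allows select() to modify it, so the value left over from
         * the previous call must not be reused.
         */
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        /* No descriptors are watched, so a return of 0 means timeout. */
        if (select(0, NULL, NULL, NULL, &tv) == 0)
            timed_out++;
    }

    return timed_out;
}
```

Calling wait_rounds(3, 20) should block for roughly 60 ms in total on any platform, whether or not its select() rewrites the timeout.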