cvs tip - stats buffer process consuming 100% cpu
I just noticed that the stats buffer process is consuming 100% cpu as
soon as a backend is started, and continues after that backend is ended:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15150 postgres  25   0 27004  948  508 S 99.9  0.0   0:30.97 postmaster

# ps -ef |grep 15150
postgres 15150 15143 78 11:29 pts/3 00:00:38 postgres: stats buffer process
postgres 15151 15150  0 11:29 pts/3 00:00:00 postgres: stats collector process
(gdb) bt
#0 0x000000383b8c2633 in __select_nocancel () from /lib64/libc.so.6
#1 0x000000000055e896 in PgstatBufferMain (argc=Variable "argc" is not available.
) at pgstat.c:1921
#2 0x000000000055f73b in pgstat_start () at pgstat.c:614
#3 0x0000000000562fda in reaper (postgres_signal_arg=Variable "postgres_signal_arg" is not available.
) at postmaster.c:2175
#4 <signal handler called>
#5 0x000000383b8c2633 in __select_nocancel () from /lib64/libc.so.6
#6 0x0000000000560d0f in ServerLoop () at postmaster.c:1180
#7 0x0000000000562443 in PostmasterMain (argc=7, argv=0x88df20) at postmaster.c:943
#8 0x00000000005217fe in main (argc=7, argv=0x88df20) at main.c:263
I noticed a recent discussion on the stats collector -- is this related
to a recent change?
Joe
Interesting. Here is the patch I just applied:
The only guess I have is that select() is modifying the timeout
structure on return, but I didn't think it did that, does it?
Googling shows Linux does modify the structure (see bottom):
so I will fix the code accordingly. Patch attached and applied.
---------------------------------------------------------------------------
Joe Conway wrote:
I just noticed that the stats buffer process is consuming 100% cpu as
soon as a backend is started, and continues after that backend is ended:
[...]
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Attachment: /bjm/diff (text/plain)
Index: src/backend/postmaster/pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.117
diff -c -c -r1.117 pgstat.c
*** src/backend/postmaster/pgstat.c 3 Jan 2006 16:42:17 -0000 1.117
--- src/backend/postmaster/pgstat.c 3 Jan 2006 19:52:14 -0000
***************
*** 1871,1884 ****
      msgbuffer = (char *) palloc(PGSTAT_RECVBUFFERSZ);

      /*
-      * Wait for some work to do; but not for more than 10 seconds. (This
-      * determines how quickly we will shut down after an ungraceful
-      * postmaster termination; so it needn't be very fast.)
-      */
-     timeout.tv_sec = 10;
-     timeout.tv_usec = 0;
-
-     /*
       * Loop forever
       */
      for (;;)
--- 1871,1876 ----
***************
*** 1918,1923 ****
--- 1910,1924 ----
              maxfd = writePipe;
          }

+         /*
+          * Wait for some work to do; but not for more than 10 seconds. (This
+          * determines how quickly we will shut down after an ungraceful
+          * postmaster termination; so it needn't be very fast.)  struct timeout
+          * is modified by some operating systems.
+          */
+         timeout.tv_sec = 10;
+         timeout.tv_usec = 0;
+
          if (select(maxfd + 1, &rfds, &wfds, NULL, &timeout) < 0)
          {
              if (errno == EINTR)
Bruce Momjian <pgman@candle.pha.pa.us> writes:
The only guess I have is that select() is modifying the timeout
structure on return, but I didn't think it did that, does it?
You shouldn't assume so; I think it does on some platforms. The Single
Unix Spec says
On successful completion, the object pointed to by the timeout
argument may be modified.
regards, tom lane