cvs tip - stats buffer process consuming 100% cpu

Started by Joe Conway, about 20 years ago, 3 messages
#1 Joe Conway
mail@joeconway.com

I just noticed that the stats buffer process is consuming 100% cpu as
soon as a backend is started, and continues after that backend is ended:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15150 postgres 25 0 27004 948 508 S 99.9 0.0 0:30.97 postmaster

# ps -ef |grep 15150
postgres 15150 15143 78 11:29 pts/3 00:00:38 postgres: stats buffer process
postgres 15151 15150 0 11:29 pts/3 00:00:00 postgres: stats collector process

(gdb) bt
#0 0x000000383b8c2633 in __select_nocancel () from /lib64/libc.so.6
#1 0x000000000055e896 in PgstatBufferMain (argc=Variable "argc" is not available.
) at pgstat.c:1921
#2 0x000000000055f73b in pgstat_start () at pgstat.c:614
#3 0x0000000000562fda in reaper (postgres_signal_arg=Variable "postgres_signal_arg" is not available.
) at postmaster.c:2175
#4 <signal handler called>
#5 0x000000383b8c2633 in __select_nocancel () from /lib64/libc.so.6
#6 0x0000000000560d0f in ServerLoop () at postmaster.c:1180
#7 0x0000000000562443 in PostmasterMain (argc=7, argv=0x88df20) at postmaster.c:943
#8 0x00000000005217fe in main (argc=7, argv=0x88df20) at main.c:263

I noticed a recent discussion on the stats collector -- is this related
to a recent change?

Joe

#2 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Joe Conway (#1)
1 attachment(s)
Re: cvs tip - stats buffer process consuming 100% cpu

Interesting. Here is the patch I just applied:

http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/postmaster/pgstat.c.diff?r1=1.116&r2=1.117

The only guess I have is that select() is modifying the timeout
structure on return, but I didn't think it did that, does it?

Googling shows Linux does modify the structure (see bottom):

http://groups.google.com/group/comp.unix.programmer/browse_frm/thread/a53c7c4a71cb48e5/5f0bbcc9fe0230a2?lnk=st&q=select+timeout+modify&rnum=9#5f0bbcc9fe0230a2

so I will fix the code accordingly. Patch attached and applied.
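
For readers following along, the idea behind the fix can be sketched outside of pgstat.c. If select() counts the timeout down in place, a value set once before the loop eventually reaches zero, and every later call returns immediately, which would explain the busy loop. The helper below is a minimal illustration, not the applied patch: the name wait_readable and its return convention are hypothetical. It rebuilds the timeout (and the fd set) before every call.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper, not part of pgstat.c: wait until fd is readable
 * or timeout_sec seconds have passed.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
int
wait_readable(int fd, long timeout_sec)
{
	for (;;)
	{
		fd_set		rfds;
		struct timeval timeout;
		int			rc;

		/*
		 * Rebuild the fd set and the timeout before every call, because
		 * select() is allowed to overwrite both.
		 */
		FD_ZERO(&rfds);
		FD_SET(fd, &rfds);
		timeout.tv_sec = timeout_sec;
		timeout.tv_usec = 0;

		rc = select(fd + 1, &rfds, NULL, NULL, &timeout);
		if (rc < 0 && errno == EINTR)
			continue;			/* interrupted by a signal: retry */
		if (rc < 0)
			return -1;			/* real error */
		return rc > 0 ? 1 : 0;
	}
}
```

Note that this sketch restarts the full timeout after EINTR, so the total wait can exceed timeout_sec when signals arrive. That is acceptable for a "wake up every so often" loop, but not for a hard deadline.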

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Attachments:

/bjm/diff (text/plain)
Index: src/backend/postmaster/pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.117
diff -c -c -r1.117 pgstat.c
*** src/backend/postmaster/pgstat.c	3 Jan 2006 16:42:17 -0000	1.117
--- src/backend/postmaster/pgstat.c	3 Jan 2006 19:52:14 -0000
***************
*** 1871,1884 ****
  	msgbuffer = (char *) palloc(PGSTAT_RECVBUFFERSZ);
  
  	/*
- 	 * Wait for some work to do; but not for more than 10 seconds. (This
- 	 * determines how quickly we will shut down after an ungraceful
- 	 * postmaster termination; so it needn't be very fast.)
- 	 */
- 	timeout.tv_sec = 10;
- 	timeout.tv_usec = 0;
- 
- 	/*
  	 * Loop forever
  	 */
  	for (;;)
--- 1871,1876 ----
***************
*** 1918,1923 ****
--- 1910,1924 ----
  				maxfd = writePipe;
  		}
  
+ 		/*
+ 		 * Wait for some work to do; but not for more than 10 seconds. (This
+ 		 * determines how quickly we will shut down after an ungraceful
+ 		 * postmaster termination; so it needn't be very fast.)  struct timeout
+ 		 * is modified by some operating systems.
+ 		 */
+ 		timeout.tv_sec = 10;
+ 		timeout.tv_usec = 0;
+ 
  		if (select(maxfd + 1, &rfds, &wfds, NULL, &timeout) < 0)
  		{
  			if (errno == EINTR)

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#2)
Re: cvs tip - stats buffer process consuming 100% cpu

Bruce Momjian <pgman@candle.pha.pa.us> writes:

The only guess I have is that select() is modifying the timeout
structure on return, but I didn't think it did that, does it?

You shouldn't assume so; I think it does on some platforms. The Single
Unix Spec says

On successful completion, the object pointed to by the timeout
argument may be modified.

regards, tom lane
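
Since the spec only says the timeout "may be modified", behavior differs between platforms. A small probe like the one below can show what a given system does. This is a sketch under assumptions: the function name select_modifies_timeout is made up for illustration, and the result describes only the machine it runs on. Linux is documented to update the structure to the time not slept; other systems may leave it untouched.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical probe: returns 1 if select() changed the timeout struct
 * after timing out, 0 if it left the struct alone, -1 if select() did
 * not time out cleanly (for example, it was interrupted by a signal).
 */
int
select_modifies_timeout(void)
{
	struct timeval timeout;
	int			rc;

	timeout.tv_sec = 0;
	timeout.tv_usec = 20000;	/* 20 ms */

	/* With no descriptors to watch, select() just sleeps until timeout */
	rc = select(0, NULL, NULL, NULL, &timeout);
	if (rc != 0)
		return -1;

	/* Compare against the values we stored before the call */
	return (timeout.tv_sec != 0 || timeout.tv_usec != 20000) ? 1 : 0;
}
```

Portable code should not branch on the result; resetting the timeout before each call is correct in both cases.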