Stats processor not restarting

Started by Magnus Haganderalmost 19 years ago9 messages
#1Magnus Hagander
magnus@hagander.net

I've noticed that if for example the autovacuum process dies (such as
with a kill -9 when testing my new shared mem implementation), only
autovac and bgwriter are restarted. The stats collector is terminated,
but not restarted. (Same goes for a regular backend, and not just
autovac)

Is there a reason for this, or is it a bug?

//Magnus

#2Alvaro Herrera
alvherre@commandprompt.com
In reply to: Magnus Hagander (#1)
Re: Stats processor not restarting

Magnus Hagander wrote:

I've noticed that if for example the autovacuum process dies (such as
with a kill -9 when testing my new shared mem implementation), only
autovac and bgwriter are restarted. The stats collector is terminated,
but not restarted. (Same goes for a regular backend, and not just
autovac)

Is there a reason for this, or is it a bug?

I would say it is a bug, because the comments and code in ServerLoop()
and reaper() say different.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#3Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Herrera (#2)
Re: Stats processor not restarting

On Tue, Mar 20, 2007 at 08:48:30AM -0400, Alvaro Herrera wrote:

Magnus Hagander wrote:

I've noticed that if for example the autovacuum process dies (such as
with a kill -9 when testing my new shared mem implementation), only
autovac and bgwriter are restarted. The stats collector is terminated,
but not restarted. (Same goes for a regular backend, and not just
autovac)

Is there a reason for this, or is it a bug?

I would say it is a bug, because the comments and code in ServerLoop()
and reaper() say different.

Bah, sorry about the noise. It was the effect of
PGSTAT_RESTART_INTERVAL.

Do we want to add some logging when we don't restart it due to repeated
failures?

//Magnus

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#2)
Re: Stats processor not restarting

Alvaro Herrera <alvherre@commandprompt.com> writes:

Magnus Hagander wrote:

I've noticed that if for example the autovacuum process dies (such as
with a kill -9 when testing my new shared mem implementation), only
autovac and bgwriter are restarted. The stats collector is terminated,
but not restarted. (Same goes for a regular backend, and not just
autovac)

Is there a reason for this, or is it a bug?

I would say it is a bug, because the comments and code in ServerLoop()
and reaper() say different.

There is code in pgstat_start that limits the frequency with which new
stats collectors can be spawned --- maybe your test case is hitting that?

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#3)
Re: Stats processor not restarting

Magnus Hagander <magnus@hagander.net> writes:

Bah, sorry about the noise. It was the effect of
PGSTAT_RESTART_INTERVAL.
Do we want to add some logging when we don't restart it due to repeated
failures?

Not really, but maybe it would be sensible to reset last_pgstat_start_time
when doing a database-wide restart? The motivation for the timeout was
to reduce cycle wastage if pgstat crashed by itself, but when you've
deliberately SIGQUITed it, that hardly seems to apply ...

regards, tom lane

#6Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#5)
1 attachment(s)
Re: [HACKERS] Stats processor not restarting

Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

Bah, sorry about the noise. It was the effect of
PGSTAT_RESTART_INTERVAL.
Do we want to add some logging when we don't restart it due to repeated
failures?

Not really, but maybe it would be sensible to reset last_pgstat_start_time
when doing a database-wide restart? The motivation for the timeout was
to reduce cycle wastage if pgstat crashed by itself, but when you've
deliberately SIGQUITed it, that hardly seems to apply ...

You mean like this, attached?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachments:

/pgpatches/stattext/plainDownload
Index: src/backend/postmaster/pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.149
diff -c -c -r1.149 pgstat.c
*** src/backend/postmaster/pgstat.c	16 Mar 2007 17:57:36 -0000	1.149
--- src/backend/postmaster/pgstat.c	22 Mar 2007 02:03:31 -0000
***************
*** 1929,1934 ****
--- 1929,1937 ----
  static void
  pgstat_exit(SIGNAL_ARGS)
  {
+ 	/* allow stats to restart immediately after SIGQUIT */
+ 	last_pgstat_start_time = 0;
+ 
  	need_exit = true;
  }
  
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#6)
Re: [HACKERS] Stats processor not restarting

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

Not really, but maybe it would be sensible to reset last_pgstat_start_time
when doing a database-wide restart?

You mean like this, attached?

I'd be fairly surprised if resetting the variable in the child process
had any effect on the postmaster...

I think a correct fix would involve exposing either the variable or a
routine to zero it from pgstat.c, and having postmaster.c do that at
the points where it intentionally SIGQUITs the stats collector.

regards, tom lane

#8Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#7)
1 attachment(s)
Re: [HACKERS] Stats processor not restarting

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

Not really, but maybe it would be sensible to reset last_pgstat_start_time
when doing a database-wide restart?

You mean like this, attached?

I'd be fairly surprised if resetting the variable in the child process
had any effect on the postmaster...

Yep, me too. ;-)

I think a correct fix would involve exposing either the variable or a
routine to zero it from pgstat.c, and having postmaster.c do that at
the points where it intentionally SIGQUITs the stats collector.

OK, patch attached. I just reset the value in all places where we were
killing the pgstat process and not immediately exiting ourselves.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachments:

/pgpatches/stattext/plainDownload
Index: src/backend/postmaster/pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.149
diff -c -c -r1.149 pgstat.c
*** src/backend/postmaster/pgstat.c	16 Mar 2007 17:57:36 -0000	1.149
--- src/backend/postmaster/pgstat.c	22 Mar 2007 14:40:36 -0000
***************
*** 572,577 ****
--- 572,581 ----
  	return 0;
  }
  
+ void allow_immediate_pgstat_restart(void)
+ {
+ 		last_pgstat_start_time = 0;
+ }
  
  /* ------------------------------------------------------------
   * Public functions used by backends follow
Index: src/backend/postmaster/postmaster.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/postmaster.c,v
retrieving revision 1.526
diff -c -c -r1.526 postmaster.c
*** src/backend/postmaster/postmaster.c	7 Mar 2007 13:35:02 -0000	1.526
--- src/backend/postmaster/postmaster.c	22 Mar 2007 14:40:37 -0000
***************
*** 1896,1902 ****
--- 1896,1905 ----
  				signal_child(PgArchPID, SIGQUIT);
  			/* Tell pgstat to shut down too; nothing left for it to do */
  			if (PgStatPID != 0)
+ 			{
  				signal_child(PgStatPID, SIGQUIT);
+ 				allow_immediate_pgstat_restart();
+ 			}
  			/* Tell autovac launcher to shut down too */
  			if (AutoVacPID != 0)
  				signal_child(AutoVacPID, SIGTERM);
***************
*** 1952,1958 ****
--- 1955,1964 ----
  				signal_child(PgArchPID, SIGQUIT);
  			/* Tell pgstat to shut down too; nothing left for it to do */
  			if (PgStatPID != 0)
+ 			{
  				signal_child(PgStatPID, SIGQUIT);
+ 				allow_immediate_pgstat_restart();
+ 			}
  			/* Tell autovac launcher to shut down too */
  			if (AutoVacPID != 0)
  				signal_child(AutoVacPID, SIGTERM);
***************
*** 2241,2247 ****
--- 2247,2256 ----
  			signal_child(PgArchPID, SIGQUIT);
  		/* Tell pgstat to shut down too; nothing left for it to do */
  		if (PgStatPID != 0)
+ 		{
  			signal_child(PgStatPID, SIGQUIT);
+ 			allow_immediate_pgstat_restart();
+ 		}
  		/* Tell autovac launcher to shut down too */
  		if (AutoVacPID != 0)
  			signal_child(AutoVacPID, SIGTERM);
***************
*** 2404,2409 ****
--- 2413,2419 ----
  								 "SIGQUIT",
  								 (int) PgStatPID)));
  		signal_child(PgStatPID, SIGQUIT);
+ 		allow_immediate_pgstat_restart();
  	}
  
  	/* We do NOT restart the syslogger */
Index: src/include/pgstat.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/pgstat.h,v
retrieving revision 1.55
diff -c -c -r1.55 pgstat.h
*** src/include/pgstat.h	16 Mar 2007 17:57:36 -0000	1.55
--- src/include/pgstat.h	22 Mar 2007 14:40:38 -0000
***************
*** 369,375 ****
  extern void pgstat_init(void);
  extern int	pgstat_start(void);
  extern void pgstat_reset_all(void);
! 
  #ifdef EXEC_BACKEND
  extern void PgstatCollectorMain(int argc, char *argv[]);
  #endif
--- 369,375 ----
  extern void pgstat_init(void);
  extern int	pgstat_start(void);
  extern void pgstat_reset_all(void);
! extern void allow_immediate_pgstat_restart(void);
  #ifdef EXEC_BACKEND
  extern void PgstatCollectorMain(int argc, char *argv[]);
  #endif
#9Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#8)
Re: [HACKERS] Stats processor not restarting

Applied. I did the case where we exit right away too, just for
consistency.

---------------------------------------------------------------------------

Bruce Momjian wrote:

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

Not really, but maybe it would be sensible to reset last_pgstat_start_time
when doing a database-wide restart?

You mean like this, attached?

I'd be fairly surprised if resetting the variable in the child process
had any effect on the postmaster...

Yep, me too. ;-)

I think a correct fix would involve exposing either the variable or a
routine to zero it from pgstat.c, and having postmaster.c do that at
the points where it intentionally SIGQUITs the stats collector.

OK, patch attached. I just reset the value in all places where we were
killing the pgstat process and not immediately exiting ourselves.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Index: src/backend/postmaster/pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.149
diff -c -c -r1.149 pgstat.c
*** src/backend/postmaster/pgstat.c	16 Mar 2007 17:57:36 -0000	1.149
--- src/backend/postmaster/pgstat.c	22 Mar 2007 14:40:36 -0000
***************
*** 572,577 ****
--- 572,581 ----
return 0;
}
+ void allow_immediate_pgstat_restart(void)
+ {
+ 		last_pgstat_start_time = 0;
+ }
/* ------------------------------------------------------------
* Public functions used by backends follow
Index: src/backend/postmaster/postmaster.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/postmaster.c,v
retrieving revision 1.526
diff -c -c -r1.526 postmaster.c
*** src/backend/postmaster/postmaster.c	7 Mar 2007 13:35:02 -0000	1.526
--- src/backend/postmaster/postmaster.c	22 Mar 2007 14:40:37 -0000
***************
*** 1896,1902 ****
--- 1896,1905 ----
signal_child(PgArchPID, SIGQUIT);
/* Tell pgstat to shut down too; nothing left for it to do */
if (PgStatPID != 0)
+ 			{
signal_child(PgStatPID, SIGQUIT);
+ 				allow_immediate_pgstat_restart();
+ 			}
/* Tell autovac launcher to shut down too */
if (AutoVacPID != 0)
signal_child(AutoVacPID, SIGTERM);
***************
*** 1952,1958 ****
--- 1955,1964 ----
signal_child(PgArchPID, SIGQUIT);
/* Tell pgstat to shut down too; nothing left for it to do */
if (PgStatPID != 0)
+ 			{
signal_child(PgStatPID, SIGQUIT);
+ 				allow_immediate_pgstat_restart();
+ 			}
/* Tell autovac launcher to shut down too */
if (AutoVacPID != 0)
signal_child(AutoVacPID, SIGTERM);
***************
*** 2241,2247 ****
--- 2247,2256 ----
signal_child(PgArchPID, SIGQUIT);
/* Tell pgstat to shut down too; nothing left for it to do */
if (PgStatPID != 0)
+ 		{
signal_child(PgStatPID, SIGQUIT);
+ 			allow_immediate_pgstat_restart();
+ 		}
/* Tell autovac launcher to shut down too */
if (AutoVacPID != 0)
signal_child(AutoVacPID, SIGTERM);
***************
*** 2404,2409 ****
--- 2413,2419 ----
"SIGQUIT",
(int) PgStatPID)));
signal_child(PgStatPID, SIGQUIT);
+ 		allow_immediate_pgstat_restart();
}
/* We do NOT restart the syslogger */
Index: src/include/pgstat.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/pgstat.h,v
retrieving revision 1.55
diff -c -c -r1.55 pgstat.h
*** src/include/pgstat.h	16 Mar 2007 17:57:36 -0000	1.55
--- src/include/pgstat.h	22 Mar 2007 14:40:38 -0000
***************
*** 369,375 ****
extern void pgstat_init(void);
extern int	pgstat_start(void);
extern void pgstat_reset_all(void);
! 
#ifdef EXEC_BACKEND
extern void PgstatCollectorMain(int argc, char *argv[]);
#endif
--- 369,375 ----
extern void pgstat_init(void);
extern int	pgstat_start(void);
extern void pgstat_reset_all(void);
! extern void allow_immediate_pgstat_restart(void);
#ifdef EXEC_BACKEND
extern void PgstatCollectorMain(int argc, char *argv[]);
#endif

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +