autovacuum launcher eating too much CPU

Started by Alvaro Herrera over 18 years ago (9 messages)
#1 Alvaro Herrera
alvherre@commandprompt.com
1 attachment(s)

Hi,

Darcy Buskermolen noticed that when one has many databases, the autovac
launcher starts eating too much CPU.

I tried it here with 200 databases, and it does indeed seem to eat its
share, even with the default naptime, which I wouldn't have thought was
a problem (although it does make the launcher wake up about three times
a second).
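For a rough sense of where "about three times a second" comes from, here is a minimal C sketch. It assumes that the launcher spreads its per-database checks evenly across one naptime period and that the default naptime is 60 seconds. The helper names are hypothetical and are not autovacuum.c functions. The sketch also ignores any minimum sleep time the real launcher may enforce.

```c
#include <assert.h>

/*
 * Hypothetical helper (not in autovacuum.c): interval in milliseconds
 * between launcher wakeups, assuming ndatabases databases are spread
 * evenly across one naptime period.  Any minimum sleep time is ignored.
 */
static double
launcher_wake_interval_ms(int naptime_s, int ndatabases)
{
	assert(naptime_s > 0 && ndatabases > 0);
	return 1000.0 * naptime_s / ndatabases;
}

/* Wakeups per second implied by that interval. */
static double
launcher_wakeups_per_sec(int naptime_s, int ndatabases)
{
	return 1000.0 / launcher_wake_interval_ms(naptime_s, ndatabases);
}
```

With 200 databases and a 60 s naptime, this gives a 300 ms interval, or about 3.3 wakeups per second. The thread later mentions 300 databases with naptime=10s; under the same assumption, and before any clamping, that setup works out to roughly 33 ms, or about 30 wakeups per second.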

I'm looking at a profile and I can't seem to make much sense out of it.
It seems to me like the problem is not autovac itself, but rather the
pgstat code that reads the stat file from disk. Of course, autovac does
need to read the file fairly regularly.

Here are the top lines of the gprof output.

Comments? Is there something here that needs fixing?

--
Alvaro Herrera http://www.advogato.org/person/alvherre
"Research is what I'm doing when I don't know what I'm doing"
(Wernher von Braun)

Attachments:

profile.out (text/plain; charset=us-ascii)
#2 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Alvaro Herrera (#1)
1 attachment(s)
Re: autovacuum launcher eating too much CPU

Alvaro Herrera wrote:

> Darcy Buskermolen noticed that when one has many databases, the autovac
> launcher starts eating too much CPU.
>
> I tried it here with 200 databases, and it does indeed seem to eat its
> share, even with the default naptime, which I wouldn't have thought was
> a problem (although it does make the launcher wake up about three times
> a second).

This patch does not solve the whole problem, but it alleviates it a bit
by throttling pgstat reads. One problem with it is that it lengthens the
window mentioned in this FIXME:

/*
* Check whether pgstat data still says we need to vacuum this table.
* It could have changed if something else processed the table while we
* weren't looking.
*
* FIXME we ignore the possibility that the table was finished being
* vacuumed in the last 500ms (PGSTAT_STAT_INTERVAL). This is a bug.
*/
MemoryContextSwitchTo(AutovacMemCxt);
tab = table_recheck_autovac(relid);

which could be a problem in itself, by causing unnecessary vacuums.

Opinions?

--
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"In Europe they call me Niklaus Wirth; in the US they call me Nickel's worth.
That's because in Europe they call me by name, and in the US by value!"

Attachments:

autovac-throttle-pgstatread.patch (text/x-diff; charset=us-ascii)
Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.58
diff -c -p -r1.58 autovacuum.c
*** src/backend/postmaster/autovacuum.c	12 Sep 2007 22:14:59 -0000	1.58
--- src/backend/postmaster/autovacuum.c	13 Sep 2007 16:57:59 -0000
*************** static void avl_sighup_handler(SIGNAL_AR
*** 291,296 ****
--- 291,297 ----
  static void avl_sigusr1_handler(SIGNAL_ARGS);
  static void avl_sigterm_handler(SIGNAL_ARGS);
  static void avl_quickdie(SIGNAL_ARGS);
+ static void autovac_refresh_stats(void);
  
  
  
*************** AutoVacLauncherMain(int argc, char *argv
*** 488,494 ****
  		DatabaseListCxt = NULL;
  		DatabaseList = NULL;
  
! 		/* Make sure pgstat also considers our stat data as gone */
  		pgstat_clear_snapshot();
  
  		/* Now we can allow interrupts again */
--- 489,498 ----
  		DatabaseListCxt = NULL;
  		DatabaseList = NULL;
  
! 		/*
! 		 * Make sure pgstat also considers our stat data as gone.  Note: we
! 		 * mustn't use autovac_refresh_stats here.
! 		 */
  		pgstat_clear_snapshot();
  
  		/* Now we can allow interrupts again */
*************** rebuild_database_list(Oid newdb)
*** 836,842 ****
  	HTAB	   *dbhash;
  
  	/* use fresh stats */
! 	pgstat_clear_snapshot();
  
  	newcxt = AllocSetContextCreate(AutovacMemCxt,
  								   "AV dblist",
--- 840,846 ----
  	HTAB	   *dbhash;
  
  	/* use fresh stats */
! 	autovac_refresh_stats();
  
  	newcxt = AllocSetContextCreate(AutovacMemCxt,
  								   "AV dblist",
*************** do_start_worker(void)
*** 1063,1069 ****
  	oldcxt = MemoryContextSwitchTo(tmpcxt);
  
  	/* use fresh stats */
! 	pgstat_clear_snapshot();
  
  	/* Get a list of databases */
  	dblist = get_database_list();
--- 1067,1073 ----
  	oldcxt = MemoryContextSwitchTo(tmpcxt);
  
  	/* use fresh stats */
! 	autovac_refresh_stats();
  
  	/* Get a list of databases */
  	dblist = get_database_list();
*************** table_recheck_autovac(Oid relid)
*** 2258,2264 ****
  	PgStat_StatDBEntry *dbentry;
  
  	/* use fresh stats */
! 	pgstat_clear_snapshot();
  
  	shared = pgstat_fetch_stat_dbentry(InvalidOid);
  	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
--- 2262,2268 ----
  	PgStat_StatDBEntry *dbentry;
  
  	/* use fresh stats */
! 	autovac_refresh_stats();
  
  	shared = pgstat_fetch_stat_dbentry(InvalidOid);
  	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
*************** AutoVacuumShmemInit(void)
*** 2734,2736 ****
--- 2738,2759 ----
  	else
  		Assert(found);
  }
+ 
+ /*
+  * Refresh pgstats data in an autovacuum process, at most every 500 ms.  This
+  * is to avoid rereading the pgstats files too many times in quick succession.
+  */
+ static void
+ autovac_refresh_stats(void)
+ {
+ 	static TimestampTz last_read = 0;
+ 	TimestampTz current_time;
+ 
+ 	current_time = GetCurrentTimestamp();
+ 
+ 	if (!TimestampDifferenceExceeds(last_read, current_time, 500))
+ 		return;
+ 
+ 	pgstat_clear_snapshot();
+ 	last_read = current_time;
+ }
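A small simulation shows the effect of the throttle in autovac_refresh_stats() in isolation. This is a sketch, not the patch. It replaces TimestampTz and GetCurrentTimestamp() with a fake millisecond clock, and the names `fake_ms`, `refresh_stats_throttled`, `simulate`, and the read counter are hypothetical. Like the patch, it drops the snapshot only once the elapsed time reaches the interval.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fake clock in milliseconds, standing in for TimestampTz. */
typedef int64_t fake_ms;

static int	stats_file_reads = 0;	/* times the snapshot was dropped */
static fake_ms last_read = 0;
static bool have_read = false;

/*
 * Same shape as autovac_refresh_stats() in the patch: drop the cached
 * snapshot only if at least interval_ms have elapsed since the previous
 * drop.  Each drop is counted as one stats file reread, since the next
 * pgstat fetch would have to reread the file.
 */
static void
refresh_stats_throttled(fake_ms now, fake_ms interval_ms)
{
	if (have_read && now - last_read < interval_ms)
		return;					/* keep using the possibly stale snapshot */

	stats_file_reads++;			/* stands in for pgstat_clear_snapshot() */
	last_read = now;
	have_read = true;
}

/* Call the refresher every step_ms for duration_ms; return reread count. */
static int
simulate(fake_ms step_ms, fake_ms duration_ms, fake_ms interval_ms)
{
	stats_file_reads = 0;
	last_read = 0;
	have_read = false;
	for (fake_ms t = 0; t < duration_ms; t += step_ms)
		refresh_stats_throttled(t, interval_ms);
	return stats_file_reads;
}
```

Suppose the refresher is called every 100 ms for 2 seconds. With a 500 ms interval, this causes 4 rereads instead of the 20 you get with no throttling (interval 0). A caller at t=400 ms still sees the snapshot taken at t=0. That staleness is the cost described above for table_recheck_autovac().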
#3 Darcy Buskermolen
darcyb@commandprompt.com
In reply to: Alvaro Herrera (#1)
Re: autovacuum launcher eating too much CPU

On Thursday 13 September 2007 09:16:52 Alvaro Herrera wrote:

> Hi,
>
> Darcy Buskermolen noticed that when one has many databases, the autovac
> launcher starts eating too much CPU.

Don't forget the memory leak as well. After 3 or 4 days of running, I end up
with a 2GB+ AVL (autovacuum launcher).

--

Darcy Buskermolen
The PostgreSQL company, Command Prompt Inc.
http://www.commandprompt.com/

#4 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Darcy Buskermolen (#3)
Re: autovacuum launcher eating too much CPU

Darcy Buskermolen wrote:

> On Thursday 13 September 2007 09:16:52 Alvaro Herrera wrote:
>
> > Hi,
> >
> > Darcy Buskermolen noticed that when one has many databases, the autovac
> > launcher starts eating too much CPU.
>
> Don't forget the memory leak as well. After 3 or 4 days of running, I end up
> with a 2GB+ AVL (autovacuum launcher).

Huh, sorry for not letting you know: I already fixed that :-) (Please
grab the latest CVS HEAD and confirm.)

--
Alvaro Herrera http://www.advogato.org/person/alvherre
"Saddened, Wutra (song by Las Barreras)
sends Freyr rolling
and casts us into the sea"

#5 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Alvaro Herrera (#4)
1 attachment(s)
Re: autovacuum launcher eating too much CPU

Alvaro Herrera wrote:

> Darcy Buskermolen wrote:
>
> > On Thursday 13 September 2007 09:16:52 Alvaro Herrera wrote:
> >
> > > Hi,
> > >
> > > Darcy Buskermolen noticed that when one has many databases, the autovac
> > > launcher starts eating too much CPU.
> >
> > Don't forget the memory leak as well. After 3 or 4 days of running, I end up
> > with a 2GB+ AVL (autovacuum launcher).
>
> Huh, sorry for not letting you know: I already fixed that :-) (Please
> grab the latest CVS HEAD and confirm.)

Darcy, please also apply the following patch and see if it reduces the
CPU consumption to a reasonable level.

What this patch does is keep the pgstats data for 1 second in the
autovac launcher. The idea is to avoid reading the data too frequently.
I coded it so that it doesn't affect the worker, because it would make
the table recheck code less effective.

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"MySQL is a toy compared to PostgreSQL." (Randal L. Schwartz)
(http://archives.postgresql.org/pgsql-general/2005-07/msg00517.php)

Attachments:

autovac-throttle-pgstatread.patch (text/x-diff; charset=us-ascii)
Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.58
diff -c -p -r1.58 autovacuum.c
*** src/backend/postmaster/autovacuum.c	12 Sep 2007 22:14:59 -0000	1.58
--- src/backend/postmaster/autovacuum.c	13 Sep 2007 22:13:29 -0000
*************** int			autovacuum_vac_cost_limit;
*** 116,121 ****
--- 116,124 ----
  
  int			Log_autovacuum = -1;
  
+ /* how long to keep pgstat data in the launcher, in milliseconds */
+ #define AUTOVAC_STATS_CACHE 1000
+ 
  
  /* Flags to tell if we are in an autovacuum process */
  static bool am_autovacuum_launcher = false;
*************** static void avl_sighup_handler(SIGNAL_AR
*** 291,296 ****
--- 294,300 ----
  static void avl_sigusr1_handler(SIGNAL_ARGS);
  static void avl_sigterm_handler(SIGNAL_ARGS);
  static void avl_quickdie(SIGNAL_ARGS);
+ static void autovac_refresh_stats(void);
  
  
  
*************** AutoVacLauncherMain(int argc, char *argv
*** 488,494 ****
  		DatabaseListCxt = NULL;
  		DatabaseList = NULL;
  
! 		/* Make sure pgstat also considers our stat data as gone */
  		pgstat_clear_snapshot();
  
  		/* Now we can allow interrupts again */
--- 492,501 ----
  		DatabaseListCxt = NULL;
  		DatabaseList = NULL;
  
! 		/*
! 		 * Make sure pgstat also considers our stat data as gone.  Note: we
! 		 * mustn't use autovac_refresh_stats here.
! 		 */
  		pgstat_clear_snapshot();
  
  		/* Now we can allow interrupts again */
*************** rebuild_database_list(Oid newdb)
*** 836,842 ****
  	HTAB	   *dbhash;
  
  	/* use fresh stats */
! 	pgstat_clear_snapshot();
  
  	newcxt = AllocSetContextCreate(AutovacMemCxt,
  								   "AV dblist",
--- 843,849 ----
  	HTAB	   *dbhash;
  
  	/* use fresh stats */
! 	autovac_refresh_stats();
  
  	newcxt = AllocSetContextCreate(AutovacMemCxt,
  								   "AV dblist",
*************** do_start_worker(void)
*** 1063,1069 ****
  	oldcxt = MemoryContextSwitchTo(tmpcxt);
  
  	/* use fresh stats */
! 	pgstat_clear_snapshot();
  
  	/* Get a list of databases */
  	dblist = get_database_list();
--- 1070,1076 ----
  	oldcxt = MemoryContextSwitchTo(tmpcxt);
  
  	/* use fresh stats */
! 	autovac_refresh_stats();
  
  	/* Get a list of databases */
  	dblist = get_database_list();
*************** do_start_worker(void)
*** 1106,1114 ****
  		avw_dbase  *tmp = lfirst(cell);
  		Dlelem	   *elem;
  
- 		/* Find pgstat entry if any */
- 		tmp->adw_entry = pgstat_fetch_stat_dbentry(tmp->adw_datid);
- 
  		/* Check to see if this one is at risk of wraparound */
  		if (TransactionIdPrecedes(tmp->adw_frozenxid, xidForceLimit))
  		{
--- 1113,1118 ----
*************** do_start_worker(void)
*** 1121,1129 ****
  		else if (for_xid_wrap)
  			continue;			/* ignore not-at-risk DBs */
  
  		/*
! 		 * Otherwise, skip a database with no pgstat entry; it means it
! 		 * hasn't seen any activity.
  		 */
  		if (!tmp->adw_entry)
  			continue;
--- 1125,1136 ----
  		else if (for_xid_wrap)
  			continue;			/* ignore not-at-risk DBs */
  
+ 		/* Find pgstat entry if any */
+ 		tmp->adw_entry = pgstat_fetch_stat_dbentry(tmp->adw_datid);
+ 
  		/*
! 		 * Skip a database with no pgstat entry; it means it hasn't seen any
! 		 * activity.
  		 */
  		if (!tmp->adw_entry)
  			continue;
*************** table_recheck_autovac(Oid relid)
*** 2258,2264 ****
  	PgStat_StatDBEntry *dbentry;
  
  	/* use fresh stats */
! 	pgstat_clear_snapshot();
  
  	shared = pgstat_fetch_stat_dbentry(InvalidOid);
  	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
--- 2265,2271 ----
  	PgStat_StatDBEntry *dbentry;
  
  	/* use fresh stats */
! 	autovac_refresh_stats();
  
  	shared = pgstat_fetch_stat_dbentry(InvalidOid);
  	dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
*************** AutoVacuumShmemInit(void)
*** 2734,2736 ****
--- 2741,2769 ----
  	else
  		Assert(found);
  }
+ 
+ /*
+  * Refresh pgstats data in the autovacuum launcher process, at most once every
+  * AUTOVAC_STATS_CACHE ms.  This is to avoid rereading the pgstats files too
+  * many times in quick succession.  Note: we don't do this in the worker, as
+  * it would be counterproductive.
+  */
+ static void
+ autovac_refresh_stats(void)
+ {
+ 	if (IsAutoVacuumLauncherProcess())
+ 	{
+ 		static TimestampTz	last_read = 0;
+ 		TimestampTz			current_time;
+ 
+ 		current_time = GetCurrentTimestamp();
+ 
+ 		if (!TimestampDifferenceExceeds(last_read, current_time,
+ 										AUTOVAC_STATS_CACHE))
+ 			return;
+ 
+ 		last_read = current_time;
+ 	}
+ 
+ 	pgstat_clear_snapshot();
+ }
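The second version gates the throttle on the process type. Only the launcher caches the snapshot, for AUTOVAC_STATS_CACHE (1000) ms, while a worker still clears it on every call. Below is a minimal sketch of that behaviour, again using a fake millisecond clock. `am_launcher` is a hypothetical stand-in for IsAutoVacuumLauncherProcess(), and `refresh_stats` and `count_clears` are illustrative names only.

```c
#include <stdbool.h>
#include <stdint.h>

#define AUTOVAC_STATS_CACHE 1000	/* ms, as in the second patch */

typedef int64_t fake_ms;		/* fake clock standing in for TimestampTz */

static bool am_launcher = false;	/* stand-in for IsAutoVacuumLauncherProcess() */
static fake_ms last_read = 0;
static bool have_read = false;
static int	snapshot_clears = 0;

/* Throttle only in the launcher; a worker always clears its snapshot. */
static void
refresh_stats(fake_ms now)
{
	if (am_launcher)
	{
		if (have_read && now - last_read < AUTOVAC_STATS_CACHE)
			return;				/* launcher keeps its cached snapshot */
		last_read = now;
		have_read = true;
	}
	snapshot_clears++;			/* stands in for pgstat_clear_snapshot() */
}

/* Run one simulated process type; return how many snapshots were dropped. */
static int
count_clears(bool launcher, fake_ms step_ms, fake_ms duration_ms)
{
	am_launcher = launcher;
	last_read = 0;
	have_read = false;
	snapshot_clears = 0;
	for (fake_ms t = 0; t < duration_ms; t += step_ms)
		refresh_stats(t);
	return snapshot_clears;
}
```

Over 2 seconds of calls every 100 ms, a worker clears its snapshot 20 times and the launcher only twice. This reflects the trade-off described in the message: the launcher reads the stats file far less often, while the worker's table recheck stays as fresh as before.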
#6 Darcy Buskermolen
darcyb@commandprompt.com
In reply to: Alvaro Herrera (#5)
Re: autovacuum launcher eating too much CPU

On September 14, 2007 06:36 am, Alvaro Herrera wrote:

> Alvaro Herrera wrote:
>
> > Darcy Buskermolen wrote:
> >
> > > On Thursday 13 September 2007 09:16:52 Alvaro Herrera wrote:
> > >
> > > > Hi,
> > > >
> > > > Darcy Buskermolen noticed that when one has many databases, the
> > > > autovac launcher starts eating too much CPU.
> > >
> > > Don't forget the memory leak as well. After 3 or 4 days of running, I
> > > end up with a 2GB+ AVL (autovacuum launcher).
> >
> > Huh, sorry for not letting you know: I already fixed that :-) (Please
> > grab the latest CVS HEAD and confirm.)

OK, that looks much better: after running it for about 8 hours, I'm not
seeing any of the previous memory footprint growth.

> Darcy, please also apply the following patch and see if it reduces the
> CPU consumption to a reasonable level.

This is looking much better now too; it has brought the AVL down to near 0%
CPU usage.

--
Darcy Buskermolen
Command Prompt, Inc.
+1.503.667.4564 X 102
http://www.commandprompt.com/
PostgreSQL solutions since 1997

#7 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Darcy Buskermolen (#6)
Re: autovacuum launcher eating too much CPU

Darcy Buskermolen wrote:

> On September 14, 2007 06:36 am, Alvaro Herrera wrote:
>
> > Darcy, please also apply the following patch and see if it reduces the
> > CPU consumption to a reasonable level.
>
> This is looking much better now too; it has brought the AVL down to near 0%
> CPU usage.

Thanks, applied. I still feel CPU usage is somewhat excessive, but I
don't think there's much to be done about it. Maybe I'm just testing
with too many databases.

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"Take out the book your religion considers the right one for finding the
prayer that will bring peace to your soul. Then reboot the computer
and see if it works" (Carlos Duclós)

#8 Darcy Buskermolen
darcyb@commandprompt.com
In reply to: Alvaro Herrera (#7)
Re: autovacuum launcher eating too much CPU

On September 23, 2007 09:12 pm, Alvaro Herrera wrote:

> Darcy Buskermolen wrote:
>
> > On September 14, 2007 06:36 am, Alvaro Herrera wrote:
> >
> > > Darcy, please also apply the following patch and see if it reduces the
> > > CPU consumption to a reasonable level.
> >
> > This is looking much better now too; it has brought the AVL down to near
> > 0% CPU usage.
>
> Thanks, applied. I still feel CPU usage is somewhat excessive, but I
> don't think there's much to be done about it. Maybe I'm just testing
> with too many databases.

My findings were against 83 DBs.

--
Darcy Buskermolen
Command Prompt, Inc.
+1.503.667.4564 X 102
http://www.commandprompt.com/
PostgreSQL solutions since 1997

#9 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Darcy Buskermolen (#8)
Re: autovacuum launcher eating too much CPU

Darcy Buskermolen wrote:

> On September 23, 2007 09:12 pm, Alvaro Herrera wrote:
>
> > Darcy Buskermolen wrote:
> >
> > > On September 14, 2007 06:36 am, Alvaro Herrera wrote:
> > >
> > > > Darcy, please also apply the following patch and see if it reduces the
> > > > CPU consumption to a reasonable level.
> > >
> > > This is looking much better now too; it has brought the AVL down to
> > > near 0% CPU usage.
> >
> > Thanks, applied. I still feel CPU usage is somewhat excessive, but I
> > don't think there's much to be done about it. Maybe I'm just testing
> > with too many databases.
>
> My findings were against 83 DBs.

I was testing with 300 databases and naptime=10s.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.