autovacuum scheduling starvation and frenzy
In testing 9.4 with some long-running tests, I noticed that the autovacuum
launcher/workers sometimes go a bit nuts, vacuuming the same database
repeatedly with no respect for the nap time.
As far as I can tell, the behavior is the same in older versions, but I
haven't tested that.
This is my understanding of what is happening:
If you have a database with a large table in it that has just passed
autovacuum_freeze_max_age, all future workers will be funnelled into that
database until the wrap-around completes. But only one of those workers
can actually vacuum the one table which is holding back the frozenxid.
Maybe the 2nd worker to come along will find other useful work to do, but
eventually all the vacuuming that needs doing is already in progress, and
so each worker starts up, gets directed to this database, finds it can't
help, and exits. So all other databases are entirely starved of
autovacuuming for the entire duration of the wrap-around vacuuming of this
one large table.
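To make the funneling concrete, here is a toy, self-contained sketch of
the selection rule as I understand it (modeled loosely on the foreach
loop in do_start_worker(); the types and names are simplified stand-ins,
not the real autovacuum.c code):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-ins: each database has the age of its datfrozenxid and the
 * time its next vacuum is scheduled. */
typedef struct
{
    const char *name;
    uint32_t    frozenxid_age;  /* transactions since datfrozenxid */
    double      next_worker;    /* scheduled time of next vacuum */
} db_t;

/* Mimics the selection rule: any database past freeze_max_age wins
 * outright, and once one such database is seen, all not-at-risk
 * databases are ignored entirely -- hence the funneling. */
static db_t *
choose_db(db_t *dbs, int n, uint32_t freeze_max_age, double now)
{
    db_t   *best = NULL;
    bool    for_xid_wrap = false;

    for (int i = 0; i < n; i++)
    {
        if (dbs[i].frozenxid_age > freeze_max_age)
        {
            if (best == NULL || dbs[i].frozenxid_age > best->frozenxid_age)
                best = &dbs[i];
            for_xid_wrap = true;
            continue;
        }
        if (for_xid_wrap)
            continue;           /* not-at-risk DBs are skipped */
        if (dbs[i].next_worker <= now &&
            (best == NULL || dbs[i].next_worker < best->next_worker))
            best = &dbs[i];     /* otherwise the most overdue DB wins */
    }
    return best;
}

int
main(void)
{
    db_t dbs[] = {
        {"bigdb",  250000000, 0.0},  /* past the default 200M threshold */
        {"other",    1000000, 0.0},  /* overdue, but starved anyway */
    };

    /* Every launch picks "bigdb", no matter how overdue "other" is. */
    for (int i = 0; i < 3; i++)
        printf("worker %d -> %s\n", i,
               choose_db(dbs, 2, 200000000, 10.0)->name);
    return 0;
}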
Also, the launcher decides when to launch the next worker by looking at the
scheduled time of the least-recently-vacuumed database (with the implicit
intention that that is the one that will get chosen to vacuum next). But
since the worker gets redirected to the wrap-around database instead of the
least-recently-vacuumed database, the least-recently-vacuumed database
never gets its schedule updated and always looks like it is chronologically
overdue. That means the launcher keeps launching new workers as fast as
the previous ones exit, ignoring the nap time. So there is one long running
worker actually making progress, plus a frenzy of workers all attacking the
same database, finding that there is nothing they can do.
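My (possibly wrong) reading of the pacing side is roughly the following
fragment; the real logic is in launcher_determine_sleep() and is more
involved, so treat the names and the clamping here as illustrative only:

/* Illustrative fragment: the launcher sleeps until the next_worker time
 * of the database at the head of its schedule list. If that database
 * never actually gets vacuumed (workers are redirected elsewhere), its
 * next_worker time never advances, the difference goes negative, and
 * the sleep stays pinned at the minimum -- the "frenzy". */
static double
launcher_sleep(double head_next_worker, double now, double min_sleep)
{
    double nap = head_next_worker - now;    /* negative once overdue */

    return (nap > min_sleep) ? nap : min_sleep;
}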
I think that a database past autovacuum_freeze_max_age should get
first priority, but only if its next scheduled vacuum time is in the past.
If it can beneficially use more than one vacuum worker, they would usually
accumulate there naturally within a few naptime iterations[1]. And if it
can't usefully use more than one worker, don't prevent other databases from
using them.
[1]: You could argue that all other max_workers processes could become
pinned down in long-running vacuums of other nonrisk databases between the
time that the database crosses autovacuum_freeze_max_age (and has its first
worker started) and the time its nap time expires and so it becomes
eligible for a second one. But that seems like a weak argument, as it
could just as easily have happened that all of them got pinned down in
nonrisk databases a few transactions *before* the database crossed
autovacuum_freeze_max_age in the first place.
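In code terms, the rule I'm proposing might look something like this (a
sketch only; give_first_priority is a made-up name, and the schedule
check stands in for the adl_next_worker comparison):

#include <stdbool.h>

/* Sketch of the proposed rule: a wraparound-endangered database gets
 * first priority only when its own schedule also says it is due. */
static bool
give_first_priority(bool past_freeze_max_age,
                    double next_worker, double now)
{
    return past_freeze_max_age && next_worker <= now;
}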
Does this analysis and proposal seem sound?
Cheers,
Jeff
Jeff Janes wrote:
If you have a database with a large table in it that has just passed
autovacuum_freeze_max_age, all future workers will be funnelled into that
database until the wrap-around completes. But only one of those workers
can actually vacuum the one table which is holding back the frozenxid.
Maybe the 2nd worker to come along will find other useful work to do, but
eventually all the vacuuming that needs doing is already in progress, and
so each worker starts up, gets directed to this database, finds it can't
help, and exits. So all other databases are entirely starved of
autovacuuming for the entire duration of the wrap-around vacuuming of this
one large table.
Bah. Of course :-(
Note that if you have two databases in danger of wraparound, the oldest
will always be chosen until it's no longer in danger. Ignoring the
second one past freeze_max_age seems bad also.
This code is in autovacuum.c, do_start_worker(). Not sure what your
proposal looks like in terms of code. I think that instead of
trying to get a single target database in that foreach loop, we could
try to build a prioritized list (in-wraparound-danger first, then
in-multixid-wraparound danger, then the one with the oldest autovac time
of all the ones that remain); then recheck the wrap-around condition by
seeing whether there are other workers in that database that started
after the wraparound condition appeared. If there are, move down the
list. The first in the list not skipped is chosen for vacuuming.
(Do we need to consider the situation that all databases were skipped by
the above logic, and if so then perhaps pick up the first DB in the
list?)
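Something like this toy comparator, perhaps (all names here are
invented; the real list would be built from the avw_dbase entries):

#include <stdlib.h>

/* Toy priority: xid-wraparound danger outranks multixact danger, which
 * outranks plain staleness; among equals, the oldest autovac time sorts
 * first. The launcher would walk the sorted list and skip entries whose
 * wraparound work is already covered by a running worker. */
typedef struct
{
    unsigned int datid;
    int          xid_danger;        /* 1 if past autovacuum_freeze_max_age */
    int          multi_danger;      /* 1 if past the multixact limit */
    double       last_autovac_time; /* smaller = older = more urgent */
} cand_t;

static int
cmp_priority(const void *a, const void *b)
{
    const cand_t *x = a;
    const cand_t *y = b;

    if (x->xid_danger != y->xid_danger)
        return y->xid_danger - x->xid_danger;
    if (x->multi_danger != y->multi_danger)
        return y->multi_danger - x->multi_danger;
    if (x->last_autovac_time != y->last_autovac_time)
        return (x->last_autovac_time < y->last_autovac_time) ? -1 : 1;
    return 0;
}

/* usage: qsort(cands, ncands, sizeof(cand_t), cmp_priority); */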
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, May 15, 2014 at 12:55 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Jeff Janes wrote:
If you have a database with a large table in it that has just passed
autovacuum_freeze_max_age, all future workers will be funnelled into that
database until the wrap-around completes. But only one of those workers
can actually vacuum the one table which is holding back the frozenxid.
Maybe the 2nd worker to come along will find other useful work to do, but
eventually all the vacuuming that needs doing is already in progress, and
so each worker starts up, gets directed to this database, finds it can't
help, and exits. So all other databases are entirely starved of
autovacuuming for the entire duration of the wrap-around vacuuming of this
one large table.
Bah. Of course :-(
Note that if you have two databases in danger of wraparound, the oldest
will always be chosen until it's no longer in danger. Ignoring the
second one past freeze_max_age seems bad also.
I'm not sure how bad that is. If you really do want to get the frozenxid
advanced as soon as possible, it makes sense to focus on one at a time,
rather than splitting the available IO throughput between two of them. So
I wouldn't go out of my way to enable two to run at the same time, nor go
out of my way to prevent it.
If most wrap around scans were done as part of a true emergency it would
make sense to forbid all other vacuums (but only if you also automatically
disabled autovacuum_vacuum_cost_delay as part of the emergency) so as not
to divide up the IO throughput. But most are not emergencies, as
200,000,000 is a long way from 2,000,000,000.
This code is in autovacuum.c, do_start_worker(). Not sure what your
proposal looks like in terms of code.
I wasn't sure either; I was mostly trying to analyze the situation. But I
decided just moving the "skipit" chunk of code to above the wrap-around
code might work for experimental purposes, as attached. It has been
running for a few hours that way and I no longer see the frenzies
occurring whenever pgbench_history gets vacuumed.
But I can't figure out why we sometimes use adl_next_worker and sometimes
use last_autovac_time, which makes me question how much I really understand
this code.
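For the archives, my understanding of the two structures involved (field
lists abridged from autovacuum.c, with stub typedefs so the sketch stands
alone; I may well be missing a subtlety):

typedef unsigned int Oid;
typedef long long TimestampTz;      /* stub; really int64 microseconds */
typedef unsigned int TransactionId;
typedef unsigned int MultiXactId;

/* Launcher-local schedule entry: adl_next_worker is when the launcher
 * *plans* to send a worker to this database next. */
typedef struct avl_dbase
{
    Oid         adl_datid;
    TimestampTz adl_next_worker;
    /* ... list links etc. ... */
} avl_dbase;

/* Snapshot rebuilt from pg_database plus pgstat each time a worker is
 * to be launched: the pgstat entry records when a worker last *actually*
 * ran (last_autovac_time), which is not the same thing as the planned
 * schedule above -- hence, I think, the two different fields. */
typedef struct avw_dbase
{
    Oid           adw_datid;
    TransactionId adw_frozenxid;
    MultiXactId   adw_minmulti;
    /* ... pointer to the pgstat entry with last_autovac_time ... */
} avw_dbase;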
I think that instead of
trying to get a single target database in that foreach loop, we could
try to build a prioritized list (in-wraparound-danger first, then
in-multixid-wraparound danger, then the one with the oldest autovac time
of all the ones that remain); then recheck the wrap-around condition by
seeing whether there are other workers in that database that started
after the wraparound condition appeared.
I think we would want to check for one worker that is still running, and at
least one other worker that started and completed since the wraparound
threshold was exceeded. If there are multiple tables in the database that
need full scanning, it would make sense to have multiple workers. But if a
worker already started and finished without increasing the frozenxid,
another attempt probably won't accomplish much either. But I have no idea
how to do that bookkeeping, or how much of an improvement it would be over
something simpler.
Cheers,
Jeff
Attachment: vac_wrap_move.patch
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
new file mode 100644
index b53cfdb..4b4faad
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
*************** do_start_worker(void)
*** 1169,1174 ****
--- 1169,1206 ----
avw_dbase *tmp = lfirst(cell);
dlist_iter iter;
+ /*
+ * Also, skip a database that appears on the database list as having
+ * been processed recently (less than autovacuum_naptime seconds ago).
+ * We do this so that we don't select a database which we just
+ * selected, but that pgstat hasn't gotten around to updating the last
+ * autovacuum time yet.
+ */
+ skipit = false;
+
+ dlist_reverse_foreach(iter, &DatabaseList)
+ {
+ avl_dbase *dbp = dlist_container(avl_dbase, adl_node, iter.cur);
+
+ if (dbp->adl_datid == tmp->adw_datid)
+ {
+ /*
+ * Skip this database if its next_worker value falls between
+ * the current time and the current time plus naptime.
+ */
+ if (!TimestampDifferenceExceeds(dbp->adl_next_worker,
+ current_time, 0) &&
+ !TimestampDifferenceExceeds(current_time,
+ dbp->adl_next_worker,
+ autovacuum_naptime * 1000))
+ skipit = true;
+
+ break;
+ }
+ }
+ if (skipit)
+ continue;
+
/* Check to see if this one is at risk of wraparound */
if (TransactionIdPrecedes(tmp->adw_frozenxid, xidForceLimit))
{
*************** do_start_worker(void)
*** 1203,1240 ****
continue;
/*
- * Also, skip a database that appears on the database list as having
- * been processed recently (less than autovacuum_naptime seconds ago).
- * We do this so that we don't select a database which we just
- * selected, but that pgstat hasn't gotten around to updating the last
- * autovacuum time yet.
- */
- skipit = false;
-
- dlist_reverse_foreach(iter, &DatabaseList)
- {
- avl_dbase *dbp = dlist_container(avl_dbase, adl_node, iter.cur);
-
- if (dbp->adl_datid == tmp->adw_datid)
- {
- /*
- * Skip this database if its next_worker value falls between
- * the current time and the current time plus naptime.
- */
- if (!TimestampDifferenceExceeds(dbp->adl_next_worker,
- current_time, 0) &&
- !TimestampDifferenceExceeds(current_time,
- dbp->adl_next_worker,
- autovacuum_naptime * 1000))
- skipit = true;
-
- break;
- }
- }
- if (skipit)
- continue;
-
- /*
* Remember the db with oldest autovac time. (If we are here, both
* tmp->entry and db->entry must be non-null.)
*/
--- 1235,1240 ----
On Thu, May 15, 2014 at 4:06 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
[...]
I didn't add this patch to the commitfest, because it was just a point
for discussion and not actually proposed for application. But it
doesn't seem to have provoked much discussion either.
Should I go add this to the next commitfest?
I do see it listed as a resolved item in
https://wiki.postgresql.org/wiki/PostgreSQL_9.4_Open_Items
But I can't find a commit that would resolve it, so does that mean the
resolution was that the behavior was not new in 9.4 and so didn't need
to be fixed for it?
Cheers,
Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
I didn't add this patch to the commitfest, because it was just a point
for discussion and not actually proposed for application. But it
doesn't seem to have provoked much discussion either.
Should I go add this to the next commitfest?
I do see it listed as a resolved item in
https://wiki.postgresql.org/wiki/PostgreSQL_9.4_Open_Items
But I can't find a commit that would resolve it, so does that mean the
resolution was that the behavior was not new in 9.4 and so didn't need
to be fixed for it?
It looks to me like Robert added that item to the "open items" page,
but he put it at the bottom --- ie in the "already resolved items"
list:
https://wiki.postgresql.org/index.php?title=PostgreSQL_9.4_Open_Items&diff=22417&oldid=22380
Probably this was a mistake and it should have gone into the still-to-do
list.
regards, tom lane
On Mon, Jun 23, 2014 at 7:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[...]
It looks to me like Robert added that item to the "open items" page,
but he put it at the bottom --- ie in the "already resolved items"
list:
https://wiki.postgresql.org/index.php?title=PostgreSQL_9.4_Open_Items&diff=22417&oldid=22380
Probably this was a mistake and it should have gone into the still-to-do
list.
Yeah. Oops.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Jeff Janes wrote:
I think that instead of
trying to get a single target database in that foreach loop, we could
try to build a prioritized list (in-wraparound-danger first, then
in-multixid-wraparound danger, then the one with the oldest autovac time
of all the ones that remain); then recheck the wrap-around condition by
seeing whether there are other workers in that database that started
after the wraparound condition appeared.
I think we would want to check for one worker that is still running, and at
least one other worker that started and completed since the wraparound
threshold was exceeded. If there are multiple tables in the database that
need full scanning, it would make sense to have multiple workers. But if a
worker already started and finished without increasing the frozenxid,
another attempt probably won't accomplish much either. But I have no idea
how to do that bookkeeping, or how much of an improvement it would be over
something simpler.
How about something like this:
* if autovacuum is disabled, then don't check these conditions; the only
reason we're in do_start_worker() in that case is that somebody
signalled postmaster that some database needs a for-wraparound emergency
vacuum.
* if autovacuum is on, and the database was processed less than
autovac_naptime/2 ago, and there is still a worker running in that
database now, then ignore the database.
Otherwise, consider it for xid-wraparound vacuuming. So if we launched
a worker recently, but it already finished, we would start another one.
(If the worker finished, the database should not be in need of a
for-wraparound vacuum again, so this seems sensible.) Also, a database
in danger becomes eligible again sooner than the full autovac_naptime
period (after half of it), though not immediately after the previous
worker started, which should give room for other databases to be
processed.
The attached patch implements that. I only tested it on HEAD, but
AFAICS it applies cleanly to 9.4 and 9.3; fairly sure it won't apply to
9.2. Given the lack of complaints, I'm unsure about backpatching
further back than 9.3 anyway.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment: vac_wrap.patch
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***************
*** 1069,1074 **** db_comparator(const void *a, const void *b)
--- 1069,1120 ----
}
/*
+ * Are there any running workers in the given database?
+ */
+ static bool
+ db_has_running_workers(avl_dbase *db)
+ {
+ bool hasworkers = false;
+ dlist_iter iter;
+
+ /* Allow for a NULL avl_dbase entry */
+ if (!db)
+ return false;
+
+ LWLockAcquire(AutovacuumLock, LW_SHARED);
+ dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
+ {
+ WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
+
+ if (worker->wi_dboid == db->adl_datid)
+ {
+ hasworkers = true;
+ break;
+ }
+ }
+ LWLockRelease(AutovacuumLock);
+
+ return hasworkers;
+ }
+
+ /*
+ * Was this database processed less than the given number of milliseconds ago?
+ */
+ static bool
+ db_was_recently_processed(avl_dbase *db, TimestampTz current_time, int ms)
+ {
+ /* Allow for a NULL avl_dbase entry */
+ if (!db)
+ return false;
+
+ if (!TimestampDifferenceExceeds(db->adl_next_worker, current_time, 0) &&
+ !TimestampDifferenceExceeds(current_time, db->adl_next_worker, ms))
+ return true;
+
+ return false;
+ }
+
+ /*
* do_start_worker
*
* Bare-bones procedure for starting an autovacuum worker from the launcher.
***************
*** 1090,1096 **** do_start_worker(void)
bool for_multi_wrap;
avw_dbase *avdb;
TimestampTz current_time;
! bool skipit = false;
Oid retval = InvalidOid;
MemoryContext tmpcxt,
oldcxt;
--- 1136,1142 ----
bool for_multi_wrap;
avw_dbase *avdb;
TimestampTz current_time;
! bool skipped = false;
Oid retval = InvalidOid;
MemoryContext tmpcxt,
oldcxt;
***************
*** 1118,1124 **** do_start_worker(void)
/* use fresh stats */
autovac_refresh_stats();
! /* Get a list of databases */
dblist = get_database_list();
/*
--- 1164,1170 ----
/* use fresh stats */
autovac_refresh_stats();
! /* Get a list of databases in pg_database */
dblist = get_database_list();
/*
***************
*** 1128,1135 **** do_start_worker(void)
*/
recentXid = ReadNewTransactionId();
xidForceLimit = recentXid - autovacuum_freeze_max_age;
! /* ensure it's a "normal" XID, else TransactionIdPrecedes misbehaves */
! /* this can cause the limit to go backwards by 3, but that's OK */
if (xidForceLimit < FirstNormalTransactionId)
xidForceLimit -= FirstNormalTransactionId;
--- 1174,1183 ----
*/
recentXid = ReadNewTransactionId();
xidForceLimit = recentXid - autovacuum_freeze_max_age;
! /*
! * Ensure it's a "normal" XID, else TransactionIdPrecedes misbehaves. This
! * can cause the limit to go backwards by 3, but that's OK.
! */
if (xidForceLimit < FirstNormalTransactionId)
xidForceLimit -= FirstNormalTransactionId;
***************
*** 1148,1153 **** do_start_worker(void)
--- 1196,1209 ----
* if any is in MultiXactId wraparound. Note that those in Xid wraparound
* danger are given more priority than those in multi wraparound danger.
*
+ * (However, we ignore databases in danger of Xid or multixact wraparound
+ * if a worker has recently been started in them and it is still working.
+ * The rationale for this is that other databases might need attention even
+ * if they are not in danger of wraparound, and starting another worker too
+ * soon after the first one would give no benefit. We disable this check
+ * when autovacuum is nominally disabled, though, because in that mode the
+ * only reason we're here is to process endangered databases.)
+ *
* Note that a database with no stats entry is not considered, except for
* Xid wraparound purposes. The theory is that if no one has ever
* connected to it since the stats were last initialized, it doesn't need
***************
*** 1167,1192 **** do_start_worker(void)
foreach(cell, dblist)
{
avw_dbase *tmp = lfirst(cell);
dlist_iter iter;
! /* Check to see if this one is at risk of wraparound */
! if (TransactionIdPrecedes(tmp->adw_frozenxid, xidForceLimit))
{
if (avdb == NULL ||
TransactionIdPrecedes(tmp->adw_frozenxid,
avdb->adw_frozenxid))
avdb = tmp;
! for_xid_wrap = true;
continue;
}
else if (for_xid_wrap)
continue; /* ignore not-at-risk DBs */
! else if (MultiXactIdPrecedes(tmp->adw_minmulti, multiForceLimit))
{
if (avdb == NULL ||
MultiXactIdPrecedes(tmp->adw_minmulti, avdb->adw_minmulti))
avdb = tmp;
! for_multi_wrap = true;
continue;
}
else if (for_multi_wrap)
--- 1223,1273 ----
foreach(cell, dblist)
{
avw_dbase *tmp = lfirst(cell);
+ avl_dbase *dbp = NULL;
dlist_iter iter;
! /* Find this database's entry in the launcher's list, if any */
! dlist_reverse_foreach(iter, &DatabaseList)
! {
! avl_dbase *db = dlist_container(avl_dbase, adl_node, iter.cur);
!
! if (db->adl_datid == tmp->adw_datid)
! {
! dbp = db;
! break;
! }
! }
!
! /* Check to see if this database is at risk of wraparound */
! if (TransactionIdPrecedes(tmp->adw_frozenxid, xidForceLimit) &&
! (!AutoVacuumingActive() ||
! !db_was_recently_processed(dbp, current_time,
! autovacuum_naptime * 1000 / 2) ||
! !db_has_running_workers(dbp)))
{
if (avdb == NULL ||
TransactionIdPrecedes(tmp->adw_frozenxid,
avdb->adw_frozenxid))
+ {
avdb = tmp;
! for_xid_wrap = true;
! }
continue;
}
else if (for_xid_wrap)
continue; /* ignore not-at-risk DBs */
! else if (MultiXactIdPrecedes(tmp->adw_minmulti, multiForceLimit) &&
! (!AutoVacuumingActive() ||
! !db_was_recently_processed(dbp, current_time,
! autovacuum_naptime * 1000 / 2) ||
! !db_has_running_workers(dbp)))
{
if (avdb == NULL ||
MultiXactIdPrecedes(tmp->adw_minmulti, avdb->adw_minmulti))
+ {
avdb = tmp;
! for_multi_wrap = true;
! }
continue;
}
else if (for_multi_wrap)
***************
*** 1208,1238 **** do_start_worker(void)
* We do this so that we don't select a database which we just
* selected, but that pgstat hasn't gotten around to updating the last
* autovacuum time yet.
*/
! skipit = false;
!
! dlist_reverse_foreach(iter, &DatabaseList)
{
! avl_dbase *dbp = dlist_container(avl_dbase, adl_node, iter.cur);
!
! if (dbp->adl_datid == tmp->adw_datid)
! {
! /*
! * Skip this database if its next_worker value falls between
! * the current time and the current time plus naptime.
! */
! if (!TimestampDifferenceExceeds(dbp->adl_next_worker,
! current_time, 0) &&
! !TimestampDifferenceExceeds(current_time,
! dbp->adl_next_worker,
! autovacuum_naptime * 1000))
! skipit = true;
!
! break;
! }
! }
! if (skipit)
continue;
/*
* Remember the db with oldest autovac time. (If we are here, both
--- 1289,1304 ----
* We do this so that we don't select a database which we just
* selected, but that pgstat hasn't gotten around to updating the last
* autovacuum time yet.
+ *
+ * Exact criterion is to skip if its next_worker value falls between
+ * the current time and the current time plus naptime.
*/
! if (db_was_recently_processed(dbp, current_time,
! autovacuum_naptime * 1000))
{
! skipped = true;
continue;
+ }
/*
* Remember the db with oldest autovac time. (If we are here, both
***************
*** 1270,1276 **** do_start_worker(void)
retval = avdb->adw_datid;
}
! else if (skipit)
{
/*
* If we skipped all databases on the list, rebuild it, because it
--- 1336,1342 ----
retval = avdb->adw_datid;
}
! else if (skipped)
{
/*
* If we skipped all databases on the list, rebuild it, because it
Alvaro Herrera wrote:
The attached patch implements that. I only tested it on HEAD, but
AFAICS it applies cleanly to 9.4 and 9.3; fairly sure it won't apply to
9.2. Given the lack of complaints, I'm unsure about backpatching
further back than 9.3 anyway.
FWIW my intention is to make sure this patch is in 9.4beta3.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Sep 30, 2014 at 5:59 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
[...]
The attached patch implements that. I only tested it on HEAD, but
AFAICS it applies cleanly to 9.4 and 9.3; fairly sure it won't apply to
9.2. Given the lack of complaints, I'm unsure about backpatching
further back than 9.3 anyway.
This kind of seems like throwing darts at the wall. It could be
better if we are right to skip the database already being vacuumed for
wraparound, or worse if we're not.
I'm not sure that we should do this at all, or at least not without
testing it extensively first. We could easily shoot ourselves in the
foot.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
This kind of seems like throwing darts at the wall. It could be
better if we are right to skip the database already being vacuumed for
wraparound, or worse if we're not.
Well, it only skips the DB for half the naptime interval, so that other
databases have a chance to be chosen before that. If you set up a
nonsensical interval such as one day, this might be problematic.
(I'm not sure I understand the darts analogy.)
Maybe instead of some interval we could have a flag that alternates
between on and off: let one other database be chosen, then the one in
danger, then some other database again. But if you have large numbers
of databases, this isn't much of a solution; you only waste half the
workers rather than all of them ... meh.
Here's another idea: have a counter of the number of tables that are in
danger of xid/multixact wraparound; only let that many workers process
the database in a row. Of course, the problem is how to determine how
many tables are in danger when we haven't even connected to the database
in the first place. We could try to store a counter in pgstats, ugh.
Or have the first for-wraparound worker store a number in shared memory
which launcher can read. Double ugh.
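To make the shared-memory variant concrete anyway, it might look vaguely
like this (entirely hypothetical; the struct and field names are
invented):

#include <stdint.h>

/* Hypothetical bookkeeping: the first for-wraparound worker in a
 * database counts the tables still needing anti-wraparound scans and
 * publishes the number; the launcher stops sending workers there once
 * that many have been dispatched. Updates would need AutovacuumLock. */
typedef struct AvWrapCounter
{
    unsigned int datid;                  /* database OID */
    uint32_t     tables_needing_freeze;  /* written by first worker */
    uint32_t     workers_sent;           /* maintained by launcher */
} AvWrapCounter;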
I'm not sure that we should do this at all, or at least not without
testing it extensively first. We could easily shoot ourselves in the
foot.
Well, we need to do *something*, because having workers directed towards
a database on which they can't do any good causes problems too -- other
databases accumulate bloat in the meantime.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 1, 2014 at 11:44 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
This kind of seems like throwing darts at the wall. It could be
better if we are right to skip the database already being vacuumed for
wraparound, or worse if we're not.
Well, it only skips the DB for half the naptime interval, so that other
databases have a chance to be chosen before that. If you set up a
nonsensical interval such as one day, this might be problematic.
(I'm not sure I understand the darts analogy.)
I guess I meant: this seems pretty hit-or-miss. I don't see why we
should expect it to be better than what we have now. Sure, maybe
there's a table in some other database that needs to be vacuumed for
bloat more urgently than a table in the wraparound database needs to
be vacuumed to prevent XID wraparound. But the reverse could be true
also - in which case your patch could cause a cluster that would
merely have bloated to instead shut down.
The way to really know would be for the AV launcher to have knowledge
of how many tables there are in each database that are beyond the
wraparound threshold and have not already been vacuumed. Then we could skip
wraparound databases where that number is 0, and give priority to
those where it isn't. I guess this is more or less what you said in
the portion of your email I'm not quoting here, but like you I'm not
quite sure how to implement that. Still, I'm reluctant to just go
change the behavior; I think it's optimistic to think that any
algorithm for making decisions without real knowledge will be better
than any other.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company