Per table autovacuum vacuum cost limit behaviour strange
A while back we were discussing rapid space bloat of tables under
certain circumstances. One further case I am examining is a highly
volatile single table, and how to tame its space blowout.
I've got a nice simple example (attached). Run it with pgbench as usual:
$ createdb cache
$ psql cache < schema.sql
$ pgbench -n -c8 -T300 -f volatile0.sql cache
...causes the table (imaginatively named 'cache0') to grow several GB
with default autovacuum parameters. Some minimal changes will rein in
the growth to about 100MB:
$ grep -e naptime -e autovacuum_vacuum_cost_limit postgresql.conf
autovacuum_naptime = 5s
autovacuum_vacuum_cost_limit = 10000
However, the cost_limit setting is likely to be way too aggressive
globally. No problem, I figured: I'd leave it at the default (200) and
use ALTER TABLE to change it for *just* the 'cache0' table:
cache=# ALTER TABLE cache0 SET (autovacuum_vacuum_cost_limit=10000);
However, re-running the pgbench test results in several GB worth of space
used by this table. Hmmm - looks like setting this parameter per table
does not work how I expected. Looking at
src/backend/postmaster/autovacuum.c I see some balancing calculations in
autovac_balance_cost() and AutoVacuumUpdateDelay(), the effect of which
seems to be (after adding some debugging elogs) to reset the actual
effective cost_limit back to 200 for this table, viz (rel 16387 is cache0):
LOG: autovac_balance_cost(pid=24058 db=16384, rel=16387,
cost_limit=200, cost_limit_base=10000, cost_delay=20)
LOG: autovac_update_delay(pid=24058 db=16384, rel=16387,
cost_limit=200, cost_delay=20)
Is this working as intended? I did wonder if it was an artifact of only
having 1 table (creating another one made no difference)... or perhaps
of only 1 active worker... I found I had to lobotomize the balancing
calculation by doing:
cache=# ALTER TABLE cache0 SET (autovacuum_vacuum_cost_delay=0);
before I got the same effect as just setting the cost_limit globally.
I'm now a bit confused about whether I understand how setting cost_limit
and cost_delay via ALTER TABLE works (or whether it is in fact working
properly, for that matter).
Regards
Mark
On Wed, Feb 12, 2014 at 12:32 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
[...]
When I went through the code to check this, I observed the following
behaviour.
The default vacuum parameters are cost_limit = 200 and cost_delay = 0.
The default autovacuum parameters are cost_limit = -1 and cost_delay = 20ms.
1. The user provides no vacuum parameters for the table, so the
effective options for the table are cost_limit = 200 and cost_delay = 20.
2. The user provides cost_limit = 1000 for the table, so the effective
options for the table are cost_limit = 1000 and cost_delay = 20.
For the above two cases, the "autovac_balance_cost" function sets the
cost parameters to cost_limit = 200 and cost_delay = 20.
3. The user provides cost_limit = 1000 and cost_delay = 10 for the
table, so the effective options are cost_limit = 1000 and cost_delay = 10.
In this case the balanced values are cost_limit = 100 and cost_delay = 10.
4. The user provides cost_limit = 1000 and cost_delay = 100 for the
table, so the effective options are cost_limit = 1000 and cost_delay = 100.
In this case the balanced values are cost_limit = 1000 and cost_delay = 100.
From the above observations, the vacuum cost parameters are not applied
as specified.
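To make the arithmetic concrete, here is a minimal standalone sketch
(mine, for illustration only; the worker values are hypothetical, chosen
to reproduce case 2) of the balancing formula as I read it in
autovac_balance_cost():

#include <stdio.h>

typedef struct
{
    int     limit_base;     /* per-table or GUC cost_limit */
    int     delay;          /* cost_delay in ms */
} Worker;

int
main(void)
{
    /* case 2: a single active worker, per-table cost_limit = 1000 */
    Worker  workers[] = {{1000, 20}};
    int     nworkers = 1;
    int     vac_cost_limit = 200;   /* autovacuum_vacuum_cost_limit */
    int     vac_cost_delay = 20;    /* autovacuum_vacuum_cost_delay */
    double  cost_total = 0.0;
    double  cost_avail;
    int     i;

    /* total base I/O rate advertised by the active workers */
    for (i = 0; i < nworkers; i++)
        cost_total += (double) workers[i].limit_base / workers[i].delay;

    /* I/O rate allowed by the GUCs, to be rationed among the workers */
    cost_avail = (double) vac_cost_limit / vac_cost_delay;

    for (i = 0; i < nworkers; i++)
    {
        int     limit = (int) (cost_avail * workers[i].limit_base / cost_total);

        /* clamp to [1, base], as the real code does */
        if (limit > workers[i].limit_base)
            limit = workers[i].limit_base;
        if (limit < 1)
            limit = 1;
        printf("worker %d: effective cost_limit = %d\n", i, limit);
    }
    return 0;
}

Even with a single worker this prints 200: the per-table limit of 1000
is scaled back to the global value, which is exactly the clamp Mark
observed.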
Please correct me if anything is wrong in my observations.
Regards,
Hari Babu
Fujitsu Australia
On 13/02/14 17:13, Haribabu Kommi wrote:
[...]
FWIW - I can confirm these calculations in 9.4devel. I found the
attached patch handy for logging what the balanced limit and delay were.
Regards
Mark
Attachment: autovacuum.c.diff
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 77e683a..246607c 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1739,6 +1739,12 @@ AutoVacuumUpdateDelay(void)
{
VacuumCostDelay = MyWorkerInfo->wi_cost_delay;
VacuumCostLimit = MyWorkerInfo->wi_cost_limit;
+ elog(LOG, "autovac_update_delay(pid=%u db=%u, rel=%u, cost_limit=%d, cost_delay=%d)",
+ MyWorkerInfo->wi_proc->pid,
+ MyWorkerInfo->wi_dboid,
+ MyWorkerInfo->wi_tableoid,
+ MyWorkerInfo->wi_cost_limit,
+ MyWorkerInfo->wi_cost_delay);
}
}
@@ -1811,7 +1817,7 @@ autovac_balance_cost(void)
worker->wi_cost_limit_base),
1);
- elog(DEBUG2, "autovac_balance_cost(pid=%u db=%u, rel=%u, cost_limit=%d, cost_limit_base=%d, cost_delay=%d)",
+ elog(LOG, "autovac_balance_cost(pid=%u db=%u, rel=%u, cost_limit=%d, cost_limit_base=%d, cost_delay=%d)",
worker->wi_proc->pid, worker->wi_dboid, worker->wi_tableoid,
worker->wi_cost_limit, worker->wi_cost_limit_base,
worker->wi_cost_delay);
On Thu, Feb 13, 2014 at 3:31 PM, Mark Kirkwood wrote:
On 13/02/14 17:13, Haribabu Kommi wrote:
[...]

FWIW - I can confirm these calculations in 9.4devel. I found the
attached patch handy for logging what the balanced limit and delay were.
I changed the balance cost calculations a little to give priority to
the user-provided per-table autovacuum parameters. If a table has
user-specified vacuum parameters that differ from the GUC vacuum
parameters, then the balance cost calculation does not include that
worker; the cost is distributed only among the other workers, the ones
running with the GUC vacuum cost parameters.
The problem with this calculation is that if the user sets the per-table
values to the same values as the GUCs, they are likewise not considered
in the calculation.
Patch attached. Please provide your suggestions or corrections on this
approach.
Regards,
Hari Babu
Fujitsu Australia
Attachment: per_table_vacuum_para_v1.patch
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***************
*** 4955,4961 **** COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
This parameter can only be set in the <filename>postgresql.conf</>
file or on the server command line.
This setting can be overridden for individual tables by
! changing storage parameters.
</para>
</listitem>
</varlistentry>
--- 4955,4962 ----
This parameter can only be set in the <filename>postgresql.conf</>
file or on the server command line.
This setting can be overridden for individual tables by
! changing storage parameters. The user specified value is ignored if it
! is the same limit as <xref linkend="guc-vacuum-cost-limit"> value.
</para>
</listitem>
</varlistentry>
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***************
*** 1776,1795 **** autovac_balance_cost(void)
if (vac_cost_limit <= 0 || vac_cost_delay <= 0)
return;
! /* caculate the total base cost limit of active workers */
cost_total = 0.0;
dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
{
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
if (worker->wi_proc != NULL &&
! worker->wi_cost_limit_base > 0 && worker->wi_cost_delay > 0)
cost_total +=
(double) worker->wi_cost_limit_base / worker->wi_cost_delay;
}
- /* there are no cost limits -- nothing to do */
- if (cost_total <= 0)
- return;
/*
* Adjust cost limit of each active worker to balance the total of cost
--- 1776,1794 ----
if (vac_cost_limit <= 0 || vac_cost_delay <= 0)
return;
! /* calculate the total base cost limit of active workers */
cost_total = 0.0;
dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
{
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
if (worker->wi_proc != NULL &&
! worker->wi_cost_limit_base > 0 &&
! worker->wi_cost_delay > 0 &&
! worker->wi_cost_limit_base == vac_cost_limit)
cost_total +=
(double) worker->wi_cost_limit_base / worker->wi_cost_delay;
}
/*
* Adjust cost limit of each active worker to balance the total of cost
***************
*** 1803,1820 **** autovac_balance_cost(void)
if (worker->wi_proc != NULL &&
worker->wi_cost_limit_base > 0 && worker->wi_cost_delay > 0)
{
! int limit = (int)
! (cost_avail * worker->wi_cost_limit_base / cost_total);
! /*
! * We put a lower bound of 1 on the cost_limit, to avoid division-
! * by-zero in the vacuum code. Also, in case of roundoff trouble
! * in these calculations, let's be sure we don't ever set
! * cost_limit to more than the base value.
! */
! worker->wi_cost_limit = Max(Min(limit,
! worker->wi_cost_limit_base),
! 1);
elog(DEBUG2, "autovac_balance_cost(pid=%u db=%u, rel=%u, cost_limit=%d, cost_limit_base=%d, cost_delay=%d)",
worker->wi_proc->pid, worker->wi_dboid, worker->wi_tableoid,
--- 1802,1823 ----
if (worker->wi_proc != NULL &&
worker->wi_cost_limit_base > 0 && worker->wi_cost_delay > 0)
{
! if (worker->wi_cost_limit_base == vac_cost_limit &&
! cost_total > 0)
! {
! int limit = (int)
! (cost_avail * worker->wi_cost_limit_base / cost_total);
! /*
! * We put a lower bound of 1 on the cost_limit, to avoid
! * division- by-zero in the vacuum code. Also, in case of
! * roundoff trouble in these calculations, let's be sure we
! * don't ever set cost_limit to more than the base value.
! */
! worker->wi_cost_limit = Max(Min(limit,
! worker->wi_cost_limit_base),
! 1);
! }
elog(DEBUG2, "autovac_balance_cost(pid=%u db=%u, rel=%u, cost_limit=%d, cost_limit_base=%d, cost_delay=%d)",
worker->wi_proc->pid, worker->wi_dboid, worker->wi_tableoid,
I hadn't noticed this thread. I will give this a look. Thanks for
providing a patch.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Haribabu Kommi wrote:
[...]
I think this is a strange approach to the problem, because if you
configure the backends just so, they are completely ignored instead of
being adjusted. And this would have action-at-a-distance consequences
because if you change the defaults in postgresql.conf you end up with
completely different behavior on the tables for which you have carefully
tuned the delay so that they are ignored in rebalance calculations.
I think that rather than ignoring some backends completely, we should be
looking at how to "weight" the balancing calculations among all the
backends in some smart way that doesn't mean they end up with the
default values of limit, which AFAIU is what happens now -- which is
stupid. Not real sure how to do that, perhaps base it on the
globally-configured ratio of delay/limit vs. the table-specific ratio.
What I mean is that perhaps the current approach is all wrong and we
need to find a better algorithm to suit this case and more generally.
Of course, I don't mean to say that it should behave completely
differently than now in normal cases, only that it shouldn't give
completely stupid results in corner cases such as this one.
As an example, suppose that global limit=200 and global delay=20 (the
defaults). Then we have a global ratio of 10. If all three tables being
vacuumed currently are using the default values, then they all have
ratio=10 and therefore all should have the same limit and delay settings
applied after rebalance. Now, if two tables have ratio=10 and one table
has been configured to have a very fast vacuum, that is limit=10000,
then the ratio for that table is 10000/20=500. Therefore that table should
be configured, after rebalance, to have a limit and delay that are 50
times faster than the settings for the other two tables. (And there is
a further constraint that the total delay per "limit unit" should be
so-and-so to accommodate getting the correct total delay per limit unit.)
I haven't thought about how to code that, but I don't think it should be
too difficult. Want to give it a try? I think it makes sense to modify
both the running delay and the running limit to achieve whatever ratio
we come up with, except that delay should probably not go below 10ms
because, apparently, some platforms have that much of sleep granularity
and it wouldn't really work to have a smaller delay.
Am I making sense?
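In case a sketch helps, here is one possible reading of that
ratio-weighted rebalance as standalone C (my interpretation only, not
code from any patch in this thread; the table values are hypothetical
and match the example above):

#include <stdio.h>

typedef struct
{
    int     limit;      /* configured cost_limit */
    int     delay;      /* configured cost_delay in ms */
} Table;

int
main(void)
{
    /* two default tables plus one configured for a very fast vacuum */
    Table   tabs[] = {{200, 20}, {200, 20}, {10000, 20}};
    int     ntabs = 3;
    double  global_ratio = 200.0 / 20.0;    /* GUC limit / GUC delay */
    double  ratio_sum = 0.0;
    int     i;

    for (i = 0; i < ntabs; i++)
        ratio_sum += (double) tabs[i].limit / tabs[i].delay;

    for (i = 0; i < ntabs; i++)
    {
        double  ratio = (double) tabs[i].limit / tabs[i].delay;

        /*
         * Give each table a share of the global I/O rate weighted by its
         * own configured ratio, so faster tables stay proportionally
         * faster instead of being reset to the defaults.
         */
        double  share = global_ratio * ratio / ratio_sum;

        /* keep delay at or above the 10ms sleep granularity */
        int     delay = (tabs[i].delay < 10) ? 10 : tabs[i].delay;
        int     limit = (int) (share * delay);

        printf("table %d: limit=%d delay=%dms\n", i,
               (limit < 1) ? 1 : limit, delay);
    }
    return 0;
}

With these numbers the fast table keeps 50 times the I/O rate of each
default table (before integer truncation), while the shares still sum to
the global rate.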
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Feb 15, 2014 9:19 AM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:
[...]
Yes, that makes sense, and it's a good approach to not leave the delay
parameter as is. Thanks, I will give it a try.
Regards,
Hari Babu
Fujitsu Australia
On Sat, Feb 15, 2014 at 10:47 AM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
[...]
I modified the "autovac_balance_cost" function to balance the costs
using the number of running workers, instead of the default vacuum cost
parameters.
Let's assume there are 4 workers running currently with the default cost
values of limit 200 and delay 20ms. The cost will be distributed as 50
and 10ms each.
Now suppose one worker has a different cost limit of 1000, five times
the default value. The cost will be distributed as 50 and 10ms each for
the other 3 workers, and as 250 and 10ms for the worker with the
non-default cost limit. This still ensures that its cost limit is 5
times that of the other workers, as the sketch below shows.
This way the worker with user-specified autovacuum cost parameters is
not ignored completely.
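A toy recomputation of that rule, with hypothetical values matching the
example above (this only mimics the patch's divide-by-worker-count
arithmetic, it is not the patch itself):

#include <stdio.h>

int
main(void)
{
    /* four workers: three with defaults, one with cost_limit = 1000 */
    int     limit_base[] = {200, 200, 200, 1000};
    int     delay_base[] = {20, 20, 20, 20};
    int     nworkers = 4;
    int     i;

    for (i = 0; i < nworkers; i++)
    {
        int     limit = limit_base[i] / nworkers;
        int     delay = delay_base[i] / nworkers;

        if (limit < 1)
            limit = 1;      /* avoid division by zero in vacuum */
        if (delay < 10)
            delay = 10;     /* 10ms sleep-granularity floor */
        printf("worker %d: limit=%d delay=%dms\n", i, limit, delay);
    }
    return 0;               /* prints 50/10ms three times, then 250/10ms */
}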
Patch is attached. Please let me know your suggestions.
Regards,
Hari Babu
Fujitsu Australia
Attachment: per_table_vacuum_para_v2.patch
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***************
*** 224,229 **** typedef struct WorkerInfoData
--- 224,230 ----
PGPROC *wi_proc;
TimestampTz wi_launchtime;
int wi_cost_delay;
+ int wi_cost_delay_base;
int wi_cost_limit;
int wi_cost_limit_base;
} WorkerInfoData;
***************
*** 1760,1825 **** autovac_balance_cost(void)
* The idea here is that we ration out I/O equally. The amount of I/O
* that a worker can consume is determined by cost_limit/cost_delay, so we
* try to equalize those ratios rather than the raw limit settings.
- *
- * note: in cost_limit, zero also means use value from elsewhere, because
- * zero is not a valid value.
*/
! int vac_cost_limit = (autovacuum_vac_cost_limit > 0 ?
! autovacuum_vac_cost_limit : VacuumCostLimit);
! int vac_cost_delay = (autovacuum_vac_cost_delay >= 0 ?
! autovacuum_vac_cost_delay : VacuumCostDelay);
! double cost_total;
! double cost_avail;
dlist_iter iter;
- /* not set? nothing to do */
- if (vac_cost_limit <= 0 || vac_cost_delay <= 0)
- return;
-
- /* caculate the total base cost limit of active workers */
- cost_total = 0.0;
dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
{
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
if (worker->wi_proc != NULL &&
! worker->wi_cost_limit_base > 0 && worker->wi_cost_delay > 0)
! cost_total +=
! (double) worker->wi_cost_limit_base / worker->wi_cost_delay;
}
- /* there are no cost limits -- nothing to do */
- if (cost_total <= 0)
- return;
/*
! * Adjust cost limit of each active worker to balance the total of cost
! * limit to autovacuum_vacuum_cost_limit.
*/
- cost_avail = (double) vac_cost_limit / vac_cost_delay;
dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
{
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
if (worker->wi_proc != NULL &&
! worker->wi_cost_limit_base > 0 && worker->wi_cost_delay > 0)
{
- int limit = (int)
- (cost_avail * worker->wi_cost_limit_base / cost_total);
-
/*
* We put a lower bound of 1 on the cost_limit, to avoid division-
! * by-zero in the vacuum code. Also, in case of roundoff trouble
! * in these calculations, let's be sure we don't ever set
! * cost_limit to more than the base value.
*/
! worker->wi_cost_limit = Max(Min(limit,
! worker->wi_cost_limit_base),
! 1);
! elog(DEBUG2, "autovac_balance_cost(pid=%u db=%u, rel=%u, cost_limit=%d, cost_limit_base=%d, cost_delay=%d)",
worker->wi_proc->pid, worker->wi_dboid, worker->wi_tableoid,
worker->wi_cost_limit, worker->wi_cost_limit_base,
! worker->wi_cost_delay);
}
}
}
--- 1761,1808 ----
* The idea here is that we ration out I/O equally. The amount of I/O
* that a worker can consume is determined by cost_limit/cost_delay, so we
* try to equalize those ratios rather than the raw limit settings.
*/
! int num_runningWorkers = 0;
dlist_iter iter;
dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
{
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
if (worker->wi_proc != NULL &&
! worker->wi_cost_limit_base > 0 && worker->wi_cost_delay_base > 0)
! num_runningWorkers++;
}
/*
! * Adjust cost limit and cost delay of each active worker to balance the
! * total of cost_limit/cost_delay based on number of active workers.
*/
dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
{
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
if (worker->wi_proc != NULL &&
! worker->wi_cost_limit_base > 0 && worker->wi_cost_delay_base > 0)
{
/*
* We put a lower bound of 1 on the cost_limit, to avoid division-
! * by-zero in the vacuum code.
! */
! worker->wi_cost_limit = Max((worker->wi_cost_limit_base / num_runningWorkers),
! 1);
!
! /*
! * We put a lower bound of 10ms on the cost_delay as some platforms have
! * this as sleep granularity.
*/
! worker->wi_cost_delay = Max((worker->wi_cost_delay_base / num_runningWorkers),
! 10);
! elog(DEBUG2, "autovac_balance_cost(pid=%u db=%u, rel=%u, cost_limit=%d, cost_limit_base=%d, cost_delay=%d, cost_delay_base=%d)",
worker->wi_proc->pid, worker->wi_dboid, worker->wi_tableoid,
worker->wi_cost_limit, worker->wi_cost_limit_base,
! worker->wi_cost_delay, worker->wi_cost_delay_base);
}
}
}
***************
*** 2285,2290 **** do_autovacuum(void)
--- 2268,2274 ----
/* advertise my cost delay parameters for the balancing algorithm */
MyWorkerInfo->wi_cost_delay = tab->at_vacuum_cost_delay;
+ MyWorkerInfo->wi_cost_delay_base = tab->at_vacuum_cost_delay;
MyWorkerInfo->wi_cost_limit = tab->at_vacuum_cost_limit;
MyWorkerInfo->wi_cost_limit_base = tab->at_vacuum_cost_limit;
Haribabu Kommi wrote:
I modified the "autovac_balance_cost" function to balance the costs
using the number of running workers, instead of the default vacuum cost
parameters.
Just as a heads-up: this patch wasn't part of the commitfest, but I
intend to review it and possibly commit it for 9.4 - not immediately,
but at some point.
Arguably this is a bug fix, since the autovac rebalance code behaves
horribly in cases such as the one described here, so I should consider a
backpatch right away. However I don't think it's a good idea to do that
without more field testing. Perhaps we can backpatch later if the new
code demonstrates its sanity.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Feb 17, 2014 at 7:38 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
[...]
Won't this change break the basic idea of autovacuum_vacuum_cost_limit,
which is as follows:
"Note that the value is distributed proportionally among the running
autovacuum workers, if there is more than one, so that the sum of the
limits of each worker never exceeds the limit on this variable."
Basically, with the proposed change the sum of the limits of all workers
can exceed autovacuum_vacuum_cost_limit, and I think the main reason is
that the new calculation doesn't consider autovacuum_vacuum_cost_limit
or other similar parameters.
I think the current calculation gives appropriate consideration to
table-level vacuum settings when autovacuum_vacuum_cost_limit is
configured with more care (i.e. it is larger than the table-level
settings). As an example, consider the case below, with all other
settings at their defaults:
autovacuum_vacuum_cost_limit = 10000
t1 (autovacuum_vacuum_cost_limit = 1000)
t2 (default)
t3 (default)
t4 (default)
Now the cost_limit after autovac_balance_cost is as follows:
Worker-1 for t1 = 322
Worker-2 for t2 = 3225
Worker-3 for t3 = 3225
Worker-4 for t4 = 3225
So proper consideration has been given to the table-level vacuum
settings and the configured autovacuum_vacuum_cost_limit GUC with the
current code; the sketch below reproduces these numbers.
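For reference, those numbers follow directly from the current rebalance
formula; here is a throwaway sketch of the arithmetic (values
hypothetical, matching the example above):

#include <stdio.h>

int
main(void)
{
    int     base[] = {1000, 10000, 10000, 10000};   /* t1..t4 */
    double  cost_avail = 10000.0 / 20.0;    /* GUC limit / GUC delay */
    double  cost_total = 0.0;
    int     i;

    /* sum of base_limit/delay over all workers: 50 + 3 * 500 = 1550 */
    for (i = 0; i < 4; i++)
        cost_total += base[i] / 20.0;

    for (i = 0; i < 4; i++)
        printf("Worker-%d: limit=%d\n", i + 1,
               (int) (cost_avail * base[i] / cost_total));
    return 0;               /* prints 322, 3225, 3225, 3225 */
}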
Now it might be the case that we want to improve the current
calculation for cases where it doesn't work well, but I think the
replacement has to be better than the current behaviour, and it is
better to consider both the GUCs and the table-level settings with some
better formula.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, May 5, 2014 at 1:09 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
[...]
Won't this change break the basic idea of autovacuum_vacuum_cost_limit,
which is as follows:
"Note that the value is distributed proportionally among the running
autovacuum workers, if there is more than one, so that the sum of the
limits of each worker never exceeds the limit on this variable."
It does not break that behaviour. The setting can be overridden for
individual tables by changing storage parameters, and the cost values
for the tables using defaults are still under the GUC limit.
Basically, with the proposed change the sum of the limits of all workers
can exceed autovacuum_vacuum_cost_limit, and I think the main reason is
that the new calculation doesn't consider autovacuum_vacuum_cost_limit
or other similar parameters.
If the user doesn't provide a table-specific value, then the
autovacuum_vacuum_cost_limit GUC value is applied to the table, so it is
still used in the calculation.
[...]
It works for the case where the table-specific value is less than the
default cost limit. The same logic doesn't work with higher values;
usually the table-specific values are set higher than the defaults, for
the tables where faster vacuuming is expected.
Now it might be the case that we want to improve the current
calculation for cases where it doesn't work well, but I think the
replacement has to be better than the current behaviour, and it is
better to consider both the GUCs and the table-level settings with some
better formula.
With the proposed change it works fine in both cases, whether the
table-specific value is higher or lower than the default; each worker's
share preserves the ratio between the default value and the
table-specific value.
default autovacuum_vacuum_cost_limit = 10000
t1 = 1000, t2-t4 = default --> balanced costs: t1 = 250, t2 = 2500, t3 = 2500, t4 = 2500
t1 = 20000, t2-t4 = default --> balanced costs: t1 = 5000, t2 = 2500, t3 = 2500, t4 = 2500
Regards,
Hari Babu
Fujitsu Australia
On Mon, May 5, 2014 at 6:35 AM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
[...]
It does not break that behaviour. The setting can be overridden for
individual tables by changing storage parameters, and the cost values
for the tables using defaults are still under the GUC limit.
Can you think of a case where the current calculation doesn't follow
what I mentioned above ("the sum of the limits of each worker never
exceeds the limit on this variable")?
What I understand is that the sum of cost_limit across all autovacuum
workers should never exceed the value of autovacuum_vacuum_cost_limit,
which seems to always hold in the current code, but the same is not true
for the proposed patch.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 05/05/14 15:22, Amit Kapila wrote:
What I understand is that the sum of cost_limit across all autovacuum
workers should never exceed the value of autovacuum_vacuum_cost_limit,
which seems to always hold in the current code, but the same is not true
for the proposed patch.
Right, but have a look at the 1st message in this thread - the current
behavior (and to a large extent the above condition) means that setting
cost limits per table does not work in any remotely intuitive way.
ITSM that the whole purpose of a per-table setting in this context is to
override the behavior of autovacuum throttling - and currently this
does not happen unless you get real brutal (i.e. setting the cost delay
to zero in addition to setting the cost limit... making the whole cost
limit a bit pointless).
regards
Mark
On Mon, May 5, 2014 at 11:57 AM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:
[...]
I think the meaning of the per-table setting is just that it overrides
the default value of autovacuum_vacuum_cost_limit for that table, while
the rest of the calculation and concept remains the same. This is what
the code currently does, and the same is mentioned in the docs as far as
I can understand.
As per the current behaviour, the per-table cost_limit is also adjusted
based on the setting of the GUC autovacuum_vacuum_cost_limit, and right
now it follows a simple principle: the total cost limit for all workers
should be equal to autovacuum_vacuum_cost_limit. The code even has the
comment below:
/*
* Adjust cost limit of each active worker to balance the total of cost
* limit to autovacuum_vacuum_cost_limit.
*/
Now, if you want to change the behaviour for the case where the user
specifies a per-table value which is more than
autovacuum_vacuum_cost_limit (or otherwise), then I think the new
definition should be a bit more clear, and it is better not to impact
the current calculation for default values.
I can think of 2 ways to change this:
a. if the user has specified a cost_limit value for a table, then just
use it, rather than rebalancing based on the value of the system-wide
GUC variable autovacuum_vacuum_cost_limit
b. alternatively, restrict the per-table value to be less than the
system-wide value?
The former approach is used for autovacuum parameters like scale_factor,
and the latter for parameters like freeze_max_age.
Thoughts?
Alvaro, do you think the above options make sense to solve this problem?
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 06/05/14 16:28, Amit Kapila wrote:
[...]
Yes indeed - the code currently works differently from what one would
expect. However, the usual reason for handing knobs to the user for
individual objects is so that special configurations can be applied to
them. The current method of operation of the per-table knobs does not do
this (not without clubbing 'em on the head).
The (ahem) sensible way that one would expect (perhaps even need)
autovacuum throttling to work is:
- set sensible defaults for all the usual (well behaved) tables
- set a few really aggressive overrides for a handful of the naughty ones
Runaway free space bloat is one of the things that can really mangle a
postgres system (I've been called in to rescue a few in my time)...
there needs to be a way to control those few badly behaved tables ...
without removing the usefulness of throttling the others.
Regards
Mark
Mark Kirkwood wrote:
On 06/05/14 16:28, Amit Kapila wrote:
[...]
I can think of 2 ways to change this:
a. if the user has specified a cost_limit value for a table, then just
use it, rather than rebalancing based on the value of the system-wide
GUC variable autovacuum_vacuum_cost_limit
b. alternatively, restrict the per-table value to be less than the
system-wide value?
The former approach is used for autovacuum parameters like scale_factor,
and the latter for parameters like freeze_max_age.
Thoughts? Alvaro, do you think the above options make sense to solve
this problem?
I've been giving some thought to this. Really, there is no way to
handle this sensibly while at the same time keeping the documented
behavior -- or in other words, what we have documented is not useful
behavior. Your option (b) above is an easy solution to the problem,
but it means that the user will have serious trouble configuring the
system in scenarios such as volatile tables, as Mark says -- essentially
it forecloses the option of using autovacuum for them.
I'm not sure I like your (a) proposal much better. One problem there is
that if you set the values for a table to be exactly the same values as
in postgresql.conf, it will behave completely differently, because it
will not participate in balancing. To me this seems to violate POLA.
I checked Haribabu's latest patch in this thread, and didn't like it
much. If you set up a table to have cost_delay=1000, it runs at that
speed when vacuumed alone; but if there are two workers, it goes at half
that speed even if the other one is configured with a very small
cost_delay (in essence, "wasting" the allocated I/O bandwidth). With
three workers, it goes at a third of the speed -- again, even if the
other tables are configured to go much slower than the volatile one.
This seems too simplistic. It might be okay when you have only one or
two very large or high-churn tables and small numbers of workers, but
it's not unreasonable to think that you might have lots more workers if
your DB has many high-churn tables.
So my proposal is a bit more complicated. First we introduce the notion
of a single number, to enable sorting and computations: the "delay
equivalent", which is the cost_limit divided by cost_delay. The higher
the value is for any table, the faster it is vacuumed. (It makes sense
in physical terms: a higher cost_limit makes it faster, because vacuum
sleeps less often; and a higher cost_delay makes it go slower, because
vacuum sleeps for longer.) Now, the critical issue is to notice that
not all tables are equal; they can be split into two groups: those that
go faster than the global delay equivalent
(i.e. the effective values of GUC variables
autovacuum_vacuum_cost_limit/autovacuum_vacuum_cost_delay), and those
that go equal or slower. For the latter group, the rebalancing
algorithm "distributes" the allocated I/O by the global vars, in a
pro-rated manner. For the former group (tables vacuumed faster than
global delay equiv), to rebalance we don't consider the global delay
equiv but the delay equiv of the fastest table currently being vacuumed.
Suppose we have two tables, delay_equiv=10 each (which is the default
value). If they are both vacuumed in parallel, then we distribute a
delay_equiv of 5 to each (so set cost_limit=100, cost_delay=20). As
soon as one of them finishes, the remaining one is allowed to upgrade to
delay_equiv=10 (cost_limit=200, cost_delay=20).
Now add a third table, delay_equiv=500 (cost_limit=10000, cost_delay=20;
this is Mark's volatile table). If it's being vacuumed on its own, just
assign cost_limit=10000 cost_delay=20, as normal. If one of the other
two tables are being vacuumed, that one will use delay_equiv=10, as per
above. To balance the volatile table, we take the delay_equiv of this
one and subtract the already handed-out delay_equiv of 10; so we set the
volatile table to delay_equiv=490 (cost_limit=9800, cost_delay=20).
If we do it this way, the whole system is running at the full speed
enabled by the fastest table we have set the per-table options, but also
we have scaled things so that the slow tables go slow and the fast
tables go fast.
As a more elaborate example, add a fourth table with delay_equiv=50
(cost_limit=1000, cost_delay=20). This is also faster than the global
vars, so we put it in the first group. If all four tables are being
vacuumed in parallel, we have the two slow tables going at delay_equiv=5
each (cost_limit=100, cost_delay=20); then there are delay_equiv=490 to
distribute among the remaining ones; pro-rating this we have
delay_equiv=445 (cost_limit=8900, cost_delay=20) for the volatile table
and delay_equiv=45 (cost_limit=900, cost_delay=20) for the other one.
If one of the slowest tables finishes vacuuming, the other one will
speed up to delay_equiv=10, and the two fastest ones will go on
unchanged. If both finish and the fast tables keep going, the faster
one will go at delay_equiv=454 and the other one at delay_equiv=45.
Note that the volatile table will go a bit faster while the other one is
barely affected.
Essentially, if you configure a table with a delay-equiv that's greater
than the system configured values, you're giving permission for vacuum
to use more I/O, but each table has its own limit to how fast it can go.
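To restate the two-group scheme, here is a rough standalone sketch (my
reading of the proposal, not the v3 patch itself; the table values are
the hypothetical ones from the examples above):

#include <stdio.h>

typedef struct
{
    int     limit;
    int     delay;      /* ms */
} Tab;

int
main(void)
{
    /* two default tables, the volatile table, and the fourth table */
    Tab     tabs[] = {{200, 20}, {200, 20}, {10000, 20}, {1000, 20}};
    int     ntabs = 4;
    double  global_equiv = 200.0 / 20.0;    /* from the GUCs: 10 */
    double  slow_sum = 0.0;
    double  fast_sum = 0.0;
    double  fast_max = 0.0;
    double  slow_given = 0.0;
    int     i;

    /* split tables into the slow and fast groups */
    for (i = 0; i < ntabs; i++)
    {
        double  e = (double) tabs[i].limit / tabs[i].delay;

        if (e <= global_equiv)
            slow_sum += e;
        else
        {
            fast_sum += e;
            if (e > fast_max)
                fast_max = e;
        }
    }

    /* slow group: pro-rate the global delay equivalent */
    for (i = 0; i < ntabs; i++)
    {
        double  e = (double) tabs[i].limit / tabs[i].delay;

        if (e <= global_equiv)
        {
            double  got = global_equiv * e / slow_sum;

            slow_given += got;
            printf("slow table %d: delay_equiv=%.0f\n", i, got);
        }
    }

    /*
     * Fast group: pro-rate what the fastest table allows, minus the
     * delay equivalent already handed to the slow group.
     */
    for (i = 0; i < ntabs; i++)
    {
        double  e = (double) tabs[i].limit / tabs[i].delay;

        if (e > global_equiv)
            printf("fast table %d: delay_equiv=%.0f\n", i,
                   (fast_max - slow_given) * e / fast_sum);
    }
    return 0;       /* prints 5, 5, 445, 45 as in the examples above */
}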
The (ahem) sensible way that one would expect (perhaps even need)
autovacuum throttling to work is:
- set sensible defaults for all the usual (well behaved) tables
- set a few really aggressive overrides for a handful of the naughty ones
Does my proposal above satisfy your concerns?
Runaway free space bloat is one of the things that can really mangle
a postgres system (I've been called in to rescue a few in my
time)... there needs to be a way to control those few badly behaved
tables ... without removing the usefulness of throttling the others.
Agreed.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera wrote:
So my proposal is a bit more complicated. First we introduce the notion
of a single number, to enable sorting and computations: the "delay
equivalent", which is the cost_limit divided by cost_delay.
Here's a patch that implements this idea. As you see, this is quite a
bit more complicated than Haribabu's proposal.
There are two holes in this:
1. if you ALTER DATABASE to change vacuum delay for a database, those
values are not considered in the global equiv delay. I don't think this
is very important and anyway we haven't considered this very much, so
it's okay if we don't handle it.
2. If you have a "fast worker" that's only slightly faster than regular
workers, it will become slower in some cases. This is explained in a
FIXME comment in the patch.
I don't really have any more time to invest in this, but I would like to
see it in 9.4. Mark, would you test this? Haribabu, how open are you
to fixing point (2) above?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment: per_table_vacuum_para_v3.patch
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***************
*** 223,231 **** typedef struct WorkerInfoData
Oid wi_tableoid;
PGPROC *wi_proc;
TimestampTz wi_launchtime;
! int wi_cost_delay;
! int wi_cost_limit;
int wi_cost_limit_base;
} WorkerInfoData;
typedef struct WorkerInfoData *WorkerInfo;
--- 223,234 ----
Oid wi_tableoid;
PGPROC *wi_proc;
TimestampTz wi_launchtime;
! /* the "base" values are the configured values */
int wi_cost_limit_base;
+ int wi_cost_delay_base;
+ /* these are the ones actually in effect, considering balancing */
+ int wi_cost_limit;
+ int wi_cost_delay;
} WorkerInfoData;
typedef struct WorkerInfoData *WorkerInfo;
***************
*** 248,253 **** typedef enum
--- 251,257 ----
*
* av_signal set by other processes to indicate various conditions
* av_launcherpid the PID of the autovacuum launcher
+ * av_vacuum_cost_* globally configured values for the cost_delay feature
* av_freeWorkers the WorkerInfo freelist
* av_runningWorkers the WorkerInfo non-free queue
* av_startingWorker pointer to WorkerInfo currently being started (cleared by
***************
*** 261,266 **** typedef struct
--- 265,272 ----
{
sig_atomic_t av_signal[AutoVacNumSignals];
pid_t av_launcherpid;
+ int av_vacuum_cost_delay;
+ int av_vacuum_cost_limit;
dlist_head av_freeWorkers;
dlist_head av_runningWorkers;
WorkerInfo av_startingWorker;
***************
*** 296,301 **** static List *get_database_list(void);
--- 302,309 ----
static void rebuild_database_list(Oid newdb);
static int db_comparator(const void *a, const void *b);
static void autovac_balance_cost(void);
+ static int choose_vacuum_cost_delay(AutoVacOpts *avopts);
+ static int choose_vacuum_cost_limit(AutoVacOpts *avopts);
static void do_autovacuum(void);
static void FreeWorkerInfo(int code, Datum arg);
***************
*** 566,571 **** AutoVacLauncherMain(int argc, char *argv[])
--- 574,583 ----
SetConfigOption("default_transaction_isolation", "read committed",
PGC_SUSET, PGC_S_OVERRIDE);
+ /* Set up initial values for cost delay balancing algorithm */
+ AutoVacuumShmem->av_vacuum_cost_limit = choose_vacuum_cost_limit(NULL);
+ AutoVacuumShmem->av_vacuum_cost_delay = choose_vacuum_cost_delay(NULL);
+
/* in emergency mode, just start a worker and go away */
if (!AutoVacuumingActive())
{
***************
*** 638,643 **** AutoVacLauncherMain(int argc, char *argv[])
--- 650,657 ----
/* rebalance in case the default cost parameters changed */
LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
+ AutoVacuumShmem->av_vacuum_cost_limit = choose_vacuum_cost_limit(NULL);
+ AutoVacuumShmem->av_vacuum_cost_delay = choose_vacuum_cost_delay(NULL);
autovac_balance_cost();
LWLockRelease(AutovacuumLock);
***************
*** 1717,1722 **** FreeWorkerInfo(int code, Datum arg)
--- 1731,1737 ----
MyWorkerInfo->wi_proc = NULL;
MyWorkerInfo->wi_launchtime = 0;
MyWorkerInfo->wi_cost_delay = 0;
+ MyWorkerInfo->wi_cost_delay_base = 0;
MyWorkerInfo->wi_cost_limit = 0;
MyWorkerInfo->wi_cost_limit_base = 0;
dlist_push_head(&AutoVacuumShmem->av_freeWorkers,
***************
*** 1742,1749 **** AutoVacuumUpdateDelay(void)
--- 1757,1766 ----
{
if (MyWorkerInfo)
{
+ LWLockAcquire(AutovacuumLock, LW_SHARED);
VacuumCostDelay = MyWorkerInfo->wi_cost_delay;
VacuumCostLimit = MyWorkerInfo->wi_cost_limit;
+ LWLockRelease(AutovacuumLock);
}
}
***************
*** 1756,1830 **** AutoVacuumUpdateDelay(void)
static void
autovac_balance_cost(void)
{
- /*
- * The idea here is that we ration out I/O equally. The amount of I/O
- * that a worker can consume is determined by cost_limit/cost_delay, so we
- * try to equalize those ratios rather than the raw limit settings.
- *
- * note: in cost_limit, zero also means use value from elsewhere, because
- * zero is not a valid value.
- */
- int vac_cost_limit = (autovacuum_vac_cost_limit > 0 ?
- autovacuum_vac_cost_limit : VacuumCostLimit);
- int vac_cost_delay = (autovacuum_vac_cost_delay >= 0 ?
- autovacuum_vac_cost_delay : VacuumCostDelay);
- double cost_total;
- double cost_avail;
dlist_iter iter;
!
! /* not set? nothing to do */
! if (vac_cost_limit <= 0 || vac_cost_delay <= 0)
return;
! /* caculate the total base cost limit of active workers */
! cost_total = 0.0;
dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
{
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
! if (worker->wi_proc != NULL &&
! worker->wi_cost_limit_base > 0 && worker->wi_cost_delay > 0)
! cost_total +=
! (double) worker->wi_cost_limit_base / worker->wi_cost_delay;
}
- /* there are no cost limits -- nothing to do */
- if (cost_total <= 0)
- return;
/*
! * Adjust cost limit of each active worker to balance the total of cost
! * limit to autovacuum_vacuum_cost_limit.
*/
- cost_avail = (double) vac_cost_limit / vac_cost_delay;
dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
{
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
! if (worker->wi_proc != NULL &&
! worker->wi_cost_limit_base > 0 && worker->wi_cost_delay > 0)
{
- int limit = (int)
- (cost_avail * worker->wi_cost_limit_base / cost_total);
! /*
! * We put a lower bound of 1 on the cost_limit, to avoid division-
! * by-zero in the vacuum code. Also, in case of roundoff trouble
! * in these calculations, let's be sure we don't ever set
! * cost_limit to more than the base value.
! */
! worker->wi_cost_limit = Max(Min(limit,
! worker->wi_cost_limit_base),
! 1);
!
! elog(DEBUG2, "autovac_balance_cost(pid=%u db=%u, rel=%u, cost_limit=%d, cost_limit_base=%d, cost_delay=%d)",
! worker->wi_proc->pid, worker->wi_dboid, worker->wi_tableoid,
! worker->wi_cost_limit, worker->wi_cost_limit_base,
! worker->wi_cost_delay);
}
}
}
/*
* get_database_list
* Return a list of all databases found in pg_database.
*
--- 1773,1954 ----
static void
autovac_balance_cost(void)
{
dlist_iter iter;
! int num_regular_workers = 0;
! int num_fast_workers = 0;
! float4 global_equiv_delay;
! float4 fast_equiv_delay;
! float4 regular_total_equiv_delay;
! float4 fast_total_equiv_delay;
!
! /* not set in this worker? nothing to do */
! if (MyWorkerInfo && (MyWorkerInfo->wi_cost_limit_base <= 0 ||
! MyWorkerInfo->wi_cost_delay_base <= 0))
return;
! /*
! * We use a metric we call "equivalent delay", equal to cost_limit divided
! * by cost_delay, to enable a reasonably simple algorithm to distribute
! * vacuum I/O bandwidth among all active workers. The "global" equiv delay
! * is the value computed from the GUC parameters in effect for this
! * database (computed from postgresql.conf values. Values set in ALTER
! * DATABASE have no effect here, but probably need to fix that XXX). This
! * value lets us split workers in two classes: regular workers are those
! * that have an equiv delay less than or equal to global equiv delay, while
! * fast workers are those that have a value greater than global equiv
! * delay.
! */
! global_equiv_delay = AutoVacuumShmem->av_vacuum_cost_limit /
! AutoVacuumShmem->av_vacuum_cost_delay;
!
! /*
! * Find the sum of equiv delay values in each class; also find the equiv
! * delay of the fastest among all the fast workers.
! */
dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
{
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
+ float4 this_equiv_delay;
+
+ /* ignore inactive workers, and workers not using cost_delay */
+ if (worker->wi_proc == NULL ||
+ worker->wi_cost_limit_base <= 0 || worker->wi_cost_delay_base <= 0)
+ continue;
+
+ this_equiv_delay = worker->wi_cost_limit_base / worker->wi_cost_delay_base;
! if (this_equiv_delay <= global_equiv_delay)
! {
! /* A regular worker -- count it and add its equiv delay as such */
! num_regular_workers++;
! regular_total_equiv_delay += this_equiv_delay;
! }
! else
! {
! /*
! * A fast worker. Count and add its equiv delay in a separate
! * total; we also use these to determine the fast_equiv_delay which
! * is the value which we will distribute to all of them, separately
! * from global_equiv_delay.
! */
! if (num_fast_workers == 0)
! fast_equiv_delay = this_equiv_delay;
! else if (this_equiv_delay > fast_equiv_delay)
! fast_equiv_delay = this_equiv_delay;
! num_fast_workers++;
! fast_total_equiv_delay += this_equiv_delay;
! }
}
/*
! * FIXME We shouldn't actually distribute all of global equiv delay to
! * regular workers and all of (fast equiv delay - global equiv delay) to
! * fast workers; that makes fast workers much slower if the fast equiv
! * delay is only slightly faster than regular workers. For example,
! * consider the scenario with one regular worker with equiv_delay=10 and
! * one fast worker with equiv_delay=11; after subtraction the fast worker
! * will have equiv_delay=1). This part needs more fiddling to avoid this
! * problem.
! *
! * Perhaps the way to solve this is to consider that all workers (including
! * fast ones) get their fraction of global delay, and then fast workers get
! * their pro-rated share of fast equiv delay *added* to that. This would
! * make regular workers slower.
! *
! * Another idea is to subtract only a fraction of global_equiv_delay from
! * fast_equiv_delay here, rather than all of it (and of course decrease
! * global_equiv_delay accordingly). The fraction would be variable: if
! * fast_equiv_delay is much higher than global_equiv_delay, then subtract
! * all of it; if both values are close enough, subtract only half (or
! * rather than subtracting half, subtract a pro-rated fraction according to
! * the number of workers in each class.)
! */
! if (num_regular_workers > 0)
! fast_equiv_delay -= global_equiv_delay;
!
! /*
! * Now we have all parameters we need; compute the values for individual
! * workers.
*/
dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
{
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
+ float4 this_equiv_delay_base;
+ float4 this_equiv_delay_frac;
+
+ /* again, ignore inactive workers, and workers not using cost_delay */
+ if (worker->wi_proc == NULL ||
+ worker->wi_cost_limit_base <= 0 || worker->wi_cost_delay_base <= 0)
+ continue;
! this_equiv_delay_base = worker->wi_cost_limit_base / worker->wi_cost_delay_base;
!
! /*
! * If it's a regular worker, use pro-rated fraction of global_equiv_delay;
! * otherwise, use pro-rated fraction of fast_total_equiv_delay
! */
! if (this_equiv_delay_base <= global_equiv_delay)
{
! /* A regular worker; use a pro-rated fraction of global_equiv_delay */
! this_equiv_delay_frac =
! this_equiv_delay_base * global_equiv_delay / regular_total_equiv_delay;
}
+ else
+ {
+ /* a fast worker: use a pro-rated fraction of fast_equiv_delay */
+ this_equiv_delay_frac =
+ this_equiv_delay_base * fast_equiv_delay / fast_total_equiv_delay;
+ }
+
+ /*
+ * Convert back into cost_limit and cost_delay values, and set them
+ * into the worker's shmem struct. We put a lower bound of 1 to the
+ * cost_limit, to avoid a division-by-zero in the vacuum code.
+ */
+ worker->wi_cost_limit = Max(this_equiv_delay_frac *
+ worker->wi_cost_delay_base,
+ 1);
+ worker->wi_cost_delay = worker->wi_cost_delay_base;
+
+ elog(LOG, "autovac_balance_cost(pid=%u db=%u, rel=%u, cost_limit=%d, cost_delay=%d)",
+ worker->wi_proc->pid, worker->wi_dboid, worker->wi_tableoid,
+ worker->wi_cost_limit, worker->wi_cost_delay);
}
}
/*
+ * Determine the vacuum_cost_delay value to use: if this is a table and it has
+ * reloptions, use the value from there; otherwise the autovacuum parameter,
+ * unless it's -1 in which case we use plain vacuum_cost_delay.
+ */
+ static int
+ choose_vacuum_cost_delay(AutoVacOpts *avopts)
+ {
+ if (avopts && avopts->vacuum_cost_delay >= 0)
+ return avopts->vacuum_cost_delay;
+ if (autovacuum_vac_cost_delay >= 0)
+ return autovacuum_vac_cost_delay;
+ return VacuumCostDelay;
+ }
+
+ /*
+ * Determine the vacuum_cost_limit value to use: if this is a table and it has
+ * reloptions, use the value from there; otherwise the autovacuum parameter,
+ * unless it's -1 or 0 in which case we use plain vacuum_cost_limit.
+ */
+ static int
+ choose_vacuum_cost_limit(AutoVacOpts *avopts)
+ {
+ /* 0 is an invalid value here, so avoid that */
+ if (avopts && avopts->vacuum_cost_limit > 0)
+ return avopts->vacuum_cost_limit;
+ if (autovacuum_vac_cost_limit > 0)
+ return autovacuum_vac_cost_limit;
+ return VacuumCostLimit;
+ }
+
+ /*
* get_database_list
* Return a list of all databases found in pg_database.
*
***************
*** 2202,2209 **** do_autovacuum(void)
Oid relid = lfirst_oid(cell);
autovac_table *tab;
bool skipit;
- int stdVacuumCostDelay;
- int stdVacuumCostLimit;
dlist_iter iter;
CHECK_FOR_INTERRUPTS();
--- 2326,2331 ----
***************
*** 2272,2302 **** do_autovacuum(void)
MyWorkerInfo->wi_tableoid = relid;
LWLockRelease(AutovacuumScheduleLock);
- /*
- * Remember the prevailing values of the vacuum cost GUCs. We have to
- * restore these at the bottom of the loop, else we'll compute wrong
- * values in the next iteration of autovac_balance_cost().
- */
- stdVacuumCostDelay = VacuumCostDelay;
- stdVacuumCostLimit = VacuumCostLimit;
-
/* Must hold AutovacuumLock while mucking with cost balance info */
LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
/* advertise my cost delay parameters for the balancing algorithm */
MyWorkerInfo->wi_cost_delay = tab->at_vacuum_cost_delay;
MyWorkerInfo->wi_cost_limit = tab->at_vacuum_cost_limit;
MyWorkerInfo->wi_cost_limit_base = tab->at_vacuum_cost_limit;
/* do a balance */
autovac_balance_cost();
- /* set the active cost parameters from the result of that */
- AutoVacuumUpdateDelay();
-
/* done */
LWLockRelease(AutovacuumLock);
/* clean up memory before each iteration */
MemoryContextResetAndDeleteChildren(PortalContext);
--- 2394,2417 ----
MyWorkerInfo->wi_tableoid = relid;
LWLockRelease(AutovacuumScheduleLock);
/* Must hold AutovacuumLock while mucking with cost balance info */
LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
/* advertise my cost delay parameters for the balancing algorithm */
MyWorkerInfo->wi_cost_delay = tab->at_vacuum_cost_delay;
+ MyWorkerInfo->wi_cost_delay_base = tab->at_vacuum_cost_delay;
MyWorkerInfo->wi_cost_limit = tab->at_vacuum_cost_limit;
MyWorkerInfo->wi_cost_limit_base = tab->at_vacuum_cost_limit;
/* do a balance */
autovac_balance_cost();
/* done */
LWLockRelease(AutovacuumLock);
+ /* set the active cost parameters from the result of that */
+ AutoVacuumUpdateDelay();
+
/* clean up memory before each iteration */
MemoryContextResetAndDeleteChildren(PortalContext);
***************
*** 2381,2390 **** deleted:
LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
MyWorkerInfo->wi_tableoid = InvalidOid;
LWLockRelease(AutovacuumLock);
-
- /* restore vacuum cost GUCs for the next iteration */
- VacuumCostDelay = stdVacuumCostDelay;
- VacuumCostLimit = stdVacuumCostLimit;
}
/*
--- 2496,2501 ----
***************
*** 2532,2550 **** table_recheck_autovac(Oid relid, HTAB *table_toast_map,
* defaults, autovacuum's own first and plain vacuum second.
*/
! /* -1 in autovac setting means use plain vacuum_cost_delay */
! vac_cost_delay = (avopts && avopts->vacuum_cost_delay >= 0)
! ? avopts->vacuum_cost_delay
! : (autovacuum_vac_cost_delay >= 0)
! ? autovacuum_vac_cost_delay
! : VacuumCostDelay;
!
! /* 0 or -1 in autovac setting means use plain vacuum_cost_limit */
! vac_cost_limit = (avopts && avopts->vacuum_cost_limit > 0)
! ? avopts->vacuum_cost_limit
! : (autovacuum_vac_cost_limit > 0)
! ? autovacuum_vac_cost_limit
! : VacuumCostLimit;
/* these do not have autovacuum-specific settings */
freeze_min_age = (avopts && avopts->freeze_min_age >= 0)
--- 2643,2650 ----
* defaults, autovacuum's own first and plain vacuum second.
*/
! vac_cost_delay = choose_vacuum_cost_delay(avopts);
! vac_cost_limit = choose_vacuum_cost_limit(avopts);
/* these do not have autovacuum-specific settings */
freeze_min_age = (avopts && avopts->freeze_min_age >= 0)
***************
*** 2935,2940 **** AutoVacuumShmemInit(void)
--- 3035,3042 ----
dlist_init(&AutoVacuumShmem->av_freeWorkers);
dlist_init(&AutoVacuumShmem->av_runningWorkers);
AutoVacuumShmem->av_startingWorker = NULL;
+ AutoVacuumShmem->av_vacuum_cost_limit = 0;
+ AutoVacuumShmem->av_vacuum_cost_delay = 0;
worker = (WorkerInfo) ((char *) AutoVacuumShmem +
MAXALIGN(sizeof(AutoVacuumShmemStruct)));
On 27/08/14 10:27, Alvaro Herrera wrote:
Alvaro Herrera wrote:
So my proposal is a bit more complicated. First we introduce the notion
of a single number, to enable sorting and computations: the "delay
equivalent", which is the cost_limit divided by cost_delay.
Here's a patch that implements this idea. As you see this is quite a
bit more complicated than Haribabu's proposal.
There are two holes in this:
1. if you ALTER DATABASE to change vacuum delay for a database, those
values are not considered in the global equiv delay. I don't think this
is very important and anyway we haven't considered this very much, so
it's okay if we don't handle it.
2. If you have a "fast worker" that's only slightly faster than regular
workers, it will become slower in some cases. This is explained in a
FIXME comment in the patch.
I don't really have any more time to invest in this, but I would like to
see it in 9.4. Mark, would you test this? Haribabu, how open are you
to fixing point (2) above?
Thanks Alvaro - I will take a look.
regards
Mark
On Wed, Aug 27, 2014 at 8:27 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Alvaro Herrera wrote:
So my proposal is a bit more complicated. First we introduce the notion
of a single number, to enable sorting and computations: the "delay
equivalent", which is the cost_limit divided by cost_delay.
Here's a patch that implements this idea. As you see this is quite a
bit more complicated than Haribabu's proposal.
There are two holes in this:
1. if you ALTER DATABASE to change vacuum delay for a database, those
values are not considered in the global equiv delay. I don't think this
is very important and anyway we haven't considered this very much, so
it's okay if we don't handle it.
2. If you have a "fast worker" that's only slightly faster than regular
workers, it will become slower in some cases. This is explained in a
FIXME comment in the patch.
I don't really have any more time to invest in this, but I would like to
see it in 9.4. Mark, would you test this? Haribabu, how open are you
to fixing point (2) above?
Thanks Alvaro. I will check point (2).
Regards,
Hari Babu
Fujitsu Australia
On 27/08/14 10:27, Alvaro Herrera wrote:
Alvaro Herrera wrote:
So my proposal is a bit more complicated. First we introduce the notion
of a single number, to enable sorting and computations: the "delay
equivalent", which is the cost_limit divided by cost_delay.
Here's a patch that implements this idea. As you see this is quite a
bit more complicated than Haribabu's proposal.
There are two holes in this:
1. if you ALTER DATABASE to change vacuum delay for a database, those
values are not considered in the global equiv delay. I don't think this
is very important and anyway we haven't considered this very much, so
it's okay if we don't handle it.
2. If you have a "fast worker" that's only slightly faster than regular
workers, it will become slower in some cases. This is explained in a
FIXME comment in the patch.
I don't really have any more time to invest in this, but I would like to
see it in 9.4. Mark, would you test this? Haribabu, how open are you
to fixing point (2) above?
I did some testing with this patch applied.
Minimally tweaking autovacuum (naptime of 5s) with a single table
'cache0' created with a cost limit setting of 10000, running:
$ pgbench -n -c8 -T300 -f volatile0.sql cache
and monitoring the size of 'cache0' table showed a steady state of:
cache=# SELECT pg_relation_size('cache0')/(1024*1024) AS mb;
mb
------
85
So far so good. Adding another table 'cache1' similar to the previous
but lacking any per table autovacuum settings, and running 2 pgbench
sessions:
$ pgbench -n -c8 -T300 -f volatile0.sql cache
$ pgbench -n -c8 -T300 -f volatile1.sql cache
(volatile1.sql just uses table 'cache1' instead of 'cache0') shows after
a few minutes:
cache=# SELECT relname,pg_relation_size(oid)/(1024*1024) AS mb
FROM pg_class WHERE relname like 'cache_';
relname | mb
---------+--------
cache0 | 664
cache1 | 1900
So we are definitely seeing the 'fast' worker being slowed down. Also,
the growth of 'cache1' was only a bit faster than 'cache0' - so the
'slow' worker was getting a speed boost as well.
So looks like good progress, but yeah - point (2) is obviously rearing
its head in this test.
Cheers
Mark
On Tue, Aug 26, 2014 at 12:19 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
On Mon, May 5, 2014 at 11:57 AM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:
I could think of 2 ways to change this:
a. if user has specified cost_limit value for table, then it just uses it
rather than rebalancing based on value of system-wide guc variable
autovacuum_vacuum_cost_limit
b. another could be to restrict setting per-table value to be lesser than
system-wide value?
The former is used for auto vacuum parameters like scale_factor and
the latter is used for parameters like freeze_max_age.
I've been giving some thought to this. Really, there is no way to
handle this sensibly while at the same time keeping the documented
behavior -- or in other words, what we have documented is not useful
behavior. Your option (b) above is an easy solution to the problem,
however it means that the user will have serious trouble configuring the
system in scenarios such as volatile tables, as Mark says -- essentially
that will foreclose the option of using autovacuum for them.
I'm not sure I like your (a) proposal much better. One problem there is
that if you set the values for a table to be exactly the same values as
in postgresql.conf, it will behave completely differently because it will
not participate in balancing. To me this seems to violate POLA.
So my proposal is a bit more complicated. First we introduce the notion
of a single number, to enable sorting and computations: the "delay
equivalent", ...
I favor option (a). There's something to be said for your proposal
in terms of logical consistency with what we have now, but to be
honest I'm not sure it's the behavior anyone wants (I would welcome
more feedback on what people actually want). I think we should view
an attempt to set a limit for a particular table as a way to control
the rate at which that table is vacuumed - period.
At least in situations that I've encountered, it's typical to be able
to determine the frequency with which a given table needs to be
vacuumed to avoid runaway bloat, and from that you can work backwards
to figure out how fast you must process it in MB/s, and from there you
can work backwards to figure out what cost delay will achieve that
effect. But if the system tinkers with the cost delay under the hood,
then you're vacuuming at a different (slower) rate and, of course, the
table bloats.
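To make that back-of-the-envelope math concrete, here is a minimal
standalone sketch -- not from the thread -- assuming the 9.4-era
defaults and the simplification that every page vacuum touches is a
cache miss:

#include <stdio.h>

/*
 * Approximate read rate implied by a cost_limit/cost_delay pair.
 * Constants are the 9.4-era defaults; treating every page as a miss
 * (vacuum_cost_page_miss) is a simplifying assumption.
 */
int
main(void)
{
    int     cost_limit = 200;       /* vacuum_cost_limit default */
    int     cost_delay_ms = 20;     /* autovacuum_vacuum_cost_delay default */
    int     page_miss_cost = 10;    /* vacuum_cost_page_miss default */
    double  units_per_sec = cost_limit * (1000.0 / cost_delay_ms);
    double  pages_per_sec = units_per_sec / page_miss_cost;
    double  mb_per_sec = pages_per_sec * 8192.0 / (1024 * 1024);

    printf("~%.1f MB/s\n", mb_per_sec);     /* about 7.8 MB/s */
    return 0;
}

Working backwards, a table that must be processed at, say, 20 MB/s then
implies a proportionally higher cost_limit or lower cost_delay.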
Now, in the case where you are setting an overall limit, there is at
least an argument to be made that you can determine the overall rate
of autovacuum-induced I/O activity that the system can tolerate, and
set your limits to stay within that budget, and then let the system
decide how to divide that I/O up between workers. But if you're
overriding a per-table limit, I don't really see how that holds any
water. The system I/O budget doesn't go up just because one
particular table is being vacuumed rather than any other. The only
plausible use case for setting a per-table rate that I can see is when
you actually want the system to use that exact rate for that
particular table.
I might be missing something, of course.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
Now, in the case where you are setting an overall limit, there is at
least an argument to be made that you can determine the overall rate
of autovacuum-induced I/O activity that the system can tolerate, and
set your limits to stay within that budget, and then let the system
decide how to divide that I/O up between workers. But if you're
overriding a per-table limit, I don't really see how that holds any
water. The system I/O budget doesn't go up just because one
particular table is being vacuumed rather than any other. The only
plausible use case for setting a per-table rate that I can see is when
you actually want the system to use that exact rate for that
particular table.
Yeah, this makes sense to me too -- at least as long as you only have
one such table. But if you happen to have more than one, and due to
some bad luck they happen to be vacuumed concurrently, they will eat a
larger share of your I/O bandwidth budget than you anticipated, which
you might not like.
Thus what I am saying is that those should be scaled down too to avoid
peaks. Now, my proposal above mentioned subtracting the speed of tables
under the limit, from the speed of those above the limit; maybe we can
just rip that part out. Then we end up with the behavior you want, that
is to have the fast table vacuum as fast as it is configured when it's
the only fast table being vacuumed; and also with what I say, which is
that if you have two of them, the two balance the I/O consumption (but
only among themselves, not with the slow ones.)
Since figuring out this subtraction is the only thing missing from the
patch I posted, ISTM we could have something committable with very
little extra effort if we agree on this.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Aug 28, 2014 at 1:36 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
Now, in the case where you are setting an overall limit, there is at
least an argument to be made that you can determine the overall rate
of autovacuum-induced I/O activity that the system can tolerate, and
set your limits to stay within that budget, and then let the system
decide how to divide that I/O up between workers. But if you're
overriding a per-table limit, I don't really see how that holds any
water. The system I/O budget doesn't go up just because one
particular table is being vacuumed rather than any other. The only
plausible use case for setting a per-table rate that I can see is when
you actually want the system to use that exact rate for that
particular table.
Yeah, this makes sense to me too -- at least as long as you only have
one such table. But if you happen to have more than one, and due to
some bad luck they happen to be vacuumed concurrently, they will eat a
larger share of your I/O bandwidth budget than you anticipated, which
you might not like.
I agree that you might not like that. But you might not like having
the table vacuumed slower than the configured rate, either. My
impression is that the time between vacuums isn't really all that
negotiable for some people. I had one customer who had horrible bloat
issues on a table that was vacuumed every minute; when we changed the
configuration so that it was vacuumed every 15 seconds, those problems
went away. Now that is obviously more a matter for autovacuum_naptime
than this option, but the point seems general to me: if you need the
table vacuumed every N seconds, minutes, or hours, and it only gets
vacuumed every 2N or 3N or 5N seconds, minutes, or hours because there
are other autovacuum workers running, the table is going to bloat.
That *might* be better than busting your I/O budget, but it might also
be (and I think frequently is) much worse.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
I agree that you might not like that. But you might not like having
the table vacuumed slower than the configured rate, either. My
impression is that the time between vacuums isn't really all that
negotiable for some people. I had one customer who had horrible bloat
issues on a table that was vacuumed every minute; when we changed the
configuration so that it was vacuumed every 15 seconds, those problems
went away.
Wow, that's extreme. For that case you can set
autovacuum_vacuum_cost_limit to 0, which disables the whole thing and
lets vacuum run at full speed -- no throttling at all. Would that
satisfy the concern?
Now that is obviously more a matter for autovacuum_naptime
than this option, but the point seems general to me: if you need the
table vacuumed every N seconds, minutes, or hours, and it only gets
vacuumed every 2N or 3N or 5N seconds, minutes, or hours because there
are other autovacuum workers running, the table is going to bloat.
There might be another problem here which is that if you have all your
workers busy because they are vacuuming large tables that don't have
churn high enough to warrant disrupting the whole environment (thus low
cost_limit), then the table will bloat no matter what you set its cost
limit to. So there's not only a matter of a low enough naptime (which
is a bad thing for the rest of the system, also) but also one of
something similar to priority inversion; should you speed up the
vacuuming of those large tables so that one worker is freed soon enough
to get to the high-churn table?
Was the solution for that customer to set an external tool running
vacuum on that table?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Aug 28, 2014 at 4:56 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
I agree that you might not like that. But you might not like having
the table vacuumed slower than the configured rate, either. My
impression is that the time between vacuums isn't really all that
negotiable for some people. I had one customer who had horrible bloat
issues on a table that was vacuumed every minute; when we changed the
configuration so that it was vacuumed every 15 seconds, those problems
went away.
Wow, that's extreme. For that case you can set
autovacuum_vacuum_cost_limit to 0, which disables the whole thing and
lets vacuum run at full speed -- no throttling at all. Would that
satisfy the concern?
Well, maybe, if you want to run completely unthrottled. But I have no
evidence that's a common desire.
Now that is obviously more a matter for autovacuum_naptime
than this option, but the point seems general to me: if you need the
table vacuumed every N seconds, minutes, or hours, and it only gets
vacuumed every 2N or 3N or 5N seconds, minutes, or hours because there
are other autovacuum workers running, the table is going to bloat.
There might be another problem here which is that if you have all your
workers busy because they are vacuuming large tables that don't have
churn high enough to warrant disrupting the whole environment (thus low
cost_limit), then the table will bloat no matter what you set its cost
limit to. So there's not only a matter of a low enough naptime (which
is a bad thing for the rest of the system, also) but also one of
something similar to priority inversion; should you speed up the
vacuuming of those large tables so that one worker is freed soon enough
to get to the high-churn table?
I don't think so. I continue to believe that we need to provide
the user with the tools to be certain that table X will get vacuumed
at least every Y seconds/minutes/hours. To me, allowing the user to
set a rate that the system will not adjust or manipulate in any way
makes this a lot easier than anything else we might do.
Was the solution for that customer to set an external tool running
vacuum on that table?
Nope, we just changed settings.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 29/08/14 08:56, Alvaro Herrera wrote:
Robert Haas wrote:
I agree that you might not like that. But you might not like having
the table vacuumed slower than the configured rate, either. My
impression is that the time between vacuums isn't really all that
negotiable for some people. I had one customer who had horrible bloat
issues on a table that was vacuumed every minute; when we changed the
configuration so that it was vacuumed every 15 seconds, those problems
went away.
Wow, that's extreme. For that case you can set
autovacuum_vacuum_cost_limit to 0, which disables the whole thing and
lets vacuum run at full speed -- no throttling at all. Would that
satisfy the concern?
Well no - you might have a whole lot of big tables that you want vacuum
to not get too aggressive on, but a few small tables that are highly
volatile. So you want *them* vacuumed really fast to prevent them
becoming huge tables with only a few rows therein, but your system might
not be able to handle *all* your tables being vacuumed at full speed.
This is a fairly common scenario for (several) web CMS systems that tend
to want to have session and/or cache tables that are small and extremely
volatile, plus the rest of the (real) data that is bigger and vastly
less volatile.
While there is a valid objection along the lines of "don't use a
database instead of memcache", it does seem reasonable that Postgres
should be able to cope with this type of workload.
Cheers
Mark
Mark Kirkwood wrote:
On 29/08/14 08:56, Alvaro Herrera wrote:
Robert Haas wrote:
I agree that you might not like that. But you might not like having
the table vacuumed slower than the configured rate, either. My
impression is that the time between vacuums isn't really all that
negotiable for some people. I had one customer who had horrible bloat
issues on a table that was vacuumed every minute; when we changed the
configuration so that it was vacuumed every 15 seconds, those problems
went away.
Wow, that's extreme. For that case you can set
autovacuum_vacuum_cost_limit to 0, which disables the whole thing and
lets vacuum run at full speed -- no throttling at all. Would that
satisfy the concern?
Well no - you might have a whole lot of big tables that you want
vacuum to not get too aggressive on, but a few small tables that are
highly volatile. So you want *them* vacuumed really fast to prevent
them becoming huge tables with only a few rows therein, but your
system might not be able to handle *all* your tables being vacuumed at
full speed.
I meant setting cost limit to 0 *for those tables* only, not for all of
them.
Anyway it seems to me maybe there is room for a new table storage
parameter, say autovacuum_do_balance, which determines whether the table
participates in the balancing program or not.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Aug 26, 2014 at 9:49 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
So my proposal is a bit more complicated. First we introduce the notion
of a single number, to enable sorting and computations: the "delay
equivalent", which is the cost_limit divided by cost_delay. The highest
the value is for any table, the fastest it is vacuumed. (It makes sense
in physical terms: a higher cost_limit makes it faster, because vacuum
sleeps less often; and a higher cost_delay makes it go slower, because
vacuums sleeps for longer.) Now, the critical issue is to notice that
not all tables are equal; they can be split in two groups, those that go
faster than the global delay equivalent
(i.e. the effective values of GUC variables
autovacuum_vacuum_cost_limit/autovacuum_vacuum_cost_delay), and those
that go equal or slower. For the latter group, the rebalancing
algorithm "distributes" the allocated I/O by the global vars, in a
pro-rated manner. For the former group (tables vacuumed faster than
global delay equiv), to rebalance we don't consider the global delay
equiv but the delay equiv of the fastest table currently being vacuumed.
Suppose we have two tables, delay_equiv=10 each (which is the default
value). If they are both vacuumed in parallel, then we distribute a
delay_equiv of 5 to each (so set cost_limit=100, cost_delay=20). As
soon as one of them finishes, the remaining one is allowed to upgrade to
delay_equiv=10 (cost_limit=200, cost_delay=20).
Now add a third table, delay_equiv=500 (cost_limit=10000, cost_delay=20;
this is Mark's volatile table). If it's being vacuumed on its own, just
assign cost_limit=10000 cost_delay=20, as normal. If one of the other
two tables are being vacuumed, that one will use delay_equiv=10, as per
above. To balance the volatile table, we take the delay_equiv of this
one and subtract the already handed-out delay_equiv of 10; so we set the
volatile table to delay_equiv=490 (cost_limit=9800, cost_delay=20).
If we do it this way, the whole system is running at the full speed
enabled by the fastest table we have set the per-table options, but also
we have scaled things so that the slow tables go slow and the fast
tables go fast.
As a more elaborate example, add a fourth table with delay_equiv=50
(cost_limit=1000, cost_delay=20). This is also faster than the global
vars, so we put it in the first group. If all four tables are being
vacuumed in parallel, we have the two slow tables going at delay_equiv=5
each (cost_limit=100, cost_delay=20); then there are delay_equiv=490 to
distribute among the remaining ones; pro-rating this we have
delay_equiv=445 (cost_limit=8900, cost_delay=20) for the volatile table
and delay_equiv=45 (cost_limit=900, cost_delay=20) for the other one.
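To help follow that arithmetic, here is a small standalone sketch -- an
illustration only, not the patch code -- that applies the quoted rules
to the four-table example and reproduces the 5/5/445/45 split:

#include <stdio.h>

/*
 * Standalone sketch of the two-class "delay equivalent" balancing
 * described above, run over the four-table example.  The rules and
 * numbers come from the quoted proposal; this is not the patch code.
 */
int
main(void)
{
    double  base[] = {10, 10, 500, 50}; /* per-table cost_limit/cost_delay */
    int     n = 4;
    double  global_equiv = 200.0 / 20.0;    /* default GUCs: 200 / 20ms */
    double  regular_total = 0, fast_total = 0, fast_equiv = 0;
    int     num_regular = 0;
    int     i;

    for (i = 0; i < n; i++)
    {
        if (base[i] <= global_equiv)
        {
            num_regular++;
            regular_total += base[i];
        }
        else
        {
            fast_total += base[i];
            if (base[i] > fast_equiv)
                fast_equiv = base[i];
        }
    }

    /* regular workers already consume the global budget */
    if (num_regular > 0)
        fast_equiv -= global_equiv;

    for (i = 0; i < n; i++)
    {
        double  share;

        if (base[i] <= global_equiv)
            share = base[i] * global_equiv / regular_total;
        else
            share = base[i] * fast_equiv / fast_total;
        printf("table %d: equiv_delay %.0f -> %.0f\n", i, base[i], share);
    }
    return 0;
}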
How will this calculation behave if the third table has delay_equiv = 30
and the fourth table has delay_equiv = 20? Both are greater than the
default delay_equiv = 10, so they will participate in the fast group;
as per my understanding of the above calculation, both might get the
same delay_equiv. But I might be wrong, because your patch still has a
FIXME and I haven't yet fully understood its code.
In general, I have a feeling that distributing vacuuming speed is
a good way to tune the system; however, if the user wants to override
that by providing specific values for particular tables, we should
honour that setting.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Aug 28, 2014 at 11:06 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
Robert Haas wrote:
Now, in the case where you are setting an overall limit, there is at
least an argument to be made that you can determine the overall rate
of autovacuum-induced I/O activity that the system can tolerate, and
set your limits to stay within that budget, and then let the system
decide how to divide that I/O up between workers. But if you're
overriding a per-table limit, I don't really see how that holds any
water. The system I/O budget doesn't go up just because one
particular table is being vacuumed rather than any other. The only
plausible use case for setting a per-table rate that I can see is when
you actually want the system to use that exact rate for that
particular table.
Yeah, this makes sense to me too -- at least as long as you only have
one such table. But if you happen to have more than one, and due to
some bad luck they happen to be vacuumed concurrently, they will eat a
larger share of your I/O bandwidth budget than you anticipated, which
you might not like.
I think that to control I/O bandwidth there should be a separate
mechanism (maybe similar to what Simon proposed for WAL rate limiting),
rather than changing user-specified values internally when the user
might specifically want those values to be used. This would give more
predictable results which the user can control.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Aug 27, 2014 at 8:27 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Alvaro Herrera wrote:
So my proposal is a bit more complicated. First we introduce the notion
of a single number, to enable sorting and computations: the "delay
equivalent", which is the cost_limit divided by cost_delay.Here's a patch that implements this idea. As you see this is quite a
bit more complicated that Haribabu's proposal.There are two holes in this:
1. if you ALTER DATABASE to change vacuum delay for a database, those
values are not considered in the global equiv delay. I don't think this
is very important and anyway we haven't considered this very much, so
it's okay if we don't handle it.
In the attached patch, I changed the balance_cost function to get the
global cost values.
With this change the above problem can be addressed.
2. If you have a "fast worker" that's only slightly faster than regular
workers, it will become slower in some cases. This is explained in a
FIXME comment in the patch.
I changed the logic as follows to handle these scenarios:
If the fast workers' equiv_delay is more than double the global
equiv_delay, then the global equiv_delay is subtracted from the fast
workers' equiv_delay and the remainder is distributed among the fast
workers.
If the difference between the fast workers' equiv_delay and the global
equiv_delay is less than the global equiv_delay, then both values are
adjusted as below:
fast_equiv_delay -= (diff_equiv_delay) / num_fast_workers
global_equiv_delay -= (diff_equiv_delay) / num_regular_workers
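As a quick sanity check of that rule, here is a standalone sketch
(variable names assumed from the patch) applied to the FIXME scenario
from v3, one regular worker at equiv_delay=10 and one fast worker at 11:

#include <stdio.h>

/*
 * The adjusted subtraction above, applied to the FIXME scenario from
 * the v3 patch: one regular worker (global equiv_delay 10) and one
 * fast worker (equiv_delay 11).  Illustrative arithmetic only.
 */
int
main(void)
{
    float   global_equiv_delay = 10;
    float   fast_equiv_delay = 11;
    int     num_fast_workers = 1;
    int     num_regular_workers = 1;
    float   diff = fast_equiv_delay - global_equiv_delay;

    if (diff < global_equiv_delay)
    {
        /* classes are close: split the difference between them */
        fast_equiv_delay -= diff / num_fast_workers;        /* 11 -> 10 */
        global_equiv_delay -= diff / num_regular_workers;   /* 10 -> 9 */
    }
    else
    {
        /* fast workers much faster: subtract the whole global share */
        fast_equiv_delay -= global_equiv_delay;
    }

    printf("fast=%.0f regular=%.0f\n", fast_equiv_delay, global_equiv_delay);
    return 0;
}

Under v3 the fast worker would have collapsed to equiv_delay=1; here it
keeps 10 while the regular worker drops only to 9.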
Sorry for the late reply. Please let me know your comments.
Regards,
Hari Babu
Fujitsu Australia
Attachments:
per_table_vacuum_para_v4.patch (application/octet-stream)
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***************
*** 223,231 **** typedef struct WorkerInfoData
Oid wi_tableoid;
PGPROC *wi_proc;
TimestampTz wi_launchtime;
! int wi_cost_delay;
! int wi_cost_limit;
int wi_cost_limit_base;
} WorkerInfoData;
typedef struct WorkerInfoData *WorkerInfo;
--- 223,234 ----
Oid wi_tableoid;
PGPROC *wi_proc;
TimestampTz wi_launchtime;
! /* the "base" values are the configured values */
int wi_cost_limit_base;
+ int wi_cost_delay_base;
+ /* these are the ones actually in effect, considering balancing */
+ int wi_cost_limit;
+ int wi_cost_delay;
} WorkerInfoData;
typedef struct WorkerInfoData *WorkerInfo;
***************
*** 248,253 **** typedef enum
--- 251,257 ----
*
* av_signal set by other processes to indicate various conditions
* av_launcherpid the PID of the autovacuum launcher
+ * av_vacuum_cost_* globally configured values for the cost_delay feature
* av_freeWorkers the WorkerInfo freelist
* av_runningWorkers the WorkerInfo non-free queue
* av_startingWorker pointer to WorkerInfo currently being started (cleared by
***************
*** 261,266 **** typedef struct
--- 265,272 ----
{
sig_atomic_t av_signal[AutoVacNumSignals];
pid_t av_launcherpid;
+ int av_vacuum_cost_delay;
+ int av_vacuum_cost_limit;
dlist_head av_freeWorkers;
dlist_head av_runningWorkers;
WorkerInfo av_startingWorker;
***************
*** 296,301 **** static List *get_database_list(void);
--- 302,309 ----
static void rebuild_database_list(Oid newdb);
static int db_comparator(const void *a, const void *b);
static void autovac_balance_cost(void);
+ static int choose_vacuum_cost_delay(AutoVacOpts *avopts);
+ static int choose_vacuum_cost_limit(AutoVacOpts *avopts);
static void do_autovacuum(void);
static void FreeWorkerInfo(int code, Datum arg);
***************
*** 1717,1722 **** FreeWorkerInfo(int code, Datum arg)
--- 1725,1731 ----
MyWorkerInfo->wi_proc = NULL;
MyWorkerInfo->wi_launchtime = 0;
MyWorkerInfo->wi_cost_delay = 0;
+ MyWorkerInfo->wi_cost_delay_base = 0;
MyWorkerInfo->wi_cost_limit = 0;
MyWorkerInfo->wi_cost_limit_base = 0;
dlist_push_head(&AutoVacuumShmem->av_freeWorkers,
***************
*** 1742,1749 **** AutoVacuumUpdateDelay(void)
--- 1751,1760 ----
{
if (MyWorkerInfo)
{
+ LWLockAcquire(AutovacuumLock, LW_SHARED);
VacuumCostDelay = MyWorkerInfo->wi_cost_delay;
VacuumCostLimit = MyWorkerInfo->wi_cost_limit;
+ LWLockRelease(AutovacuumLock);
}
}
***************
*** 1756,1830 **** AutoVacuumUpdateDelay(void)
static void
autovac_balance_cost(void)
{
- /*
- * The idea here is that we ration out I/O equally. The amount of I/O
- * that a worker can consume is determined by cost_limit/cost_delay, so we
- * try to equalize those ratios rather than the raw limit settings.
- *
- * note: in cost_limit, zero also means use value from elsewhere, because
- * zero is not a valid value.
- */
- int vac_cost_limit = (autovacuum_vac_cost_limit > 0 ?
- autovacuum_vac_cost_limit : VacuumCostLimit);
- int vac_cost_delay = (autovacuum_vac_cost_delay >= 0 ?
- autovacuum_vac_cost_delay : VacuumCostDelay);
- double cost_total;
- double cost_avail;
dlist_iter iter;
!
! /* not set? nothing to do */
! if (vac_cost_limit <= 0 || vac_cost_delay <= 0)
return;
! /* caculate the total base cost limit of active workers */
! cost_total = 0.0;
dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
{
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
! if (worker->wi_proc != NULL &&
! worker->wi_cost_limit_base > 0 && worker->wi_cost_delay > 0)
! cost_total +=
! (double) worker->wi_cost_limit_base / worker->wi_cost_delay;
}
- /* there are no cost limits -- nothing to do */
- if (cost_total <= 0)
- return;
/*
! * Adjust cost limit of each active worker to balance the total of cost
! * limit to autovacuum_vacuum_cost_limit.
*/
- cost_avail = (double) vac_cost_limit / vac_cost_delay;
dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
{
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
! if (worker->wi_proc != NULL &&
! worker->wi_cost_limit_base > 0 && worker->wi_cost_delay > 0)
{
- int limit = (int)
- (cost_avail * worker->wi_cost_limit_base / cost_total);
! /*
! * We put a lower bound of 1 on the cost_limit, to avoid division-
! * by-zero in the vacuum code. Also, in case of roundoff trouble
! * in these calculations, let's be sure we don't ever set
! * cost_limit to more than the base value.
! */
! worker->wi_cost_limit = Max(Min(limit,
! worker->wi_cost_limit_base),
! 1);
!
! elog(DEBUG2, "autovac_balance_cost(pid=%u db=%u, rel=%u, cost_limit=%d, cost_limit_base=%d, cost_delay=%d)",
! worker->wi_proc->pid, worker->wi_dboid, worker->wi_tableoid,
! worker->wi_cost_limit, worker->wi_cost_limit_base,
! worker->wi_cost_delay);
}
}
}
/*
* get_database_list
* Return a list of all databases found in pg_database.
*
--- 1767,1958 ----
static void
autovac_balance_cost(void)
{
dlist_iter iter;
! int num_regular_workers = 0;
! int num_fast_workers = 0;
! float4 global_equiv_delay;
! float4 fast_equiv_delay;
! float4 regular_total_equiv_delay;
! float4 fast_total_equiv_delay;
!
! /* not set in this worker? nothing to do */
! if (MyWorkerInfo && (MyWorkerInfo->wi_cost_limit_base <= 0 ||
! MyWorkerInfo->wi_cost_delay_base <= 0))
return;
! AutoVacuumShmem->av_vacuum_cost_limit = choose_vacuum_cost_limit(NULL);
! AutoVacuumShmem->av_vacuum_cost_delay = choose_vacuum_cost_delay(NULL);
!
! /*
! * We use a metric we call "equivalent delay", equal to cost_limit divided
! * by cost_delay, to enable a reasonably simple algorithm to distribute
! * vacuum I/O bandwidth among all active workers. The "global" equiv delay
! * is the value computed from the GUC parameters in effect for this
! * database. This value lets us split workers in two classes: regular workers
! * are those that have an equiv delay less than or equal to global equiv delay,
! * while fast workers are those that have a value greater than global equiv
! * delay.
! */
! global_equiv_delay = AutoVacuumShmem->av_vacuum_cost_limit /
! AutoVacuumShmem->av_vacuum_cost_delay;
!
! /*
! * Find the sum of equiv delay values in each class; also find the equiv
! * delay of the fastest among all the fast workers.
! */
dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
{
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
+ float4 this_equiv_delay;
+
+ /* ignore inactive workers, and workers not using cost_delay */
+ if (worker->wi_proc == NULL ||
+ worker->wi_cost_limit_base <= 0 || worker->wi_cost_delay_base <= 0)
+ continue;
+
+ this_equiv_delay = worker->wi_cost_limit_base / worker->wi_cost_delay_base;
! if (this_equiv_delay <= global_equiv_delay)
! {
! /* A regular worker -- count it and add its equiv delay as such */
! num_regular_workers++;
! regular_total_equiv_delay += this_equiv_delay;
! }
! else
! {
! /*
! * A fast worker. Count and add its equiv delay in a separate
! * total; we also use these to determine the fast_equiv_delay which
! * is the value which we will distribute to all of them, separately
! * from global_equiv_delay.
! */
! if (num_fast_workers == 0)
! fast_equiv_delay = this_equiv_delay;
! else if (this_equiv_delay > fast_equiv_delay)
! fast_equiv_delay = this_equiv_delay;
! num_fast_workers++;
! fast_total_equiv_delay += this_equiv_delay;
! }
! }
!
! /*
! * We shouldn't actually distribute all of global equiv delay to
! * regular workers and all of (fast equiv delay - global equiv delay) to
! * fast workers; that makes fast workers much slower if the fast equiv
! * delay is only slightly faster than regular workers. For example,
! * consider the scenario with one regular worker with equiv_delay=10 and
! * one fast worker with equiv_delay=11; after subtraction the fast worker
! * will have equiv_delay=1). This part needs more fiddling to avoid this
! * problem.
! *
! * Perhaps the way to solve this is to consider that all workers (including
! * fast ones) get their fraction of global delay, and then fast workers get
! * their pro-rated share of fast equiv delay *added* to that. This would
! * make regular workers slower.
! *
! * Idea is to subtract only a fraction of global_equiv_delay from
! * fast_equiv_delay here, rather than all of it (and of course decrease
! * global_equiv_delay accordingly). The fraction would be variable: if
! * fast_equiv_delay is much higher than global_equiv_delay, then subtract
! * all of it; if both values are close enough, subtract a pro-rated fraction
! * according to the number of workers in each class.
! */
! if (num_regular_workers > 0)
! {
! float4 diff_equiv_delay = fast_equiv_delay - global_equiv_delay;
!
! if (diff_equiv_delay < global_equiv_delay)
! {
! fast_equiv_delay -= diff_equiv_delay / num_fast_workers;
! global_equiv_delay -= diff_equiv_delay / num_regular_workers;
! }
! else
! fast_equiv_delay -= global_equiv_delay;
}
/*
! * Now we have all parameters we need; compute the values for individual
! * workers.
*/
dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
{
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
+ float4 this_equiv_delay_base;
+ float4 this_equiv_delay_frac;
+
+ /* again, ignore inactive workers, and workers not using cost_delay */
+ if (worker->wi_proc == NULL ||
+ worker->wi_cost_limit_base <= 0 || worker->wi_cost_delay_base <= 0)
+ continue;
! this_equiv_delay_base = worker->wi_cost_limit_base / worker->wi_cost_delay_base;
!
! /*
! * If it's a regular worker, use pro-rated fraction of global_equiv_delay;
! * otherwise, use pro-rated fraction of fast_total_equiv_delay
! */
! if (this_equiv_delay_base <= global_equiv_delay)
{
! /* A regular worker; use a pro-rated fraction of global_equiv_delay */
! this_equiv_delay_frac =
! this_equiv_delay_base * global_equiv_delay / regular_total_equiv_delay;
}
+ else
+ {
+ /* a fast worker: use a pro-rated fraction of fast_equiv_delay */
+ this_equiv_delay_frac =
+ this_equiv_delay_base * fast_equiv_delay / fast_total_equiv_delay;
+ }
+
+ /*
+ * Convert back into cost_limit and cost_delay values, and set them
+ * into the worker's shmem struct. We put a lower bound of 1 to the
+ * cost_limit, to avoid a division-by-zero in the vacuum code.
+ */
+ worker->wi_cost_limit = Max(this_equiv_delay_frac *
+ worker->wi_cost_delay_base,
+ 1);
+ worker->wi_cost_delay = worker->wi_cost_delay_base;
+
+ elog(LOG, "autovac_balance_cost(pid=%u db=%u, rel=%u, cost_limit=%d, cost_delay=%d)",
+ worker->wi_proc->pid, worker->wi_dboid, worker->wi_tableoid,
+ worker->wi_cost_limit, worker->wi_cost_delay);
}
}
/*
+ * Determine the vacuum_cost_delay value to use: if this is a table and it has
+ * reloptions, use the value from there; otherwise the autovacuum parameter,
+ * unless it's -1 in which case we use plain vacuum_cost_delay.
+ */
+ static int
+ choose_vacuum_cost_delay(AutoVacOpts *avopts)
+ {
+ if (avopts && avopts->vacuum_cost_delay >= 0)
+ return avopts->vacuum_cost_delay;
+ if (autovacuum_vac_cost_delay >= 0)
+ return autovacuum_vac_cost_delay;
+ return VacuumCostDelay;
+ }
+
+ /*
+ * Determine the vacuum_cost_limit value to use: if this is a table and it has
+ * reloptions, use the value from there; otherwise the autovacuum parameter,
+ * unless it's -1 or 0 in which case we use plain vacuum_cost_limit.
+ */
+ static int
+ choose_vacuum_cost_limit(AutoVacOpts *avopts)
+ {
+ /* 0 is an invalid value here, so avoid that */
+ if (avopts && avopts->vacuum_cost_limit > 0)
+ return avopts->vacuum_cost_limit;
+ if (autovacuum_vac_cost_limit > 0)
+ return autovacuum_vac_cost_limit;
+ return VacuumCostLimit;
+ }
+
+ /*
* get_database_list
* Return a list of all databases found in pg_database.
*
***************
*** 2202,2209 **** do_autovacuum(void)
Oid relid = lfirst_oid(cell);
autovac_table *tab;
bool skipit;
- int stdVacuumCostDelay;
- int stdVacuumCostLimit;
dlist_iter iter;
CHECK_FOR_INTERRUPTS();
--- 2330,2335 ----
***************
*** 2272,2302 **** do_autovacuum(void)
MyWorkerInfo->wi_tableoid = relid;
LWLockRelease(AutovacuumScheduleLock);
- /*
- * Remember the prevailing values of the vacuum cost GUCs. We have to
- * restore these at the bottom of the loop, else we'll compute wrong
- * values in the next iteration of autovac_balance_cost().
- */
- stdVacuumCostDelay = VacuumCostDelay;
- stdVacuumCostLimit = VacuumCostLimit;
-
/* Must hold AutovacuumLock while mucking with cost balance info */
LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
/* advertise my cost delay parameters for the balancing algorithm */
MyWorkerInfo->wi_cost_delay = tab->at_vacuum_cost_delay;
MyWorkerInfo->wi_cost_limit = tab->at_vacuum_cost_limit;
MyWorkerInfo->wi_cost_limit_base = tab->at_vacuum_cost_limit;
/* do a balance */
autovac_balance_cost();
- /* set the active cost parameters from the result of that */
- AutoVacuumUpdateDelay();
-
/* done */
LWLockRelease(AutovacuumLock);
/* clean up memory before each iteration */
MemoryContextResetAndDeleteChildren(PortalContext);
--- 2398,2421 ----
MyWorkerInfo->wi_tableoid = relid;
LWLockRelease(AutovacuumScheduleLock);
/* Must hold AutovacuumLock while mucking with cost balance info */
LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
/* advertise my cost delay parameters for the balancing algorithm */
MyWorkerInfo->wi_cost_delay = tab->at_vacuum_cost_delay;
+ MyWorkerInfo->wi_cost_delay_base = tab->at_vacuum_cost_delay;
MyWorkerInfo->wi_cost_limit = tab->at_vacuum_cost_limit;
MyWorkerInfo->wi_cost_limit_base = tab->at_vacuum_cost_limit;
/* do a balance */
autovac_balance_cost();
/* done */
LWLockRelease(AutovacuumLock);
+ /* set the active cost parameters from the result of that */
+ AutoVacuumUpdateDelay();
+
/* clean up memory before each iteration */
MemoryContextResetAndDeleteChildren(PortalContext);
***************
*** 2381,2390 **** deleted:
LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
MyWorkerInfo->wi_tableoid = InvalidOid;
LWLockRelease(AutovacuumLock);
-
- /* restore vacuum cost GUCs for the next iteration */
- VacuumCostDelay = stdVacuumCostDelay;
- VacuumCostLimit = stdVacuumCostLimit;
}
/*
--- 2500,2505 ----
***************
*** 2532,2550 **** table_recheck_autovac(Oid relid, HTAB *table_toast_map,
* defaults, autovacuum's own first and plain vacuum second.
*/
! /* -1 in autovac setting means use plain vacuum_cost_delay */
! vac_cost_delay = (avopts && avopts->vacuum_cost_delay >= 0)
! ? avopts->vacuum_cost_delay
! : (autovacuum_vac_cost_delay >= 0)
! ? autovacuum_vac_cost_delay
! : VacuumCostDelay;
!
! /* 0 or -1 in autovac setting means use plain vacuum_cost_limit */
! vac_cost_limit = (avopts && avopts->vacuum_cost_limit > 0)
! ? avopts->vacuum_cost_limit
! : (autovacuum_vac_cost_limit > 0)
! ? autovacuum_vac_cost_limit
! : VacuumCostLimit;
/* these do not have autovacuum-specific settings */
freeze_min_age = (avopts && avopts->freeze_min_age >= 0)
--- 2647,2654 ----
* defaults, autovacuum's own first and plain vacuum second.
*/
! vac_cost_delay = choose_vacuum_cost_delay(avopts);
! vac_cost_limit = choose_vacuum_cost_limit(avopts);
/* these do not have autovacuum-specific settings */
freeze_min_age = (avopts && avopts->freeze_min_age >= 0)
***************
*** 2935,2940 **** AutoVacuumShmemInit(void)
--- 3039,3046 ----
dlist_init(&AutoVacuumShmem->av_freeWorkers);
dlist_init(&AutoVacuumShmem->av_runningWorkers);
AutoVacuumShmem->av_startingWorker = NULL;
+ AutoVacuumShmem->av_vacuum_cost_limit = 0;
+ AutoVacuumShmem->av_vacuum_cost_delay = 0;
worker = (WorkerInfo) ((char *) AutoVacuumShmem +
MAXALIGN(sizeof(AutoVacuumShmemStruct)));
On 8/28/14, 12:18 PM, Robert Haas wrote:
At least in situations that I've encountered, it's typical to be able
to determine the frequency with which a given table needs to be
vacuumed to avoid runaway bloat, and from that you can work backwards
to figure out how fast you must process it in MB/s, and from there you
can work backwards to figure out what cost delay will achieve that
effect. But if the system tinkers with the cost delay under the hood,
then you're vacuuming at a different (slower) rate and, of course, the
table bloats.
The last time I took a whack at this, I worked toward making all of the
parameters operate in terms of target MB/s, for exactly this style of
thinking and goal. Those converted into the same old mechanism under
the hood and I got the math right to give the same behavior for the
simple cases, but that could have been simplified eventually. I
consider that line of thinking to be the only useful one here.
The answer I like to these values that don't inherit as expected in the
GUC tree is to nuke that style of interface altogether in favor of
a simpler bandwidth-measured one, then perhaps add multiple QoS levels.
Certainly no interest in treating the overly complicated innards of cost
computation as a bug and fixing them with even more complicated behavior.
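As a rough illustration of what such a bandwidth-measured knob might
translate to internally -- an assumption-laden sketch, not anything
posted in the thread -- a target MB/s maps back onto the existing cost
machinery like this:

#include <stdio.h>

/*
 * Convert a target vacuum rate in MB/s back into the old cost_limit
 * knob.  Assumes the 9.4-era defaults (20ms cost_delay, page-miss
 * cost of 10) and that every page touched is a miss; real bookkeeping
 * would be messier.
 */
static int
cost_limit_for_rate(double target_mb_per_sec)
{
    double  pages_per_sec = target_mb_per_sec * 1024 * 1024 / 8192;
    double  units_per_sec = pages_per_sec * 10;     /* vacuum_cost_page_miss */
    double  intervals_per_sec = 1000.0 / 20;        /* 20ms cost_delay */

    return (int) (units_per_sec / intervals_per_sec);
}

int
main(void)
{
    printf("10 MB/s -> cost_limit %d\n", cost_limit_for_rate(10.0));    /* 256 */
    return 0;
}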
The part of this I was trying hard to find time to do myself by the next
CF was a better bloat measure tool needed to actually see the problem
better. With that in hand, and some nasty test cases, I wanted to come
back to simplified MB/s vacuum parameters with easier to understand
sharing rules again. If other people are hot to go on that topic, I
don't care if I actually do the work; I just have a pretty clear view of
what I think people want.
The only plausible use case for setting a per-table rate that I can
see is when you actually want the system to use that exact rate for
that particular table.
That's the main one, for these must-run-on-schedule-or-else jobs.
Yes.
On 8/29/14, 9:45 AM, Alvaro Herrera wrote:
Anyway it seems to me maybe there is room for a new table storage
parameter, say autovacuum_do_balance, which determines whether the table
participates in the balancing program or not.
If that eliminates some of the hairy edge cases, sure.
A useful concept to consider is having a soft limit that most things work
against, along with a total hard limit for the server. When one of
these tight-schedule queries with !autovacuum_do_balance starts, it
must run at its designed speed with no concern for anyone else. Which
means:
a) Its bandwidth gets pulled out of the regular, soft-limit numbers
until it's done. Last time I had one of these jobs, once the big
important boys were running, everyone else in the regular shared set
was capped at vacuum_cost_limit=5 worth of work: just enough to keep
up with system catalog things and, over the course of many hours, process
small tables.
b) If you try to submit multiple locked-rate jobs at once and the total
goes over the hard limit, they have to just be aborted; a sketch of that
check follows this list. If the rush of users comes back at 8AM, and you
can clean the table up by then if you give it 10MB/s, what you cannot do
is let some other user decrease your rate such that you're unfinished at
8AM. Then you'll have aggressive AV competing against the user load you
were trying to prepare for. It's better to just throw a serious error
that forces someone to look at the hard limit budget and adjust the
schedule instead. The systems with this sort of problem are getting
cleaned up every single day, almost continuously; missing a day is not
bad as long as it's noted and fixed again before the next cleanup window.
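A hypothetical shape for that admission check -- none of these names
exist in PostgreSQL; this only illustrates rejecting a locked-rate job up
front rather than silently slowing the whole set down:

#include <stdbool.h>

typedef struct RateJob
{
	double	mb_per_sec;		/* rate the job was designed to run at */
} RateJob;

/* Return false if admitting 'incoming' would bust the hard budget. */
static bool
admit_locked_rate_job(const RateJob *running, int nrunning,
					  const RateJob *incoming, double hard_limit_mbps)
{
	double	total = incoming->mb_per_sec;

	for (int i = 0; i < nrunning; i++)
		total += running[i].mb_per_sec;

	/* over budget: caller should throw a serious error, not degrade */
	return total <= hard_limit_mbps;
}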
Robert Haas wrote:
I favor option (a). There's something to be said for your proposal
in terms of logical consistency with what we have now, but to be
honest I'm not sure it's the behavior anyone wants (I would welcome
more feedback on what people actually want). I think we should view
an attempt to set a limit for a particular table as a way to control
the rate at which that table is vacuumed - period.
After re-reading this whole thread one more time, I think I have come to
agree with you and Amit here, because not only is it simpler to
implement, but it is also simpler to document. Per Greg Smith's opinion
elsewhere in the thread, it seems that for end users it doesn't make
sense to make the already complicated mechanism even more complicated.
So in essence what we're going to do is that the balance mechanism
considers only tables that don't have per-table configuration options;
for those that do, we will use the values configured there without any
changes.
I'll see about implementing this and making sure it finds its way to
9.4beta3.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Sep 30, 2014 at 6:16 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
I favor option (a). There's something to be said for your proposal
in terms of logical consistency with what we have now, but to be
honest I'm not sure it's the behavior anyone wants (I would welcome
more feedback on what people actually want). I think we should view
an attempt to set a limit for a particular table as a way to control
the rate at which that table is vacuumed - period.
After re-reading this whole thread one more time, I think I have come to
agree with you and Amit here, because not only is it simpler to
implement, but it is also simpler to document. Per Greg Smith's opinion
elsewhere in the thread, it seems that for end users it doesn't make
sense to make the already complicated mechanism even more complicated.
So in essence what we're going to do is that the balance mechanism
considers only tables that don't have per-table configuration options;
for those that do, we will use the values configured there without any
changes.
I'll see about implementing this and making sure it finds its way to
9.4beta3.
Cool!
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Alvaro Herrera wrote:
So in essence what we're going to do is that the balance mechanism
considers only tables that don't have per-table configuration options;
for those that do, we will use the values configured there without any
changes.
I'll see about implementing this and making sure it finds its way to
9.4beta3.
Here's a patch that makes it work as proposed.
How do people feel about back-patching this? On one hand it seems
there's a lot of fear of changing autovacuum behavior in back branches,
because for many production systems it has carefully been tuned; on the
other hand, it seems hard to believe that anyone has tuned the system to
work sanely given how insanely per-table options behave in the current
code.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment: per_table_vacuum_para_v5.patch (text/x-diff; charset=us-ascii)
*** a/doc/src/sgml/maintenance.sgml
--- b/doc/src/sgml/maintenance.sgml
***************
*** 792,801 **** analyze threshold = analyze base threshold + analyze scale factor * number of tu
</para>
<para>
! When multiple workers are running, the cost limit is
<quote>balanced</quote> among all the running workers, so that the
! total impact on the system is the same, regardless of the number
! of workers actually running.
</para>
</sect2>
</sect1>
--- 792,804 ----
</para>
<para>
! When multiple workers are running, the cost delay parameters are
<quote>balanced</quote> among all the running workers, so that the
! total I/O impact on the system is the same regardless of the number
! of workers actually running. However, any workers processing tables whose
! <literal>autovacuum_vacuum_cost_delay</> or
! <literal>autovacuum_vacuum_cost_limit</> have been set are not considered
! in the balancing algorithm.
</para>
</sect2>
</sect1>
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***************
*** 193,198 **** typedef struct autovac_table
--- 193,199 ----
int at_multixact_freeze_table_age;
int at_vacuum_cost_delay;
int at_vacuum_cost_limit;
+ bool at_dobalance;
bool at_wraparound;
char *at_relname;
char *at_nspname;
***************
*** 223,228 **** typedef struct WorkerInfoData
--- 224,230 ----
Oid wi_tableoid;
PGPROC *wi_proc;
TimestampTz wi_launchtime;
+ bool wi_dobalance;
int wi_cost_delay;
int wi_cost_limit;
int wi_cost_limit_base;
***************
*** 1716,1721 **** FreeWorkerInfo(int code, Datum arg)
--- 1718,1724 ----
MyWorkerInfo->wi_tableoid = InvalidOid;
MyWorkerInfo->wi_proc = NULL;
MyWorkerInfo->wi_launchtime = 0;
+ MyWorkerInfo->wi_dobalance = false;
MyWorkerInfo->wi_cost_delay = 0;
MyWorkerInfo->wi_cost_limit = 0;
MyWorkerInfo->wi_cost_limit_base = 0;
***************
*** 1776,1788 **** autovac_balance_cost(void)
if (vac_cost_limit <= 0 || vac_cost_delay <= 0)
return;
! /* caculate the total base cost limit of active workers */
cost_total = 0.0;
dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
{
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
if (worker->wi_proc != NULL &&
worker->wi_cost_limit_base > 0 && worker->wi_cost_delay > 0)
cost_total +=
(double) worker->wi_cost_limit_base / worker->wi_cost_delay;
--- 1779,1792 ----
if (vac_cost_limit <= 0 || vac_cost_delay <= 0)
return;
! /* calculate the total base cost limit of participating active workers */
cost_total = 0.0;
dlist_foreach(iter, &AutoVacuumShmem->av_runningWorkers)
{
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
if (worker->wi_proc != NULL &&
+ worker->wi_dobalance &&
worker->wi_cost_limit_base > 0 && worker->wi_cost_delay > 0)
cost_total +=
(double) worker->wi_cost_limit_base / worker->wi_cost_delay;
***************
*** 1801,1806 **** autovac_balance_cost(void)
--- 1805,1811 ----
WorkerInfo worker = dlist_container(WorkerInfoData, wi_links, iter.cur);
if (worker->wi_proc != NULL &&
+ worker->wi_dobalance &&
worker->wi_cost_limit_base > 0 && worker->wi_cost_delay > 0)
{
int limit = (int)
***************
*** 2285,2290 **** do_autovacuum(void)
--- 2290,2296 ----
/* advertise my cost delay parameters for the balancing algorithm */
MyWorkerInfo->wi_cost_delay = tab->at_vacuum_cost_delay;
+ MyWorkerInfo->wi_dobalance = tab->at_dobalance;
MyWorkerInfo->wi_cost_limit = tab->at_vacuum_cost_limit;
MyWorkerInfo->wi_cost_limit_base = tab->at_vacuum_cost_limit;
***************
*** 2579,2584 **** table_recheck_autovac(Oid relid, HTAB *table_toast_map,
--- 2585,2598 ----
tab->at_relname = NULL;
tab->at_nspname = NULL;
tab->at_datname = NULL;
+
+ /*
+ * If any of the cost delay parameters has been set individually for
+ * this table, disable the balancing algorithm.
+ */
+ tab->at_dobalance =
+ !(avopts && (avopts->vacuum_cost_limit > 0 ||
+ avopts->vacuum_cost_delay > 0));
}
heap_freetuple(classTup);
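Putting the hunks together, the post-patch behavior amounts to something
like this standalone sketch. Names follow the diff; the division and the
final clamp mirror the parts of autovac_balance_cost() that this excerpt
doesn't show, so treat those details as my reading of the code rather
than the patch text:

typedef struct Worker
{
	int		dobalance;			/* wi_dobalance: no per-table cost options */
	int		cost_limit_base;	/* wi_cost_limit_base */
	int		cost_delay;			/* wi_cost_delay, in ms */
	int		cost_limit;			/* wi_cost_limit: output of balancing */
} Worker;

static void
balance(Worker *w, int n, int vac_cost_limit, int vac_cost_delay)
{
	double	cost_avail = (double) vac_cost_limit / vac_cost_delay;
	double	cost_total = 0.0;

	/* only participating workers contribute to the shared budget */
	for (int i = 0; i < n; i++)
		if (w[i].dobalance && w[i].cost_limit_base > 0 && w[i].cost_delay > 0)
			cost_total += (double) w[i].cost_limit_base / w[i].cost_delay;

	for (int i = 0; i < n; i++)
	{
		if (!w[i].dobalance)
		{
			/* per-table settings are used verbatim, no flattening */
			w[i].cost_limit = w[i].cost_limit_base;
			continue;
		}
		if (cost_total <= 0.0)
			continue;
		int limit = (int) (cost_avail * w[i].cost_limit_base / cost_total);
		/* clamp to [1, base], as the real code does */
		if (limit < 1)
			limit = 1;
		if (limit > w[i].cost_limit_base)
			limit = w[i].cost_limit_base;
		w[i].cost_limit = limit;
	}
}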
On Thu, Oct 2, 2014 at 9:54 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Alvaro Herrera wrote:
So in essence what we're going to do is that the balance mechanism
considers only tables that don't have per-table configuration options;
for those that do, we will use the values configured there without any
changes.
I'll see about implementing this and making sure it finds its way to
9.4beta3.
Here's a patch that makes it work as proposed.
How do people feel about back-patching this? On one hand it seems
there's a lot of fear of changing autovacuum behavior in back branches,
because for many production systems it has carefully been tuned; on the
other hand, it seems hard to believe that anyone has tuned the system to
work sanely given how insanely per-table options behave in the current
code.
I agree with both of those arguments. I have run into very few
customers who have used the autovacuum settings to customize behavior
for particular tables, and anyone who hasn't should see no change
(right?), so my guess is that the practical impact of the change will
be pretty limited. On the other hand, it's a clear behavior change.
Someone could have set the per-table limit to something enormous and
never suffered from that setting because it has basically no effect as
things stand right now today; and that person might get an unpleasant
surprise when they update.
I would at least back-patch it to 9.4. I could go either way on
whether to back-patch into older branches. I lean mildly in favor of
it at the moment, but with considerable trepidation.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote:
On Thu, Oct 2, 2014 at 9:54 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Alvaro Herrera wrote:
So in essence what we're going to do is that the balance mechanism
considers only tables that don't have per-table configuration options;
for those that do, we will use the values configured there without any
changes.
I'll see about implementing this and making sure it finds its way to
9.4beta3.
Here's a patch that makes it work as proposed.
How do people feel about back-patching this? On one hand it seems
there's a lot of fear of changing autovacuum behavior in back branches,
because for many production systems it has carefully been tuned; on the
other hand, it seems hard to believe that anyone has tuned the system to
work sanely given how insanely per-table options behave in the current
code.
I agree with both of those arguments. I have run into very few
customers who have used the autovacuum settings to customize behavior
for particular tables, and anyone who hasn't should see no change
(right?), so my guess is that the practical impact of the change will
be pretty limited. On the other hand, it's a clear behavior change.
Someone could have set the per-table limit to something enormous and
never suffered from that setting because it has basically no effect as
things stand right now today; and that person might get an unpleasant
surprise when they update.
I would at least back-patch it to 9.4. I could go either way on
whether to back-patch into older branches. I lean mildly in favor of
it at the moment, but with considerable trepidation.
I'm fine with putting it into 9.4. I'm not sure that I see the value in
changing the back-branches and then having to deal with the "well, if
you're on 9.3.5 then X, but if you're on 9.3.6 then Y" or having to
figure out how to deal with the documentation for this.
Has there been any thought as to what pg_upgrade should do..?
Thanks,
Stephen
Stephen Frost wrote:
* Robert Haas (robertmhaas@gmail.com) wrote:
I agree with both of those arguments. I have run into very few
customers who have used the autovacuum settings to customize behavior
for particular tables, and anyone who hasn't should see no change
(right?), so my guess is that the practical impact of the change will
be pretty limited. On the other hand, it's a clear behavior change.
Someone could have set the per-table limit to something enormous and
never suffered from that setting because it has basically no effect as
things stand right now today; and that person might get an unpleasant
surprise when they update.
I would at least back-patch it to 9.4. I could go either way on
whether to back-patch into older branches. I lean mildly in favor of
it at the moment, but with considerable trepidation.
I'm fine with putting it into 9.4. I'm not sure that I see the value in
changing the back-branches and then having to deal with the "well, if
you're on 9.3.5 then X, but if you're on 9.3.6 then Y" or having to
figure out how to deal with the documentation for this.
Well, the value obviously is that we would fix the bug Mark Kirkwood
reported, the one that started this thread.
Basically, if you are on 9.3.5 or earlier any per-table options for
autovacuum cost delay will misbehave (meaning: any such table will be
processed with settings flattened according to balancing of the standard
options, _not_ the configured ones). If you are on 9.3.6 or newer they
will behave as described in the docs.
Has there been any thought as to what pg_upgrade should do..?
Yes, I'm thinking there's nothing it should do.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera wrote:
Basically, if you are on 9.3.5 or earlier any per-table options for
autovacuum cost delay will misbehave (meaning: any such table will be
processed with settings flattened according to balancing of the standard
options, _not_ the configured ones). If you are on 9.3.6 or newer they
will behave as described in the docs.
Another thing to note is that if you have configured a table to have
cost_limit *less* than the default (say 150 instead of the default 200),
the balance system will again break that and process the table at 200
instead; in other words, the balancing system has completely broken the
ability to tweak the cost system for individual tables in autovacuum.
With the current code you can change the *ratios* of cost_limit that
would be applied (by setting one table to 5000 and another to 150, when
they are vacuumed in parallel you will get them at 194 and 5 instead of
the values you set, but the total limit allowed will never be exceeded
no matter what you do.)
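Those figures check out against the balancing formula, assuming I have
autovac_balance_cost() right (global defaults: limit 200, delay 20ms):

#include <stdio.h>

int
main(void)
{
	double	base[] = {5000.0, 150.0};	/* per-table cost_limit settings */
	double	delay[] = {20.0, 20.0};		/* both at the default cost_delay */
	double	cost_avail = 200.0 / 20.0;	/* global budget, points per ms */
	double	cost_total = 0.0;

	for (int i = 0; i < 2; i++)
		cost_total += base[i] / delay[i];	/* 257.5 */

	/* prints 194 and 5, matching the example above */
	for (int i = 0; i < 2; i++)
		printf("table %d: %d\n", i,
			   (int) (cost_avail * base[i] / cost_total));
	return 0;
}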
With the v5 patch, the example tables above will be vacuumed at exactly
5000 and 150 instead. The more complex patch I produced earlier would
have them vacuumed at something like 4900 and 100 instead, so you
wouldn't exceed the total of 5000. I think there is some value to that
idea, but it seems the complexity of managing this is too high.
The ill effect of the patch is that it will be possible to have
autovacuum utilize a much larger fraction of the I/O bandwidth. This
was also possible before, but only by setting the cost delay to 0 (in
effect disabling throttling for the table). If somebody is thinking
that you could mount an attack on a server by having tables that use all
the I/O bandwidth, well -- it's not a new problem so I don't think we
need to solve it with this patch.
I am rather surprised that nobody has reported this problem before. I
am now of the mind that this is clearly a bug that should be fixed all
the way back.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
Alvaro Herrera wrote:
Basically, if you are on 9.3.5 or earlier any per-table options for
autovacuum cost delay will misbehave (meaning: any such table will be
processed with settings flattened according to balancing of the standard
options, _not_ the configured ones). If you are on 9.3.6 or newer they
will behave as described in the docs.
Another thing to note is that if you have configured a table to have
cost_limit *less* than the default (say 150 instead of the default 200),
the balance system will again break that and process the table at 200
instead; in other words, the balancing system has completely broken the
ability to tweak the cost system for individual tables in autovacuum.
That's certainly pretty ugly.
With the v5 patch, the example tables above will be vacuumed at exactly
5000 and 150 instead. The more complex patch I produced earlier would
have them vacuumed at something like 4900 and 100 instead, so you
wouldn't exceed the total of 5000. I think there is some value to that
idea, but it seems the complexity of managing this is too high.
Agreed.
I am rather surprised that nobody has reported this problem before. I
am now of the mind that this is clearly a bug that should be fixed all
the way back.
I'm coming around to that also; however, should we worry about users who
set per-table settings and then simply forgot about them? I suppose
that won't matter too much unless the table is really active, and if it
is, they've probably already set it to zero.
Thanks,
Stephen
Stephen Frost wrote:
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
I am rather surprised that nobody has reported this problem before. I
am now of the mind that this is clearly a bug that should be fixed all
the way back.
I'm coming around to that also; however, should we worry about users who
set per-table settings and then simply forgot about them? I suppose
that won't matter too much unless the table is really active, and if it
is, they've probably already set it to zero.
Right.
For the cases where it's been set and forgotten, perhaps we can have an
item in release notes to tell people to look into tables with the
parameters set in pg_class.reloptions (to any value different from zero)
and to look for performance differences from previous versions where they
are set.
I have pushed this now, backpatched all the way back to 9.0.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services