RFC: changing autovacuum_naptime semantics

Started by Alvaro Herrera almost 19 years ago · 12 messages
#1Alvaro Herrera
alvherre@commandprompt.com

Hackers,

I want to propose some very simple changes to autovacuum in order to
move forward (a bit):

1. autovacuum_naptime semantics
2. limiting the number of workers: global, per database, per tablespace?

I still haven't received the magic bullet to solve the hot table
problem, but these at least mean we continue doing *something*.

Changing autovacuum_naptime semantics

Are we agreed on changing autovacuum_naptime semantics? The idea is to
make it per-database instead of the current per-cluster, i.e., a "nap"
would be the minimum time that passes between starting one worker in a
database and starting another worker in the same database.

Currently, naptime is the time elapsed between two worker runs across
all databases. So if you have 15 databases, autovacuuming each one
takes place every 15*naptime.

Eventually, we could have per-database naptime defined in pg_database,
and do away with the autovacuum_naptime GUC param (or maybe keep it as a
default value). Say for database D1 you want to have workers every 60
seconds but for database D2 you want 1 hour.
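The difference between the two semantics can be sketched roughly like this (plain illustrative Python, not PostgreSQL source; the function names are mine):

```python
# Illustrative sketch only -- not PostgreSQL source code. How often a
# single database gets visited by autovacuum under each semantics.

def old_interval(naptime_s, n_databases):
    # Current (per-cluster) semantics: one worker run per naptime,
    # cycling over all databases, so each database waits n * naptime.
    return naptime_s * n_databases

def new_interval(naptime_s, n_databases):
    # Proposed (per-database) semantics: naptime is the minimum gap
    # between two workers starting in the *same* database.
    return naptime_s
```

With 15 databases and a 60-second naptime, the old semantics visit each database only every 900 seconds, while the new semantics visit each one every 60 seconds.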

Question:
Is everybody OK with changing the autovacuum_naptime semantics?

Limiting the number of workers

I was originally proposing having a GUC parameter which would limit the
cluster-wide maximum number of workers. Additionally we could have a
per-database limit (stored in a pg_database column), being simple to
implement. Josh Drake proposed getting rid of the GUC param, saying
that it would confuse users to set the per-database limit to some higher
value than the GUC setting and then find the lower limit enforced
(presumably because of being unaware of it).

The problem is that we need to set shared memory up for workers, so we
really need a hard limit and it must be global. Thus the GUC param is
not optional.
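The shared-memory constraint can be sketched like so (hypothetical Python, purely illustrative; the real slots would be a fixed-size array in shared memory, sized once at postmaster start):

```python
# Illustrative sketch only -- not PostgreSQL source. The point: worker
# slots live in shared memory, which is allocated once at server start,
# so the cluster-wide cap must be a fixed, globally known (GUC) value.

class SharedWorkerArray:
    def __init__(self, max_workers):
        # Analogous to sizing a shared-memory struct at startup;
        # the array cannot grow later, hence the hard global limit.
        self.slots = [None] * max_workers

    def acquire(self, dbname):
        """Take a free slot for a new worker; None if the hard limit
        has been reached."""
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = dbname
                return i
        return None

    def release(self, i):
        self.slots[i] = None
```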

Other people also proposed having a per-tablespace limit. This would
make a lot of sense, tablespaces being the natural I/O units. However,
I'm not sure it would be easy to implement, because you can put half
of database D1 and half of database D2 in tablespace T1, and the two
other halves in tablespace T2. Then enforcing the limit becomes rather
complicated and will probably mean putting a worker to sleep. I think
it makes more sense to skip implementing per-tablespace limits for now,
and have a plan to put per-tablespace IO throttles in the future.

Questions:
Is everybody OK with not putting a per-tablespace worker limit?
Is everybody OK with putting per-database worker limits on a pg_database
column?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: RFC: changing autovacuum_naptime semantics

Alvaro Herrera <alvherre@commandprompt.com> writes:

Is everybody OK with changing the autovacuum_naptime semantics?

It seems already different from 8.2, so no objection to further change.

Is everybody OK with not putting a per-tablespace worker limit?
Is everybody OK with putting per-database worker limits on a pg_database
column?

I don't think we need a new pg_database column. If it's a GUC you can
do ALTER DATABASE SET, no? Or was that what you meant?

regards, tom lane

#3Galy Lee
lee.galy@oss.ntt.co.jp
In reply to: Alvaro Herrera (#1)
Re: RFC: changing autovacuum_naptime semantics

Alvaro,

Alvaro Herrera wrote:

I still haven't received the magic bullet to solve the hot table
problem, but these at least mean we continue doing *something*.

Could you tell us about your current plan or ideas for autovacuum
improvements in 8.3? And what is the roadmap for autovacuum in 8.4?

Thanks,

Galy Lee
lee.galy _at_ ntt.oss.co.jp
NTT Open Source Software Center

#4Jim Nasby
decibel@decibel.org
In reply to: Alvaro Herrera (#1)
Re: RFC: changing autovacuum_naptime semantics

On Mar 7, 2007, at 4:00 PM, Alvaro Herrera wrote:

Is everybody OK with putting per-database worker limits on a
pg_database
column?

I'm worried that we would live to regret such a limit. I can't really
see any reason to limit how many vacuums are occurring in a database,
because there's no limiting factor there; you're either going to be
IO bound (per-tablespace), or *maybe* CPU-bound (perhaps the
Greenplum folks could enlighten us as to whether they run into vacuum
being CPU-bound on thumpers).

Changing the naptime behavior to be database related makes perfect
sense, because the minimum XID you have to worry about is a per-
database thing; I just don't see limiting the number of vacuums as
being per-database, though. I'm also skeptical that we'll be able to
come up with a good way to limit the number of backends until we get
the hot table issue addressed. Perhaps a decent compromise for now
would be to limit how many 'small table' vacuums could run on each
tablespace, and then limit how many 'unlimited table size' vacuums
could run on each tablespace, where 'small table' would probably have
to be configurable. I don't think it's the best final solution, but
it should at least solve the immediate need.
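Jim's compromise might be sketched roughly like this (hypothetical names and thresholds, purely illustrative; "small table" here is an arbitrary page-count cutoff standing in for the configurable setting he mentions):

```python
# Illustrative sketch only -- not PostgreSQL source. Per-tablespace
# slots, split between "small table" vacuums and unrestricted ones.

SMALL_TABLE_PAGES = 1000  # the configurable "small table" cutoff (assumption)

class TablespaceSlots:
    def __init__(self, max_small, max_large):
        self.max_small = max_small  # cap on concurrent small-table vacuums
        self.max_large = max_large  # cap on unlimited-size vacuums
        self.small = 0
        self.large = 0

    def try_start(self, table_pages):
        """Admit a vacuum on this tablespace if its size class still
        has a free slot; otherwise refuse (caller would wait)."""
        if table_pages <= SMALL_TABLE_PAGES:
            if self.small < self.max_small:
                self.small += 1
                return True
        elif self.large < self.max_large:
            self.large += 1
            return True
        return False
```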
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#5Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#2)
Re: RFC: changing autovacuum_naptime semantics

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Is everybody OK with not putting a per-tablespace worker limit?
Is everybody OK with putting per-database worker limits on a pg_database
column?

I don't think we need a new pg_database column. If it's a GUC you can
do ALTER DATABASE SET, no? Or was that what you meant?

No, that doesn't work unless we save the datconfig column to the
pg_database flatfile, because it's the launcher (which is not connected)
who needs to read it. Same thing with a hypothetical per-database
naptime. The launcher would also need to parse it, which is not ideal
(though not a dealbreaker either).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#6Alvaro Herrera
alvherre@commandprompt.com
In reply to: Galy Lee (#3)
Re: RFC: changing autovacuum_naptime semantics

Galy Lee wrote:

Hi,

Alvaro Herrera wrote:

I still haven't received the magic bullet to solve the hot table
problem, but these at least mean we continue doing *something*.

Could you tell us about your current plan or ideas for autovacuum
improvements in 8.3? And what is the roadmap for autovacuum in 8.4?

Things I want to do for 8.3:

- Make use of the launcher/worker stuff, that is, allow multiple
autovacuum processes in parallel. With luck we'll find out how to
deal with hot tables.

Things I'm not sure we'll be able to have in 8.3, in which case I'll get
to them for early 8.4:

- The maintenance window stuff, i.e., being able to throttle workers
depending on a user-defined schedule.

8.4 material:

- per-tablespace throttling, coordinating IO from multiple workers

I don't have anything else as detailed as a "plan". If you have
suggestions, I'm all ears.

Now regarding your restartable vacuum work. I think that stopping a
vacuum at some point and being able to restart it later is very cool and
may get you some hot chicks, but I'm not sure it's really useful. I
think it makes more sense to do something like throttling an ongoing
vacuum to a reduced IO rate, if the maintenance window closes. So if
you're in the middle of a heap scan and the maintenance window closes,
you immediately stop the scan and go to the index cleanup phase, *at a
reduced IO rate*. This way, the user will be able to get the benefits
of vacuuming in the not-too-distant future, without requiring the
maintenance window to open again, but without the heavy IO impact that
was allowed during the maintenance window.

Does this make sense?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#7Galy Lee
lee.galy@oss.ntt.co.jp
In reply to: Alvaro Herrera (#6)
Re: RFC: changing autovacuum_naptime semantics

Alvaro Herrera wrote:

I don't have anything else as detailed as a "plan". If you have
suggestions, I'm all ears.

Cool, thanks for the update. :) We also have some new ideas on the
improvement of autovacuum now. I will raise it up later.

Now regarding your restartable vacuum work.
Does this make sense?

I also have reached a similar conclusion now. Thank you.

Regards
Galy

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#6)
Re: RFC: changing autovacuum_naptime semantics

Alvaro Herrera <alvherre@commandprompt.com> writes:

Now regarding your restartable vacuum work. I think that stopping a
vacuum at some point and being able to restart it later is very cool and
may get you some hot chicks, but I'm not sure it's really useful.

Too true :-(

I think it makes more sense to do something like throttling an ongoing
vacuum to a reduced IO rate, if the maintenance window closes. So if
you're in the middle of a heap scan and the maintenance window closes,
you immediately stop the scan and go to the index cleanup phase, *at a
reduced IO rate*.

Er, why not just finish out the scan at the reduced I/O rate? Any sort
of "abort" behavior is going to create net inefficiency, eg doing an
index scan to remove only a few tuples. ISTM that the vacuum ought to
just continue along its existing path at a slower I/O rate.

regards, tom lane

#9Galy Lee
lee.galy@oss.ntt.co.jp
In reply to: Tom Lane (#8)
Re: RFC: changing autovacuum_naptime semantics

Tom Lane wrote:

Er, why not just finish out the scan at the reduced I/O rate? Any sort

Sometimes you may need to vacuum a large table in the maintenance window
and a hot table during service time. If vacuuming the hot table does not
eat too much foreground resource, then you can vacuum the large table at
a lower IO rate outside the maintenance window; but if vacuuming the hot
table is overloading the system, then the launcher had better stop the
long-running vacuum outside the maintenance window.

But I am not insisting on the stop-start feature at this moment.
Changing the cost delay dynamically sounds more reasonable. We can use
it to balance the total I/O of workers during service time or maintenance
time. It is not so difficult to achieve this by leveraging autovacuum's
shared memory.
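The dynamic balancing Galy describes might look, in very rough outline, like scaling each worker's cost delay by the number of active workers (illustrative sketch only, not the actual autovacuum code; the proportional rule is my assumption):

```python
# Illustrative sketch only -- not PostgreSQL source. Keep the combined
# I/O rate of all running vacuum workers roughly constant by stretching
# each worker's cost delay as more workers start.

def balanced_cost_delay(base_delay_ms, n_workers):
    # Each worker sleeps proportionally longer when more workers run,
    # so the sum of their I/O rates stays near the single-worker rate.
    return base_delay_ms * max(n_workers, 1)
```

For example, with a 20 ms base delay, a lone worker keeps its 20 ms delay, while each of three concurrent workers would be slowed to a 60 ms delay.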

Best Regards
Galy Lee

#10Grzegorz Jaskiewicz
gj@pointblue.com.pl
In reply to: Tom Lane (#8)
Re: RFC: changing autovacuum_naptime semantics

On Mar 9, 2007, at 6:42 AM, Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Now regarding your restartable vacuum work. I think that stopping a
vacuum at some point and being able to restart it later is very
cool and
may get you some hot chicks, but I'm not sure it's really useful.

Too true :-(

Yeah.
Wouldn't a 'divide and conquer' kind of approach make it better? I.e.,
let vacuum work on some part of the table/db, then stop, pick up another
part later, vacuum it, and so on?

--
Grzegorz Jaskiewicz
gj@pointblue.com.pl

#11Gregory Stark
stark@enterprisedb.com
In reply to: Tom Lane (#8)
Re: RFC: changing autovacuum_naptime semantics

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Er, why not just finish out the scan at the reduced I/O rate? Any sort
of "abort" behavior is going to create net inefficiency, eg doing an
index scan to remove only a few tuples. ISTM that the vacuum ought to
just continue along its existing path at a slower I/O rate.

I think the main motivation to abort a vacuum scan is so we can switch to some
more urgent scan. So if in the middle of a 1-hour long vacuum of some big
warehouse table we realize that a small hot table is long overdue for a vacuum
we want to be able to remove the tuples we've found so far, switch to the hot
table, and when we don't have more urgent tables to vacuum resume the large
warehouse table vacuum.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#12Alvaro Herrera
alvherre@commandprompt.com
In reply to: Gregory Stark (#11)
Re: RFC: changing autovacuum_naptime semantics

Gregory Stark wrote:

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Er, why not just finish out the scan at the reduced I/O rate? Any sort
of "abort" behavior is going to create net inefficiency, eg doing an
index scan to remove only a few tuples. ISTM that the vacuum ought to
just continue along its existing path at a slower I/O rate.

I think the main motivation to abort a vacuum scan is so we can switch to some
more urgent scan. So if in the middle of a 1-hour long vacuum of some big
warehouse table we realize that a small hot table is long overdue for a vacuum
we want to be able to remove the tuples we've found so far, switch to the hot
table, and when we don't have more urgent tables to vacuum resume the large
warehouse table vacuum.

Why not just let another autovac worker do the hot table?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support