how to plan for vacuum?

Started by Galy Leealmost 19 years ago13 messages
#1Galy Lee
lee.galy@oss.ntt.co.jp

Hi,

For I can not find too much information about how to use vacuum, I want
to ask some general information about the guideline of vacuum planning.

1. How do we know if autovacuum is enough for my application, or should
I setup a vacuum manually from cron for my application?

2. How to set the GUC parameters for autovacuum?
There are two sets of parameters for autovacuum:
- vacuum threshold and scale factor (500/0.2)
  - analyze threshold and scale factor(250/0.1)
Is there any guideline to set these parameters? When does it need to
change the default values?
 
3. How to tune cost-based delay vacuum?
I had searched in performance list; it seems that most of the practices
are based on experience / trial-and-error approach to meet the
requirement of disk utilization or CPU utilization. Is there any other
guild line to set them?

For when autovacuum is turned on by default, if the parameters for
vacuum have not been set well, it will make the system rather unstable.
So I just wonder if we should setup a section in the manual about the
tips of vacuum, then many users can easily set the vacuum parameters for
their system.

Best Regards
Galy Lee
NTT OSS Center

#2Galy Lee
lee.galy@oss.ntt.co.jp
In reply to: Galy Lee (#1)
Re: how to plan for vacuum?

Just have one example here:

workload: run pgbench in 365x24x7
database size: 100GB

the workload distribution:
06:00-24:00 100tps
00:00-06:00 20tps

how should we plan vacuum for this situation to get the highest performance?

Best regards
Galy

Galy Lee wrote:

Show quoted text

Hi,

For I can not find too much information about how to use vacuum, I want
to ask some general information about the guideline of vacuum planning.

1. How do we know if autovacuum is enough for my application, or should
I setup a vacuum manually from cron for my application?

2. How to set the GUC parameters for autovacuum?
There are two sets of parameters for autovacuum:
- vacuum threshold and scale factor (500/0.2)
  - analyze threshold and scale factor(250/0.1)
Is there any guideline to set these parameters? When does it need to
change the default values?
 
3. How to tune cost-based delay vacuum?
I had searched in performance list; it seems that most of the practices
are based on experience / trial-and-error approach to meet the
requirement of disk utilization or CPU utilization. Is there any other
guild line to set them?

For when autovacuum is turned on by default, if the parameters for
vacuum have not been set well, it will make the system rather unstable.
So I just wonder if we should setup a section in the manual about the
tips of vacuum, then many users can easily set the vacuum parameters for
their system.

#3Jim C. Nasby
jim@nasby.net
In reply to: Galy Lee (#1)
Re: how to plan for vacuum?

On Wed, Jan 24, 2007 at 02:37:44PM +0900, Galy Lee wrote:

1. How do we know if autovacuum is enough for my application, or should
I setup a vacuum manually from cron for my application?

Generally I trust autovac unless there's some tables where it's critical
that they be vacuumed frequently, such as a queue table or a web session
table.

2. How to set the GUC parameters for autovacuum?
There are two sets of parameters for autovacuum:
- vacuum threshold and scale factor (500/0.2)
?$B!! - analyze threshold and scale factor(250/0.1)
Is there any guideline to set these parameters? When does it need to
change the default values?

I find those are generally pretty good starting points; just bear in
mind that it means 20% dead space.

3. How to tune cost-based delay vacuum?
I had searched in performance list; it seems that most of the practices
are based on experience / trial-and-error approach to meet the
requirement of disk utilization or CPU utilization. Is there any other
guild line to set them?

Unless you have a means for the database to monitor IO usage on it's
own, I don't know that we have a choice...

I'll generally start with a cost delay of 20ms and adjust based on IO
utilization.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#4Alvaro Herrera
alvherre@commandprompt.com
In reply to: Jim C. Nasby (#3)
Re: how to plan for vacuum?

Jim C. Nasby wrote:

I'll generally start with a cost delay of 20ms and adjust based on IO
utilization.

I've been considering set a default autovacuum cost delay to 10ms; does
this sound reasonable?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#5Jim C. Nasby
jim@nasby.net
In reply to: Alvaro Herrera (#4)
Re: [HACKERS] how to plan for vacuum?

On Thu, Jan 25, 2007 at 12:52:02AM -0300, Alvaro Herrera wrote:

Jim C. Nasby wrote:

I'll generally start with a cost delay of 20ms and adjust based on IO
utilization.

I've been considering set a default autovacuum cost delay to 10ms; does
this sound reasonable?

For a lightly loaded system, sure. For a heavier load that might be too
much, but of course that's very dependent on not only your hardware, but
how much you want vacuum to interfere with normal operations. Though,
I'd say as a default it's probably better to be more aggressive rather
than less.

Also, it might be better to only set autovac_cost_delay by default;
presumably if someone's running vacuum by hand they want it done pronto.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#6Galy Lee
lee.galy@oss.ntt.co.jp
In reply to: Jim C. Nasby (#3)
Re: how to plan for vacuum?

Jim C. Nasby wrote:

On Wed, Jan 24, 2007 at 02:37:44PM +0900, Galy Lee wrote:

1. How do we know if autovacuum is enough for my application, or should
I setup a vacuum manually from cron for my application?

Generally I trust autovac unless there's some tables where it's critical
that they be vacuumed frequently, such as a queue table or a web session
table.

So how much can we trust autovac? I think at least the following cases
can not be covered by autovac now:
- small but high update tables which are sensitive to garbage
- very big tables which need a long time to be vacuumed.
- when we need to adjust the the max_fsm_page

2. How to set the GUC parameters for autovacuum?
There are two sets of parameters for autovacuum:
- vacuum threshold and scale factor (500/0.2)
?$B!! - analyze threshold and scale factor(250/0.1)
Is there any guideline to set these parameters? When does it need to
change the default values?

I find those are generally pretty good starting points; just bear in
mind that it means 20% dead space.

so what is the principle to set them?
- keep dead space lower than some disk limit
- or keep the garbage rate lower than fillfactor
or any other general principle?

#7Galy Lee
lee.galy@oss.ntt.co.jp
In reply to: Jim C. Nasby (#5)
Re: [PERFORM] how to plan for vacuum?

Jim C. Nasby wrote:

On Thu, Jan 25, 2007 at 12:52:02AM -0300, Alvaro Herrera wrote:

Jim C. Nasby wrote:

I'll generally start with a cost delay of 20ms and adjust based on IO
utilization.

I've been considering set a default autovacuum cost delay to 10ms; does
this sound reasonable?

The problem in here is that we can not easily find a direct relation
between
Cost delay <-> CPU/IO utilization <--> real performance (response
time in peak hour)

It is very hard for any normal user to set this correctly. I think the
experience / trial-and-error approach is awful for the user, every DBA
need to be an expert of vacuum to keep the system stable. For vacuum is
still a big threat to the performance, a more intelligent way is needed.

A lot of efforts have contributed to make vacuum to be a more
lightweight operation, but I think we should still need more efforts on
how to make it can be used easily and safely.

So I have proposed the "vacuum in time" feature in previous; just let
vacuum know how long can it runs, and then it will minimize the impact
in the time span for you. Some argue that it should not have the
maintenance window assumption, but the most safely way is to run in the
maintenance window.

#8Ray Stell
stellr@cns.vt.edu
In reply to: Galy Lee (#6)
Re: how to plan for vacuum?

On Thu, Jan 25, 2007 at 07:29:20PM +0900, Galy Lee wrote:

so what is the principle to set them?
- keep dead space lower than some disk limit
- or keep the garbage rate lower than fillfactor
or any other general principle?

How do you measure "dead space" and "garbage rate?"

I'm a newbe, I don't even know what these terms mean, but if I can measure
them, perhaps it will gel, and really if you can't measure the effect
of a setting change, what have you got? I would hope any discussion on
autovac parms would include some metric evaluation techniques. Thanks.

#9Jim C. Nasby
jim@nasby.net
In reply to: Galy Lee (#1)
Re: how to plan for vacuum?

Please cc the list so others can reply as well...

On Thu, Jan 25, 2007 at 08:45:50AM +0100, Tomas Vondra wrote:

On Wed, Jan 24, 2007 at 02:37:44PM +0900, Galy Lee wrote:

1. How do we know if autovacuum is enough for my application, or should
I setup a vacuum manually from cron for my application?

Generally I trust autovac unless there's some tables where it's critical
that they be vacuumed frequently, such as a queue table or a web session
table.

You can tune thresholds and scale factors for that particular table
using pg_autovacuum. If you lower them appropriately, the vacuum will be
fired more often for that table - but don't lower them too much, just go
step by step until you reach values that are fine for you.

That doesn't work well if autovac gets tied up vacuuming a very large
table. Granted, when that happens there are considerations about the
long-running vacuum transaction (prior to 8.2), but in many systems
you'll still get some use out of other vacuums.
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

#10Joshua D. Drake
jd@commandprompt.com
In reply to: Alvaro Herrera (#4)
Re: [HACKERS] how to plan for vacuum?

Alvaro Herrera wrote:

Jim C. Nasby wrote:

I'll generally start with a cost delay of 20ms and adjust based on IO
utilization.

I've been considering set a default autovacuum cost delay to 10ms; does
this sound reasonable?

It really depends on the system. Most of our systems run anywhere from
10-25ms. I find that any more than that, Vacuum takes too long.

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

#11Jim C. Nasby
jim@nasby.net
In reply to: Galy Lee (#7)
Re: [PERFORM] how to plan for vacuum?

On Thu, Jan 25, 2007 at 07:52:50PM +0900, Galy Lee wrote:

It is very hard for any normal user to set this correctly. I think the
experience / trial-and-error approach is awful for the user, every DBA
need to be an expert of vacuum to keep the system stable. For vacuum is
still a big threat to the performance, a more intelligent way is needed.

Agreed.

So I have proposed the "vacuum in time" feature in previous; just let
vacuum know how long can it runs, and then it will minimize the impact
in the time span for you. Some argue that it should not have the
maintenance window assumption, but the most safely way is to run in the
maintenance window.

Most systems I work on don't have a maintenance window. For those that
do, the window is at best once a day, and that's nowhere near often
enough to be vacuuming any database I've run across. I'm not saying they
don't exist, but effort put into restricting vacuums to a maintenance
window would serve very few people. It'd be much better to put effort
into things like piggyback vacuum.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#12Ray Stell
stellr@cns.vt.edu
In reply to: Joshua D. Drake (#10)
Re: [HACKERS] how to plan for vacuum?

On Thu, Jan 25, 2007 at 08:04:49AM -0800, Joshua D. Drake wrote:

It really depends on the system. Most of our systems run anywhere from
10-25ms. I find that any more than that, Vacuum takes too long.

How do you measure the impact of setting it to 12 as opposed to 15?

#13Jim Nasby
jim@nasby.net
In reply to: Ray Stell (#12)
Re: [HACKERS] how to plan for vacuum?

On Jan 25, 2007, at 10:33 AM, Ray Stell wrote:

On Thu, Jan 25, 2007 at 08:04:49AM -0800, Joshua D. Drake wrote:

It really depends on the system. Most of our systems run anywhere
from
10-25ms. I find that any more than that, Vacuum takes too long.

How do you measure the impact of setting it to 12 as opposed to 15?

If you've got a tool that will report disk utilization as a
percentage it's very easy; I'll decrease the setting until I'm at
about 90% utilization with the system's normal workload (leaving some
room for spikes, etc). Sometimes I'll also tune the costs if reads
vs. writes are a concern.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)