Still recommending daily vacuum...
From
http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
"Recommended practice for most sites is to schedule a database-wide
VACUUM once a day at a low-usage time of day, supplemented by more
frequent vacuuming of heavily-updated tables if necessary. (Some
installations with extremely high update rates vacuum their busiest
tables as often as once every few minutes.) If you have multiple
databases in a cluster, don't forget to VACUUM each one; the program
vacuumdb might be helpful."
Do we still want that to be our formal recommendation? ISTM it would be
more logical to recommend a combination of autovac, daily vacuumdb -a if
you can afford it and have a quiet period, and frequent manual vacuuming
of things like web session tables.
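To make that concrete, the "frequent manual vacuuming" piece is just a cron'd psql call; a sketch (the database and table names here are made up for illustration):

```sql
-- Run every few minutes via cron against a hypothetical high-churn table:
--   */5 * * * *  psql -d webdb -c "VACUUM ANALYZE web_sessions;"
VACUUM ANALYZE web_sessions;
```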
I'm happy to come up with a patch, but I figure there should be
consensus first...
--
Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Monday 02 July 2007 17:52, Jim C. Nasby wrote:
From
http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
"Recommended practice for most sites is to schedule a database-wide
VACUUM once a day at a low-usage time of day, supplemented by more
frequent vacuuming of heavily-updated tables if necessary. (Some
installations with extremely high update rates vacuum their busiest
tables as often as once every few minutes.) If you have multiple
databases in a cluster, don't forget to VACUUM each one; the program
vacuumdb might be helpful."
Do we still want that to be our formal recommendation? ISTM it would be
more logical to recommend a combination of autovac, daily vacuumdb -a if
you can afford it and have a quiet period, and frequent manual vacuuming
of things like web session tables.
I'm happy to come up with a patch, but I figure there should be
consensus first...
I generally recommend trying autovacuum first, augmented by
vacuum/analyze/reindex if you find trouble. I won't say there aren't
workloads that autovacuum won't handle, but in most cases it does fine, and I
expect that to increase with 8.3.
--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
"Jim C. Nasby" <decibel@decibel.org> writes:
http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
text now.
Is there a reason to say anything beyond "use autovac"?
regards, tom lane
Tom Lane wrote:
"Jim C. Nasby" <decibel@decibel.org> writes:
http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
text now.
Is there a reason to say anything beyond "use autovac"?
Did we change the default autovac parameters for 8.3 (beyond turning it
on)? Because on any reasonably used database, they are way too conservative.
Joshua D. Drake
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Joshua D. Drake wrote:
Tom Lane wrote:
"Jim C. Nasby" <decibel@decibel.org> writes:
http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
text now.
Is there a reason to say anything beyond "use autovac"?
Did we change the default autovac parameters for 8.3 (beyond turning it
on)? Because on any reasonably used database, they are way too conservative.
We're still on time to change them ... Any concrete proposals?
--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"When the proper man does nothing (wu-wei),
his thought is felt ten thousand miles." (Lao Tse)
Alvaro Herrera wrote:
Joshua D. Drake wrote:
Tom Lane wrote:
"Jim C. Nasby" <decibel@decibel.org> writes:
http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
text now.Is there a reason to say anything beyond "use autovac"?
Did we change the default autovac parameters for 8.3 (beyond turning it
on)? Because on any reasonably used database, they are way too conservative.
We're still on time to change them ... Any concrete proposals?
I could provide numbers from production high use databases. We could
probably back those down a little and make more reasonable numbers.
Joshua D. Drake
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Joshua D. Drake wrote:
Alvaro Herrera wrote:
Joshua D. Drake wrote:
Did we change the default autovac parameters for 8.3 (beyond turning
it on)? Because on any reasonably used database, they are way too
conservative.
We're still on time to change them ... Any concrete proposals?
I could provide numbers from production high use databases. We could
probably back those down a little and make more reasonable numbers.
Please do so. Perhaps others can also tell their typical settings.
Best Regards
Michael Paesold
On Tue, Jul 03, 2007 at 11:31:08AM +0200, Michael Paesold wrote:
Joshua D. Drake wrote:
Alvaro Herrera wrote:
Joshua D. Drake wrote:
Did we change the default autovac parameters for 8.3 (beyond turning
it on)? Because on any reasonably used database, they are way too
conservative.
We're still on time to change them ... Any concrete proposals?
I could provide numbers from production high use databases. We could
probably back those down a little and make more reasonable numbers.
Please do so. Perhaps others can also tell their typical settings.
FWIW, I normally go with the 8.2 defaults, though I could see dropping
vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
could be decreased further, maybe divide by 10.
--
Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Mon, Jul 02, 2007 at 11:19:12PM -0400, Tom Lane wrote:
"Jim C. Nasby" <decibel@decibel.org> writes:
http://developer.postgresql.org/pgdocs/postgres/routine-vacuuming.html :
Well, with autovac defaulting to ON in 8.3, that's certainly obsolete
text now.
Is there a reason to say anything beyond "use autovac"?
There is; I know that things like web session tables aren't handled very
well by autovacuum if there are any moderately large tables (anything
that will take more than a few minutes to vacuum). Eventually we should
be able to accommodate that case with multiple workers, but we'll need a
mechanism to ensure that at least one worker doesn't get tied up in
large vacuums.
--
Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
"Jim C. Nasby" <decibel@decibel.org> writes:
On Mon, Jul 02, 2007 at 11:19:12PM -0400, Tom Lane wrote:
Is there a reason to say anything beyond "use autovac"?
There is; I know that things like web session tables aren't handled very
well by autovacuum if there are any moderately large tables (anything
that will take more than a few minutes to vacuum). Eventually we should
be able to accommodate that case with multiple workers, but we'll need a
mechanism to ensure that at least one worker doesn't get tied up in
large vacuums.
And which part of that do you think isn't resolved in 8.3?
regards, tom lane
Jim C. Nasby wrote:
On Tue, Jul 03, 2007 at 11:31:08AM +0200, Michael Paesold wrote:
Joshua D. Drake wrote:
Alvaro Herrera wrote:
Joshua D. Drake wrote:
Did we change the default autovac parameters for 8.3 (beyond turning
it on)? Because on any reasonably used database, they are way too
conservative.
We're still on time to change them ... Any concrete proposals?
I could provide numbers from production high use databases. We could
probably back those down a little and make more reasonable numbers.
Please do so. Perhaps others can also tell their typical settings.
FWIW, I normally go with the 8.2 defaults, though I could see dropping
vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
could be decreased further, maybe divide by 10.
How about pushing thresholds all the way down to 0?
--
Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"To live and to stop living are imaginary solutions.
Existence is elsewhere." (Andre Breton)
On Tue, Jul 3, 2007 at 3:36 PM, in message <13153.1183494983@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Jim C. Nasby" <decibel@decibel.org> writes:
On Mon, Jul 02, 2007 at 11:19:12PM -0400, Tom Lane wrote:
Is there a reason to say anything beyond "use autovac"?
There is; I know that things like web session tables aren't handled very
well by autovacuum if there are any moderately large tables (anything
that will take more than a few minutes to vacuum). Eventually we should
be able to accommodate that case with multiple workers, but we'll need a
mechanism to ensure that at least one worker doesn't get tied up in
large vacuums.
And which part of that do you think isn't resolved in 8.3?
We have a 406GB table where 304GB is in one table. The next two tables
are 57GB and 40GB. Inserts to these three tables are constant during the
business day, along with inserts, updates, and very few deletes to the
other tables. Database modifications are few and scattered at night and
on weekends. Virtually all queries are during the business day. The
large tables are "insert only" except for a weekend delete of the oldest
one week of data, to keep a rolling set of just over a year. (No, we
really don't want to go to weekly partitions, if it can be avoided.)
Autovacuum is enabled with very aggressive settings, to cover small
tables, including one with about 75 rows that can be updated 100 or more
times per second. Even with these settings there is zero chance of any
table of even moderate size hitting the autovacuum threshold between our
scheduled vacuums. When we tried doing a nightly vacuum analyze starting
at the end of business day, it ran well into the next day, and the users
complained of slowness until it stopped. We changed to a weeknight vacuum
analyze of the volatile tables which aren't in the big three, and a vacuum
analyze of the entire database right after the weekly delete.
Isn't this a use case where we don't want to count on autovacuum, both
from a table bloat perspective and the user impact perspective, even under
8.3?
In terms of our autovacuum settings, we have several different types of
databases, and in all of them we seem to do well with these changes from
the 8.2 defaults, combined with (except for the above configuration) a
nightly database vacuum:
autovacuum_naptime = 10s
autovacuum_vacuum_threshold = 1
autovacuum_analyze_threshold = 1
Oh, the tiny, high-update tables occasionally bloat to hundreds or
thousands of pages because of long-running transactions, so we schedule
a daily cluster on those, just to keep things tidy.
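(For reference, such a scheduled cluster is just the plain command, in the pre-8.3 syntax; the table and index names here are hypothetical:

```sql
-- Rewrites the table in index order and discards all dead space;
-- takes an exclusive lock, so it is scheduled for low-usage hours
CLUSTER hotcounters_pkey ON hotcounters;
```
)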
-Kevin
On Tue, Jul 3, 2007 at 5:17 PM, in message
<468A84A1.EE98.0025.0@wicourts.gov>, "Kevin Grittner"
<Kevin.Grittner@wicourts.gov> wrote:
We have a 406GB table where 304GB is in one table. The next two tables
It's probably obvious, but I meant a 406GB database. Sorry.
Kevin Grittner wrote:
We have a 406GB table where 304GB is in one table. The next two tables
are 57GB and 40GB. Inserts to these three tables are constant during the
business day, along with inserts, updates, and very few deletes to the
other tables. Database modifications are few and scattered at night and
on weekends. Virtually all queries are during the business day. The
large tables are "insert only" except for a weekend delete of the oldest
one week of data, to keep a rolling set of just over a year. (No, we
really don't want to go to weekly partitions, if it can be avoided.)
Autovacuum is enabled with very aggressive settings, to cover small
tables, including one with about 75 rows that can be updated 100 or more
times per second. Even with these settings there is zero chance of any
table of even moderate size hitting the autovacuum threshold between our
scheduled vacuums. When we tried doing a nightly vacuum analyze starting
at the end of business day, it ran well into the next day, and the users
complained of slowness until it stopped. We changed to a weeknight vacuum
analyze of the volatile tables which aren't in the big three, and a vacuum
analyze of the entire database right after the weekly delete.
Sounds like you would be served by setting those specific tables to a
lower vacuum scale factor (keeping a more normal default for the rest of
the tables), and having a non-zero vacuum delay setting (to avoid
excessive I/O consumption). Have you tried that?
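In 8.2 such per-table overrides are rows in the pg_autovacuum system catalog; a rough, untested sketch (the table name is made up, and the exact column list should be checked against your version's pg_autovacuum, since it grew freeze-related columns in 8.2):

```sql
-- One row per table; -1 in a column means "use the postgresql.conf default".
-- 8.2 columns: vacrelid, enabled, vac_base_thresh, vac_scale_factor,
-- anl_base_thresh, anl_scale_factor, vac_cost_delay, vac_cost_limit,
-- freeze_min_age, freeze_max_age.
INSERT INTO pg_autovacuum
  VALUES ('bigtable'::regclass,  -- which table
          true,                  -- autovacuum enabled for it
          -1, 0.02,              -- vacuum threshold / lower scale factor
          -1, -1,                -- analyze threshold / scale factor: defaults
          20, -1,                -- vacuum cost delay (ms) / cost limit
          -1, -1);               -- freeze ages: defaults
```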
The problem you would still have with 8.2 is that while one of these
tables is being vacuumed the rest won't be vacuumed at all. In 8.3 the
other tables can still be vacuumed regularly with the big vacuum still
running (a feature I dubbed "multiple workers", but we're still waiting
to know what name the marketing guys are gonna use).
In terms of our autovacuum settings, we have several different types of
databases, and in all of them we seem to do well with these changes from
the 8.2 defaults, combined with (except for the above configuration) a
nightly database vacuum:
autovacuum_naptime = 10s
Another change in 8.3 is that the naptime is per-database, i.e. the time
between two consecutive autovac runs on a database. So with a setting
of 10s, if you have 10 databases there will be one autovac run per
second, whereas on 8.2 there would be one autovac every 10 seconds
(unless you run out of worker slots).
Oh, the tiny, high-update tables occasionally bloat to hundreds or
thousands of pages because of long-running transactions, so we schedule
a daily cluster on those, just to keep things tidy.
If you can afford the cluster then there's no problem. I don't expect
that to change in 8.3.
--
Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"Before you were born your parents weren't as boring as they are now. They
got that way paying your bills, cleaning up your room and listening to you
tell them how idealistic you are." -- Charles J. Sykes' advice to teenagers
Alvaro Herrera wrote:
Jim C. Nasby wrote:
FWIW, I normally go with the 8.2 defaults, though I could see dropping
vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
could be decreased further, maybe divide by 10.
How about pushing thresholds all the way down to 0?
As long as it handles small (or zero row) tables OK then yes. The
base_threshold in the original contrib autovacuum was just an easy way
to not vacuum really small tables too often. If a table has only 10
rows, it's going to get vacuumed every time one row is updated. I guess
that's not a big problem with a table that small, but it still seems excessive.
If you think this isn't a problem with the current autovacuum, then sure
turn it down to zero, and perhaps we can even get rid of it altogether
in another release or two.
"Alvaro Herrera" <alvherre@commandprompt.com> writes:
FWIW, I normally go with the 8.2 defaults, though I could see dropping
vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
could be decreased further, maybe divide by 10.
How about pushing thresholds all the way down to 0?
My intuition is that the thresholds should be lowered to about 5%.
I may be biased by the TPC-C schema where the largest table, stock, gets a
little over 20 records per page so 5% represents an average of one update per
page. But there's nothing unusual about a table like that. Waiting until 20%
of the table is potentially dead --four dead tuples out of 20 per page in the
stock table case-- seems extravagantly wasteful.
I find the idea of lowering the thresholds to 0 sort of intriguing though.
That makes the vacuum delay parameters the primary method to control how
frequently vacuum runs.
Unfortunately vacuum delay settings are hard to get right. The admin needs to
observe how much of an effect the settings have on i/o throughput which varies
from system to system. And using them to control how frequently vacuum runs
would be even harder.
In an ideal world autovacuum would be able to set the delay settings based on
how many updates had happened since the last run started. If more than 5% of
the table was cleaned by vacuum then decrease the delay settings to get this
vacuum to finish sooner and allow fewer updates. If less than 5% of the table
was cleaned by vacuum then increase the delay settings to reduce the
unnecessary impact of vacuum. But that just leaves us back where we started.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote:
"Alvaro Herrera" <alvherre@commandprompt.com> writes:
FWIW, I normally go with the 8.2 defaults, though I could see dropping
vacuum_scale_factor down to 0.1 or 0.15. I also think the thresholds
could be decreased further, maybe divide by 10.
How about pushing thresholds all the way down to 0?
My intuition is that the thresholds should be lowered to about 5%.
I may be biased by the TPC-C schema where the largest table, stock, gets a
little over 20 records per page so 5% represents an average of one update per
page. But there's nothing unusual about a table like that. Waiting until 20%
of the table is potentially dead --four dead tuples out of 20 per page in the
stock table case-- seems extravagantly wasteful.
I find the idea of lowering the thresholds to 0 sort of intriguing though.
That makes the vacuum delay parameters the primary method to control how
frequently vacuum runs.
I think you are mixing thresholds with scale factors.
vacuum tuples = threshold + reltuples * scale factor
If dead tuples are more than vacuum tuples, autovac does a vacuum.
So what you are proposing above amounts to setting scale factor = 0.05.
The threshold is unimportant -- in the case of a big table it matters
not if it's 0 or 1000, it will be almost irrelevant in calculations. In
the case of small tables, then the table will be vacuumed in almost
every iteration if the threshold is 0, which is fine because the table
is small anyway. So why not let the threshold be 0 and be done with it?
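Plugging in the 8.2 defaults (threshold 500, scale factor 0.2) makes the threshold's irrelevance for big tables easy to see; a quick sketch against pg_class:

```sql
-- Dead-tuple count at which autovacuum triggers, under the 8.2 defaults:
-- for a 1M-row table that's 500 + 0.2 * 1,000,000 = 200,500 dead tuples,
-- of which the base threshold contributes only 500.
SELECT relname, reltuples,
       500 + 0.2 * reltuples AS vacuum_trigger
FROM pg_class
WHERE relkind = 'r'
ORDER BY reltuples DESC;
```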
I, too, find a 0.2 scale factor a bit high. But since I don't run any
database, I fear I would be picking numbers out of thin air.
In an ideal world autovacuum would be able to set the delay settings based on
how many updates had happened since the last run started. If more than 5% of
the table was cleaned by vacuum then decrease the delay settings to get this
vacuum to finish sooner and allow fewer updates. If less than 5% of the table
was cleaned by vacuum then increase the delay settings to reduce the
unnecessary impact of vacuum. But that just leaves us back where we started.
Maybe we can construct some smarts based on something like this. The
equations we currently use are just inherited from contrib autovac,
which didn't have access to much other info. Integrated autovac can do
much better, I think.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
So what you are proposing above amounts to setting scale factor = 0.05.
The threshold is unimportant -- in the case of a big table it matters
not if it's 0 or 1000, it will be almost irrelevant in calculations. In
the case of small tables, then the table will be vacuumed in almost
every iteration if the threshold is 0, which is fine because the table
is small anyway. So why not let the threshold be 0 and be done with it?
For very small tables, setting a threshold of 0 could mean a vacuum
after every single row update (or every other row). I think that is just
burning cycles. What about a threshold of 10 or 50, to have at least
some sanity limit? Even though the cost of vacuum of a small table is
low, it is still not free, IMHO, no?
Best Regards
Michael Paesold
Michael Paesold wrote:
Alvaro Herrera wrote:
So what you are proposing above amounts to setting scale factor = 0.05.
The threshold is unimportant -- in the case of a big table it matters
not if it's 0 or 1000, it will be almost irrelevant in calculations. In
the case of small tables, then the table will be vacuumed in almost
every iteration if the threshold is 0, which is fine because the table
is small anyway. So why not let the threshold be 0 and be done with it?
For very small tables, setting a threshold of 0 could mean a vacuum
after every single row update (or every other row). I think that is just
burning cycles. What about a threshold of 10 or 50, to have at least
some sanity limit? Even though the cost of vacuum of a small table is
low, it is still not free, IMHO, no?
A bit off-topic (because probably not realistic in a 8.3 timeframe) -
but maybe the threshold should be specified in terms of "expected number of
pages to be freed", instead of specifying a bias for the number of modified
rows as it is done now. Then "1" would probably be a reasonable default, because
a vacuum that won't free at least one page seems hardly worth
the effort - it won't save any future I/O bandwidth.
Just an idea I got while following this thread...
greetings, Florian Pflug
On Tue, Jul 3, 2007 at 5:34 PM, in message
<20070703223402.GA5491@alvh.no-ip.org>, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
Kevin Grittner wrote:
Autovacuum is enabled with very aggressive settings, to cover small
tables, including one with about 75 rows that can be updated 100 or more
times per second. Even with these settings there is zero chance of any
table of even moderate size hitting the autovacuum threshold between our
scheduled vacuums.
Sounds like you would be served by setting those specific tables to a
lower vacuum scale factor (keeping a more normal default for the rest of
the tables), and having a non-zero vacuum delay setting (to avoid
excessive I/O consumption). Have you tried that?
I did play with that, but it doesn't seem to make sense in our environment.
We have about 100 databases, most of them scattered around the state, and
any extra maintenance like that has a cost, particularly with the daily
cluster changing the oid. Both from doing the math and from experience,
I can say that the autovacuum only affects the small, frequently updated
tables, so I could see no benefit. Am I missing something? (I can't see
where this causes any extra I/O.)
Our tables tend to fall into one of four categories: small tables with high
update rates, medium tables (millions or tens of millions of rows) with
thousands or tens of thousands of updates per day, static tables of various
sizes that are only modified as part of a software release, and big honking
tables (100s of GB) which are either insert-only or are insert with
periodic purge of old rows. Only the first group has a chance of being
autovacuumed in normal operations. Even the purges don't cause it to kick
in.
In terms of our autovacuum settings, we have several different types of
databases, and in all of them we seem to do well with these changes from
the 8.2 defaults, combined with (except for the above configuration) a
nightly database vacuum:
autovacuum_naptime = 10s
Another change in 8.3 is that the naptime is per-database, i.e. the time
between two consecutive autovac runs on a database. So with a setting
of 10s, if you have 10 databases there will be one autovac run per
second, whereas on 8.2 there would be one autovac every 10 seconds
(unless you run out of worker slots).
That's fine. We actually want it every ten seconds in a production
database. When you can have more updates per second than there are rows
in a small table, frequent vacuums are good. As long as the table doesn't
bloat too badly, the vacuum is typically 10 to 20 milliseconds. I'm sure
that part of it is that the table tends to remain fully cached. When these
tables were vacuumed once per minute, we ran into performance problems.
Oh, the tiny, high-update tables occasionally bloat to hundreds or
thousands of pages because of long-running transactions, so we schedule
a daily cluster on those, just to keep things tidy.
If you can afford the cluster then there's no problem. I don't expect
that to change in 8.3.
Here also we're talking 10 to 20 milliseconds. I understand that in 8.2
that leaves a chance of an error, but we seem to have dodged that bullet
so far. Has that gotten any safer in 8.3?
-Kevin