Performance impact of vacuum
All,
What performance impact does vacuum have on an active database? I'm
being asked about this... or rather, someone is questioning the use of
PostgreSQL because of it.
Thanks
Jay
"Jay O'Connor" <joconnor@cybermesa.com> writes:
What performance impact does vacuum have on an active database?
I've seen varying reports on this; some say it's not a problem, and some
say it is. I suspect (without real proof) that the critical issue is
whether you have adequate I/O bandwidth. If your disk is near
saturation already, the extra I/O load for vacuum can drive you over the
knee of the performance curve and cause serious degradation. But if
you've got spare I/O bandwidth then it's not a big problem.
You can reduce the I/O needs for routine vacuums by only vacuuming the
tables you need to, and selecting an appropriate vacuum frequency for
each one. (You might want to try the "autovacuum" daemon that's being
worked on, as a substitute for doing this tuning by hand. Not sure if
the daemon is ready for prime time though.)
BTW, I suggest moving any followup discussion to pgsql-performance.
regards, tom lane
On Mon, 2003-07-14 at 17:13, Jay O'Connor wrote:
What performance impact does vacuum have on an active database? I'm
being asked about this... or rather, someone is questioning the use of
PostgreSQL because of it.
There is no easy answer to this question, other than that if you vacuum
appropriately (not too often, nor too infrequently) the net impact on
system performance is a win. A database system that doesn't require a
vacuum-type process still has to do the same work; it just does it at
transaction time. Postgres defers this work until vacuum is run, which
is typically done at off-peak times.
As the author of the pg_autovacuum daemon in 7.4 contrib, I would
suggest you try it :-) The primary advantage of the daemon is that it
monitors activity and when needed, vacuums specific tables, not whole
databases. The thresholds that dictate when vacuums occur can be set at
runtime, and should allow you to get it performing the way you want. If
you have any questions or problems, let me know. I am always interested
in hearing about others' real-world experiences.
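The daemon's decision rule can be sketched roughly like this (a toy model; the function name and default numbers here are illustrative, not pg_autovacuum's actual identifiers or settings): a table is vacuumed once its dead-tuple count exceeds a base threshold plus a multiple of its total row count, so small busy tables get vacuumed often while large, mostly-static tables are left alone.

```python
# Rough sketch of a pg_autovacuum-style per-table threshold rule.
# All names and defaults are made up for illustration.

def needs_vacuum(n_dead_tuples, n_total_tuples,
                 base_threshold=1000, scale_factor=2.0):
    """True once the table has accumulated enough dead tuples."""
    threshold = base_threshold + scale_factor * n_total_tuples
    return n_dead_tuples > threshold

# A small, busy table trips the threshold long before a huge,
# mostly-static one with the same absolute number of dead rows.
print(needs_vacuum(5000, 1000))        # small table, many dead rows
print(needs_vacuum(5000, 10_000_000))  # big table, few dead rows
```

The scale factor is what makes this per-table tuning automatic: the trigger point grows with the table, so one pair of settings can cover tables of very different sizes.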
On Tue, Jul 15, 2003 at 10:37:28 -0700,
Jay O'Connor <joconnor@cybermesa.com> wrote:
Just curious but is the length of time to vacuum influenced more by the
size of the table or the number of dead tuples?
I think the fraction of the table that is dead tuples is probably the best
thing to look at for deciding when to vacuum (unless the tables are very
small in which case checking for the number of blocks used may be better).
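The rule suggested above can be sketched as follows (a toy model; every cutoff value here is invented for illustration): trigger a vacuum when dead tuples make up too large a fraction of the table, except for very small tables, where the raw number of blocks in use is the more telling signal.

```python
# Illustrative sketch of a fraction-of-dead-tuples vacuum trigger,
# with a block-count fallback for tiny tables. Cutoffs are made up.

def vacuum_due(dead_tuples, total_tuples, blocks_used,
               dead_fraction_cutoff=0.25,
               small_table_blocks=8, small_block_cutoff=4):
    if blocks_used <= small_table_blocks:
        # Tiny table: fractions are noisy, watch block growth instead.
        return blocks_used > small_block_cutoff
    if total_tuples == 0:
        return False
    return dead_tuples / total_tuples > dead_fraction_cutoff

print(vacuum_due(300, 1000, 100))  # 30% dead: due
print(vacuum_due(50, 1000, 100))   # 5% dead: not yet
```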
On 2003.07.14 20:41 Matthew T. O'Connor wrote:
On Mon, 2003-07-14 at 17:13, Jay O'Connor wrote:
What performance impact does vacuum have on an active database? I'm
being asked about this... or rather, someone is questioning the use of
PostgreSQL because of it.

There is no easy answer to this question, other than that if you vacuum
appropriately (not too often, nor too infrequently) the net impact on
system performance is a win. A database system that doesn't require a
vacuum-type process still has to do the same work; it just does it at
transaction time. Postgres defers this work until vacuum is run, which
is typically done at off-peak times.
Thanks.
Just curious but is the length of time to vacuum influenced more by the
size of the table or the number of dead tuples?
Take care,
Jay
On 2003.07.15 09:53 Bruno Wolff III wrote:
On Tue, Jul 15, 2003 at 10:37:28 -0700,
Jay O'Connor <joconnor@cybermesa.com> wrote:

Just curious but is the length of time to vacuum influenced more by the
size of the table or the number of dead tuples?

I think the fraction of the table that is dead tuples is probably the best
thing to look at for deciding when to vacuum (unless the tables are very
small, in which case checking the number of blocks used may be better).
Actually, what I meant is how long the vacuum runs. We're going to have a
big database (a few TB projected, though I don't know where those numbers
come from) and I'm trying to assuage concerns that vacuuming will impact
performance significantly.
Thanks
Jay
On Tue, Jul 15, 2003 at 11:04:53AM -0700, Jay O'Connor wrote:
Actually, what I meant is how long the vacuum runs. We're going to have a
big database (a few TB projected, though I don't know where those numbers
come from) and I'm trying to assuage concerns that vacuuming will impact
performance significantly.
It depends very heavily on your expired-tuple percentage. But it is
still not free to vacuum a large table. And vacuum full always scans
the whole table.
Remember that vacuum operates on tables, which automatically means
that it does nasty things to your cache.
The stand-alone ANALYZE can be helpful here. It only takes samples of
the tables under analysis, so you don't face the same I/O load. If all
you're doing is adding to a table, it may be worth investigating. Keep
in mind, though, that you still need to vacuum every 2 billion
transactions.
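The "every 2 billion transactions" rule comes from transaction IDs being 32-bit counters compared circularly: an XID more than 2^31 transactions in the past wraps around and suddenly looks like it is in the future, which is why old tuples must be vacuumed (frozen) before that point. A simplified sketch of the circular comparison (real PostgreSQL also reserves a few special XID values this model ignores):

```python
# Toy model of modulo-2^32 transaction ID comparison. An XID "precedes"
# another only if it is less than half the counter range behind it;
# anything older than 2^31 transactions appears to be in the future.

XID_MODULO = 2 ** 32
HALF_RANGE = 2 ** 31

def xid_precedes(a, b):
    """True if transaction a happened before b under circular compare."""
    diff = (b - a) % XID_MODULO
    return 0 < diff < HALF_RANGE

print(xid_precedes(100, 200))   # recent past: correctly "older"
# An XID just over 2^31 behind has "wrapped": it no longer compares
# as older, which is the failure vacuum exists to prevent.
print(xid_precedes(100, (100 + HALF_RANGE + 1) % XID_MODULO))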
A
--
----
Andrew Sullivan 204-4141 Yonge Street
Liberty RMS Toronto, Ontario Canada
<andrew@libertyrms.info> M2P 2A8
+1 416 646 3304 x110
On Tue, 15 Jul 2003, Andrew Sullivan wrote:
On Tue, Jul 15, 2003 at 11:04:53AM -0700, Jay O'Connor wrote:
Actually, what I meant is how long the vacuum runs. We're going to have a
big database (a few TB projected, though I don't know where those numbers
come from) and I'm trying to assuage concerns that vacuuming will impact
performance significantly.

It depends very heavily on your expired-tuple percentage. But it is
still not free to vacuum a large table. And vacuum full always scans
the whole table.

Remember that vacuum operates on tables, which automatically means
that it does nasty things to your cache.

The stand-alone ANALYZE can be helpful here. It only takes samples of
the tables under analysis, so you don't face the same I/O load. If all
you're doing is adding to a table, it may be worth investigating. Keep
in mind, though, that you still need to vacuum every 2 billion
transactions.
This sounds like one of those places where the ability of a file system
to be told not to cache the accesses of a certain child process would be
a big win.

Wasn't there some discussion of BSD's ability to do this recently, and
whether it would be a win to port it into PostgreSQL? I'd say that for
large databases being vacuumed mid-day it would be a great win.
On Tue, Jul 15, 2003 at 11:53:50AM -0500, Bruno Wolff III wrote:
On Tue, Jul 15, 2003 at 10:37:28 -0700,
Jay O'Connor <joconnor@cybermesa.com> wrote:

Just curious but is the length of time to vacuum influenced more by the
size of the table or the number of dead tuples?

I think the fraction of the table that is dead tuples is probably the best
thing to look at for deciding when to vacuum (unless the tables are very
small, in which case checking the number of blocks used may be better).
Actually, for a full vacuum, where the tuples are also has a significant
effect. If you have a large table and you do a large update so that the
table is as follows:
<------------------ data ------------------><deleted-tuples><updated-tuples>
The vacuum full will then move the updated tuples over the deleted ones.
But it does this by scanning backwards through the table, copying tuples
forward over the deleted ones. I have a sneaking suspicion that Linux's
caching algorithm in some versions doesn't handle this very well. I'm
going to see if inserting into another table, deleting, vacuuming, and
then inserting back is faster.
Or don't do large updates on large tables.
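The compaction pattern described above can be sketched with a toy model (not PostgreSQL internals; the heap is just a list where None marks a dead tuple): scan backwards from the end, copy each live tuple into the earliest free slot, then truncate the now-empty tail.

```python
# Toy model of vacuum full's backward-scan compaction. None represents
# a dead (deleted or superseded) tuple slot.

def vacuum_full(heap):
    heap = list(heap)
    free = [i for i, t in enumerate(heap) if t is None]  # earliest first
    read = len(heap) - 1
    while free and read > free[0]:
        if heap[read] is not None:
            slot = free.pop(0)
            heap[slot] = heap[read]   # copy live tuple forward
            heap[read] = None
        read -= 1
    # truncate trailing dead space
    while heap and heap[-1] is None:
        heap.pop()
    return heap

# data .. deleted .. updated, as in the diagram above
print(vacuum_full(['a', 'b', None, None, 'c', 'd']))  # → ['a', 'b', 'd', 'c']
```

Note how the backward scan moves the tail tuples into the holes in reverse order; it also shows why this access pattern (reading from the end while writing near the front) can interact badly with a read-ahead-oriented cache.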
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
"All that is needed for the forces of evil to triumph is for enough good
men to do nothing." - Edmond Burke
"The penalty good people pay for not being interested in politics is to be
governed by people worse than themselves." - Plato
scott.marlowe wrote:
The stand-alone ANALYZE can be helpful here. It only takes samples of
the tables under analysis, so you don't face the same I/O load. If all
you're doing is adding to a table, it may be worth investigating. Keep
in mind, though, that you still need to vacuum every 2 billion
transactions.

This sounds like one of those places where the ability of a file system
to be told not to cache the accesses of a certain child process would be
a big win.

Wasn't there some discussion of BSD's ability to do this recently, and
whether it would be a win to port it into PostgreSQL? I'd say that for
large databases being vacuumed mid-day it would be a great win.
It was Solaris with free-behind. I hope new caching rules will fix this
soon --- a few people are working on it.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073