smartvacuum() instead of autovacuum
Hi,
I am trying to implement smartvacuum(), which do vacuum only tables having
many dead rows, instead of autovacuum.
I read sources of autovacuum and pgstat, and found it could be if
smartvacuum() gets information from pgstat like autovacuum does.
concerns:
1. autovacuum makes vacuum nodes itself, while smartvacuum() will do it by
SPI for simplicity.
2. autovacuum allocate its own MemoryContext. Does smartvacuum() need it as
well?
3. autovacuum takes new transaction. Why?
questions:
1. Why isn't there some function like pg_stat_get_dead_rows(). if there is,
smartvacuum() can be implemented by even pl/pgsql
2. autovacuum can actually conflict with VACUUM ANALYZE on another session?
Any answers and opinions are appreciated.
Regards,
Hitoshi Harada
Hitoshi Harada wrote:
I am trying to implement smartvacuum(), which do vacuum only tables
having many dead rows, instead of autovacuum.
How is this different from what autovacuum does?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Hi, Peter,
How is this different from what autovacuum does?
My application needs to do vacuum by itself, while
autovacuum does it as daemon.
The database is updated so frequently that
normal vacuum costs too much and tables to be updated are
not so many as the whole database is vacuumed.
I want to use autovacuum except the feature of daemon,
but want to control when to vacuum and which table to vacuum.
So, nothing is different between autovacuum and smartvacuum(),
but former is daemon and later is user function.
c.f. I ran autovacuum before, and my batch script did vacuum while
autovacuum did one as well on the other session at the same time.
I found the vacuum analyze conflicts each other sometime...
so I want to control vacuum my self.
http://archives.postgresql.org/pgsql-bugs/2002-12/msg00198.php
http://archives.postgresql.org/pgsql-general/2004-05/msg00015.php
Regards,
Hitoshi Harada
Show quoted text
-----Original Message-----
From: Peter Eisentraut [mailto:peter_e@gmx.net]
Sent: Sunday, October 22, 2006 10:08 PM
To: Hitoshi Harada
Cc: pgsql-hackers@postgresql.org
Subject: Re: [PATCHES] smartvacuum() instead of autovacuumHitoshi Harada wrote:
I am trying to implement smartvacuum(), which do vacuum only tables
having many dead rows, instead of autovacuum.How is this different from what autovacuum does?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
"Hitoshi Harada" <hitoshi_harada@forcia.com> writes:
How is this different from what autovacuum does?
My application needs to do vacuum by itself, while
autovacuum does it as daemon.
The database is updated so frequently that
normal vacuum costs too much and tables to be updated are
not so many as the whole database is vacuumed.
I want to use autovacuum except the feature of daemon,
but want to control when to vacuum and which table to vacuum.
So, nothing is different between autovacuum and smartvacuum(),
but former is daemon and later is user function.
This seems completely unconvincing. What are you going to do that
couldn't be done by autovacuum?
regards, tom lane
Ok,
But my point is, autovacuum may corrupt with vacuum analyze command
on another session. My intention of smartvacuum() is based on this.
Any solution for this??
Regards,
Hitoshi Harada
Show quoted text
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Monday, October 23, 2006 11:10 AM
To: Hitoshi Harada
Cc: 'Peter Eisentraut'; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [PATCHES] smartvacuum() instead of autovacuum"Hitoshi Harada" <hitoshi_harada@forcia.com> writes:
How is this different from what autovacuum does?
My application needs to do vacuum by itself, while
autovacuum does it as daemon.
The database is updated so frequently that
normal vacuum costs too much and tables to be updated are
not so many as the whole database is vacuumed.
I want to use autovacuum except the feature of daemon,
but want to control when to vacuum and which table to vacuum.
So, nothing is different between autovacuum and smartvacuum(),
but former is daemon and later is user function.This seems completely unconvincing. What are you going to do that
couldn't be done by autovacuum?regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
If the decision to vacuum based on autovacuum criteria is good enough
for you then I think you should just focus on getting autovac to do what
you want it to do. Perhaps you just need to decrease the sleep time to a
few seconds, so that autovac will quickly detect when something needs to
be vacuumed.
The only case I can think of where autovac might not work as well as
smartvacuum would be if you had a lot of databases in the cluster, since
autovacuum will only vacuum one database at a time.
On Mon, Oct 23, 2006 at 11:18:39AM +0900, Hitoshi Harada wrote:
Ok,
But my point is, autovacuum may corrupt with vacuum analyze command
on another session. My intention of smartvacuum() is based on this.
Any solution for this??Regards,
Hitoshi Harada
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Monday, October 23, 2006 11:10 AM
To: Hitoshi Harada
Cc: 'Peter Eisentraut'; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [PATCHES] smartvacuum() instead of autovacuum"Hitoshi Harada" <hitoshi_harada@forcia.com> writes:
How is this different from what autovacuum does?
My application needs to do vacuum by itself, while
autovacuum does it as daemon.
The database is updated so frequently that
normal vacuum costs too much and tables to be updated are
not so many as the whole database is vacuumed.
I want to use autovacuum except the feature of daemon,
but want to control when to vacuum and which table to vacuum.
So, nothing is different between autovacuum and smartvacuum(),
but former is daemon and later is user function.This seems completely unconvincing. What are you going to do that
couldn't be done by autovacuum?regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
"Jim C. Nasby" <jim@nasby.net> writes:
The only case I can think of where autovac might not work as well as
smartvacuum would be if you had a lot of databases in the cluster, since
autovacuum will only vacuum one database at a time.
It's conceivable that it'd make sense to allow multiple autovac
processes running in parallel. (The infrastructure part of this is easy
enough, the hard part is keeping them from all deciding to vacuum the
same table.)
One reason we have not done that already is the thought that multiple
vacuum processes would suck too much I/O to be reasonable. Now you
could dial back their resource demands with the cost-delay settings,
but it's not clear that ten autovacs running at one-tenth speed are
better than one autovac using all the cycles you can spare. Usually
I think it's best if a vacuum transaction finishes as fast as it can.
In any case, these exact same concerns would apply to manual vacuums
or a combination of manual and auto vacuum.
regards, tom lane
Tom Lane wrote:
"Jim C. Nasby" <jim@nasby.net> writes:
The only case I can think of where autovac might not work as well as
smartvacuum would be if you had a lot of databases in the cluster, since
autovacuum will only vacuum one database at a time.It's conceivable that it'd make sense to allow multiple autovac
processes running in parallel. (The infrastructure part of this is easy
enough, the hard part is keeping them from all deciding to vacuum the
same table.)One reason we have not done that already is the thought that multiple
vacuum processes would suck too much I/O to be reasonable. Now you
could dial back their resource demands with the cost-delay settings,
but it's not clear that ten autovacs running at one-tenth speed are
better than one autovac using all the cycles you can spare. Usually
I think it's best if a vacuum transaction finishes as fast as it can.
I think this is one of the reasons table specific delay settings were
designed in from the beginning. I think the main use cases for multiple
vacuums at once are:
1) Vacuum per table space assuming each table space is on a different
drive with it's own I/O.
2) the frequently updated table that can't wait to be vacuumed while a
large table is being vacuumed. In this case if you set a system default
delay setting and set a more aggressive table specific delay setting for
your hot spot tables then multiple vacuums become a clear win. This is
an important case that I hope we handle soon. At this point it's one of
the main failings of the current autovacuum system.
On Mon, Oct 23, 2006 at 03:08:03PM -0400, Tom Lane wrote:
"Jim C. Nasby" <jim@nasby.net> writes:
The only case I can think of where autovac might not work as well as
smartvacuum would be if you had a lot of databases in the cluster, since
autovacuum will only vacuum one database at a time.It's conceivable that it'd make sense to allow multiple autovac
processes running in parallel. (The infrastructure part of this is easy
enough, the hard part is keeping them from all deciding to vacuum the
same table.)
It might be worth creating a generic framework that prevents multiple
vacuums from hitting a table at once, autovac or not.
One reason we have not done that already is the thought that multiple
vacuum processes would suck too much I/O to be reasonable. Now you
could dial back their resource demands with the cost-delay settings,
but it's not clear that ten autovacs running at one-tenth speed are
better than one autovac using all the cycles you can spare. Usually
I think it's best if a vacuum transaction finishes as fast as it can.
There's other things that would benefit from having some idea on what IO
resources are available. For example, having a separate bgwriter (or
reader) for each set of physical volumes. So a means of grouping
tablespaces wouldn't hurt.
In any case, these exact same concerns would apply to manual vacuums
or a combination of manual and auto vacuum.
Well, the advantage to manual vacuums is that you can tune things to
utilize multiple arrays...
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Jim C. Nasby wrote:
On Mon, Oct 23, 2006 at 03:08:03PM -0400, Tom Lane wrote:
"Jim C. Nasby" <jim@nasby.net> writes:
The only case I can think of where autovac might not work as well as
smartvacuum would be if you had a lot of databases in the cluster, since
autovacuum will only vacuum one database at a time.It's conceivable that it'd make sense to allow multiple autovac
processes running in parallel. (The infrastructure part of this is easy
enough, the hard part is keeping them from all deciding to vacuum the
same table.)It might be worth creating a generic framework that prevents multiple
vacuums from hitting a table at once, autovac or not.
That one is easy, because vacuum gets a lock on the affected table that
conflicts with itself. The problem is that the second vacuum would
actually wait for the first to finish.
A naive idea is to use ConditionalLockAcquire, and if it fails just skip
the table.
One reason we have not done that already is the thought that multiple
vacuum processes would suck too much I/O to be reasonable. Now you
could dial back their resource demands with the cost-delay settings,
but it's not clear that ten autovacs running at one-tenth speed are
better than one autovac using all the cycles you can spare. Usually
I think it's best if a vacuum transaction finishes as fast as it can.
In the scenario where one table is huge and another is very small, it
can certainly be useful to vacuum the small table several times while
the huge one has only been vacuumed once. For that you definitively
need the ability to run parallel vacuums.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
If the decision to vacuum based on autovacuum criteria is good enough
for you then I think you should just focus on getting autovac to do what
you want it to do. Perhaps you just need to decrease the sleep time to a
few seconds, so that autovac will quickly detect when something needs to
be vacuumed.
Thanks, I'll do it.
My database is updated frequently all the day and
runs big building process a day.
Almost all the day autovac is ok but
in the big building process autovac annoys it,
so I wished there might be the way to order autovac to do its process.
Hitoshi Harada
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Jim C. Nasby
Sent: Tuesday, October 24, 2006 3:36 AM
To: Hitoshi Harada
Cc: 'Tom Lane'; 'Peter Eisentraut'; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [PATCHES] smartvacuum() instead of autovacuumIf the decision to vacuum based on autovacuum criteria is good enough
for you then I think you should just focus on getting autovac to do what
you want it to do. Perhaps you just need to decrease the sleep time to a
few seconds, so that autovac will quickly detect when something needs to
be vacuumed.The only case I can think of where autovac might not work as well as
smartvacuum would be if you had a lot of databases in the cluster, since
autovacuum will only vacuum one database at a time.On Mon, Oct 23, 2006 at 11:18:39AM +0900, Hitoshi Harada wrote:
Ok,
But my point is, autovacuum may corrupt with vacuum analyze command
on another session. My intention of smartvacuum() is based on this.
Any solution for this??Regards,
Hitoshi Harada
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Monday, October 23, 2006 11:10 AM
To: Hitoshi Harada
Cc: 'Peter Eisentraut'; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [PATCHES] smartvacuum() instead of autovacuum"Hitoshi Harada" <hitoshi_harada@forcia.com> writes:
How is this different from what autovacuum does?
My application needs to do vacuum by itself, while
autovacuum does it as daemon.
The database is updated so frequently that
normal vacuum costs too much and tables to be updated are
not so many as the whole database is vacuumed.
I want to use autovacuum except the feature of daemon,
but want to control when to vacuum and which table to vacuum.
So, nothing is different between autovacuum and smartvacuum(),
but former is daemon and later is user function.This seems completely unconvincing. What are you going to do that
couldn't be done by autovacuum?regards, tom lane
---------------------------(end of
broadcast)---------------------------
Show quoted text
TIP 5: don't forget to increase your free space map settings
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?