VACUUM process running for a long time

Started by fous velkej · almost 16 years ago · 10 messages · general
#1 fous velkej
honza801@gmail.com

hi

I've got a database (about 300GB) and it's still growing.

I'm inserting new data (about 2GB/day) into the database (there is only one table there) and I'm also deleting about 2GB/day (data older than a month).

The documentation says one should run VACUUM if there are many changes in the database, but vacuumdb never finishes before the next day's data is due to be imported.

Is there any technique that can solve this problem?

thanks
fous

#2 Adrian von Bidder
avbidder@fortytwo.ch
In reply to: fous velkej (#1)
Re: VACUUM process running for a long time

On Wednesday 14 April 2010 16.01:39 Jan Krcmar wrote:

The documentation says one should run VACUUM if there are many changes in the database, but vacuumdb never finishes before the next day's data is due to be imported.

Is there any technique that can solve this problem?

-> vacuum can run concurrently with other activity, so it's not necessary to wait for it to finish.
-> in most cases, autovacuum should do the Right Thing(tm) automatically, so you should not need to call vacuum manually.

This is with a recent pg version. Do you use a (very) old version without autovacuum? Is your db server running hot, so it can't really keep up with inserting data once vacuum starts running?

Note that the pg documentation contains lots of useful information about tuning autovacuum. Without knowing what your table looks like and how your data entry happens (in peaks? or always at about the same rate?) we probably can't help you much more.
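
A quick sanity check you can run (a sketch; 'mytable' stands in for your real table name, and the n_dead_tup column needs 8.3 or later):

    -- when was the table last (auto)vacuumed, and how many dead rows does it carry?
    SELECT relname, last_vacuum, last_autovacuum, n_live_tup, n_dead_tup
    FROM pg_stat_user_tables
    WHERE relname = 'mytable';

If last_autovacuum is NULL or ancient, autovacuum never gets around to the table at all.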

cheers
-- vbi


--
featured product: PostgreSQL - http://postgresql.org

#3 John R Pierce
pierce@hogranch.com
In reply to: fous velkej (#1)
Re: VACUUM process running for a long time

Jan Krcmar wrote:

hi

I've got a database (about 300GB) and it's still growing.

I'm inserting new data (about 2GB/day) into the database (there is only one table there) and I'm also deleting about 2GB/day (data older than a month).

The documentation says one should run VACUUM if there are many changes in the database, but vacuumdb never finishes before the next day's data is due to be imported.

Is there any technique that can solve this problem?

Your table is currently in a messy state, as it's apparently not been vacuumed (what version of postgres is this? anything since 8.3 should have autovacuum running by default). In theory your table has about 60GB of data in it (2GB/day kept for a month); the fact that it's 300GB indicates there's a lot of 'dead' tuples.

You might consider partitioning this table by date, either by day or by week, and instead of deleting old rows, drop entire old partitions.
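
On 8.4 partitioning is done with table inheritance; a minimal sketch of the idea (table and file names are made up):

    -- parent table: holds no rows itself, queries are made against it
    CREATE TABLE measurements (
        logdate  date NOT NULL,
        payload  text
    );

    -- one child table per day, with a CHECK constraint so that
    -- constraint_exclusion can skip partitions that can't match
    CREATE TABLE measurements_2010_04_14 (
        CHECK (logdate = DATE '2010-04-14')
    ) INHERITS (measurements);

    -- the daily import goes straight into that day's child table
    COPY measurements_2010_04_14 FROM '/path/to/daily.csv' CSV;

    -- retiring the month-old day is instant: no DELETE, no VACUUM
    DROP TABLE measurements_2010_03_14;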

#4 fous velkej
honza801@gmail.com
In reply to: John R Pierce (#3)
Re: VACUUM process running for a long time

2010/4/14 John R Pierce <pierce@hogranch.com>:

Jan Krcmar wrote:

hi

I've got a database (about 300GB) and it's still growing.

I'm inserting new data (about 2GB/day) into the database (there is only one table there) and I'm also deleting about 2GB/day (data older than a month).

The documentation says one should run VACUUM if there are many changes in the database, but vacuumdb never finishes before the next day's data is due to be imported.

Is there any technique that can solve this problem?

Your table is currently in a messy state, as it's apparently not been vacuumed (what version of postgres is this? anything since 8.3 should have

I'm using postgresql-server-8.4.2-1PGDG.rhel5.
Autovacuum is running, but the used space keeps rising.

autovacuum running by default). In theory your table has about 60GB of data in it (2GB/day kept for a month); the fact that it's 300GB indicates there's a lot of 'dead' tuples.

The database was dumped, recreated and restored about two weeks ago (that removes the allocated "empty" space, doesn't it?). The dump was about 250GB.

I agree that there are probably some 'dead' tuples in there, but how do I reclaim that space?

You might consider partitioning this table by date, either by day or by week, and instead of deleting old rows, drop entire old partitions.

This is not really a good workaround...


#5 Alan Hodgson
ahodgson@reinvent.com
In reply to: fous velkej (#4)
Re: VACUUM process running for a long time

On Wednesday 14 April 2010, Jan Krcmar <honza801@gmail.com> wrote:

You might consider partitioning this table by date, either by day or by week, and instead of deleting old rows, drop entire old partitions.

This is not really a good workaround...

Actually it's a very good workaround, one that a lot of people use for exactly this purpose. It's a lot less disk I/O than delete+vacuum even when you're not experiencing bloat.
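
To make the I/O difference concrete (an illustrative sketch; the table names are invented):

    -- delete + vacuum on one big table: every expired row is visited,
    -- matching index entries are cleaned, and vacuum then rescans the heap
    DELETE FROM mytable WHERE logdate < now() - interval '1 month';
    VACUUM mytable;

    -- dropping an old partition: the child table's files are simply unlinked
    DROP TABLE measurements_2010_03_14;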

#6 Greg Smith
gsmith@gregsmith.com
In reply to: fous velkej (#4)
Re: VACUUM process running for a long time

Jan Krcmar wrote:

You might consider partitioning this table by date, either by day or by week, and instead of deleting old rows, drop entire old partitions.

This is not really a good workaround...

It is in fact the only good workaround for your problem, which you'll
eventually come to realize if you struggle with this class of problem
for long enough. You can continue to fight with autovacuum forever, but
it's a battle you'll never quite win if you're deleting 2GB per day.
Even if you get vacuum running often enough to clean up the space,
you'll still have a constant struggle to keep your indexes working
efficiently.

Or you can partition by day or week and make the entire problem go
away. Dropping an old partition requires no vacuum cleanup and leaves
behind no index issues. It really is the right solution here if you
want to solve this problem once, rather than continuing to fight it a
little every single day forever.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us

#7 raghavendra t
raagavendra.rao@gmail.com
In reply to: Alan Hodgson (#5)
Re: VACUUM process running for a long time

Hi

You might consider partitioning this table by date, either by day or by week, and instead of deleting old rows, drop entire old partitions.

This is not really a good workaround...

As a first choice, this is a very good workaround for your present situation.

As a second choice, setting maintenance_work_mem will give a performance boost, though it can only be raised to at most 2GB. This parameter sets the limit on the amount of memory that autovacuum, manual VACUUM, bulk index builds and other maintenance routines are permitted to use. Setting it to a moderately high value will increase the efficiency of vacuum and other operations.
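
For instance (a sketch; the value is illustrative, size it to your available RAM, and 'mytable' is a placeholder):

    -- raise the memory budget for this session, then vacuum eagerly
    SET maintenance_work_mem = '1GB';
    VACUUM ANALYZE mytable;

For autovacuum, the same parameter is set in postgresql.conf and is picked up by the autovacuum workers.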

But I'd go with the first choice.

Regards
Raghavendra


#8 fous velkej
honza801@gmail.com
In reply to: Adrian von Bidder (#2)
Re: VACUUM process running for a long time

hi

2010/4/14 Adrian von Bidder <avbidder@fortytwo.ch>:

-> vacuum can run concurrently with other activity, so it's not necessary to wait for it to finish.
-> in most cases, autovacuum should do the Right Thing(tm) automatically, so you should not need to call vacuum manually.

This is with a recent pg version. Do you use a (very) old version without autovacuum? Is your db server running hot, so it can't really keep up with inserting data once vacuum starts running?

Note that the pg documentation contains lots of useful information about tuning autovacuum. Without knowing what your table looks like and how your data entry happens (in peaks? or always at about the same rate?) we probably can't help you much more.

cheers
-- vbi

I'm doing one big insert per day and one big delete per day.

Anyway, I've found this article:
http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html

Could partitioning be helpful in this situation? I guess it does what the others advised: inserts go into a specific child table, and when a table becomes obsolete it is dropped and its allocated space is freed. Am I right?

f.

#9 Adrian von Bidder
avbidder@fortytwo.ch
In reply to: fous velkej (#8)
Re: VACUUM process running for a long time

On Thursday 15 April 2010 15.56:20 Jan Krcmar wrote:

I'm doing one big insert per day and one big delete per day.

Anyway, I've found this article:
http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html

Could partitioning be helpful in this situation?

Yes, I'm quite sure that partitioning will be a big help in this scenario.

cheers
-- vbi

--
this email is protected by a digital signature: http://fortytwo.ch/gpg

#10 Scott Marlowe
scott.marlowe@gmail.com
In reply to: fous velkej (#1)
Re: VACUUM process running for a long time

On Wed, Apr 14, 2010 at 8:01 AM, Jan Krcmar <honza801@gmail.com> wrote:

hi

I've got a database (about 300GB) and it's still growing.

I'm inserting new data (about 2GB/day) into the database (there is only one table there) and I'm also deleting about 2GB/day (data older than a month).

The documentation says one should run VACUUM if there are many changes in the database, but vacuumdb never finishes before the next day's data is due to be imported.

Is there any technique that can solve this problem?

Are you running autovacuum? Has it been tuned to be more aggressive than the default? I've got some large, heavily updated dbs for which I've had to turn autovacuum_vacuum_cost_delay down to 2 or 5 ms to get it to keep up. But I've got a pretty good IO subsystem that can handle the more aggressive autovacuum.
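
In postgresql.conf that looks something like this (the numbers are illustrative, not a recommendation; how low you can go depends on your I/O headroom):

    autovacuum = on
    autovacuum_vacuum_cost_delay = 2ms    # default is 20ms; lower = more aggressive
    autovacuum_vacuum_cost_limit = 1000   # default -1, i.e. fall back to vacuum_cost_limit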

If you're doing one big insert and one big delete a day, then you should be able to just kick off a regular vacuum at the end of the delete, with a low cost_delay and a higher cost_limit; that might keep up. However, if you're on the edge of what your IO subsystem can handle, it isn't gonna help much because it's gonna slow down the system too much.
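
A sketch of that daily job ('mytable' is a placeholder; setting vacuum_cost_delay to 0 disables throttling entirely for the session):

    -- right after the daily DELETE, in the same session:
    SET vacuum_cost_delay = 0;    -- run this vacuum unthrottled
    VACUUM ANALYZE mytable;       -- reclaim dead rows and refresh planner stats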