Recovering real disk space

Started by Adam Siegelabout 21 years ago6 messagesgeneral

adam@sycamore.us

about 21 years ago

We have a system that archives data to a postgres database. The raw
data is confined to one table. Each record in the table is generally
1500 bytes. Each record is also associated with a volume name. During
normal operations, many millions of rows are written to this table.
After sometime the disk gets close to being full.

We have a program that allows the user to offload rows from a table
based upon volume name to a CD. This is done by doing a select on the
table and then creating a memento record for each row and persisting it
to the CD. Once the rows are written to the CD a delete is performed on
the table for the off-loaded rows.

select * from packets where volume = 'abc';

... process to write to CD ...

delete from packets where volume = 'abc';

We perform a vacuum full after each mass delete. This cycle can happen
many times during over a couple of weeks. We are in a test lab
environment and are generating a lot of data.

One of the problems we have is that the vacuum on the table can take up
to 10 hours. We also expect to see the physical disk space go down, but
this does not happen. If we accidently fill up the disk, then all bets
are off and we are unable to recover. A vacuum never seems to finish
(several days).

How can we physically recover "real" disk space for the rows that were
deleted from the table? I've heard about free space buffers, but am not
really sure how they work. Are there configuration items that can be
tweaked to help with vacuuming large tables?

Thanks!!!!

Wes

wespvp@syntegra.com

about 21 years ago

In reply to: Adam Siegel (#1)

Re: Recovering real disk space

On 3/30/05 12:09 PM, "Adam Siegel" <adam@sycamore.us> wrote:

How can we physically recover "real" disk space for the rows that were
deleted from the table?

vacuum full

Wes

Noname

ptjm@interlog.com

about 21 years ago

In reply to: Adam Siegel (#1)

Re: Recovering real disk space

In article <424AEB5D.2090204@sycamore.us>,
Adam Siegel <adam@sycamore.us> wrote:

[...]

% We perform a vacuum full after each mass delete. This cycle can happen
% many times during over a couple of weeks. We are in a test lab
% environment and are generating a lot of data.
%
% One of the problems we have is that the vacuum on the table can take up
% to 10 hours. We also expect to see the physical disk space go down, but
% this does not happen.

Try vacuum full verbose next time to see what it's doing.

Try reindexing after the vacuum is done.

You may find an ordinary vacuuum is faster and just as useful as vacuum full
assuming you're filling and deleting from the same table all the time. It
won't free up space, but it will allow you to maintain a high-water mark.

Look at the relpages column in pg_class to see which relations are using
up the most space.

If you're clearing out all the data for a set of tables, drop them and
recreate them.
--

Patrick TJ McPhee
North York Canada
ptjm@interlog.com

Guy Rouillier

guyr@masergy.com

about 21 years ago

In reply to: Noname (#3)

Re: Recovering real disk space

Adam Siegel wrote:

We have a system that archives data to a postgres database. The raw
data is confined to one table. Each record in the table is generally
1500 bytes. Each record is also associated with a volume name.
During normal operations, many millions of rows are written to this
table. After sometime the disk gets close to being full.

We have a program that allows the user to offload rows from a table
based upon volume name to a CD. This is done by doing a select on the
table and then creating a memento record for each row and persisting
it to the CD. Once the rows are written to the CD a delete is
performed on the table for the off-loaded rows.

select * from packets where volume = 'abc';

... process to write to CD ...

delete from packets where volume = 'abc';

We perform a vacuum full after each mass delete. This cycle can
happen many times during over a couple of weeks. We are in a test lab
environment and are generating a lot of data.

One of the problems we have is that the vacuum on the table can take
up to 10 hours. We also expect to see the physical disk space go
down, but this does not happen. If we accidently fill up the disk,
then all bets are off and we are unable to recover. A vacuum never
seems to finish (several days).

How can we physically recover "real" disk space for the rows that were
deleted from the table? I've heard about free space buffers, but am
not really sure how they work. Are there configuration items that
can be tweaked to help with vacuuming large tables?

You mention millions of rows. You realize that attempting to delete all
those rows will use up massive amounts of transaction log space, right?
We have a data collection system that stores about 2 million rows a day.
To avoid this issue, we have separate tables for each month. We want to
keep N months available online. So when a new month starts, we save off
table N-1 to backup and then just drop the table. You may want to
consider something like this; if you don't have too many volumes, have
separate tables by volume by month.

Thanks!!!!

---------------------------(end of
broadcast)--------------------------- TIP 2: you can get off all
lists at once with the unregister command (send "unregister
YourEmailAddressHere" to majordomo@postgresql.org)

--
Guy Rouillier

Import Notes

Resolved by subject fallback

Bruno Wolff III

bruno@wolff.to

about 21 years ago

In reply to: Adam Siegel (#1)

Re: Recovering real disk space

On Wed, Mar 30, 2005 at 13:09:33 -0500,
Adam Siegel <adam@sycamore.us> wrote:

We perform a vacuum full after each mass delete. This cycle can happen
many times during over a couple of weeks. We are in a test lab
environment and are generating a lot of data.

One of the problems we have is that the vacuum on the table can take up
to 10 hours. We also expect to see the physical disk space go down, but
this does not happen. If we accidently fill up the disk, then all bets
are off and we are unable to recover. A vacuum never seems to finish
(several days).

This may mean that there are open transactions pinning the records you
have deleted so that they aren't being removed by the vacuum.
Also, under some circumstances a CLUSTER can be faster than a VACUUM FULL.

Thomas F.O'Connell

tfo@sitening.com

about 21 years ago

In reply to: Bruno Wolff III (#5)

Re: Recovering real disk space

Isn't this also a symptom of inappropriate FSM settings?

Try running a VACUUM VERBOSE and check the FSM settings at the end.

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source — Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Apr 3, 2005, at 8:21 AM, Bruno Wolff III wrote:

Show quoted text

On Wed, Mar 30, 2005 at 13:09:33 -0500,
Adam Siegel <adam@sycamore.us> wrote:

We perform a vacuum full after each mass delete. This cycle can
happen
many times during over a couple of weeks. We are in a test lab
environment and are generating a lot of data.

One of the problems we have is that the vacuum on the table can take
up
to 10 hours. We also expect to see the physical disk space go down,
but
this does not happen. If we accidently fill up the disk, then all
bets
are off and we are unable to recover. A vacuum never seems to finish
(several days).

This may mean that there are open transactions pinning the records you
have deleted so that they aren't being removed by the vacuum.
Also, under some circumstances a CLUSTER can be faster than a VACUUM
FULL.