Table bloat and vacuum

Started by Jack Orensteinover 17 years ago4 messagesgeneral
Jump to latest
#1Jack Orenstein
jack.orenstein@hds.com

My application is running on 7.4. We have one huge table that drives
our application, and also a tiny (single-row) table used to maintain
some aggregate information. Each transaction inserts or updates 1-2
rows in the huge table, and updates the tiny table.

We vacuum the entire database once a week, and the tiny table every
2000 transactions.

I'm trying to understand some odd behavior observed regarding the tiny
table: The tiny table's disk file is usually 8K or 16K. During the
weekly vacuum, the tiny table bloats. It's still one row, but the size
of the file grows. I've seen it get as high as 1M. But then after the
vacuum, it returns to its normal size.

1) Why does the tiny table bloat during a vacuum? Is it because the
scan of the huge table is run as a transaction, forcing maintenance of
dead versions of the tiny table's one row?

2) Why does the bloat resolve itself? We're not doing any full
vacuums.

We're in the process of upgrading to 8.3.4, so I'd appreciate any
throughs on whether and how this behavior will change with the newer
release.

Jack Orenstein

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jack Orenstein (#1)
Re: Table bloat and vacuum

Jack Orenstein wrote:

1) Why does the tiny table bloat during a vacuum? Is it because the
scan of the huge table is run as a transaction, forcing maintenance of
dead versions of the tiny table's one row?

Yes.

2) Why does the bloat resolve itself? We're not doing any full
vacuums.

Probably the one live tuple bounces to the first page at some point and
then the rest of the pages are truncated by vacuum.

We're in the process of upgrading to 8.3.4, so I'd appreciate any
throughs on whether and how this behavior will change with the newer
release.

In 8.3, vacuuming the big table will not delay dead tuple removal of the
small table.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#3Scott Marlowe
scott.marlowe@gmail.com
In reply to: Jack Orenstein (#1)
Re: Table bloat and vacuum

On Wed, Nov 12, 2008 at 8:13 AM, Jack Orenstein <jack.orenstein@hds.com> wrote:

My application is running on 7.4. We have one huge table that drives

SNIP

We're in the process of upgrading to 8.3.4, so I'd appreciate any
throughs on whether and how this behavior will change with the newer
release.

You will not believe how much faster 8.3 is, and how much easier
maintenance is. You'll be like a kid in a candy store for months
looking at and using all the new features in it. The improvements are
enormous. Biggest difference for you is that 8.3 can do vacuums in a
background method (it sleeps x milliseconds between pages), can run 3
or more threads, and autovacuum daemon is on by default. For the most
part, your vacuuming issues will no longer exist.

#4Jack Orenstein
jack.orenstein@hds.com
In reply to: Scott Marlowe (#3)
Re: Table bloat and vacuum

Scott Marlowe wrote:

On Wed, Nov 12, 2008 at 8:13 AM, Jack Orenstein <jack.orenstein@hds.com> wrote:

My application is running on 7.4. We have one huge table that drives

SNIP

We're in the process of upgrading to 8.3.4, so I'd appreciate any
throughs on whether and how this behavior will change with the newer
release.

You will not believe how much faster 8.3 is, and how much easier
maintenance is. You'll be like a kid in a candy store for months
looking at and using all the new features in it. The improvements are
enormous. Biggest difference for you is that 8.3 can do vacuums in a
background method (it sleeps x milliseconds between pages), can run 3
or more threads, and autovacuum daemon is on by default. For the most
part, your vacuuming issues will no longer exist.

Our 7.4 vacuuming strategy has gotten pretty complex:

- Entire database once a week.

- (Eventually) biggest table in database: Vacuumed/analyzed every 1000 updates
until there are 10,000 rows, to ensure that optimizer does the right thing,
(discussed recently on this mailing list).

- Medium-sized table containing single-row concurrency hotspots. Usually less
than 1M rows: vacuumed every 2000 updates.

- Single-row tables - these are permanent hotspots, updated in every
transaction: vacuumed every 2000 updates.

Can you comment on how I'll be able to simplify this vacuum schedule by relying
on autovacuum? Can you point me at a document describing how autovacuum decides
when to vacuum a table?

I've also had some trouble figuring out which VACUUMs should ANALYZE.
Originally, I had every vacuum also run analyze (except for the tiny-table
vacuums). But I ran into the "tuple concurrently updated" problem (see
http://archives.postgresql.org/pgsql-sql/2005-05/msg00148.php), so I've had to
back off from that. Are concurrent analyzes OK in 8.3?

Jack