database file size bloat
hi folks,
we are using a fairly simple database, with stuff being retrieved and
updated and inserted via php over the web. we have code in there to check
that queries sent to psql are always less than 8000 bytes.
three times now this week (on two different servers) the raw database on
disk has ballooned in size, from about 10 megs to 50 megs in two cases,
and from about 10 megs to 250 megs in another case.
a VACUUM VERBOSE ANALYZE; cleans it back down to the proper size, but
meanwhile all the queries take half a minute, instead of less than a
second. and our load average skyrockets.
here is a dump from VACUUM:
NOTICE: --Relation webcast--
NOTICE: Pages 5568: Changed 0, Reapped 5553, Empty 0, New 0; Tup 93: Vac
35065, Keep/VTL 0/0, Crash 0, UnUsed 29, MinLen 766, MaxLen
4782; Re-using: Free/Avail. Space 45303704/45303704; EndEmpty/Avail. Pages
0/5553. Elapsed 1/0 sec.
NOTICE: Rel webcast: Pages: 5568 --> 15; Tuple(s) moved: 93. Elapsed
415/0 sec.
we are running postgresql-6.5.2-1, redhat 6.0, pii-350, 384 megs RAM.
i thought of upgrading, but couldn't see anything in the 6.5.3 changelogs
that would help. checked through the mail archives, couldn't find anything
on a quick search, couldn't see anything in the FAQ.
i am wondering if we can prevent this bloat from happening. any help
appreciated! this server could get very busy over the weekend, and we want
it to stay up under the load. :)
cheers,
matthew.
Matthew Arnison wrote:
three times now this week (on two different servers) the raw database on
disk has ballooned in size, from about 10 megs to 50 megs in two cases,
and from about 10 megs to 250 megs in another case.
a VACUUM VERBOSE ANALYZE; cleans it back down to the proper size, but
meanwhile all the queries take half a minute, instead of less than a
second. and our load average skyrockets.
Hi Matthew,
I have no explanation for the bloat, but it is a well-known "postgresqlism"
that you should consider running vacuum analyze at least nightly, possibly
more frequently. [I run it hourly.]
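If it helps, the whole job can be as small as one SQL statement fired from cron
or any other scheduler (a minimal sketch only -- the table name here is just
the 'webcast' relation from your vacuum dump, so substitute your own tables and
schedule):

    -- reclaim dead rows and refresh optimizer statistics for one table;
    -- run periodically (e.g. nightly or hourly) via cron + psql
    VACUUM ANALYZE webcast;

A bare VACUUM ANALYZE with no table name does the same for every table in the
current database.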
Also, there are about 300 reasons to consider upgrading to 7.0, most having
to do with bug fixes and performance improvements. Unfortunately, there
may be a few incompatibilities (particularly in some pl/pgsql
incantations), so don't assume a seamless upgrade.
Regards,
Ed Loehr
Ed Loehr wrote:
... it is a well-known "postgresqlism"
that you should consider running vacuum analyze at least nightly, possibly
more frequently. [I run it hourly.]
I think there must be something wrong with the optimiser if it's a
"postgresqlism" that you must vacuum analyze frequently. Just as an example,
for Clipper (a dBase compiler), it's a Clipperism that you must re-index when
you cannot locate some records, because the indexing module screws up.
For large 24x7 installations, it's impossible to vacuum nightly, because while
postgresql is vacuuming the table is locked up; to the end-user the database
has already hung.
There has been effort to speed up the vacuuming process, but this isn't the
cure. I believe the fault lies with the optimizer.
For example, in Bruce Momjian's FAQ 4.9:
PostgreSQL does not automatically maintain statistics. One has to make
an explicit vacuum call to update the statistics. After statistics are
updated, the optimizer knows how many rows in the table, and can
better decide if it should use indices. Note that the optimizer does
not use indices in cases when the table is small because a sequential
scan would be faster.
Why save a few microseconds by using a sequential scan when the table is small,
only to 'forget' that the table is now big because you didn't vacuum analyze?
Why can't the optimizer just use indexes when they are there, instead of
'optimizing' for the special case where the table is small to save microseconds?
Thomas
On Fri, 14 Apr 2000, Thomas wrote:
For large 24x7 installations, it's impossible to vacuum nightly because when
postgresql is vacuuming the table is locked up, to the end-user the database
has already hung.
That's right. I complained about this in a discussion with a Postgresql
developer, who assured me they were working towards a fix. I really don't
care whether the vacuuming is fixed so that it does not lock the table
completely, or that vacuuming becomes say, a once-a-month or less frequent
operation. For some reason everyone who is used to working with PostgreSQL
accepts the fact that you have to vacuum nightly - to outsiders it seems
like a major flaw with the system.
There has been effort to speed up the vacuuming process, but this isn't the
cure. I believe the fault lies on the optimizer.
Sure, the vacuum process speed is fine for small tables, but what about the
big ones where the table gets locked for 5 minutes? What a joke!
Why save on micro-seconds to use sequential scan when the table is small and
later 'forgets' that the table is now big because you didn't vacuum analyze?
Why can't the optimizer just use indexes when they are there and not
'optimize' for special cases when the table is small to save micro-seconds?
Well it's more than microseconds I presume, as opening indexes involves
opening files, which takes milliseconds rather than microseconds.
Andrew.
the bloat is a big problem. i just checked it again, and the db has
ballooned to 20 megs again, with i think 2650 unused pages. this is after
vacuuming it last night. i guess we need to set up the vacuum script to run
every hour. i am worried about this locking out users during the
vacuuming, although i guess if it happens more often, it should take less
time.
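for what it's worth, the page counts the planner has recorded can be pulled
straight out of pg_class to see which table is the culprit (just a sketch --
note these numbers are only refreshed when vacuum runs, so they show the state
as of the last vacuum, not right now):

    -- pages and tuples as of the last vacuum; with 8K pages,
    -- relpages * 8 is roughly the size on disk in KB
    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY relpages DESC;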
meanwhile, as for upgrading, i think i'll try 6.5.3 first.
version 7 is still in beta. is it at least as stable as 6.5.3? is it
at least as fast as 6.5.3?
this is a live site all right.
thanks for your advice,
matthew.
Maybe you might want to try out MySQL? A little while ago, I compared both
MySQL and PostgreSQL to see how they stacked up (for my purposes, anyway). I
came to the conclusion that while MySQL is a very fast read-only database, it
doesn't support transactions, row-level locks, stored-procedures, sub-selects,
etc. PostgreSQL has a lot more basic database support, but it is harder to
install and maintain (in my opinion), has worse documentation, and a number of
interesting quirks...for example, the fixed-size row limitation that can only be
changed by a recompilation, or the VACUUM problem described here. Other issues
I had included the way the backend seemed to work...it is certainly very
demanding when it comes to shared memory, and I had concerns about its use of a
process pool instead of the threads most other databases seem to use (i.e.
whether or not pg-sql could handle enough connections).
MySQL is an easier installation, requires less maintenance, doesn't have
row-size limitations, and is fully threaded. PostgreSQL supports a great deal
of basic SQL functionality that MySQL doesn't. MySQL is good for read-only
databases because it seems to be rather ineffective when it comes to concurrent
writes to the same table (it either locks the whole table or locks nothing at all)
and has no commit/rollback. PostgreSQL seems to offer what MySQL lacks, but in
reality it also lacks a lot of what MySQL has.
In my case, I am still looking, but maybe there is a more immediate solution out
there for you. ;)
-Brian
At 01:13 PM 14-04-2000 +0800, Thomas wrote:
There has been effort to speed up the vacuuming process, but this isn't the
cure. I believe the fault lies on the optimizer.
For eg, in Bruce Momjian's FAQ 4.9:
PostgreSQL does not automatically maintain statistics. One has to make
an explicit vacuum call to update the statistics. After statistics are
updated, the optimizer knows how many rows in the table, and can
better decide if it should use indices. Note that the optimizer does
not use indices in cases when the table is small because a sequential
scan would be faster.
Is it too difficult/expensive for Postgresql to keep track of how many
committed rows there are in each table? Then count(*) of the whole table
could be faster too.
Since it's just for optimization it could perhaps keep a rough track of how
many rows would be selected for the past X indexed searches of a table, so
as to better decide which index to use first. Right now it seems like the
optimizer can't learn a thing till the database takes a nap and dreams
about statistics. I prefer the database to be able to learn a few things
before having to take a nap. And then maybe it will only need to take a nap
once every few weeks/months.
Also it's better for the optimizer to be good at figuring out which index to
use than at figuring out whether to use indexes at all. Because in most cases the
people creating indexes on tables _should_ know whether to use indexes at
all. So if there's an index use it. So what if it's a bit slower when
things are small. I put in indexes to make sure that things are still ok
when things get big!
How many people care about the "slow down" when things are small? It's
still fast! If things are going to stay small, then the database admin
should just drop the index.
Often predictable degradation is more useful than the academic optimum.
Cheerio,
Link.
I'd also like to hear from anyone on the original poster's topic of the "24
hour shop".
I too am in that same boat. I have a DB with 7-8 million records on a Dual
550 with 512Meg Ram and 1gig swap and it takes vacuum 10 - 15 minutes each
evening to run.
Users think the site is hosed and management isn't exactly happy about it
either.
There is one DB on the machine that has two tables, one table has 2
columns and the other has about 25 columns.
I'd think somehow there could be a way to vacuum without having to lock
up the entire DB.
Andy
Ed Loehr wrote:
... it is a well-known "postgresqlism"
that you should consider running vacuum analyze at least nightly, possibly
more frequently. [I run it hourly.]
I think there must be something wrong with the optimiser that it's
"postgresqlism" that you must vacuum analyze frequently. Just as an example,
for Clipper (dBase compiler), it's Clipperism that you must re-index if you
cannot locate some records just because the indexing module screws up.
Vacuum collects stats on table size on every run. Vacuum analyze every
night is a waste unless the tables are really changing dramatically
every day.
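In other words, the two forms do different amounts of work (a rough sketch,
with an invented table name):

    -- reclaims dead tuples and refreshes the basic size figures
    -- (relpages/reltuples) that the optimizer sees
    VACUUM mytable;

    -- additionally scans the data to gather per-column statistics
    VACUUM ANALYZE mytable;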
--
Bruce Momjian | http://www.op.net/~candle
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Fri, 14 Apr 2000, Thomas wrote:
I think there must be something wrong with the optimiser that it's
"postgresqlism" that you must vacuum analyze frequently.
One thing that is not widely known is that vacuum actually has two
orthogonal tasks: garbage collection and statistics collection (only when
you ANALYZE). The fact that this is combined in one command is a
historical artifact, and there are some voices that want to separate the
commands.
The way I see it, if you have enough disk space you never have to run
vacuum to garbage collect. It might lead to obvious problems when the heap
files get so large that it takes more time to physically access them. The
alternative is to garbage collect on each transaction commit but that
bears its own set of performance implications.
The analyze part would probably not need an exclusive lock on the table
but the vacuum certainly does.
--
Peter Eisentraut Sernanders väg 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden
At 06:13 AM 4/14/00 -0500, you wrote:
I'd think some how there could be a way to vacuum without having to lock
up the entire DB.
From http://www.postgresql.org/docs/user/sql-vacuum.htm
VACUUM serves two purposes in Postgres as both a means to reclaim
storage and also a means to collect information for the optimizer.
I'm guessing here, but it would seem to me that once the 'reclaim' portion
was written, it probably seemed like as good a place as any to put the
stat-collecting code? As long as the entire database was being scanned
anyway, why not collect statistics?
Perhaps it's time for the two functions to be separated - controlled by an
option?
Perhaps VACUUM STATONLY could collect stats, not lock table and not reclaim
space.
Actually, I'm thinking any seq-scan could collect the stats on the way
through?
Frank
Bruce Momjian wrote:
Ed Loehr wrote:
... it is a well-known "postgresqlism"
that you should consider running vacuum analyze at least nightly, possibly
more frequently. [I run it hourly.]
I think there must be something wrong with the optimiser that it's
"postgresqlism" that you must vacuum analyze frequently. Just as an example,
for Clipper (dBase compiler), it's Clipperism that you must re-index if you
cannot locate some records just because the indexing module screws up.
Vacuum collects stats on table size on every run. Vacuum analyze every
night is a waste unless the tables are really changing dramatically
every day.
Agreed. My tables are changing dramatically every day under normal usage.
Ideally, vacuuming would be auto-triggered after so many
inserts/updates/deletes.
[I neglected to mention that I originally started running vacuum hourly
because it was the only way to prevent a number of bugs in 6.5.*.]
Maybe the docs need to be updated?
"We recommend that active production databases be
cleaned nightly, in order to keep statistics relatively
current."
- http://www.postgresql.org/docs/postgres/sql-vacuum.htm
Regards,
Ed Loehr
Matthew Arnison wrote:
the bloat is a big problem. i just checked it again, and the db has
ballooned to 20 megs again, with i think 2650 unused pages. this is after
vacuuming it last night. i guess we need to set up the vacuum script to run
every hour. i am worried about this locking out users during the
vacuuming, although i guess if it happens more often, it should take less
time.
I should add that my vacuum runs don't take very long (< 10 seconds). I
would have to consider other alternatives if it took much longer...
meanwhile, as for upgrading, i think i'll try 6.5.3 first.
version 7 is still in beta. is it at least as stable as 6.5.3? is it
at least as fast as 6.5.3?
Beta3 is more stable and much faster, IMO. Haven't tried beta5.
Andrew Snow wrote:
On Fri, 14 Apr 2000, Thomas wrote:
For large 24x7 installations, it's impossible to vacuum nightly because when
postgresql is vacuuming the table is locked up, to the end-user the database
has already hung.
That's right. I complained about this in a discussion with a Postgresql
developer, who assured me they were working towards a fix. I really don't
care whether the vacuuming is fixed so that it does not lock the table
completely, or that vacuuming becomes say, a once-a-month or less frequent
operation. For some reason everyone who is used to working with PostgreSQL
accepts the fact that you have to vacuum nightly - to outsiders it seems
like a major flaw with the system.
The vacuum requirement is a bummer. After working with pg for 9 months
now, I consider it a major opportunity for improvement. I wish I had the
time...
Regards,
Ed Loehr
I think there must be something wrong with the optimiser that it's
"postgresqlism" that you must vacuum analyze frequently. Just as an example,
for Clipper (dBase compiler), it's Clipperism that you must re-index if you
cannot locate some records just because the indexing module screws up.
For large 24x7 installations, it's impossible to vacuum nightly because when
postgresql is vacuuming the table is locked up, to the end-user the database
has already hung.
There has been effort to speed up the vacuuming process, but this isn't the
cure. I believe the fault lies on the optimizer.
For eg, in Bruce Momjian's FAQ 4.9:
PostgreSQL does not automatically maintain statistics. One has to make
an explicit vacuum call to update the statistics. After statistics are
updated, the optimizer knows how many rows in the table, and can
better decide if it should use indices. Note that the optimizer does
not use indices in cases when the table is small because a sequential
scan would be faster.
Why save on micro-seconds to use sequential scan when the table is small and
later 'forgets' that the table is now big because you didn't vacuum analyze?
Why can't the optimizer just use indexes when they are there and not
'optimize' for special cases when the table is small to save micro-seconds?
Because small is a relative term. You will notice that Bruce does not
say "where a table is less than 100 tuples" or something like that. And
because in the end you would probably waste significantly more time than a
few micro-seconds. Consider a table where you have some round number of
tuples, say 100,000. Suppose you had b-tree indexes on two attributes,
employee_id (primary key) and last_name. Now if you were to run a query to
look up an employee by the primary key you would surely want to use the
index. Assume that it would take 3 disk accesses to search the index, and
one to fetch the data page from the heap. So you have a total of 4 disk
accesses to search on the primary key and retrieve one row. Now suppose you were
going to run a query that would return a significant number of rows, let's
say half the table (50,000). Now if the optimizer chose to use the index on
that query it would take 4 disk accesses to locate each and every row (3 to
search the index, 1 to grab the data page). So if the query ran using the index it
would use 200,000 (50,000 * 4) disk accesses (worst case scenario of course;
using CLUSTER could improve the efficiency). Let's assume that the average
size of a tuple is about 500 bytes. So PostgreSQL would pack about 16 tuples into a
single page. Therefore doing a sequential search on the table would require
100,000/16, or 6250 disk accesses. Depending on the speed of your drive this
could make a big difference. Suppose the large query was run only 10 times a
day, that would waste around 2 million disk accesses. Now if you were using
a join performance would suffer even more.
The job of the optimizer is to make educated decisions about how to run
a query. Stats will help it out significantly, but it is expensive to
maintain statistics on a running database and it would decrease overall
performance. Instead the answer is to collect statistics periodically. There
is reasoning behind this too. Consider a table where you have 1,000,000
tuples. One of the attributes is called state. Currently there are only 5
states in the database. A query is run like this:
SELECT state FROM table_name WHERE state='NY';
The optimizer will see if it has any statistics on this table. If not it
will make a guess at how many rows are returned. So the optimizer guesses
that 1% of the table, or 10,000 rows, will be returned. Then it will use
that number to assess how to run the query. Now if it had statistics on the
table the optimizer would know that there were only 5 different values in
the state column of the table. So the optimizer would assume that 20% of
the table would be returned from the query. It is likely that the optimizer
will choose a very different plan when it thinks that 200,000 rows will be
returned.
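If you want to watch this happen, EXPLAIN shows the estimate the optimizer is
working from (a sketch only, reusing the table_name/state example above; the
plans you actually get will depend on your data and indexes):

    -- with no statistics, the optimizer falls back on a default guess
    -- for how many rows will match
    EXPLAIN SELECT * FROM table_name WHERE state = 'NY';

    -- refresh the statistics, then look again; the estimated row count
    -- (and possibly the choice of index vs. sequential scan) changes
    VACUUM ANALYZE table_name;
    EXPLAIN SELECT * FROM table_name WHERE state = 'NY';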
You can be confident that the fine PostgreSQL developers have done a
good job with the optimizer. There are reasons that things are done the way
they are, but they might not be immediately apparent.
Cheers,
Stephen
From http://www.postgresql.org/docs/user/sql-vacuum.htm
VACUUM serves two purposes in Postgres as both a means to reclaim
storage and also a means to collect information for the optimizer.
I'm guessing here, but it would seem to me that once the 'reclaim' portion
was written, it probably seemed like as good a place as any to put the
stat-collecting code? As long as the entire database was being scanned
anyway, why not collect statistics.
Yes, that was the idea. While doing one, why not do the other.
Perhaps it's time for the two functions to be separated - controlled by an
option?
Perhaps VACUUM STATONLY could collect stats, not lock table and not reclaim
space.
Makes sense.
Actually, I'm thinking any seq-scan could collect the stats on the way
through?
We have thought about that, at least to count the number of rows.
--
Bruce Momjian | http://www.op.net/~candle
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Vacuum collects stats on table size on every run. Vacuum analyze every
night is a waste unless the tables are really changing dramatically
every day.
Agreed. My tables are changing dramatically every day under normal usage.
Ideally, vacuuming would be auto-triggered after so many
inserts/updates/deletes.
We have thought of that too. Some vacuum option that would do stats
only if X % of the table had changed.
--
Bruce Momjian | http://www.op.net/~candle
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Maybe the docs need to be updated?
"We recommend that active production databases be
cleaned nightly, in order to keep statistics relatively
current."
Thanks. Updated now.
--
Bruce Momjian | http://www.op.net/~candle
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Fri, 14 Apr 2000, Andrew Snow wrote:
On Fri, 14 Apr 2000, Thomas wrote:
For large 24x7 installations, it's impossible to vacuum nightly because when
postgresql is vacuuming the table is locked up, to the end-user the database
has already hung.
That's right. I complained about this in a discussion with a Postgresql
developer, who assured me they were working towards a fix. I really don't
care whether the vacuuming is fixed so that it does not lock the table
completely, or that vacuuming becomes say, a once-a-month or less frequent
operation. For some reason everyone who is used to working with PostgreSQL
accepts the fact that you have to vacuum nightly - to outsiders it seems
like a major flaw with the system.
Okay, this *used* to be a problem way way back, but I definitely don't
vacuum my databases nightly ... most times I don't do it until something
odd comes up and I figure that I may as well vacuum first to see if it
makes a difference ...
vacuum'ing once a week should do, unless you have one hell of an
insert/update/delete intensive table ...
Sure, the vacuum process speed is fine for small tables, but what about the
big ones where the table gets locked for 5 minutes? What a joke!
v7.0beta5, with a table that has *over* 5 million tuples:
pgsql% time psql -c "vacuum" postgresql
VACUUM
0.000u 0.022s 2:46.67 0.0% 0+0k 0+0io 0pf+0w