Why vacuum?

Started by bpalmer, about 25 years ago. 28 messages.
#1 bpalmer
bpalmer@crimelabs.net

I noticed the other day that one of my pg databases was slow, so I ran
vacuum on it, which brought a question to mind: why the need? I looked
at my oracle server and we aren't doing anything of the sort (that I can
find), so why does pg need it? Any info?

Thanks,
- brandon

b. palmer, bpalmer@crimelabs.net
pgp: www.crimelabs.net/bpalmer.pgp5

#2 Martin A. Marques
martin@math.unl.edu.ar
In reply to: bpalmer (#1)
Re: Why vacuum?

On Wed 13 Dec 2000 16:41, bpalmer wrote:

I noticed the other day that one of my pg databases was slow, so I ran
vacuum on it, which brought a question to mind: why the need? I looked
at my oracle server and we aren't doing anything of the sort (that I can
find), so why does pg need it? Any info?

I know nothing about Oracle, but I can tell you that Informix has an 'update
statistics' command; I don't know if it's similar to vacuum, but....
What vacuum does is clean the database of rows that were left behind during
updates and deletes; nonetheless, the tables get shrunk, so searches get
faster.

Regards... :-)

--
System Administration: It's a dirty job,
but someone told me I had to do it.
-----------------------------------------------------------------
Martín Marqués email: martin@math.unl.edu.ar
Santa Fe - Argentina http://math.unl.edu.ar/~martin/
Systems administrator at math.unl.edu.ar
-----------------------------------------------------------------

#3 Alfred Perlstein
bright@wintelcom.net
In reply to: Martin A. Marques (#2)
Re: Why vacuum?

* Martin A. Marques <martin@math.unl.edu.ar> [001213 15:15] wrote:

On Wed 13 Dec 2000 16:41, bpalmer wrote:

I noticed the other day that one of my pg databases was slow, so I ran
vacuum on it, which brought a question to mind: why the need? I looked
at my oracle server and we aren't doing anything of the sort (that I can
find), so why does pg need it? Any info?

I know nothing about Oracle, but I can tell you that Informix has an 'update
statistics' command; I don't know if it's similar to vacuum, but....
What vacuum does is clean the database of rows that were left behind during
updates and deletes; nonetheless, the tables get shrunk, so searches get
faster.

Yes, postgresql requires vacuum quite often, otherwise queries and
updates start taking ungodly amounts of time to complete. If you're
having problems because vacuum locks up your tables for too long,
you might want to check out:

http://people.freebsd.org/~alfred/vacfix/

It has some tarballs with patches to speed up vacuum; depending
on how you access your tables, you can see up to a 20x reduction in
vacuum time.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

#4 Daniele Orlandi
daniele@orlandi.com
In reply to: bpalmer (#1)
Re: Why vacuum?

bpalmer wrote:

I noticed the other day that one of my pg databases was slow, so I ran
vacuum on it, which brought a question to mind: why the need? I looked
at my oracle server and we aren't doing anything of the sort (that I can
find), so why does pg need it? Any info?

Hi,

I'm one of the people being slightly bitten by the current vacuum
behaviour :), so I'll take the chance to add my suggestions to this
question.

FWIW, my thought is about a vacuumer process that, in the background, scans
each table for available blocks (by available I mean a block full of
deleted rows whose tids are committed) and fills a cache of those blocks
available to the backends.

Whenever a backend needs to allocate a new block it looks for a free
block in the cache; if it finds one, it can use it, otherwise it proceeds as
usual, appending the block at the tail.

The vacuumer would run with a very low priority, so that it doesn't suck
precious CPU and I/O when the load on the machine is high.

A small flag on each table would keep the vacuumer from scanning the table
when no empty block has been found and no tuple has been deleted.
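A minimal sketch of this scheme in Python (hypothetical: the names FreeBlockCache and scan_for_free_blocks are invented, a table is modeled as a list of blocks holding per-tuple dead flags, and all the real locking and shared-memory issues are ignored):

```python
from collections import deque

# The background vacuumer fills the cache; the backends drain it.

class FreeBlockCache:
    def __init__(self):
        self.free = deque()              # block numbers known to be reusable

    def scan_for_free_blocks(self, table):
        # Background pass: a block whose tuples are all dead is available.
        for blkno, block in enumerate(table):
            if block and all(block) and blkno not in self.free:
                self.free.append(blkno)

    def allocate(self, table):
        # Backend side: prefer a cached free block, else append at the
        # tail (which is all the current storage manager ever does).
        if self.free:
            blkno = self.free.popleft()
            table[blkno] = []            # recycle the block
            return blkno
        table.append([])
        return len(table) - 1

table = [[True, True], [False], [True]]  # blocks 0 and 2 are all-dead
cache = FreeBlockCache()
cache.scan_for_free_blocks(table)
print(cache.allocate(table))             # 0  (recycled)
print(cache.allocate(table))             # 2  (recycled)
print(cache.allocate(table))             # 3  (cache empty: appended)
```

If the vacuumer never runs, allocate() always appends at the tail, which is exactly the current behaviour.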

Ok, now tell me where this is badly broken :))

Just my .02 euro :)

Bye!

--
Daniele Orlandi

#5 xuyifeng
xyf@stocke.com
In reply to: bpalmer (#1)
Re: Why vacuum?

I have this nasty problem too. Early on I didn't know about the problem, but after
we had used the database for a while we found our tables growing too fast without
inserting any records (we only use updates). This behaviour is most like the M$
MS ACCESS database I used a long time ago, which doesn't reuse deleted record
space and fills up your hard disk after several hours. The nasty vacuum blocks any
other users from operating on the table; this is a big problem for a large table,
because it blocks far too long for other users to run queries. We have a project
affected by this problem, and sadly we decided to use a closed-source database,
SYBASE on Linux; we had no other options. :(

Note that SYBASE and Informix both have an 'update statistics' command, but they
run it fast, in seconds, without blocking any other user; that's pretty. Yeah,
that's good technology!

XuYifeng

----- Original Message -----
From: Martin A. Marques <martin@math.unl.edu.ar>
To: bpalmer <bpalmer@crimelabs.net>; <pgsql-hackers@postgresql.org>
Sent: Thursday, December 14, 2000 7:08 AM
Subject: Re: [HACKERS] Why vacuum?

On Wed 13 Dec 2000 16:41, bpalmer wrote:

I noticed the other day that one of my pg databases was slow, so I ran
vacuum on it, which brought a question to mind: why the need? I looked
at my oracle server and we aren't doing anything of the sort (that I can
find), so why does pg need it? Any info?

I know nothing about Oracle, but I can tell you that Informix has an 'update
statistics' command; I don't know if it's similar to vacuum, but....
What vacuum does is clean the database of rows that were left behind during
updates and deletes; nonetheless, the tables get shrunk, so searches get
faster.

Regards... :-)

--
System Administration: It's a dirty job,
but someone told me I had to do it.
-----------------------------------------------------------------
Martín Marqués email: martin@math.unl.edu.ar
Santa Fe - Argentina http://math.unl.edu.ar/~martin/
Systems administrator at math.unl.edu.ar
-----------------------------------------------------------------

#7 Alfred Perlstein
bright@wintelcom.net
In reply to: xuyifeng (#5)
Re: Why vacuum?

* xuyifeng <xyf@stocke.com> [001213 18:54] wrote:

I have this nasty problem too. Early on I didn't know about the problem, but after
we had used the database for a while we found our tables growing too fast without
inserting any records (we only use updates). This behaviour is most like the M$
MS ACCESS database I used a long time ago, which doesn't reuse deleted record
space and fills up your hard disk after several hours. The nasty vacuum blocks any
other users from operating on the table; this is a big problem for a large table,
because it blocks far too long for other users to run queries. We have a project
affected by this problem, and sadly we decided to use a closed-source database,
SYBASE on Linux; we had no other options. :(

Note that SYBASE and Informix both have an 'update statistics' command, but they
run it fast, in seconds, without blocking any other user; that's pretty. Yeah,
that's good technology!

http://people.freebsd.org/~alfred/vacfix/

-Alfred

#8 Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Alfred Perlstein (#7)
RE: Why vacuum?

But why? I don't know of other databases that need to be 'vacuum'ed. Do
all others just do it internally on a regular basis?

What am I missing here?

Plenty of other databases need to be 'vacuumed'. For instance, if you have
an ms access database with 5 MB of data in it, and then delete all the data,
leaving only the forms, etc - you will be left with a 5MB mdb file still!

If you then run 'Compact Database' (which is another word for 'vacuum'), the
mdb file will be reduced down to 500k...

Chris

#9 bpalmer
bpalmer@crimelabs.net
In reply to: Alfred Perlstein (#3)
Re: Why vacuum?

Yes, postgresql requires vacuum quite often, otherwise queries and
updates start taking ungodly amounts of time to complete. If you're
having problems because vacuum locks up your tables for too long,
you might want to check out:

But why? I don't know of other databases that need to be 'vacuum'ed. Do
all others just do it internally on a regular basis?

What am I missing here?

b. palmer, bpalmer@crimelabs.net
pgp: www.crimelabs.net/bpalmer.pgp5

#10 The Hermit Hacker
scrappy@hub.org
In reply to: bpalmer (#9)
Re: Why vacuum?

On Wed, 13 Dec 2000, bpalmer wrote:

Yes, postgresql requires vacuum quite often, otherwise queries and
updates start taking ungodly amounts of time to complete. If you're
having problems because vacuum locks up your tables for too long,
you might want to check out:

But why? I don't know of other databases that need to be 'vacuum'ed. Do
all others just do it internally on a regular basis?

What am I missing here?

PgSQL's storage manager is currently such that it doesn't overwrite
'deleted' records, but just keeps appending to the end of the table
... so, for instance, a client of ours whose table had 5 records in it
that are updated *a lot* grew a table to 64Meg that only contains ~8k worth
of data ...

vacuum'ng cleans out the cruft and truncates the file ...

vadim, for v7.2, is planning on re-writing the storage manager to do
proper overwriting of deleted space, which will reduce the requirement for
vacuum to almost never ...
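The behaviour described above can be sketched in a toy Python model (illustrative only: real heap pages, tuple headers, and MVCC visibility rules are far more involved, and the class and method names here are invented):

```python
# Toy model of a non-overwriting storage manager: UPDATE never reuses
# a dead slot; it marks the old version dead and appends a new one.
# VACUUM reclaims the dead versions and compacts the table.

class AppendOnlyTable:
    def __init__(self):
        self.tuples = []                 # [key, value, dead] entries, append-only

    def insert(self, key, value):
        self.tuples.append([key, value, False])

    def update(self, key, value):
        for t in self.tuples:            # mark the live version dead ...
            if t[0] == key and not t[2]:
                t[2] = True
        self.tuples.append([key, value, False])   # ... and append a fresh one

    def vacuum(self):
        # Throw away dead versions and truncate, as VACUUM does.
        self.tuples = [t for t in self.tuples if not t[2]]

t = AppendOnlyTable()
for k in range(5):                       # 5 records ...
    t.insert(k, 0)
for rev in range(1000):                  # ... updated a lot
    for k in range(5):
        t.update(k, rev)
print(len(t.tuples))                     # 5005 slots holding 5 live rows
t.vacuum()
print(len(t.tuples))                     # 5
```

The 5-row table ends up carrying a thousand dead versions per row until vacuum runs, which is the same growth pattern as the 64Meg/8k example above.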

#11 Tim Allen
tim@proximity.com.au
In reply to: Christopher Kings-Lynne (#8)
RE: Why vacuum?

On Thu, 14 Dec 2000, Christopher Kings-Lynne wrote:

Plenty of other databases need to be 'vacuumed'. For instance, if you have
an ms access database with 5 MB of data in it, and then delete all the data,
leaving only the forms, etc - you will be left with a 5MB mdb file still!

If you then run 'Compact Database' (which is another word for 'vacuum'), the
mdb file will be reduced down to 500k...

Ooh... Hope MS Access isn't going to be taken seriously as a benchmark
here :-). The same is also true of MapInfo, by the way, but I'm not
holding that up as a benchmark either ;-).

Chris

Tim

--
-----------------------------------------------
Tim Allen tim@proximity.com.au
Proximity Pty Ltd http://www.proximity.com.au/
http://www4.tpg.com.au/users/rita_tim/

#12 Andrew McMillan
Andrew@catalyst.net.nz
In reply to: Tim Allen (#11)
Re: Why vacuum?

Tim Allen wrote:

On Thu, 14 Dec 2000, Christopher Kings-Lynne wrote:

Plenty of other databases need to be 'vacuumed'. For instance, if you have
an ms access database with 5 MB of data in it, and then delete all the data,
leaving only the forms, etc - you will be left with a 5MB mdb file still!

If you then run 'Compact Database' (which is another word for 'vacuum'), the
mdb file will be reduced down to 500k...

Ooh... Hope MS Access isn't going to be taken seriously as a benchmark
here :-). The same is also true of MapInfo, by the way, but I'm not
holding that up as a benchmark either ;-).

:-)

I think that the non-overwriting storage manager actually buys a lot
more for PostgreSQL than it does for MS Access.

In earlier versions of PostgreSQL it was possible to "time travel" your
database and so run your query against the database as it was at a
particular time/date. This advanced feature turns out to be useful in
very few situations, and is very expensive in terms of storage.

Still, "if it works, don't fix it" also applies. The PostgreSQL storage
manager is quite efficient as it is now, and most of us do have quiet
periods when we can safely vacuum the database, which is why it has had
to wait until now.

This will be quite a big change for 7.2, and getting the performance
right will no doubt challenge these hackers, to whom we are all greatly
indebted.

Cheers,
Andrew.
--
_____________________________________________________________________
Andrew McMillan, e-mail: Andrew@catalyst.net.nz
Catalyst IT Ltd, PO Box 10-225, Level 22, 105 The Terrace, Wellington
Me: +64 (21) 635 694, Fax: +64 (4) 499 5596, Office: +64 (4) 499 2267

#13 Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Andrew McMillan (#12)
AW: Why vacuum?

Yes, postgresql requires vacuum quite often, otherwise queries and
updates start taking ungodly amounts of time to complete. If you're
having problems because vacuum locks up your tables for too long,
you might want to check out:

But why? I don't know of other databases that need to be 'vacuum'ed. Do
all others just do it internally on a regular basis?

What am I missing here?

They all have an overwriting storage manager. The current storage manager
of PostgreSQL is non overwriting, which has other advantages.

There seem to be 2 answers to the problem:
1. change to an overwrite storage manager
2. make vacuum concurrent capable

The tendency here seems to be towards an improved smgr.
But, it is currently extremely cheap to calculate where a new row
needs to be located physically. This task is *a lot* more expensive
in an overwrite smgr. It needs to maintain a list of pages with free slots,
which has all sorts of concurrency and persistence problems.
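What that free-slot bookkeeping might look like, as a minimal Python sketch (hypothetical names; only the concurrency half is modeled, with a single lock standing in for the real contention, and the persistence problem is not modeled at all):

```python
import threading

# Minimal sketch of the bookkeeping an overwriting smgr would need:
# a shared map of pages with free slots, serialized by a lock.  The
# lock is exactly the concurrency cost described above.

class FreeSpaceMap:
    def __init__(self):
        self._lock = threading.Lock()
        self._pages = {}                 # page number -> count of free slots

    def note_free(self, pageno, nslots):
        with self._lock:
            self._pages[pageno] = self._pages.get(pageno, 0) + nslots

    def grab_slot(self):
        # Return a page with a free slot, or None; on None the caller
        # extends the relation at the tail, as the current smgr always does.
        with self._lock:
            for pageno, free in self._pages.items():
                if free > 0:
                    self._pages[pageno] = free - 1
                    if self._pages[pageno] == 0:
                        del self._pages[pageno]
                    return pageno
            return None

fsm = FreeSpaceMap()
print(fsm.grab_slot())                   # None: nothing reclaimed yet
fsm.note_free(7, 2)                      # vacuum-like pass found 2 slots on page 7
print(fsm.grab_slot())                   # 7
print(fsm.grab_slot())                   # 7
print(fsm.grab_slot())                   # None: page 7 is full again
```

Every insert now pays for a lock round-trip on this shared structure, which illustrates why locating a new row is much cheaper in the current append-at-the-tail scheme.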

Andreas

#14 Denis Perchine
dyp@perchine.com
In reply to: Daniele Orlandi (#4)
Re: Why vacuum?

Hello,

Another question about vacuum: will the vacuum/drop/create deadlocks be fixed in
the 7.0.x branch? That's really annoying; I cannot run vacuum automatically due
to this. Even just a patch would be really great. Is it so hard to fix?

--
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------

#15 Martin A. Marques
martin@math.unl.edu.ar
In reply to: xuyifeng (#5)
Re: Why vacuum?

On Wed 13 Dec 2000 22:24, xuyifeng wrote:

I have this nasty problem too. Early on I didn't know about the problem, but after
we had used the database for a while we found our tables growing too fast without
inserting any records (we only use updates). This behaviour is most like the M$
MS ACCESS database I used a long time ago, which doesn't reuse deleted record
space and fills up your hard disk after several hours. The nasty vacuum blocks any
other users from operating on the table; this is a big problem for a large table,
because it blocks far too long for other users to run queries. We have a project
affected by this problem, and sadly we decided to use a closed-source database,
SYBASE on Linux; we had no other options. :(

Note that SYBASE and Informix both have an 'update statistics' command, but they
run it fast, in seconds, without blocking any other user; that's pretty. Yeah,
that's good technology!

I have to say that 'update statistics' does not take a few seconds if the
databases have grown to be a bit large. At least that's what I have seen.

Regards... :-)

--
System Administration: It's a dirty job,
but someone told me I had to do it.
-----------------------------------------------------------------
Martín Marqués email: martin@math.unl.edu.ar
Santa Fe - Argentina http://math.unl.edu.ar/~martin/
Systems administrator at math.unl.edu.ar
-----------------------------------------------------------------

#16 mlw
markw@mohawksoft.com
In reply to: bpalmer (#1)
Re: Why vacuum?

"Martin A. Marques" wrote:

On Wed 13 Dec 2000 16:41, bpalmer wrote:

I noticed the other day that one of my pg databases was slow, so I ran
vacuum on it, which brought a question to mind: why the need? I looked
at my oracle server and we aren't doing anything of the sort (that I can
find), so why does pg need it? Any info?

I know nothing about Oracle, but I can tell you that Informix has an 'update
statistics' command; I don't know if it's similar to vacuum, but....
What vacuum does is clean the database of rows that were left behind during
updates and deletes; nonetheless, the tables get shrunk, so searches get
faster.

While I would like Postgres to update statistics once in a while on
its own, I like vacuum in general.

I would rather trade unused disk space for performance. The last thing
you need during high load is the database deciding that it is time to
clean up.

--
http://www.mohawksoft.com

#17 Hannu Krosing
hannu@tm.ee
In reply to: The Hermit Hacker (#10)
Re: Why vacuum?

The Hermit Hacker wrote:

On Wed, 13 Dec 2000, bpalmer wrote:

Yes, postgresql requires vacuum quite often, otherwise queries and
updates start taking ungodly amounts of time to complete. If you're
having problems because vacuum locks up your tables for too long,
you might want to check out:

But why? I don't know of other databases that need to be 'vacuum'ed. Do
all others just do it internally on a regular basis?

What am I missing here?

PgSQL's storage manager is currently such that it doesn't overwrite
'deleted' records, but just keeps appending to the end of the table
... so, for instance, a client of ours whose table had 5 records in it
that are updated *a lot* grew a table to 64Meg that only contains ~8k worth
of data ...

vacuum'ng cleans out the cruft and truncates the file ...

vadim, for v7.2, is planning on re-writing the storage manager to do
proper overwriting of deleted space, which will reduce the requirement for
vacuum to almost never ...

I hope that he does it in a way that allows it to retain the old
behaviour for some tables if there is need for it.

Also, vacuum and analyze should be separated (i.e. one should be able to
analyze a table without vacuuming it.)

Maybe use "ALTER TABLE/DATABASE UPDATE STATISTICS" as syntax for VACUUM
ANALYZE.

Time travel is/was a useful feature that is difficult to emulate
efficiently using "other" means like rules/triggers.

------------
Hannu

#18 Ross J. Reedstrom
reedstrm@rice.edu
In reply to: Zeugswetter Andreas SB (#13)
Re: Why vacuum?

On Thu, Dec 14, 2000 at 12:07:00PM +0100, Zeugswetter Andreas SB wrote:

They all have an overwriting storage manager. The current storage manager
of PostgreSQL is non overwriting, which has other advantages.

There seem to be 2 answers to the problem:
1. change to an overwrite storage manager
2. make vacuum concurrent capable

The tendency here seems to be towards an improved smgr.
But, it is currently extremely cheap to calculate where a new row
needs to be located physically. This task is *a lot* more expensive
in an overwrite smgr. It needs to maintain a list of pages with free slots,
which has all sorts of concurrency and persistence problems.

Not to mention the recent thread here about people recovering data that
was accidentally deleted, or from damaged db files: the old tuples serve
as a redundant backup, in a way. Not a really compelling reason to keep a
non-overwriting smgr, but still a surprise bonus for those who need it.

Ross

#19 Ross J. Reedstrom
reedstrm@rice.edu
In reply to: Hannu Krosing (#17)
Re: Why vacuum?

On Thu, Dec 14, 2000 at 01:16:20PM +0000, Hannu Krosing wrote:

The Hermit Hacker wrote:

<snip>

vadim, for v7.2, is planning on re-writing the storage manager to do
proper overwriting of deleted space, which will reduce the requirement for
vacuum to almost never ...

I hope that he does it in a way that allows it to retain the old
behaviour for some tables if there is need for it.

Here as well. The framework is still mostly there for multiple storage
managers: I hope Vadim takes advantage of it.

<snip>

Time travel is/was a useful feature that is difficult to emulate
efficiently using "other" means like rules/triggers.

I've actually been doing this very thing this week. It's not _that_
horrible, but it does interact really poorly with RI constraints: suddenly,
all those unique PK columns aren't so unique! This is probably the biggest
reason to do time travel in the backend. Having it on a per-table basis
would be cool.

Hmm, it seems the biggest problem with doing it per table would be needing
a couple of optional system attributes (e.g. tt_start and tt_stop),
and different indices that know how to skip not-current tuples. Extra
syntax to do queries at a particular time in the past would be nice, but
not an initial requirement.
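A rough Python model of the tt_start/tt_stop idea (the attribute names come from the paragraph above; everything else is an invented sketch that ignores indices, concurrency, and the RI issues just mentioned):

```python
INFINITY = float("inf")

# Each tuple version carries a validity interval [tt_start, tt_stop).
# An update closes the current version's interval; an "as of" scan
# skips versions that were not current at the requested time.

class TimeTravelTable:
    def __init__(self):
        self.rows = []                   # [key, value, tt_start, tt_stop]

    def update(self, key, value, now):
        for r in self.rows:
            if r[0] == key and r[3] == INFINITY:
                r[3] = now               # close the open version
        self.rows.append([key, value, now, INFINITY])

    def as_of(self, key, when):
        # Run the query "against the database as it was" at time `when`.
        for r in self.rows:
            if r[0] == key and r[2] <= when < r[3]:
                return r[1]
        return None

t = TimeTravelTable()
t.update("release", "7.0", 100)
t.update("release", "7.1", 200)
print(t.as_of("release", 150))           # 7.0
print(t.as_of("release", 250))           # 7.1
```

Note that both versions of the same key coexist in the table, which is exactly why a plain unique constraint on the key stops working without backend support.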

Sounds like there's something in common here with the per-tuple CRC
discussion, as well as the optional OID: a generic need for optional system
attributes.

Ross
--
Open source code is like a natural resource, it's the result of providing
food and sunshine to programmers, and then staying out of their way.
[...] [It] is not going away because it has utility for both the developers
and users independent of economic motivations. Jim Flynn, Sunnyvale, Calif.

#20 Alfred Perlstein
bright@wintelcom.net
In reply to: Ross J. Reedstrom (#18)
Re: Why vacuum?

* Ross J. Reedstrom <reedstrm@rice.edu> [001214 07:57] wrote:

On Thu, Dec 14, 2000 at 12:07:00PM +0100, Zeugswetter Andreas SB wrote:

They all have an overwriting storage manager. The current storage manager
of PostgreSQL is non overwriting, which has other advantages.

There seem to be 2 answers to the problem:
1. change to an overwrite storage manager
2. make vacuum concurrent capable

The tendency here seems to be towards an improved smgr.
But, it is currently extremely cheap to calculate where a new row
needs to be located physically. This task is *a lot* more expensive
in an overwrite smgr. It needs to maintain a list of pages with free slots,
which has all sorts of concurrency and persistence problems.

Not to mention the recent thread here about people recovering data that
was accidentally deleted, or from damaged db files: the old tuples serve
as a redundant backup, in a way. Not a really compelling reason to keep a
non-overwriting smgr, but still a surprise bonus for those who need it.

One could make vacuum optional such that it either:

1) always overwrites, or
2) will not overwrite data until a vacuum is called (perhaps with
a date option to specify how much deleted data you wish to
reclaim); data can be marked free but not free for re-use
until vacuum is run.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

#21 Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Alfred Perlstein (#20)
AW: Why vacuum?

The tendency here seems to be towards an improved smgr.
But, it is currently extremely cheap to calculate where a new row
needs to be located physically. This task is *a lot* more expensive
in an overwrite smgr.

I don't agree. If (as I have proposed) the search is made in the
background by a low priority process, you just have to lookup a cache
entry to find out where to write.

If the priority is too low you will end up with the same behaviour as currently,
because the cache will be emptied by multiple high-priority new rows,
thus writing to the end anyway. Conclusion: in those cases where overwrite would
be most advantageous (a high-volume modified table) your system won't work,
unless you go back to my concern and make it *very* expensive (= high priority).

Andreas

#22 Daniele Orlandi
daniele@orlandi.com
In reply to: Zeugswetter Andreas SB (#13)
Re: Why vacuum?

"Ross J. Reedstrom" wrote:

Not to mention the recent thread here about people recovering data that
was accidently deleted, or from damaged db files: the old tuples serve
as redundant backup, in a way. Not a real compelling reason to keep a
non-overwriting smgr, but still a surprise bonus for those who need it.

The optimal would be a configurable behaviour. I wouldn't enable it on a
users table, nor on a log-type table (the former is a slowly
changing table, the latter a table with few updates/deletes), but a
fast-changing table like an active sessions table would benefit a lot.

Currently, my active sessions table grows by 100K every 20 seconds; I
have to vacuum it constantly to keep things reasonable. Other tables
would benefit a lot, pg_listener for example.

Bye!

#23 Alfred Perlstein
bright@wintelcom.net
In reply to: Zeugswetter Andreas SB (#21)
Re: Why vacuum?

* Daniele Orlandi <daniele@orlandi.com> [001214 09:10] wrote:

Zeugswetter Andreas SB wrote:

If the priority is too low you will end up with the same behavior as current,

Yes, and it is the intended behaviour. I'd use idle priority for it.

If you're talking about vacuum, you really don't want to do this,
what's going to happen is that since you have an exclusive lock on
the file during your vacuum and no way to do priority lending you
can deadlock.

because the cache will be emptied by high priority multiple new rows,
thus writing to the end anyways.

Yes, but this only happens when you don't have enough spare idle CPU
time. If you are in such a situation for long periods, there's nothing you
can do; you already have problems.

My approach is winning here because it allows you to have bursts of CPU
utilization without being affected by the overhead of an overwriting smgr
that (without hacks) will always try to find available slots, even in
high-load situations.

Conclusion: In those cases where overwrite would be most advantageous (high
volume modified table) your system won't work

Why ? I have plenty of CPU time available on my server, even if one of
my table is highly volatile, fast-changing.

When your table grows to be very large you'll see what we're talking
about.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

#24 Daniele Orlandi
daniele@orlandi.com
In reply to: Zeugswetter Andreas SB (#13)
Re: Why vacuum?

"Ross J. Reedstrom" wrote:

On Thu, Dec 14, 2000 at 12:07:00PM +0100, Zeugswetter Andreas SB wrote:

The tendency here seems to be towards an improved smgr.
But, it is currently extremely cheap to calculate where a new row
needs to be located physically. This task is *a lot* more expensive
in an overwrite smgr.

I don't agree. If (as I have proposed) the search is made in the
background by a low priority process, you just have to lookup a cache
entry to find out where to write.

It needs to maintain a list of pages with free slots,
which has all sorts of concurrency and persistence problems.

Concurrency is a problem, but a spinlock on a shared-memory table should
suffice in the majority of cases[1]. I may be wrong... but I think
it should be discussed.

[1]: I believe there's already a similar problem synchronizing the
backends when they want to append a new page.

Bye!

#25 Alfred Perlstein
bright@wintelcom.net
In reply to: mlw (#16)
Re: Why vacuum?

* mlw <markw@mohawksoft.com> [001214 09:30] wrote:

"Martin A. Marques" wrote:

On Wed 13 Dec 2000 16:41, bpalmer wrote:

I noticed the other day that one of my pg databases was slow, so I ran
vacuum on it, which brought a question to mind: why the need? I looked
at my oracle server and we aren't doing anything of the sort (that I can
find), so why does pg need it? Any info?

I know nothing about Oracle, but I can tell you that Informix has an 'update
statistics' command; I don't know if it's similar to vacuum, but....
What vacuum does is clean the database of rows that were left behind during
updates and deletes; nonetheless, the tables get shrunk, so searches get
faster.

While I would like Postgres to update statistics once in a while on
its own, I like vacuum in general.

I would rather trade unused disk space for performance. The last thing
you need during high load is the database deciding that it is time to
clean up.

Even worse is having to scan a file that has grown to 20x the size
because you haven't vacuum'd in a while.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

#26 Daniele Orlandi
daniele@orlandi.com
In reply to: Zeugswetter Andreas SB (#21)
Re: AW: Why vacuum?

Zeugswetter Andreas SB wrote:

If the priority is too low you will end up with the same behavior as current,

Yes, and it is the intended behaviour. I'd use idle priority for it.

because the cache will be emptied by high priority multiple new rows,
thus writing to the end anyways.

Yes, but this only happens when you don't have enough spare idle CPU
time. If you are in such a situation for long periods, there's nothing you
can do; you already have problems.

My approach is winning here because it allows you to have bursts of CPU
utilization without being affected by the overhead of an overwriting smgr
that (without hacks) will always try to find available slots, even in
high-load situations.

Conclusion: In those cases where overwrite would be most advantageous (high
volume modified table) your system won't work

Why ? I have plenty of CPU time available on my server, even if one of
my table is highly volatile, fast-changing.

Bye!

#27 Daniele Orlandi
daniele@orlandi.com
In reply to: Zeugswetter Andreas SB (#21)
Re: Why vacuum?

Alfred Perlstein wrote:

If you're talking about vacuum, you really don't want to do this,

No, I'm not talking about vacuum as it is intended now; it's only a
process that scans tables to find available blocks/tuples. It is
virtually optional: if it doesn't run, the database will behave just
like now.

what's going to happen is that since you have an exclusive lock on
the file during your vacuum and no way to do priority lending you
can deadlock.

No exclusive lock, it's just a reader.

When your table grows to be very large you'll see what we're talking
about.

I see this as an optimization issue. If the scanner isn't smart and
wastes time scanning areas of the table that have not been emptied, you
go back to the current behaviour.

Bye!

#28 Zeugswetter Andreas SB
ZeugswetterA@wien.spardat.at
In reply to: Daniele Orlandi (#27)
AW: AW: Why vacuum?

because the cache will be emptied by high priority multiple new rows,
thus writing to the end anyways.

Yes, but this only happens when you don't have enought spare idle CPU
time. If you are in such situation for long periods, there's nothing you
can do, you already have problems.

I think such a process would not need a lot of CPU, but a lot of I/O. The common
schedulers do not take I/O into account (at least not in the sense needed here),
thus you cannot use the process priority mechanism here :-(

An idea could be to fill the free-page cache only from pages that currently reside in the
page buffer. This would also somewhat improve cache efficiency, since pages
that are often accessed would get a higher fill level.
A problem with this is that an empty page (basically an optimal candidate for the list)
would not get into the freelist unless somebody does a seq scan on the table.

Andreas