Vacuum only with 20% old tuples
I suggest that we change vacuum to only move/remove tuples if more
than 20% of the tuples are expired.
When we do vacuum, we drop all indexes and recreate them.
This fixes the complaint about vacuum slowness when there are many
expired rows in the table.  We know this is caused by excessive index
updates.  It allows indexes to shrink (Jan pointed this out to me.)  And
it fixes the TOAST problem with TOAST values in indexes.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I suggest that we change vacuum to only move/remove tuples if more
than 20% of the tuples are expired.
When we do vacuum, we drop all indexes and recreate them.
This fixes the complaint about vacuum slowness when there are many
expired rows in the table.  We know this is caused by excessive index
updates.  It allows indexes to shrink (Jan pointed this out to me.)  And
it fixes the TOAST problem with TOAST values in indexes.
We can't "drop and recreate" without a solution to the relation
versioning issue (unless you are prepared to accept a nonfunctional
database after a failure partway through index rebuild on a system
table). I think we should do this, but it's not all that simple...
I do not see what your 20% idea has to do with this, though, nor
why it's a good idea. If I've told the thing to vacuum I think
it should vacuum. 20% of a big table could be a lot of megabytes,
and I don't want some arbitrary decision in the code about whether
I can reclaim that space or not.
regards, tom lane
We can't "drop and recreate" without a solution to the relation
versioning issue (unless you are prepared to accept a nonfunctional
database after a failure partway through index rebuild on a system
table).  I think we should do this, but it's not all that simple...
I do not see what your 20% idea has to do with this, though, nor
why it's a good idea. If I've told the thing to vacuum I think
it should vacuum. 20% of a big table could be a lot of megabytes,
and I don't want some arbitrary decision in the code about whether
I can reclaim that space or not.
Well, I think we should do a sequential scan before starting vacuum to
find the number of expired rows.
Now that we are removing and recreating indexes, doing that just to
remove a few tuples is a major waste.  The user cannot really know if
the table is worth vacuuming in normal use; they are just going to use
the default.  Now, I think a FORCE option would be good, or the ability
to change the 20% default.
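For illustration, the two knobs might look something like this
(hypothetical syntax, invented here; neither the FORCE option nor
the setting exists):
    -- Hypothetical: vacuum regardless of the expired-tuple threshold
    VACUUM FORCE mytable;
    -- Hypothetical: change the 20% default, then vacuum normally
    SET vacuum_expired_threshold = 0.20;
    VACUUM mytable;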
Remember, commercial DBs don't even return unused space if you remove
all the rows in a table.  At least Informix doesn't, and I am sure there
are others.  I like vacuum, but let's not make it jump major hurdles for
small gain.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
-----Original Message-----
From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
Behalf Of Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I suggest that we change vacuum to only move/remove tuples if more
than 20% of the tuples are expired.
When we do vacuum, we drop all indexes and recreate them.
This fixes the complaint about vacuum slowness when there are many
expired rows in the table.  We know this is caused by excessive index
updates.  It allows indexes to shrink (Jan pointed this out to me.)  And
it fixes the TOAST problem with TOAST values in indexes.
We can't "drop and recreate" without a solution to the relation
versioning issue (unless you are prepared to accept a nonfunctional
database after a failure partway through index rebuild on a system
table). I think we should do this, but it's not all that simple...
Is this topic independent of WAL in the first place?
Regards.
Hiroshi Inoue
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
We can't "drop and recreate" without a solution to the relation
versioning issue (unless you are prepared to accept a nonfunctional
database after a failure partway through index rebuild on a system
table). I think we should do this, but it's not all that simple...
Is this topic independent of WAL in the first place?
Sure, unless Vadim sees some clever way of using WAL to eliminate
the need for versioned relations. But as far as I've seen in the
discussions, versioned relations are independent of WAL.
Basically what I want here is to build the new index relation as
a new file (set of files, if large) and then atomically commit it
as the new version of the index.
If we only want to solve the problem of rebuilding indexes, it's
probably not necessary to have true versions, because nothing outside
of pg_index refers to an index. You could build a complete new index
(new OID, new pg_class and pg_attribute entries, the whole nine yards)
as a new set of files, and delete the old index, and your commit of
this transaction would atomically replace the index. (Vacuuming
pg_index's own indexes this way might be a tad tricky though...)
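A rough user-level sketch of that swap, assuming DDL is fully
transactional (which is exactly the open question here; index and
table names are placeholders):
    BEGIN;
    CREATE INDEX t2_id_new ON t2 (id);  -- new OID, new set of files
    DROP INDEX t2_id_old;               -- the old version goes away
    COMMIT;  -- the commit atomically swaps in the new index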
But that approach doesn't solve the problem of making a CLUSTER
operation that really works the way it should. So I'd rather see us
put the effort into doing relation versions.
regards, tom lane
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
We can't "drop and recreate" without a solution to the relation
versioning issue (unless you are prepared to accept a nonfunctional
database after a failure partway through index rebuild on a system
table).  I think we should do this, but it's not all that simple...
Is this topic independent of WAL in the first place?
Sure, unless Vadim sees some clever way of using WAL to eliminate
the need for versioned relations. But as far as I've seen in the
discussions, versioned relations are independent of WAL.
Basically what I want here is to build the new index relation as
a new file (set of files, if large) and then atomically commit it
as the new version of the index.
Hmm, your plan seems to need WAL.
We must postpone building indexes until the end of tuple moving
in vacuum.  Once tuple moving has started, the consistency between
heap and indexes would be broken.  Currently (without WAL) this
inconsistency could never be recovered in case of rollback.
If we only want to solve the problem of rebuilding indexes, it's
probably not necessary to have true versions, because nothing outside
of pg_index refers to an index. You could build a complete new index
(new OID, new pg_class and pg_attribute entries, the whole nine yards)
as a new set of files, and delete the old index, and your commit of
this transaction would atomically replace the index. (Vacuuming
pg_index's own indexes this way might be a tad tricky though...)
??? Don't pg_class and pg_attribute need tricky handling too?
It seems pg_class alone needs tricky handling when we use rel versioning.
Anyway, we couldn't rely on the indexes of the currently vacuumed table.
I don't think it's easy to maintain the indexes of pg_class, pg_index,
and pg_attribute all together properly.
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
Basically what I want here is to build the new index relation as
a new file (set of files, if large) and then atomically commit it
as the new version of the index.
Hmm, your plan seems to need WAL.
We must postpone building indexes until the end of tuple moving
in vacuum.  Once tuple moving has started, the consistency between
heap and indexes would be broken.  Currently (without WAL) this
inconsistency could never be recovered in case of rollback.
Why? The same commit that makes the new index valid would make the
tuple movements valid. Actually, the way VACUUM currently works,
the tuple movements have been committed before we start freeing index
entries anyway. (One reason VACUUM is so inefficient with indexes
is that there is a peak index usage where there are index entries for
*both* old and new tuple positions. I don't feel a need to change
that, as long as the duplicate entries are in the old index that
we're hoping to get rid of.)
this transaction would atomically replace the index. (Vacuuming
pg_index's own indexes this way might be a tad tricky though...)
??? Don't pg_class and pg_attribute need tricky handling too?
It seems pg_class alone needs tricky handling when we use rel versioning.
Could be. I think working through how we handle system tables and
indexes is the key stumbling block we've got to get past to have
versioning. I don't know quite how to do it, yet.
regards, tom lane
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
Basically what I want here is to build the new index relation as
a new file (set of files, if large) and then atomically commit it
as the new version of the index.
Hmm, your plan seems to need WAL.
We must postpone building indexes until the end of tuple moving
in vacuum.  Once tuple moving has started, the consistency between
heap and indexes would be broken.  Currently (without WAL) this
inconsistency could never be recovered in case of rollback.
Why?  The same commit that makes the new index valid would make the
tuple movements valid.
Oops, I realized I wasn't correct.  Certainly it's not as dangerous as
I wrote.  But there remains a possibility that index tuples would point
to cleaned heap blocks unless we delete the index tuples for those heap
blocks.  Cleaned blocks would be reused by UPDATE operations.
Regards.
Hiroshi Inoue
Tom Lane wrote:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
We can't "drop and recreate" without a solution to the relation
versioning issue (unless you are prepared to accept a nonfunctional
database after a failure partway through index rebuild on a system
table).  I think we should do this, but it's not all that simple...
Is this topic independent of WAL in the first place?
Sure, unless Vadim sees some clever way of using WAL to eliminate
the need for versioned relations.  But as far as I've seen in the
discussions, versioned relations are independent of WAL.
Basically what I want here is to build the new index relation as
a new file (set of files, if large) and then atomically commit it
as the new version of the index.
Which implicitly says we need to vacuum the TOAST relation
AFTER being completely done with the indices - in contrast
to what you said before.  Otherwise, the old index (the
active one) would still refer to entries that don't exist any
more.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I suggest that we change vacuum to only move/remove tuples if more
than 20% of the tuples are expired.
When we do vacuum, we drop all indexes and recreate them.
This fixes the complaint about vacuum slowness when there are many
expired rows in the table.  We know this is caused by excessive index
updates.  It allows indexes to shrink (Jan pointed this out to me.)  And
it fixes the TOAST problem with TOAST values in indexes.
We can't "drop and recreate" without a solution to the relation
versioning issue (unless you are prepared to accept a nonfunctional
database after a failure partway through index rebuild on a system
table).  I think we should do this, but it's not all that simple...
I do not see what your 20% idea has to do with this, though, nor
why it's a good idea. If I've told the thing to vacuum I think
it should vacuum. 20% of a big table could be a lot of megabytes,
and I don't want some arbitrary decision in the code about whether
I can reclaim that space or not.
I can see some value in having a _configurable_ threshold percentage of
deletes before vacuum kicks in and attempts to shrink table/index
on-disk file sizes.  This would let the end user decide; 20% is
probably a reasonable default, and if it isn't, changing a default
is easy to do down the track.
I can also see that it could be done with (perhaps) a modification to
VACUUM syntax, say:
VACUUM [VERBOSE] [SHRINK] ...
And I believe that the whole thing will go better if ANALYZE is taken
_out_ of vacuum, as was discussed on this list a month or two ago.
Cheers,
Andrew.
--
_____________________________________________________________________
Andrew McMillan, e-mail: Andrew@cat-it.co.nz
Catalyst IT Ltd, PO Box 10-225, Level 22, 105 The Terrace, Wellington
Me: +64 (21) 635 694, Fax: +64 (4) 499 5596, Office: +64 (4) 499 2267
I can see some value in having a _configurable_ threshold percentage of
deletes before vacuum kicks in and attempts to shrink table/index
on-disk file sizes.  This would let the end user decide; 20% is
probably a reasonable default, and if it isn't, changing a default
is easy to do down the track.
I can also see that it could be done with (perhaps) a modification to
VACUUM syntax, say:
VACUUM [VERBOSE] [SHRINK] ...
And I believe that the whole thing will go better if ANALYZE is taken
_out_ of vacuum, as was discussed on this list a month or two ago.
The analyze process no longer locks the table exclusively.  It will be
made a separate command in 7.1, though an ANALYZE option will still be
available in VACUUM.
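In other words, both forms would work (hypothetical table name):
    VACUUM ANALYZE mytable;  -- ANALYZE still available as a VACUUM option
    ANALYZE mytable;         -- the new separate command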
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Tue, 11 Jul 2000, Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I suggest that we change vacuum to only move/remove tuples if more
than 20% of the tuples are expired.
When we do vacuum, we drop all indexes and recreate them.
This fixes the complaint about vacuum slowness when there are many
expired rows in the table.  We know this is caused by excessive index
updates.  It allows indexes to shrink (Jan pointed this out to me.)  And
it fixes the TOAST problem with TOAST values in indexes.
We can't "drop and recreate" without a solution to the relation
versioning issue (unless you are prepared to accept a nonfunctional
database after a failure partway through index rebuild on a system
table).  I think we should do this, but it's not all that simple...
I do not see what your 20% idea has to do with this, though, nor
why it's a good idea. If I've told the thing to vacuum I think
it should vacuum. 20% of a big table could be a lot of megabytes,
and I don't want some arbitrary decision in the code about whether
I can reclaim that space or not.
I wouldn't mind seeing some automagic vacuum happen *if* >20% expired
... but don't understand the limit when I tell it to vacuum either ...
how about leaving vacuum as is, but extending REINDEX so that it
drops/rebuilds all indices on a TABLE | DATABASE? Or does it do that
now? From reading \h REINDEX, my thought is that it doesn't, but ...
On Tue, 11 Jul 2000, Bruce Momjian wrote:
We can't "drop and recreate" without a solution to the relation
versioning issue (unless you are prepared to accept a nonfunctional
database after a failure partway through index rebuild on a system
table). I think we should do this, but it's not all that simple...I do not see what your 20% idea has to do with this, though, nor
why it's a good idea. If I've told the thing to vacuum I think
it should vacuum. 20% of a big table could be a lot of megabytes,
and I don't want some arbitrary decision in the code about whether
I can reclaim that space or not.Well, I think we should do a sequential scan before starting vacuum to
find the number of expired rows.Now that we are removing indexes, doing that to remove a few tuples is a
major waste. The user can not really know if the table is worth
vacuuming in normal use. They are just going to use the default. Now,
I think a FORCE option would be good, or the ability to change the 20%
default.Remember, commercial db's don't even return unused space if you remove
all the rows in a table. At least Informix doesn't and I am sure there
are others. I like vacuum, but let's not make it do major hurtles for
small gain.-- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
I do not see what your 20% idea has to do with this, though, nor
why it's a good idea. If I've told the thing to vacuum I think
it should vacuum. 20% of a big table could be a lot of megabytes,
and I don't want some arbitrary decision in the code about whether
I can reclaim that space or not.
I wouldn't mind seeing some automagic vacuum happen *if* >20% expired
... but don't understand the limit when I tell it to vacuum either ...
I am confused by your comment.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Wed, 12 Jul 2000, Bruce Momjian wrote:
I do not see what your 20% idea has to do with this, though, nor
why it's a good idea. If I've told the thing to vacuum I think
it should vacuum. 20% of a big table could be a lot of megabytes,
and I don't want some arbitrary decision in the code about whether
I can reclaim that space or not.
I wouldn't mind seeing some automagic vacuum happen *if* >20% expired
... but don't understand the limit when I tell it to vacuum either ...
I am confused by your comment.
Make the backend reasonably intelligent ... periodically do a scan, as
you've suggested would be required for your above 20% idea, and if >20%
are expired records, auto-start a vacuum (settable, of course) ...
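As a sketch only: with the statistics views of much later releases
(nothing like them existed at the time of this thread), such a scan
could be approximated as follows, with the threshold mirroring the
20% figure above:
    -- tables whose dead-tuple fraction exceeds 20%
    SELECT relname, n_live_tup, n_dead_tup
    FROM pg_stat_user_tables
    WHERE n_dead_tup > 0.2 * (n_live_tup + n_dead_tup);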
On Wed, 12 Jul 2000, Bruce Momjian wrote:
I do not see what your 20% idea has to do with this, though, nor
why it's a good idea. If I've told the thing to vacuum I think
it should vacuum. 20% of a big table could be a lot of megabytes,
and I don't want some arbitrary decision in the code about whether
I can reclaim that space or not.
I wouldn't mind seeing some automagic vacuum happen *if* >20% expired
... but don't understand the limit when I tell it to vacuum either ...
I am confused by your comment.
Make the backend reasonably intelligent ... periodically do a scan, as
you've suggested would be required for your above 20% idea, and if >20%
are expired records, auto-start a vacuum (settable, of course) ...
Would be good if we could vacuum without locking.  We could find a
table when things are mostly idle, and vacuum it then.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Wed, 12 Jul 2000, Bruce Momjian wrote:
On Wed, 12 Jul 2000, Bruce Momjian wrote:
I do not see what your 20% idea has to do with this, though, nor
why it's a good idea. If I've told the thing to vacuum I think
it should vacuum. 20% of a big table could be a lot of megabytes,
and I don't want some arbitrary decision in the code about whether
I can reclaim that space or not.
I wouldn't mind seeing some automagic vacuum happen *if* >20% expired
... but don't understand the limit when I tell it to vacuum either ...
I am confused by your comment.
Make the backend reasonably intelligent ... periodically do a scan, as
you've suggested would be required for your above 20% idea, and if >20%
are expired records, auto-start a vacuum (settable, of course) ...
Would be good if we could vacuum without locking.  We could find a
table when things are mostly idle, and vacuum it then.
Definitely :)
We can't "drop and recreate" without a solution to the relation
versioning issue (unless you are prepared to accept a nonfunctional
database after a failure partway through index rebuild on a system
table).  I think we should do this, but it's not all that simple...
Is this topic independent of WAL in the first place?
Sure, unless Vadim sees some clever way of using WAL to eliminate
the need for versioned relations. But as far as I've seen in the
discussions, versioned relations are independent of WAL.
WAL can solve the versioned relations problem.
Remember that one sure new step in postmaster startup will be a
rollforward of the WAL, since that will have the only sync write of
our last transactions.  Thus in this step it can also do any pending
rename or delete of files.  If a rename or delete fails we bail
out, since we don't want the postmaster running under such
circumstances anyway.
Andreas
-----Original Message-----
From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
Behalf Of The Hermit Hacker
how about leaving vacuum as is, but extending REINDEX so that it
drops/rebuilds all indices on a TABLE | DATABASE?  Or does it do that
now?  From reading \h REINDEX, my thought is that it doesn't, but ...
As for user tables, REINDEX could do it already, i.e.
REINDEX TABLE table_name FORCE; is possible under psql.
If REINDEX fails, PostgreSQL just ignores the indexes of the table
(i.e. index scans are never applied) and REINDEX/VACUUM would
recover the state.  Yes, VACUUM already has hidden functionality
to reindex.
As for system indexes, you must shut down the postmaster and
invoke a standalone postgres with the -P option.
REINDEX DATABASE database_name FORCE; would
reindex (and shrink) all system tables of the database.
It may be possible even under the postmaster if REINDEX
never fails.
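Putting those together (table and database names are placeholders):
    -- user table, from psql:
    REINDEX TABLE mytable FORCE;
    -- system indexes: shut down the postmaster first, then from a
    -- standalone backend started with the -P option as described above:
    REINDEX DATABASE mydb FORCE;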
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
Hiroshi Inoue wrote:
-----Original Message-----
From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
Behalf Of The Hermit Hacker
how about leaving vacuum as is, but extending REINDEX so that it
drops/rebuilds all indices on a TABLE | DATABASE?  Or does it do that
now?  From reading \h REINDEX, my thought is that it doesn't, but ...
As for user tables, REINDEX could do it already, i.e.
REINDEX TABLE table_name FORCE; is possible under psql.
If REINDEX fails, PostgreSQL just ignores the indexes of the table
(i.e. index scans are never applied) and REINDEX/VACUUM would
recover the state.  Yes, VACUUM already has hidden functionality
to reindex.
Sorry, but there seem to be problems with that.
pgsql=# delete from t2;
DELETE 0
pgsql=# vacuum;
VACUUM
pgsql=# reindex table t2 force;
REINDEX
pgsql=# \c
You are now connected to database pgsql as user pgsql.
pgsql=# insert into t2 select * from t1;
FATAL 1: btree: failed to add item to the page
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
This happens too if I don't reconnect to the database between
REINDEX and INSERT.  Also if I drop the connection and restart the
postmaster, so it shouldn't be due to old blocks hanging
around in the cache.
The interesting thing is that the btree index gets reset to
2 blocks.  Need to dive into it...
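For reference, the block count is visible in pg_class (the index
name is a placeholder):
    SELECT relname, relpages FROM pg_class WHERE relname = 't2_pkey';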
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #