For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
The manual is vague. Several threads about this, in language that is
ambiguous to me.
So a YES/NO question:
Q: To get rid of index bloat, is a VACUUM ANALYZE enough? Or must I
reindex/cluster indexes?
Thanks.
Phoenix Kiula wrote:
The manual is vague. Several threads about this, in language that is
ambiguous to me. So a YES/NO question:
Q: To get rid of index bloat, is a VACUUM ANALYZE enough? Or must I
reindex/cluster indexes?
If you overrun your max_fsm_pages, no:
else yes;
Sincerely,
Joshua D. Drake
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997 http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
On Mon, 2007-09-17 at 22:14 -0700, Joshua D. Drake wrote:
Phoenix Kiula wrote:
So a YES/NO question:
Q: To get rid of index bloat, is a VACUUM ANALYZE enough? Or must I
reindex/cluster indexes?
If you overrun your max_fsm_pages, no:
else yes;
Maybe my English is poor, but I don't understand the above answer.
If I overrun my max_fsm_pages, then a VACUUM ANALYZE is enough to return
it to normal.
If I _didn't_ overrun my fsm, then a reindex/cluster is necessary.
Did I get that right? (I feel it's wrong and a reindex/cluster is needed
only when I overrun my max_fsm)
Ow Mun Heng wrote:
On Mon, 2007-09-17 at 22:14 -0700, Joshua D. Drake wrote:
Phoenix Kiula wrote:
So a YES/NO question:
Q: To get rid of index bloat, is a VACUUM ANALYZE enough? Or must I
reindex/cluster indexes?
If you overrun your max_fsm_pages, no:
else yes;
Maybe my English is poor, but I don't understand the above answer.
If I overrun my max_fsm_pages, then a VACUUM ANALYZE is enough to return
it to normal.
If you overrun your max_fsm_pages then vacuum analyze IS NOT ENOUGH. If
you do not overrun your max_fsm_pages, yes vacuum analyze can deal with
the issue.
Joshua D. Drake
If I _didn't_ overrun my fsm, then a reindex/cluster is necessary.
Did I get that right? (I feel it's wrong and a reindex/cluster is needed
only when I overrun my max_fsm)
2007/9/18, Joshua D. Drake <jd@commandprompt.com>:
If you overrun your max_fsm_pages then vacuum analyze IS NOT ENOUGH. If
you do not overrun your max_fsm_pages, yes vacuum analyze can deal with
the issue.
Are you sure? I have a situation where the above is not true. Postgres
version 8.1.8. VACUUM VERBOSE says:
INFO: free space map contains 2329221 pages in 490 relations
DETAIL: A total of 2345744 page slots are in use (including overhead).
2345744 page slots are required to track all free space.
Current limits are: 10000000 page slots, 1000 relations, using 58698 KB.
... and we have a constant problem with index bloat and need to REINDEX
frequently.
The database is very redundant and has quite a high data retention rate
(it's an ecommerce site)
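Joshua's yes/no rule can be checked mechanically against the VACUUM VERBOSE figures above; a minimal sketch (the function name and structure are illustrative, not anything PostgreSQL ships). Note that by this rule Filip's FSM is *not* overrun, which is exactly why his ongoing bloat looks like a counterexample:

```python
# Sketch: is the free space map overrun, per the VACUUM VERBOSE
# "free space map" lines? (Numbers taken from the post above.)
def fsm_overrun(slots_needed: int, slots_limit: int,
                rels_needed: int, rels_limit: int) -> bool:
    """True if the FSM cannot track all free space."""
    return slots_needed > slots_limit or rels_needed > rels_limit

# Filip's figures: 2345744 slots required vs a 10000000 limit;
# 490 relations vs a 1000 limit.
overrun = fsm_overrun(2345744, 10_000_000, 490, 1000)
print("FSM overrun:", overrun)  # FSM overrun: False
```

Since the check comes back False, the "vacuum analyze can deal with the issue" branch of the rule applies, yet the bloat persists; this supports Bill's point below that the yes/no answer is an oversimplification.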
--
Filip Rembiałkowski
"Filip Rembiałkowski" <plk.zuber@gmail.com> wrote:
2007/9/18, Joshua D. Drake <jd@commandprompt.com>:
If you overrun your max_fsm_pages then vacuum analyze IS NOT ENOUGH. If
you do not overrun your max_fsm_pages, yes vacuum analyze can deal with
the issue.
Are you sure? I have a situation where the above is not true. Postgres
version 8.1.8. VACUUM VERBOSE says:
INFO: free space map contains 2329221 pages in 490 relations
DETAIL: A total of 2345744 page slots are in use (including overhead).
2345744 page slots are required to track all free space.
Current limits are: 10000000 page slots, 1000 relations, using 58698 KB.
... and we have a constant problem with index bloat and need to REINDEX
frequently.
The database is very redundant and has quite a high data retention rate
(it's an ecommerce site)
I've been involved in a number of the discussions on this, and I think
part of the confusion stems from the fact that "index bloat" is an
ambiguous term.
If the index gets large enough that it no longer fits in shared memory,
and reindexing it will reduce its size to where it _will_ fit in shared
memory, then the index _could_ be said to be "bloated".
However, an equally valid solution to that problem is to increase the
amount of shared memory available (possibly by adding RAM).
Unfortunately, folks like Phoenix are looking for yes/no answers, and
with many of these questions, the _correct_ answer is "it depends on
your workload".
If you find that reindexing improves performance, then you should
investigate further. Depending on the exact nature of the problem,
there are many possible solutions, three that come to mind:
* Add RAM/SHM
* REINDEX on a regular schedule
* (with newer version) reduce the fill factor and REINDEX
--
Bill Moran
http://www.potentialtech.com
On Tue, 2007-09-18 at 06:01 -0400, Bill Moran wrote:
* (with newer version) reduce the fill factor and REINDEX
What is fill factor?
On 9/18/07, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
On Tue, 2007-09-18 at 06:01 -0400, Bill Moran wrote:
* (with newer version) reduce the fill factor and REINDEX
What is fill factor?
See "Index Storage Parameters":
http://www.postgresql.org/docs/8.2/static/sql-createindex.html
Unfortunately, folks like Phoenix are looking for yes/no answers, and
with many of these questions, the _correct_ answer is "it depends on
your workload".
I wanted merely to simplify the advice that gets dispensed on this
list, often conflicting to novice ears like mine. So I appreciate your
notes very much.
If you find that reindexing improves performance, then you should
investigate further. Depending on the exact nature of the problem,
there are many possible solutions, three that come to mind:
* Add RAM/SHM
Can I add SHM merely by managing the entry in sysctl.conf? My
current values:
kernel.shmmax = 536870912
kernel.shmall = 536870912
My "shared_buffers" in postgresql.conf is "20000". From the website
http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax
should be sharedbuffer*8192, so I suppose my shmmax can be much lower
than the above, but I raised it for performance. Am I wrong to do so?
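The arithmetic behind that sharedbuffer*8192 rule of thumb can be sketched in a few lines (assuming the default 8 kB block size; variable names are illustrative):

```python
# Rough shared-memory arithmetic for the settings quoted above.
BLCKSZ = 8192            # assumed default PostgreSQL block size, in bytes

shared_buffers = 20000   # in BLCKSZ-sized pages, from postgresql.conf
shmmax = 536870912       # in bytes, from sysctl.conf

used = shared_buffers * BLCKSZ
print(used)              # 163840000 bytes, i.e. ~156 MB -- far below shmmax

# Ceiling: how many buffer pages would fit under the configured shmmax.
# (PostgreSQL has fixed overhead too, so the real limit is slightly lower.)
max_buffers = shmmax // BLCKSZ
print(max_buffers)       # 65536
```

So the raised shmmax does no harm by itself; it only sets an upper bound, and the buffer pool is using well under a third of it.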
* REINDEX on a regular schedule
This is sadly not really feasible, because we need to offer a 100%
availability website. REINDEX does not work concurrently so it is not
really an option for us. My max_fsm_pages and max_fsm_relations are
way above the numbers that come up after the VACUUM ANALYZE VERBOSE
run.
But still, the autovacuum stuff seems like it is not working at all.
Some related entries in the conf file:
autovacuum = on
autovacuum_vacuum_cost_delay = 20
vacuum_cost_delay = 20
autovacuum_naptime = 30
stats_start_collector = on
stats_row_level = on
autovacuum_vacuum_threshold = 80
autovacuum_analyze_threshold = 80
And yet, the db often slows down, at which point I manually login and
run a manual VACUUM ANALYZE and it seems fine for some more time.
Sometimes, I also restart pgsql and that seems to help for a while.
Another piece of advice on these forums is to watch "vmstat 1", without
actually specifying how to draw inferences from it. The free-memory
numbers look decent, as follows:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 29124 110760 108980 3467736 0 1 206 140 0 4 2 1 85 12
0 0 29124 110632 108980 3467736 0 0 0 0 1052 108 0 0 100 0
2 0 29124 108840 108980 3467736 0 0 0 0 1112 299 1 1 98 0
1 0 29124 109288 108980 3467736 0 0 0 0 1073 319 2 1 98 0
.....
* (with newer version) reduce the fill factor and REINDEX
I think some of my tables are updated very frequently so a smaller
fill factor will be nice. How can I find the current fill factor on my
tables? Also, is there some method or science to calculating a decent
fill factor -- size of table, number of indexes, frequency of updates,
and such? We have one major table which faces a lot of INSERTs and
UPDATES in a day (up to 10,000) but many many more SELECTs (up to 10
million).
Thanks.
Hi,
Can I add SHM merely by managing the entry in sysctl.conf? My
current values:
kernel.shmmax = 536870912
kernel.shmall = 536870912
My "shared_buffers" in postgresql.conf is "20000". From the website
http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax
should be sharedbuffer*8192, so I suppose my shmmax can be much lower
than the above, but I raised it for performance. Am I wrong to do so?
You need to configure the kernel so it allows processes to use more shared
memory. This does not mean that a process automatically uses it. For
PostgreSQL you will need to increase shared_buffers to make it use the extra
available shared memory. With your shared memory settings you can probably
increase shared_buffers to about 65000.
With the 'ipcs' command you can see how much shared memory PostgreSQL uses.
Look under 'Shared Memory Segments' to memory owned by user postgres.
- Sander
On 18/09/2007, Sander Steffann <s.steffann@computel.nl> wrote:
Hi,
Can I add SHM merely by managing the entry in sysctl.conf? My
current values:
kernel.shmmax = 536870912
kernel.shmall = 536870912
My "shared_buffers" in postgresql.conf is "20000". From the website
http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax
should be sharedbuffer*8192, so I suppose my shmmax can be much lower
than the above, but I raised it for performance. Am I wrong to do so?
You need to configure the kernel so it allows processes to use more shared
memory. This does not mean that a process automatically uses it. For
PostgreSQL you will need to increase shared_buffers to make it use the extra
available shared memory. With your shared memory settings you can probably
increase shared_buffers to about 65000.
Thanks, the IPCS command shows me this:
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x0052e2c1 6782976 postgres 600 176668672 2
Now, I can merrily increase the shared_buffers, but the manual warns
me against increasing the value too much because it is "per
transaction" value.
So here's the conflict for a novice like me:
1. Do not increase shared_buffer too much because it is per-transaction.
2. Do increase the SHM for performance, but it is only useful if you
also increase shared_buffer.
So which is it?
Would it help to increase the effective_cache_size? It is currently at
"512000".
I have 4GB ram on the machine, but am willing to devote about 2GB to pgsql.
Thanks!
Hi,
Now, I can merrily increase the shared_buffers, but the manual warns
me against increasing the value too much because it is "per
transaction" value.
Shared_buffers is not per-transaction. Where did you find this information?
- Sander
In response to "Phoenix Kiula" <phoenix.kiula@gmail.com>:
If you find that reindexing improves performance, then you should
investigate further. Depending on the exact nature of the problem,
there are many possible solutions, three that come to mind:
* Add RAM/SHM
Can I add SHM merely by managing the entry in sysctl.conf? My
current values:
kernel.shmmax = 536870912
kernel.shmall = 536870912
These values define the max allowed. They exist to keep poorly written
applications from sucking up all the available memory. Setting them
higher than is needed does not cause any problems, unless a greedy or
poorly-written application grabs all that memory.
My "shared_buffers" in postgresql.conf is "20000". From the website
http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax
should be sharedbuffer*8192, so I suppose my shmmax can be much lower
than the above, but I raised it for performance. Am I wrong to do so?
It's completely impossible to tell without knowing more about your
physical hardware. The rule of thumb is 1/3 physical RAM to start, then
adjust if more or less seems to help. That advice is for versions of
PG >= 8. If you're still running a 7.X version, upgrade.
How much RAM does this system have in it? Unless you have other
applications running on this system using RAM, you should allocate
more of it to shared_buffers. If 160M is 1/3 your RAM, you probably
need to add more RAM.
How big is your database? If it's possible to fit it all in
shared_buffers, that will give you the best performance.
* REINDEX on a regular schedule
This is sadly not really feasible, because we need to offer a 100%
availability website. REINDEX does not work concurrently so it is not
really an option for us. My max_fsm_pages and max_fsm_relations are
way above the numbers that come up after the VACUUM ANALYZE VERBOSE
run.
Hence my comment about "depending on your workload" and "investigating
the situation" to determine the best solution.
But still, the autovacuum stuff seems like it is not working at all.
Some related entries in the conf file:
autovacuum = on
autovacuum_vacuum_cost_delay = 20
vacuum_cost_delay = 20
autovacuum_naptime = 30
stats_start_collector = on
stats_row_level = on
autovacuum_vacuum_threshold = 80
autovacuum_analyze_threshold = 80
And yet, the db often slows down, at which point I manually login and
run a manual VACUUM ANALYZE and it seems fine for some more time.
Sometimes, I also restart pgsql and that seems to help for a while.
You don't mention *_scale_factor settings. Those are going to be
important as well. Based on your symptoms, it sounds like autovacuum
is not getting those tables vacuumed enough. I recommend raising the
debug level and watching the logs to see if autovacuum is actually
getting tables vacuumed. Consider lowering your *_scale_factor values
if not. Or even reducing autovacuum_naptime.
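The trigger Bill is describing can be sketched as follows; the 0.4 scale factor is an assumed 8.1-era default, and the function itself is purely illustrative:

```python
# Autovacuum's vacuum trigger, roughly: a table gets vacuumed once
#   dead_tuples > vacuum_threshold + vacuum_scale_factor * reltuples
def needs_autovacuum(dead_tuples, reltuples,
                     threshold=80,        # Phoenix's autovacuum_vacuum_threshold
                     scale_factor=0.4):   # assumed 8.1-era default
    return dead_tuples > threshold + scale_factor * reltuples

# On a 10-million-row table at the default scale factor, roughly 4M rows
# must die before autovacuum fires -- lowering scale_factor fixes that.
print(needs_autovacuum(100_000, 10_000_000))                      # False
print(needs_autovacuum(100_000, 10_000_000, scale_factor=0.005))  # True
```

This is why a threshold of 80 on its own means little: on big tables the scale-factor term dominates, which matches the symptom of autovacuum "not working at all" until a manual VACUUM is run.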
Another piece of advice on these forums is to watch "vmstat 1", without
actually specifying how to draw inferences from it. The free-memory
numbers look decent, as follows:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 29124 110760 108980 3467736 0 1 206 140 0 4 2 1 85 12
0 0 29124 110632 108980 3467736 0 0 0 0 1052 108 0 0 100 0
2 0 29124 108840 108980 3467736 0 0 0 0 1112 299 1 1 98 0
1 0 29124 109288 108980 3467736 0 0 0 0 1073 319 2 1 98 0
.....
Explaining how to interpret the output of this command and determine
what to do with it is not something easily done in a short paragraph.
However, it looks like you've got a lot of RAM being used for the disk
cache. That memory would probably be better used as shared_buffers, so
I suggest you increase that value considerably.
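To make the cache observation concrete, here is an illustrative parse of one of the vmstat data lines quoted above (column layout per procps vmstat; this is a sketch, not a monitoring tool):

```python
# Pull the page-cache size (in KB) out of a "vmstat 1" data line.
# Columns: r b swpd free buff cache si so bi bo in cs us sy id wa
line = "1 0 29124 110760 108980 3467736 0 1 206 140 0 4 2 1 85 12"
fields = line.split()
cache_kb = int(fields[5])   # 6th column is "cache"
print(f"page cache: {cache_kb} KB (~{cache_kb / 1024 / 1024:.1f} GB)")
```

Roughly 3.3 GB of the 4 GB machine is sitting in the OS page cache while shared_buffers is only ~156 MB, which is the imbalance being pointed out.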
* (with newer version) reduce the fill factor and REINDEX
I think some of my tables are updated very frequently so a smaller
fill factor will be nice. How can I find the current fill factor on my
tables? Also, is there some method or science to calculating a decent
fill factor -- size of table, number of indexes, frequency of updates,
and such? We have one major table which faces a lot of INSERTs and
UPDATES in a day (up to 10,000) but many many more SELECTs (up to 10
million).
I'm not sure how to find the current value, but a smaller fill factor
on busy tables should lead to less fragmentation, thus more efficient
indexes over time. Keep in mind that a smaller fill factor will also
lead to larger indexes initially.
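The initial-size trade-off can be estimated roughly: leaf pages filled only to fillfactor percent need about 100/fillfactor times the pages of fully packed ones. This is a leaf-level-only simplification that ignores inner pages and per-page overhead:

```python
# Rough estimate of initial index size relative to a fully packed index,
# as a function of fillfactor. (Leaf pages only; overhead ignored.)
def relative_index_size(fillfactor: int) -> float:
    return 100.0 / fillfactor

print(relative_index_size(90))  # ~1.11 -> the b-tree default: ~11% larger
print(relative_index_size(70))  # ~1.43
print(relative_index_size(20))  # 5.0  -> very low fillfactor = 5x the pages
```

So an extremely low fillfactor trades a lot of disk and cache space up front for fewer page splits later; whether that pays off depends on the update rate, as discussed above.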
--
Bill Moran
http://www.potentialtech.com
Thanks for a very informative post! One question:
I'm not sure how to find the current value, but a smaller fill factor
on busy tables should lead to less fragmentation, thus more efficient
indexes over time. Keep in mind that a smaller fill factor will also
lead to larger indexes initially.
What constitutes a "small fill factor"? Would 70 be good? I guess my
current must have been the default, which the manual says is 100. Or
did you mean really small fill factor like 20? In this context, what
is "packing" in the manual -- is that some kind of compression?
In response to "Phoenix Kiula" <phoenix.kiula@gmail.com>:
Thanks for a very informative post! One question:
I'm not sure how to find the current value, but a smaller fill factor
on busy tables should lead to less fragmentation, thus more efficient
indexes over time. Keep in mind that a smaller fill factor will also
lead to larger indexes initially.
What constitutes a "small fill factor"? Would 70 be good?
Unfortunately, I can't say. I have not yet had the opportunity to
experiment with different fillfactors, so I can only speak in vague
estimations on this topic.
I guess my
current must have been the default, which the manual says is 100.
I expect it's at the default, but the docs say that is 90%:
http://www.postgresql.org/docs/8.2/static/sql-createindex.html
Where did you see 100?
Or
did you mean really small fill factor like 20? In this context, what
is "packing" in the manual -- is that some kind of compression?
Hopefully, someone more knowledgeable will chime in with some wise
suggestions. Barring that, I can only suggest you experiment to find
what works for your workload, but don't rule out the possibility that
extremely low fillfactor values might work well for you.
--
Bill Moran
http://www.potentialtech.com
--- Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
What constitutes a "small fill factor"? Would 70 be good? I guess my
current must have been the default, which the manual says is 100.
On the following link:
I found this:
"B-trees use a default fillfactor of 90, but any value from 10 to 100 can be selected."
Regards,
Richard Broersma Jr.
Sorry for top-posting -- challenged reader.
Can't speak directly to PostgreSQL but in Informix the fill factor is useful for tweaking indexes. A very high fill factor is useful for tables that are static -- any inserts or changes to the index trigger a *lot* of moving of b-tree branches. But the high fill factor means that each page has more useful data references in it. A very low fill factor means that pages are "sparse" and so inserts and updates are less likely to trigger massive b-tree rebalancings.
I've never used it on PostgreSQL (yet!) but am looking forward to it.
Beware of premature optimization!
HTH,
Greg Williamson
Senior DBA
GlobeXplorer LLC, a DigitalGlobe company
--- Gregory Williamson <Gregory.Williamson@digitalglobe.com> wrote:
A very low fill factor means that pages are
"sparse" and so inserts and updates are less likely to trigger massive b-tree rebalancings.
I take it that "massive b-tree rebalancings" could cause a problem with the performance of disk
write throughput from UPDATEs and INSERTs?
Regards,
Richard Broersma Jr.
Richard Broersma Jr wrote:
--- Gregory Williamson <Gregory.Williamson@digitalglobe.com> wrote:
A very low fill factor means that pages are
"sparse" and so inserts and updates are less likely to trigger massive b-tree rebalancings.
I take it that "massive b-tree rebalancings" could cause a problem with the performance of disk
write throughput from UPDATEs and INSERTs?
Regards,
Richard Broersma Jr.
Precisely -- even if it can keep everything in RAM it can occupy quite a
few cycles to rebalance a large b-tree. And eventually those changes do
need to get written to disk so the next checkpoint (I think) will also
have more work.
G
On 19/09/2007, Gregory Williamson <Gregory.Williamson@digitalglobe.com> wrote:
...
Can't speak directly to PostgreSQL but in Informix the fill factor is
useful for tweaking indexes. A very high fill factor is useful for tables
that are static -- any inserts or changes to the index trigger a *lot* of
moving of b-tree branches. But the high fill factor means that each page has
more useful data references in it. A very low fill factor means that pages
are "sparse" and so inserts and updates are less likely to trigger massive
b-tree rebalancings.
Assuming pgsql's fill factor is similar to Informix' (yes, a highly
suspect assumption), could we say:
1. A small fill factor such as 10 or 20 would be good for the index
size and will not trigger massive btree rebalancings? (I'm first
playing with a value of 60 for now and seeing how it works out...seems
ok at the moment!)
2. Is this fill factor enough to have on the table, or should I also
do a fill factor for specific indexes? Or both the table and the
index? (I have four btree indexes on the table)
Thanks