How often do I need to reindex tables?

Started by Dhaval Shahabout 19 years ago22 messagesgeneral
Jump to latest
#1Dhaval Shah
dhaval.shah.m@gmail.com

I am planning to use 8.2 and the average inserts/deletes and updates
across all tables is moderate. That is, it is a moderate sized
database with moderate usage of tables.

Given that, how often do I need to reindex the tables? Do I need to do
it everyday?

Also with 8.2, I do not have to do vacuum anymore or that is what I
understand. Does it do auto-vacuum?

Thanks in advance
Dhaval Shah

#2Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Dhaval Shah (#1)
Re: How often do I need to reindex tables?

On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote:

I am planning to use 8.2 and the average inserts/deletes and updates
across all tables is moderate. That is, it is a moderate sized
database with moderate usage of tables.

Given that, how often do I need to reindex the tables? Do I need to do
it everyday?

No, you should very rarely if ever need to do it.

If you're really concerned, I suggest monitoring average tuples per
index page; something like

SELECT relname, reltuples/relpages FROM pg_class WHERE relkind = 'i' AND
relpages > 1000;

That'll show tuples/page for all indexes over 8MB in size.

Also with 8.2, I do not have to do vacuum anymore or that is what I
understand. Does it do auto-vacuum?

You still need to enable autovacuum. See autovacuum_enable.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#3Jimmy Zhang
crackeur@comcast.net
In reply to: Dhaval Shah (#1)
[ANN]VTD-XML 2.0

XimpleWare is proud to announce the release of version 2.0 of
VTD-XML, the next generation XML parser/indexer. The new
features introduced in this version are:

* VTD+XML version 1.0: the world's first true native XML index
that is simple, general-purpose and back-compatible with XML.
* NodeRecorder Class that saves VTDNav's cursor location for
later sequential access.
* Overwrite capability
* Lexically comparisons between VTD and strings

To download the software, please go to
http://sourceforge.net/project/showfiles.php?group_id=110612

To read the latest benchmark report please go to
http://vtd-xml.sf.net/benchmark1.html

To get the latest API overview
http://www.ximpleware.com/vtd-xml_intro.pdf

----- Original Message -----
From: "Jim C. Nasby" <jim@nasby.net>
To: "Dhaval Shah" <dhaval.shah.m@gmail.com>
Cc: <pgsql-general@postgresql.org>
Sent: Tuesday, February 27, 2007 11:56 AM
Subject: Re: [GENERAL] How often do I need to reindex tables?

Show quoted text

On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote:

I am planning to use 8.2 and the average inserts/deletes and updates
across all tables is moderate. That is, it is a moderate sized
database with moderate usage of tables.

Given that, how often do I need to reindex the tables? Do I need to do
it everyday?

No, you should very rarely if ever need to do it.

If you're really concerned, I suggest monitoring average tuples per
index page; something like

SELECT relname, reltuples/relpages FROM pg_class WHERE relkind = 'i' AND
relpages > 1000;

That'll show tuples/page for all indexes over 8MB in size.

Also with 8.2, I do not have to do vacuum anymore or that is what I
understand. Does it do auto-vacuum?

You still need to enable autovacuum. See autovacuum_enable.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

#4Ron Johnson
ron.l.johnson@cox.net
In reply to: Dhaval Shah (#1)
Re: How often do I need to reindex tables?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/27/07 13:26, Dhaval Shah wrote:

I am planning to use 8.2 and the average inserts/deletes and updates
across all tables is moderate. That is, it is a moderate sized
database with moderate usage of tables.

Moderate?

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFF5MVmS9HxQb37XmcRAu3PAJ9BwYSpuENbeJKweBn4arApxqyiKACgg8pg
1wExzokHE3tLSj5o4MjEaK4=
=GAEs
-----END PGP SIGNATURE-----

#5Ezequias Rodrigues da Rocha
ezequias.rocha@gmail.com
In reply to: Jim Nasby (#2)
Re: How often do I need to reindex tables?

This select doesn't return any row. What does it mean ?

Ezequias.

2007/2/27, Jim C. Nasby <jim@nasby.net>:

On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote:

I am planning to use 8.2 and the average inserts/deletes and updates
across all tables is moderate. That is, it is a moderate sized
database with moderate usage of tables.

Given that, how often do I need to reindex the tables? Do I need to do
it everyday?

No, you should very rarely if ever need to do it.

If you're really concerned, I suggest monitoring average tuples per
index page; something like

SELECT relname, reltuples/relpages FROM pg_class WHERE relkind = 'i' AND
relpages > 1000;

That'll show tuples/page for all indexes over 8MB in size.

Also with 8.2, I do not have to do vacuum anymore or that is what I
understand. Does it do auto-vacuum?

You still need to enable autovacuum. See autovacuum_enable.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Atenciosamente (Sincerely)
Ezequias Rodrigues da Rocha
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
A pior das democracias ainda é melhor do que a melhor das ditaduras
The worst of democracies is still better than the better of dictatorships
http://ezequiasrocha.blogspot.com/

#6Bill Moran
wmoran@collaborativefusion.com
In reply to: Ezequias Rodrigues da Rocha (#5)
Re: How often do I need to reindex tables?

In response to "Ezequias Rodrigues da Rocha" <ezequias.rocha@gmail.com>:

2007/2/27, Jim C. Nasby <jim@nasby.net>:

On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote:

I am planning to use 8.2 and the average inserts/deletes and updates
across all tables is moderate. That is, it is a moderate sized
database with moderate usage of tables.

Given that, how often do I need to reindex the tables? Do I need to do
it everyday?

No, you should very rarely if ever need to do it.

I don't agree. I think that regular indexing is mandatory under some
workloads. Example:
bacula=# select relname, relpages from pg_class where relkind='i' and relname not like 'pg_%' order by relname;
relname | relpages
-------------------------------+----------
basefiles_pkey | 1
cdimages_pkey | 1
client_name_idx | 2
client_pkey | 2
counters_pkey | 1
device_pkey | 1
file_fp_idx | 41212
[...]

bacula=# reindex database bacula;
[...]

relname | relpages
-------------------------------+----------
basefiles_pkey | 1
cdimages_pkey | 1
client_name_idx | 2
client_pkey | 2
counters_pkey | 1
device_pkey | 1
file_fp_idx | 21367
[...]

There are some additional indexes that I've snipped from the output that also
saw some benefit from reindexing, but let's just focus on file_fp_idx.

Please note that the database you're looking at is reindexed _weekly_ by a
cron job, which means the index bloat you're seeing in the above example is
the result of normal activity since last Saturday.

I've brought this up before, and I want to point it out again. I really
think there are certain workloads that require reindexing. Luckily for
this particular workload, it's easy to schedule a job to do so, since I
know when the backups aren't running :)

--
Bill Moran
Collaborative Fusion Inc.

#7Joshua D. Drake
jd@commandprompt.com
In reply to: Bill Moran (#6)
Re: How often do I need to reindex tables?

Bill Moran wrote:

In response to "Ezequias Rodrigues da Rocha" <ezequias.rocha@gmail.com>:

2007/2/27, Jim C. Nasby <jim@nasby.net>:

On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote:

I am planning to use 8.2 and the average inserts/deletes and updates
across all tables is moderate. That is, it is a moderate sized
database with moderate usage of tables.

Given that, how often do I need to reindex the tables? Do I need to do
it everyday?

No, you should very rarely if ever need to do it.

I don't agree. I think that regular indexing is mandatory under some
workloads.

Bill, you are right but I believe Jim was speaking from a general
perspective. Generally speaking you should not have to reindex, or if
you do very rarely.

I too have a couple of databases we manage that require a reindex more
often than what would be considered normal, but a reindex is far from
the norm itself.

Joshua D. Drake

Example:

bacula=# select relname, relpages from pg_class where relkind='i' and relname not like 'pg_%' order by relname;
relname | relpages
-------------------------------+----------
basefiles_pkey | 1
cdimages_pkey | 1
client_name_idx | 2
client_pkey | 2
counters_pkey | 1
device_pkey | 1
file_fp_idx | 41212
[...]

bacula=# reindex database bacula;
[...]

relname | relpages
-------------------------------+----------
basefiles_pkey | 1
cdimages_pkey | 1
client_name_idx | 2
client_pkey | 2
counters_pkey | 1
device_pkey | 1
file_fp_idx | 21367
[...]

There are some additional indexes that I've snipped from the output that also
saw some benefit from reindexing, but let's just focus on file_fp_idx.

Please note that the database you're looking at is reindexed _weekly_ by a
cron job, which means the index bloat you're seeing in the above example is
the result of normal activity since last Saturday.

I've brought this up before, and I want to point it out again. I really
think there are certain workloads that require reindexing. Luckily for
this particular workload, it's easy to schedule a job to do so, since I
know when the backups aren't running :)

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bill Moran (#6)
Re: How often do I need to reindex tables?

Bill Moran <wmoran@collaborativefusion.com> writes:

I don't agree. I think that regular indexing is mandatory under some
workloads. Example:
...
There are some additional indexes that I've snipped from the output that also
saw some benefit from reindexing, but let's just focus on file_fp_idx.

Can you describe the usage pattern of that index? I'm curious why it
doesn't maintain reasonably static size. How often is the underlying
table vacuumed?

regards, tom lane

#9Bill Moran
wmoran@collaborativefusion.com
In reply to: Joshua D. Drake (#7)
Re: How often do I need to reindex tables?

In response to "Joshua D. Drake" <jd@commandprompt.com>:

Bill Moran wrote:

In response to "Ezequias Rodrigues da Rocha" <ezequias.rocha@gmail.com>:

2007/2/27, Jim C. Nasby <jim@nasby.net>:

On Tue, Feb 27, 2007 at 11:26:02AM -0800, Dhaval Shah wrote:

I am planning to use 8.2 and the average inserts/deletes and updates
across all tables is moderate. That is, it is a moderate sized
database with moderate usage of tables.

Given that, how often do I need to reindex the tables? Do I need to do
it everyday?

No, you should very rarely if ever need to do it.

I don't agree. I think that regular indexing is mandatory under some
workloads.

Bill, you are right but I believe Jim was speaking from a general
perspective. Generally speaking you should not have to reindex, or if
you do very rarely.

I too have a couple of databases we manage that require a reindex more
often than what would be considered normal, but a reindex is far from
the norm itself.

Well, I hope I didn't come across as confrontation or anything, as that
wasn't my intent. The only point I was trying to make is that the need to
reindex probably shouldn't be written off lightly until one has monitored
the indexes for a spell to see if they need it or not.

Example:

bacula=# select relname, relpages from pg_class where relkind='i' and relname not like 'pg_%' order by relname;
relname | relpages
-------------------------------+----------
basefiles_pkey | 1
cdimages_pkey | 1
client_name_idx | 2
client_pkey | 2
counters_pkey | 1
device_pkey | 1
file_fp_idx | 41212
[...]

bacula=# reindex database bacula;
[...]

relname | relpages
-------------------------------+----------
basefiles_pkey | 1
cdimages_pkey | 1
client_name_idx | 2
client_pkey | 2
counters_pkey | 1
device_pkey | 1
file_fp_idx | 21367
[...]

There are some additional indexes that I've snipped from the output that also
saw some benefit from reindexing, but let's just focus on file_fp_idx.

Please note that the database you're looking at is reindexed _weekly_ by a
cron job, which means the index bloat you're seeing in the above example is
the result of normal activity since last Saturday.

I've brought this up before, and I want to point it out again. I really
think there are certain workloads that require reindexing. Luckily for
this particular workload, it's easy to schedule a job to do so, since I
know when the backups aren't running :)

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

--
Bill Moran
Collaborative Fusion Inc.

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

****************************************************************
IMPORTANT: This message contains confidential information and is
intended only for the individual named. If the reader of this
message is not an intended recipient (or the individual
responsible for the delivery of this message to an intended
recipient), please be advised that any re-use, dissemination,
distribution or copying of this message is prohibited. Please
notify the sender immediately by e-mail if you have received
this e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The
sender therefore does not accept liability for any errors or
omissions in the contents of this message, which arise as a
result of e-mail transmission.
****************************************************************

#10Iannsp
iannsp@gmail.com
In reply to: Bill Moran (#9)
About PostgreSQL Block Size

Hi,
Some one people have one report/benchmark about using postgresql block
size modified?
What is the difference?

thanks.

--
Ivo Nascimento
Iann tech - Desenvolvendo solu��es com performance e seguran�a
http://www.ianntech.com.br

#11Bill Moran
wmoran@collaborativefusion.com
In reply to: Tom Lane (#8)
Re: How often do I need to reindex tables?

In response to Tom Lane <tgl@sss.pgh.pa.us>:

Bill Moran <wmoran@collaborativefusion.com> writes:

I don't agree. I think that regular indexing is mandatory under some
workloads. Example:
...
There are some additional indexes that I've snipped from the output that also
saw some benefit from reindexing, but let's just focus on file_fp_idx.

Can you describe the usage pattern of that index? I'm curious why it
doesn't maintain reasonably static size. How often is the underlying
table vacuumed?

bacula=# \d file
Table "public.file"
Column | Type | Modifiers
------------+---------+-------------------------------------------------------
fileid | integer | not null default nextval('file_fileid_seq'::regclass)
fileindex | integer | not null default 0
jobid | integer | not null
pathid | integer | not null
filenameid | integer | not null
markid | integer | not null default 0
lstat | text | not null
md5 | text | not null
Indexes:
"file_pkey" PRIMARY KEY, btree (fileid)
"file_fp_idx" btree (filenameid, pathid)
"file_jobid_idx" btree (jobid)

Now, that table stores a record for each file that is backed up (i.e. there's
a unique tuple for each time a file is backed up) To save space in the
database, the file name and file path are stored in separate tables and
referenced by an ID.

This particular server has the following characteristics:
bacula=# select count(*) from file;
count
---------
8068956
(1 row)

bacula=# select count(*) from filename;
count
--------
151954
(1 row)

bacula=# select count(*) from path;
count
-------
49642
(1 row)

There are 21 jobs, each ranging in size from 2000 - 5000 files. Each job
runs twice a day. So you're looking at about 60,000 new rows at midnight
and 60,000 new rows at noon each day. With the purge cycle, about the
same number of rows are being deleted as are being added, so the table
size stays pretty constant.

Because I know exactly when database activity is occurring on this system,
I have autovacuum disabled, and I manually run a vacuum analyze on this
database twice a day: once at 8:00 AM and again at 4:00 PM. I had to bump
max_fsm_pages up to 60000 to keep vacuum effective.

Note that the index under discussion is the only one in this database that
shows significant bloat. I could probably just reindex that one on a
regular schedule, but since I know when the database is quiescent, there's
no reason I can think of not to reindex the whole thing.

Anything else I can provide that would be useful?

--
Bill Moran
Collaborative Fusion Inc.

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bill Moran (#11)
Re: How often do I need to reindex tables?

Bill Moran <wmoran@collaborativefusion.com> writes:

In response to Tom Lane <tgl@sss.pgh.pa.us>:

Can you describe the usage pattern of that index? I'm curious why it
doesn't maintain reasonably static size. How often is the underlying
table vacuumed?

...
There are 21 jobs, each ranging in size from 2000 - 5000 files. Each job
runs twice a day. So you're looking at about 60,000 new rows at midnight
and 60,000 new rows at noon each day. With the purge cycle, about the
same number of rows are being deleted as are being added, so the table
size stays pretty constant.
...
Note that the index under discussion is the only one in this database that
shows significant bloat.

Yeah, and there's no obvious reason in what you say why this one should
bloat either. Can you say anything about the distribution of the index
columns --- are you working with a fairly static set of filenameids, or
does that change over time? How about the pathids? How does the
combination of filenameid x pathid behave?

A bit of quick arithmetic says that the minimum possible size of that
index (at 100% fill factor) would be about 20K pages. What you showed
us was that it had expanded to 40-some K pages, or a bit under 50% fill
factor. This is low but not totally out of line; the traditional rule
of thumb is that the steady state fill factor will be about 2/3rds for a
heavily updated btree. If you leave it go, does it continue to get
larger, or stay around 40K?

regards, tom lane

#13Bill Moran
wmoran@collaborativefusion.com
In reply to: Tom Lane (#12)
Re: How often do I need to reindex tables?

In response to Tom Lane <tgl@sss.pgh.pa.us>:

Bill Moran <wmoran@collaborativefusion.com> writes:

In response to Tom Lane <tgl@sss.pgh.pa.us>:

Can you describe the usage pattern of that index? I'm curious why it
doesn't maintain reasonably static size. How often is the underlying
table vacuumed?

...
There are 21 jobs, each ranging in size from 2000 - 5000 files. Each job
runs twice a day. So you're looking at about 60,000 new rows at midnight
and 60,000 new rows at noon each day. With the purge cycle, about the
same number of rows are being deleted as are being added, so the table
size stays pretty constant.
...
Note that the index under discussion is the only one in this database that
shows significant bloat.

Yeah, and there's no obvious reason in what you say why this one should
bloat either. Can you say anything about the distribution of the index
columns --- are you working with a fairly static set of filenameids, or
does that change over time? How about the pathids? How does the
combination of filenameid x pathid behave?

My expectation would be that the distribution stays fairly constant and
that filenameids and pathids don't get added in any great number. Most
of the servers that are being backed up are not going to see the file
names or paths change very much, just the contents of those files.

A bit of quick arithmetic says that the minimum possible size of that
index (at 100% fill factor) would be about 20K pages. What you showed
us was that it had expanded to 40-some K pages, or a bit under 50% fill
factor. This is low but not totally out of line; the traditional rule
of thumb is that the steady state fill factor will be about 2/3rds for a
heavily updated btree. If you leave it go, does it continue to get
larger, or stay around 40K?

I don't remember how big it was getting before I added that cron job. I'll
remove the cron job and replace it with one that emails me the page size
of that index every week. I'll let it go for a few weeks and see how
it manages.

--
Bill Moran
Collaborative Fusion Inc.

#14Bill Moran
wmoran@collaborativefusion.com
In reply to: Tom Lane (#12)
Re: How often do I need to reindex tables?

In response to Tom Lane <tgl@sss.pgh.pa.us>:

Bill Moran <wmoran@collaborativefusion.com> writes:

In response to Tom Lane <tgl@sss.pgh.pa.us>:

Can you describe the usage pattern of that index? I'm curious why it
doesn't maintain reasonably static size. How often is the underlying
table vacuumed?

...
There are 21 jobs, each ranging in size from 2000 - 5000 files. Each job
runs twice a day. So you're looking at about 60,000 new rows at midnight
and 60,000 new rows at noon each day. With the purge cycle, about the
same number of rows are being deleted as are being added, so the table
size stays pretty constant.
...
Note that the index under discussion is the only one in this database that
shows significant bloat.

Yeah, and there's no obvious reason in what you say why this one should
bloat either. Can you say anything about the distribution of the index
columns --- are you working with a fairly static set of filenameids, or
does that change over time? How about the pathids? How does the
combination of filenameid x pathid behave?

A bit of quick arithmetic says that the minimum possible size of that
index (at 100% fill factor) would be about 20K pages. What you showed
us was that it had expanded to 40-some K pages, or a bit under 50% fill
factor. This is low but not totally out of line; the traditional rule
of thumb is that the steady state fill factor will be about 2/3rds for a
heavily updated btree. If you leave it go, does it continue to get
larger, or stay around 40K?

Just an FYI ... I remembered what prompted the cron job.

We were seeing significant performance degradation. I never did actual
measurements, but it was on the order of "Bill, why is restoring taking
such a long time?" from other systems people. At the time, I poked around
and tried some stuff here and there and found that reindex restored
performance. I didn't look at actual size at that time.

Anyway, I'll report back in a few weeks as to what the numbers look like.

--
Bill Moran
Collaborative Fusion Inc.

#15Vick Khera
vivek@khera.org
In reply to: Bill Moran (#14)
Re: How often do I need to reindex tables?

On Feb 28, 2007, at 5:35 PM, Bill Moran wrote:

Just an FYI ... I remembered what prompted the cron job.

We were seeing significant performance degradation. I never did
actual
measurements, but it was on the order of "Bill, why is restoring
taking
such a long time?" from other systems people. At the time, I poked
around
and tried some stuff here and there and found that reindex restored
performance. I didn't look at actual size at that time.

I have two huge tables (one tracks messages sent, one tracks URL
click-throughs from said messages) from which I purge old data every
few weeks. The primary key indexes on these get bloated after a few
months and performance goes way down like you observe. A reindex
fixes up the performance issues pretty well on those tables, and
often shaves off a few gigs of disk space too.

We have to manually run the reindex because it has to be timed such
that the service is not impacted (ie, run on major holiday weekends)
and we have to take down part of the service and point other parts to
backup servers, etc. Not an easy chore...

This is on Pg 8.1. Don't even ask me how it was in the 7.4 days when
we have maybe 10% of the data! :-)

Attachments:

smime.p7sapplication/pkcs7-signature; name=smime.p7sDownload
#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bill Moran (#14)
Re: How often do I need to reindex tables?

Bill Moran <wmoran@collaborativefusion.com> writes:

Just an FYI ... I remembered what prompted the cron job.

We were seeing significant performance degradation. I never did actual
measurements, but it was on the order of "Bill, why is restoring taking
such a long time?" from other systems people. At the time, I poked around
and tried some stuff here and there and found that reindex restored
performance. I didn't look at actual size at that time.

A reindex might improve performance for reasons other than bloat --- to
wit, that a freshly-built index is in perfect physical order, which
tends to get degraded over time by page splits. How important that is
depends on your usage patterns. If this is what the story is for your
situation, then what might fix it (in 8.2) is to create the index with
FILLFACTOR 50 or so, so that it's already at the steady state density
and won't need many page splits.

Anyway, I'll report back in a few weeks as to what the numbers look like.

Yeah, please for the moment just watch what happens with the default
behavior.

regards, tom lane

#17Bill Moran
wmoran@potentialtech.com
In reply to: Tom Lane (#16)
Re: How often do I need to reindex tables?

In response to Tom Lane <tgl@sss.pgh.pa.us>:

Bill Moran <wmoran@collaborativefusion.com> writes:

Just an FYI ... I remembered what prompted the cron job.

We were seeing significant performance degradation. I never did actual
measurements, but it was on the order of "Bill, why is restoring taking
such a long time?" from other systems people. At the time, I poked around
and tried some stuff here and there and found that reindex restored
performance. I didn't look at actual size at that time.

A reindex might improve performance for reasons other than bloat --- to
wit, that a freshly-built index is in perfect physical order, which
tends to get degraded over time by page splits. How important that is
depends on your usage patterns.

This goes back to the heart of the original question, which was: "how often
do I need to reindex." The answer was "rarely, if ever" and "if you're
really worried about this, you can monitor _size_ via these queries ..."

I guess I focused too much on size in my response. As you point out,
bloat isn't the only indicator that an index would benefit from being
rebuilt.

If this is what the story is for your
situation, then what might fix it (in 8.2) is to create the index with
FILLFACTOR 50 or so, so that it's already at the steady state density
and won't need many page splits.

Interesting. So a major factor in performance degradation is when the
index has to split pages. I read about FILLFACTOR in the docs, but it
didn't click as to what use it was until your statement.

We're still evaluating 8.2. We've hit a few issues with our application
and plpgsql, but I think those are minor. As far as deploying it for
our Bacula systems -- I just need to find the time to be sure that it
doesn't introduce any problems, but I suspect there's a low chance of
that with Bacula.

Anyway, I'll report back in a few weeks as to what the numbers look like.

Yeah, please for the moment just watch what happens with the default
behavior.

Yup.

--
Bill Moran
http://www.potentialtech.com

#18Jeff Davis
pgsql@j-davis.com
In reply to: Joshua D. Drake (#7)
Re: How often do I need to reindex tables?

On Wed, 2007-02-28 at 09:17 -0800, Joshua D. Drake wrote:

Bill, you are right but I believe Jim was speaking from a general
perspective. Generally speaking you should not have to reindex, or if
you do very rarely.

I too have a couple of databases we manage that require a reindex more
often than what would be considered normal, but a reindex is far from
the norm itself.

Isn't a REINDEX still needed in the case of monotonically increasing
keys, such as in a sequence or timestamp index? I also delete tuples, so
that results in a forward-shifting range of keys.

If this is not normal, I need to re-evaluate my autovacuum settings.

Regards,
Jeff Davis

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#18)
Re: How often do I need to reindex tables?

Jeff Davis <pgsql@j-davis.com> writes:

Isn't a REINDEX still needed in the case of monotonically increasing
keys, such as in a sequence or timestamp index? I also delete tuples, so
that results in a forward-shifting range of keys.

No, that shouldn't be a problem, if you're maintaining a constant key
range width (that is, *all* the old entries get deleted). The only
pattern I'm aware of that causes a problem is if you leave a small
subset of the keys behind, for instance insert every few minutes and
then later delete all but one entry per day. In this situation you may
end up with an index containing as few as one entry per page. We
don't have any mechanism short of REINDEX to collapse nonempty index
pages together, so that way lies bloat. But if you delete all the old
entries then the pages get recycled and there shouldn't be a problem.

regards, tom lane

#20Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#19)
Re: How often do I need to reindex tables?

On Fri, 2007-03-02 at 16:39 -0500, Tom Lane wrote:

Jeff Davis <pgsql@j-davis.com> writes:

Isn't a REINDEX still needed in the case of monotonically increasing
keys, such as in a sequence or timestamp index? I also delete tuples, so
that results in a forward-shifting range of keys.

No, that shouldn't be a problem, if you're maintaining a constant key
range width (that is, *all* the old entries get deleted). The only
pattern I'm aware of that causes a problem is if you leave a small
subset of the keys behind, for instance insert every few minutes and
then later delete all but one entry per day. In this situation you may
end up with an index containing as few as one entry per page. We
don't have any mechanism short of REINDEX to collapse nonempty index
pages together, so that way lies bloat. But if you delete all the old
entries then the pages get recycled and there shouldn't be a problem.

You just described this particular table, so I will need to continue
REINDEXing. It's getting maybe 10-50 inserts per second, and most expire
in an day. However, a small percentage hang around for much longer.

REINDEX isn't a problem for me, because there are periods of low usage.

I think if I really wanted to eliminate REINDEX I could move the few
remaining records into another table and have a view accross them.

Regards,
Jeff Davis

#21Bill Moran
wmoran@collaborativefusion.com
In reply to: Tom Lane (#16)
#22Martin Gainty
mgainty@hotmail.com
In reply to: Dhaval Shah (#1)