Vacuum time degrading

Started by Wes · almost 21 years ago · 14 messages

#1 Wes
wespvp@syntegra.com

We've been doing a vacuum every night. With about 254 million rows, it was
taking about 0.8 hours. Now a few months later with about 456 million rows
it is taking 4.8 hours. I did check the archives and didn't find anything
like this.

Why is the vacuum time not going up linearly? Since vacuum has to
sequentially read the entire database, I would have expected a linear
increase - to about 1.5 hours now (0.8 hours x 456/254 is roughly 1.4 hours).

There are currently no deletes or modifies to the database - only inserts.

This is on PostgreSQL 7.4.5, RedHat ES 3.0.

Wes

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Wes (#1)
Re: Vacuum time degrading

Wes <wespvp@syntegra.com> writes:

> Why is the vacuum time not going up linearly?

I'm betting that the database is suffering from substantial bloat,
requiring VACUUM to scan through lots of dead space that wasn't there
before. Check your FSM settings (the tail end of the output from a
full-database VACUUM VERBOSE command would give some info about what you
need).
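
For reference, the knobs involved live in postgresql.conf; the values below
are only placeholders, not a recommendation:

    max_fsm_relations = 1000      # relations the free space map can track
    max_fsm_pages = 1000000       # pages with free space it can track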

If you are suffering bloat, the fastest route to a solution would
probably be to CLUSTER your larger tables. Although VACUUM FULL
would work, it's likely to be very slow.
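
On 7.4 that would look something like this (table and index names are made
up for illustration):

    CLUSTER big_table_pkey ON big_table;
    -- or, the slower route:
    VACUUM FULL big_table;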

> There are currently no deletes or modifies to the database - only inserts.

You *certain* about that? It's hard to see how the vacuum time wouldn't
be linear in table size if there's nothing to do and no dead space.

Again, VACUUM VERBOSE info would be informative (it's sufficient to look
at your larger tables for this).

regards, tom lane

#3 Wes
wespvp@syntegra.com
In reply to: Tom Lane (#2)
Re: Vacuum time degrading

On 2/28/05 6:53 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> If you are suffering bloat, the fastest route to a solution would
> probably be to CLUSTER your larger tables. Although VACUUM FULL
> would work, it's likely to be very slow.

How can there be bloat if there are no deletes or modifies?

Even if there were deletes or modifies (there will be in about another year
and a half), if a vacuum is being performed every night, how can there be
bloat? The vacuum should release the dead space and it should be reused.
Am I missing something?

>> There are currently no deletes or modifies to the database - only inserts.
>
> You *certain* about that? It's hard to see how the vacuum time wouldn't
> be linear in table size if there's nothing to do and no dead space.

Absolutely sure. The only case that comes close to a delete is if a batch
load fails and the transaction is rolled back. That very seldom happens.

Why am I running vacuum nightly if I have no deletes or updates, you ask?
Two reasons - to have it in the cron schedule for when there are deletes
(there will never be updates), and as a check on database integrity. If
there is a database problem, vacuum at least has a chance of flagging it
since it reads the entire database. This was instituted after we had a
couple of instances of corruption a while back that went undetected for too
long. I'm also doing a weekly pg_dumpall as an additional check/fallback.
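
The cron entries are nothing fancy - roughly like this (database name,
paths, and times are placeholders):

    # nightly vacuum analyze
    0 2 * * * psql -d archive -c 'VACUUM ANALYZE;' >> /var/log/pg_vacuum.log 2>&1
    # weekly dump as a fallback / integrity check
    0 4 * * 0 pg_dumpall > /backup/pg_dumpall.sql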

> Again, VACUUM VERBOSE info would be informative (it's sufficient to look
> at your larger tables for this).

I'll set that up to run tonight and see if it gives any clues. Last night,
vacuum ran over 5 hours.

Wes

#4 Wes
wespvp@syntegra.com
In reply to: Tom Lane (#2)
Re: Vacuum time degrading

On 2/28/05 6:53 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> It's hard to see how the vacuum time wouldn't
> be linear in table size if there's nothing to do and no dead space.

I am doing 'vacuum analyze' rather than just 'vacuum'. Could that have
anything to do with the non-linear behavior?

Wes

#5 Wes
wespvp@syntegra.com
In reply to: Tom Lane (#2)
Re: Vacuum time degrading

On 2/28/05 6:53 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Again, VACUUM VERBOSE info would be informative (it's sufficient to look
> at your larger tables for this).

It took 5.2 hours again tonight to do the vacuum. I don't see anything out
of the ordinary - no explanation for the non-linear increases in vacuum
time.

This is what shows up at the end:

INFO: free space map: 93 relations, 282 pages stored; 1712 total pages
needed
DETAIL: Allocated FSM size: 1000 relations + 1000000 pages = 5920 kB shared
memory.

The tables all show something like:

INFO: "blah": found 0 removable, 366326534 nonremovable row versions in
3241829 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.

Wes

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Wes (#5)
Re: Vacuum time degrading

Wes <wespvp@syntegra.com> writes:

> It took 5.2 hours again tonight to do the vacuum. I don't see anything out
> of the ordinary - no explanation for the non-linear increases in vacuum
> time.

Would you post the complete VACUUM VERBOSE log? The CPU/elapsed time lines
would help us identify where the time is going.

> This is what shows up at the end:
>
> INFO: free space map: 93 relations, 282 pages stored; 1712 total pages
> needed
> DETAIL: Allocated FSM size: 1000 relations + 1000000 pages = 5920 kB shared
> memory.

Well, you don't have a problem with FSM being too small anyway ;-)

regards, tom lane

#7 Wes
wespvp@syntegra.com
In reply to: Tom Lane (#6)
Re: Vacuum time degrading

On 3/2/05 12:16 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Would you post the complete VACUUM VERBOSE log? The CPU/elapsed time lines
> would help us identify where the time is going.

I'll send it to you directly - it's rather long.

>> DETAIL: Allocated FSM size: 1000 relations + 1000000 pages = 5920 kB shared
>> memory.
>
> Well, you don't have a problem with FSM being too small anyway ;-)

Nope... Preparation for when deletes start kicking in down the road. If I
can only do a vacuum once a week, I've got to have lots of space.

Wes

#8 Wes
wespvp@syntegra.com
In reply to: Tom Lane (#6)
Re: Vacuum time degrading

On 3/2/05 12:16 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Would you post the complete VACUUM VERBOSE log? The CPU/elapsed time lines
> would help us identify where the time is going.

Mailed.

I do see stats like:

CPU 518.88s/25.17u sec elapsed 10825.33 sec.
CPU 884.96s/64.35u sec elapsed 13793.13 sec.
CPU 132.46s/6.66u sec elapsed 2435.42 sec.
CPU 49.25s/4.15u sec elapsed 414.71 sec.

This is a dual-CPU hyperthreaded (which probably makes little difference
here) 2.4 GHz box running RedHat ES 3.0. The database is on an 8-disk SCSI
hardware RAID 5 with 10k rpm disks. pg_xlog is on a separate volume.

I thought it was a 2.6 kernel, but it looks like it is 2.4.20. I need to
monitor the system when the vacuum is running to see if sar/top show
anything. I wonder if it's hitting the kswapd thrashing problem?

Wes

#9 Wes
wespvp@syntegra.com
In reply to: Tom Lane (#6)
Re: Vacuum time degrading

Watching the system as vacuum is running, I can see that we are encountering
the kswapd/kscand problem in the 2.4.20 kernel. This could very well
account for the non-linear increase in vacuum time.

This problem is fixed in the 2.6 kernel, but we can't upgrade because Dell
is dragging its feet on releasing hardware monitoring compatible with the 2.6
kernel. So, we're going to try a 2.4.29 kernel and hope that the problem is
fixed there. With any luck, by Friday I'll know whether the kswapd problem is
fixed in 2.4.29 and whether that solves the excessive vacuum times.

Wes

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Wes (#9)
Re: Vacuum time degrading

Wes <wespvp@syntegra.com> writes:

> Watching the system as vacuum is running, I can see that we are encountering
> the kswapd/kscand problem in the 2.4.20 kernel. This could very well
> account for the non-linear increase in vacuum time.

Hmm. Looking at the vacuum verbose output you sent me, it's clear that
the bulk of the time is going into scanning a couple of the larger
indexes. On an index that's been growing for awhile, this involves a
highly nonsequential access pattern (it wants to visit the index leaf
pages in sort order, which will not look much like physical order after
a lot of page splits have occurred). I don't know whether that would
tend to set off the kswapd/kscand problems, but just in terms of
physical I/O time it might be annoying. I was going to suggest
REINDEXing those indexes to see if that cuts the vacuum time at all.
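
Something along these lines (index and table names are placeholders):

    REINDEX INDEX big_random_key_idx;
    -- or rebuild all indexes on the table at once:
    REINDEX TABLE big_table;

Keep in mind that REINDEX blocks writes to the table while it runs, so it
wants a maintenance window.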

regards, tom lane

#11 Wes Palmer
Wesley.R.Palmer@syntegra.com
In reply to: Tom Lane (#10)
Re: Vacuum time degrading

On 3/2/05 3:51 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> I was going to suggest
> REINDEXing those indexes to see if that cuts the vacuum time at all.

The problem with that is it takes a very long time. I've got a couple of
things to try yet on the kswapd problem. If that doesn't work, maybe I can
rebuild one of the indexes and see how much that one improves. I wasn't
aware that the indexes were scanned non-sequentially. The under-one-hour
time was probably shortly after a full reload. Any chance of changing that
behavior to scan in physical storage order?

The index on the largest table, the one showing:

CPU 216.15s/18.13u sec elapsed 2110.84 sec.

has its keys inserted in sequential order. The index showing:

CPU 518.88s/25.17u sec elapsed 10825.33 sec.

has records inserted in essentially random order, and its key is also
something like twice as large.

We're going to try to test the 2.4.29 kernel tomorrow.

Wes

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Wes Palmer (#11)
Re: Vacuum time degrading

Wes Palmer <Wesley.R.Palmer@syntegra.com> writes:

> Any chance of changing that
> behavior to scan in physical storage order?

It wouldn't be easy --- there are some locking considerations that say
btbulkdelete needs to scan the index in the same order that an ordinary
scan would do. See the nbtree README for details.

regards, tom lane

#13 Wes
wespvp@syntegra.com
In reply to: Tom Lane (#12)
Re: Vacuum time degrading

Well, the good news is that the 2.4.29 kernel solved the kswapd problem.
The bad news is that it didn't help the vacuum time. In fact, the vacuum
time is now over 6 hours instead of 5 hours. Whether that is a direct
result of the 2.4.29 kernel, or a coincidence, I don't know at this time.

I guess the next step is to try reindexing a couple of the big indexes and
see if that helps.

Wes

#14 Wes
wespvp@syntegra.com
In reply to: Tom Lane (#12)
Re: Vacuum time degrading

On 3/2/05 10:50 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> It wouldn't be easy --- there are some locking considerations that say
> btbulkdelete needs to scan the index in the same order that an ordinary
> scan would do. See the nbtree README for details.

Just a follow-up on this...

The vacuum time has been steadily increasing, at a seemingly increasing rate,
even though there are no deletes or updates to the database. The current DB
size is just over 500 million rows. Last week it was up to 6.84 hours to do
a vacuum. Over the weekend I reindexed all the major indexes; the two largest
took about 10 hours to rebuild. After the reindexing, the vacuum took only
1.44 hours. This is pretty much a linear scaling from the original vacuum
time I reported.

So, the increasing vacuum times would appear to be as Tom suggested - due to
the fact that vacuum processes indexes in index order, not physical disk
order. I guess we add a periodic reindex to our maintenance procedures...
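
Probably just another cron entry run during a quiet window when no loads are
in progress, something like (database, table, and schedule are placeholders):

    # monthly: rebuild the indexes that get random-order inserts
    0 1 1 * * psql -d archive -c 'REINDEX TABLE big_table;' >> /var/log/pg_reindex.log 2>&1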

Wes