Vacuum going -D; crash or just impatience?
Folks,
I've a 7.2.4 report-generation database that has been growing for some time,
resulting in the nightly VACUUM FULL ANALYZE taking longer and longer. (Most
of the data is copied in nightly from other systems, so use of the FSM is not
very effective.)
The problem is that the nightly admin scripts are programmed to check for a
locked-up nightly maintenance run, and to "pg_ctl -m fast stop" it. As the
VACUUM FULL now takes over an hour, the script falsely detected a lockup and
shut down the database in the middle of the VACUUM.
On restarting the database, I manually VACUUM FULLed it, and the VACUUM would
speed through until hitting the spot where the database was shut down, at
which point the VACUUM process went into "D" (uninterruptible sleep) state
and apparently locked up for 10 minutes. No error messages were written to
the logs. Unfortunately, I could
not give it longer to see if it recovered because this is a production system
and I had to get it up and running from backup by 9am.
Does this sound like a crash during VACUUM, or just like it needed more time?
If anyone wants to analyze, I have a complete backup of the post-problem
PGDATA directory. The host system is RH Linux 8.0.
--
Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus <josh@agliodbs.com> writes:
> Does this sound like a crash during VACUUM, or just like it needed more time?
I think it just needed more time. VACUUM goes to great lengths to be
crash-safe. I doubt that a "fast stop" could have left the database
in a corrupted state.
It is entirely likely that an interrupted VACUUM FULL would leave the
next VACUUM FULL with more, not less, work to do --- all of the tuples
the first one tried to move before being killed would now be dead and
need to be cleaned up.
> I've a 7.2.4 report-generation database that has been growing for some
> time, resulting in the nightly VACUUM FULL ANALYZE taking longer and
> longer. (Most of the data is copied in nightly from other systems, so use
> of the FSM is not very effective.)
Are you saying that you delete most or all of the rows, then vacuum?
You might consider TRUNCATE if you delete all the rows, or CLUSTER
if you delete most, as a substitute for VACUUM FULL. (You'd still want
to run ANALYZE, after you load fresh data.) VACUUM FULL is really
designed for the case where there are not a huge number of dead rows
--- it gets awfully slow if it has to move lots of data.
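A sketch of the two alternatives; the table name report_data, the index name report_data_dateidx, and the dump path are placeholders, not anything from this thread:

```sql
-- Case 1: all rows are replaced each night. TRUNCATE reclaims the whole
-- table instantly instead of moving tuples one by one.
TRUNCATE TABLE report_data;
COPY report_data FROM '/path/to/nightly.dump';  -- reload fresh data
ANALYZE report_data;

-- Case 2: most (but not all) rows are replaced. CLUSTER rewrites the
-- surviving rows into a new file in index order, discarding dead tuples
-- as a side effect (old syntax: CLUSTER indexname ON tablename).
CLUSTER report_data_dateidx ON report_data;
ANALYZE report_data;
```

Note that in releases before 7.3, CLUSTER loses the table's other indexes and permissions, so check the caveats in your version's documentation before relying on it.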
Also, I think you have probably not given the FSM enough chance.
If the FSM settings are adequate then it should work fine to do
-- delete unwanted rows
VACUUM (not FULL, not ANALYZE)
-- load new rows
ANALYZE
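A minimal sketch of that cycle, assuming a hypothetical table report_data keyed by a load_date column (both names are illustrative):

```sql
-- 1. Delete unwanted rows.
DELETE FROM report_data WHERE load_date < CURRENT_DATE - 7;

-- 2. Plain VACUUM (not FULL): records the freed space in the FSM
--    instead of moving tuples around.
VACUUM report_data;

-- 3. Load new rows; they can reuse the space recorded in the FSM,
--    so the table stops growing night over night.
COPY report_data FROM '/path/to/new_rows.dump';

-- 4. Refresh planner statistics for the fresh data.
ANALYZE report_data;
```

The ordering matters: the plain VACUUM must run between the delete and the load, so the FSM knows about the free space before the new rows arrive.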
regards, tom lane
Tom,
> I think it just needed more time. VACUUM goes to great lengths to be
> crash-safe. I doubt that a "fast stop" could have left the database
> in a corrupted state.
OK, that's reassuring. I would have liked to give the process more time, but
with users waiting ....
One thing I am puzzled by is the "D" status on the VACUUM process. That would
seem to indicate that VACUUM was waiting for some other process ... but I
can't imagine what it could be. Suggestions?
> Are you saying that you delete most or all of the rows, then vacuum?
> You might consider TRUNCATE if you delete all the rows, or CLUSTER
> if you delete most, as a substitute for VACUUM FULL. (You'd still want
> to run ANALYZE, after you load fresh data.) VACUUM FULL is really
> designed for the case where there are not a huge number of dead rows
> --- it gets awfully slow if it has to move lots of data.
There are several "holding" tables which are truncated and then re-built. But
the tables that are holding up VACUUM are the permanent ones, which are
experiencing up to 900,000 updates every night.
> Also, I think you have probably not given the FSM enough chance.
> If the FSM settings are adequate then it should work fine to do
Well, the holdup is the indexes, which are recycling about 500,000 pages, and
in 7.2.4 the FSM doesn't help me there. Unfortunately, dropping the indexes
during the data transformation isn't really an option, because the indexes
support some of the data-transform steps.
I'm wondering if I need to REINDEX more often; I think I'll try that next.
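For reference, what that next step would look like; report_data is again a placeholder table name:

```sql
-- Rebuild every index on the table from scratch, discarding the bloated
-- index pages that 7.2's VACUUM cannot reclaim. REINDEX locks the table
-- against writes, so it belongs in the nightly maintenance window.
REINDEX TABLE report_data;
```

Dropping and recreating the indexes by hand has the same effect, if finer control over which indexes get rebuilt is needed.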
--
-Josh Berkus
Aglio Database Solutions
San Francisco