autovacuum: I need some explanation
Hello,
I have a table (accounts) with 600,000,000 rows. A heavy high-concurrent
workload that makes mostly updates on this table generates a lot of dead
tuples in its run, which is expected due to MVCC.
The problem is that even though autovacuum is enabled, the autovacuum worker
does not vacuum this table (I entered custom autovacuum settings for this
table in pg_autovacuum to try to force a situation). Autovacuum is working
for other smaller tables but not for accounts.
<2008-10-29 11:09:03.453 PDT>DEBUG: 00000: accounts: vac: 16697969
(threshold 6000050), anl: 16697969 (threshold 120000048)
<2008-10-29 11:09:05.610 PDT>DEBUG: 00000: accounts: vac: 16699578
(threshold 6000050), anl: 16699578 (threshold 120000048)
<2008-10-29 11:10:03.563 PDT>DEBUG: 00000: accounts: vac: 16735906
(threshold 6000050), anl: 16735906 (threshold 120000048)
please check the first log message: the vacuum threshold is 6,000,050 rows
and the number of dead tuples is 16,697,969. Even though the number of
dead_tuples is greater than the threshold the autovacuum is not being
triggered for this table. So, besides this condition (dead_tuples >
threshold) what else is taken into account by autovacuum?
Thank you,
-Noah
Noah Freire escribi�:
please check the first log message: the vacuum threshold is 6,000,050 rows
and the number of dead tuples is 16,697,969. Even though the number of
dead_tuples is greater than the threshold the autovacuum is not being
triggered for this table. So, besides this condition (dead_tuples >
threshold) what else is taken into account by autovacuum?
That there's no other process doing the same. Did you check
pg_stat_activity to ensure that there's really no autovacuum worker
processing this table?
<2008-10-29 11:09:03.453 PDT>DEBUG: 00000: accounts: vac: 16697969
(threshold 6000050), anl: 16697969 (threshold 120000048)
<2008-10-29 11:09:05.610 PDT>DEBUG: 00000: accounts: vac: 16699578
(threshold 6000050), anl: 16699578 (threshold 120000048)
<2008-10-29 11:10:03.563 PDT>DEBUG: 00000: accounts: vac: 16735906
(threshold 6000050), anl: 16735906 (threshold 120000048)
Are these log entries by the same process? Please add %p to
log_line_prefix to see what's going on.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Hi Alvaro,
On Wed, Oct 29, 2008 at 3:46 PM, Alvaro Herrera
<alvherre@commandprompt.com>wrote:
Noah Freire escribió:
please check the first log message: the vacuum threshold is 6,000,050
rows
and the number of dead tuples is 16,697,969. Even though the number of
dead_tuples is greater than the threshold the autovacuum is not being
triggered for this table. So, besides this condition (dead_tuples >
threshold) what else is taken into account by autovacuum?That there's no other process doing the same. Did you check
pg_stat_activity to ensure that there's really no autovacuum worker
processing this table?
datid | datname | procpid | usesysid | usename | current_query | waiting |
xact_start | query_start | backend_start | client_addr | client_port
-------+---------+---------+----------+----------+-------------------------------------------------------------------------+---------+-------------------------------+-------------------------------+-------------------------------+-------------+-------------
45974 | pgbench | 14660 | 10 | postgres | autovacuum: VACUUM
public.accounts | f | 2008-10-29 11:09:05.610857-07 | 2008-10-29
11:09:05.610857-07 | 2008-10-29 11:09:03.45083-07 | |
indeed; autovacuum is currently running for accounts. It is running for 1
hour but the number of dead rows keeps increasing. Apparently autovacuum
cannot pace the rate of updates on this table (please check the two
snapshots of pg_stat_user_tables below taken with a 2 seconds interval
between them). It would be better to run vacuum manually than using
autovacuum in this case?
pgbench=# select relname, n_dead_tup from pg_stat_user_tables where
relname='accounts';
relname | n_dead_tup
----------+------------
accounts | 19917490
(1 row)
pgbench=# select relname, n_dead_tup from pg_stat_user_tables where
relname='accounts';
relname | n_dead_tup
----------+------------
accounts | 19923767
(1 row)
Thanks,
-Noah
Noah Freire escribi�:
datid | datname | procpid | usesysid | usename | current_query | waiting |
xact_start | query_start | backend_start | client_addr | client_port
-------+---------+---------+----------+----------+-------------------------------------------------------------------------+---------+-------------------------------+-------------------------------+-------------------------------+-------------+-------------
45974 | pgbench | 14660 | 10 | postgres | autovacuum: VACUUM
public.accounts | f | 2008-10-29 11:09:05.610857-07 | 2008-10-29
11:09:05.610857-07 | 2008-10-29 11:09:03.45083-07 | |indeed; autovacuum is currently running for accounts. It is running for 1
hour but the number of dead rows keeps increasing. Apparently autovacuum
cannot pace the rate of updates on this table (please check the two
snapshots of pg_stat_user_tables below taken with a 2 seconds interval
between them).
Maybe you just need to decrease the vacuum_cost_delay, so that it goes a
bit faster.
Keep in mind that the number of dead tuples only decreases when vacuum
finishes, not while it is working.
It would be better to run vacuum manually than using
autovacuum in this case?
It would be exactly the same (unless it used different cost_limit/delay
settings, but then you can configure them via pg_autovacuum)
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.