Postgresql backend to perform vacuum automatically
From what I read in the recent exchanges in the PostgreSQL vs ORACLE thread it would seem a good idea for the backend to keep track of the number of update performed on a database and after a certain threshold start a vacuum in a separate process by itself.
Any comments?
Nicolas Bazin wrote:
From what I read in the recent exchanges in the PostgreSQL vs ORACLE thread it would seem a good idea for the backend to keep track of the number of update performed on a database and after a certain threshold start a vacuum in a separate process by itself.
Any comments?
Yes, makes sense to me, especially now that we have a nolocking vacuum.
Tom, do you have any ideas on this?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Nicolas Bazin wrote:
From what I read in the recent exchanges in the PostgreSQL vs ORACLE thread it would seem a good idea for the backend to keep track of the number of update performed on a database and after a certain threshold start a vacuum in a separate process by itself.Any comments?
Yes, makes sense to me, especially now that we have a nolocking vacuum.
Tom, do you have any ideas on this?
There's a TODO item about this already.
* Provide automatic scheduling of background vacuum (Tom)
regards, tom lane
Yes, makes sense to me, especially now that we have a nolocking vacuum.
Tom, do you have any ideas on this?There's a TODO item about this already.
* Provide automatic scheduling of background vacuum (Tom)
I think an good system would have some parameters something like this in
postgresql.conf:
vacuum_update_threshold
= num of operations that cause frags on a table before a vacuum is run, 0 =
no requirement, if followed by a percent (%) sign, indicates percentage of
rows changed, rather than an absolute number
vacuum_idle_threshold
= system load below which the system must be before a vacuum can be
performed. 0 = no requirement
vacuum_time_threshold
= seconds since last vacuum before which another vacuum cannot occur. 0 = no
requirement.
If all 3 are 0, then no auto-vacuuming is performed at all. There's
probably trouble if only the idle threshold is set to zero.
And the same for the 'analyze' command?
If they want it on a per-table basis, then they can just do it themselves
with a cronjob!
Chris
vacuum_idle_threshold
= system load below which the system must be before a vacuum can be
performed. 0 = no requirementvacuum_time_threshold
= seconds since last vacuum before which another vacuum cannot occur. 0 = no
requirement.If all 3 are 0, then no auto-vacuuming is performed at all. There's
probably trouble if only the idle threshold is set to zero.And the same for the 'analyze' command?
If they want it on a per-table basis, then they can just do it themselves
with a cronjob!
Yes, and Jan's statistics stuff has stats on table activity.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Nicolas Bazin wrote:
From what I read in the recent exchanges in the PostgreSQL vs ORACLE thread it would seem a good idea for the backend to keep track of the number of update performed on a database and after a certain threshold start a vacuum in a separate process by itself.Any comments?
Yes, makes sense to me, especially now that we have a nolocking vacuum.
Tom, do you have any ideas on this?There's a TODO item about this already.
* Provide automatic scheduling of background vacuum (Tom)
I have been thinking about this. I like the idea, but it may be problematic.
I suggested running a vacuum process on a constant low priority in the
background, and it was pointed out that this may cause some deadlock issues.
For vacuum to run in the background, it needs to be more regulated or targeted.
It needs to be able to know which tables need it. Many tables are static and
never get updated, vacuuming them would be pointless. It needs to be sensitive
to database load, and be tunable to vacuum only when safe or necessary to
reduce load.
Are there tables that track information that would be useful for guiding
vacuum? Could I write a program which queries some statistical tables and and
knows which tables need to be vacuumed?
If the info is around, I could whip up something pretty easily, I think.
Are there tables that track information that would be useful for guiding
vacuum? Could I write a program which queries some statistical tables and and
knows which tables need to be vacuumed?If the info is around, I could whip up something pretty easily, I think.
Sure, Jan's new statistics tables in 7.2 had just that info.
test=> \d pg_stat_user_tables
View "pg_stat_user_tables"
Column | Type | Modifiers
---------------+---------+-----------
relid | oid |
relname | name |
seq_scan | bigint |
seq_tup_read | bigint |
idx_scan | numeric |
idx_tup_fetch | numeric |
n_tup_ins | bigint |
n_tup_upd | bigint |
n_tup_del | bigint |
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
Are there tables that track information that would be useful for guiding
vacuum? Could I write a program which queries some statistical tables and and
knows which tables need to be vacuumed?If the info is around, I could whip up something pretty easily, I think.
Sure, Jan's new statistics tables in 7.2 had just that info.
test=> \d pg_stat_user_tables
View "pg_stat_user_tables"
Column | Type | Modifiers
---------------+---------+-----------
relid | oid |
relname | name |
seq_scan | bigint |
seq_tup_read | bigint |
idx_scan | numeric |
idx_tup_fetch | numeric |
n_tup_ins | bigint |
n_tup_upd | bigint |
n_tup_del | bigint |
I have a system running 7.2b2, does it update these fields? Is it an option?
All I am getting is zeros.
relid | relname | seq_scan | seq_tup_read | idx_scan |
idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
-----------+----------------------+----------+--------------+----------+---------------+-----------+-----------+-----------
16559 | snf_cache | 0 | 0 | 0
| 0 | 0 | 0 | 0
8689760 | fav_stat | 0 | 0 |
| | 0 | 0 | 0
19174244 | mbr_art_aff | 0 | 0 | 0
| 0 | 0 | 0 | 0
20788376 | artist_affinity | 0 | 0 | 0
| 0 | 0 | 0 | 0
83345144 | saffweak | 0 | 0 |
| | 0 | 0 | 0
94811871 | pga_queries | 0 | 0 |
| | 0 | 0 | 0
94812980 | pga_forms | 0 | 0 |
| | 0 | 0 | 0
94813425 | pga_scripts | 0 | 0 |
| | 0 | 0 | 0
94813869 | pga_reports | 0 | 0 |
| | 0 | 0 | 0
94814245 | pga_schema | 0 | 0 |
| | 0 | 0 | 0
116675008 | int_song_affinity | 0 | 0 |
| | 0 | 0 | 0
166147508 | favorites | 0 | 0 | 0
| 0 | 0 | 0 | 0
173869647 | song_affinity_orig_t | 0 | 0 |
| | 0 | 0 | 0
176339567 | song_affinity | 0 | 0 | 0
| 0 | 0 | 0 | 0
178658941 | song_affinity_array | 0 | 0 | 0
| 0 | 0 | 0 | 0
186403716 | mbr_art_aff_t | 0 | 0 |
| | 0 | 0 | 0
Yes, I am seeing only zeros too. Jan?
---------------------------------------------------------------------------
mlw wrote:
Bruce Momjian wrote:
Are there tables that track information that would be useful for guiding
vacuum? Could I write a program which queries some statistical tables and and
knows which tables need to be vacuumed?If the info is around, I could whip up something pretty easily, I think.
Sure, Jan's new statistics tables in 7.2 had just that info.
test=> \d pg_stat_user_tables
View "pg_stat_user_tables"
Column | Type | Modifiers
---------------+---------+-----------
relid | oid |
relname | name |
seq_scan | bigint |
seq_tup_read | bigint |
idx_scan | numeric |
idx_tup_fetch | numeric |
n_tup_ins | bigint |
n_tup_upd | bigint |
n_tup_del | bigint |I have a system running 7.2b2, does it update these fields? Is it an option?
All I am getting is zeros.relid | relname | seq_scan | seq_tup_read | idx_scan |
idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
-----------+----------------------+----------+--------------+----------+---------------+-----------+-----------+-----------
16559 | snf_cache | 0 | 0 | 0
| 0 | 0 | 0 | 0
8689760 | fav_stat | 0 | 0 |
| | 0 | 0 | 0
19174244 | mbr_art_aff | 0 | 0 | 0
| 0 | 0 | 0 | 0
20788376 | artist_affinity | 0 | 0 | 0
| 0 | 0 | 0 | 0
83345144 | saffweak | 0 | 0 |
| | 0 | 0 | 0
94811871 | pga_queries | 0 | 0 |
| | 0 | 0 | 0
94812980 | pga_forms | 0 | 0 |
| | 0 | 0 | 0
94813425 | pga_scripts | 0 | 0 |
| | 0 | 0 | 0
94813869 | pga_reports | 0 | 0 |
| | 0 | 0 | 0
94814245 | pga_schema | 0 | 0 |
| | 0 | 0 | 0
116675008 | int_song_affinity | 0 | 0 |
| | 0 | 0 | 0
166147508 | favorites | 0 | 0 | 0
| 0 | 0 | 0 | 0
173869647 | song_affinity_orig_t | 0 | 0 |
| | 0 | 0 | 0
176339567 | song_affinity | 0 | 0 | 0
| 0 | 0 | 0 | 0
178658941 | song_affinity_array | 0 | 0 | 0
| 0 | 0 | 0 | 0
186403716 | mbr_art_aff_t | 0 | 0 |
| | 0 | 0 | 0---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Yes, I am seeing only zeros too. Jan?
Did you turn on statistics gathering? See the Admin Guide's discussion
of database monitoring.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Yes, I am seeing only zeros too. Jan?
Did you turn on statistics gathering? See the Admin Guide's discussion
of database monitoring.
Oops, now I remember. Those are off by default and only the query
string is on by default. Thanks.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Yes, I am seeing only zeros too. Jan?
Did you turn on statistics gathering? See the Admin Guide's discussion
of database monitoring.Oops, now I remember. Those are off by default and only the query
string is on by default. Thanks.
This raises the question, by turning these on, does that affect database
performance?
If so, it may not be the answer for a selective vacuum.
If they do not affect performance, then why have them off?
mlw wrote:
Bruce Momjian wrote:
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Yes, I am seeing only zeros too. Jan?
Did you turn on statistics gathering? See the Admin Guide's discussion
of database monitoring.Oops, now I remember. Those are off by default and only the query
string is on by default. Thanks.This raises the question, by turning these on, does that affect database
performance?If so, it may not be the answer for a selective vacuum.
If they do not affect performance, then why have them off?
I think Jan said 2-3%. If we can get autovacuum from it, it would be a
win to keep it on all the time, perhaps.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
mlw wrote:
Bruce Momjian wrote:
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Yes, I am seeing only zeros too. Jan?
Did you turn on statistics gathering? See the Admin Guide's discussion
of database monitoring.Oops, now I remember. Those are off by default and only the query
string is on by default. Thanks.This raises the question, by turning these on, does that affect database
performance?If so, it may not be the answer for a selective vacuum.
If they do not affect performance, then why have them off?
I think Jan said 2-3%. If we can get autovacuum from it, it would be a
win to keep it on all the time, perhaps.
Assuming that the statistics get updated:
How often should the sats table be queried?
What sort of configurability would be needed?
If they do not affect performance, then why have them off?
I think Jan said 2-3%. If we can get autovacuum from it, it would be a
win to keep it on all the time, perhaps.Assuming that the statistics get updated:
How often should the sats table be queried?
What sort of configurability would be needed?
You could wake up every few minutes and see how the values have changed.
I don't remember if there is a way to clear that stats so you can see
just the changes in the past five minutes. Vacuum the table that had
activity.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
-----Original Message-----
From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
Sent: Tuesday, March 05, 2002 12:59 PM
To: mlw
Cc: Tom Lane; Nicolas Bazin; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Postgresql backend to perform vacuum
automatically
If they do not affect performance, then why have them off?
I think Jan said 2-3%. If we can get autovacuum from it, it would
be a
win to keep it on all the time, perhaps.
Assuming that the statistics get updated:
How often should the sats table be queried?
What sort of configurability would be needed?
You could wake up every few minutes and see how the values have changed.
I don't remember if there is a way to clear that stats so you can see
just the changes in the past five minutes. Vacuum the table that had
activity.
How long does it take to vacuum a table with 12 indexes and 100 million
rows in it? This idea is making me very nervous. Suppose (for
instance)
that we have regular updates to some table, and it is constantly getting
vacuum attempts thrown at it.
Now imagine a large systems with many large tables which are frequently
receiving updates. Would 100 simultaneous vacuum operations be a good
thing when .0001% of the table has changed [on average] for each of
them?
I know for sure "update statistics" at regular intervals on some of the
SQL systems I have used would be sheer suicide.
Better make it configurable, that's for sure.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
<<<<
Import Notes
Resolved by subject fallback
On Tue, 2002-03-05 at 15:59, Bruce Momjian wrote:
If they do not affect performance, then why have them off?
I think Jan said 2-3%. If we can get autovacuum from it, it would be a
win to keep it on all the time, perhaps.Assuming that the statistics get updated:
How often should the sats table be queried?
What sort of configurability would be needed?You could wake up every few minutes and see how the values have changed.
I don't remember if there is a way to clear that stats so you can see
just the changes in the past five minutes. Vacuum the table that had
activity.
Ick -- polling. The statistics process should be able to wake somebody
up / notify the postmaster when the statistics change such that a vacuum
is required.
Neil
--
Neil Padgett
Red Hat Canada Ltd. E-Mail: npadgett@redhat.com
2323 Yonge Street, Suite #300,
Toronto, ON M4P 2C9
Neil Padgett wrote:
On Tue, 2002-03-05 at 15:59, Bruce Momjian wrote:
If they do not affect performance, then why have them off?
I think Jan said 2-3%. If we can get autovacuum from it, it would be a
win to keep it on all the time, perhaps.Assuming that the statistics get updated:
How often should the sats table be queried?
What sort of configurability would be needed?You could wake up every few minutes and see how the values have changed.
I don't remember if there is a way to clear that stats so you can see
just the changes in the past five minutes. Vacuum the table that had
activity.Ick -- polling. The statistics process should be able to wake somebody
up / notify the postmaster when the statistics change such that a vacuum
is required.
Yes, that would tie that stats collector closer to auto-vacuum, but it
certainly could be done.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
Neil Padgett wrote:
On Tue, 2002-03-05 at 15:59, Bruce Momjian wrote:
If they do not affect performance, then why have them off?
I think Jan said 2-3%. If we can get autovacuum from it, it would be a
win to keep it on all the time, perhaps.Assuming that the statistics get updated:
How often should the sats table be queried?
What sort of configurability would be needed?You could wake up every few minutes and see how the values have changed.
I don't remember if there is a way to clear that stats so you can see
just the changes in the past five minutes. Vacuum the table that had
activity.Ick -- polling. The statistics process should be able to wake somebody
up / notify the postmaster when the statistics change such that a vacuum
is required.Yes, that would tie that stats collector closer to auto-vacuum, but it
certainly could be done.
Using an alert can be done, but polling is easier for a proof of concept. I
dont see too much difficulty there. We could use notify.
If they do not affect performance, then why have them off?
I think Jan said 2-3%. If we can get autovacuum from it, it would be a
win to keep it on all the time, perhaps.Assuming that the statistics get updated:
How often should the sats table be queried?
What sort of configurability would be needed?You could wake up every few minutes and see how the values have changed.
I don't remember if there is a way to clear that stats so you can see
just the changes in the past five minutes. Vacuum the table that had
activity.
I cannot envision querying the stats every 4 seconds, especially if the stats
thread already has most of the info in hand.
I still think, that for best results the vacuums should happen continuously
for single pages based on a hook in wal or the buffer manager. Do I remember
correctly, that the active page (the one receiving the next row) already has
a strategy for slot reuse ? Maybe this strategy should be the followed more
aggressively ?
Seems the worst case is a few row table that permanently get updated,
it should be possible to harness this situation with above method.
Andreas
Import Notes
Resolved by subject fallback