Postgresql backend to perform vacuum automatically

Started by Nicolas Bazinabout 24 years ago29 messageshackers
Jump to latest
#1Nicolas Bazin
nbazin@ingenico.com.au

From what I read in the recent exchanges in the PostgreSQL vs ORACLE thread it would seem a good idea for the backend to keep track of the number of update performed on a database and after a certain threshold start a vacuum in a separate process by itself.
Any comments?

#2Bruce Momjian
bruce@momjian.us
In reply to: Nicolas Bazin (#1)
Re: Postgresql backend to perform vacuum automatically

Nicolas Bazin wrote:

From what I read in the recent exchanges in the PostgreSQL vs ORACLE thread it would seem a good idea for the backend to keep track of the number of update performed on a database and after a certain threshold start a vacuum in a separate process by itself.

Any comments?

Yes, makes sense to me, especially now that we have a nolocking vacuum.
Tom, do you have any ideas on this?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#2)
Re: Postgresql backend to perform vacuum automatically

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Nicolas Bazin wrote:
From what I read in the recent exchanges in the PostgreSQL vs ORACLE thread it would seem a good idea for the backend to keep track of the number of update performed on a database and after a certain threshold start a vacuum in a separate process by itself.

Any comments?

Yes, makes sense to me, especially now that we have a nolocking vacuum.
Tom, do you have any ideas on this?

There's a TODO item about this already.

* Provide automatic scheduling of background vacuum (Tom)

regards, tom lane

#4Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#3)
Re: Postgresql backend to perform vacuum automatically

Yes, makes sense to me, especially now that we have a nolocking vacuum.
Tom, do you have any ideas on this?

There's a TODO item about this already.

* Provide automatic scheduling of background vacuum (Tom)

I think an good system would have some parameters something like this in
postgresql.conf:

vacuum_update_threshold
= num of operations that cause frags on a table before a vacuum is run, 0 =
no requirement, if followed by a percent (%) sign, indicates percentage of
rows changed, rather than an absolute number

vacuum_idle_threshold
= system load below which the system must be before a vacuum can be
performed. 0 = no requirement

vacuum_time_threshold
= seconds since last vacuum before which another vacuum cannot occur. 0 = no
requirement.

If all 3 are 0, then no auto-vacuuming is performed at all. There's
probably trouble if only the idle threshold is set to zero.

And the same for the 'analyze' command?

If they want it on a per-table basis, then they can just do it themselves
with a cronjob!

Chris

#5Bruce Momjian
bruce@momjian.us
In reply to: Christopher Kings-Lynne (#4)
Re: Postgresql backend to perform vacuum automatically

vacuum_idle_threshold
= system load below which the system must be before a vacuum can be
performed. 0 = no requirement

vacuum_time_threshold
= seconds since last vacuum before which another vacuum cannot occur. 0 = no
requirement.

If all 3 are 0, then no auto-vacuuming is performed at all. There's
probably trouble if only the idle threshold is set to zero.

And the same for the 'analyze' command?

If they want it on a per-table basis, then they can just do it themselves
with a cronjob!

Yes, and Jan's statistics stuff has stats on table activity.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#6mlw
markw@mohawksoft.com
In reply to: Bruce Momjian (#2)
Re: Postgresql backend to perform vacuum automatically

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Nicolas Bazin wrote:
From what I read in the recent exchanges in the PostgreSQL vs ORACLE thread it would seem a good idea for the backend to keep track of the number of update performed on a database and after a certain threshold start a vacuum in a separate process by itself.

Any comments?

Yes, makes sense to me, especially now that we have a nolocking vacuum.
Tom, do you have any ideas on this?

There's a TODO item about this already.

* Provide automatic scheduling of background vacuum (Tom)

I have been thinking about this. I like the idea, but it may be problematic.

I suggested running a vacuum process on a constant low priority in the
background, and it was pointed out that this may cause some deadlock issues.

For vacuum to run in the background, it needs to be more regulated or targeted.

It needs to be able to know which tables need it. Many tables are static and
never get updated, vacuuming them would be pointless. It needs to be sensitive
to database load, and be tunable to vacuum only when safe or necessary to
reduce load.

Are there tables that track information that would be useful for guiding
vacuum? Could I write a program which queries some statistical tables and and
knows which tables need to be vacuumed?

If the info is around, I could whip up something pretty easily, I think.

#7Bruce Momjian
bruce@momjian.us
In reply to: mlw (#6)
Re: Postgresql backend to perform vacuum automatically

Are there tables that track information that would be useful for guiding
vacuum? Could I write a program which queries some statistical tables and and
knows which tables need to be vacuumed?

If the info is around, I could whip up something pretty easily, I think.

Sure, Jan's new statistics tables in 7.2 had just that info.

test=> \d pg_stat_user_tables
View "pg_stat_user_tables"
Column | Type | Modifiers
---------------+---------+-----------
relid | oid |
relname | name |
seq_scan | bigint |
seq_tup_read | bigint |
idx_scan | numeric |
idx_tup_fetch | numeric |
n_tup_ins | bigint |
n_tup_upd | bigint |
n_tup_del | bigint |

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#8mlw
markw@mohawksoft.com
In reply to: Bruce Momjian (#7)
Re: Postgresql backend to perform vacuum automatically

Bruce Momjian wrote:

Are there tables that track information that would be useful for guiding
vacuum? Could I write a program which queries some statistical tables and and
knows which tables need to be vacuumed?

If the info is around, I could whip up something pretty easily, I think.

Sure, Jan's new statistics tables in 7.2 had just that info.

test=> \d pg_stat_user_tables
View "pg_stat_user_tables"
Column | Type | Modifiers
---------------+---------+-----------
relid | oid |
relname | name |
seq_scan | bigint |
seq_tup_read | bigint |
idx_scan | numeric |
idx_tup_fetch | numeric |
n_tup_ins | bigint |
n_tup_upd | bigint |
n_tup_del | bigint |

I have a system running 7.2b2, does it update these fields? Is it an option?
All I am getting is zeros.

relid | relname | seq_scan | seq_tup_read | idx_scan |
idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
-----------+----------------------+----------+--------------+----------+---------------+-----------+-----------+-----------
16559 | snf_cache | 0 | 0 | 0
| 0 | 0 | 0 | 0
8689760 | fav_stat | 0 | 0 |
| | 0 | 0 | 0
19174244 | mbr_art_aff | 0 | 0 | 0
| 0 | 0 | 0 | 0
20788376 | artist_affinity | 0 | 0 | 0
| 0 | 0 | 0 | 0
83345144 | saffweak | 0 | 0 |
| | 0 | 0 | 0
94811871 | pga_queries | 0 | 0 |
| | 0 | 0 | 0
94812980 | pga_forms | 0 | 0 |
| | 0 | 0 | 0
94813425 | pga_scripts | 0 | 0 |
| | 0 | 0 | 0
94813869 | pga_reports | 0 | 0 |
| | 0 | 0 | 0
94814245 | pga_schema | 0 | 0 |
| | 0 | 0 | 0
116675008 | int_song_affinity | 0 | 0 |
| | 0 | 0 | 0
166147508 | favorites | 0 | 0 | 0
| 0 | 0 | 0 | 0
173869647 | song_affinity_orig_t | 0 | 0 |
| | 0 | 0 | 0
176339567 | song_affinity | 0 | 0 | 0
| 0 | 0 | 0 | 0
178658941 | song_affinity_array | 0 | 0 | 0
| 0 | 0 | 0 | 0
186403716 | mbr_art_aff_t | 0 | 0 |
| | 0 | 0 | 0

#9Bruce Momjian
bruce@momjian.us
In reply to: mlw (#8)
Re: Postgresql backend to perform vacuum automatically

Yes, I am seeing only zeros too. Jan?

---------------------------------------------------------------------------

mlw wrote:

Bruce Momjian wrote:

Are there tables that track information that would be useful for guiding
vacuum? Could I write a program which queries some statistical tables and and
knows which tables need to be vacuumed?

If the info is around, I could whip up something pretty easily, I think.

Sure, Jan's new statistics tables in 7.2 had just that info.

test=> \d pg_stat_user_tables
View "pg_stat_user_tables"
Column | Type | Modifiers
---------------+---------+-----------
relid | oid |
relname | name |
seq_scan | bigint |
seq_tup_read | bigint |
idx_scan | numeric |
idx_tup_fetch | numeric |
n_tup_ins | bigint |
n_tup_upd | bigint |
n_tup_del | bigint |

I have a system running 7.2b2, does it update these fields? Is it an option?
All I am getting is zeros.

relid | relname | seq_scan | seq_tup_read | idx_scan |
idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
-----------+----------------------+----------+--------------+----------+---------------+-----------+-----------+-----------
16559 | snf_cache | 0 | 0 | 0
| 0 | 0 | 0 | 0
8689760 | fav_stat | 0 | 0 |
| | 0 | 0 | 0
19174244 | mbr_art_aff | 0 | 0 | 0
| 0 | 0 | 0 | 0
20788376 | artist_affinity | 0 | 0 | 0
| 0 | 0 | 0 | 0
83345144 | saffweak | 0 | 0 |
| | 0 | 0 | 0
94811871 | pga_queries | 0 | 0 |
| | 0 | 0 | 0
94812980 | pga_forms | 0 | 0 |
| | 0 | 0 | 0
94813425 | pga_scripts | 0 | 0 |
| | 0 | 0 | 0
94813869 | pga_reports | 0 | 0 |
| | 0 | 0 | 0
94814245 | pga_schema | 0 | 0 |
| | 0 | 0 | 0
116675008 | int_song_affinity | 0 | 0 |
| | 0 | 0 | 0
166147508 | favorites | 0 | 0 | 0
| 0 | 0 | 0 | 0
173869647 | song_affinity_orig_t | 0 | 0 |
| | 0 | 0 | 0
176339567 | song_affinity | 0 | 0 | 0
| 0 | 0 | 0 | 0
178658941 | song_affinity_array | 0 | 0 | 0
| 0 | 0 | 0 | 0
186403716 | mbr_art_aff_t | 0 | 0 |
| | 0 | 0 | 0

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#9)
Re: Postgresql backend to perform vacuum automatically

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Yes, I am seeing only zeros too. Jan?

Did you turn on statistics gathering? See the Admin Guide's discussion
of database monitoring.

regards, tom lane

#11Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#10)
Re: Postgresql backend to perform vacuum automatically

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Yes, I am seeing only zeros too. Jan?

Did you turn on statistics gathering? See the Admin Guide's discussion
of database monitoring.

Oops, now I remember. Those are off by default and only the query
string is on by default. Thanks.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#12mlw
markw@mohawksoft.com
In reply to: Bruce Momjian (#11)
Re: Postgresql backend to perform vacuum automatically

Bruce Momjian wrote:

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Yes, I am seeing only zeros too. Jan?

Did you turn on statistics gathering? See the Admin Guide's discussion
of database monitoring.

Oops, now I remember. Those are off by default and only the query
string is on by default. Thanks.

This raises the question, by turning these on, does that affect database
performance?

If so, it may not be the answer for a selective vacuum.

If they do not affect performance, then why have them off?

#13Bruce Momjian
bruce@momjian.us
In reply to: mlw (#12)
Re: Postgresql backend to perform vacuum automatically

mlw wrote:

Bruce Momjian wrote:

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Yes, I am seeing only zeros too. Jan?

Did you turn on statistics gathering? See the Admin Guide's discussion
of database monitoring.

Oops, now I remember. Those are off by default and only the query
string is on by default. Thanks.

This raises the question, by turning these on, does that affect database
performance?

If so, it may not be the answer for a selective vacuum.

If they do not affect performance, then why have them off?

I think Jan said 2-3%. If we can get autovacuum from it, it would be a
win to keep it on all the time, perhaps.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#14mlw
markw@mohawksoft.com
In reply to: Bruce Momjian (#13)
Re: Postgresql backend to perform vacuum automatically

Bruce Momjian wrote:

mlw wrote:

Bruce Momjian wrote:

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Yes, I am seeing only zeros too. Jan?

Did you turn on statistics gathering? See the Admin Guide's discussion
of database monitoring.

Oops, now I remember. Those are off by default and only the query
string is on by default. Thanks.

This raises the question, by turning these on, does that affect database
performance?

If so, it may not be the answer for a selective vacuum.

If they do not affect performance, then why have them off?

I think Jan said 2-3%. If we can get autovacuum from it, it would be a
win to keep it on all the time, perhaps.

Assuming that the statistics get updated:

How often should the sats table be queried?
What sort of configurability would be needed?

#15Bruce Momjian
bruce@momjian.us
In reply to: mlw (#14)
Re: Postgresql backend to perform vacuum automatically

If they do not affect performance, then why have them off?

I think Jan said 2-3%. If we can get autovacuum from it, it would be a
win to keep it on all the time, perhaps.

Assuming that the statistics get updated:

How often should the sats table be queried?
What sort of configurability would be needed?

You could wake up every few minutes and see how the values have changed.
I don't remember if there is a way to clear that stats so you can see
just the changes in the past five minutes. Vacuum the table that had
activity.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#16Dann Corbit
DCorbit@connx.com
In reply to: Bruce Momjian (#15)
Re: Postgresql backend to perform vacuum automatically

-----Original Message-----
From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
Sent: Tuesday, March 05, 2002 12:59 PM
To: mlw
Cc: Tom Lane; Nicolas Bazin; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Postgresql backend to perform vacuum
automatically

If they do not affect performance, then why have them off?

I think Jan said 2-3%. If we can get autovacuum from it, it would

be a

win to keep it on all the time, perhaps.

Assuming that the statistics get updated:

How often should the sats table be queried?
What sort of configurability would be needed?

You could wake up every few minutes and see how the values have changed.
I don't remember if there is a way to clear that stats so you can see
just the changes in the past five minutes. Vacuum the table that had
activity.

How long does it take to vacuum a table with 12 indexes and 100 million
rows in it? This idea is making me very nervous. Suppose (for
instance)
that we have regular updates to some table, and it is constantly getting

vacuum attempts thrown at it.

Now imagine a large systems with many large tables which are frequently
receiving updates. Would 100 simultaneous vacuum operations be a good
thing when .0001% of the table has changed [on average] for each of
them?

I know for sure "update statistics" at regular intervals on some of the
SQL systems I have used would be sheer suicide.

Better make it configurable, that's for sure.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
<<<<

#17Neil Padgett
npadgett@redhat.com
In reply to: Bruce Momjian (#15)
Re: Postgresql backend to perform vacuum automatically

On Tue, 2002-03-05 at 15:59, Bruce Momjian wrote:

If they do not affect performance, then why have them off?

I think Jan said 2-3%. If we can get autovacuum from it, it would be a
win to keep it on all the time, perhaps.

Assuming that the statistics get updated:

How often should the sats table be queried?
What sort of configurability would be needed?

You could wake up every few minutes and see how the values have changed.
I don't remember if there is a way to clear that stats so you can see
just the changes in the past five minutes. Vacuum the table that had
activity.

Ick -- polling. The statistics process should be able to wake somebody
up / notify the postmaster when the statistics change such that a vacuum
is required.

Neil

--
Neil Padgett
Red Hat Canada Ltd. E-Mail: npadgett@redhat.com
2323 Yonge Street, Suite #300,
Toronto, ON M4P 2C9

#18Bruce Momjian
bruce@momjian.us
In reply to: Neil Padgett (#17)
Re: Postgresql backend to perform vacuum automatically

Neil Padgett wrote:

On Tue, 2002-03-05 at 15:59, Bruce Momjian wrote:

If they do not affect performance, then why have them off?

I think Jan said 2-3%. If we can get autovacuum from it, it would be a
win to keep it on all the time, perhaps.

Assuming that the statistics get updated:

How often should the sats table be queried?
What sort of configurability would be needed?

You could wake up every few minutes and see how the values have changed.
I don't remember if there is a way to clear that stats so you can see
just the changes in the past five minutes. Vacuum the table that had
activity.

Ick -- polling. The statistics process should be able to wake somebody
up / notify the postmaster when the statistics change such that a vacuum
is required.

Yes, that would tie that stats collector closer to auto-vacuum, but it
certainly could be done.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#19mlw
markw@mohawksoft.com
In reply to: Bruce Momjian (#18)
Re: Postgresql backend to perform vacuum automatically

Bruce Momjian wrote:

Neil Padgett wrote:

On Tue, 2002-03-05 at 15:59, Bruce Momjian wrote:

If they do not affect performance, then why have them off?

I think Jan said 2-3%. If we can get autovacuum from it, it would be a
win to keep it on all the time, perhaps.

Assuming that the statistics get updated:

How often should the sats table be queried?
What sort of configurability would be needed?

You could wake up every few minutes and see how the values have changed.
I don't remember if there is a way to clear that stats so you can see
just the changes in the past five minutes. Vacuum the table that had
activity.

Ick -- polling. The statistics process should be able to wake somebody
up / notify the postmaster when the statistics change such that a vacuum
is required.

Yes, that would tie that stats collector closer to auto-vacuum, but it
certainly could be done.

Using an alert can be done, but polling is easier for a proof of concept. I
dont see too much difficulty there. We could use notify.

#20Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: mlw (#19)
Re: Postgresql backend to perform vacuum automatically

If they do not affect performance, then why have them off?

I think Jan said 2-3%. If we can get autovacuum from it, it would be a
win to keep it on all the time, perhaps.

Assuming that the statistics get updated:

How often should the sats table be queried?
What sort of configurability would be needed?

You could wake up every few minutes and see how the values have changed.
I don't remember if there is a way to clear that stats so you can see
just the changes in the past five minutes. Vacuum the table that had
activity.

I cannot envision querying the stats every 4 seconds, especially if the stats
thread already has most of the info in hand.

I still think, that for best results the vacuums should happen continuously
for single pages based on a hook in wal or the buffer manager. Do I remember
correctly, that the active page (the one receiving the next row) already has
a strategy for slot reuse ? Maybe this strategy should be the followed more
aggressively ?

Seems the worst case is a few row table that permanently get updated,
it should be possible to harness this situation with above method.

Andreas

#21Turbo Fredriksson
turbo@bayour.com
In reply to: Bruce Momjian (#7)
#22mlw
markw@mohawksoft.com
In reply to: Zeugswetter Andreas SB SD (#20)
#23Bruce Momjian
bruce@momjian.us
In reply to: Turbo Fredriksson (#21)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#20)
#25Turbo Fredriksson
turbo@bayour.com
In reply to: Bruce Momjian (#23)
#26Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#24)
#27Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Bruce Momjian (#26)
#28Myron Scott
mkscott@sacadia.com
In reply to: Zeugswetter Andreas SB SD (#27)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#27)