autovacuum next steps
After staring at my previous notes for autovac scheduling, it has become
clear that the basics of it are not really going to work as specified.
So here is a more realistic plan:
First, we introduce an autovacuum_max_workers parameter to limit the
total number of workers that can be running at any time. We use this
number to create extra PGPROC entries, etc., similar to the way we handle
the prepared xacts stuff. The default should be low, say 3 or 4.
The launcher sends a worker into a database just like it does currently.
This worker determines what tables need vacuuming per the pg_autovacuum
settings and pgstat data. If it's more than one table, it puts the
number of tables in shared memory and sends a signal to the launcher.
The launcher then starts
min(autovacuum_max_workers - currently running workers, tables to vacuum - 1)
more workers to process that database. Maybe we could have a
max-workers parameter per-database in pg_database to use as a limit here
as well.
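In rough pseudo-Python, the launcher arithmetic above would be something like this (illustrative names only, not actual backend code; the initial worker already handles one table, hence the -1):

```python
def workers_to_launch(max_workers, running_workers, tables_to_vacuum):
    # The first worker already covers one table; only launch helpers for
    # the remaining tables, and never exceed the global worker cap.
    if tables_to_vacuum <= 1:
        return 0
    return max(0, min(max_workers - running_workers, tables_to_vacuum - 1))
```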
Each worker, including the initial one, starts vacuuming tables
according to pgstat data. They recheck the pgstat data after finishing
each table, so that a table vacuumed by another worker is not processed
twice (maybe problematic: a table with a high update rate may be vacuumed
more than once. Maybe this is a feature, not a bug).
Once autovacuum_naptime has passed, if the workers have not finished
yet, the launcher wants to vacuum another database. At this point, the
launcher wants some of the workers processing the first database to exit
early as soon as they finish one table, so that they can help vacuuming
the other database. It can do this by setting a flag in shmem that the
workers can check when finished with a table; if the flag is set, they
exit instead of continuing with another table. The launcher then starts
a worker in the second database. The launcher does this until the
number of workers is even between the two databases. This can continue
until there is one worker per database; so at most autovacuum_max_workers
databases can be under automatic vacuuming at any time, one worker each.
When there are autovacuum_max_workers databases under vacuum, the
launcher doesn't have anything else to do until some worker exits on its
own.
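The leveling rule can be sketched as simple arithmetic (again illustrative, not backend code): each "move" means one worker in the busier database sees the shmem flag, exits after its current table, and the launcher starts a replacement in the other database, until the counts differ by at most one:

```python
def workers_to_move(first_db_workers, second_db_workers=0):
    # One move = a worker exits the first database after its current
    # table and the launcher starts a worker in the second database.
    moves = 0
    while first_db_workers - second_db_workers > 1:
        first_db_workers -= 1
        second_db_workers += 1
        moves += 1
    return moves
```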
When there is a single worker processing a database, it does not recheck
pgstat data after each table. This is to prevent a high-update-rate
table from starving the vacuuming of other databases.
How does this sound?
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
After staring at my previous notes for autovac scheduling, it has become
clear that the basics of it are not really going to work as specified.
So here is a more realistic plan:
[Snip Detailed Description]
How does this sound?
At first blush, I'm not sure I like this, as it doesn't directly attack
the table starvation problem, and I think it could be a net loss of speed.
VACUUM is I/O bound; as such, just sending multiple vacuum commands at a
DB isn't going to make things faster. You are now going to have multiple
processes reading from multiple tables at the same time. I think in
general this is a bad thing (unless we someday account for I/O made
available from multiple tablespaces). In general, the only time it's a
good idea to have multiple vacuums running at the same time is when a
big table is starving a small hot table and causing bloat.
I think we can extend the current autovacuum stats to add one more
column that specifies "is hot" or something to that effect. Then when
the AV launcher sends a worker to a DB, it will first look for tables
marked as hot and work on them. While a worker is working on hot tables,
the launcher need not send any additional workers to this database. If the
launcher notices that a worker is working on regular tables, it can send
another worker, which will look for hot tables to work on; if that worker
doesn't find any hot tables that need work, it exits, leaving the
original worker to continue plodding along.
Thoughts?
Matthew T. O'Connor wrote:
Alvaro Herrera wrote:
After staring at my previous notes for autovac scheduling, it has become
clear that the basics of it are not really going to work as specified.
So here is a more realistic plan:
[Snip Detailed Description]
How does this sound?
At first blush, I'm not sure I like this, as it doesn't directly attack
the table starvation problem, and I think it could be a net loss of speed.
VACUUM is I/O bound; as such, just sending multiple vacuum commands at a
DB isn't going to make things faster. You are now going to have multiple
processes reading from multiple tables at the same time. I think in
general this is a bad thing (unless we someday account for I/O made
available from multiple tablespaces).
Yeah, I understand that. However, I think that can be remedied by using
a reasonable autovacuum_vacuum_cost_delay setting, so that each worker
uses less than the total I/O available. The main point of the proposal
is to allow multiple workers on a DB while also allowing multiple
databases to be processed in parallel.
I think we can extend the current autovacuum stats to add one more
column that specifies "is hot" or something to that effect. Then when
the AV launcher sends a worker to a DB, it will first look for tables
marked as hot and work on them. While a worker is working on hot tables,
the launcher need not send any additional workers to this database. If the
launcher notices that a worker is working on regular tables, it can send
another worker, which will look for hot tables to work on; if that worker
doesn't find any hot tables that need work, it exits, leaving the
original worker to continue plodding along.
How would you define what's a "hot" table?
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
In an ideal world I think you want precisely one vacuum process running per
tablespace on the assumption that each tablespace represents a distinct
physical device.
The cases where we currently find ourselves wanting more are where small
tables are due for vacuuming more frequently than the time it takes for a
large table to receive a single full pass.
If we could have autovacuum interrupt a vacuum in mid-sweep, perform a cycle
of vacuums on smaller tables, then resume, that problem would go away. That
sounds too difficult, though; perhaps we could do something nearly as good.
One option that I've heard before is to have vacuum after a single iteration
(ie, after it fills maintenance_work_mem and does the index cleanup and the
second heap pass), remember where it was and pick up from that point next
time.
If instead autovacuum could tell vacuum exactly how long to run for (or
calculate how many pages that represents based on cost_delay), then it could
calculate when it will next need to schedule another table in the same
tablespace and try to arrange for the vacuum of the large table to be done by
then.
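The page arithmetic here is straightforward under the cost-based delay model: vacuum sleeps cost_delay milliseconds each time it accrues cost_limit of I/O cost, so the sleep cadence bounds how many pages fit before the next small-table vacuum is due. A rough sketch (a simplification that counts only the enforced sleeps and ignores the time the I/O itself takes):

```python
def pages_before_deadline(seconds_until_deadline, cost_limit,
                          cost_delay_ms, cost_per_page):
    # Number of sleep rounds that fit in the window, times the number of
    # pages processed per round.
    sleep_rounds = seconds_until_deadline / (cost_delay_ms / 1000.0)
    return int(sleep_rounds * (cost_limit / cost_per_page))
```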
Once there are no smaller, more frequently vacuumed tables due to be
scheduled, it would start vacuum for the large table again, and it would
resume from where the first one left off.
This only works if the large tables really don't need to be vacuumed so often
that autovacuum can't keep up. Our current situation is that there is a size
at which this happens. But arranging to have only one vacuum process per
tablespace will only make that less likely to happen rather than more.
I think the changes to vacuum itself are pretty small to get it to remember
where it left off last time and start from mid-table. I'm not sure how easy it
would be to get autovacuum to juggle all these variables though.
Of course users may not create separate tablespaces for physical devices, or
they may set cost_delay so high you really do need more vacuum processes, etc.
So you probably still need a num_vacuum_daemons but the recommended setting
would be the same as the number of physical devices and autovacuum could try
to divide them equally between tablespaces which would amount to the same
thing.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Alvaro Herrera wrote:
Matthew T. O'Connor wrote:
At first blush, I'm not sure I like this, as it doesn't directly attack
the table starvation problem, and I think it could be a net loss of speed.
VACUUM is I/O bound; as such, just sending multiple vacuum commands at a
DB isn't going to make things faster. You are now going to have multiple
processes reading from multiple tables at the same time. I think in
general this is a bad thing (unless we someday account for I/O made
available from multiple tablespaces).
Yeah, I understand that. However, I think that can be remedied by using
a reasonable autovacuum_vacuum_cost_delay setting, so that each worker
uses less than the total I/O available. The main point of the proposal
is to allow multiple workers on a DB while also allowing multiple
databases to be processed in parallel.
So you are telling people to choose an autovacuum_delay so high that
they need to run multiple autovacuums at once to keep up? I'm probably
being too dramatic, but it seems inconsistent.
I think we can extend the current autovacuum stats to add one more
column that specifies "is hot" or something to that effect. Then when
the AV launcher sends a worker to a DB, it will first look for tables
marked as hot and work on them. While a worker is working on hot tables,
the launcher need not send any additional workers to this database. If the
launcher notices that a worker is working on regular tables, it can send
another worker, which will look for hot tables to work on; if that worker
doesn't find any hot tables that need work, it exits, leaving the
original worker to continue plodding along.
How would you define what's a "hot" table?
I wasn't clear; I would have the admin specify it, and we can store it
as an additional column in the pg_autovacuum_settings table. Or perhaps
if the table is below some size threshold and autovacuum sees that it
needs to be vacuumed every time it checks, say 10 times in a row, or
something like that.
alvherre@commandprompt.com (Alvaro Herrera) writes:
When there is a single worker processing a database, it does not recheck
pgstat data after each table. This is to prevent a high-update-rate
table from starving the vacuuming of other databases.
This case is important; I don't think that having multiple workers
fully alleviates the problem condition.
Pointedly, you need to have a way of picking up tables often enough to
avoid the XID rollover problem. That may simply require that on some
periodic basis, a query is run to queue up tables that are getting
close to having an "XID problem."
--
(reverse (concatenate 'string "ofni.secnanifxunil" "@" "enworbbc"))
http://linuxfinances.info/info/finances.html
Rules of the Evil Overlord #189. "I will never tell the hero "Yes I
was the one who did it, but you'll never be able to prove it to that
incompetent old fool." Chances are, that incompetent old fool is
standing behind the curtain." <http://www.eviloverlord.com/>
Alvaro Herrera <alvherre@commandprompt.com> writes:
Each worker, including the initial one, starts vacuuming tables
according to pgstat data. They recheck the pgstat data after finishing
each table, so that a table vacuumed by another worker is not processed
twice (maybe problematic: a table with a high update rate may be vacuumed
more than once. Maybe this is a feature, not a bug).
How are you going to make that work without race conditions? ISTM
practically guaranteed that all the workers will try to vacuum the same
table.
Once autovacuum_naptime has passed, if the workers have not finished
yet, the launcher wants to vacuum another database.
This seems a rather strange design, as it will encourage concentrations
of workers in a single database. Wouldn't it be better to spread them
out among multiple databases by default?
regards, tom lane
Alvaro Herrera wrote:
Once autovacuum_naptime... autovacuum_max_workers...
How does this sound?
The knobs exposed on autovacuum feel kinda tangential to
what I think I'd really want to control.
IMHO "vacuum_mbytes_per_second" would be quite a bit more
intuitive than cost_delay, naptime, etc.
ISTM I can relatively easily estimate and/or spec out how
much "extra" I/O bandwidth I have per device for vacuum;
and would pretty much want vacuum to be constantly
running on whichever table needs it the most, so
long as it can stay under that bandwidth limit.
Could vacuum have a tunable that says "X MBytes/second"
(perhaps per device) and have it measure how much I/O
it's actually doing and try to stay under that limit?
For more fine-grained control a cron job could go
around setting different MBytes/second limits during
peak times vs idle times.
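A sketch of how such a knob might be enforced (a hypothetical interface; real vacuum I/O accounting would live inside the server): measure the bytes actually processed, and sleep whenever we have run ahead of the configured MB/s budget.

```python
import time

def run_throttled(chunks, mbytes_per_second, process_chunk):
    """process_chunk(chunk) does the I/O and returns bytes touched."""
    budget = mbytes_per_second * 1024 * 1024  # bytes per second
    start = time.monotonic()
    done = 0
    for chunk in chunks:
        done += process_chunk(chunk)
        # If we've consumed more budget than wall-clock time allows,
        # sleep off the difference to fall back to the target rate.
        ahead = done / budget - (time.monotonic() - start)
        if ahead > 0:
            time.sleep(ahead)
    return done
```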
If people are concerned about CPU-intensive vacuums
instead of I/O-intensive ones (does anyone experience
that?), another tunable "vacuum_percent_of_cpu" would
be more straightforward than cost_delay, cost_page_hit,
etc. But I'd be a bit surprised if CPU-intensive
vacuums are common.
One option that I've heard before is to have vacuum after a single iteration
(ie, after it fills maintenance_work_mem and does the index cleanup and the
second heap pass), remember where it was and pick up from that point next
time.
From my experience this is not acceptable... I have tables for which the
index cleanup takes hours, so no matter how low I would set the
maintenance_work_mem (in fact I set it high enough so there's only one
iteration), it would take too much time and the queue tables would get
overly bloated (not happening either, though: they now get special
"cluster" treatment).
Cheers,
Csaba.
Gregory Stark wrote:
If we could have autovacuum interrupt a vacuum in mid-sweep, perform a cycle
of vacuums on smaller tables, then resume, that problem would go away. That
sounds too difficult though, but perhaps we could do something nearly as good.
I think giving vacuum this interrupt-and-resume capability would be quite
useful for large tables.
It can provide more flexibility for autovacuum to create a good scheduling
scheme. Sometimes it takes a whole day to vacuum a large table (a
hundreds-of-GB table may qualify); setting cost_delay can even make it
last for several days. If the system has a maintenance window, the vacuum
task for the large table can be split to fit into that window using the
interrupt-and-resume feature.
One option that I've heard before is to have vacuum after a single iteration
(ie, after it fills maintenance_work_mem and does the index cleanup and the
second heap pass), remember where it was and pick up from that point next
time.
Even a single iteration may take a long time, so it is not very useful
to break only at iteration boundaries. I think it is not so difficult to
get vacuum to remember where it left off and resume from that point next
time. The following is a basic idea.
A typical vacuum process mainly has the following phases:
Phase 1. scan heap
Phase 2. scan and sweep index
Phase 3. sweep heap
Phase 4. update FSM
Phase 5. truncate CLOG
When vacuum is interrupted, we can save the collected information to
disk and restore it later when vacuum restarts: we can remember the dead
tuple list and the block number it has scanned in phase 1; the indexes
it has cleaned up in phase 2; and the tuples it has swept in phase 3.
Before exiting from vacuum, we can also merge the free space information
into the FSM.
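The bookkeeping could be as simple as serializing the scan position and dead-tuple list at interrupt time. A toy sketch (illustrative only; real vacuum state would be C structs in the backend, not JSON files):

```python
import json
import os

def save_progress(path, phase, next_block, dead_tuples):
    # Persist where the scan stopped so a later vacuum can resume there.
    with open(path, "w") as f:
        json.dump({"phase": phase, "next_block": next_block,
                   "dead_tuples": dead_tuples}, f)

def load_progress(path):
    # No saved state means a fresh vacuum: phase 1, block 0.
    if not os.path.exists(path):
        return {"phase": 1, "next_block": 0, "dead_tuples": []}
    with open(path) as f:
        return json.load(f)
```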
We are working on this feature now. I will propose it later to discuss
it with you.
Best Regards
Galy Lee
--
NTT Open Source Software Center
I'm wondering if we can do one better...
Since what we really care about is I/O responsiveness for the rest of
the system, could we just time how long I/O calls take to complete? I
know that gettimeofday can have a non-trivial overhead, but do we care
that much about it in the case of autovac?
On Fri, Feb 16, 2007 at 05:37:26PM -0800, Ron Mayer wrote:
Alvaro Herrera wrote:
Once autovacuum_naptime... autovacuum_max_workers...
How does this sound?
The knobs exposed on autovacuum feel kinda tangential to
what I think I'd really want to control.
IMHO "vacuum_mbytes_per_second" would be quite a bit more
intuitive than cost_delay, naptime, etc.
ISTM I can relatively easily estimate and/or spec out how
much "extra" I/O bandwidth I have per device for vacuum;
and would pretty much want vacuum to be constantly
running on whichever table needs it the most, so
long as it can stay under that bandwidth limit.
Could vacuum have a tunable that says "X MBytes/second"
(perhaps per device) and have it measure how much I/O
it's actually doing and try to stay under that limit?
For more fine-grained control a cron job could go
around setting different MBytes/second limits during
peak times vs idle times.
If people are concerned about CPU-intensive vacuums
instead of I/O-intensive ones (does anyone experience
that?), another tunable "vacuum_percent_of_cpu" would
be more straightforward than cost_delay, cost_page_hit,
etc. But I'd be a bit surprised if CPU-intensive
vacuums are common.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Ok, scratch that :-) Another round of braindumping below.
Launcher starts one worker in each database. This worker is not going
to do vacuum work, just report how much vacuum effort is needed in the
database. "Vacuum effort" is measured as the total number of pages in
need of vacuum, being the sum of relpages of all tables and indexes
needing vacuum. (Note: we weight heap pages the same as index pages.
Is this OK?)
Create a plan for vacuuming all those databases within the constraints
of max_workers. Databases needing the most work are vacuumed first.
One worker per database. Thus max_workers databases are being vacuumed
in parallel at this time. When one database is finished, the launcher
starts a worker in the next database in the list.
When the plan is complete (i.e. the list is empty) we can do the whole
thing again, excluding the databases that are still being vacuumed.
Perhaps we should wait autovacuum_naptime seconds between finishing one
vacuum round in all databases and starting the next. How do we measure
this: do we start sleeping when the last worker finishes, or when the
list is empty?
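The planning pass described above might look roughly like this (a sketch; the effort numbers stand in for the per-database page counts reported by the probe workers):

```python
def build_plan(effort_by_db, max_workers):
    # Order databases by reported vacuum effort, busiest first; start a
    # worker in each of the first max_workers and queue the rest.
    ordered = sorted(effort_by_db, key=effort_by_db.get, reverse=True)
    return ordered[:max_workers], ordered[max_workers:]
```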
Perhaps we should reserve a worker for vacuuming hot tables. Launcher
then uses max_workers-1 workers for the above plan, and the spare worker
is continuously connecting to one database, vacuuming hot tables, going
away, the launcher starts it again to connect to the next database.
Definitional problem: how to decide what's a hot table? One idea (the
simplest) is to let the DBA define it.
Thus, at most two workers are on any database: one of them is working on
normal tables, the other on hot tables.
(This idea can be complemented by having another GUC var,
autovacuum_hot_workers, which allows the DBA to have more than one
worker on hot tables (just for the case where there are too many hot
tables). This may be overkill.)
Ron Mayer expressed the thought that we're complicating needlessly the
UI for vacuum_delay, naptime, etc. He proposes that instead of having
cost_delay etc, we have a mbytes_per_second parameter of some sort.
This strikes me as a good idea, but I think we could do that after this
proposal is implemented. So this "take 2" could be implemented, and
then we could switch the cost_delay stuff to using a MB/s kind of
measurement somehow (he says waving his hands wildly).
Greg Stark and Matthew O'Connor say that we're misdirected in having
more than one worker per tablespace. I say we're not :-) If we
consider Ron Mayer's idea of measuring MB/s, but we do it per
tablespace, then we would inflict the correct amount of vacuum pain to
each tablespace, sleeping as appropriate. I think this would require
workers of different databases to communicate what tablespaces they are
using, so that all of them can utilize the correct amount of bandwidth.
I'd like to know if this responds to the mentioned people's objections.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
Ok, scratch that :-) Another round of braindumping below.
I still think this is a solution in search of a problem. The main
problem we have right now is that hot tables can be starved of vacuum.
Most of this proposal doesn't touch that. I would like to see that
problem solved first; then we can talk about adding multiple workers per
database or per tablespace, etc.
(This idea can be complemented by having another GUC var,
autovacuum_hot_workers, which allows the DBA to have more than one
worker on hot tables (just for the case where there are too many hot
tables). This may be overkill.)
I think this is more along the lines of what we need first.
Ron Mayer expressed the thought that we're complicating needlessly the
UI for vacuum_delay, naptime, etc. He proposes that instead of having
cost_delay etc, we have a mbytes_per_second parameter of some sort.
This strikes me as a good idea, but I think we could do that after this
proposal is implemented. So this "take 2" could be implemented, and
then we could switch the cost_delay stuff to using a MB/s kind of
measurement somehow (he says waving his hands wildly).
I agree this is probably a good idea in the long run, but it is lower on
the priority list and should come later.
Greg Stark and Matthew O'Connor say that we're misdirected in having
more than one worker per tablespace. I say we're not :-) If we
consider Ron Mayer's idea of measuring MB/s, but we do it per
tablespace, then we would inflict the correct amount of vacuum pain to
each tablespace, sleeping as appropriate. I think this would require
workers of different databases to communicate what tablespaces they are
using, so that all of them can utilize the correct amount of bandwidth.
I agree that in the long run it might be better to have multiple workers
with MB/s throttle and tablespace aware, but we don't have any of that
infrastructure right now. I think the piece of low-hanging fruit that
your launcher concept can solve is the hot table starvation.
My Proposal: If we require admins to identify hot tables, then:
1) Launcher fires-off a worker1 into database X.
2) worker1 deals with "hot" tables first, then regular tables.
3) Launcher continues to launch workers to DB X every autovac naptime.
4) worker2 (or 3 or 4, etc.) checks whether it is alone in DB X; if so,
it acts as worker1 did above. If worker1 is still working in DB X, then
worker2 looks for hot tables that are being starved because worker1 got
busy. If worker2 finds no hot tables that need work, then worker2 exits.
This seems a very simple solution (given your launcher work) that can
solve the starvation problem.
Thoughts?
"Alvaro Herrera" <alvherre@commandprompt.com> writes:
Greg Stark and Matthew O'Connor say that we're misdirected in having
more than one worker per tablespace. I say we're not :-)
I did say that. But your comment about using a high cost_delay was
fairly convincing too. It would be a simpler design, and I think you're
right. As long as you raise both cost_delay and cost_limit by enough,
you should get pretty much the same sequential I/O rate and not step on
each other's toes too much.
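Greg's point can be checked with the throttle arithmetic: vacuum sleeps cost_delay each time it accrues cost_limit of cost, so scaling both by the same factor leaves the average rate unchanged while lengthening each uninterrupted sequential burst (a sketch that ignores the time the I/O itself takes):

```python
def pages_per_second(cost_limit, cost_delay_ms, cost_per_page):
    # Average throttled rate: (pages per sleep round) / (sleep length).
    return (cost_limit / cost_per_page) / (cost_delay_ms / 1000.0)
```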
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ron Mayer expressed the thought that we're complicating needlessly the
UI for vacuum_delay, naptime, etc. He proposes that instead of having
cost_delay etc, we have a mbytes_per_second parameter of some sort.
This strikes me as a good idea, but I think we could do that after this
proposal is implemented. So this "take 2" could be implemented, and
then we could switch the cost_delay stuff to using a MB/s kind of
measurement somehow (he says waving his hands wildly).
vacuum should be a process with the least amount of voodoo. If we can
just have vacuum_delay and vacuum_threshold, where threshold allows an
arbitrary setting of how much bandwidth we will allot to the process,
then that is a beyond wonderful thing.
It is easy to determine how much IO you have, and what you can spare.
Joshua D. Drake
Greg Stark and Matthew O'Connor say that we're misdirected in having
more than one worker per tablespace. I say we're not :-) If we
consider Ron Mayer's idea of measuring MB/s, but we do it per
tablespace, then we would inflict the correct amount of vacuum pain to
each tablespace, sleeping as appropriate. I think this would require
workers of different databases to communicate what tablespaces they are
using, so that all of them can utilize the correct amount of bandwidth.
I'd like to know if this responds to the mentioned people's objections.
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
vacuum should be a process with the least amount of voodoo.
If we can just have vacuum_delay and vacuum_threshold, where
threshold allows an arbitrary setting of how much bandwidth
we will allot to the process, then that is a beyond wonderful thing.
It is easy to determine how much IO you have, and what you can spare.
The tricky part is what metric to use. IMHO "IOs per second" would be
good.
In a typical DB scenario that is the I/O bottleneck, not the MB/s.
Andreas
On Wed, Feb 21, 2007 at 05:40:53PM -0500, Matthew T. O'Connor wrote:
My Proposal: If we require admins to identify hot tables, then:
1) Launcher fires-off a worker1 into database X.
2) worker1 deals with "hot" tables first, then regular tables.
3) Launcher continues to launch workers to DB X every autovac naptime.
4) worker2 (or 3 or 4 etc...) sees it is alone in DB X, if so it acts as
worker1 did above. If worker1 is still working in DB X then worker2
looks for hot tables that are being starved because worker1 got busy.
If worker2 finds no hot tables that need work, then worker2 exits.
Rather than requiring people to manually identify hot tables, what if we
just prioritize based on table size? So if a second autovac process hits
a specific database, it would find the smallest table in need of
vacuuming that it should be able to complete before the next naptime and
vacuum that. It could even continue picking tables until it can't find
one that it could finish within the naptime. Granted, it would have to
make some assumptions about how many pages it would dirty.
ISTM that's a lot easier than forcing admins to mark specific tables.
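The greedy selection Jim describes could be sketched like this (pages_per_naptime stands in for the "what can I finish before the next naptime" estimate, which as noted would itself rest on assumptions about pages dirtied):

```python
def pick_tables(tables_needing_vacuum, pages_per_naptime):
    # Smallest tables first; keep taking tables while each still fits
    # in the remaining naptime budget (all sizes in pages).
    chosen, budget = [], pages_per_naptime
    for name, pages in sorted(tables_needing_vacuum.items(),
                              key=lambda item: item[1]):
        if pages <= budget:
            chosen.append(name)
            budget -= pages
    return chosen
```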
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Jim C. Nasby wrote:
On Wed, Feb 21, 2007 at 05:40:53PM -0500, Matthew T. O'Connor wrote:
My Proposal: If we require admins to identify hot tables, then:
1) Launcher fires-off a worker1 into database X.
2) worker1 deals with "hot" tables first, then regular tables.
3) Launcher continues to launch workers to DB X every autovac naptime.
4) worker2 (or 3 or 4 etc...) sees it is alone in DB X, if so it acts as
worker1 did above. If worker1 is still working in DB X then worker2
looks for hot tables that are being starved because worker1 got busy.
If worker2 finds no hot tables that need work, then worker2 exits.
Rather than requiring people to manually identify hot tables, what if we
just prioritize based on table size? So if a second autovac process hits
a specific database, it would find the smallest table in need of
vacuuming that it should be able to complete before the next naptime and
vacuum that. It could even continue picking tables until it can't find
one that it could finish within the naptime. Granted, it would have to
make some assumptions about how many pages it would dirty.
ISTM that's a lot easier than forcing admins to mark specific tables.
So the heuristic would be:
* Launcher fires off workers into a database at a given interval
(perhaps configurable?)
* Each worker works on tables in size order.
* If a worker ever catches up to an older worker, then the younger
worker exits.
This sounds simple and workable to me, perhaps we can later modify this
to include some max_workers variable so that a worker would only exit if
it catches an older worker and there are max_workers currently active.
Thoughts?
On Thu, Feb 22, 2007 at 09:32:57AM -0500, Matthew T. O'Connor wrote:
Jim C. Nasby wrote:
On Wed, Feb 21, 2007 at 05:40:53PM -0500, Matthew T. O'Connor wrote:
My Proposal: If we require admins to identify hot tables, then:
1) Launcher fires-off a worker1 into database X.
2) worker1 deals with "hot" tables first, then regular tables.
3) Launcher continues to launch workers to DB X every autovac naptime.
4) worker2 (or 3 or 4 etc...) sees it is alone in DB X, if so it acts as
worker1 did above. If worker1 is still working in DB X then worker2
looks for hot tables that are being starved because worker1 got busy.
If worker2 finds no hot tables that need work, then worker2 exits.
Rather than requiring people to manually identify hot tables, what if we
just prioritize based on table size? So if a second autovac process hits
a specific database, it would find the smallest table in need of
vacuuming that it should be able to complete before the next naptime and
vacuum that. It could even continue picking tables until it can't find
one that it could finish within the naptime. Granted, it would have to
make some assumptions about how many pages it would dirty.
ISTM that's a lot easier than forcing admins to mark specific tables.
So the heuristic would be:
* Launcher fires off workers into a database at a given interval
(perhaps configurable?)
* Each worker works on tables in size order.
* If a worker ever catches up to an older worker, then the younger
worker exits.
This sounds simple and workable to me; perhaps we can later modify this
to include some max_workers variable so that a worker would only exit if
it catches an older worker and there are max_workers currently active.
That would likely result in a number of workers running in one database,
unless you limited how many workers per database. And if you did that,
you wouldn't be addressing the frequently updated table problem.
A second vacuum in a database *must* exit after a fairly short time so
that we can go back in and vacuum the important tables again (or
the 2nd vacuum has to periodically re-evaluate what tables need to be
vacuumed).
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Thu, Feb 22, 2007 at 09:35:45AM +0100, Zeugswetter Andreas ADI SD wrote:
vacuum should be a process with the least amount of voodoo.
If we can just have vacuum_delay and vacuum_threshold, where
threshold allows an arbitrary setting of how much bandwidth
we will allot to the process, then that is a beyond wonderful thing.
It is easy to determine how much IO you have, and what you can spare.
The tricky part is what metric to use. IMHO "IOs per second" would be
good.
In a typical DB scenario that is the I/O bottleneck, not the MB/s.
Well, right now they're one and the same... but yeah, IO/sec probably
does make more sense.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)