pg_autovacuum next steps

Started by Matthew T. O'Connor · about 22 years ago · 38 messages · hackers
#1Matthew T. O'Connor
matthew@zeut.net

Lately I have been thinking about the next steps for the pg_autovacuum
daemon. I have written up a document that describes what I'm planning
to do next. Please read the attached and respond, as I would really
like some feedback.

Thanks,

Matthew O'Connor

Attachments:

pg_autovacuum_v2_writeup.rtf (application/rtf)
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matthew T. O'Connor (#1)
Re: pg_autovacuum next steps

"Matthew T. O'Connor" <matthew@zeut.net> writes:

Lately I have been thinking about the next steps for the pg_autovacuum
daemon. I have written up a document that describes what I'm planning
to do next. Please read the attached and respond, as I would really
like some feedback.

[ rtf document ]

Please repost in some less proprietary format. Plain text is generally
considered the thing to use on this list.

regards, tom lane

#3Markus Bertheau
twanger@bluetwanger.de
In reply to: Tom Lane (#2)
Re: pg_autovacuum next steps

On Mon, 22.03.2004 at 00:12, Tom Lane wrote:

Please repost in some less proprietary format. Plain text is generally
considered the thing to use on this list.

--
Markus Bertheau <twanger@bluetwanger.de>

Attachments:

pg_autovacuum_v2_writeup.txt (text/plain; charset=UTF-8)
#4Peter Eisentraut
peter_e@gmx.net
In reply to: Matthew T. O'Connor (#1)
Re: pg_autovacuum next steps

Matthew T. O'Connor wrote:

Lately I have been thinking about the next steps for the
pg_autovacuum daemon. I have written up a document that describes
what I'm planning to do next. Please read the attached and respond,
as I would really like some feedback.

I think these configuration issues will become a lot easier if you make
the autovacuum daemon a subprocess of the postmaster (like, say, the
checkpoint process). Then you have access to a host of methods for
storing state, handling configuration, etc.

#5Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Peter Eisentraut (#4)
Re: pg_autovacuum next steps

I think these configuration issues will become a lot easier if you make
the autovacuum daemon a subprocess of the postmaster (like, say, the
checkpoint process). Then you have access to a host of methods for
storing state, handling configuration, etc.

Yeah - why delay making it a backend process? :)

Chris

#6Matthew T. O'Connor
matthew@zeut.net
In reply to: Christopher Kings-Lynne (#5)
Re: pg_autovacuum next steps

On Sun, 2004-03-21 at 20:31, Christopher Kings-Lynne wrote:

I think these configuration issues will become a lot easier if you make
the autovacuum daemon a subprocess of the postmaster (like, say, the
checkpoint process). Then you have access to a host of methods for
storing state, handling configuration, etc.

Yeah - why delay making it a backend process? :)

Ok, well this was part of the reason to have this conversation.

My reasons:
A) I wasn't sure if people really thought this was ready to be
integrated. Tom had said a while ago that it was good to keep it as
a contrib module while it's still actively being developed.

B) Perhaps people like the idea of it being a client app (I don't think
so.)

C) Most importantly, I'm not a backend hacker. If someone wants to do the
initial work of getting it running as a backend process, I can take it
from there. A while ago, Bruce offered to help me with any backend
issues I might have, so perhaps with a little help I can take a run at
it.

So the first big question is: Do we want to make it a backend
subprocess now?

Secondly, are there any other features that people are interested in
that were not mentioned in my document?

Matthew O'Connor

#7Matthew T. O'Connor
matthew@zeut.net
In reply to: Tom Lane (#2)
Re: pg_autovacuum next steps

On Sun, 2004-03-21 at 18:12, Tom Lane wrote:

"Matthew T. O'Connor" <matthew@zeut.net> writes:

[ rtf document ]

Please repost in some less proprietary format. Plain text is generally
considered the thing to use on this list.

I don't think RTF is proprietary, but I should have just posted it inline
anyway, so here is a copy:

pg_autovacuum Version 2
Design Document:

Exec Summary:
pg_autovacuum was initially released as a contrib module in PostgreSQL
v7.4. The version in 7.4 is by design very simple. No configuration
is required, and very little configuration is possible. Despite these
limitations it was voted the most popular new feature of PostgreSQL v7.4
according to the survey held on postgresql.org
(http://www.postgresql.org/survey.php?View=1&SurveyID=23).

Despite its popularity there is much room for improvement. This
document sets out to define the most important improvements that would
help pg_autovacuum to become a truly powerful asset to the suite of
tools that come with PostgreSQL.

Current Problems & Limitations:

Based on user feedback from people using pg_autovacuum in the field, and
my own observations, there are a number of problems and limitations with
pg_autovacuum. They are:

* Inability to customize thresholds on a per table basis
* Inability to set default thresholds on a per database basis
* Inability to exclude specific databases / tables from pg_autovacuum
monitoring
* Inability to schedule vacuums during off-peak times
* Lack of integration related to startup and shutdown
* Ignorance of VACUUM and ANALYZE operations performed outside
pg_autovacuum (does this require backend integration, or can
listen/notify be used?)
* Lack of logging options / syslog integration / log rotation options
* Create table fails because template1 is busy

I'm not sure how to address all of these concerns, or that they all
should be addressed right now. One of my big questions is backend
integration. I am leaning towards leaving pg_autovacuum as a client
application in contrib for one more release. During this time, I can
continue to tweak and improve pg_autovacuum so that we will have a very
good idea what the final product should be before we make it a standard
backend process.

For PostgreSQL 7.5, I plan to implement these new features:

1. Per-database defaults and per-table thresholds (including total
exclusion)
2. Persistent data
3. Single-Pass Mode (external scheduling from cron etc.)
4. Off-peak scheduling

1. Per Database defaults and Per table Thresholds:

There are differing opinions as to the best way to provide this
feature. The primary debate is where to save the configuration data. I
see three options:

1. Store config data inside a special pg_autovacuum table inside
existing databases that want custom settings.

2. Use a config file. This would require some additional coding to add
the required parsing, but is possible.

3. Create a pg_autovacuum database inside any cluster that wants to
customize its settings.

Since many people do not like tools that clutter their databases by
adding tables, I think option 1 (adding a pg_autovacuum table to
existing databases) is right out. Using a config file would be OK, but
would require additional parsing code. My preference is option 3.
Since pg_autovacuum will (hopefully) eventually become an integrated
part of the backend, it will eventually be able to add the required data
to the system catalogs. Given these two premises, as long as
pg_autovacuum remains a contrib module it can use its own database to
mimic having system tables. If this database exists, it will be used;
if it does not exist, then pg_autovacuum will work just as it did in the
7.4 release, with very limited options available to it. The user will
be able to specify a non-default database.

Table structure for database-specific defaults and table-specific
thresholds:

databases_defaults: (will reference the pg_database system table)
id serial primary key
exclude_database boolean
default_vacuum_scaling_factor float
default_vacuum_base_value int
default_analyze_scaling_factor float
default_analyze_base_value int
dboid oid references pg_database.oid

table_thresholds
id serial primary key
exclude_table boolean (exclude this table)
vacuum_scaling_factor float (equivalent to -v)
vacuum_base_value int (equivalent to -V)
vacuum_threshold float (if > 0, use this threshold)
analyze_scaling_factor float (equivalent to -a)
analyze_base_value int (equivalent to -A)
analyze_threshold float (if > 0 use this threshold)
relid oid references pg_class.oid
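Taken together, these columns imply a decision rule of the same shape as the 7.4 contrib module's -v/-V options: vacuum when the activity count exceeds base_value + scaling_factor * table size, with per-table settings falling back to the database defaults. A minimal Python sketch (illustrative only; the function and field names here are invented, not pg_autovacuum's):

```python
# Illustrative only: how per-table settings could fall back to the
# per-database defaults. All names are invented for this sketch.

def effective_threshold(reltuples, base_value, scaling_factor):
    # Same shape as the -V/-v options: base + scale * table size.
    return base_value + scaling_factor * reltuples

def needs_vacuum(table, db_defaults):
    if table.get("exclude_table"):
        return False
    # A per-table value of None (or 0) falls back to the database default.
    base = table.get("vacuum_base_value") or db_defaults["default_vacuum_base_value"]
    scale = table.get("vacuum_scaling_factor") or db_defaults["default_vacuum_scaling_factor"]
    return table["dead_tuples"] > effective_threshold(table["reltuples"], base, scale)

defaults = {"default_vacuum_base_value": 1000,
            "default_vacuum_scaling_factor": 2.0}
tbl = {"reltuples": 50000, "dead_tuples": 120000,
       "vacuum_base_value": None, "vacuum_scaling_factor": None}
print(needs_vacuum(tbl, defaults))   # 120000 > 1000 + 2.0 * 50000 -> True
```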

2. Persistent pg_autovacuum Data:

Right now pg_autovacuum has no memory of what was going on the last time
it was run. So if significant changes have happened while pg_autovacuum
was not running, they will not be counted in the analysis of when to
perform a vacuum or analyze operation, which can result in
under-vacuuming. So pg_autovacuum should occasionally write its counts
down to the database. The data will be stored in an additional table
called table_data.

table_data
id serial primary key
CountAtLastAnalyze long
CountAtLastVacuum long
table_thresholds_id int references table_thresholds

3. Single-Pass Mode (External Scheduling):

I have received requests to be able to run pg_autovacuum only on request
(not as a daemon), making only one pass over all the tables (not looping
indefinitely). The advantage is that it will operate more like the
current vacuum command, except that it will only vacuum tables that need
to be vacuumed. This feature could be useful as long as pg_autovacuum
exists outside the backend. If pg_autovacuum gets integrated into the
backend and is automatically started as a daemon during startup, then
this option will no longer make sense.

Once we have persistent data (Step 2), we can easily operate in
Single-Pass Mode.
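Sketched in Python (illustrative; a JSON file stands in for the table_data table, a single fixed threshold replaces the per-table rules, and every name is invented), a single pass would load the saved counts, vacuum whatever crossed its threshold, save the counts back, and exit:

```python
# Illustrative Single-Pass Mode: persisted counts make a one-shot run
# possible. A JSON file stands in for the table_data table.
import json
import os

STATE_FILE = "autovacuum_state.json"   # stand-in for table_data

def load_state():
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)
    return {}

def save_state(state):
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)

def single_pass(tables, threshold=1000):
    """tables maps relid -> current total update/delete count, as the
    stats collector would report it. Returns the relids vacuumed."""
    state = load_state()
    vacuumed = []
    for relid, count in tables.items():
        count_at_last_vacuum = state.get(relid, 0)   # CountAtLastVacuum
        if count - count_at_last_vacuum > threshold:
            vacuumed.append(relid)    # here we would issue VACUUM
            state[relid] = count      # remember the new baseline
    save_state(state)                 # persist before exiting
    return vacuumed
```

Run twice with unchanged counts, the second pass does nothing: that is the whole point of persisting the counts between invocations.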

4. Off-Peak Scheduling:

A fundamental advantage of our vacuum system is that the work required
to reclaim table space is taken out of the critical path and can be
moved to an off-peak time when cycles are less precious. One of the
drawbacks of the current pg_autovacuum is that it doesn't have any way
to factor this in.

In its simplest form (which I will implement first), I would add the
ability to define a second set of thresholds that will be active only
during an “off-peak” time that can be specified in the pg_autovacuum
database, perhaps in a general_settings table.
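The switch between the two threshold sets might look like this (illustrative Python; the window boundaries and the general_settings-style names are assumptions):

```python
# Illustrative off-peak switch: pick the second, looser threshold set
# while the current time is inside a configured window.
from datetime import time

def active_thresholds(now, peak, off_peak, window=(time(1, 0), time(5, 0))):
    start, end = window
    if start <= end:
        in_window = start <= now <= end
    else:                  # window wraps past midnight, e.g. 22:00-04:00
        in_window = now >= start or now <= end
    return off_peak if in_window else peak

peak = {"vacuum_scaling_factor": 2.0, "vacuum_base_value": 1000}
off_peak = {"vacuum_scaling_factor": 0.5, "vacuum_base_value": 200}
print(active_thresholds(time(3, 0), peak, off_peak))    # picks the off_peak set
print(active_thresholds(time(14, 0), peak, off_peak))   # picks the peak set
```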

A few notes about things I'm not planning on working on at the moment.

Better logging options:

An additional logging table could be added to the pg_autovacuum
database to record all activity (vacuums and analyzes) along with the
corresponding durations.

Syslog support. I'm not sure this is really needed, but a simple patch
was submitted by one user and perhaps that can be reviewed / improved
and applied.

#8Gavin Sherry
swm@linuxworld.com.au
In reply to: Matthew T. O'Connor (#6)
Re: pg_autovacuum next steps

On Sun, 21 Mar 2004, Matthew T. O'Connor wrote:

On Sun, 2004-03-21 at 20:31, Christopher Kings-Lynne wrote:

I think these configuration issues will become a lot easier if you make
the autovacuum daemon a subprocess of the postmaster (like, say, the
checkpoint process). Then you have access to a host of methods for
storing state, handling configuration, etc.

Yeah - why delay making it a backend process? :)

Ok, well this was part of the reason to have this conversation.

My reasons:
A) I wasn't sure if people really thought this was ready to be
integrated. Tom had said a while ago that it was good to keep it as
a contrib module while it's still actively being developed.

I was talking to Jan about some other work on VACUUM related to more
intelligent vacuuming. Namely, maintaining a map (outside of shared
memory) of blocks which have been pushed out of the free space map for
VACUUM to visit (which requires a backend process) and being aware of load
restrictions (i.e., allowing the user to vacuum only when the load average is
less than X, for example) and some other leveling stuff to ensure that
availability is consistent. Whilst this doesn't relate to pg_autovacuum
specifically, it'd be great if they could be released at the same time, I
think.

B) Perhaps people like the idea of it being a client app (I don't think
so.)

I'd like to see it as part of the backend.

C) Most importantly, I'm not a backend hacker. If someone wants to do the
initial work of getting it running as a backend process, I can take it
from there. A while ago, Bruce offered to help me with any backend
issues I might have, so perhaps with a little help I can take a run at
it.

I'd be happy to help you out.

Gavin

#9Bruce Momjian
bruce@momjian.us
In reply to: Gavin Sherry (#8)
Re: pg_autovacuum next steps

B) Perhaps people like the idea of it being a client app (I don't think
so.)

I'd like to see it as part of the backend.

C) Most importantly, I'm not a backend hacker. If someone wants to do the
initial work of getting it running as a backend process, I can take it
from there. A while ago, Bruce offered to help me with any backend
issues I might have, so perhaps with a little help I can take a run at
it.

I'd be happy to help you out.

Agreed.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#10Matthew T. O'Connor
matthew@zeut.net
In reply to: Bruce Momjian (#9)
Re: pg_autovacuum next steps

On Sun, 2004-03-21 at 23:00, Bruce Momjian wrote:

C) Most importantly, I'm not a backend hacker. If someone wants to do the
initial work of getting it running as a backend process, I can take it
from there. A while ago, Bruce offered to help me with any backend
issues I might have, so perhaps with a little help I can take a run at
it.

I'd be happy to help you out.

Agreed.

Ok, thanks for the offer to help, but I think I understated things above
when I said I'll need a "little" help :-)

I have a few big-picture questions. Once pg_autovacuum is launched as a
postmaster subprocess, what changes? That is, currently pg_autovacuum
uses libpq to connect to a database and issue queries, including a
vacuum / analyze command when needed. After becoming a subprocess, will
(should) it still use libpq to connect to the databases? I don't think
so, but is it even possible to do that? If not, how will it check out
the stats of all the different databases? I guess it should fork() a new
backend, connect to it somehow, and use it to query the database, but
I'm really not sure how this works.

I'm looking through the backend startup code to see how the stats
collector and the bgwriter work since they are probably two semi-close
examples of what I'll have to do. I think the checkpoint process does
something similar in that it issues a checkpoint command.

Thanks again...

Matthew

#11Gavin Sherry
swm@linuxworld.com.au
In reply to: Matthew T. O'Connor (#10)
Re: pg_autovacuum next steps

On Mon, 22 Mar 2004, Matthew T. O'Connor wrote:

On Sun, 2004-03-21 at 23:00, Bruce Momjian wrote:

C) Most importantly, I'm not a backend hacker. If someone wants to do the
initial work of getting it running as a backend process, I can take it
from there. A while ago, Bruce offered to help me with any backend
issues I might have, so perhaps with a little help I can take a run at
it.

I'd be happy to help you out.

Agreed.

Ok, thanks for the offer to help, but I think I understated things above
when I said I'll need a "little" help :-)

I haven't looked at the code but...

I have a few big picture questions. Once pg_autovacuum is launched as a
postmaster sub-process, what changes? That is, currently pg_autovacuum
uses libpq to connect to a database and issue queries including a vacuum
/ analyze command when needed. After becoming a subprocess will
(should) it still use libpq to connect to the databases, I don't think

It could use libpq but most definitely shouldn't.

so, is it even possible to do that? If not, how will it checkout the
stats of all the different databases? I guess it should fork() a new
backend, connect to it somehow, and use it to query the database, but
I'm really not sure how this works.

It can interact with the stats collector (a separate process) in the same
way that existing backends interact: through a domain socket.

I'm looking through the backend startup code to see how the stats
collector and the bgwriter work since they are probably two semi-close
examples of what I'll have to do. I think checkpoints does something
similar in that it issues a checkpoint command.

The vacuum backend will call vacuum() (or something very like it)
directly. I imagine that when it gets called, and on which table, will
be based upon the existing algorithm.

Thanks,

Gavin

#12Gavin Sherry
swm@linuxworld.com.au
In reply to: Gavin Sherry (#11)
Re: pg_autovacuum next steps

Ok, thanks for the offer to help, but I think I understated things above
when I said I'll need a "little" help :-)

I haven't looked at the code but...

I have a few big picture questions. Once pg_autovacuum is launched as a
postmaster sub-process, what changes? That is, currently pg_autovacuum
uses libpq to connect to a database and issue queries including a vacuum
/ analyze command when needed. After becoming a subprocess will
(should) it still use libpq to connect to the databases, I don't think

It could use libpq but most definitely shouldn't.

so, is it even possible to do that? If not, how will it checkout the
stats of all the different databases? I guess it should fork() a new
backend, connect to it somehow, and use it to query the database, but
I'm really not sure how this works.

It can interact with the stats collector (a separate process) in the same
way that existing backends interact: through a domain socket.

I'm looking through the backend startup code to see how the stats
collector and the bgwriter work since they are probably two semi-close
examples of what I'll have to do. I think checkpoints does something
similar in that it issues a checkpoint command.

The vacuum backend will call vacuum() (or something very like it)
directly. I imagine that when it gets called and on which table will be
based upon the existing algorithm.

One point is this: vacuum() assumes that you are running in a fully
fledged backend. There'd be a fair bit of work involved in allowing a
single process to call vacuum() against multiple databases. As such, I
think that a vacuum backend for a specific database should be forked
upon the first connect. Also, the backend might like to try to work out
every so often whether there are any active backends for its database
and, if not, perform a final vacuum (if necessary) and exit, so that we
don't have lots of idle processes sitting around.

Is there a better approach than this?

Gavin

#13Karel Zak
zakkr@zf.jcu.cz
In reply to: Matthew T. O'Connor (#10)
Re: pg_autovacuum next steps

On Mon, Mar 22, 2004 at 02:35:37AM -0500, Matthew T. O'Connor wrote:

On Sun, 2004-03-21 at 23:00, Bruce Momjian wrote:

C) Most importantly, I'm not a backend hacker. If someone wants to do the
initial work of getting it running as a backend process, I can take it
from there. A while ago, Bruce offered to help me with any backend
issues I might have, so perhaps with a little help I can take a run at
it.

I'd be happy to help you out.

Agreed.

Ok, thanks for the offer to help, but I think I understated things above
when I said I'll need a "little" help :-)

I have a few big picture questions. Once pg_autovacuum is launched as a
postmaster sub-process, what changes?

All of it. It's important to do this as a backend process, because libpq
has only very limited and slow means for working with backend internals.

The base should be a standard backend with a different "main loop" that,
instead of checking a socket, checks some shared information about
tables and calls the vacuum code directly. In this case you can omit the
work with connections, the parser, etc.

I thought about it over the last few days, and I found Tom's idea about
using the FSM tables perfect:

What I had in the back of my mind was: each backend counts
attempted insertions and deletions in its relcache entries (an
update adds to both counts). At transaction commit or abort, we
know which of these two counts represents the number of dead
tuples added to each relation, so while we scan the relcache for
post-xact cleanup (which we will be doing anyway) we can
transfer the correct count into the shared FSM entry for the
relation. This gives us a reasonably accurate count in shared
memory of all the tuple obsoletions since bootup, at least for
heavily-used tables. (The FSM might choose to forget about
lightly-used tables.) The auto vacuumer could look at the FSM
numbers to decide which tables are highest priority to vacuum.
(2002-09-03 08:10:32)

I looked at the code and I think extending the FSM tables will be pretty
simple, but I'm unsure how Tom thought the relcache counters would
work. Tom?
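The quoted scheme can be modeled in a toy form (illustrative Python only; a dict stands in for the shared FSM entries, and every name here is invented):

```python
# Toy model of the quoted idea: each backend counts attempted inserts
# and deletes per relcache entry; at end of transaction, whichever side
# represents the newly dead tuples is added to a shared per-relation
# tally (a dict standing in for the shared FSM entry).

shared_fsm_dead_tuples = {}            # relid -> dead-tuple estimate

class RelcacheEntry:
    def __init__(self, relid):
        self.relid = relid
        self.inserts = 0               # attempted insertions this xact
        self.deletes = 0               # attempted deletions this xact

def do_update(entry):
    entry.inserts += 1                 # an update adds to both counts
    entry.deletes += 1

def end_xact(entries, committed):
    for e in entries:
        # On commit the deleted tuples are the dead ones; on abort the
        # inserted tuples are.
        dead = e.deletes if committed else e.inserts
        shared_fsm_dead_tuples[e.relid] = \
            shared_fsm_dead_tuples.get(e.relid, 0) + dead
        e.inserts = e.deletes = 0      # reset for the next transaction
```

An autovacuum process would then read shared_fsm_dead_tuples (the FSM entries, in the real design) to prioritize tables, instead of polling the stats collector.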

Karel

--
Karel Zak <zakkr@zf.jcu.cz>
http://home.zf.jcu.cz/~zakkr/

#14Richard Huxton
dev@archonet.com
In reply to: Matthew T. O'Connor (#7)
Re: pg_autovacuum next steps

On Monday 22 March 2004 03:36, Matthew T. O'Connor wrote:

1. Per Database defaults and Per table Thresholds:

There are differing opinions as to the best way to provide this
feature. The primary debate is where to save the configuration data. I
see three options:

1. Store config data inside a special pg_autovacuum table inside
existing databases that want custom settings.

2.Use a config file. This would require some additional coding to add
the required parsing, but is possible.

3.Create a pg_autovacuum database inside any cluster that wants to
customize their settings.

Since many people do not like tools that clutter their databases by
adding tables, I think option 1 (adding a pg_autovacuum table to
existing databases) is right out. Using a config file would be Ok, but
would require additional parsing code. My preference is option 3.

I've nothing against #3 as a default, but can I put in a suggestion for 1 & 3,
or rather some setting definable at runtime/build-time that lets you select
database + schema for autovacuum to find its config data.

I might be wrong, but it strikes me as the sort of thing people running shared
environments will want to choose for themselves.

--
Richard Huxton
Archonet Ltd

#15Jan Wieck
JanWieck@Yahoo.com
In reply to: Gavin Sherry (#8)
Re: pg_autovacuum next steps

Gavin Sherry wrote:

On Sun, 21 Mar 2004, Matthew T. O'Connor wrote:

On Sun, 2004-03-21 at 20:31, Christopher Kings-Lynne wrote:

I think these configuration issues will become a lot easier if you make
the autovacuum daemon a subprocess of the postmaster (like, say, the
checkpoint process). Then you have access to a host of methods for
storing state, handling configuration, etc.

Yeah - why delay making it a backend process? :)

Ok, well this was part of the reason to have this conversation.

My reasons:
A) I wasn't sure if people really thought this was ready to be
integrated. Tom had said a while ago, that it was a good to keep it as
a contrib module while it's still actively being developed.

I was talking to Jan about some other work on VACUUM related to more
intelligent vacuuming. Namely, maintaining a map (outside of shared
memory) of blocks which have been pushed out of the free space map for
VACUUM to visit (which requires a backend process) and being aware of load
restrictions (i.e., allowing the user to vacuum only when the load average is
less than X, for example) and some other leveling stuff to ensure that
availability is consistent. Whilst this doesn't relate to pg_autovacuum
specifically, it'd be great if they could be released at the same time, I
think.

I don't recall the "outside of shared memory" part. Anyhow, the whole
story goes like this:

Maintain 2 bits per block that tell if the block has been vacuumed of
all dead tuples since the last time it was dirtied, and if all its
tuples are completely frozen. If those two conditions are true, there is
no need to vacuum that block at all (Red Flag!!! On further thinking I
realized that this assumes that the FSM is loss free).

With a default 8K blocksize, this means 32K per 1GB segment, making 4
additional blocks. I actually think that these extra blocks should be
somehow part of the heap files, so that they are subject to the regular
buffer management.

To keep the lock contention on them low, vacuum and backends will
set/clear new flags in the bufhdr flags member. That way, the bgwriter
and checkpointer will be the usual suspects to set/clear these flags in
the shared bitmap array stored in the extra blocks.

As to where to store these blocks, some block number arithmetic magic
comes to mind. That way a block's relnode and blockno automatically lead
to the bits, even in the case of blind writes.
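The 32K / 4-block figures above are easy to verify (a quick arithmetic check in Python):

```python
# Checking the arithmetic: 2 status bits per heap block, with the
# default 8K block size and 1GB heap file segments.
BLOCK_SIZE = 8 * 1024                 # default 8K blocksize
SEGMENT_SIZE = 1024 * 1024 * 1024     # 1GB segment

blocks_per_segment = SEGMENT_SIZE // BLOCK_SIZE   # 131072 blocks
bitmap_bytes = blocks_per_segment * 2 // 8        # 2 bits per block
print(bitmap_bytes)                   # 32768 bytes = 32K per segment
print(bitmap_bytes // BLOCK_SIZE)     # 4 extra 8K blocks per segment
```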

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#16Matthew T. O'Connor
matthew@zeut.net
In reply to: Gavin Sherry (#12)
Re: pg_autovacuum next steps

On Mon, 2004-03-22 at 03:36, Gavin Sherry wrote:

One point is this: vacuum() assumes that you are running in a fully
fledged backend. There'd be a fair bit of work involved in allowing a
single process to call vacuum() against multiple databases.

I can't imagine we want to do that.

As such, I
think that a vacuum backend for a specific database should be forked upon
the first connect. Also, the backend might like to try and work out if
there are any active backends for its database every so often and if not,
perform a final vacuum (if necessary) and exit, so that we don't have lots
of idle processes sitting around.

I don't understand, are you saying that upon postmaster startup a vacuum
backend should be forked for every database in the cluster?

Is there a better approach than this?

One thought I had is that there wouldn't be a persistent daemon-type
process. Rather, the autovacuum process, like the checkpoint process,
would be kicked off periodically. It would initially be attached to
template1; when it's done there, it would figure out the next database
to look at and fork, the existing template1 process would exit, and the
new process would connect to the next database. Just repeat this
forking loop until done.
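The forking loop could be sketched like this (illustrative Python, simulating the chain of short-lived processes without a real fork(); every name is invented):

```python
# Illustrative sketch of the forking loop: one short-lived process per
# database, starting from template1, each one choosing its successor
# before exiting. Simulated here as a plain loop instead of fork().

def next_database(current, databases):
    """Pick the database to visit after `current` (template1 first)."""
    order = ["template1"] + [d for d in databases if d != "template1"]
    i = order.index(current)
    return order[i + 1] if i + 1 < len(order) else None

def autovacuum_cycle(databases, vacuum_one):
    db = "template1"
    while db is not None:
        vacuum_one(db)        # in the real thing: a forked subprocess
        db = next_database(db, databases)

visited = []
autovacuum_cycle(["template1", "app", "reports"], visited.append)
print(visited)                # ['template1', 'app', 'reports']
```

One consequence of this shape is that at most one autovacuum process exists at a time, which also gives the command serialization discussed elsewhere in the thread.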

Thoughts?

#17Matthew T. O'Connor
matthew@zeut.net
In reply to: Karel Zak (#13)
Re: pg_autovacuum next steps

On Mon, 2004-03-22 at 04:23, Karel Zak wrote:

All of it. It's important to do this as a backend process, because libpq
has only very limited and slow means for working with backend internals.

Agreed.

The base should be a standard backend with a different "main loop" that,
instead of checking a socket, checks some shared information about
tables and calls the vacuum code directly. In this case you can omit the
work with connections, the parser, etc.

So am I to understand that I can start up a postmaster subprocess and
then be able to monitor the activity of all the databases? I guess that
makes sense, since I would be talking to the stats collector directly
via a socket. But that doesn't solve the problem of issuing the vacuum
to different databases; I would still create a new backend for every
database that needs a vacuum or analyze issued.

Also, we don't want to launch multiple simultaneous vacuums, so we want
the commands to be serialized (I know some people want to be able to do
this if databases are located on different disks, but for now I'm
keeping things simple). I have an idea for this that I just mentioned
in another message to the list.

I thought about it over the last few days, and I found Tom's idea about
using the FSM tables perfect:

There has been lots of discussion of incorporating FSM data into the
autovacuum decision process. I am interested in exploring this, but
since I'm already biting off more than I can easily chew, I am going to
try to leave the decision-making process the same for now.

BTW, I think we need to use both tools (stats and FSM), since not all
tables will be in the FSM; an insert-only table still needs to be
analyzed periodically, and a lightly updated table will eventually need
to be vacuumed.

#18Matthew T. O'Connor
matthew@zeut.net
In reply to: Richard Huxton (#14)
Re: pg_autovacuum next steps

On Mon, 2004-03-22 at 07:25, Richard Huxton wrote:

On Monday 22 March 2004 03:36, Matthew T. O'Connor wrote:

1. Store config data inside a special pg_autovacuum table inside
existing databases that want custom settings.

2. Use a config file. This would require some additional coding to add
the required parsing, but is possible.

3. Create a pg_autovacuum database inside any cluster that wants to
customize its settings.

I've nothing against #3 as a default, but can I put in a suggestion for 1 & 3,
or rather some setting definable at runtime/build-time that lets you select
database + schema for autovacuum to find its config data.

I might be wrong, but it strikes me as the sort of thing people running shared
environments will want to choose for themselves.

If pg_autovacuum was being designed to live forever as a client app,
then I agree admins having a choice would be good. But as we are going
to eventually move any auto_vacuum data and settings into the system
tables (when autovacuum is part of the system), I don't see the need to
expend the extra cycles, especially since people seem to be pushing hard
for autovacuum to be a backend function sooner rather than later.

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matthew T. O'Connor (#7)
Re: pg_autovacuum next steps

"Matthew T. O'Connor" <matthew@zeut.net> writes:

There are differing opinions as to the best way to provide this
feature. The primary debate is where to save the configuration data. I
see three options:

1. Store config data inside a special pg_autovacuum table inside
existing databases that want custom settings.

2.Use a config file. This would require some additional coding to add
the required parsing, but is possible.

3.Create a pg_autovacuum database inside any cluster that wants to
customize their settings.

Since many people do not like tools that clutter their databases by
adding tables, I think option 1 (adding a pg_autovacuum table to
existing databases) is right out. Using a config file would be Ok, but
would require additional parsing code. My preference is option 3.

Personally I like #2. The claim that this requires extra code seems
bogus to me --- when you are working at the C code level, reading a
table takes code too. I think both 1 and 3 represent clutter. Also,
they will introduce entirely new sets of problems that we already know
how to deal with in the case of config files, but have no experience
with for config tables. Some examples: how do you get the daemon to
notice that you changed the config (SIGHUP for config files, ??
otherwise)? Can you easily get at the config data from a daemon process
regardless of which database it's attached to (if any)? If the config
data is in a table, what sorts of interesting problems will arise from
trying to vacuum that table itself?

Basically I see config tables as cutting against the grain of everything
we've done to date in Postgres DBA-level configuration, and I do not see
a reason to justify that sort of reinvention.

regards, tom lane

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gavin Sherry (#12)
Re: pg_autovacuum next steps

Gavin Sherry <swm@linuxworld.com.au> writes:

One point is this: vacuum() assumes that you are running in a fully
fledged backend. There'd be a fair bit of work involved in allowing a
single process to call vacuum() against multiple databases.

Make that "it isn't going to happen".

As such, I
think that a vacuum backend for a specific database should be forked upon
the first connect. Also, the backend might like to try and work out if
there are any active backends for its database every so often and if not,
perform a final vacuum (if necessary) and exit, so that we don't have lots
of idle processes sitting around.

Lots of idle processes sitting around is right out, too.  Remember that
each one would eat a backend connection slot.  I think we are going to
have to limit this to *one* process at a time.  What that probably means
is that we successively launch an autovacuum process against each
database, it does whatever seems appropriate in that database and then
quits.  We could manage this just like checkpoints are presently managed
--- the only thing the postmaster has to know is the desired idle period
between end of one autovacuum and start of the next.

regards, tom lane

#21Matthew T. O'Connor
matthew@zeut.net
In reply to: Tom Lane (#20)
#22Matthew T. O'Connor
matthew@zeut.net
In reply to: Tom Lane (#19)
#23Alex J. Avriette
alex@posixnap.net
In reply to: Matthew T. O'Connor (#1)
#24Gavin Sherry
swm@linuxworld.com.au
In reply to: Tom Lane (#20)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gavin Sherry (#24)
#26Matthew T. O'Connor
matthew@zeut.net
In reply to: Tom Lane (#25)
#27Matthew T. O'Connor
matthew@zeut.net
In reply to: Alex J. Avriette (#23)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matthew T. O'Connor (#26)
#29Matthew T. O'Connor
matthew@zeut.net
In reply to: Tom Lane (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matthew T. O'Connor (#29)
#31Gavin Sherry
swm@linuxworld.com.au
In reply to: Tom Lane (#28)
#32Joe Conway
mail@joeconway.com
In reply to: Matthew T. O'Connor (#7)
#33Matthew T. O'Connor
matthew@zeut.net
In reply to: Joe Conway (#32)
#34Matthew T. O'Connor
matthew@zeut.net
In reply to: Tom Lane (#30)
#35Matthew T. O'Connor
matthew@zeut.net
In reply to: Gavin Sherry (#31)
#36Joe Conway
mail@joeconway.com
In reply to: Matthew T. O'Connor (#27)
#37Joe Conway
mail@joeconway.com
In reply to: Matthew T. O'Connor (#33)
#38Alex J. Avriette
alex@posixnap.net
In reply to: Matthew T. O'Connor (#27)