Second attempt, roll your own autovacuum
Hi all,
I am still trying to roll my own auto vacuum thingy. The goal is to
vacuum on demand in one step just like the old days, but not hit the
tables that never change (we have a lot). The idea now is to use a
combination of SQL and shell scripts to duplicate some of what auto
vacuum does. It actually doesn't seem that difficult. I have some SQL
that produces a list of tables that need vacuuming based on statistics
found in pg_stat_user_tables, and reltuples from pg_class, using the
same basic rules as auto vacuum per the documentation. So far so good.
The SQL actually produces an SQL script containing VACUUM commands,
which I can then feed back into psql. The result is a HUGE savings in
vacuum time at night.
The trouble now is, I don't see how to reset the statistics. My
assumption was that vacuum did it, but that appears to be false. How
does autovacuum do it? Can I do it with SQL?
-Glen
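A minimal sketch of the selection rule described above, written in Python rather than the SQL and shell the post mentions. It assumes the documented autovacuum rule (vacuum threshold = base threshold + scale factor * number of tuples) and that a per-table count of changes since the last vacuum is already available. The names TableStats, tables_needing_vacuum and vacuum_script are hypothetical, and the default numbers are illustrative only; use whatever your server is configured with.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    name: str
    reltuples: float  # estimated row count, as stored in pg_class.reltuples
    changed: int      # updates + deletes since the last vacuum (assumed input)


def vacuum_threshold(reltuples, base_threshold, scale_factor):
    # Documented autovacuum rule: base threshold + scale factor * number of tuples.
    return base_threshold + scale_factor * reltuples


def tables_needing_vacuum(stats, base_threshold=500, scale_factor=0.2):
    # The default values here are illustrative assumptions, not server settings.
    return [t.name for t in stats
            if t.changed > vacuum_threshold(t.reltuples, base_threshold, scale_factor)]


def vacuum_script(table_names):
    # Double any embedded quotes so each name is a valid quoted identifier.
    lines = ['VACUUM "{}";'.format(n.replace('"', '""')) for n in table_names]
    return "\n".join(lines)
```

Feeding the output of vacuum_script(tables_needing_vacuum(stats)) to psql would then vacuum only the tables that crossed their threshold.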
Glen Parker wrote:
> The trouble now is, I don't see how to reset the statistics. My
> assumption was that vacuum did it, but that appears to be false. How
> does autovacuum do it? Can I do it with SQL?
Huh, reset what statistics? Autovacuum does not reset anything. What
statistics are you using? The number of dead tuples _should_ show as
zero on the stat system after a vacuum, certainly.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
> Glen Parker wrote:
> > The trouble now is, I don't see how to reset the statistics. My
> > assumption was that vacuum did it, but that appears to be false. How
> > does autovacuum do it? Can I do it with SQL?
> Huh, reset what statistics? Autovacuum does not reset anything. What
> statistics are you using? The number of dead tuples _should_ show as
> zero on the stat system after a vacuum, certainly.
pg_stat_user_tables.[n_tup_ins|n_tup_upd|n_tup_del]. Hmm maybe I'm
doing this all wrong then. Is there a way to find the estimated dead
tuples from SQL, the same number autovacuum looks at?
-Glen
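One possible workaround, which nobody in the thread proposes and which is only an approximation: because n_tup_upd and n_tup_del are cumulative counters, a script can remember their totals at the time it last vacuumed a table and treat the difference as the number of changes since then, instead of trying to reset anything. The sketch below assumes the totals have already been read from pg_stat_user_tables into plain dictionaries; both function names are hypothetical.

```python
def changes_since_snapshot(current, snapshot):
    """Return per-table change counts since the snapshot was taken.

    Both arguments map table name -> cumulative (n_tup_upd + n_tup_del).
    """
    deltas = {}
    for table, total in current.items():
        previous = snapshot.get(table, 0)
        # A total below the snapshot suggests the counters were reset;
        # fall back to the raw total in that case.
        deltas[table] = total - previous if total >= previous else total
    return deltas


def record_vacuum(snapshot, current, vacuumed_tables):
    """Return a new snapshot with the vacuumed tables advanced to current totals."""
    updated = dict(snapshot)
    for table in vacuumed_tables:
        updated[table] = current[table]
    return updated
```

The snapshot would need to be stored somewhere durable (a small table or a file) between nightly runs.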
Glen Parker <glenebob@nwlink.com> writes:
> I am still trying to roll my own auto vacuum thingy.
Um, is this purely for hack value? What is it that you find inadequate
about regular autovacuum? It is configurable through the pg_autovacuum
catalog --- which I'd be the first to agree is a sucky user interface,
but we're not going to set the user interface in concrete until we are
pretty confident it's feature-complete. So: what do you see missing?
regards, tom lane
On Tue, 2006-12-19 at 07:28, Tom Lane wrote:
> Glen Parker <glenebob@nwlink.com> writes:
> > I am still trying to roll my own auto vacuum thingy.
> Um, is this purely for hack value? What is it that you find inadequate
> about regular autovacuum? It is configurable through the pg_autovacuum
> catalog --- which I'd be the first to agree is a sucky user interface,
> but we're not going to set the user interface in concrete until we are
> pretty confident it's feature-complete. So: what do you see missing?
I'm not sure what the OP had in mind, but the thing which is missing for
us is a time window restriction sort of thing. What I mean is to make
sure a vacuum will never kick in in the main business hours, but only at
night at pre-specified hours, and only if the vacuum threshold was met
for the delete/update counts.
It would be nice if there could be a flexible time window specification,
like specifying only some days, or only weekends, or each night some
specific hours... but just one time window would be a big improvement
already.
Cheers,
Csaba.
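The time window restriction described above can be sketched as a small check that a vacuum script runs before doing any work. This is only an illustration under assumed names (in_window, may_vacuum) and an assumed window format; it is not an existing autovacuum feature.

```python
from datetime import time


def in_window(now, start, end):
    """True if `now` falls inside [start, end), allowing windows that wrap midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def may_vacuum(now, weekday, windows):
    """`windows` is a list of (allowed_weekdays, start, end); weekday 0 is Monday.

    The weekday is matched against the current day, not the day a
    midnight-wrapping window opened.
    """
    return any(weekday in days and in_window(now, start, end)
               for days, start, end in windows)
```

A single nightly window would be one entry such as (set(range(7)), time(2, 0), time(3, 0)); weekend-only or per-day windows are just more entries in the list.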
Glen Parker wrote:
> Alvaro Herrera wrote:
> > Huh, reset what statistics? Autovacuum does not reset anything. What
> > statistics are you using? The number of dead tuples _should_ show as
> > zero on the stat system after a vacuum, certainly.
> pg_stat_user_tables.[n_tup_ins|n_tup_upd|n_tup_del]. Hmm maybe I'm
> doing this all wrong then. Is there a way to find the estimated dead
> tuples from SQL, the same number autovacuum looks at?
Hmm, I thought the number of dead tuples was being exposed in
pg_stat_user_tables but evidently not. I think this is an oversight
which we could "fix" in 8.3. (For a current release I guess you could
install your own function, it shouldn't be too difficult to code it).
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
In an attempt to throw the authorities off his trail, tgl@sss.pgh.pa.us (Tom Lane) transmitted:
> Glen Parker <glenebob@nwlink.com> writes:
> > I am still trying to roll my own auto vacuum thingy.
> Um, is this purely for hack value? What is it that you find inadequate
> about regular autovacuum? It is configurable through the pg_autovacuum
> catalog --- which I'd be the first to agree is a sucky user interface,
> but we're not going to set the user interface in concrete until we are
> pretty confident it's feature-complete. So: what do you see missing?
I think that about a year ago I proposed a more sophisticated approach
to autovacuum; one part of it was to set up a "request queue," a table
where vacuum requests would get added.
There's some "producer" side stuff:
- There could be tables you want to vacuum exceedingly frequently;
those could get added periodically via something shaped like cron.
- One could ask for all the tables in a given database to be added to
the queue, so as to mean that all tables would get vacuumed every so
often.
- You might even inject requests 'quasi-manually', asking for the
queue to do work on particular tables.
There's some "policy side" stuff:
- Rules might be put in place to eliminate certain tables from the
queue, providing some intelligence as to what oughtn't get vacuumed
Then there's the "consumer":
- The obvious "dumb" approach is simply to have one connection that
runs through the queue, pulling the eldest entry, vacuuming, and
marking it done.
- The obvious extension is that if a table is listed multiple times in
the queue, it only need be processed once.
- There might be time-based exclusions to the effect that large tables
oughtn't be processed during certain periods (backup time?)
- One might have *two* consumers, one that will only process small
tables, so that those little, frequently updated tables can get
handled quickly, and another consumer that does larger tables.
Or perhaps that knows that it's fine, between 04:00 and 09:00 UTC,
to have 6 consumers, and blow through a lot of larger tables
simultaneously.
After all, changes in 8.2 mean that concurrent vacuums don't block
one another from cleaning out dead content.
I went as far as scripting up the simplest form of this, with
"injector" and queue and the "dumb consumer." Gave up because it
wasn't that much better than what we already had.
--
output = reverse("moc.liamg" "@" "enworbbc")
http://linuxfinances.info/info/
Minds, like parachutes, only function when they are open.
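The "dumb consumer" with the duplicate-skipping extension described above might look roughly like the following. This is a sketch under assumptions, not the script mentioned in the post: the queue lives in memory rather than in a table, VacuumQueue is a hypothetical name, and the `vacuum` argument stands in for whatever actually issues the VACUUM command.

```python
from collections import deque


class VacuumQueue:
    def __init__(self):
        self._pending = deque()

    def request(self, table):
        # Producers (cron-like jobs, manual requests) append; duplicates are allowed here.
        self._pending.append(table)

    def drain(self, vacuum, excluded=frozenset()):
        """Process the queue oldest-first, vacuuming each table at most once."""
        done = []
        seen = set()
        while self._pending:
            table = self._pending.popleft()
            # Skip repeats and tables that policy rules have excluded.
            if table in seen or table in excluded:
                continue
            seen.add(table)
            vacuum(table)
            done.append(table)
        return done
```

Running two consumers, one for small tables and one for large ones, would amount to two such queues (or one queue with two different exclusion sets).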
From my POV, autovacuum is doing a very good job, with the exception of:
> - There might be time-based exclusions to the effect that large tables
> oughtn't be processed during certain periods (backup time?)
Either (per table!) exception or permission based control of when a
table can be vacuumed is needed to avoid vacuuming big tables during
peak business periods. While this can be alleviated by setting lower
vacuum cost settings, and it will no longer block other vacuums, it will
still need the multiple vacuum stuff to still process small tables:
> - One might have *two* consumers, one that will only process small
> tables, so that those little, frequently updated tables can get
> handled quickly, and another consumer that does larger tables.
> Or perhaps that knows that it's fine, between 04:00 and 09:00 UTC,
> to have 6 consumers, and blow through a lot of larger tables
> simultaneously.
So one of the 2 might be enough. I guess time-based
exclusion/permissions are not that easy to implement, and also not easy
to set up properly... so what could work well is:
- allow a "priority" setting per table in pg_autovacuum;
- create a vacuum thread for each priority;
- each thread checks its own tables to be processed based on the
priority setting from pg_autovacuum;
- there has to be a default priority for tables not explicitly set up
in pg_autovacuum;
- possibly set a per priority default vacuum cost and delay;
In 8.2 the different vacuum threads for the different priorities won't
step on each other's toes, and the default settings for the priorities can
be used to create some easily manageable settings for vacuuming table
categories with different update/delete patterns.
There could be some preset priorities, but creating new ones would be
useful so the user can create one per table update/delete pattern.
Maybe priority is not the best word for this, but I can't think of a
better one right now...
Cheers,
Csaba.
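As a rough illustration of the per-priority idea above, the following sketch splits tables into the groups that one vacuum thread per priority would each look after, with a default for tables that have no explicit setting. The function name and the "normal" default are assumptions; pg_autovacuum has no such priority column.

```python
def group_by_priority(tables, priorities, default_priority="normal"):
    """Map each priority class to the tables one worker for that class would scan."""
    groups = {}
    for table in tables:
        # Tables without an explicit setting fall into the default class.
        priority = priorities.get(table, default_priority)
        groups.setdefault(priority, []).append(table)
    return groups
```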
Csaba Nagy wrote:
> > - One might have *two* consumers, one that will only process small
> > tables, so that those little, frequently updated tables can get
> > handled quickly, and another consumer that does larger tables.
> > Or perhaps that knows that it's fine, between 04:00 and 09:00 UTC,
> > to have 6 consumers, and blow through a lot of larger tables
> > simultaneously.
> So one of the 2 might be enough. I guess time-based
> exclusion/permissions are not that easy to implement, and also not easy
> to set up properly... so what could work well is:
Alternatively, perhaps a threshold so that a table is only considered
for vacuum if:
(table-size * overall-activity-in-last-hour) < threshold
Ideally you'd define your units appropriately so that you could just
define threshold in postgresql.conf as 30% (of peak activity in last 100
hours say).
--
Richard Huxton
Archonet Ltd
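The post leaves the units of that test open. One way to read it, purely as an assumption, is to scale the table size by the largest table and the last hour's activity by the recent peak, so that the threshold becomes a plain fraction such as 0.30. The function and parameter names below are hypothetical.

```python
def eligible_for_vacuum(table_size, largest_table_size, activity_last_hour,
                        peak_activity, threshold=0.30):
    """Apply (table-size * overall-activity-in-last-hour) < threshold on 0..1 scales."""
    # Guard against empty databases or idle periods to avoid dividing by zero.
    size_weight = table_size / largest_table_size if largest_table_size else 0.0
    activity_ratio = activity_last_hour / peak_activity if peak_activity else 0.0
    return size_weight * activity_ratio < threshold
```

Under this reading a small table is eligible almost all the time, while the largest table is eligible only when the system is fairly quiet.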
> Alternatively, perhaps a threshold so that a table is only considered
> for vacuum if:
> (table-size * overall-activity-in-last-hour) < threshold
> Ideally you'd define your units appropriately so that you could just
> define threshold in postgresql.conf as 30% (of peak activity in last 100
> hours say).
No, this is definitely not enough. The problem scenario is when
autovacuum starts vacuuming a huge table and that keeps it busy 10 hours
and in the meantime the small but frequently updated tables get awfully
bloated...
The only solution to that is to have multiple vacuums running in
parallel, and it would be really nice if those multiple vacuums would be
coordinated by autovacuum too...
Cheers,
Csaba.
Csaba Nagy wrote:
> No, this is definitely not enough. The problem scenario is when
> autovacuum starts vacuuming a huge table and that keeps it busy 10 hours
> and in the meantime the small but frequently updated tables get awfully
> bloated...
Ah (lightbulb goes on)! I see what you mean now.
--
Richard Huxton
Archonet Ltd
Csaba Nagy wrote:
> The only solution to that is to have multiple vacuums running in
> parallel, and it would be really nice if those multiple vacuums would be
> coordinated by autovacuum too...
Yes, I agree, having multiple "autovacuum workers" would be useful.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
> Csaba Nagy wrote:
> > The only solution to that is to have multiple vacuums running in
> > parallel, and it would be really nice if those multiple vacuums would be
> > coordinated by autovacuum too...
> Yes, I agree, having multiple "autovacuum workers" would be useful.
Bruce, I think there are a couple of items here that might be worth
adding to the TODO list.
1) Allow multiple "autovacuum workers": Currently autovacuum is only
capable of ordering one vacuum command at a time. For most workloads
this is sufficient, but it falls down when a hot (very actively updated)
table goes unvacuumed for a long period of time because a large table
is currently being worked on.
2) Once we can have multiple autovacuum workers: Create the concept of
hot tables that require more attention and should never be ignored for
more than X minutes. Perhaps have one "autovacuum worker" per hot table?
(What do people think of this?)
3) Create "Maintenance Windows" for autovacuum: Currently autovacuum
makes all of its decisions based on a single per-table threshold value;
maintenance windows would allow the setting of a per-window, per-table
threshold. This makes it possible to, for example, forbid (or strongly
discourage) autovacuum from doing maintenance work during normal
business hours, either for the entire system or for specific tables.
None of those three items are on the TODO list; however, I think there is
general consensus that they (at least 1 & 3) are good ideas.
Yes, I think there are these TODO items. I was waiting to see what
additional replies there are before adding them.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> Yes, I think there are these TODO items. I was waiting to see what
> additional replies there are before adding them.
Speaking of which, I was just looking at the TODO at:
http://www.postgresql.org/docs/faqs.TODO.html
and I think this item:
* Improve xid wraparound detection by recording per-table rather than
per-database
is done and working in 8.2 no?
matthew@zeut.net ("Matthew O'Connor") writes:
> 2) Once we can have multiple autovacuum workers: Create the concept of
> hot tables that require more attention and should never be ignored for
> more than X minutes, perhaps have one "autovacuum worker" per hot
> table? (What do people think of this?)
One worker per "hot table" seems like overkill to me; you could chew
up a lot of connections that way, which could be a DOS.
That you have a "foot gun" is guaranteed; I think I'd rather that it
come in the form that choosing the "hot list" badly hurts the rate of
vacuuming than that we have a potential to chew up numbers of
connections (which is a relatively non-renewable resource).
--
(format nil "~S@~S" "cbbrowne" "cbbrowne.com")
http://linuxdatabases.info/info/
There are no "civil aviation for dummies" books out there and most of
you would probably be scared and spend a lot of your time looking up
if there was one. :-) -- Jordan Hubbard in c.u.b.f.m
matthew@zeut.net ("Matthew O'Connor") writes:
> Bruce Momjian wrote:
> > Yes, I think there are these TODO items. I was waiting to see what
> > additional replies there are before adding them.
> Speaking of which, I was just looking at the TODO at:
> http://www.postgresql.org/docs/faqs.TODO.html
> and I think this item:
> * Improve xid wraparound detection by recording per-table rather than
> per-database
> is done and working in 8.2 no?
That's in the 8.2 release notes:
- Track maximum XID age within individual tables, instead of whole
databases (Alvaro)
This reduces the overhead involved in preventing transaction ID
wraparound, by avoiding unnecessary VACUUMs.
--
output = reverse("ofni.sesabatadxunil" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/linux.html
The human race will decree from time to time: "There is something at
which it is absolutely forbidden to laugh."
-- Nietzche on Common Lisp
Tom Lane wrote:
> Glen Parker <glenebob@nwlink.com> writes:
> > I am still trying to roll my own auto vacuum thingy.
> Um, is this purely for hack value?
Don't be silly ;-)
Honestly I sort of thought the problem was fairly obvious.
> What is it that you find inadequate
> about regular autovacuum? It is configurable through the pg_autovacuum
> catalog --- which I'd be the first to agree is a sucky user interface,
> but we're not going to set the user interface in concrete until we are
> pretty confident it's feature-complete. So: what do you see missing?
Traditional vacuum does every table in the DB, which is absolutely The
Wrong Thing for us. Vacuum can be fired against individual tables, but
then how do I know which tables need it? Autovacuum is smart about
which tables it hits, but exceedingly stupid about *when* it hits them.
What I want is a way to do all needed vacuuming, in as short a time span
as possible, when I decide it should be done. For us, that's between ~2
AM and ~3 AM each morning. If a vacuum runs past 3 AM, so be it, but
it's better to hit it hard and try to be done by 3 AM than it is to
lollygag around until 5 AM.
The obvious answer for me is to vacuum all the tables that autovacuum
would hit, but only on demand. Something like "VACUUM CONDITIONAL WHERE
autovacuum_says_so()" :-)
-Glen
Glen Parker wrote:
> Traditional vacuum does every table in the DB, which is absolutely The
> Wrong Thing for us. Vacuum can be fired against individual tables, but
> then how do I know which tables need it? Autovacuum is smart about
> which tables it hits, but exceedingly stupid about *when* it hits them.
> What I want is a way to do all needed vacuuming, in as short a time span
> as possible, when I decide it should be done. For us, that's between ~2
> AM and ~3 AM each morning. If a vacuum runs past 3 AM, so be it, but
> it's better to hit it hard and try to be done by 3 AM than it is to
> lollygag around until 5 AM.
> The obvious answer for me is to vacuum all the tables that autovacuum
> would hit, but only on demand. Something like "VACUUM CONDITIONAL WHERE
> autovacuum_says_so()" :-)
I believe the correct answer to this problem is to write a cron script
that enables autovacuum at 2 AM and disables it at 3 AM. I think there is
some talk of this in the archives somewhere.
If you need to hit specific tables more often than that, then you can
have another cron script that vacuums a table every hour or something
along those lines.
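A minimal sketch of what such a cron-driven toggle could do, assuming the `autovacuum` setting in postgresql.conf is picked up on a configuration reload (for example via `pg_ctl reload`). The function below only rewrites the configuration text; reading and writing the file, its location, and the reload step are left to the calling script, and the name set_autovacuum is hypothetical.

```python
import re

# Matches an active (uncommented) "autovacuum = ..." line, but not
# related settings such as autovacuum_naptime.
_SETTING = re.compile(r"^[ \t]*autovacuum[ \t]*=.*$", re.MULTILINE)


def set_autovacuum(conf_text, enabled):
    """Return postgresql.conf text with the autovacuum setting switched on or off."""
    line = "autovacuum = " + ("on" if enabled else "off")
    if _SETTING.search(conf_text):
        return _SETTING.sub(line, conf_text)
    # No active entry yet: append one at the end of the file.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + line + "\n"
```

One cron entry at 2 AM would call this with enabled=True and reload the server, and a second entry at 3 AM would do the same with enabled=False. Note that any trailing comment on the replaced line is dropped.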
Chris Browne wrote:
> That's in the 8.2 release notes:
> - Track maximum XID age within individual tables, instead of whole
> databases (Alvaro)
> This reduces the overhead involved in preventing transaction ID
> wraparound, by avoiding unnecessary VACUUMs.
Yeah, this is what the TODO item was about, so it certainly is done.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Matthew O'Connor wrote:
> 1) Allow multiple "autovacuum workers": Currently Autovacuum is only
> capable of ordering one vacuum command at a time, for most work loads
> this is sufficient but falls down when a hot (very actively updated
> table) goes unvacuumed for a long period of time because a large table
> is currently being worked on.
> 2) Once we can have multiple autovacuum workers: Create the concept of
> hot tables that require more attention and should never be ignored for
> more than X minutes, perhaps have one "autovacuum worker" per hot table?
> (What do people think of this?)
> 3) Create "Maintenance Windows" for autovacuum: Currently autovacuum
> makes all of its decisions based on a single per-table threshold value,
> maintenance windows would allow the setting of a per-window, per-table
> threshold. This makes it possible to, for example, forbid (or strongly
> discourage) autovacuum from doing maintenance work during normal
> business hours either for the entire system or for specific tables.
> None of those three items are on the todo list, however I think there is
> general consensus that they (at least 1 & 3) are good ideas.
If it isn't there somewhere already, I would ask to add:
4) Expose all information used by autovacuum to form its decisions.
5) Expose a very easy way to discover autovacuum's opinion about a
particular table, for example "table_needs_vacuum(oid)", ignoring any
time constraints that may be in place.
-Glen
Glen Parker wrote:
> If it isn't there somewhere already, I would ask to add:
> 4) Expose all information used by autovacuum to form its decisions.
You could argue that this is already there, although not easy to get at
I suppose. But all table threshold settings are available either in the
pg_autovacuum relation or the defaults via GUC variables, that plus a
little math will get the information autovacuum uses to form its decisions.
> 5) Expose a very easy way to discover autovacuum's opinion about a
> particular table, for example "table_needs_vacuum(oid)", ignoring any
> time constraints that may be in place.
This might be a nice feature however in the presence of the much talked
about but not yet developed maintenance window concept, I'm not sure how
this should work. That is, during business hours the table doesn't
need vacuuming, but it will when the evening maintenance window opens up.
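The "little math" mentioned above could be sketched as follows. This assumes, as I understand the pg_autovacuum catalog, that a negative per-table value means "use the system-wide default", and it assumes a dead-tuple count is available from somewhere. The Python function table_needs_vacuum only borrows the name suggested earlier in the thread; it is not an existing PostgreSQL function, and the dictionary keys are made up for the example.

```python
def effective(per_table_value, default):
    """Use the per-table value unless it is missing or negative (meaning: use the default)."""
    if per_table_value is None or per_table_value < 0:
        return default
    return per_table_value


def table_needs_vacuum(reltuples, dead_tuples, override, defaults):
    """`override` and `defaults` are dicts with 'base_threshold' and 'scale_factor' keys.

    Ignores any time constraints: it answers "does it need it?",
    not "would it be vacuumed right now?".
    """
    base = effective(override.get("base_threshold"), defaults["base_threshold"])
    scale = effective(override.get("scale_factor"), defaults["scale_factor"])
    return dead_tuples > base + scale * reltuples
```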
Matthew O'Connor wrote:
> Glen Parker wrote:
> > If it isn't there somewhere already, I would ask to add:
> > 4) Expose all information used by autovacuum to form its decisions.
> You could argue that this is already there, although not easy to get at
> I suppose. But all table threshold settings are available either in the
> pg_autovacuum relation or the defaults via GUC variables, that plus a
> little math will get the information autovacuum uses to form its decisions.
No, we currently don't expose the number of dead tuples which autovacuum
uses.
> > 5) Expose a very easy way to discover autovacuum's opinion about a
> > particular table, for example "table_needs_vacuum(oid)", ignoring any
> > time constraints that may be in place.
> This might be a nice feature however in the presence of the much talked
> about but not yet developed maintenance window concept, I'm not sure how
> this should work. That is, during business hours the table doesn't
> need vacuuming, but it will when the evening maintenance window opens up.
I intend to work on the maintenance window idea for 8.3. I'm not sure
if I'll be able to introduce the worker process stuff in there as well.
I actually haven't done much design on the stuff so I can't say.
Now, if you (Matthew, or Glen as well!) were to work on that it'll be
appreciated ;-) and we could team up.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Matthew O'Connor wrote:
> Glen Parker wrote:
> > If it isn't there somewhere already, I would ask to add:
> > Expose a very easy way to discover autovacuum's opinion about a
> > particular table, for example "table_needs_vacuum(oid)", ignoring any
> > time constraints that may be in place.
> This might be a nice feature however in the presence of the much
> talked about but not yet developed maintenance window concept, I'm not
> sure how this should work. That is, during business hours the table
> doesn't need vacuuming, but it will when the evening maintenance
> window opens up.
Well, what he's saying is, "Not taking into account any time/maintenance
windows, does this table need vacuuming?"
--
erik jones <erik@myemma.com>
software development
emma(r)
Alvaro Herrera wrote:
> Yeah, this is what the TODO item was about, so it certainly is done.
OK, item removed. Thanks.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Erik Jones wrote:
> Well, what he's saying is, "Not taking into account any time/maintenance
> windows, does this table need vacuuming?"
Correct. IOW, "does it need it?", not "would you actually do it at this
time?"...
-Glen
Alvaro Herrera wrote:
> No, we currently don't expose the number of dead tuples which autovacuum
> uses.
I'd prefer to get this working somehow before 8.3. In the mean time, is
this information available at all? I assume a c function could get it,
right? Any easier way?
-Glen
Glen Parker wrote:
> I'd prefer to get this working somehow before 8.3. In the mean time, is
> this information available at all? I assume a c function could get it,
> right? Any easier way?
A C function would do. I don't think anything else would because we
don't expose it at the SQL level.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote:
> I intend to work on the maintenance window idea for 8.3. I'm not sure
> if I'll be able to introduce the worker process stuff in there as well.
> I actually haven't done much design on the stuff so I can't say.
> Now, if you (Matthew, or Glen as well!) were to work on that it'll be
> appreciated ;-) and we could team up.
I would like to get back into working on autovacuum; outside of
discussions on the lists I haven't done anything since you took it from
contrib. Anyway, I am interested in helping if I can find the time, but
there is no chance of that happening in the next few weeks. Maybe in
January.
I think another thing to consider are good default values for the
autovacuum vacuum delay settings. We talked about this a while ago, but
I don't think we ever settled on anything.
Glen Parker wrote:
> Correct. IOW, "does it need it?", not "would you actually do it at this
> time?"...
I understand that, but it's a subjective question. The only question
autovacuum answers is "Am I going to vacuum this table now?", so in the
current setup you probably could create a function that answers your
question. I was just pointing out that in the future, when maintenance
windows get implemented, this question becomes less clear.
You're saying that the dirtiness of a table is proportional to when you
plan on vacuuming it next. I don't see that connection at all. The
only correlation I might see is if it happens to be 5:59 AM when your DB
decides your table is dirty, and your maintenance window closes at 6:00
AM. Then you have to program the maintenance window to gracefully
unplug the vacuum.
Currently, autovacuum runs every minute and checks to see if any tables
meet the requirements for vacuuming. Are the requirements the amount of
time a vacuum would take, or the raw number of dirty tuples? One might
be a function of the other, for sure, but exactly what does the
autovacuumer use to decide when to clean?
--
Brandon Aiken
CS/IT Systems Engineer
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Matthew
O'Connor
Sent: Tuesday, December 19, 2006 5:37 PM
To: Glen Parker
Cc: Postgres general mailing list
Subject: Re: [GENERAL] Autovacuum Improvements
Glen Parker wrote:
Erik Jones wrote:
Matthew O'Connor wrote:
Glen Parker wrote:
If it isn't there somewhere already, I would ask to add:
Expose a very easy way to discover autovacuum's opinion about a
particular table, for example "table_needs_vacuum(oid)", ignoring
any time constraints that may be in place.
This might be a nice feature however in the presence of the much
talked about but not yet developed maintenance window concept, I'm
not sure how this should work. That is, during business hours the
table doesn't need vacuuming, but it will when the evening
maintenance window opens up.
Well, what he's saying is, "Not taking into account any
time/maintenance windows, does this table need vacuuming?"
Correct. IOW, "does it need it?", not "would you actually do it at
this time?"...
I understand that, but it's a subjective question. The only question
autovacuum answers is "Am I going to vacuum this table now?", so in the
current setup you probably could create a function that answers your
question, I was just pointing out in the future when maintenance windows
get implemented that this question becomes less clear.
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
No, how dirty a table is isn't subjective; what is subjective is the
question "Does it need to be vacuumed?". A table that is 1% dirty (to use
your term) probably doesn't *need* to be vacuumed, but you might choose
to vacuum it anyway, at least at night when the system isn't in
use.
See the docs for the current requirements for autovacuum to issue a
vacuum command.
Matt
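For reference, the rule the docs describe can be sketched roughly as below: a table qualifies once its dead-tuple estimate exceeds a base threshold plus a scale factor times the table's row count. This is only an illustration in Python, not the autovacuum source. The function and field names are made up; base_threshold and scale_factor stand in for the autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor settings, and the dead-tuple estimate is simply passed in, since it is not exposed to SQL at the time of this thread.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    name: str
    reltuples: float    # estimated row count, as stored in pg_class
    dead_tuples: float  # estimate of rows obsoleted since the last vacuum


def vacuum_threshold(reltuples, base_threshold, scale_factor):
    """Documented rule: base threshold + scale factor * number of tuples."""
    return base_threshold + scale_factor * reltuples


def needs_vacuum(stats, base_threshold, scale_factor):
    """True when the dead-tuple estimate exceeds the computed threshold."""
    limit = vacuum_threshold(stats.reltuples, base_threshold, scale_factor)
    return stats.dead_tuples > limit
```

With illustrative settings of 500 and 0.25, a 10000-row table has a threshold of 3000 dead tuples, so it would qualify at 3500 but not at exactly 3000.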
Matthew O'Connor wrote:
No, how dirty a table is isn't subjective; what is subjective is the
question "Does it need to be vacuumed?". A table that is 1% dirty (to use
your term) probably doesn't *need* to be vacuumed, but you might choose
to vacuum it anyway, at least at night when the system isn't in
use.
This leads me further from wanting to see a simple time constraint added.
I'd like to see something more dynamic.
Perhaps define a "dirtiness" rating, and then allow a minimum
"dirtiness" to be configured. When autovacuum wakes up, it could build
a list of sufficiently dirty tables sorted in "dirtiness" order, and
could call an optional user defined function for each one, passing it
useful bits of information including each table's "dirtiness". The
function could then decide whether to vacuum or not based on whatever
constraints the admin dreamed up.
It would then be a simple matter to expose a function that, given a
table's OID, could report its "dirtiness" level.
I think that explanation leaves room for refinement, but hopefully the
idea makes sense :-)
-Glen
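A rough sketch of the idea above, in Python for illustration only. Nothing here exists in PostgreSQL: the rating formula (dead tuples as a fraction of the table), the function names and the callback signature are all assumptions made for the example.

```python
def dirtiness(dead_tuples, reltuples):
    """Hypothetical rating: the fraction of the table that is dead."""
    return dead_tuples / max(reltuples, 1.0)


def plan_vacuum(tables, min_dirtiness, decide=None):
    """Return table names to vacuum, dirtiest first.

    tables is an iterable of (name, dead_tuples, reltuples).
    decide is the optional admin-supplied hook; it receives the table
    name and its rating and returns True to vacuum. Without a hook,
    every sufficiently dirty table is vacuumed (the default action).
    """
    rated = [(dirtiness(dead, total), name) for name, dead, total in tables]
    candidates = sorted((r for r in rated if r[0] >= min_dirtiness), reverse=True)
    if decide is None:
        decide = lambda name, rating: True
    return [name for rating, name in candidates if decide(name, rating)]
```

Exposing the rating for a single table is then just a call to dirtiness() with that table's numbers.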
Glen Parker wrote:
Matthew O'Connor wrote:
No, how dirty a table is isn't subjective; what is subjective is the
question "Does it need to be vacuumed?". A table that is 1% dirty (to use
your term) probably doesn't *need* to be vacuumed, but you might
choose to vacuum it anyway, at least at night when the system
isn't in use.
This leads me further from wanting to see a simple time constraint added.
I'd like to see something more dynamic.
Perhaps define a "dirtiness" rating, and then allow a minimum
"dirtiness" to be configured. When autovacuum wakes up, it could build
a list of sufficiently dirty tables sorted in "dirtiness" order, and
could call an optional user defined function for each one, passing it
useful bits of information including each table's "dirtiness". The
function could then decide whether to vacuum or not based on whatever
constraints the admin dreamed up.
It would then be a simple matter to expose a function that, given a
table's OID, could report its "dirtiness" level.
The idea that has been discussed in the past is the concept of
maintenance windows, that is for any given period of time, you can set
different vacuum thresholds. So at night you might make the thresholds
very low so that nearly everything gets vacuumed but during the day you
might only vacuum when something really needs it. This accomplishes
what you are asking for in a more general way that can accommodate a
wide variety of usage patterns.
Matthew O'Connor wrote:
The idea that has been discussed in the past is the concept of
maintenance windows, that is for any given period of time, you can set
different vacuum thresholds. So at night you might make the thresholds
very low so that nearly everything gets vacuumed but during the day you
might only vacuum when something really needs it. This accomplishes
what you are asking for in a more general way that can accommodate a
wide variety of usage patterns.
I wonder if the simple solution is to just have a cron script modify
postgresql.conf and pg_ctl reload. That seems very flexible, or have
two postgresql.conf files and move them into place via cron.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
I wonder if the simple solution is to just have a cron script modify
postgresql.conf and pg_ctl reload. That seems very flexible, or have
two postgresql.conf files and move them into place via cron.
I'd still prefer to vacuum on demand actually. Rather than hope that
autovacuum hit all the nastiest tables, I'd like to be able to record
the fact that tables (x,y,z) were vacuumed and how long it took. I want
the logic autovacuum uses to determine if a table needs vacuuming, but
I'd rather do the actual vacuuming myself.
I'd also like to use some of this information to issue reindex and
cluster commands only when they're needed. In fact, on days when I
cluster, there's no need whatsoever to also vacuum those tables. This
is something that autovacuum won't do at all.
If the best I got was access to the same information autovacuum uses to
make its decisions, I'd be pretty happy.
-Glen
The idea that has been discussed in the past is the concept of
maintenance windows, that is for any given period of time, you can set
different vacuum thresholds. So at night you might make the thresholds
very low so that nearly everything gets vacuumed but during the day you
might only vacuum when something really needs it. This accomplishes
what you are asking for in a more general way that can accommodate a
wide variety of usage patterns.
That really seems like something that, if it's powerful, would also be
very complicated. If the autovacuum system could just call a user
defined function, all the complexity could be dropped back into the
admin's lap (which is fine with me :-).
I am of course ASSuming that it would be simple for autovacuum to call a
function if it exists, or take a default action if it does not...
-Glen
Glen Parker wrote:
I'd also like to use some of this information to issue reindex and
cluster commands only when they're needed. In fact, on days when I
cluster, there's no need whatsoever to also vacuum those tables. This
is something that autovacuum won't do at all.
Well, I'd rather fix CLUSTER so that it reports that there are currently
no dead tuples in the table. This should be an easy patch that, hey,
maybe you could contribute ;-)
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Glen Parker wrote:
The idea that has been discussed in the past is the concept of
maintenance windows, that is for any given period of time, you can set
different vacuum thresholds. So at night you might make the thresholds
very low so that nearly everything gets vacuumed but during the day you
might only vacuum when something really needs it. This accomplishes
what you are asking for in a more general way that can accommodate a
wide variety of usage patterns.
That really seems like something that, if it's powerful, would also be
very complicated. If the autovacuum system could just call a user
defined function, all the complexity could be dropped back into the
admin's lap (which is fine with me :-).
I have a quote by Larry Wall about something similar:
"In fact, the basic problem with Perl 5's subroutines is that they're
not crufty enough, so the cruft leaks out into user-defined code
instead, by the Conservation of Cruft Principle."
(Larry Wall, Apocalypse 6)
With the system described above, you can have it very simple by just not
configuring anything. Or you can have a very complex scenario involving
holidays and weekends and off-hours and "the two hours of the month when
you do all the nasty stuff" by doing a very elaborate and complicated
setup. Or you could have a middle ground just defining "off hours"
(weekends and nights) which would be just a couple of commands.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Glen Parker wrote:
I wonder if the simple solution is to just have a cron script modify
postgresql.conf and pg_ctl reload. That seems very flexible, or have
two postgresql.conf files and move them into place via cron.
I'd still prefer to vacuum on demand actually. Rather than hope that
autovacuum hit all the nastiest tables, I'd like to be able to record
the fact that tables (x,y,z) were vacuumed and how long it took. I want
the logic autovacuum uses to determine if a table needs vacuuming, but
I'd rather do the actual vacuuming myself.
I'd also like to use some of this information to issue reindex and
cluster commands only when they're needed. In fact, on days when I
cluster, there's no need whatsoever to also vacuum those tables. This
is something that autovacuum won't do at all.
If the best I got was access to the same information autovacuum uses to
make its decisions, I'd be pretty happy.
Well, take a look at the autovacuum code. Also, before autovacuum was
integrated into core it existed as a libpq-based contrib application, so
you can look in one of the older branches for that code.
Alvaro Herrera wrote:
Glen Parker wrote:
That really seems like something that, if it's powerful, would also be
very complicated. If the autovacuum system could just call a user
defined function, all the complexity could be dropped back into the
admin's lap (which is fine with me :-).
I have a quote by Larry Wall about something similar:
"In fact, the basic problem with Perl 5's subroutines is that they're
not crufty enough, so the cruft leaks out into user-defined code
instead, by the Conservation of Cruft Principle."
(Larry Wall, Apocalypse 6)
With the system described above, you can have it very simple by just not
configuring anything. Or you can have a very complex scenario involving
holidays and weekends and off-hours and "the two hours of the month when
you do all the nasty stuff" by doing a very elaborate and complicated
setup. Or you could have a middle ground just defining "off hours"
(weekends and nights) which would be just a couple of commands.
I would go one step further and suggest that when the maintenance window
system gets completed we give it a default setup of midnight to 6AM or
something like that.
Alvaro Herrera wrote:
Glen Parker wrote:
I'd also like to use some of this information to issue reindex and
cluster commands only when they're needed. In fact, on days when I
cluster, there's no need whatsoever to also vacuum those tables. This
is something that autovacuum won't do at all.
Well, I'd rather fix CLUSTER so that it reports that there are currently
no dead tuples in the table. This should be an easy patch that, hey,
maybe you could contribute ;-)
I knew that would come up eventually :D. Believe me, I'd love to dig
in, but I haven't had a postgres dev tree on my system in years and I
don't have any time for it right now. (But there's always time to
banter in the mailing list, right? :-) )
Actually what I meant was that autovacuum will always just vacuum. In
some cases I'd rather it cluster instead. But on that note, it would
also be madly useful to give CLUSTER the ability to ANALYZE as well.
-Glen
Brandon Aiken wrote:
You're saying that the dirtiness of a table is proportional to when you
plan on vacuuming it next.
The dirtiness of a table should most certainly have an effect on when it
gets vacuumed in relation to other tables. If dirtiness could be rated,
then the list of vacuumable tables could be sorted, vacuuming really
dirty tables before less dirty ones.
Now, if I could get my hands on that rating for any given table, then I
could write a night time script that would vacuum the dirtiest tables,
in order, until either I run out of dirty tables, or I run out of time.
In fact, if autovacuum did just that, then I might be inclined to attack
the problem with the "update postgresql.conf, pg_ctl reload" approach. At least
then I'd know that even though ALL the dirty tables might not get
cleaned every night, at least the worst ones would.
-Glen
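The night-time script described above could look something like this minimal Python sketch. It assumes the caller already has the tables sorted dirtiest first, and that vacuum_one is some callable (hypothetical here) that issues the VACUUM for one table, for example by shelling out to psql. A vacuum that is already running is not interrupted; the deadline is only checked between tables.

```python
import time


def vacuum_until_deadline(tables_by_dirtiness, vacuum_one, deadline,
                          clock=time.monotonic):
    """Vacuum tables in the given order until none are left or time is up.

    deadline is a value on the same scale as clock(). Returns the names
    that were vacuumed, so the run can be logged afterwards.
    """
    done = []
    for name in tables_by_dirtiness:
        if clock() >= deadline:
            break  # out of time: the dirtiest tables have been handled
        vacuum_one(name)
        done.append(name)
    return done
```

Passing the clock in as a parameter is just a convenience so the loop can be exercised without waiting for real time to pass.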
Glen Parker wrote:
Alvaro Herrera wrote:
Glen Parker wrote:
I'd also like to use some of this information to issue reindex and
cluster commands only when they're needed. In fact, on days when I
cluster, there's no need whatsoever to also vacuum those tables. This
is something that autovacuum won't do at all.
Well, I'd rather fix CLUSTER so that it reports that there are currently
no dead tuples in the table. This should be an easy patch that, hey,
maybe you could contribute ;-)
I knew that would come up eventually :D. Believe me, I'd love to dig
in, but I haven't had a postgres dev tree on my system in years and I
don't have any time for it right now. (But there's always time to
banter in the mailing list, right? :-) )
Heh :-)
Actually what I meant was that autovacuum will always just vacuum. In
some cases I'd rather it cluster instead. But on that note, it would
also be madly useful to give CLUSTER the ability to ANALYZE as well.
Well, the problem is that CLUSTER takes an exclusive lock. That's a
no-no for an automatic job I think.
As for doing an ANALYZE in conjunction with the CLUSTER, that would
certainly be interesting, but it'd require that we refactor ANALYZE so as
to be able to "reuse" another scan. It would be nice to do it anyway
and allow it to reuse VACUUM's scan as well, so that it doesn't have to
do a separate one. (And it would also be nicer because it would be
able to determine n_distinct more accurately, which is something people
often wish for.)
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On 19/12/06, Chris Browne <cbbrowne@acm.org> wrote:
matthew@zeut.net ("Matthew O'Connor") writes:
2) Once we can have multiple autovacuum workers: Create the concept of
hot tables that require more attention and should never be ignored for
more than X minutes, perhaps have one "autovacuum worker" per hot
table? (What do people think of this?)
One worker per "hot table" seems like overkill to me; you could chew
up a lot of connections that way, which could be a DOS.
Sounds like a max workers config variable would work quite well here.
Bit like the max connections variable. If we run out of workers we just
have to wait for one to finish. I think we need one daemon to analyse
what needs vacuuming and then launch workers to do the actual work.
Peter Childs
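To illustrate the "one daemon, capped pool of workers" shape, here is a small Python sketch. It uses threads from the standard library purely as a stand-in; real autovacuum workers would be server processes, and run_workers, vacuum_one and max_workers are names invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workers(tables, vacuum_one, max_workers):
    """Hand each table to a worker, never running more than max_workers.

    When all workers are busy, the remaining tables simply wait in the
    queue until one finishes. Results come back in the input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(vacuum_one, tables))
```

The cap bounds how many connections the vacuuming can consume, which is the concern raised about one worker per hot table.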
That you have a "foot gun" is guaranteed; I think I'd rather that it
come in the form that choosing the "hot list" badly hurts the rate of
vacuuming than that we have a potential to chew up numbers of
connections (which is a relatively non-renewable resource).
--
(format nil "~S@~S" "cbbrowne" "cbbrowne.com")
http://linuxdatabases.info/info/
There are no "civil aviation for dummies" books out there and most of
you would probably be scared and spend a lot of your time looking up
if there was one. :-) -- Jordan Hubbard in c.u.b.f.m
Glen Parker wrote:
Brandon Aiken wrote:
You're saying that the dirtiness of a table is proportional to when you
plan on vacuuming it next.
The dirtiness of a table should most certainly have an effect on when it
gets vacuumed in relation to other tables. If dirtiness could be rated,
then the list of vacuumable tables could be sorted, vacuuming really
dirty tables before less dirty ones.
Wouldn't it be better to prefer small(er) less dirty tables over large
to huge dirty tables? They'd be done quickly, and as it's inside the
"maintenance window" there is apparently time to. It'd probably need
some logic as to how many of those small tables get vacuumed first (the
total amount should be significantly less than the big table(s)).
My rationale: if the really dirty huge table doesn't get finished in
time anyway, autovacuum might as well spend some time on other tables.
The net performance after the maintenance window will probably be better.
--
Alban Hertroys
alban@magproductions.nl
magproductions b.v.
T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede
// Integrate Your World //
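One simple way to express the "small tables first" preference is a greedy pass that picks the cheapest tables until the window's budget is used up. This is a Python sketch under assumptions: est_cost is a hypothetical per-table estimate (page count, say, or the duration of the previous vacuum), and budget is in the same unit. It leaves out any dirtiness weighting.

```python
def order_small_first(tables, budget):
    """Pick tables cheapest first until the next one would exceed the budget.

    tables is a list of (name, est_cost). Returns the chosen names in
    the order they would be vacuumed.
    """
    chosen, spent = [], 0
    for name, cost in sorted(tables, key=lambda t: t[1]):
        if spent + cost > budget:
            break  # everything after this is at least as expensive
        chosen.append(name)
        spent += cost
    return chosen
```

Because the list is sorted by cost, stopping at the first table that does not fit is safe: no later table could fit either.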
Alvaro Herrera wrote:
Matthew O'Connor wrote:
Glen Parker wrote:
If it isn't there somewhere already, I would ask to add:
4) Expose all information used by autovacuum to form its decisions.
You could argue that this is already there, although not easy to get at
I suppose. But all table threshold settings are available either in the
pg_autovacuum relation or the defaults via GUC variables, that plus a
little math will get the information autovacuum uses to form its decisions.
No, we currently don't expose the number of dead tuples which autovacuum
uses.
5) Expose a very easy way to discover autovacuum's opinion about a
particular table, for example "table_needs_vacuum(oid)", ignoring any
time constraints that may be in place.
This might be a nice feature however in the presence of the much talked
about but not yet developed maintenance window concept, I'm not sure how
this should work. That is, during business hours the table doesn't
need vacuuming, but it will when the evening maintenance window opens up.
I intend to work on the maintenance window idea for 8.3. I'm not sure
if I'll be able to introduce the worker process stuff in there as well.
I actually haven't done much design on the stuff so I can't say.
What does a maintenance window mean? I am slightly fearful that as
other improvements to vacuum are made, its meaning will change.
There has been discussion about a bitmap of dirty pages in a relation
for vacuum to clean. Does that affect what maintenance means? E.g., does
maintenance mean I can only scan the whole relation for XID wrap in
maintenance mode and not during non-maintenance time? Does it mean we
don't vacuum at all in non-maintenance mode? Or do we just have a
different set of thresholds during maintenance?
Further to this was a patch a long time ago for partial vacuum, which
only vacuumed part of the relation. It was rejected on grounds of not
helping, as the index cleanup is the expensive part. My view on this is
that with a very large table you should be able to vacuum until you fill
your maintenance_work_mem. You are then forced to process indexes.
That is a good time to stop vacuuming. You would have to start the
process again effectively. This may also change the meaning of
maintenance window. Again, only full relation scans in maintenance
times, possibly something else entirely.
I am unsure of what the long term goal of the maintenance window is. I
understand it's to produce a time when vacuum is able to be more
aggressive on the system. But I do not know what that means in light of
other improvements such as those listed above. Coming up with a method
for maintenance window that just used a separate set of thresholds is
one option. However, is that the best thing to do? Some clarity here
from others would probably help. But I also think we need to consider
the big picture of where vacuum is going before inventing a mechanism
that may not mean anything come 8.4.
Now, if you (Matthew, or Glen as well!) were to work on that it'll be
appreciated ;-) and we could team up.
I am happy to try and put in some design thought and discussion with
others to come up with something that will work well.
Regards
Russell Smith
Alvaro Herrera wrote:
No, we currently don't expose the number of dead tuples which autovacuum
uses.
Patch submitted :-)
Russell Smith wrote:
Alvaro Herrera wrote:
I intend to work on the maintenance window idea for 8.3. I'm not sure
if I'll be able to introduce the worker process stuff in there as well.
I actually haven't done much design on the stuff so I can't say.
What does a maintenance window mean? I am slightly fearful that as
other improvements to vacuum are made, its meaning will change.
The maintenance window design as I understand it (Alvaro chime in if I
get this wrong) is that we will be able to specify blocks of time that
are assigned specific autovacuum settings. For example we might define
a maintenance window of Sunday morning from 1AM - 8AM, during that time
all autovacuum thresholds will be dropped to .01, that way everything
will get vacuumed that needs it during that window. Outside of the
window default autovacuum settings apply.
There has been discussion about a bitmap of dirty pages in a relation
for vacuum to clean. Does that affect what maintenance means? E.g.,
does maintenance mean I can only scan the whole relation for XID wrap
in maintenance mode and not during non-maintenance time? Does it mean
we don't vacuum at all in non-maintenance mode? Or do we just have a
different set of thresholds during maintenance?
Different thresholds during maintenance window.
Further to this was a patch a long time ago for partial vacuum, which
only vacuumed part of the relation. It was rejected on grounds of not
helping, as the index cleanup is the expensive part. My view on this
is that with a very large table you should be able to vacuum until you
fill your maintenance_work_mem. You are then forced to process
indexes. That is a good time to stop vacuuming. You would have
to start the process again effectively. This may also change the
meaning of maintenance window. Again, only full relation scans in
maintenance times, possibly something else entirely.
I'm not sure how partial vacuums will affect autovacuum, but I'm not
going to worry about it until it gets accepted, which I don't think is
going to happen anytime soon.
BTW, when a vacuum starts during a maintenance window but doesn't finish
before the window closes, I think it should continue running. However, I
believe the default vacuum delay setting will then apply, which could be
set up to help reduce the impact that vacuum has outside the maintenance
window.
I am unsure of what the long term goal of the maintenance window is.
I understand it's to produce a time when vacuum is able to be more
aggressive on the system. But I do not know what that means in light
of other improvements such as those listed above. Coming up with a
method for maintenance window that just used a separate set of
thresholds is one option. However, is that the best thing to do? Some
clarity here from others would probably help. But I also think we
need to consider the big picture of where vacuum is going before
inventing a mechanism that may not mean anything come 8.4.
I think for now all we are talking about are different thresholds. As
newer vacuuming options become available we should consider how they
apply, but we aren't there yet.
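As a sketch of "different thresholds during the maintenance window", the Python below looks up the settings that apply at a given moment. It is an illustration only: the Window type and scale_factor_at are invented names, windows that cross midnight are not handled, and the ".01" from the example above is read here as a scale factor, which is an assumption.

```python
import datetime as dt
from dataclasses import dataclass


@dataclass
class Window:
    weekday: int          # Monday=0 ... Sunday=6, as in datetime.weekday()
    start: dt.time
    end: dt.time
    scale_factor: float   # setting that applies inside the window


def scale_factor_at(now, windows, default):
    """Return the scale factor of the first window containing now.

    Falls back to the default (normal autovacuum settings) outside
    every window.
    """
    for w in windows:
        if now.weekday() == w.weekday and w.start <= now.time() < w.end:
            return w.scale_factor
    return default
```

For the Sunday 1AM - 8AM example, a Window(6, dt.time(1, 0), dt.time(8, 0), 0.01) applies at 3AM on a Sunday and the default applies at any other time.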
On Thu, 2006-12-21 at 18:03, Matthew T. O'Connor wrote:
The maintenance window design as I understand it (Alvaro chime in if I
get this wrong) is that we will be able to specify blocks of time that
are assigned specific autovacuum settings. For example we might define
a maintenance window of Sunday morning from 1AM - 8AM, during that time
all autovacuum thresholds will be dropped to .01, that way everything
will get vacuumed that needs it during that window. Outside of the
window default autovacuum settings apply.
Changing thresholds is not a viable solution for all the cases. If I
have a huge table with many indexes, I still don't want to vacuum it
unless there are a significant number of dead pages, so that the
sequential scan of it and its indexes pays off. In this case dropping
the autovacuum threshold would be totally counterproductive even at
night. This solution would only rule out really static tables, which
almost never change. In real life there are many more possible
data access scenarios...
From all the discussion here I think the most benefit would result from
a means to assign tables to different categories, and set up separate
autovacuum rules per category (be it time window when vacuuming is
allowed, autovacuum processes assigned, cost settings, etc). I doubt you
can really define upfront all the vacuum strategies you would need in
real life, so why not let the user define it? Define the categories by
assigning tables to them, and the rules per category. Then you can
decide what rules to implement, and what should be the defaults...
Cheers,
Csaba.
I intend to work on the maintenance window idea for 8.3. I'm not sure
if I'll be able to introduce the worker process stuff in there as well.
I actually haven't done much design on the stuff so I can't say.
Something to consider: per-day maintenance windows, where Sat & Sun could
have a 24-hour window, and trying to vacuum the largest tables during the
longest windows. This wouldn't work for every server, but for many...
--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice
Csaba Nagy wrote:
On Thu, 2006-12-21 at 18:03, Matthew T. O'Connor wrote:
The maintenance window design as I understand it (Alvaro chime in if I
get this wrong) is that we will be able to specify blocks of time that
are assigned specific autovacuum settings. For example we might define
a maintenance window of Sunday morning from 1AM - 8AM, during that time
all autovacuum thresholds will be dropped to .01, that way everything
will get vacuumed that needs it during that window. Outside of the
window default autovacuum settings apply.
Changing thresholds is not a viable solution for all the cases.
My idea was to change autovacuum parameters in general, for example the
"enable" parameter. And the idea is to be able to do it per-table
(or rather, per table group), so you can group all your mostly-static
tables in a group and then say "I don't want this group to be vacuumed".
From all the discussion here I think the most benefit would result from
a means to assign tables to different categories, and set up separate
autovacuum rules per category (be it time window when vacuuming is
allowed, autovacuum processes assigned, cost settings, etc). I doubt you
can really define upfront all the vacuum strategies you would need in
real life, so why not let the user define it? Define the categories by
assigning tables to them, and the rules per category. Then you can
decide what rules to implement, and what should be the defaults...
Hmm, yeah, I think this is more or less what I have in mind.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thu, 2006-12-21 at 18:41, Alvaro Herrera wrote:
From all the discussion here I think the most benefit would result from
a means to assign tables to different categories, and set up separate
autovacuum rules per category (be it time window when vacuuming is
allowed, autovacuum processes assigned, cost settings, etc). I doubt you
can really define upfront all the vacuum strategies you would need in
real life, so why not let the user define it ? Define the categories by
assigning tables to them, and the rules per category. Then you can
decide what rules to implement, and what should be the defaults...
Hmm, yeah, I think this is more or less what I have in mind.
Cool :-)
Can I suggest to also consider the idea of some kind of autovacuum
process group, with settings like:
- number of processes running in parallel;
- time windows when they are allowed to run;
Then have the table categories with all the rest of the
threshold/cost/delay settings.
Then have the possibility to assign tables to categories, and to assign
categories to processing groups.
I think this would allow the most flexibility with the minimum of
repetition in settings (from the user perspective).
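One way to picture the layering suggested here (tables assigned to
categories, categories assigned to processing groups), as a rough sketch
only. ProcessGroup, Category and settings_for are hypothetical names, and
the settings dictionary is a stand-in for whatever threshold/cost/delay
options would really exist.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessGroup:
    name: str
    max_workers: int   # number of processes running in parallel
    windows: list      # (start_hour, end_hour) pairs when the group may run


@dataclass
class Category:
    name: str
    group: ProcessGroup
    settings: dict                          # threshold/cost/delay settings
    tables: set = field(default_factory=set)


def settings_for(table, categories, default):
    """Look up which category (and hence which group) a table falls under."""
    for cat in categories:
        if table in cat.tables:
            return cat.name, cat.group.name, cat.settings
    return default
```

Each setting is written once, on the group or on the category, which is
the "minimum of repetition" argument made above.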
Cheers,
Csaba.
After takin a swig o' Arrakan spice grog, nagy@ecircle-ag.com (Csaba Nagy) belched out:
On Thu, 2006-12-21 at 18:41, Alvaro Herrera wrote:
From all the discussion here I think the most benefit would result from
a means to assign tables to different categories, and set up separate
autovacuum rules per category (be it time window when vacuuming is
allowed, autovacuum processes assigned, cost settings, etc). I doubt you
can really define upfront all the vacuum strategies you would need in
real life, so why not let the user define it ? Define the categories by
assigning tables to them, and the rules per category. Then you can
decide what rules to implement, and what should be the defaults...
Hmm, yeah, I think this is more or less what I have in mind.
Cool :-)
Can I suggest to also consider the idea of some kind of autovacuum
process group, with settings like:
- number of processes running in parallel;
- time windows when they are allowed to run;
Then have the table categories with all the rest of the
threshold/cost/delay settings.
Then have the possibility to assign tables to categories, and to assign
categories to processing groups.
I think this would allow the most flexibility with the minimum of
repetition in settings (from the user perspective).
Seems to me that you could get ~80% of the way by having the simplest
"2 queue" implementation, where tables with size < some threshold get
thrown at the "little table" queue, and tables above that size go to
the "big table" queue.
That should keep any small tables from getting "vacuum-starved."
I'd think the next step would be to increase the number of queues,
perhaps in a time-based fashion. There might be times when it's
acceptable to vacuum 5 tables at once, so you burn thru little tables
"like the blazes," and handle larger ones fairly promptly. And other
times when you don't want to do *any* big tables, and limit a single
queue to just the itty bitty ones.
This approach allows you to stay mostly heuristic-based, as opposed to
having to describe policies in gratuitous detail.
Having a mechanism that requires enormous DBA effort and where there
is considerable risk of simple configuration errors that will be hard
to notice may not be the best kind of "feature" :-).
--
let name="cbbrowne" and tld="gmail.com" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/slony.html
"You can measure a programmer's perspective by noting his attitude on
the continuing vitality of FORTRAN." -- Alan Perlis
Having a mechanism that requires enormous DBA effort and where there
is considerable risk of simple configuration errors that will be hard
to notice may not be the best kind of "feature" :-).
Why not? It seems to have worked remarkably well for the market leader ;-)
--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice
Christopher Browne wrote:
Seems to me that you could get ~80% of the way by having the simplest
"2 queue" implementation, where tables with size < some threshold get
thrown at the "little table" queue, and tables above that size go to
the "big table" queue.
That should keep any small tables from getting "vacuum-starved."
Hmm, would it make sense to keep 2 queues, one that goes through the
tables in smaller-to-larger order, and the other one in the reverse
direction?
I am currently writing a design on how to create "vacuum queues" but I'm
thinking that maybe it's getting too complex to handle, and a simple
idea like yours is enough (given sufficient polish).
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
A long time ago, in a galaxy far, far away, alvherre@commandprompt.com (Alvaro Herrera) wrote:
Christopher Browne wrote:
Seems to me that you could get ~80% of the way by having the
simplest "2 queue" implementation, where tables with size < some
threshold get thrown at the "little table" queue, and tables above
that size go to the "big table" queue.
That should keep any small tables from getting "vacuum-starved."
Hmm, would it make sense to keep 2 queues, one that goes through the
tables in smaller-to-larger order, and the other one in the reverse
direction?
Interesting approach; that would mean having just one priority queue
for all the work. That seems to simplify things a bit, which is a
good thing.
Unifying policies further might have some merit, too. The worker
processes (that do the vacuuming) could be set up to alternate between
head and tail of the queue. That is, a worker process could vacuum
the littlest table and then go after the biggest table. That way,
they'd eat at both ends towards the middle. Adding more workers could
easily add to the speed at which both ends of the queue get eaten
(assuming you've got the I/O to support having 4 or 5 vacuums running
concurrently).
There is one thing potentially bad, with that; the thing we never want
is for all the workers to get busy on the biggest tables so that
little ones are no longer being serviced. So there needs to be a way
to make sure that there's one worker devoted to "little tables." I
suppose the rule may be that the 1st worker process *never* goes after
the biggest tables.
That ought to be enough to prevent starvation.
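A toy simulation of the scheme described above, assuming workers simply
take turns and ignoring how long each vacuum runs (a real scheduler would
not). The function drain and its arguments are invented for illustration.

```python
from collections import deque


def drain(tables, n_workers):
    """tables: list of (name, pages). Returns (worker, table) pairs in pick order.

    Worker 0 always takes from the small end of the queue; every other
    worker alternates between the smallest and the biggest table left.
    """
    queue = deque(sorted(tables, key=lambda t: t[1]))   # smallest first
    take_small = [True] * n_workers                     # next end, per worker
    picks = []
    while queue:
        for w in range(n_workers):
            if not queue:
                break
            if w == 0:
                name, _ = queue.popleft()               # reserved for little tables
            else:
                name, _ = queue.popleft() if take_small[w] else queue.pop()
                take_small[w] = not take_small[w]
            picks.append((w, name))
    return picks
```

With five tables and two workers, worker 0 only ever consumes from the
small end while worker 1 alternates, so the two ends are eaten towards
the middle.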
I am currently writing a design on how to create "vacuum queues" but
I'm thinking that maybe it's getting too complex to handle, and a
simple idea like yours is enough (given sufficient polish).
There's plenty to like about coming up with a reasonable set of
heuristics...
--
output = ("cbbrowne" "@" "acm.org")
http://linuxdatabases.info/info/slony.html
Rules of the Evil Overlord #191. "I will not appoint a relative to my
staff of advisors. Not only is nepotism the cause of most breakdowns
in policy, but it also causes trouble with the EEOC."
<http://www.eviloverlord.com/>
On Sun, 2006-12-24 at 03:03, Christopher Browne wrote:
[snip]
Seems to me that you could get ~80% of the way by having the simplest
"2 queue" implementation, where tables with size < some threshold get
thrown at the "little table" queue, and tables above that size go to
the "big table" queue.
That would most definitely not cut it for me, I have more than 2
categories of tables:
- a few small but very often updated/inserted/deleted table: these must
be continuously vacuumed, your "little queue" is not good enough for
that, as even the round trip between the small tables could lead to
bloat on them;
- a few small and moderately updated, that could live with the "little
queue";
- a few big and frequently updated, but which only have a small
percentage of rows actively updated at any time: those could live with
the big queue;
- the rest which are rarely updated, I would put those in a separate
queue so they won't affect the rest, cause vacuuming them is really
mostly not critical;
The point is that I'm not sure there couldn't be even more reasons to
split the tables in even more queues based on the importance of
vacuuming them combined with update rate and their size. If I can set up
my own queues I can experiment with what works best for me... for the
base setup you could set up some default queues. I wonder though how
would you handle dynamics of tables, I mean when will a small table
which grows start to be considered a big table for the purpose of
putting it in one queue or the other ? I guess it would be done on
analyzing the table, which is also handled by autovacuum, so tables with
no vacuum queue settings could go to one of the 2 default queues you
mention.
That should keep any small tables from getting "vacuum-starved."
I'd think the next step would be to increase the number of queues,
perhaps in a time-based fashion. There might be times when it's
acceptable to vacuum 5 tables at once, so you burn thru little tables
"like the blazes," and handle larger ones fairly promptly. And other
times when you don't want to do *any* big tables, and limit a single
queue to just the itty bitty ones.
This is all nice and it would be cool if you could set it up per vacuum
queue. I mean how much more effort would be to allow vacuum queues with
generic settings like time windows with max number of threads for each
window, and let the user explicitly assign tables to those queues,
instead of hard coding the queues and their settings and assign tables
to them based on size or any other heuristics ?
For the average application which needs simple settings, there could be
a default setup with the 2 queues you mention. If it would be possible
to set up some rules to assign tables to queues based on their
properties on analyze time, instead of explicitly assigning to one queue
or other, that would be nice too, and then you can completely cover the
default setup with those settings, and allow for more complex setups for
those who need it.
This approach allows you to stay mostly heuristic-based, as opposed to
having to describe policies in gratuitous detail.
I agree that for simple setups that would be OK, but like I said, if it
would be easy enough to code that heuristics, and provide some sane
setup as default, and then let the user optimize it, that would be a
cool solution.
Now it's true I don't really know how would you code 'assign all tables
which are smaller than x rows to vacuum queue "little-tables"' ... maybe
by providing a function to the queue which "matches" on the table ? And
you can change that function ? No idea, but it probably can be done...
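The "function which matches on the table" idea might look roughly like
this. It is only a sketch: TableInfo, Queue, assign and the 10,000-row
cut-off are assumptions made up for the example, and reltuples stands in
for the row estimate kept in pg_class.

```python
from typing import Callable, NamedTuple


class TableInfo(NamedTuple):
    name: str
    reltuples: float    # estimated row count, as refreshed at analyze time


class Queue(NamedTuple):
    name: str
    matches: Callable[[TableInfo], bool]   # user-replaceable matching rule


def assign(table, queues, fallback="default"):
    """Return the name of the first queue whose rule matches the table."""
    for q in queues:
        if q.matches(table):
            return q.name
    return fallback


# A default setup equivalent to the "2 queue" proposal; x = 10,000 rows is arbitrary.
QUEUES = [
    Queue("little-tables", lambda t: t.reltuples < 10_000),
    Queue("big-tables", lambda t: True),
]
```

Re-running assign after each analyze would also answer the question of
when a growing table moves from one queue to the other.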
Having a mechanism that requires enormous DBA effort and where there
is considerable risk of simple configuration errors that will be hard
to notice may not be the best kind of "feature" :-).
I think most people will not want to touch the default settings unless
it does not work well enough for them. I definitely do not much like
that I had to set up some cron jobs beside autovacuum, as they are most
definitely not doing optimal job, but autovacuum was not doing that
either, and I'm afraid a 2-queue system would also not do it at least
for the queue-like tables I have, which must be vacuumed continuously,
but only if they need it... that's what I expect from autovacuum, to
vacuum all tables in the proper periodicity/time window for each of
them, but only if they need it... and I can imagine way more such
periodicity/time window settings than 2. Now if autovacuum could figure
out on itself all those settings, that would be even cooler, but if I
can set it up myself that would be good enough.
Actually I think all vacuum patterns could be automatically figured out
by looking at the statistics AND the dynamics of those statistics (i.e.
it changes in bursts, or steadily increasing over time, etc.), and
possibly also the read access statistics (there's no big reward in too
frequently vacuuming a table which is only inserted and deleted and
rarely read), and perhaps some hints from the user about speed
requirements for specific tables.
The problem with all this is that I doubt there is enough experience to
write such a heuristics to optimally cover all situations, and even if
there were, it could result in really complex code, so that's why I
think it is more reasonable to let people set up their vacuum queues...
Another point: it would be nice if autovacuum could also decide to do a
full vacuum, or even better a CLUSTER under certain circumstances for
tables which are badly bloated (you could argue that should never happen
if autovacuum is set up properly, but think about a queue-like table
heavily updated during a backup is running). Of course this can also
backfire if set up by default, so I guess this would have to be set up
explicitly... possibly with rules like table size, max bloat allowed,
time window, etc. One thing to avoid badly locking the application is to
acquire an exclusive lock with nowait and only do the full vacuum if the
lock succeeds (to avoid situations like: a backup is running, and it
will for the next 2 hours, we ask for an exclusive lock, will stay on
hold, but in the same time we lock all new read requests for the next 2
hours till the backup is done... while the operation we wanted to do is
guaranteed to be finished in 10 seconds, as the table is heavily bloated
but still small).
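The "exclusive lock with nowait" step could be scripted along these
lines. This is a sketch under assumptions: it generates SQL text for psql
rather than running anything, it uses CLUSTER instead of VACUUM FULL
because VACUUM cannot run inside a transaction block, and the
CLUSTER ... USING spelling is the one in current PostgreSQL documentation
(older releases may spell it differently). rewrite_script and the
table/index names are hypothetical.

```python
def rewrite_script(table, index):
    """Build a script that rewrites `table` only if the lock is free right now.

    LOCK TABLE ... NOWAIT raises an error instead of queueing behind a
    long-running backup; the error aborts the transaction, so the CLUSTER
    that follows is not executed and readers are never blocked waiting.
    Identifiers are interpolated verbatim; quote them properly in real use.
    """
    return "\n".join([
        "BEGIN;",
        f"LOCK TABLE {table} IN ACCESS EXCLUSIVE MODE NOWAIT;",
        f"CLUSTER {table} USING {index};",
        "COMMIT;",
    ])
```

The output can be fed back into psql in the same way as the generated
VACUUM script described at the start of the thread.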
Another thing autovacuum could figure out is not to do a vacuum at all
if there is a long running transaction running and disabling anyway the
work vacuum would do (although I'm not sure it does not do this one
already, does it ?).
Well, maybe not all what I rambled along makes sense, but I dumped my
brain now anyway... hope I didn't bore you too much :-)
Cheers,
Csaba.
nagy@ecircle-ag.com (Csaba Nagy) writes:
On Sun, 2006-12-24 at 03:03, Christopher Browne wrote:
[snip]
Seems to me that you could get ~80% of the way by having the simplest
"2 queue" implementation, where tables with size < some threshold get
thrown at the "little table" queue, and tables above that size go to
the "big table" queue.
That would most definitely not cut it for me, I have more than 2
categories of tables:
- a few small but very often updated/inserted/deleted table: these must
be continuously vacuumed, your "little queue" is not good enough for
that, as even the round trip between the small tables could lead to
bloat on them;
I disagree; if we added more "work processes," that could eat quickly
through the short end of the queue.
- a few small and moderately updated, that could live with the "little
queue";
- a few big and frequently updated, but which only have a small
percentage of rows actively updated at any time: those could live with
the big queue;
- the rest which are rarely updated, I would put those in a separate
queue so they won't affect the rest, cause vacuuming them is really
mostly not critical;
The point is that I'm not sure there couldn't be even more reasons
to split the tables in even more queues based on the importance of
vacuuming them combined with update rate and their size. If I can
set up my own queues I can experiment with what works best for
me... for the base setup you could set up some default queues. I
wonder though how would you handle dynamics of tables, I mean when
will a small table which grows start to be considered a big table
for the purpose of putting it in one queue or the other ? I guess it
would be done on analyzing the table, which is also handled by
autovacuum, so tables with no vacuum queue settings could go to one
of the 2 default queues you mention.
The heuristic I was thinking of didn't involve having two queues, but
rather just 1. By having some size information, work processes could
eat at the queue from both ends.
If you have cases where tables need to be vacuumed *really*
frequently, then you make sure that they are being injected
frequently, and that some of the workers are tied to Just Doing Small
Tables.
I think that *does* cover your scenario quite adequately, and without
having to get into having a bunch of queues.
The heuristic is incomplete in one other fashion, namely that it
doesn't guarantee that tables in the middle will ever get "gotten to."
That mandates having a third policy, namely to have a worker that goes
through tables in the (singular) queue some form of chronological
order.
That should keep any small tables from getting "vacuum-starved."
I'd think the next step would be to increase the number of queues,
perhaps in a time-based fashion. There might be times when it's
acceptable to vacuum 5 tables at once, so you burn thru little tables
"like the blazes," and handle larger ones fairly promptly. And other
times when you don't want to do *any* big tables, and limit a single
queue to just the itty bitty ones.
This is all nice and it would be cool if you could set it up per vacuum
queue. I mean how much more effort would be to allow vacuum queues with
generic settings like time windows with max number of threads for each
window, and let the user explicitly assign tables to those queues,
instead of hard coding the queues and their settings and assign tables
to them based on size or any other heuristics ?
For the average application which needs simple settings, there could be
a default setup with the 2 queues you mention. If it would be possible
to set up some rules to assign tables to queues based on their
properties on analyze time, instead of explicitly assigning to one queue
or other, that would be nice too, and then you can completely cover the
default setup with those settings, and allow for more complex setups for
those who need it.
My thinking has headed more towards simplifying this; two queues seems
to be one too many :-).
This approach allows you to stay mostly heuristic-based, as opposed to
having to describe policies in gratuitous detail.
I agree that for simple setups that would be OK, but like I said, if it
would be easy enough to code that heuristics, and provide some sane
setup as default, and then let the user optimize it, that would be a
cool solution.
Now it's true I don't really know how would you code 'assign all tables
which are smaller than x rows to vacuum queue "little-tables"' ... maybe
by providing a function to the queue which "matches" on the table ? And
you can change that function ? No idea, but it probably can be done...
Based on the three policies I've seen, it could make sense to assign
worker policies:
1. You have a worker that moves its way through the queue in some sort of
sequential order, based on when the table is added to the queue, to
guarantee that all tables get processed, eventually.
2. You have workers that always pull the "cheapest" tables in the
queue, perhaps with some sort of upper threshold that they won't go
past.
3. You have workers that alternate between eating from the two ends of the
queue.
Only one queue is needed, and there's only one size parameter
involved.
Having multiple workers of type #2 seems to me to solve the problem
you're concerned about.
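The three worker policies above can be sketched as three ways of picking
from one shared queue. This is an illustration only: Entry and the
pick_* functions are hypothetical names, and "pages" is assumed to be the
single size parameter mentioned.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    seq: int      # order in which the table was added to the queue
    name: str
    pages: int    # table size, used as the cost of vacuuming it


def pick_oldest(queue):
    """Policy 1: sequential order, so every table is processed eventually."""
    return min(queue, key=lambda e: e.seq) if queue else None


def pick_cheapest(queue, max_pages):
    """Policy 2: always the cheapest table, never above an upper threshold."""
    eligible = [e for e in queue if e.pages <= max_pages]
    return min(eligible, key=lambda e: e.pages) if eligible else None


def pick_alternating(queue, take_small):
    """Policy 3: the caller flips take_small between calls to eat both ends."""
    if not queue:
        return None
    chooser = min if take_small else max
    return chooser(queue, key=lambda e: e.pages)
```

A worker would remove whatever entry it picked before starting the
vacuum; several workers running pick_cheapest are what would keep the
small, heavily recycled tables serviced.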
--
(format nil "~S@~S" "cbbrowne" "cbbrowne.com")
http://cbbrowne.com/info/spiritual.html
thorfinn@netizen.com.au <http://www.netizen.com.au/>
Millihelen, adj:
The amount of beauty required to launch one ship.
On Mon, 2007-01-08 at 22:29, Chris Browne wrote:
[snip]
Based on the three policies I've seen, it could make sense to assign
worker policies:
1. You have a worker that moves its way through the queue in some sort of
sequential order, based on when the table is added to the queue, to
guarantee that all tables get processed, eventually.
2. You have workers that always pull the "cheapest" tables in the
queue, perhaps with some sort of upper threshold that they won't go
past.
3. You have workers that alternate between eating from the two ends of the
queue.
Only one queue is needed, and there's only one size parameter
involved.
Having multiple workers of type #2 seems to me to solve the problem
you're concerned about.
This sounds better, but define "cheapest" in #2... I actually want to
continuously vacuum tables which are small, heavily recycled
(insert/update/delete), and which would bloat quickly. So how do you
define the cost function for having these tables the "cheapest" ?
And how will you define the worker thread count policy ? Always 1 worker
per category, or you can define the number of threads in the 3
categories ? Or you still have in mind time window policies with allowed
number of threads per worker category ? (those numbers could be 0 to
disable a worker category).
Other thing, how will the vacuum queue be populated ? Or the "queue"
here means nothing, all workers will always go through all tables to
pick one based on their own criteria ? My concern here is that the
current way of checking 1 DB per minute is not going to work with
category #2 tables, they really have to be vacuumed continuously
sometimes.
Cheers,
Csaba.
Csaba Nagy wrote:
Other thing, how will the vacuum queue be populated ? Or the "queue"
here means nothing, all workers will always go through all tables to
pick one based on their own criteria ? My concern here is that the
current way of checking 1 DB per minute is not going to work with
category #2 tables, they really have to be vacuumed continuously
sometimes.
Without getting into all the details, the autovacuum naptime is a GUC
variable right now, so it can be much more frequent than the current
default which is 60 seconds.
On Tue, 2007-01-09 at 17:31, Matthew T. O'Connor wrote:
Without getting into all the details, the autovacuum naptime is a GUC
variable right now, so it can be much more frequent than the current
default which is 60 seconds.
Hmm, for some reason I thought the granularity is minutes, but it is
indeed in seconds... one more thing learned.
Cheers,
Csaba.
On Tue, 2007-01-09 at 17:36, Csaba Nagy wrote:
On Tue, 2007-01-09 at 17:31, Matthew T. O'Connor wrote:
Without getting into all the details, the autovacuum naptime is a GUC
variable right now, so it can be much more frequent than the current
default which is 60 seconds.
Hmm, for some reason I thought the granularity is minutes, but it is
indeed in seconds... one more thing learned.
OK, so after checking my config, it is still not optimal because it
refers to all the data bases in the cluster. I have setups where I have
multiple data bases in the same cluster, with various degrees of
activity... some of them should be checked continuously, some rarely...
so now if I let the default 60 seconds, each data base will be checked
in db_count * (60 + vacuum time) seconds. This is not optimal... some of
the DBs have way more activity than others. Those I would like to be
checked say each 10 seconds, the rest each 5 minutes...
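To put numbers on the estimate above, here is the same formula as a
small helper. The function name and the sample figures are made up; only
the db_count * (naptime + vacuum time) shape comes from the message.

```python
def seconds_between_checks(db_count, naptime_s, avg_vacuum_s):
    """Time until the same database is visited again, per the estimate above.

    One database is handled per cycle, so every database waits for all the
    others: db_count * (naptime + average vacuum time).
    """
    return db_count * (naptime_s + avg_vacuum_s)


# Six databases, the 60 second naptime, 20 seconds of vacuuming per visit.
print(seconds_between_checks(6, 60, 20))   # 480 seconds, i.e. 8 minutes
# Dropping the naptime to 10 seconds still leaves 3 minutes between visits.
print(seconds_between_checks(6, 10, 20))   # 180 seconds
```

Either way a busy database cannot be checked every 10 seconds while the
quiet ones share the same cycle, which is the complaint being made.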
Cheers,
Csaba.
Csaba Nagy wrote:
On Tue, 2007-01-09 at 17:36, Csaba Nagy wrote:
On Tue, 2007-01-09 at 17:31, Matthew T. O'Connor wrote:
Without getting into all the details, the autovacuum naptime is a GUC
variable right now, so it can be much more frequent than the current
default which is 60 seconds.
Hmm, for some reason I thought the granularity is minutes, but it is
indeed in seconds... one more thing learned.
OK, so after checking my config, it is still not optimal because it
refers to all the data bases in the cluster. I have setups where I have
multiple data bases in the same cluster, with various degrees of
activity... some of them should be checked continuously, some rarely...
so now if I let the default 60 seconds, each data base will be checked
in db_count * (60 + vacuum time) seconds. This is not optimal... some of
the DBs have way more activity than others. Those I would like to be
checked say each 10 seconds, the rest each 5 minutes...
Agreed, this is the point of this whole thread that there are lots of
setups where autovacuum could do better. My point was only that as we
move forward with these multiple queue / multiple worker process setups
etc, that we already have some infrastructure to make things go faster.
A long time ago, in a galaxy far, far away, nagy@ecircle-ag.com (Csaba Nagy) wrote:
On Mon, 2007-01-08 at 22:29, Chris Browne wrote:
[snip]
Based on the three policies I've seen, it could make sense to assign
worker policies:
1. You have a worker that moves its way through the queue in some sort of
sequential order, based on when the table is added to the queue, to
guarantee that all tables get processed, eventually.
2. You have workers that always pull the "cheapest" tables in the
queue, perhaps with some sort of upper threshold that they won't go
past.
3. You have workers that alternate between eating from the two ends of the
queue.
Only one queue is needed, and there's only one size parameter
involved.
Having multiple workers of type #2 seems to me to solve the problem
you're concerned about.
This sounds better, but define "cheapest" in #2... I actually want to
continuously vacuum tables which are small, heavily recycled
(insert/update/delete), and which would bloat quickly. So how do you
define the cost function for having these tables the "cheapest" ?
Cost would be based on the number of pages in the table. The smallest
tables are obviously the cheapest to vacuum.
That's separate from the policy for adding tables to the queue; THAT
would sensibly be based on the number of dead tuples; the current
policy of autovacuum seems not unreasonable...
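The split described here (dead tuples decide whether a table enters the
queue, pages decide how cheap it is) might be sketched as follows. The
threshold rule is the documented autovacuum one, base threshold plus
scale factor times reltuples; the function names, the tuple layout and
the sample numbers are assumptions for illustration.

```python
def needs_vacuum(dead_tuples, reltuples, base_threshold, scale_factor):
    """Autovacuum-style test: dead tuples above base + scale_factor * reltuples."""
    return dead_tuples > base_threshold + scale_factor * reltuples


def enqueue_candidates(stats, base_threshold, scale_factor):
    """stats: iterable of (name, relpages, reltuples, dead_tuples).

    Returns the names of tables due for vacuum, cheapest (fewest pages) first.
    """
    due = [
        (relpages, name)
        for name, relpages, reltuples, dead in stats
        if needs_vacuum(dead, reltuples, base_threshold, scale_factor)
    ]
    return [name for relpages, name in sorted(due)]
```

Here the decision to enqueue never looks at size, and the ordering never
looks at dead tuples, which keeps the two policies separate as suggested.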
And how will you define the worker thread count policy ? Always 1
worker per category, or you can define the number of threads in the
3 categories ? Or you still have in mind time window policies with
allowed number of threads per worker category ? (those numbers could
be 0 to disable a worker category).
It would make a lot of sense to have time ranges that would indicate
when different values were wanted. Good question...
Other thing, how will the vacuum queue be populated ? Or the "queue"
here means nothing, all workers will always go through all tables to
pick one based on their own criteria ? My concern here is that the
current way of checking 1 DB per minute is not going to work with
category #2 tables, they really have to be vacuumed continuously
sometimes.
I think it makes considerable sense to have a queue table for this.
Having one of the threads look for new entries makes considerable
sense.
Offering the Gentle DBA the ability to add in entries based on their
special knowledge would also seem sensible.
--
(format nil "~S@~S" "cbbrowne" "gmail.com")
http://linuxdatabases.info/info/slony.html
Keeping instructions and operands in different memories saves .20
(.09) microseconds.
On Fri, 2006-12-29 at 20:25 -0300, Alvaro Herrera wrote:
Christopher Browne wrote:
Seems to me that you could get ~80% of the way by having the simplest
"2 queue" implementation, where tables with size < some threshold get
thrown at the "little table" queue, and tables above that size go to
the "big table" queue.
That should keep any small tables from getting "vacuum-starved."
Hmm, would it make sense to keep 2 queues, one that goes through the
tables in smaller-to-larger order, and the other one in the reverse
direction?
I am currently writing a design on how to create "vacuum queues" but I'm
thinking that maybe it's getting too complex to handle, and a simple
idea like yours is enough (given sufficient polish).
Sounds good to me. My colleague Pavan has just suggested multiple
autovacuums and then prototyped something almost as a side issue while
trying to solve other problems. I'll show him this entry, maybe he saw
it already? I wasn't following this discussion until now.
The 2 queue implementation seemed to me to be the most straightforward
implementation, mirroring Chris' suggestion. A few aspects that haven't
been mentioned are:
- if you have more than one VACUUM running, we'll need to watch memory
management. Having different queues based upon table size is a good way
of doing that, since the smaller queues have a naturally limited memory
consumption.
- with different size-based queues, the larger VACUUMs can be delayed so
they take much longer, while the small tables can go straight through
Some feedback from initial testing is that 2 queues probably isn't
enough. If you have tables with 100s of blocks and tables with millions
of blocks, the tables in the mid-range still lose out. So I'm thinking
that a design with 3 queues based upon size ranges, plus the idea that
when a queue is empty it will scan for tables slightly above/below its
normal range. That way we wouldn't need to specify the cut-offs with a
difficult to understand new set of GUC parameters, define them exactly
and then have them be wrong when databases grow.
The largest queue would be the one reserved for Xid wraparound
avoidance. No table would be eligible for more than one queue at a time,
though it might change between queues as it grows.
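A rough sketch of the "scan slightly above/below the normal range when
the queue is empty" rule, assuming sizes are in blocks. pick_for_queue
and the slack factor of 2 are invented here; how far "slightly" should
reach was left open in the message.

```python
def pick_for_queue(pending, lo, hi, slack=2.0):
    """Pick the next table for a size-range queue.

    pending: dict of table name -> size in blocks, for tables awaiting vacuum.
    The queue normally serves lo <= size < hi. If nothing is pending in that
    range, it widens the range by `slack` in both directions before giving up.
    """
    in_range = {n: p for n, p in pending.items() if lo <= p < hi}
    if not in_range:
        in_range = {n: p for n, p in pending.items() if lo / slack <= p < hi * slack}
    if not in_range:
        return None
    return min(in_range, key=in_range.get)   # smallest eligible table first
```

Because an idle queue borrows from its neighbours, the cut-offs between
ranges do not need to be exactly right, which was the stated reason for
avoiding a new set of GUC parameters.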
Alvaro, have you completed your design?
Pavan, what are your thoughts?
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote:
Some feedback from initial testing is that 2 queues probably isn't
enough. If you have tables with 100s of blocks and tables with millions
of blocks, the tables in the mid-range still lose out. So I'm thinking
that a design with 3 queues based upon size ranges, plus the idea that
when a queue is empty it will scan for tables slightly above/below its
normal range.
Yeah, eventually it occurred to me the fact that as soon as you have 2
queues, you may as well want to have 3 or in fact any number. Which in
my proposal is very easily achieved.
Alvaro, have you completed your design?
No, I haven't, and the part that's missing is precisely the queues
stuff. I think I've been delaying posting it for too long, and that is
harmful because it makes other people waste time thinking on issues that
I may already have resolved, and delays the bashing that yet others will
surely inflict on my proposal, which is never a good thing ;-) So maybe
I'll put in a stub about the "queues" stuff and see how people like the
whole thing.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Fri, 2007-01-12 at 19:33 -0300, Alvaro Herrera wrote:
Alvaro, have you completed your design?
No, I haven't, and the part that's missing is precisely the queues
stuff. I think I've been delaying posting it for too long, and that is
harmful because it makes other people waste time thinking on issues that
I may already have resolved, and delays the bashing that yet others will
surely inflict on my proposal, which is never a good thing ;-) So maybe
I'll put in a stub about the "queues" stuff and see how people like the
whole thing.
I've not read a word spoken against the general idea, so I think we
should pursue this actively for 8.3. It should be straightforward to
harvest the good ideas, though there will definitely be many.
Perhaps we should focus on the issues that might result, so that we
address those before we spend time on the details of the user interface.
Can we deadlock or hang from running multiple autovacuums?
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
In an attempt to throw the authorities off his trail, alvherre@commandprompt.com (Alvaro Herrera) transmitted:
Simon Riggs wrote:
Some feedback from initial testing is that 2 queues probably isn't
enough. If you have tables with 100s of blocks and tables with
millions of blocks, the tables in the mid-range still lose out. So
I'm thinking that a design with 3 queues based upon size ranges,
plus the idea that when a queue is empty it will scan for tables
slightly above/below its normal range.
Yeah, eventually it occurred to me the fact that as soon as you have
2 queues, you may as well want to have 3 or in fact any number.
Which in my proposal is very easily achieved.
Adding an extra attribute to reflect a different ordering or a
different policy allows having as many queues in one queue table as
you might need.
Alvaro, have you completed your design?
No, I haven't, and the part that's missing is precisely the queues
stuff. I think I've been delaying posting it for too long, and that
is harmful because it makes other people waste time thinking on
issues that I may already have resolved, and delays the bashing that
yet others will surely inflict on my proposal, which is never a good
thing ;-) So maybe I'll put in a stub about the "queues" stuff and
see how people like the whole thing.
Seems like a good idea to me.
Implementing multiple queues amounts to having different worker
processes/threads that operate on the queue table using varying
policies.
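As a rough illustration of the size-range queues discussed above, here is a minimal sketch. The cut-off values, table names and function names are all hypothetical (the thread deliberately avoids fixing cut-offs), and the "borrow from the nearest queue when idle" rule is only one possible reading of "scan for tables slightly above/below its normal range".

```python
from bisect import bisect_right

# Hypothetical cut-offs in blocks: small < 1000 <= medium < 100000 <= large.
QUEUE_BOUNDS = [1_000, 100_000]


def queue_for(table_blocks, bounds=QUEUE_BOUNDS):
    """Return the index of the single queue a table of this size belongs to."""
    return bisect_right(bounds, table_blocks)


def build_queues(tables, bounds=QUEUE_BOUNDS):
    """tables: dict of name -> size in blocks. Each table lands in exactly one queue."""
    queues = [[] for _ in range(len(bounds) + 1)]
    for name, blocks in sorted(tables.items(), key=lambda kv: kv[1]):
        queues[queue_for(blocks, bounds)].append(name)
    return queues


def next_table(queues, worker):
    """Pop work for a worker: its own queue first, then the nearest neighbours."""
    order = sorted(range(len(queues)), key=lambda q: (abs(q - worker), q))
    for q in order:
        if queues[q]:
            return queues[q].pop(0)
    return None


# Made-up table sizes, purely for illustration.
tables = {"sessions": 200, "orders": 50_000, "events": 5_000_000, "audit": 80_000}
queues = build_queues(tables)
```

With these made-up sizes the worker for queue 0 first takes "sessions" and, once its own queue is empty, borrows "orders" from the mid-range queue instead of sitting idle.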
--
output = reverse("gro.mca" "@" "enworbbc")
http://linuxdatabases.info/info/lisp.html
Rules of the Evil Overlord #60. "My five-year-old child advisor will
also be asked to decipher any code I am thinking of using. If he
breaks the code in under 30 seconds, it will not be used. Note: this
also applies to passwords." <http://www.eviloverlord.com/>
On Fri, Jan 12, 2007 at 07:33:05PM -0300, Alvaro Herrera wrote:
Simon Riggs wrote:
Some feedback from initial testing is that 2 queues probably isn't
enough. If you have tables with 100s of blocks and tables with millions
of blocks, the tables in the mid-range still lose out. So I'm thinking
of a design with 3 queues based upon size ranges, plus the idea that
when a queue is empty it will scan for tables slightly above/below its
normal range.
Yeah, eventually it occurred to me that as soon as you have 2
queues, you may as well want to have 3 or in fact any number. Which in
my proposal is very easily achieved.
Alvaro, have you completed your design?
No, I haven't, and the part that's missing is precisely the queues
stuff. I think I've been delaying posting it for too long, and that is
harmful because it makes other people waste time thinking on issues that
I may already have resolved, and delays the bashing that yet others will
surely inflict on my proposal, which is never a good thing ;-) So maybe
I'll put in a stub about the "queues" stuff and see how people like the
whole thing.
Have you made any consideration of providing feedback on autovacuum to users?
Right now we don't even know what tables were vacuumed when and what was
reaped. This might actually be another topic.
---elein
elein@varlena.com
Simon Riggs wrote:
On Fri, 2006-12-29 at 20:25 -0300, Alvaro Herrera wrote:
Christopher Browne wrote:
Seems to me that you could get ~80% of the way by having the simplest
"2 queue" implementation, where tables with size < some threshold get
thrown at the "little table" queue, and tables above that size go to
the "big table" queue.
That should keep any small tables from getting "vacuum-starved."
This is exactly what I am trying: a two-process autovacuum and a GUC to
separate small tables.
In this case, one process takes up vacuuming of the small tables and
other process vacuuming of the remaining tables as well as Xid
avoidance related vacuuming. The goal is to avoid starvation of small
tables when a large table is being vacuumed (which may take
several hours) without adding too much complexity to the code.
Some feedback from initial testing is that 2 queues probably isn't
enough. If you have tables with 100s of blocks and tables with millions
of blocks, the tables in the mid-range still lose out. So I'm thinking
of a design with 3 queues based upon size ranges, plus the idea that
when a queue is empty it will scan for tables slightly above/below its
normal range. That way we wouldn't need to specify the cut-offs with a
difficult to understand new set of GUC parameters, define them exactly
and then have them be wrong when databases grow.
The largest queue would be the one reserved for Xid wraparound
avoidance. No table would be eligible for more than one queue at a time,
though it might change between queues as it grows.
Alvaro, have you completed your design?
Pavan, what are your thoughts?
IMO 2-queue is a good step forward, but in the long term we may need to go
for a multiprocess autovacuum where the number and tasks of processes
are demand based and/or user configurable.
Another idea is to vacuum the tables in round-robin fashion
where the quantum could be either "time" or "number of blocks". The
autovacuum process would vacuum 'x' blocks of one table and then
schedule the next table in the queue. This would avoid starvation of
small tables, though the cost of index cleanup might go up because of
increased I/O. Any thoughts on this approach?
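The round-robin idea can be modelled in a few lines. This is only a scheduling sketch under the assumption that the quantum is a number of blocks; the function name and the table sizes are invented, and no vacuuming or index cleanup is modelled.

```python
from collections import deque


def round_robin_vacuum(tables, quantum):
    """Yield (table, first_block, last_block) slices, `quantum` blocks at a time.

    tables: dict of name -> total blocks. Empty tables are skipped.
    """
    pending = deque((name, 0, total) for name, total in tables.items() if total > 0)
    while pending:
        name, start, total = pending.popleft()
        end = min(start + quantum, total)
        yield name, start, end - 1
        if end < total:
            # Unfinished table goes to the back of the line.
            pending.append((name, end, total))


schedule = list(round_robin_vacuum({"big": 250, "small": 40}, quantum=100))
```

In this model the small table is finished after the first slice of the big one, which is the anti-starvation effect being proposed. The extra index cleanup per slice is exactly the cost the message worries about and is not captured here.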
Thanks,
Pavan
Pavan Deolasee wrote:
Simon Riggs wrote:
On Fri, 2006-12-29 at 20:25 -0300, Alvaro Herrera wrote:
Christopher Browne wrote:
Seems to me that you could get ~80% of the way by having the simplest
"2 queue" implementation, where tables with size < some threshold get
thrown at the "little table" queue, and tables above that size go to
the "big table" queue.
That should keep any small tables from getting "vacuum-starved."
This is exactly what I am trying: a two-process autovacuum and a GUC to
separate small tables.
In this case, one process takes up vacuuming of the small tables and
other process vacuuming of the remaining tables as well as Xid
avoidance related vacuuming. The goal is to avoid starvation of small
tables when a large table is being vacuumed (which may take
several hours) without adding too much complexity to the code.
Would it work to make the queues push the threshold in the direction of
the still running queue if the other queue finishes before the still
running one? This would achieve some kind of auto-tuning, but that is
usually tricky.
For example, what if one of the queues got stuck on a lock?
--
Alban Hertroys
alban@magproductions.nl
magproductions b.v.
T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede
// Integrate Your World //
Simon Riggs wrote:
Perhaps we should focus on the issues that might result, so that we
address those before we spend time on the details of the user interface.
Can we deadlock or hang from running multiple autovacuums?
If you were to run multiple autovacuum processes the way they are today,
maybe. But that's not my intention -- the launcher would be the only
one to read the catalogs; the workers would be started only to do a
single VACUUM job. This reduces the risk of this kind of problem.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
elein wrote:
Have you made any consideration of providing feedback on autovacuum to users?
Right now we don't even know what tables were vacuumed when and what was
reaped. This might actually be another topic.
I'd like to hear other people's opinions on Darcy Buskermolen's proposal
to have a log table, on which we'd register what we ran, at what
time, how long it lasted, how many tuples it cleaned, etc. I feel
having it on the regular text log is useful but it's not good enough.
Keep in mind that in the future we may want to peek at that collected
information to be able to take better scheduling decisions (or at least
inform the DBA that he sucks).
Now, I'd like this to be a VACUUM thing, not autovacuum. That means
that manually-run vacuums would be logged as well.
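To make the log-table proposal concrete, here is a small sketch of what one logged row might carry and how it could be queried for trends. The field names are guesses based only on the items listed above (what ran, when, how long, how many tuples); nothing here reflects an actual catalog or table in PostgreSQL.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class VacuumLogEntry:
    """One hypothetical row of the proposed vacuum log."""
    table: str
    started_at: datetime
    duration: timedelta
    tuples_removed: int
    autovacuum: bool  # False for manually-run VACUUMs, which would be logged too


def tuples_removed_per_table(log):
    """Sum removed tuples per table, the kind of figure a scheduler could peek at."""
    totals = {}
    for entry in log:
        totals[entry.table] = totals.get(entry.table, 0) + entry.tuples_removed
    return totals


# Invented sample rows.
log = [
    VacuumLogEntry("foo", datetime(2007, 1, 16, 6, 0), timedelta(seconds=30), 120, True),
    VacuumLogEntry("bar", datetime(2007, 1, 16, 6, 5), timedelta(minutes=40), 90_000, False),
    VacuumLogEntry("foo", datetime(2007, 1, 16, 6, 10), timedelta(seconds=25), 80, True),
]
```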
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
elein wrote:
Have you made any consideration of providing feedback on autovacuum to
users? Right now we don't even know what tables were vacuumed when and
what was reaped. This might actually be another topic.
I'd like to hear other people's opinions on Darcy Buskermolen's proposal
to have a log table, on which we'd register what we ran, at what
time, how long it lasted, how many tuples it cleaned, etc. I feel
having it on the regular text log is useful but it's not good enough.
Keep in mind that in the future we may want to peek at that collected
information to be able to take better scheduling decisions (or at least
inform the DBA that he sucks).
Now, I'd like this to be a VACUUM thing, not autovacuum. That means
that manually-run vacuums would be logged as well.
Yes I did intend this thought for vacuum, not strictly autovacuum.
Alvaro Herrera wrote:
I'd like to hear other people's opinions on Darcy Buskermolen's proposal
to have a log table, on which we'd register what we ran, at what
time, how long it lasted, how many tuples it cleaned, etc. I feel
having it on the regular text log is useful but it's not good enough.
Keep in mind that in the future we may want to peek at that collected
information to be able to take better scheduling decisions (or at least
inform the DBA that he sucks).
I'm not familiar with his proposal, but I'm not sure what I think of
logging vacuum (and perhaps analyze) commands to a table. We have never
logged anything to tables inside PG. I would be worried about this
eating a lot of space in some situations.
I think most people would just be happy if we could get autovacuum to
log its actions at a much higher log level. I think that "autovacuum
vacuumed table x" is important and shouldn't be all the way down at the
debug level.
The other (more involved) solution that was proposed is to
create a separate set of logging control params for autovacuum so
that you can turn it up or down independent of the general server logging.
Now, I'd like this to be a VACUUM thing, not autovacuum. That means
that manually-run vacuums would be logged as well.
+1
Matthew T. O'Connor wrote:
Alvaro Herrera wrote:
I'd like to hear other people's opinions on Darcy Buskermolen's proposal
to have a log table, on which we'd register what we ran, at what
time, how long it lasted, [...]
I think most people would just be happy if we could get autovacuum to
log its actions at a much higher log level. I think that "autovacuum
vacuumed table x" is important and shouldn't be all the way down at the
debug level.
+1 here. Even more than "autovacuum vacuumed table x", I'd like to see
"vacuum starting table x" and "vacuum done table x". The reason I say
that is because the speculation "autovacuum might have been running"
is now a frequent phrase I hear when performance questions are asked.
If vacuum start and end times were logged at a much earlier level,
that feature plus log_min_duration_statement could easily disprove
the "vacuum might have been running" hypothesis.
On Tue, 2007-01-16 at 07:16 -0800, Darcy Buskermolen wrote:
On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
elein wrote:
Have you made any consideration of providing feedback on autovacuum to
users? Right now we don't even know what tables were vacuumed when and
what was reaped. This might actually be another topic.
I'd like to hear other people's opinions on Darcy Buskermolen's proposal
to have a log table, on which we'd register what we ran, at what
time, how long it lasted, how many tuples it cleaned, etc. I feel
having it on the regular text log is useful but it's not good enough.
Keep in mind that in the future we may want to peek at that collected
information to be able to take better scheduling decisions (or at least
inform the DBA that he sucks).
Now, I'd like this to be a VACUUM thing, not autovacuum. That means
that manually-run vacuums would be logged as well.
Yes I did intend this thought for vacuum, not strictly autovacuum.
I agree, for all VACUUMs: we need a log table.
The only way we can get a feedback loop on what has come before is by
remembering what happened. Simply logging it is interesting, but not
enough.
There is some complexity there, because with many applications a small
table gets VACUUMed every few minutes, so the log table would become a
frequently updated table itself. I'd also suggest that we might want to
take account of the number of tuples removed by btree pre-split VACUUMs
also.
I also like the idea of a single scheduler and multiple child workers.
The basic architecture is clear and obviously beneficial. What worries
me is how the scheduler will work; there seem to be as many ideas as we
have hackers. I'm wondering if we should provide the facility of a
pluggable scheduler? That way you'd be able to fine tune the schedule to
both the application and to the business requirements. That would allow
integration with external workflow engines and job schedulers, for when
VACUUMs need to not-conflict with external events.
If no scheduler has been defined, just use a fairly simple default.
The three main questions are
- what is the maximum size of VACUUM that can start *now*
- can *this* VACUUM start now?
- which is the next VACUUM to run?
If we have an API that allows those 3 questions to be asked, then a
scheduler plug-in could supply the answers. That way any complex
application rules (table A is available for VACUUM now for next 60 mins,
table B is in constant use so we must use vacuum_delay), external events
(long running reports have now finished, OK to VACUUM), time-based rules
(e.g. first Sunday of the month 00:00 - 04:00 is scheduled downtime,
first 3 days of each month is financial accounting close) can be
specified.
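One way to picture an API built around those three questions is sketched below. Every class and method name is invented for illustration (PostgreSQL has no such plug-in interface), and the default policy, a nightly window with a size limit outside it, is just one example of a "fairly simple default".

```python
from abc import ABC, abstractmethod
from datetime import datetime, time


class VacuumScheduler(ABC):
    """Hypothetical plug-in interface; none of these names exist in PostgreSQL."""

    @abstractmethod
    def max_vacuum_size(self, now):
        """Largest VACUUM (in blocks) that may start at `now`."""

    @abstractmethod
    def can_start(self, table_blocks, now):
        """Can a VACUUM of a table this size start at `now`?"""

    @abstractmethod
    def next_vacuum(self, candidates, now):
        """Pick the next table from {name: blocks}, or None if nothing may run."""


class DefaultScheduler(VacuumScheduler):
    """Simple default: anything goes inside a nightly window, only small
    tables outside it, smallest eligible table first."""

    def __init__(self, window=(time(0, 0), time(4, 0)), daytime_limit=10_000):
        self.window = window
        self.daytime_limit = daytime_limit

    def max_vacuum_size(self, now):
        start, end = self.window
        return float("inf") if start <= now.time() < end else self.daytime_limit

    def can_start(self, table_blocks, now):
        return table_blocks <= self.max_vacuum_size(now)

    def next_vacuum(self, candidates, now):
        eligible = [n for n, b in candidates.items() if self.can_start(b, now)]
        return min(eligible, key=candidates.get) if eligible else None
```

A site-specific scheduler would subclass `VacuumScheduler` and answer the same three questions from its own rules, such as report completion events or an accounting-close calendar.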
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
On Friday 19 January 2007 01:47, Simon Riggs wrote:
On Tue, 2007-01-16 at 07:16 -0800, Darcy Buskermolen wrote:
On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
elein wrote:
Have you made any consideration of providing feedback on autovacuum
to users? Right now we don't even know what tables were vacuumed when
and what was reaped. This might actually be another topic.
I'd like to hear other people's opinions on Darcy Buskermolen's proposal
to have a log table, on which we'd register what we ran, at what
time, how long it lasted, how many tuples it cleaned, etc. I feel
having it on the regular text log is useful but it's not good enough.
Keep in mind that in the future we may want to peek at that collected
information to be able to take better scheduling decisions (or at least
inform the DBA that he sucks).
Now, I'd like this to be a VACUUM thing, not autovacuum. That means
that manually-run vacuums would be logged as well.
Yes I did intend this thought for vacuum, not strictly autovacuum.
I agree, for all VACUUMs: we need a log table.
The only way we can get a feedback loop on what has come before is by
remembering what happened. Simply logging it is interesting, but not
enough.
Correct, I think we are all saying the same thing: this log table is
purely inserts, so that we can see trends over time.
There is some complexity there, because with many applications a small
table gets VACUUMed every few minutes, so the log table would become a
frequently updated table itself. I'd also suggest that we might want to
take account of the number of tuples removed by btree pre-split VACUUMs
also.
Thinking on this a bit more, I suppose that this table really should allow for
user defined triggers on it, so that a DBA can create partitioning for it, not
to mention being able to move it off into its own tablespace.
I also like the idea of a single scheduler and multiple child workers.
The basic architecture is clear and obviously beneficial. What worries
me is how the scheduler will work; there seem to be as many ideas as we
have hackers. I'm wondering if we should provide the facility of a
pluggable scheduler? That way you'd be able to fine tune the schedule to
both the application and to the business requirements. That would allow
integration with external workflow engines and job schedulers, for when
VACUUMs need to not-conflict with external events.
If no scheduler has been defined, just use a fairly simple default.
The three main questions are
- what is the maximum size of VACUUM that can start *now*
How can we determine this given we have no real knowledge of the upcoming
adverse I/O conditions?
- can *this* VACUUM start now?
- which is the next VACUUM to run?
If we have an API that allows those 3 questions to be asked, then a
scheduler plug-in could supply the answers. That way any complex
application rules (table A is available for VACUUM now for next 60 mins,
table B is in constant use so we must use vacuum_delay), external events
(long running reports have now finished, OK to VACUUM), time-based rules
(e.g. first Sunday of the month 00:00 - 04:00 is scheduled downtime,
first 3 days of each month is financial accounting close) can be
specified.
Another thought: is it at all possible to do a partial vacuum? I.e. spend the
next 30 minutes vacuuming foo table, and update the FSM with what we have
learned over the 30 mins, even if we have not done a full table scan?
--
Darcy Buskermolen
The PostgreSQL company, Command Prompt Inc.
Added to TODO:
o Allow multiple vacuums so large tables do not starve small
tables
http://archives.postgresql.org/pgsql-general/2007-01/msg00031.php
o Improve control of auto-vacuum
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00876.php
---------------------------------------------------------------------------
Darcy Buskermolen wrote:
On Friday 19 January 2007 01:47, Simon Riggs wrote:
On Tue, 2007-01-16 at 07:16 -0800, Darcy Buskermolen wrote:
On Tuesday 16 January 2007 06:29, Alvaro Herrera wrote:
elein wrote:
Have you made any consideration of providing feedback on autovacuum
to users? Right now we don't even know what tables were vacuumed when
and what was reaped. This might actually be another topic.
I'd like to hear other people's opinions on Darcy Buskermolen's proposal
to have a log table, on which we'd register what we ran, at what
time, how long it lasted, how many tuples it cleaned, etc. I feel
having it on the regular text log is useful but it's not good enough.
Keep in mind that in the future we may want to peek at that collected
information to be able to take better scheduling decisions (or at least
inform the DBA that he sucks).
Now, I'd like this to be a VACUUM thing, not autovacuum. That means
that manually-run vacuums would be logged as well.
Yes I did intend this thought for vacuum, not strictly autovacuum.
I agree, for all VACUUMs: we need a log table.
The only way we can get a feedback loop on what has come before is by
remembering what happened. Simply logging it is interesting, but not
enough.
Correct, I think we are all saying the same thing: this log table is
purely inserts, so that we can see trends over time.
There is some complexity there, because with many applications a small
table gets VACUUMed every few minutes, so the log table would become a
frequently updated table itself. I'd also suggest that we might want to
take account of the number of tuples removed by btree pre-split VACUUMs
also.
Thinking on this a bit more, I suppose that this table really should allow for
user defined triggers on it, so that a DBA can create partitioning for it, not
to mention being able to move it off into its own tablespace.
I also like the idea of a single scheduler and multiple child workers.
The basic architecture is clear and obviously beneficial. What worries
me is how the scheduler will work; there seem to be as many ideas as we
have hackers. I'm wondering if we should provide the facility of a
pluggable scheduler? That way you'd be able to fine tune the schedule to
both the application and to the business requirements. That would allow
integration with external workflow engines and job schedulers, for when
VACUUMs need to not-conflict with external events.
If no scheduler has been defined, just use a fairly simple default.
The three main questions are
- what is the maximum size of VACUUM that can start *now*
How can we determine this given we have no real knowledge of the upcoming
adverse I/O conditions?
- can *this* VACUUM start now?
- which is the next VACUUM to run?
If we have an API that allows those 3 questions to be asked, then a
scheduler plug-in could supply the answers. That way any complex
application rules (table A is available for VACUUM now for next 60 mins,
table B is in constant use so we must use vacuum_delay), external events
(long running reports have now finished, OK to VACUUM), time-based rules
(e.g. first Sunday of the month 00:00 - 04:00 is scheduled downtime,
first 3 days of each month is financial accounting close) can be
specified.
Another thought: is it at all possible to do a partial vacuum? I.e. spend the
next 30 minutes vacuuming foo table, and update the FSM with what we have
learned over the 30 mins, even if we have not done a full table scan?
--
Darcy Buskermolen
The PostgreSQL company, Command Prompt Inc.
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Darcy Buskermolen wrote:
[snip]
Another thought: is it at all possible to do a partial vacuum? I.e. spend the
next 30 minutes vacuuming foo table, and update the FSM with what we have
learned over the 30 mins, even if we have not done a full table scan?
There was a proposal for this, but it was dropped on 2 grounds.
1. Partial vacuum would mean that parts of the table are missed; the
user could never vacuum certain parts and transaction wraparound would
get you. You may also have other performance issues as you forgot
certain parts of the table.
2. Index cleanup is the most expensive part of vacuum. So doing a
partial vacuum actually means more I/O as you have to do index cleanup
more often.
If we are talking about autovacuum, 1 doesn't become so much of an issue
as you just make the autovacuum remember what parts of the table it's
vacuumed. This really has great power when you have a dead space map.
Item 2 will still be an issue. But if you define "partial" as either
fill maintenance_work_mem, or finish the table, you are not increasing
I/O at all, since when maintenance_work_mem is full you have to clean up
all the indexes anyway. This is really more like VACUUM SINGLE, but the
same principle applies.
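The argument can be checked with a little arithmetic. The sketch below is a simplified model that assumes one index-cleanup pass is triggered each time the dead-tuple array fills maintenance_work_mem, and that each remembered tuple pointer costs 6 bytes; the function name and the example figures are made up.

```python
import math

TID_BYTES = 6  # assumed size of one remembered dead-tuple pointer


def index_passes(dead_tuples, maintenance_work_mem_bytes):
    """Index-cleanup passes needed if each pass is triggered by a full TID array."""
    tids_per_pass = maintenance_work_mem_bytes // TID_BYTES
    if tids_per_pass < 1:
        raise ValueError("maintenance_work_mem too small to hold a single TID")
    if dead_tuples == 0:
        return 0
    return math.ceil(dead_tuples / tids_per_pass)


# Made-up example: 10 million dead tuples with 16 MB of maintenance_work_mem.
passes = index_passes(10_000_000, 16 * 1024 * 1024)
```

Under this model a full vacuum of that table needs 4 index passes, so splitting the work into partial vacuums that each stop when memory fills performs the same number of index passes, not more.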
I believe all planning really needs to think about how a dead space map
will affect what vacuum is going to be doing in the future.
Strange idea that I haven't researched: given that VACUUM can't be run in a
transaction, it is possible at a certain point to quit the current
transaction and start another one. There has been much chat and now a
TODO item about allowing multiple vacuums to not starve small tables.
But if a big table has a long running vacuum the vacuum of the small
table won't be effective anyway, will it? If vacuum of a big table was
done in multiple transactions you could reduce the effect of long
running vacuum. I'm not sure how this affects the rest of the system
though.
Russell Smith
Russell Smith wrote:
Strange idea that I haven't researched: given that VACUUM can't be run in a
transaction, it is possible at a certain point to quit the current
transaction and start another one. There has been much chat and now a
TODO item about allowing multiple vacuums to not starve small tables.
But if a big table has a long running vacuum the vacuum of the small
table won't be effective anyway, will it? If vacuum of a big table was
done in multiple transactions you could reduce the effect of long
running vacuum. I'm not sure how this affects the rest of the system
though.
That was fixed by Hannu Krosing's patch in 8.2 that made vacuum
ignore other vacuums in the oldest xmin calculation.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Sat, 2007-01-20 at 09:41 +1100, Russell Smith wrote:
Darcy Buskermolen wrote:
[snip]
Another thought: is it at all possible to do a partial vacuum? I.e. spend the
next 30 minutes vacuuming foo table, and update the FSM with what we have
learned over the 30 mins, even if we have not done a full table scan?
There was a proposal for this, but it was dropped on 2 grounds.
1. Partial vacuum would mean that parts of the table are missed; the
user could never vacuum certain parts and transaction wraparound would
get you. You may also have other performance issues as you forgot
certain parts of the table.
Partial vacuum would still be possible if you remembered where you got
to in the VACUUM and then started from that same point next time. It
could then go to the end of the table and wrap back around.
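The "remember where you got to and wrap around" idea is easy to model. This is only a sketch with invented names: it takes a budget in blocks (standing in for a time limit) and returns which blocks one partial pass would visit plus the position to persist for the next pass.

```python
def partial_vacuum_blocks(total_blocks, start_block, budget):
    """Return (blocks_to_visit, next_start) for one partial pass.

    Visits at most `budget` blocks starting at `start_block`, wrapping from
    the end of the table back to block 0. No block is visited twice per pass.
    """
    if total_blocks <= 0:
        return [], 0
    count = min(budget, total_blocks)
    blocks = [(start_block + i) % total_blocks for i in range(count)]
    next_start = (start_block + count) % total_blocks
    return blocks, next_start


# A 10-block table, resuming at block 8 with a budget of 4 blocks.
visited, resume_at = partial_vacuum_blocks(10, 8, 4)
```

Successive passes that each start from the saved position eventually cover every block, which is what addresses the "certain parts never get vacuumed" objection in this model.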
2. Index cleanup is the most expensive part of vacuum. So doing a
partial vacuum actually means more I/O as you have to do index cleanup
more often.
Again, not necessarily. A large VACUUM can currently perform more than
one set of index scans, so if you chose the right stopping place for a
partial VACUUM you need never incur any additional work. It might even
save effort in the long run.
I'm not necessarily advocating partial VACUUM, just pointing out that
the problems you raise need not be barriers to implementation, should
that be considered worthwhile.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
On Sun, Jan 21, 2007 at 12:24:38PM +0000, Simon Riggs wrote:
Partial vacuum would still be possible if you remembered where you got
to in the VACUUM and then started from that same point next time. It
could then go to the end of the table and wrap back around.
ISTM the Dead Space Map would give you this automatically, since that's
your memory... Once you have the DSM to track where the dead pages are,
you can set it up to target clusters first, thus giving maximum bang
for buck.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
On Sun, Jan 21, 2007 at 11:39:45AM +0000, Heikki Linnakangas wrote:
Russell Smith wrote:
Strange idea that I haven't researched: given that VACUUM can't be run in a
transaction, it is possible at a certain point to quit the current
transaction and start another one. There has been much chat and now a
TODO item about allowing multiple vacuums to not starve small tables.
But if a big table has a long running vacuum the vacuum of the small
table won't be effective anyway, will it? If vacuum of a big table was
done in multiple transactions you could reduce the effect of long
running vacuum. I'm not sure how this affects the rest of the system
though.
That was fixed by Hannu Krosing's patch in 8.2 that made vacuum
ignore other vacuums in the oldest xmin calculation.
And IIRC in 8.1 every time vacuum finishes a pass over the indexes it
will commit and start a new transaction. That's still useful even with
Hannu's patch in case you start a vacuum with maintenance_work_mem too
small; you can abort the vacuum some time later and at least some of the
work it's done will get committed.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Sun, 2007-01-21 at 14:26 -0600, Jim C. Nasby wrote:
On Sun, Jan 21, 2007 at 11:39:45AM +0000, Heikki Linnakangas wrote:
Russell Smith wrote:
Strange idea that I haven't researched: given that VACUUM can't be run in a
transaction, it is possible at a certain point to quit the current
transaction and start another one. There has been much chat and now a
TODO item about allowing multiple vacuums to not starve small tables.
But if a big table has a long running vacuum the vacuum of the small
table won't be effective anyway, will it? If vacuum of a big table was
done in multiple transactions you could reduce the effect of long
running vacuum. I'm not sure how this affects the rest of the system
though.
That was fixed by Hannu Krosing's patch in 8.2 that made vacuum
ignore other vacuums in the oldest xmin calculation.
And IIRC in 8.1 every time vacuum finishes a pass over the indexes it
will commit and start a new transaction.
err...It doesn't do this now and IIRC didn't do that in 8.1 either.
That's still useful even with
Hannu's patch in case you start a vacuum with maintenance_work_mem too
small; you can abort the vacuum some time later and at least some of the
work it's done will get committed.
True, but not recommended, for a variety of reasons.
The reason is not intermediate commits, but just that the work of VACUUM
is mostly non-transactional in nature, apart from the various catalog
entries when it completes.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Hi there,
Is it possible to stop all user access to postgres, but still give access to
admin?
Just temporarily, not a security setup.
Something like, stop all users but allow user x and y.
thx
org@kewlstuff.co.za wrote:
Hi there,
Is it possible to stop all user access to postgres, but still give
access to admin?
Just temporarily, not a security setup.
Something like, stop all users but allow user x and y.
You could restart in single user mode, or alter pg_hba.conf to allow the
users you want and disallow all other users.
Single user mode requires that you have direct access to the machine to
do the alterations.
Using pg_hba.conf will not disconnect existing users as far as I'm aware.
That's the best advice I can offer; maybe somebody else will be able to
give you more.
thx
Am Montag, 22. Januar 2007 10:32 schrieb org@kewlstuff.co.za:
Is it possible to stop all user access to postgres, but still give access
to admin?
Make the appropriate adjustments to pg_hba.conf.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Russell Smith wrote:
2. Index cleanup is the most expensive part of vacuum. So doing a
partial vacuum actually means more I/O as you have to do index cleanup
more often.
I don't think that's usually the case. Index(es) are typically only a
fraction of the size of the table, and since 8.2 we do index vacuums in
a single scan in physical order. In fact, in many applications the index
is mostly cached and the index scan doesn't generate any I/O at all.
I believe the heap scans are the biggest issue at the moment.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
Russell Smith wrote:
2. Index cleanup is the most expensive part of vacuum. So doing a
partial vacuum actually means more I/O as you have to do index cleanup
more often.
I don't think that's usually the case. Index(es) are typically only a
fraction of the size of the table, and since 8.2 we do index vacuums in
a single scan in physical order. In fact, in many applications the index
is mostly cached and the index scan doesn't generate any I/O at all.
Are _all_ the indexes cached? I would doubt that. Also, for a typical
table, what percentage is the size of all indexes combined?
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
Heikki Linnakangas wrote:
Russell Smith wrote:
2. Index cleanup is the most expensive part of vacuum. So doing a
partial vacuum actually means more I/O as you have to do index cleanup
more often.
I don't think that's usually the case. Index(es) are typically only a
fraction of the size of the table, and since 8.2 we do index vacuums in
a single scan in physical order. In fact, in many applications the index
is mostly cached and the index scan doesn't generate any I/O at all.
Are _all_ the indexes cached? I would doubt that.
Well, depends on your schema, of course. In many applications, yes.
Also, for a typical
table, what percentage is the size of all indexes combined?
Well, there's no such thing as a typical table. As an anecdote, here are
the ratios (total size of all indexes of a table)/(size of corresponding
heap) for the bigger tables for a DBT-2 run I have at hand:
Stock: 1190470/68550 = 6%
Order_line: 950103/274372 = 29%
Customer: 629011/(5711+20567) = 8%
In any case, for the statement "Index cleanup is the most expensive part
of vacuum" to be true, your indexes would have to take up 2x as much
space as the heap, since the heap is scanned twice. I'm sure there are
databases like that out there, but I don't think it's the common case.
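The "2x the heap" argument can be checked with a few lines of arithmetic. This is a rough sketch under a deliberately simple assumption that is mine, not the poster's: I/O cost is proportional to pages read, the heap is read twice and every index once. The function name is hypothetical.

```python
def index_share_of_vacuum_io(heap_pages: float, index_pages: float,
                             heap_passes: int = 2) -> float:
    """Fraction of vacuum I/O spent on indexes in a pure page-count model."""
    heap_io = heap_passes * heap_pages   # the heap is scanned twice
    index_io = index_pages               # each index is scanned once
    return index_io / (heap_io + index_io)


# An index/heap ratio of 29%, the largest ratio quoted above.
print(round(index_share_of_vacuum_io(100, 29), 3))   # 0.127
# Indexes reach half of the I/O only when they are twice the heap size.
print(index_share_of_vacuum_io(100, 200))            # 0.5
```

Under this model the indexes only become the larger share once they exceed twice the heap size, which is the point being made.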
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
Bruce Momjian wrote:
Heikki Linnakangas wrote:
Russell Smith wrote:
2. Index cleanup is the most expensive part of vacuum. So doing a
partial vacuum actually means more I/O as you have to do index cleanup
more often.
I don't think that's usually the case. Index(es) are typically only a
fraction of the size of the table, and since 8.2 we do index vacuums in
a single scan in physical order. In fact, in many applications the index
is mostly cached and the index scan doesn't generate any I/O at all.
Are _all_ the indexes cached? I would doubt that.
Well, depends on your schema, of course. In many applications, yes.
Also, for a typical
table, what percentage is the size of all indexes combined?
Well, there's no such thing as a typical table. As an anecdote, here are
the ratios (total size of all indexes of a table)/(size of corresponding
heap) for the bigger tables for a DBT-2 run I have at hand:
Stock: 1190470/68550 = 6%
Order_line: 950103/274372 = 29%
Customer: 629011/(5711+20567) = 8%
In any case, for the statement "Index cleanup is the most expensive part
of vacuum" to be true, your indexes would have to take up 2x as much
space as the heap, since the heap is scanned twice. I'm sure there are
databases like that out there, but I don't think it's the common case.
I agree that index cleanup isn't > 50% of vacuum. I was trying to figure
out how small, and it seems about 15% of the total table, which means if
we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
80%, assuming 5% of the table is scanned.
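One way to read that estimate, as a back-of-the-envelope sketch: take the heap scan as 1 unit of I/O and the index scans as 0.15 units, and assume I/O is simply proportional to the fraction of the heap that is read. Both the reading and the function name are assumptions made for illustration.

```python
def vacuum_io_saving(index_ratio: float, heap_fraction_scanned: float) -> float:
    """Relative I/O saved by scanning only part of the heap.

    Assumes cost is proportional to pages read and that the indexes are
    still scanned in full, as in the estimate above.
    """
    full = 1.0 + index_ratio                      # whole heap plus indexes
    partial = heap_fraction_scanned + index_ratio # part of heap plus indexes
    return 1.0 - partial / full


print(round(vacuum_io_saving(0.15, 0.05), 3))   # 0.826
```

That lands close to the "perhaps 80%" figure, but only as long as reading 5% of the blocks really costs 5% of a full scan.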
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Mon, 2007-01-22 at 12:18 -0500, Bruce Momjian wrote:
Heikki Linnakangas wrote:
In any case, for the statement "Index cleanup is the most expensive part
of vacuum" to be true, your indexes would have to take up 2x as much
space as the heap, since the heap is scanned twice. I'm sure there are
databases like that out there, but I don't think it's the common case.
I agree that index cleanup isn't > 50% of vacuum. I was trying to figure
out how small, and it seems about 15% of the total table, which means if
we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
80%, assuming 5% of the table is scanned.
Clearly keeping track of what needs vacuuming will lead to a more
efficient VACUUM. Your math applies to *any* design that uses some form
of book-keeping to focus in on the hot spots.
On a separate thread, Heikki has raised a different idea for VACUUM.
Heikki's idea asks an important question: where and how should DSM (Dead Space Map)
information be maintained? Up to now everybody has assumed that it would
be maintained when DML took place and that the DSM would be a
transactional data structure (i.e. on-disk). Heikki's idea requires
similar bookkeeping requirements to the original DSM concept, but the
interesting aspect is that the DSM information is collected off-line,
rather than being an overhead on every statement's response time.
That idea seems extremely valuable to me.
One of the main challenges is how we cope with large tables that have a
very fine spray of updates against them. A DSM bitmap won't help with
that situation, regrettably.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
"Bruce Momjian" <bruce@momjian.us> writes:
I agree that index cleanup isn't > 50% of vacuum. I was trying to figure
out how small, and it seems about 15% of the total table, which means if
we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
80%, assuming 5% of the table is scanned.
Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%.
That is, if you assume that only 5% of the table will be scanned and you
arrange to do it sequentially then you should expect the i/o to be marginally
faster than just reading the entire table. Vacuum does do some cpu work and
wouldn't have to consult the clog as often, so it would still be somewhat
faster.
The theory online was that as long as you're reading one page from each disk
track you're going to pay the same seek overhead as reading the entire track.
I also had some theories involving Linux being confused by the seeks and
turning off read-ahead but I could never prove them.
In short, to see big benefits you would have to have a much smaller percentage
of the table being read. That shouldn't be taken to mean that the DSM is a
loser. There are plenty of use cases where tables can be extremely large and
have only very small percentages that are busy. The big advantage of the DSM
is that it takes the size of the table out of the equation and replaces it
with the size of the busy portion of the table. So updating a single record in
a terabyte table has the same costs as updating a single record in a kilobyte
table.
Sadly that's not quite true due to indexes, and due to the size of the bitmap
itself. But going back to your numbers it does mean that if you update a
single row out of a terabyte table then we'll be removing about 85% of the i/o
(minus the i/o needed to read the DSM, about .025%). If you update about 1%
then you would be removing substantially less, and once you get to about 10%
then you're back where you started.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Yep, agreed on the random I/O issue. The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty? I realize the vacuum is going
to take a lot of time, but vacuuming to reclaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once. And
vacuuming to reclaim 1% ten times seems even more expensive. The
partial vacuum idea is starting to look like a loser to me again.
---------------------------------------------------------------------------
Gregory Stark wrote:
"Bruce Momjian" <bruce@momjian.us> writes:
I agree that index cleanup isn't > 50% of vacuum. I was trying to figure
out how small, and it seems about 15% of the total table, which means if
we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
80%, assuming 5% of the table is scanned.
Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%.
That is, if you assume that only 5% of the table will be scanned and you
arrange to do it sequentially then you should expect the i/o to be marginally
faster than just reading the entire table. Vacuum does do some cpu work and
wouldn't have to consult the clog as often, so it would still be somewhat
faster.
The theory online was that as long as you're reading one page from each disk
track you're going to pay the same seek overhead as reading the entire track.
I also had some theories involving Linux being confused by the seeks and
turning off read-ahead but I could never prove them.
In short, to see big benefits you would have to have a much smaller percentage
of the table being read. That shouldn't be taken to mean that the DSM is a
loser. There are plenty of use cases where tables can be extremely large and
have only very small percentages that are busy. The big advantage of the DSM
is that it takes the size of the table out of the equation and replaces it
with the size of the busy portion of the table. So updating a single record in
a terabyte table has the same costs as updating a single record in a kilobyte
table.
Sadly that's not quite true due to indexes, and due to the size of the bitmap
itself. But going back to your numbers it does mean that if you update a
single row out of a terabyte table then we'll be removing about 85% of the i/o
(minus the i/o needed to read the DSM, about .025%). If you update about 1%
then you would be removing substantially less, and once you get to about 10%
then you're back where you started.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
"Bruce Momjian" <bruce@momjian.us> writes:
Yep, agreed on the random I/O issue. The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty? I realize the vacuum is going
to take a lot of time, but vacuuming to reclaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once. And
vacuuming to reclaim 1% ten times seems even more expensive. The
partial vacuum idea is starting to look like a loser to me again.
Well the answer is of course "that depends".
If you maintain the dead space at a steady state averaging 1.5% instead of 5%
your table is 3.33% smaller on average. If this is a DSS system that will
translate into running your queries 3.33% faster. It will take a lot of
vacuums before they hurt more than a 3%+ performance drop.
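The 3.33% figure falls out if dead space is expressed as a fraction of the live data, so that table size is proportional to 1 + dead fraction. That interpretation is an assumption on my part; the short check below just reproduces the number under it.

```python
def relative_size_reduction(dead_before: float, dead_after: float) -> float:
    """How much smaller the table gets when average dead space drops.

    Assumes dead space is measured relative to live data, so that
    table size is proportional to (1 + dead fraction).
    """
    return 1.0 - (1.0 + dead_after) / (1.0 + dead_before)


# Averaging 1.5% dead space instead of 5%.
print(round(relative_size_reduction(0.05, 0.015) * 100, 2))   # 3.33
```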
If it's an OLTP system then it's harder to figure. A 3.33% increase in data
density will translate to a higher cache hit rate but how much higher depends
on a lot of factors. In our experiments we actually got a bigger boost in these
kinds of situations than I expected (I expected something comparable to the 3.33%
improvement). So it could be even more than 3.33%. But like I said, it depends.
If you already have the whole database cached you won't see any improvement. If
you are right on the cusp you could see a huge benefit.
It sounds like you're underestimating the performance drain 10% wasted space
has. If we found out that one routine was unnecessarily taking 10% of the cpu
time it would be an obvious focus of attention. 10% wasted space is going to
work out to about 10% of the i/o time.
It also sounds like we're still focused on the performance impact in absolute
terms. I'm much more interested in changing the performance characteristics so
they're predictable and scalable. It doesn't matter much if your 1kb table is
100% slower than necessary but it does matter if your 1TB table needs 1,000x
as much vacuuming as your 1GB table even if it's getting the same update
traffic.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
On Mon, 2007-01-22 at 13:27 -0500, Bruce Momjian wrote:
Yep, agreed on the random I/O issue. The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty? I realize the vacuum is going
to take a lot of time, but vacuuming to reclaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once. And
vacuuming to reclaim 1% ten times seems even more expensive. The
partial vacuum idea is starting to look like a loser to me again.
Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
On Mon, Jan 22, 2007 at 06:42:09PM +0000, Simon Riggs wrote:
On Mon, 2007-01-22 at 13:27 -0500, Bruce Momjian wrote:
Yep, agreed on the random I/O issue. The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty? I realize the vacuum is going
to take a lot of time, but vacuuming to reclaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once. And
vacuuming to reclaim 1% ten times seems even more expensive. The
partial vacuum idea is starting to look like a loser to me again.
Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
There may be other functions that could leverage a similar sort of
infrastructure. For example, a long DB mining query could be registered
with the system. Then as the pieces of the table/database are brought in
to shared memory during the normal daily DB activity they can be acquired
without forcing the DB to run a very I/O expensive query when waiting a
bit for the results would be acceptable. As long as we are thinking
piggyback.
Ken
Bruce Momjian wrote:
Yep, agreed on the random I/O issue. The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty? I realize the vacuum is going
to take a lot of time, but vacuuming to reclaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once. And
vacuuming to reclaim 1% ten times seems even more expensive. The
partial vacuum idea is starting to look like a loser to me again.
But if the partial vacuum is able to clean the busiest pages and reclaim
useful space, currently-running transactions will be able to use that
space and thus not have to extend the table. Not that extension is a
problem in itself, but it'll keep your working set smaller.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Bruce Momjian wrote:
Yep, agreed on the random I/O issue. The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty? I realize the vacuum is going
to take a lot of time, but vacuuming to reclaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once. And
vacuuming to reclaim 1% ten times seems even more expensive. The
partial vacuum idea is starting to look like a loser to me again.
Buying a house with a 25-year mortgage is much more expensive than just
paying cash too, but you don't always have a choice.
Surely the key benefit of the partial vacuuming thing is that you can at
least do something useful with a large table if a full vacuum takes 24
hours and you only have 4 hours of idle I/O.
It's also occurred to me that all the discussion of scheduling way back
when isn't directly addressing the issue. What most people want (I'm
guessing) is to vacuum *when the user-workload allows* and the
time-tabling is just a sysadmin first-approximation at that.
With partial vacuuming possible, we can arrange things with just three
thresholds and two measurements:
Measurement 1 = system workload
Measurement 2 = a per-table "requires vacuuming" value
Threshold 1 = workload at which we do more vacuuming
Threshold 2 = workload at which we do less vacuuming
Threshold 3 = point at which a table is considered worth vacuuming.
Once every 10 seconds, the manager compares the current workload to the
thresholds and starts a new vacuum, kills one or does nothing. New
vacuum processes keep getting started as long as there is workload spare
and tables that need vacuuming.
Now the trick of course is how you measure system workload in a
meaningful manner.
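The decision the manager makes on each tick could look roughly like the sketch below. It is only an illustration of the three-thresholds, two-measurements scheme: the names, the 0-to-1 workload scale, and the policies "kill the most recently started vacuum" and "start on the neediest table" are all assumptions, and how workload is actually measured is left open, exactly as in the post.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Thresholds:
    start_more_below: float   # Threshold 1: workload under which we add a vacuum
    do_less_above: float      # Threshold 2: workload over which we stop a vacuum
    worth_vacuuming: float    # Threshold 3: per-table "requires vacuuming" cutoff


def manager_tick(workload: float,
                 need: Dict[str, float],
                 running: List[str],
                 t: Thresholds) -> Tuple[str, Optional[str]]:
    """One decision of the manager: start a vacuum, kill one, or do nothing."""
    if workload > t.do_less_above and running:
        # Too busy: back off by stopping the most recently started vacuum.
        return ("kill", running[-1])
    if workload < t.start_more_below:
        # Spare capacity: pick the neediest table not already being vacuumed.
        candidates = [(score, name) for name, score in need.items()
                      if score >= t.worth_vacuuming and name not in running]
        if candidates:
            return ("start", max(candidates)[1])
    return ("nothing", None)


if __name__ == "__main__":
    t = Thresholds(start_more_below=0.3, do_less_above=0.7, worth_vacuuming=0.1)
    need = {"orders": 0.4, "stock": 0.05}
    print(manager_tick(0.2, need, [], t))          # ('start', 'orders')
    print(manager_tick(0.9, need, ["orders"], t))  # ('kill', 'orders')
```

Calling `manager_tick` once every 10 seconds gives the loop described above; between the two workload thresholds it leaves running vacuums alone.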
--
Richard Huxton
Archonet Ltd
Gregory Stark wrote:
"Bruce Momjian" <bruce@momjian.us> writes:
I agree that index cleanup isn't > 50% of vacuum. I was trying to figure
out how small, and it seems about 15% of the total table, which means if
we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
80%, assuming 5% of the table is scanned.
Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%.
Note that with uniformly random updates, you will have dirtied every page of
the table long before you get anywhere near 5% of dead space. So we have to
assume a non-uniform distribution of updates for the DSM to be of any help.
And if we assume non-uniform distribution, it's a good bet that the
blocks that need vacuuming are also not randomly distributed. In fact,
they might very well all be in one cluster, so that scanning that
cluster is indeed sequential I/O.
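The first point is easy to quantify with a small sketch. It assumes each tuple is dead independently with the same probability (an approximation of uniformly random updates) and a fixed number of tuples per page; the figure of 100 tuples per page is purely an illustrative assumption.

```python
def fraction_of_pages_dirtied(dead_fraction: float, tuples_per_page: int) -> float:
    """Expected fraction of pages holding at least one dead tuple.

    Assumes every tuple is dead independently with probability dead_fraction.
    """
    return 1.0 - (1.0 - dead_fraction) ** tuples_per_page


# With 100 tuples per page (illustrative), 5% dead tuples touch nearly every page.
print(round(fraction_of_pages_dirtied(0.05, 100), 3))   # 0.994
# Even 1% dead tuples already touch well over half of the pages.
print(round(fraction_of_pages_dirtied(0.01, 100), 3))   # 0.634
```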
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Kenneth Marshall wrote:
On Mon, Jan 22, 2007 at 06:42:09PM +0000, Simon Riggs wrote:
Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...
There may be other functions that could leverage a similar sort of
infrastructure. For example, a long DB mining query could be registered
with the system. Then as the pieces of the table/database are brought in
to shared memory during the normal daily DB activity they can be acquired
without forcing the DB to run a very I/O expensive query when waiting a
bit for the results would be acceptable. As long as we are thinking
piggyback.
Yeah, I had the same idea when we discussed synchronizing sequential
scans. The biggest difference is that with queries, there's often a user
waiting for the query to finish, but with vacuum we don't care so much
how long it takes.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, Jan 22, 2007 at 07:24:20PM +0000, Heikki Linnakangas wrote:
Kenneth Marshall wrote:
On Mon, Jan 22, 2007 at 06:42:09PM +0000, Simon Riggs wrote:
Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...
There may be other functions that could leverage a similar sort of
infrastructure. For example, a long DB mining query could be registered
with the system. Then as the pieces of the table/database are brought in
to shared memory during the normal daily DB activity they can be acquired
without forcing the DB to run a very I/O expensive query when waiting a
bit for the results would be acceptable. As long as we are thinking
piggyback.
Yeah, I had the same idea when we discussed synchronizing sequential
scans. The biggest difference is that with queries, there's often a user
waiting for the query to finish, but with vacuum we don't care so much
how long it takes.
Yes, but with trending and statistical analysis you may not need the
exact answer ASAP. An approximate answer based on a fraction of the
information would be useful. Also, "what if" queries could be run without
impacting the production uses of a database. One might imagine having a
query with results that "converge" as the table is processed during normal
use.
Ken
Alvaro Herrera wrote:
Bruce Momjian wrote:
Yep, agreed on the random I/O issue. The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty? I realize the vacuum is going
to take a lot of time, but vacuuming to reclaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once. And
vacuuming to reclaim 1% ten times seems even more expensive. The
partial vacuum idea is starting to look like a loser to me again.
But if the partial vacuum is able to clean the busiest pages and reclaim
useful space, currently-running transactions will be able to use that
space and thus not have to extend the table. Not that extension is a
problem in itself, but it'll keep your working set smaller.
Yes, but my point is that if you are trying to avoid vacuuming the
table, I am afraid the full index scan is going to be painful too. I
can see corner cases where partial vacuum is a win (I only have 4 hours
of idle I/O), but for the general case I am still worried that partial
vacuum will not be that useful as long as we have to scan the indexes.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Gregory Stark wrote:
Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%. [...]
The theory online was that as long as you're reading one page from each disk
track you're going to pay the same seek overhead as reading the entire track.
Could one take advantage of this observation in designing the DSM?
Instead of a separate bit representing every page, having each bit
represent 20 or so pages might be a more useful unit. It sounds
like the time spent reading would be similar; while the bitmap
would be significantly smaller.
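To get a feel for the sizes involved, here is a small sketch of a bitmap where one bit covers a group of pages. The 8192-byte page is PostgreSQL's default block size; the function names and the flat one-bit-per-group layout are assumptions made for illustration, not a proposed on-disk format.

```python
def dsm_bitmap_bytes(table_bytes: int, pages_per_bit: int = 1,
                     page_bytes: int = 8192) -> int:
    """Size in bytes of a flat bitmap with one bit per group of pages."""
    pages = -(-table_bytes // page_bytes)    # ceiling division
    bits = -(-pages // pages_per_bit)
    return -(-bits // 8)


def bit_for_page(page_number: int, pages_per_bit: int) -> int:
    """Index of the bit that covers a given page."""
    return page_number // pages_per_bit


ONE_TIB = 1 << 40
print(dsm_bitmap_bytes(ONE_TIB))        # 16777216  (one bit per page)
print(dsm_bitmap_bytes(ONE_TIB, 20))    # 838861    (one bit per 20 pages)
```

The trade-off is that a set bit now means "vacuum all 20 pages", which costs little extra if reading neighbouring pages is nearly free, as the experiment quoted above suggests.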
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Gregory Stark
Sent: Monday, 22 January 2007 19:41
To: Bruce Momjian
Cc: Heikki Linnakangas; Russell Smith; Darcy Buskermolen;
Simon Riggs; Alvaro Herrera; Matthew T. O'Connor; Pavan
Deolasee; Christopher Browne; pgsql-general@postgresql.org;
pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [GENERAL] Autovacuum Improvements
"Bruce Momjian" <bruce@momjian.us> writes:
Yep, agreed on the random I/O issue. The larger question is if you
have a huge table, do you care to reclaim 3% of the table size, rather
than just vacuum it when it gets to 10% dirty? I realize the vacuum
is going to take a lot of time, but vacuuming to reclaim 3% three times
seems like it is going to be more expensive than just vacuuming the
10% once. And vacuuming to reclaim 1% ten times seems even more
expensive. The partial vacuum idea is starting to look like a loser to me again.
Well the answer is of course "that depends".
If you maintain the dead space at a steady state averaging
1.5% instead of 5% your table is 3.33% smaller on average. If
this is a DSS system that will translate into running your
queries 3.33% faster. It will take a lot of vacuums before
they hurt more than a 3%+ performance drop.
Good, this means a DSS system will mostly do table scans (right?). So
you should probably see the 'table scan' statistic and rows fetched
approaching the end of the universe (at least compared to
inserts/updates/deletes)?
If it's an OLTP system then it's harder to figure. A 3.33%
increase in data density will translate to a higher cache hit
rate but how much higher depends on a lot of factors. In our
experiments we actually got a bigger boost in these kinds of
situations than I expected (I expected something comparable to the
3.33% improvement). So it could be even more than 3.33%. But
like I said, it depends.
If you already have the whole database cached you won't see any
improvement. If you are right on the cusp you could see a huge benefit.
These tables have high insert, update and delete rates, probably a lot
of index scans? I believe the workload on table scans should be (close
to) none.
Are you willing to share some of this measured data? I'm quite
interested in such figures.
It sounds like you're underestimating the performance drain
10% wasted space has. If we found out that one routine was
unnecessarily taking 10% of the cpu time it would be an
obvious focus of attention. 10% wasted space is going to work
out to about 10% of the i/o time.
It also sounds like we're still focused on the performance
impact in absolute terms. I'm much more interested in changing
the performance characteristics so they're predictable and
scalable. It doesn't matter much if your 1kb table is 100%
slower than necessary but it does matter if your 1TB table
needs 1,000x as much vacuuming as your 1GB table even if it's
getting the same update traffic.
Or rather, the vacuuming should pay back.
A nice metric might be: cost_of_not_vacuuming / cost_of_vacuuming.
Obviously, the higher the better.
- Joris Dobbelsteen
On Mon, Jan 22, 2007 at 12:17:39PM -0800, Ron Mayer wrote:
Gregory Stark wrote:
Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%. [...]
The theory online was that as long as you're reading one page from each disk
track you're going to pay the same seek overhead as reading the entire track.
Could one take advantage of this observation in designing the DSM?
Instead of a separate bit representing every page, having each bit
represent 20 or so pages might be a more useful unit. It sounds
like the time spent reading would be similar; while the bitmap
would be significantly smaller.
If we extended relations by more than one page at a time we'd probably
have a better shot at the blocks on disk being contiguous and all read
at the same time by the OS.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Jan 22, 2007, at 11:16 AM, Richard Huxton wrote:
Bruce Momjian wrote:
Yep, agreed on the random I/O issue. The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty? I realize the vacuum is going
to take a lot of time, but vacuuming to reclaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once. And
vacuuming to reclaim 1% ten times seems even more expensive. The
partial vacuum idea is starting to look like a loser to me again.
Buying a house with a 25-year mortgage is much more expensive than
just paying cash too, but you don't always have a choice.
Surely the key benefit of the partial vacuuming thing is that you
can at least do something useful with a large table if a full
vacuum takes 24 hours and you only have 4 hours of idle I/O.
It's also occurred to me that all the discussion of scheduling way
back when isn't directly addressing the issue. What most people
want (I'm guessing) is to vacuum *when the user-workload allows*
and the time-tabling is just a sysadmin first-approximation at that.
Yup. I'd really like for my app to be able to say "Hmm. No
interactive users at the moment, no critical background tasks. Now
would be a really good time for the DB to do some maintenance." but
also to be able to interrupt the maintenance process if some new
users or other system load show up.
With partial vacuuming possible, we can arrange things with just
three thresholds and two measurements:
Measurement 1 = system workload
Measurement 2 = a per-table "requires vacuuming" value
Threshold 1 = workload at which we do more vacuuming
Threshold 2 = workload at which we do less vacuuming
Threshold 3 = point at which a table is considered worth vacuuming.
Once every 10 seconds, the manager compares the current workload to
the thresholds and starts a new vacuum, kills one or does nothing.
New vacuum processes keep getting started as long as there is
workload spare and tables that need vacuuming.
Now the trick of course is how you measure system workload in a
meaningful manner.
I'd settle for a "start maintenance", "stop maintenance" API.
Anything else (for instance the heuristics you suggest above) would
definitely be gravy.
It's not going to be simple to do, though, I don't think.
Cheers,
Steve
On Mon, Jan 22, 2007 at 05:11:03PM -0600, Jim C. Nasby wrote:
On Mon, Jan 22, 2007 at 12:17:39PM -0800, Ron Mayer wrote:
Gregory Stark wrote:
Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%. [...]
The theory online was that as long as you're reading one page from each disk
track you're going to pay the same seek overhead as reading the entire track.
Could one take advantage of this observation in designing the DSM?
Instead of a separate bit representing every page, having each bit
represent 20 or so pages might be a more useful unit. It sounds
like the time spent reading would be similar; while the bitmap
would be significantly smaller.
If we extended relations by more than one page at a time we'd probably
have a better shot at the blocks on disk being contiguous and all read
at the same time by the OS.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Yes, most OSes do some read-ahead when reading a file from disk. Any
increment over 1 would be an improvement. If you used a counter with
a time-based decrement function, you could increase the amount that
the relation is extended based on temporal proximity. If you have
extended it several times recently, increase the size of the new
extension to reduce the overhead even further. The default should
be approximately the OS standard read-ahead amount.
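A minimal sketch of that counter idea follows. Everything concrete in it is an assumption chosen for illustration: the class name, doubling as the growth rule, the 16-page base (128 kB with 8 kB pages, standing in for "the OS read-ahead amount"), the cap, and the linear decay rate.

```python
from typing import Optional


class ExtensionSizer:
    """Pick how many pages to extend a relation by, based on recent extensions."""

    def __init__(self, base_pages: int = 16, max_pages: int = 1024,
                 decay_seconds: float = 10.0) -> None:
        self.base_pages = base_pages        # stand-in for the OS read-ahead size
        self.max_pages = max_pages          # upper bound on a single extension
        self.decay_seconds = decay_seconds  # time for the counter to drop by one
        self.counter = 0.0
        self.last_time: Optional[float] = None

    def next_extension(self, now: float) -> int:
        if self.last_time is not None:
            # Time-based decrement: the longer since the last extension,
            # the more of the accumulated count is forgotten.
            elapsed = max(0.0, now - self.last_time)
            self.counter = max(0.0, self.counter - elapsed / self.decay_seconds)
        self.last_time = now
        # Double the extension for each recent extension still "remembered".
        pages = min(self.max_pages, self.base_pages * 2 ** int(self.counter))
        self.counter += 1.0
        return pages


if __name__ == "__main__":
    sizer = ExtensionSizer()
    print([sizer.next_extension(0.0) for _ in range(3)])  # [16, 32, 64]
    print(sizer.next_extension(1000.0))                   # 16
```

Extensions that arrive in a burst grow quickly, and a long quiet period brings the size back down to the base amount.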
Ken
On Mon, Jan 22, 2007 at 05:51:53PM +0000, Gregory Stark wrote:
Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%.
I assume this means you were reading 7% of the blocks, not skipping 7%
of the blocks when you broke even?
I presume by break-even you mean it took just as long, time-wise. But
did it have the same effect on system load? If reading only 7% of the
blocks allows the drive to complete other requests more quickly then
it's beneficial, even if the vacuum takes longer.
This may be a silly thought, I'm not sure how drives handle multiple
requests...
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
Thx Russel,
I want to control it from software; changing network access via pg_hba with
software doesn't feel right.
================ possible case================
Say I have a Group called Normal_Rights and one called Zero_Rights.
So dB runs as... Normal_Rights(User A, User B, User C, User D)
Then via sql, superuser REVOKEs those user rights and GRANTs them
Zero_Rights(User A, User B, User C, User D)... ie make users a member of the
ZERO rights group.
Then hopefully Postgres kicks them out gracefully?????
Then the software makes changes and switches them back to their Normal_Rights
group.
================ or more general case================
RECORD all the SQL for all user rights...
REVOKE everything except needed software superusers (postgres, and program
superuser).
make changes via software.
PLAY BACK all the rights SQL script.
What do you think, will PG kill connections, let them go gracefully, stop
after current transaction????
================ maybe I'm in the wrong tree================
Is it possible to make quick structural changes to postgres, with user
activity?
Maybe start a transaction that changes structure... wonder if that will stop
or hold user activity???
Thx
org@kewlstuff.co.za wrote:
Thx Russel,
I want to control it from software; changing network access via pg_hba
with software doesn't feel right.
================ possible case================
Say I have a group called Normal_Rights and one called Zero_Rights.
So the dB runs as... Normal_Rights(User A, User B, User C, User D)
Then via SQL, a superuser REVOKEs those user rights and GRANTs them
Zero_Rights(User A, User B, User C, User D)... i.e. makes the users members of
the ZERO rights group.
Then hopefully Postgres kicks them out gracefully?
Then the software makes its changes and switches them back to their Normal_Rights
group.
================ or more general case================
RECORD all the SQL for all user rights...
REVOKE everything except needed software superusers (postgres, and
program superuser).
make changes via software.
PLAY BACK all the rights SQL script.
What do you think, will PG kill connections, let them go gracefully,
stop after current transaction????
================ maybe I'm in the wrong tree================
Yes I'm thinking that too:
Is it possible to make quick structural changes to postgres, with user
activity?
of course.
Maybe start a transaction that changes structure... wonder if that will
stop or hold user activity???
Usually not - all your DDL is done in a transaction just like any other
access users would make. So it only fails (but as a whole) if you want
to modify locked tables and such. But you would not end up w/ a partly
changed database in any case. Just make sure you do everything in
a transaction. No need to suspend user accounts for that.
Regards
Tino
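[Editor's sketch] Tino's advice, doing all the DDL inside one transaction so it applies as a whole or not at all, can be illustrated with a small helper that wraps a list of statements in BEGIN/COMMIT before they are fed to psql. The helper name and the table, column and index names are hypothetical, made up for the example.

```python
def transactional_script(statements):
    """Wrap SQL statements in a single BEGIN/COMMIT block.

    Empty entries are dropped and each statement gets exactly one
    trailing semicolon.
    """
    body = [s.strip().rstrip(";") + ";" for s in statements if s.strip()]
    return "\n".join(["BEGIN;"] + body + ["COMMIT;"])


# Hypothetical schema change, for illustration only.
script = transactional_script([
    "ALTER TABLE orders ADD COLUMN note text",
    "CREATE INDEX orders_note_idx ON orders (note);",
])
print(script)
```

If any statement fails, for example because it cannot get the lock it needs, the whole transaction is rolled back and the database is left unchanged.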
On Jan 22, 2007, at 6:53 PM, Kenneth Marshall wrote:
The default should
be approximately the OS standard read-ahead amount.
Is there anything resembling a standard across the OSes we support?
Better yet, is there a standard call that allows you to find out what
the read-ahead setting is?
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Tue, Jan 23, 2007 at 09:01:41PM -0600, Jim Nasby wrote:
On Jan 22, 2007, at 6:53 PM, Kenneth Marshall wrote:
The default should
be approximately the OS standard read-ahead amount.
Is there anything resembling a standard across the OSes we support?
Better yet, is there a standard call that allows you to find out what
the read-ahead setting is?
Not that I am aware of. Even extending the relation by one additional
block can make a big difference in performance and should easily fall
within every read-ahead in use today. Alternatively, it could be a GUC
variable that defaults to a small power-of-2 number of PostgreSQL blocks,
with the default arrived at by testing.
Ken
Ha ha... thx Tino
Yes, I think this is the way to go, strange how my mind climbs the wrong tree
sometimes :)
I actually need to acquire a transaction across several dB's, check if the
conditions are right, and then modify some tables, write and remove some
triggers.
Transactions in postgres are too sophisticated; I don't think they will hold
the locks at the level I need them.
But I was thinking (climbing out of the wrong tree;)... I can just acquire
exclusive locks on the tables, and hey presto, users are on hold while the
software checks the dB's.
Effectively creating a very rough transaction, with the lock scope needed?
... i.e. it will keep users out long enough to align several dB's... I'm
hoping?
From: "Tino Wildenhain" <tino@wildenhain.de>
================ maybe I'm in the wrong tree================
Yes I'm thinking that too:
Is it possible to make quick structural changes to postgres, with user
activity?
of course.
Maybe start a transaction that changes structure... wonder if that will
stop or hold user activity???
Usually not - all your DDL is done in a transaction just like any other
access users would make. So it only fails (but as a whole) if you want
to modify locked tables and such. But you would not end up w/ a partly
changed database in any case. Just make sure you do everything in
a transaction. No need to suspend user accounts for that.
Regards
Tino
Hi,
org@kewlstuff.co.za wrote:
Ha ha... thx Tino
Yes, I think this is the way to go, strange how my mind climbs the wrong
tree sometimes :)
I actually need to acquire a transaction across several dB's, check if
the conditions are right, and then modify some tables, write and remove
some triggers.
Transactions in postgres are too sophisticated; I don't think they will
hold the locks at the level I need them.
You want to read about explicit locking:
http://www.postgresql.org/docs/8.2/static/explicit-locking.html
But I was thinking (climbing out of the wrong tree;)... I can just
acquire exclusive locks on the tables, and hey presto, users are on hold
while the software checks the dB's.
I'm sure that's possible. However, I remember you were talking about
replication, thus I have to add a warning: please keep in mind that this
does not scale. You're most probably better off using two-phase commit,
aren't you?
Regards
Markus
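[Editor's sketch] The two-phase commit Markus mentions could be coordinated across several databases roughly as follows. PREPARE TRANSACTION, COMMIT PREPARED and ROLLBACK PREPARED are PostgreSQL commands, and the server must be configured to allow prepared transactions. Everything else here is an assumption for illustration: the `Participant` class is a stand-in that only records the SQL it would send, and the transaction id and statement are made up.

```python
class Participant:
    """Stand-in for one database connection; it only records SQL."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare  # simulate a failing node
        self.log = []

    def execute(self, sql):
        if self.fail_on_prepare and sql.startswith("PREPARE TRANSACTION"):
            raise RuntimeError(f"{self.name}: prepare failed")
        self.log.append(sql)


def two_phase_apply(participants, statements, gid):
    """Phase 1: prepare on every node. Phase 2: commit all, or undo all."""
    started, prepared = [], []
    try:
        for p in participants:
            p.execute("BEGIN")
            started.append(p)
            for sql in statements:
                p.execute(sql)
            p.execute(f"PREPARE TRANSACTION '{gid}'")
            prepared.append(p)
    except Exception:
        # Undo nodes that already prepared, then the one that failed midway.
        for p in prepared:
            p.execute(f"ROLLBACK PREPARED '{gid}'")
        for p in started:
            if p not in prepared:
                p.execute("ROLLBACK")
        return False
    for p in prepared:
        p.execute(f"COMMIT PREPARED '{gid}'")
    return True
```

No node commits until every node has prepared successfully, so the databases end up either all changed or all unchanged.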
Kenneth Marshall <ktm@is.rice.edu> writes:
Not that I am aware of. Even extending the relation by one additional
block can make a big difference in performance
Do you have any evidence to back up that assertion?
It seems a bit nontrivial to me --- not the extension part exactly, but
making sure that the space will get used promptly. With the current
code the backend extending a relation will do subsequent inserts into
the block it just got, which is fine, but there's no mechanism for
remembering that any other newly-added blocks are available --- unless
you wanted to push them into the FSM, which could work but the current
FSM code doesn't support piecemeal addition of space, and in any case
there's some question in my mind about the concurrency cost of increasing
FSM traffic even more.
In short, it's hardly an unquestionable improvement, so we need some
evidence.
regards, tom lane
You have a good memory, and you are exactly right.
Yes... the replication is using postgres's normal transactions, i.e. 2-phase
commits... and it works like a dream.
When moving data during replication, the locks are happening at record
level, and it's intrinsic to the postgres transaction machinery,
i.e. postgres is deciding how 'fine grained' the locks should be, and doing
all that other amazing MVCC stuff.
The part I'm toying with and struggling with is the start and stop, or the
admin side of the replication. As it stands now, one has to start with
identical databases, then set up replication, and then the users come on. But
now say I want to make a structural change... as it stands I have to claim
the dB's back, fix them all, make sure they are identical, re-set up the
replication, and then the users can come back on. It's that requirement, that
the dB's must be identical when setting up replication, that I'm trying to get
around. I think that when it comes to the structural side, I have to hold (LOCK)
those dB's while the software removes the replication, changes the structures of
all the dB's, and reinstalls the scripts and triggers... and I want to make that
invisible to a system that's already active. Ideally the user software just
delays for say 10 seconds, and in that time, 6 dB's have been restructured,
checked and the replication restarted.
In terms of the setup, I want to get it to... "make those 5 dB's the same as
this template and start or continue replicating"... which becomes a mind twister.
That's the idea anyway... current version is at http://coolese.100free.com/
It works great, but you'll see it has a setup/breakdown problem on an
active system.
Thx 4 the help Johnny
From: "Markus Schiltknecht" <markus@bluegap.ch>
But I was thinking (climbing out of the wrong tree;)... I can just acquire
exclusive locks on the tables, and hey presto, users are on hold while
the software checks the dB's.
I'm sure that's possible. However, I remember you were talking about
replication, thus I have to add a warning: please keep in mind that this
does not scale. You're most probably better off using two-phase commit, aren't
you?
Regards
Markus
On Wed, Jan 24, 2007 at 07:30:05PM -0500, Tom Lane wrote:
Kenneth Marshall <ktm@is.rice.edu> writes:
Not that I am aware of. Even extending the relation by one additional
block can make a big difference in performance
Do you have any evidence to back up that assertion?
It seems a bit nontrivial to me --- not the extension part exactly, but
making sure that the space will get used promptly. With the current
code the backend extending a relation will do subsequent inserts into
the block it just got, which is fine, but there's no mechanism for
remembering that any other newly-added blocks are available --- unless
you wanted to push them into the FSM, which could work but the current
FSM code doesn't support piecemeal addition of space, and in any case
there's some question in my mind about the concurrency cost of increasing
FSM traffic even more.
In short, it's hardly an unquestionable improvement, so we need some
evidence.
regards, tom lane
My comment was purely based on the reduction in fragmentation of the
file behind the relation, a result that I have seen repeatedly in
file-related data processing. It does sound much more complicated to make the
additional space available to other backends. If one backend was doing
many inserts, it might still be of value even for just that backend. As
you mention, testing is needed to see if there is enough value in this
process.
Ken
Jim C. Nasby wrote:
On Mon, Jan 22, 2007 at 12:17:39PM -0800, Ron Mayer wrote:
Gregory Stark wrote:
Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%. [...]
The theory online was that as long as you're reading one page from each disk
track you're going to pay the same seek overhead as reading the entire track.
Could one take advantage of this observation in designing the DSM?
Instead of a separate bit representing every page, having each bit
represent 20 or so pages might be a more useful unit. It sounds
like the time spent reading would be similar; while the bitmap
would be significantly smaller.
If we extended relations by more than one page at a time we'd probably
have a better shot at the blocks on disk being contiguous and all read
at the same time by the OS.
Actually, there is evidence that adding only a single page to the end
causes a lot of contention for that last page, and that adding a few
might be better.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
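[Editor's sketch] Ron Mayer's suggestion quoted above, one DSM bit per group of about 20 pages instead of one bit per page, could be modelled as below. This is a toy illustration only, not a proposed on-disk or shared-memory format: the class and method names are invented, and 20 pages per bit is simply the figure from his message.

```python
class CoarsePageMap:
    """Bitmap with one bit per group of pages.

    A set bit means some page in that group may need vacuuming.
    """

    def __init__(self, total_pages, pages_per_bit=20):
        self.total_pages = total_pages
        self.pages_per_bit = pages_per_bit
        # Round up so a partial final group still gets a bit.
        self.nbits = (total_pages + pages_per_bit - 1) // pages_per_bit
        self.bits = bytearray((self.nbits + 7) // 8)

    def mark_dirty(self, page):
        """Record that `page` (0-based) has dead space."""
        bit = page // self.pages_per_bit
        self.bits[bit // 8] |= 1 << (bit % 8)

    def ranges_to_scan(self):
        """Yield (first_page, last_page) for every group whose bit is set."""
        for bit in range(self.nbits):
            if self.bits[bit // 8] & (1 << (bit % 8)):
                first = bit * self.pages_per_bit
                last = min(first + self.pages_per_bit, self.total_pages) - 1
                yield first, last


page_map = CoarsePageMap(total_pages=100)
for page in (5, 19, 99):
    page_map.mark_dirty(page)
to_scan = list(page_map.ranges_to_scan())
```

The map is 20 times smaller than a per-page bitmap, at the cost of scanning a whole group whenever any page in it is dirty.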
Bruce Momjian <bruce@momjian.us> writes:
Jim C. Nasby wrote:
If we extended relations by more than one page at a time we'd probably
have a better shot at the blocks on disk being contiguous and all read
at the same time by the OS.
Actually, there is evidence that adding only a single page to the end
causes a lot of contention for that last page, and that adding a few
might be better.
Evidence where? The code is designed so that the last page *isn't*
shared --- go read the comments in hio.c sometime.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Jim C. Nasby wrote:
If we extended relations by more than one page at a time we'd probably
have a better shot at the blocks on disk being contiguous and all read
at the same time by the OS.
Actually, there is evidence that adding only a single page to the end
causes a lot of contention for that last page, and that adding a few
might be better.
Evidence where? The code is designed so that the last page *isn't*
shared --- go read the comments in hio.c sometime.
I was talking about the last page of a table, where INSERTs all cluster
on that last page and cause lots of page locking. hio.c does look like
it avoids that problem.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +