Suggestion; "WITH VACUUM" option
Tom, Folks:
Joe and I were discussing your recent discussion about the costs of VACUUM and
tuple maintainence, and I had an interesting idea.
How hard would it be to add a "WITH (VACUUM)" option to UPDATE and DELETE
queries? This option would cause the regular vacuum activity -- purging the
dead tuple and its index references -- to be done immediately, as part of the
statement, instead of being deferred.
I agree that we don't want immediate tuple maintainence, most of the time.
However, a couple of my clients would really like to have a few queries do
their vacuum immediately, and are currently getting around that by vacuuming
every 5 mintutes.
Easy? Hard? Insane? What do you think?
--
-Josh Berkus
Aglio Database Solutions
San Francisco
On Mon, 16 Dec 2002, Josh Berkus wrote:
Tom, Folks:
Joe and I were discussing your recent discussion about the costs of VACUUM and
tuple maintainence, and I had an interesting idea.How hard would it be to add a "WITH (VACUUM)" option to UPDATE and DELETE
queries? This option would cause the regular vacuum activity -- purging the
dead tuple and its index references -- to be done immediately, as part of the
statement, instead of being deferred.I agree that we don't want immediate tuple maintainence, most of the time.
However, a couple of my clients would really like to have a few queries do
their vacuum immediately, and are currently getting around that by vacuuming
every 5 mintutes.Easy? Hard? Insane? What do you think?
Just curious, but wouldn't it be just as simple to issue a VACUUM call
right after the UPDATE/DELETE?
Marc,
Easy? Hard? Insane? What do you think?
Just curious, but wouldn't it be just as simple to issue a VACUUM call
right after the UPDATE/DELETE?
Well, you can't do that as part of a transaction or procedure, whereas ....
Hmmm. Couldn't do "with vacuum" as part of a transaction, either.
Funny, it sounded like a really good idea on the phone.
In the words of Rosanne Rosanna Danna,
"Neeeveeeeeer miiiiiiiiind."
--
-Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus <josh@agliodbs.com> writes:
How hard would it be to add a "WITH (VACUUM)" option to UPDATE and DELETE
queries? This option would cause the regular vacuum activity -- purging the
dead tuple and its index references -- to be done immediately, as part of the
statement, instead of being deferred.
Easy? Hard? Insane? What do you think?
Impossible. You can't vacuum a tuple until the last open transaction
that can see it is gone. It is therefore *impossible* for a transaction
to vacuum away its own detritus; until the transaction commits, you
can't even start to wonder whether other open transactions see it or
not.
Vacuuming has to be done later, and that being the case, I don't see any
real advantage to altering the "background vacuum" design we have.
regards, tom lane
On Mon, 16 Dec 2002, Tom Lane wrote:
Josh Berkus <josh@agliodbs.com> writes:
How hard would it be to add a "WITH (VACUUM)" option to UPDATE and DELETE
queries? This option would cause the regular vacuum activity -- purging the
dead tuple and its index references -- to be done immediately, as part of the
statement, instead of being deferred.Easy? Hard? Insane? What do you think?
Impossible. You can't vacuum a tuple until the last open transaction
that can see it is gone. It is therefore *impossible* for a transaction
to vacuum away its own detritus; until the transaction commits, you
can't even start to wonder whether other open transactions see it or
not.Vacuuming has to be done later, and that being the case, I don't see any
real advantage to altering the "background vacuum" design we have.
Then, would a "commit with vacuum" work? OR a "begin transaction with
vacuum" Just tossing them out there...
On Mon, 16 Dec 2002, scott.marlowe wrote:
On Mon, 16 Dec 2002, Tom Lane wrote:
Josh Berkus <josh@agliodbs.com> writes:
How hard would it be to add a "WITH (VACUUM)" option to UPDATE and DELETE
queries? This option would cause the regular vacuum activity -- purging the
dead tuple and its index references -- to be done immediately, as part of the
statement, instead of being deferred.Easy? Hard? Insane? What do you think?
Impossible. You can't vacuum a tuple until the last open transaction
that can see it is gone. It is therefore *impossible* for a transaction
to vacuum away its own detritus; until the transaction commits, you
can't even start to wonder whether other open transactions see it or
not.Vacuuming has to be done later, and that being the case, I don't see any
real advantage to altering the "background vacuum" design we have.Then, would a "commit with vacuum" work? OR a "begin transaction with
vacuum" Just tossing them out there...
Tom will correct me here, but I believe what he was trying to get across
isn't that the 'current transaction' is the problem ... the problem is the
other connections who have open transactions ... my simplistic
understanding (and I hope it isn't too flawed) of MVCC is that as long as
*one* transaction is outstanding on a tuple, that tuple can't be
physically removed ... as far as any new transactions are concerned, it
has disappeared ... so if I open a transaction, then you open one 'with
vacuum', your 'with vacuum' will fail unless I happen to be out of my
transaction before you ...
On Mon, 16 Dec 2002, Marc G. Fournier wrote:
On Mon, 16 Dec 2002, scott.marlowe wrote:
On Mon, 16 Dec 2002, Tom Lane wrote:
Josh Berkus <josh@agliodbs.com> writes:
How hard would it be to add a "WITH (VACUUM)" option to UPDATE and DELETE
queries? This option would cause the regular vacuum activity -- purging the
dead tuple and its index references -- to be done immediately, as part of the
statement, instead of being deferred.Easy? Hard? Insane? What do you think?
Impossible. You can't vacuum a tuple until the last open transaction
that can see it is gone. It is therefore *impossible* for a transaction
to vacuum away its own detritus; until the transaction commits, you
can't even start to wonder whether other open transactions see it or
not.Vacuuming has to be done later, and that being the case, I don't see any
real advantage to altering the "background vacuum" design we have.Then, would a "commit with vacuum" work? OR a "begin transaction with
vacuum" Just tossing them out there...Tom will correct me here, but I believe what he was trying to get across
isn't that the 'current transaction' is the problem ... the problem is the
other connections who have open transactions ... my simplistic
understanding (and I hope it isn't too flawed) of MVCC is that as long as
*one* transaction is outstanding on a tuple, that tuple can't be
physically removed ... as far as any new transactions are concerned, it
has disappeared ... so if I open a transaction, then you open one 'with
vacuum', your 'with vacuum' will fail unless I happen to be out of my
transaction before you ...
Oh, yeah I have no doubt of that. I was thinking more along the lines of
when a transaction ends it throws a background "vacuum table1;vacuum
table2;vacuum tablen" command into some kind of vacuuming hopper. I.e. it
doesn't block waiting, it runs it as though it were run AFTER the
transaction. If there are a few tuples from other transactions we can't
reclaim, no big deal.
The other option would be some kind of GUC that set a max number of rows
deleted/updated in a table in a transaction that would trigger this kind
of thing automagically.
But I could see such a setting causing just as much harm (chaos theory
anyone? :-) as good.
"scott.marlowe" <scott.marlowe@ihs.com> writes:
Oh, yeah I have no doubt of that. I was thinking more along the lines of
when a transaction ends it throws a background "vacuum table1;vacuum
table2;vacuum tablen" command into some kind of vacuuming hopper.
Actually, the plans I liked best for driving auto-vacuum were
essentially an indirect version of that: the FSM module would keep track
of committed deletes + aborted inserts for each active table, and then
the autovacuum scheduler could use that info to decide which tables are
highest-priority to vacuum.
(Or possibly the runtime stats module would be a better place to track
it than FSM.)
regards, tom lane
Tom Lane wrote:
Josh Berkus <josh@agliodbs.com> writes:
How hard would it be to add a "WITH (VACUUM)" option to UPDATE and DELETE
queries? This option would cause the regular vacuum activity -- purging the
dead tuple and its index references -- to be done immediately, as part of the
statement, instead of being deferred.Easy? Hard? Insane? What do you think?
Impossible. You can't vacuum a tuple until the last open transaction
that can see it is gone. It is therefore *impossible* for a transaction
to vacuum away its own detritus; until the transaction commits, you
can't even start to wonder whether other open transactions see it or
not.Vacuuming has to be done later, and that being the case, I don't see any
real advantage to altering the "background vacuum" design we have.
This does raise an interresting question, and I understand that it is
*impossible* to do with PostgreSQL as it currently exists, however, let
me just toss this out there:
Suppose you do this:
update largetable set foo=bar;
Lets also assume that "largetable" has tens of millions of rows. I have
databases like this, and I sometimes do operations like this. I have
found it more efficient to break up the update into a series of:
update largetable set foo=bar where somefield < a;
vacuum
update largetable set foo=bar where somefield < b;
vacuum
update largetable set foo=bar where somefield < c;
vacuum
update largetable set foo=bar where not foo = bar;
vacuum
On some of my databases a statement which updates all the rows is
unworkable in PostgreSQL, on Oracle, however, there is no poblem.
For my use, it is a pain in the neck to deal with, but not unworkable.
For some other users, it may be a bigger problem.
Show quoted text
On Tue, 17 Dec 2002, mlw wrote:
update largetable set foo=bar;
Lets also assume that "largetable" has tens of millions of rows.
[..]
On some of my databases a statement which updates all the rows is
unworkable in PostgreSQL, on Oracle, however, there is no poblem.
.. provided you have a lot of rollback space, which is
essentially what the datafile growth here is providing.
Matthew.