a vacuum thread is not the answer

Started by mlwalmost 24 years ago4 messages
#1mlw
markw@mohawksoft.com

I was just toying around with things, and you know, running vacuum in the
background doesn't work. It slows things down too much.

The worst case senario is when one does this:

update accounts set abalance = abalance + 1 ;

This takes forever to run and doubles the size of the table.

Is there a way that a separate thread managing the freelist can perform a "per
row" vacuum concurrently? Maybe I am stating the problem incorrectly, but we
need to be able to recover rows already in memory for performance.

#2Hannu Krosing
hannu@krosing.net
In reply to: mlw (#1)
Re: a vacuum thread is not the answer

On Thu, 2002-03-07 at 20:15, mlw wrote:

I was just toying around with things, and you know, running vacuum in the
background doesn't work. It slows things down too much.

The worst case senario is when one does this:

update accounts set abalance = abalance + 1 ;

This takes forever to run and doubles the size of the table.

How is this related to running vacuum in background ?

Does it run fast when vacuum is not running ?

Is there a way that a separate thread managing the freelist can perform a "per
row" vacuum concurrently? Maybe I am stating the problem incorrectly, but we
need to be able to recover rows already in memory for performance.

What could be possibly done (and is probably not very useful anyway) is
reusing the row modified _in_the_same_transaction_ so that

begin;
abalance = abalance + 1 ;
abalance = abalance + 1 ;
abalance = abalance + 1 ;
end;

would consume just 2x the tablespace and not 4x. But this does not
require a separate thread, just some changes in update logic.

OTOH, this will probably interfere with some transaction modes that make
use of command ids.

--------------
Hannu

#3Jan Wieck
janwieck@yahoo.com
In reply to: mlw (#1)
Re: a vacuum thread is not the answer

mlw wrote:

I was just toying around with things, and you know, running vacuum in the
background doesn't work. It slows things down too much.

The worst case senario is when one does this:

update accounts set abalance = abalance + 1 ;

This takes forever to run and doubles the size of the table.

Is there a way that a separate thread managing the freelist can perform a "per
row" vacuum concurrently? Maybe I am stating the problem incorrectly, but we
need to be able to recover rows already in memory for performance.

So you want to reuse space from rows before your transaction
committed? Fine, I'm all for it, as long as

begin ;
update accounts set abalance = abalance + 1 ;
rollback ;

still works.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

#4mlw
markw@mohawksoft.com
In reply to: mlw (#1)
Re: a vacuum thread is not the answer

Hannu Krosing wrote:

On Thu, 2002-03-07 at 20:15, mlw wrote:

I was just toying around with things, and you know, running vacuum in the
background doesn't work. It slows things down too much.

The worst case senario is when one does this:

update accounts set abalance = abalance + 1 ;

This takes forever to run and doubles the size of the table.

How is this related to running vacuum in background ?

Does it run fast when vacuum is not running ?

The problem is that it doubles the size of a table. This invariably means that
you have more I/O. If there were a way to reuse old tulples, while they are
still in the buffer cache, then PostgreSQL could handle this query faster.

It was, however, pointed out that (obviously) you can't do reclaimation during
a transaction because if it fails or someone issues "rollback" you have broken
the database.

So, I guess I'm saying ignore that part.

Is there a way that a separate thread managing the freelist can perform a "per
row" vacuum concurrently? Maybe I am stating the problem incorrectly, but we
need to be able to recover rows already in memory for performance.

What could be possibly done (and is probably not very useful anyway) is
reusing the row modified _in_the_same_transaction_ so that

begin;
abalance = abalance + 1 ;
abalance = abalance + 1 ;
abalance = abalance + 1 ;
end;

would consume just 2x the tablespace and not 4x. But this does not
require a separate thread, just some changes in update logic.

OTOH, this will probably interfere with some transaction modes that make
use of command ids.

I haven't looked at the code, so I don't even know if it is doable. Could a
small vacuum thread run in the background and monitor the buffer cache? When it
finds a buffer with an unreferenced tuple, do what vacuum does, but only to
that block?

Here is my problem with vacuum. It scans the whole damn table and it takes a
long time. In many, dare I say most, SQL databases, the rows which are updated
are likely a small percent.

If a small vacuum routine can be run against the blocks that are already in the
buffer, this will eliminate a block read, and focus more on blocks which are
likely to have been modified.