Vacuum improvement

Started by Gavin Sherryabout 23 years ago11 messages
#1Gavin Sherry
swm@linuxworld.com.au

Hi all,

I'm thinking that there is an improvement to vacuum which could be made
for 7.4. VACUUM FULLing large, heavily updated tables is a pain. There's
very little an application can do to minimise dead-tuples, particularly if
the table is randomly updated. Wouldn't it be beneficial if VACUUM could
have a parameter which specified how much of the table is vacuumed. That
is, you could specify:

VACUUM FULL test 20 precent;

Yes, terrible syntax but regardless: this would mean that we could
spread the vacuum out and not, possibly, be backing up queues. ANALYZE
could be modified, if necessary.

Thoughts?

Gavin

#2Greg Copeland
greg@CopelandConsulting.Net
In reply to: Gavin Sherry (#1)
Re: Vacuum improvement

That a good idea. That way, if your database slows during specific
windows in time, you can vacuum larger sizes, etc. Seemingly would help
you better manage your vacuuming against system loading.

Greg

Show quoted text

On Tue, 2002-10-15 at 19:22, Gavin Sherry wrote:

Hi all,

I'm thinking that there is an improvement to vacuum which could be made
for 7.4. VACUUM FULLing large, heavily updated tables is a pain. There's
very little an application can do to minimise dead-tuples, particularly if
the table is randomly updated. Wouldn't it be beneficial if VACUUM could
have a parameter which specified how much of the table is vacuumed. That
is, you could specify:

VACUUM FULL test 20 precent;

Yes, terrible syntax but regardless: this would mean that we could
spread the vacuum out and not, possibly, be backing up queues. ANALYZE
could be modified, if necessary.

Thoughts?

Gavin

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gavin Sherry (#1)
Re: Vacuum improvement

Gavin Sherry <swm@linuxworld.com.au> writes:

have a parameter which specified how much of the table is vacuumed. That
is, you could specify:
VACUUM FULL test 20 precent;

Erm ... but which 20 percent? In other words, how could you arrange for
repeated applications of such a command to cover the whole table, and
not just retrace an already-cleaned-out portion?

I don't object to the idea in principle, but I am not sure how to
implement it in practice.

regards, tom lane

#4Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Tom Lane (#3)
Re: Vacuum improvement

On Tue, Oct 15, 2002 at 11:52:35PM -0400, Tom Lane wrote:

Gavin Sherry <swm@linuxworld.com.au> writes:

have a parameter which specified how much of the table is vacuumed. That
is, you could specify:
VACUUM FULL test 20 precent;

Erm ... but which 20 percent? In other words, how could you arrange for
repeated applications of such a command to cover the whole table, and
not just retrace an already-cleaned-out portion?

Maybe each relation block can have a last-vacuumed timestamp? Somewhere
in the table there would have to be a linked list of least-recently
vacuumed blocks so the vacuum cleaner does not have to read every
block to know which one to clean.

Or maybe some system table can provide information about activity in
each block since last vacuum. This forces the use of the stat
collector...

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"El sabio habla porque tiene algo que decir;
el tonto, porque tiene que decir algo" (Platon).

#5Hannu Krosing
hannu@tm.ee
In reply to: Gavin Sherry (#1)
Re: Vacuum improvement

On Wed, 2002-10-16 at 05:22, Gavin Sherry wrote:

Hi all,

I'm thinking that there is an improvement to vacuum which could be made
for 7.4. VACUUM FULLing large, heavily updated tables is a pain. There's
very little an application can do to minimise dead-tuples, particularly if
the table is randomly updated. Wouldn't it be beneficial if VACUUM could
have a parameter which specified how much of the table is vacuumed. That
is, you could specify:

VACUUM FULL test 20 precent;

What about

VACUUM FULL test WORK 5 SLEEP 50;

meaning to VACUUM FULL the whole table, but to work in small chunks and
relaese all locks and let others access the tables between these ?

You could even fire up a separate thread to

VACUUM [FULL] test WORK 5 SLEEP 50 CONTINUOUS;

To keep vacuuming a heavily updated table.

------------------
Hannu

#6Gavin Sherry
swm@linuxworld.com.au
In reply to: Hannu Krosing (#5)
Re: Vacuum improvement

On 16 Oct 2002, Hannu Krosing wrote:

On Wed, 2002-10-16 at 05:22, Gavin Sherry wrote:

Hi all,

I'm thinking that there is an improvement to vacuum which could be made
for 7.4. VACUUM FULLing large, heavily updated tables is a pain. There's
very little an application can do to minimise dead-tuples, particularly if
the table is randomly updated. Wouldn't it be beneficial if VACUUM could
have a parameter which specified how much of the table is vacuumed. That
is, you could specify:

VACUUM FULL test 20 precent;

What about

VACUUM FULL test WORK 5 SLEEP 50;

meaning to VACUUM FULL the whole table, but to work in small chunks and
relaese all locks and let others access the tables between these ?

Great idea. I think this could work as a complement to the idea I had. To
answer Tom's question, how would we know what we've vacuumed, we could
store the range of tids we've vacuumed in pg_class. Or, we could store the
block offset of where we left off vacuuming before and using stats, run
for another X% of the heap. Is this possible?

Gavin

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#5)
Re: Vacuum improvement

Hannu Krosing <hannu@tm.ee> writes:

meaning to VACUUM FULL the whole table, but to work in small chunks and
relaese all locks and let others access the tables between these ?

AFAICS this is impossible for VACUUM FULL. You can't let other accesses
in and then resume processing, because that invalidates all the state
you have about where to put moved tuples.

But the whole point of developing non-FULL vacuuming was to make
something that could be run concurrently with other stuff. I fail
to see the point of insisting that frequent vacuums be FULL.

regards, tom lane

#8Greg Copeland
greg@CopelandConsulting.Net
In reply to: Gavin Sherry (#6)
Re: Vacuum improvement

On Wed, 2002-10-16 at 02:29, Gavin Sherry wrote:

On 16 Oct 2002, Hannu Krosing wrote:

On Wed, 2002-10-16 at 05:22, Gavin Sherry wrote:

Hi all,

I'm thinking that there is an improvement to vacuum which could be made
for 7.4. VACUUM FULLing large, heavily updated tables is a pain. There's
very little an application can do to minimise dead-tuples, particularly if
the table is randomly updated. Wouldn't it be beneficial if VACUUM could
have a parameter which specified how much of the table is vacuumed. That
is, you could specify:

VACUUM FULL test 20 precent;

What about

VACUUM FULL test WORK 5 SLEEP 50;

meaning to VACUUM FULL the whole table, but to work in small chunks and
relaese all locks and let others access the tables between these ?

Great idea. I think this could work as a complement to the idea I had. To
answer Tom's question, how would we know what we've vacuumed, we could
store the range of tids we've vacuumed in pg_class. Or, we could store the
block offset of where we left off vacuuming before and using stats, run
for another X% of the heap. Is this possible?

Why couldn't you start your % from the first rotten/dead tuple? Just
reading through trying to find the first tuple to start counting from
wouldn't hold locks would it? That keeps you from having to track stats
and ensures that X% of the tuples will be vacuumed.

Greg

#9David Walker
pgsql@grax.com
In reply to: Greg Copeland (#8)
Re: Vacuum improvement

Vacuum full locks the whole table currently. I was thinking if you used a
similar to a hard drive defragment that only 2 rows would need to be locked
at a time. When you're done vacuum/defragmenting you shorten the file to
discard the dead tuples that are located after your useful data. There might
be a need to lock the table for a little while at the end but it seems like
you could reduce that time greatly.

I had one table that is heavily updated and it grew to 760 MB even with
regular vacuuming. A vacuum full reduced it to 1.1 MB. I am running 7.2.0
(all my vacuuming is done by superuser).

Show quoted text

On Wednesday 16 October 2002 09:30 am, (Via wrote:

On Wed, 2002-10-16 at 02:29, Gavin Sherry wrote:

On 16 Oct 2002, Hannu Krosing wrote:

On Wed, 2002-10-16 at 05:22, Gavin Sherry wrote:

Hi all,

I'm thinking that there is an improvement to vacuum which could be
made for 7.4. VACUUM FULLing large, heavily updated tables is a pain.
There's very little an application can do to minimise dead-tuples,
particularly if the table is randomly updated. Wouldn't it be
beneficial if VACUUM could have a parameter which specified how much
of the table is vacuumed. That is, you could specify:

VACUUM FULL test 20 precent;

What about

VACUUM FULL test WORK 5 SLEEP 50;

meaning to VACUUM FULL the whole table, but to work in small chunks and
relaese all locks and let others access the tables between these ?

Great idea. I think this could work as a complement to the idea I had. To
answer Tom's question, how would we know what we've vacuumed, we could
store the range of tids we've vacuumed in pg_class. Or, we could store
the block offset of where we left off vacuuming before and using stats,
run for another X% of the heap. Is this possible?

Why couldn't you start your % from the first rotten/dead tuple? Just
reading through trying to find the first tuple to start counting from
wouldn't hold locks would it? That keeps you from having to track stats
and ensures that X% of the tuples will be vacuumed.

Greg

#10Greg Copeland
greg@CopelandConsulting.Net
In reply to: David Walker (#9)
Re: Vacuum improvement

But doesn't the solution I offer present a possible work around? The
table wouldn't need to be locked (I think) until the first dead tuple
were located. After that, you would only keep the locks until you've
scanned X% of the table and shrunk as needed. The result, I think,
results in incremental vacuuming with shorter duration locks being
held. It's not ideal (locks) but may shorten the duration behind help
by locks.

I'm trying to figure out if the two approaches can't be combined
somehow. That is, a percent with maybe even a max lock duration?

Greg

Show quoted text

On Wed, 2002-10-16 at 11:33, David Walker wrote:

Vacuum full locks the whole table currently. I was thinking if you used a
similar to a hard drive defragment that only 2 rows would need to be locked
at a time. When you're done vacuum/defragmenting you shorten the file to
discard the dead tuples that are located after your useful data. There might
be a need to lock the table for a little while at the end but it seems like
you could reduce that time greatly.

I had one table that is heavily updated and it grew to 760 MB even with
regular vacuuming. A vacuum full reduced it to 1.1 MB. I am running 7.2.0
(all my vacuuming is done by superuser).

On Wednesday 16 October 2002 09:30 am, (Via wrote:

On Wed, 2002-10-16 at 02:29, Gavin Sherry wrote:

On 16 Oct 2002, Hannu Krosing wrote:

On Wed, 2002-10-16 at 05:22, Gavin Sherry wrote:

Hi all,

I'm thinking that there is an improvement to vacuum which could be
made for 7.4. VACUUM FULLing large, heavily updated tables is a pain.
There's very little an application can do to minimise dead-tuples,
particularly if the table is randomly updated. Wouldn't it be
beneficial if VACUUM could have a parameter which specified how much
of the table is vacuumed. That is, you could specify:

VACUUM FULL test 20 precent;

What about

VACUUM FULL test WORK 5 SLEEP 50;

meaning to VACUUM FULL the whole table, but to work in small chunks and
relaese all locks and let others access the tables between these ?

Great idea. I think this could work as a complement to the idea I had. To
answer Tom's question, how would we know what we've vacuumed, we could
store the range of tids we've vacuumed in pg_class. Or, we could store
the block offset of where we left off vacuuming before and using stats,
run for another X% of the heap. Is this possible?

Why couldn't you start your % from the first rotten/dead tuple? Just
reading through trying to find the first tuple to start counting from
wouldn't hold locks would it? That keeps you from having to track stats
and ensures that X% of the tuples will be vacuumed.

Greg

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#11Robert Treat
xzilla@users.sourceforge.net
In reply to: David Walker (#9)
Re: Vacuum improvement

On Wed, 2002-10-16 at 12:33, David Walker wrote:

Vacuum full locks the whole table currently. I was thinking if you used a
similar to a hard drive defragment that only 2 rows would need to be locked
at a time. When you're done vacuum/defragmenting you shorten the file to
discard the dead tuples that are located after your useful data. There might
be a need to lock the table for a little while at the end but it seems like
you could reduce that time greatly.

I had one table that is heavily updated and it grew to 760 MB even with
regular vacuuming. A vacuum full reduced it to 1.1 MB. I am running 7.2.0
(all my vacuuming is done by superuser).

Not that I'm against the idea, but isn't this just a sign that your just
not vacuuming frequently enough?

Robert Treat