Reusing Dead Tuples:
Hi,
I am doing some experiments on dead tuples: I am looking at reusing the
dead tuple space in a particular page during an UPDATE. This patch
is meant for tables which are heavily updated, to avoid vacuuming very
frequently. Using it will arrest the growth of heavily updated tables.
The algorithm works like this:
1) During the update it checks for dead tuples in the current page (the
page that contains the tuple that needs to be updated). If it finds any
dead tuples, it reuses the space by overwriting a dead tuple. The check
for dead tuples is very similar to the one lazy vacuum performs. (A
simplified sketch of the idea follows after this list.)
2) If it cannot find any dead tuple, it proceeds as usual by inserting
at the end of the table.
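A simplified sketch of the idea (all structures and helpers below are
hypothetical stand-ins for illustration, not the patch or PostgreSQL
code; the real deadness check would use the same visibility rules lazy
vacuum relies on):

/* Simplified model of the page-local dead-tuple reuse decision.
 * All types here are toy stand-ins; the real patch would work on
 * PostgreSQL Page/ItemId structures and decide deadness the way
 * lazy vacuum does. */
#include <stdbool.h>
#include <stdio.h>

#define SLOTS_PER_PAGE 8

typedef struct {
    bool used;   /* slot holds a tuple (live or dead) */
    bool dead;   /* tuple is dead to every open transaction */
    int  data;   /* stand-in for the tuple payload */
} Slot;

typedef struct {
    Slot slots[SLOTS_PER_PAGE];
} Page;

/* Step 1: scan the current page for a dead tuple and overwrite
 * it with the new version if one is found. */
static bool update_reusing_dead_slot(Page *page, int new_data)
{
    for (int i = 0; i < SLOTS_PER_PAGE; i++) {
        Slot *s = &page->slots[i];
        if (s->used && s->dead) {
            /* Reuse the dead tuple's space in place. */
            s->data = new_data;
            s->dead = false;
            return true;
        }
    }
    return false;   /* no dead tuple on this page */
}

/* Step 2: fall back to inserting at the end of the table
 * (modeled here as the first free slot). */
static void insert_at_end(Page *page, int new_data)
{
    for (int i = 0; i < SLOTS_PER_PAGE; i++) {
        Slot *s = &page->slots[i];
        if (!s->used) {
            s->used = true;
            s->dead = false;
            s->data = new_data;
            return;
        }
    }
    /* Page full: the real code would extend the relation here. */
}

int main(void)
{
    Page page = {0};
    page.slots[0] = (Slot){ .used = true, .dead = true,  .data = 1 };
    page.slots[1] = (Slot){ .used = true, .dead = false, .data = 2 };

    if (!update_reusing_dead_slot(&page, 42))
        insert_at_end(&page, 42);

    printf("slot 0 now holds %d\n", page.slots[0].data);  /* 42 */
    return 0;
}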
Performance Effect:
1) The CPU processing will be slightly higher for the update, but the
IO processing is exactly the same.
2) The size of the table grows more slowly under heavy update, so
vacuum is not required very frequently.
The total processing for an update is more or less the same even after
doing a large number of updates without vacuum.
Does it break anything by overwriting the dead tuples?
Comments?
jana
Janardhan <jana-reddy@mediaring.com.sg> writes:
Does it break anything by overwriting the dead tuples?
Yes. You cannot do that unless you've first removed index entries
pointing at the dead tuples --- and jumped through the same locking
hoops that lazy vacuum does while removing index entries.
regards, tom lane
Tom Lane wrote:
Janardhan <jana-reddy@mediaring.com.sg> writes:
Does it break anything by overwriting the dead tuples?
Yes. You cannot do that unless you've first removed index entries
pointing at the dead tuples --- and jumped through the same locking
hoops that lazy vacuum does while removing index entries.
regards, tom lane
If I am not wrong, while updating a tuple we are also creating a new
index entry. So if the tuple is dead, then the index entry pointing at
it is also a dead index tuple. So even if the dead index tuple is not
removed, it should not break anything, since the dead index tuple will
not be used. Am I correct?
What is the reason that the dead heap tuples are maintained in a linked
list, since for every dead heap tuple there is a corresponding dead
index tuple?
Regards
jana
Janardhan <jana-reddy@mediaring.com.sg> writes:
If I am not wrong, while updating a tuple we are also creating a new
index entry.
Yes.
So if the tuple is dead, then the index entry pointing at it is also a
dead index tuple.
Yes.
So even if the dead index tuple is not removed, it should not break
anything, since the dead index tuple will not be used. Am I correct?
No. A process running an indexscan will assume that the index tuple
accurately describes the heap tuple it is pointing at. If the heap
tuple is live then it will be returned as satisfying the indexscan.
regards, tom lane
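To make the hazard concrete, here is a toy model of it (simplified
stand-ins, not PostgreSQL code): an index entry is just a key plus a
pointer to a heap slot, and overwriting a dead tuple in place turns a
harmless stale entry into one that returns the wrong row.

/* Toy model of the hazard: an index entry is (key, pointer-to-heap-slot).
 * If a dead heap tuple is overwritten in place without removing the
 * index entry, a scan for the OLD key finds a live tuple and wrongly
 * returns it. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { bool live; int key; } HeapTuple;
typedef struct { int key; HeapTuple *heap_ptr; } IndexEntry;

int main(void)
{
    HeapTuple  heap[1] = { { .live = false, .key = 10 } }; /* dead tuple, key 10 */
    IndexEntry idx     = { .key = 10, .heap_ptr = &heap[0] }; /* stale entry left behind */

    /* Dead tuple's space is reused for an unrelated live tuple, key 99. */
    heap[0] = (HeapTuple){ .live = true, .key = 99 };

    /* Index scan for key 10: the entry points at a LIVE tuple, so the
     * scan returns it -- even though its key is actually 99. */
    if (idx.key == 10 && idx.heap_ptr->live)
        printf("scan for key 10 returned tuple with key %d\n",
               idx.heap_ptr->key);   /* prints 99: the wrong answer */
    return 0;
}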
Tom Lane wrote:
Janardhan <jana-reddy@mediaring.com.sg> writes:
Does it break anything by overwriting the dead tuples?
Yes. You cannot do that unless you've first removed index entries
pointing at the dead tuples --- and jumped through the same locking
hoops that lazy vacuum does while removing index entries.
regards, tom lane
Does it break any other things if all the index entries pointing at the
dead tuple are removed before reusing the dead tuple?
Regards
jana
Janardhan <jana-reddy@mediaring.com.sg> writes:
Does it break any other things if all the index entries pointing at the
dead tuple are removed before reusing the dead tuple?
Possibly you could make that work, but I think you'll find the
efficiency advantage you were chasing to be totally gone. The locking
scheme is heavily biased against you, and the index AMs don't offer an
API designed for efficient retail index-tuple deletion.
Of course that just says that you're swimming against the tide of
previous optimization efforts. But the thing you need to face up to
is you are taking what had been background maintenance tasks (viz,
VACUUM) and moving them into the foreground critical path. This *will*
slow down your foreground applications.
regards, tom lane
Tom Lane wrote:
Janardhan <jana-reddy@mediaring.com.sg> writes:
Does it break any other things if all the index entries pointing at the
dead tuple are removed before reusing the dead tuple?
Possibly you could make that work, but I think you'll find the
efficiency advantage you were chasing to be totally gone. The locking
scheme is heavily biased against you, and the index AMs don't offer an
API designed for efficient retail index-tuple deletion.
Of course that just says that you're swimming against the tide of
previous optimization efforts. But the thing you need to face up to
is you are taking what had been background maintenance tasks (viz,
VACUUM) and moving them into the foreground critical path. This *will*
slow down your foreground applications.
regards, tom lane
Today I was able to complete the patch, and it is working, but only for
B-tree. I have added a new method am_delete to the index AM API and
bt_delete to the B-tree index to delete a single entry. For the time
being this works only with B-tree indexes.
Regarding the complexity of deleting a tuple from a B-tree: it is the
same as or less than that of inserting a tuple into a B-tree, since a
delete does not require splitting the page. The approach is slightly
different from that of lazy vacuum: lazy vacuum scans the entire index
to delete the dead entries, whereas here it searches for the particular
entry, similar to an insert.
Here locking may not have much impact: it locks only a single buffer to
delete the index entry.
Regarding the efficiency: if the entire index is buffered then it does
not require any additional IO; only extra CPU is required to delete
entries in the index. (A hedged sketch of what the new entry point
might look like follows below.)
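A hedged sketch of what the proposed retail deletion might look like
(am_delete and bt_delete are the names given in the message above; the
types, signature, and single-page model below are guesses for
illustration, not the actual patch or PostgreSQL code):

/* Sketch of retail index-tuple deletion on one leaf page. The point
 * it models: deletion descends to one leaf the same way an insert
 * would, touches one buffer, and never needs a page split. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { int key; int tid; } LeafEntry;   /* (key, heap pointer) */

typedef struct {
    LeafEntry entries[16];
    int       nentries;
} LeafPage;

/* Delete the single entry matching (key, tid) from one leaf page.
 * In the real tree the leaf would first be located by the same
 * descent an insert uses, with the buffer locked exclusively. */
static bool bt_delete_on_page(LeafPage *page, int key, int tid)
{
    for (int i = 0; i < page->nentries; i++) {
        if (page->entries[i].key == key && page->entries[i].tid == tid) {
            /* Shift the remaining entries down: no split, one page. */
            for (int j = i + 1; j < page->nentries; j++)
                page->entries[j - 1] = page->entries[j];
            page->nentries--;
            return true;
        }
    }
    return false;   /* entry already gone */
}

int main(void)
{
    LeafPage leaf = { .entries = { {10, 100}, {10, 101}, {20, 102} },
                      .nentries = 3 };

    /* Remove only the entry pointing at heap TID 101, leaving the
     * other entry with the same key untouched. */
    bt_delete_on_page(&leaf, 10, 101);
    printf("entries left: %d\n", leaf.nentries);   /* 2 */
    return 0;
}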
I am using Postgres in an application where there are heavy updates to a
group of (small) tables before a single record is inserted into a huge
table; this entire thing constitutes a single transaction. Currently, as
time goes on, the transaction processing speed decreases until the
database is vacuumed.
Using this new patch I am hoping the transaction processing time will
remain constant irrespective of time. I will only need to vacuum once I
delete a large number of entries from some of the tables.
regards, jana