HOT line pointer bloat and PageRepairFragmentation

Started by Pavan Deolasee over 18 years ago, 8 messages
#1 Pavan Deolasee
pavan.deolasee@gmail.com

We know that HOT can cause line pointer bloat because of redirect-dead
line pointers. In the worst case there could be MaxHeapTuplesPerPage
redirect-dead line pointers in a page. VACUUM can reclaim these line
pointers and mark them ~LP_USED (what is now called LP_UNUSED).
But we don't reclaim the space used by unused line pointers when
repairing page fragmentation, and hence we would never be able to
remove the line pointer bloat completely. Fundamentally, we should
be able to reclaim the unused line pointers at the end of the lp array
(i.e. unused line pointers immediately adjacent to pd_lower).
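The following minimal C sketch makes the idea of reclaiming trailing line pointers concrete. It uses a toy page model, and everything in it is hypothetical. `ToyPage` and `toy_truncate_unused_tail()` are not PostgreSQL's `PageHeaderData` or any real bufpage.c function. The 24-byte header and 4-byte line pointer sizes are assumptions. The sketch shows why only the unused pointers immediately adjacent to pd_lower can be dropped: an unused slot in the middle must stay, because the offsets after it are still in use.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these are not PostgreSQL's ItemIdData or PageHeaderData. */
typedef enum { LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD } LpFlags;

#define TOY_MAX_LP      291  /* assumed MaxHeapTuplesPerPage for an 8 KB page */
#define TOY_HEADER_SIZE 24   /* assumed page header size */
#define TOY_LP_SIZE     4    /* assumed size of one line pointer */

typedef struct {
    LpFlags lp[TOY_MAX_LP];  /* lp[i] describes offset i + 1 */
    int     nlp;             /* current length of the line pointer array */
} ToyPage;

/* pd_lower marks the end of the line pointer array. */
static size_t toy_pd_lower(const ToyPage *p)
{
    return TOY_HEADER_SIZE + (size_t) p->nlp * TOY_LP_SIZE;
}

/*
 * Hypothetical helper: drop LP_UNUSED entries at the tail of the array,
 * i.e. those immediately adjacent to pd_lower. Unused entries in the
 * middle stay, since later offsets are still addressed by TIDs.
 * Returns the number of line pointers reclaimed.
 */
static int toy_truncate_unused_tail(ToyPage *p)
{
    int before = p->nlp;

    assert(p->nlp >= 0 && p->nlp <= TOY_MAX_LP);
    while (p->nlp > 0 && p->lp[p->nlp - 1] == LP_UNUSED)
        p->nlp--;
    return before - p->nlp;
}
```

Consider a toy page whose array is NORMAL, UNUSED, NORMAL, UNUSED, UNUSED. It shrinks from five entries to three, and the toy pd_lower moves back by 8 bytes. The hole at offset 2 remains and can only be reused, not removed.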

I had earlier tried to repair the bloat by reclaiming the space used
by LP_UNUSED line pointers at the end of the array. But it doesn't work
well with VACUUM FULL, which tracks unused line pointers for moving
tuples. It's not that we cannot fix that issue, but I am reluctant to spend
time on it right now because many of us feel that VACUUM FULL is
near its EOL.

How about passing a boolean to PageRepairFragmentation to
command it to reclaim unused line pointers? We pass "true" at all
places except in the VACUUM FULL code path. IOW, we reclaim unused
line pointers during defragmentation and lazy VACUUM. We would need
to WAL-log this information in xl_heap_clean so that we redo the same
during recovery. I have a patch ready, since I already implemented
this a few weeks back.
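The proposed flag can be sketched on a toy page model. `toy_repair_fragmentation()` stands in for PageRepairFragmentation with the extra boolean, and `ToyCleanRecord` stands in for the new field that would be added to xl_heap_clean. All names are hypothetical, and tuple compaction is left out. The sketch illustrates that redo must use the logged flag rather than recompute it, so the original and the recovered page end up with the same line pointer array.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model with hypothetical names; not PostgreSQL's real structures. */
typedef enum { LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD } LpFlags;

#define TOY_MAX_LP 16

typedef struct {
    LpFlags lp[TOY_MAX_LP];  /* lp[i] describes offset i + 1 */
    int     nlp;             /* current length of the line pointer array */
} ToyPage;

/* Stand-in for the extra field the proposal would add to xl_heap_clean. */
typedef struct {
    bool reclaim_unused;
} ToyCleanRecord;

/* Stand-in for PageRepairFragmentation with the proposed boolean;
 * only the line pointer side is modelled, tuple compaction is omitted. */
static void toy_repair_fragmentation(ToyPage *p, bool reclaim_unused)
{
    if (!reclaim_unused)
        return;              /* VACUUM FULL path: keep every slot as-is */
    while (p->nlp > 0 && p->lp[p->nlp - 1] == LP_UNUSED)
        p->nlp--;            /* trim unused pointers adjacent to pd_lower */
}

/* Normal operation: decide, act, and log the decision. */
static ToyCleanRecord toy_clean_page(ToyPage *p, bool in_vacuum_full)
{
    ToyCleanRecord rec = { .reclaim_unused = !in_vacuum_full };

    toy_repair_fragmentation(p, rec.reclaim_unused);
    return rec;
}

/* Recovery: replay using the logged flag instead of re-deciding. */
static void toy_redo_clean(ToyPage *p, const ToyCleanRecord *rec)
{
    toy_repair_fragmentation(p, rec->reclaim_unused);
}
```

A lazy VACUUM or pruning call logs `true` and both copies of the page lose their trailing unused pointers. A VACUUM FULL call logs `false` and leaves the array untouched on both sides.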

Comments?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavan Deolasee (#1)
Re: HOT line pointer bloat and PageRepairFragmentation

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

> How about passing a boolean to PageRepairFragmentation to
> command it to reclaim unused line pointers?

The difficulty with this is having to be 100% confident that noplace in
the system tries to dereference a TID without checking that the line
number (offset) is within range. At one time that was demonstrably
not so. I think we've cleaned up most if not all such places, but
I wouldn't want to swear to it.
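The kind of check Tom refers to can be sketched as follows. This is a toy model with hypothetical names (`ToyPage`, `toy_get_item_checked()`). In PostgreSQL the comparable test is a comparison against PageGetMaxOffsetNumber() before calling PageGetItemId(). Offsets are 1-based, as in TIDs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; tuple_id stands in for the tuple's contents. */
typedef enum { LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD } LpFlags;

#define TOY_MAX_LP 16

typedef struct { LpFlags flags; int tuple_id; } ToyItem;
typedef struct { ToyItem lp[TOY_MAX_LP]; int nlp; } ToyPage;  /* valid offsets: 1..nlp */

/*
 * Hypothetical checked lookup: refuse offsets outside 1..nlp instead of
 * reading past the end of the line pointer array (which would happen if
 * the array had been truncated and a caller skipped this test), and
 * return only items that still hold a tuple.
 */
static const ToyItem *toy_get_item_checked(const ToyPage *p, int offnum)
{
    if (offnum < 1 || offnum > p->nlp)
        return NULL;             /* TID points beyond pd_lower */

    const ToyItem *it = &p->lp[offnum - 1];
    return it->flags == LP_NORMAL ? it : NULL;
}
```

With such a check in every caller, shortening the array turns a reference to a trimmed offset into a clean "not found" rather than a read of whatever lies past pd_lower.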

I'm not convinced it's worth taking any risk for.

regards, tom lane

#3 Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Tom Lane (#2)
Re: HOT line pointer bloat and PageRepairFragmentation

On 9/13/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The difficulty with this is having to be 100% confident that noplace in
> the system tries to dereference a TID without checking that the line
> number (offset) is within range. At one time that was demonstrably
> not so. I think we've cleaned up most if not all such places, but
> I wouldn't want to swear to it.

If there are such places, aren't we already in trouble? An unused
line pointer can be reused for an unrelated tuple, so dereferencing a
stale TID can already cause data corruption, can't it? If you want, I can
do a quick search for all callers of PageGetItemId, confirm that
the offset is checked, and add any missing checks.
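Pavan's argument can be illustrated with the same kind of toy model. Once VACUUM marks a slot unused, the next insertion may take it. A caller still holding the old TID then silently reads an unrelated tuple, even though the offset is in range. `toy_place_tuple()` (reuse the first unused slot, else extend the array) and `toy_read_unchecked()` are hypothetical helpers, not PostgreSQL code.

```c
#include <assert.h>

/* Toy model only; tuple_id stands in for the tuple's contents. */
typedef enum { LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD } LpFlags;

#define TOY_MAX_LP 16

typedef struct { LpFlags flags; int tuple_id; } ToyItem;
typedef struct { ToyItem lp[TOY_MAX_LP]; int nlp; } ToyPage;

/* Hypothetical placement rule: reuse the first LP_UNUSED slot, else
 * extend the array. Returns the 1-based offset, or 0 if the page is full. */
static int toy_place_tuple(ToyPage *p, int tuple_id)
{
    for (int i = 0; i < p->nlp; i++) {
        if (p->lp[i].flags == LP_UNUSED) {
            p->lp[i] = (ToyItem){ LP_NORMAL, tuple_id };
            return i + 1;
        }
    }
    if (p->nlp == TOY_MAX_LP)
        return 0;
    p->lp[p->nlp++] = (ToyItem){ LP_NORMAL, tuple_id };
    return p->nlp;
}

/* What a careless caller might do: no range check, no identity check. */
static int toy_read_unchecked(const ToyPage *p, int offnum)
{
    return p->lp[offnum - 1].tuple_id;
}
```

In this model a stale TID into a reused slot is undetectable by a range check alone, whereas a TID into a truncated region is at least caught by one.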

In normal circumstances, line pointer bloat should not occur. But in
some particular cases it may cause irreparable damage. For example:

CREATE TABLE test (a int, b char(200));
CREATE UNIQUE INDEX testindx ON test(a);
INSERT INTO test VALUES (1, 'foo');

Now, if we repeatedly update the tuple so that each update is a
COLD update, we would bloat the page with redirect-dead line pointers.
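The worst case in this example can be estimated with a little arithmetic. The sketch plugs assumed 8.3-era sizes (8 KB BLCKSZ, 24-byte page header, 24-byte MAXALIGNed tuple header, 4-byte line pointer) into a formula of the same shape as MaxHeapTuplesPerPage. It then simulates COLD updates under three assumptions: pruning runs before every update, VACUUM never runs, and a heap page will not grow its line pointer array beyond MaxHeapTuplesPerPage. The 232-byte row size is a rough guess for the `test` table above, and the function names are hypothetical.

```c
#include <assert.h>

/* Assumed 8.3-era sizes for an 8 KB page with 8-byte MAXALIGN. */
#define TOY_BLCKSZ       8192
#define TOY_PAGE_HEADER  24   /* assumed SizeOfPageHeaderData */
#define TOY_TUPLE_HEADER 24   /* assumed MAXALIGNed heap tuple header */
#define TOY_LINE_POINTER 4    /* assumed sizeof(ItemIdData) */

/* Same shape as MaxHeapTuplesPerPage: how many header-only tuples,
 * each with its line pointer, fit after the page header. */
static int toy_max_heap_tuples_per_page(void)
{
    return (TOY_BLCKSZ - TOY_PAGE_HEADER) / (TOY_TUPLE_HEADER + TOY_LINE_POINTER);
}

/*
 * Toy simulation of repeated COLD updates of one row, assuming pruning
 * runs before every update and VACUUM never runs. Each update needs a
 * brand-new line pointer (LP_DEAD slots cannot be reused), while the old
 * version's storage is freed by pruning. Returns the final array length.
 */
static int toy_line_pointers_after_cold_updates(int tuple_size, int updates)
{
    int cap = toy_max_heap_tuples_per_page();
    int nlp = 1;                        /* the original INSERT */

    for (int i = 0; i < updates; i++) {
        /* old and new versions coexist until the old one is pruned */
        int needed = TOY_PAGE_HEADER + (nlp + 1) * TOY_LINE_POINTER + 2 * tuple_size;

        if (nlp + 1 > cap || needed > TOY_BLCKSZ)
            break;                      /* the new version goes to another page */
        nlp++;                          /* old pointer later becomes LP_DEAD */
    }
    return nlp;
}
```

Under these assumptions the array tops out at 291 entries (1164 bytes), 290 of them LP_DEAD. VACUUM can turn those into LP_UNUSED, but without truncation the array itself never shrinks.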

Any other ideas on how to recover from this?

Thanks,
Pavan

#4 Zeugswetter Andreas ADI SD
Andreas.Zeugswetter@s-itsolutions.at
In reply to: Pavan Deolasee (#3)
Re: HOT line pointer bloat and PageRepairFragmentation

> CREATE TABLE test (a int, b char(200));
> CREATE UNIQUE INDEX testindx ON test(a);
> INSERT INTO test VALUES (1, 'foo');
>
> Now, if we repeatedly update the tuple so that each update is a
> COLD update, we would bloat the page with redirect-dead line pointers.

Um, sorry for not understanding, but why would a COLD update produce a
redirect-dead line pointer (and not two LP_NORMAL ones)?

Andreas

#5 Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Zeugswetter Andreas ADI SD (#4)
Re: HOT line pointer bloat and PageRepairFragmentation

On 9/13/07, Zeugswetter Andreas ADI SD <Andreas.Zeugswetter@s-itsolutions.at> wrote:

>> CREATE TABLE test (a int, b char(200));
>> CREATE UNIQUE INDEX testindx ON test(a);
>> INSERT INTO test VALUES (1, 'foo');
>>
>> Now, if we repeatedly update the tuple so that each update is a
>> COLD update, we would bloat the page with redirect-dead line pointers.
>
> Um, sorry for not understanding, but why would a COLD update produce a
> redirect-dead line pointer (and not two LP_NORMAL ones)?

The COLD updated (old) tuple would be pruned to a dead line pointer
once the tuple becomes DEAD. Normally that would let us reuse the
tuple storage for other purposes. We do the same for DELETEd tuples.
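The distinction discussed here can be summarised as a decision about one line pointer during pruning. This is a simplified sketch with hypothetical types (`ToyTuple`, `toy_prune_outcome()`), not the actual pruning code. It covers four cases:

- A dead heap-only tuple has no index entry and can be freed outright.
- A dead HOT chain root with a surviving member is redirected.
- A dead tuple that index entries still point at (such as the old version after a COLD update, or a DELETEd tuple) loses its storage but keeps an LP_DEAD pointer.
- A tuple still visible to someone is left alone.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD } LpFlags;

/* Hypothetical summary of what pruning knows about one tuple. */
typedef struct {
    bool dead;                  /* no longer visible to anyone */
    bool heap_only;             /* HOT tuple: no index entry points at it */
    bool chain_has_live_member; /* chain root only: a later version survives */
} ToyTuple;

/* Simplified outcome for the tuple's line pointer after pruning. */
static LpFlags toy_prune_outcome(ToyTuple t)
{
    if (!t.dead)
        return LP_NORMAL;       /* still needed: unchanged */
    if (t.heap_only)
        return LP_UNUSED;       /* nothing references it: free the slot */
    if (t.chain_has_live_member)
        return LP_REDIRECT;     /* index entry now leads to the live member */
    return LP_DEAD;             /* storage freed, index entries remain */
}
```

In this model a COLD-updated old version is a dead tuple that is not heap-only and has no HOT successor, so it ends up LP_DEAD rather than LP_REDIRECT.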

Thanks,
Pavan

#6 Zeugswetter Andreas ADI SD
Andreas.Zeugswetter@s-itsolutions.at
In reply to: Pavan Deolasee (#5)
Re: HOT line pointer bloat and PageRepairFragmentation

> The COLD updated (old) tuple would be pruned to a dead line pointer
> once the tuple becomes DEAD. Normally that would let us reuse the
> tuple storage for other purposes. We do the same for DELETEd tuples.

Oh, I thought only pruned tuples from HOT chains can produce a
"redirect dead" line pointer.

This looks like a problem, since we might end up with a page filled with
LP_DEAD slots that all have no visibility info and thus cannot be
cleaned by vacuum.

Maybe PageRepairFragmentation, when called from HOT, should prune less
aggressively, e.g. prune until at most half of the available slots are
LP_DEAD, and not prune the rest.
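One possible reading of Andreas's suggestion is sketched below with hypothetical helpers. Before pruning another tuple to LP_DEAD, check that LP_DEAD pointers would still make up no more than half of the available slots; otherwise leave the tuple's storage in place. The 291-slot figure used below is an assumed MaxHeapTuplesPerPage for 8 KB pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical throttle: allow one more LP_DEAD only if, afterwards,
 * LP_DEAD pointers are still at most half of max_slots. */
static bool toy_may_prune_to_dead(int lp_dead_count, int max_slots)
{
    return 2 * (lp_dead_count + 1) <= max_slots;
}

/* Prune prunable tuples one at a time until the cap is reached; the rest
 * keep their storage for now. Returns how many were pruned. */
static int toy_prune_with_cap(int lp_dead_count, int prunable, int max_slots)
{
    int pruned = 0;

    while (pruned < prunable && toy_may_prune_to_dead(lp_dead_count + pruned, max_slots))
        pruned++;
    return pruned;
}
```

Against 291 slots and no existing LP_DEAD pointers, at most 145 of 200 prunable tuples would be pruned. With 140 LP_DEAD pointers already present, only 5 more would be.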

Andreas

#7 Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Zeugswetter Andreas ADI SD (#6)
Re: HOT line pointer bloat and PageRepairFragmentation

On 9/13/07, Zeugswetter Andreas ADI SD <Andreas.Zeugswetter@s-itsolutions.at> wrote:

>> The COLD updated (old) tuple would be pruned to a dead line pointer
>> once the tuple becomes DEAD. Normally that would let us reuse the
>> tuple storage for other purposes. We do the same for DELETEd tuples.
>
> Oh, I thought only pruned tuples from HOT chains can produce a
> "redirect dead" line pointer.
>
> This looks like a problem, since we might end up with a page filled with
> LP_DEAD slots that all have no visibility info and thus cannot be
> cleaned by vacuum.

It has nothing to do with visibility info. We already know the tuple is DEAD,
and that's why its line pointer is LP_DEAD.

Thanks,
Pavan

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas ADI SD (#6)
Re: HOT line pointer bloat and PageRepairFragmentation

"Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at> writes:

> ...This looks like a problem, since we might end up with a page filled with
> LP_DEAD slots that all have no visibility info and thus cannot be
> cleaned by vacuum.

No, it's the other way round: an LP_DEAD item pointer can *always* be
cleaned by VACUUM. It would not have become LP_DEAD unless someone had
confirmed that the pointed-to tuple was no longer visible to anyone.

The only reason we have LP_DEAD at all is that we don't want HOT pruning
to be required to remove the index entries that link to the item pointer.
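Tom's point can be sketched as the order of operations lazy VACUUM follows for LP_DEAD pointers. This toy model (hypothetical `ToyPage`, `ToyIndex`, `toy_lazy_vacuum_page()`) handles one page and one index at a time, whereas real VACUUM batches dead TIDs across pages. It needs no visibility check, because pruning has already established that the tuples are invisible to everyone. The pointers are marked LP_UNUSED only after the index entries are gone.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD } LpFlags;

#define TOY_MAX_LP    16
#define TOY_MAX_INDEX 32

typedef struct { LpFlags lp[TOY_MAX_LP]; int nlp; } ToyPage;
/* Toy index: each entry records only the heap offset it points at. */
typedef struct { int heap_off[TOY_MAX_INDEX]; int n; } ToyIndex;

static bool toy_in_list(const int *xs, int n, int x)
{
    for (int i = 0; i < n; i++)
        if (xs[i] == x)
            return true;
    return false;
}

/* Returns the number of line pointers freed on this page. */
static int toy_lazy_vacuum_page(ToyPage *p, ToyIndex *idx)
{
    int dead[TOY_MAX_LP], ndead = 0, kept = 0;

    /* 1. collect LP_DEAD offsets; no visibility check is needed */
    for (int off = 1; off <= p->nlp; off++)
        if (p->lp[off - 1] == LP_DEAD)
            dead[ndead++] = off;

    /* 2. remove index entries that point at those offsets */
    for (int i = 0; i < idx->n; i++)
        if (!toy_in_list(dead, ndead, idx->heap_off[i]))
            idx->heap_off[kept++] = idx->heap_off[i];
    idx->n = kept;

    /* 3. only now is it safe to make the slots reusable */
    for (int i = 0; i < ndead; i++)
        p->lp[dead[i] - 1] = LP_UNUSED;
    return ndead;
}
```

The line pointer count is unchanged after VACUUM in this model. The freed slots become reusable, but the array is not shortened, which is the bloat this thread started from.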

regards, tom lane