Bug in visibility hint bit

Started by Jeff Janes, over 16 years ago, 4 messages, hackers
#1Jeff Janes
jeff.janes@gmail.com

There seems to be a bug in the visibility map in 8.4.0, introduced to
cvs on 2008-12-03. It results in tuples being called visible that
shouldn't be.

In heap_update function from heapam.c:

    /*
     * Note: we mustn't clear PD_ALL_VISIBLE flags before writing the WAL
     * record, because log_heap_update looks at those flags to set the
     * corresponding flags in the WAL record.
     */

So the full-page image of the block sent to WAL carries a stale
PD_ALL_VISIBLE flag: it is still set, even though the update is about to
clear it. It needs to be fixed during WAL replay after a crash, but it
is not.

In heap_xlog_update:

    if (record->xl_info & XLR_BKP_BLOCK_1)
    {
        if (samepage)
            return;     /* backup block covered both changes */
        goto newt;
    }

The goto newt causes it to skip the code that would have called
PageClearAllVisible.

I don't feel particularly competent to propose a patch for this. It
seems to me that log_heap_update should be sent the correct block in the
first place, and some other method should be used to communicate between
heap_update and log_heap_update if communication is necessary. But
really, I don't think such
communication should be necessary, and the xlrec.all_visible_cleared
and xlrec.new_all_visible_cleared fields are unneeded. Just assume
they are true. It seems like the worst thing that can happen is that
we call PageClearAllVisible when it is already cleared, which is
hardly harmful (the blocks that have redo applied to them are already
dirty, so a spurious clear doesn't cause unneeded IO)

Jeff

#2Jeff Janes
jeff.janes@gmail.com
In reply to: Jeff Janes (#1)
Re: Bug in visibility hint bit

On Mon, Aug 24, 2009 at 6:23 PM, Jeff Janes<jeff.janes@gmail.com> wrote:

There seems to be a bug in the visibility map in 8.4.0, introduced to
cvs on 2008-12-03. It results in tuples being called visible that
shouldn't be.

Well, never mind. It took me a few days to track down the bug, and in the
meantime I didn't want to rsync the CVS repository and lose my own local
changes against it. So once I was done, I rsynced and saw that Tom had
already patched it yesterday.

Cheers,

Jeff

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jeff Janes (#2)
Re: Bug in visibility hint bit

Jeff Janes wrote:

On Mon, Aug 24, 2009 at 6:23 PM, Jeff Janes<jeff.janes@gmail.com> wrote:

There seems to be a bug in the visibility map in 8.4.0, introduced to
cvs on 2008-12-03. It results in tuples being called visible that
shouldn't be.

Well, never mind. It took me a few days to track down the bug, and in the
meantime I didn't want to rsync the CVS repository and lose my own local
changes against it. So once I was done, I rsynced and saw that Tom had
already patched it yesterday.

Congratulations on finding it independently. We welcome your eyes on
our code ;-)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#1)
Re: Bug in visibility hint bit

Jeff Janes <jeff.janes@gmail.com> writes:

... But really, I don't think such
communication should be necessary, and the xlrec.all_visible_cleared
and xlrec.new_all_visible_cleared fields are unneeded. Just assume
they are true. It seems like the worst thing that can happen is that
we call PageClearAllVisible when it is already cleared, which is
hardly harmful (the blocks that have redo applied to them are already
dirty, so a spurious clear doesn't cause unneeded IO)

Just to respond to that --- I spent a while yesterday thinking the same
thing. But the value of those flags is to tell the WAL replay functions
whether they need to go and clear the corresponding bits in the
visibility map. Making them do that unconditionally for every
insert/update/delete would surely be a pretty big hit to the speed of
WAL replay, which already leaves a lot to be desired :-(
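
The trade-off can be pictured with a small counting model. This is only a sketch under stated assumptions: DemoRecord and demo_count_vm_touches are hypothetical names, and a "touch" stands in for whatever work replay must do to locate and clear a visibility map bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical WAL record carrying only the flag under discussion. */
typedef struct
{
    bool all_visible_cleared;
} DemoRecord;

/*
 * Count how many records would make replay touch the visibility map.
 * unconditional = true models "just assume the flags are true".
 */
static size_t
demo_count_vm_touches(const DemoRecord *recs, size_t n, bool unconditional)
{
    size_t touches = 0;

    assert(recs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (unconditional || recs[i].all_visible_cleared)
            touches++;
    }
    return touches;
}
```

If only a small fraction of records actually cleared the page flag, the conditional count stays small while the unconditional count equals the number of records replayed.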

regards, tom lane