heap_tuple_needs_freeze false positive
Hi,
I noticed that heap_tuple_needs_freeze might return true in cases where
the Xmax is leftover junk from somebody who set HEAP_XMAX_INVALID in the
far past without resetting the Xmax value itself to Invalid. I think
this is incorrect usage; the rule, I think, is that one shouldn't even
read Xmax at all unless HEAP_XMAX_INVALID is reset.
This might cause unnecessary acquisitions of the cleanup lock, if a
tuple is deemed freezable when in fact it isn't.
Suggested patch attached. I'd backpatch this as far as it applies
cleanly.
--
Álvaro Herrera <alvherre@alvh.no-ip.org>
Attachments:
heap_tuple_needs_freeze.patchapplication/octet-stream; name=heap_tuple_needs_freeze.patchDownload+13-13
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
! if (!(tuple->t_infomask & HEAP_XMAX_INVALID))
{
! if (!(tuple->t_infomask & HEAP_XMAX_IS_MULTI))
How about just one test,
if (!(tuple->t_infomask & (HEAP_XMAX_INVALID | HEAP_XMAX_IS_MULTI)))
But other than that quibble, yeah, it's a bug. XMAX_INVALID means just
that: the xmax is not to be thought valid.
regards, tom lane
On Wed, Feb 1, 2012 at 8:01 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Suggested patch attached. I'd backpatch this as far as it applies
cleanly.
This is new code in 9.2, but it's modelled on heap_freeze_tuple(), which is old.
I'm not convinced that it's a bug. Suppose that xmax is set but is
hinted as invalid. We process the table and advanced relfrozenxid;
then, we crash. After recovery, it's possible that the hint bit is
gone (after all, setting hint bits isn't WAL-logged). Now we're in
big trouble, because the next CLOG lookup on that xmax value might not
happen until it's been reused, and we might get a different answer
than before.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
I'm not convinced that it's a bug. Suppose that xmax is set but is
hinted as invalid.
XMAX_INVALID is not a "hint". When it's set, the contents of the field
must be presumed to be garbage. Any code failing to adhere to that rule
is broken.
We process the table and advanced relfrozenxid;
then, we crash. After recovery, it's possible that the hint bit is
gone (after all, setting hint bits isn't WAL-logged). Now we're in
big trouble, because the next CLOG lookup on that xmax value might not
happen until it's been reused, and we might get a different answer
than before.
I believe we have adequate defenses against that, and even if we did
not, that doesn't make the code in question less wrong.
regards, tom lane
On Thu, Feb 2, 2012 at 11:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I'm not convinced that it's a bug. Suppose that xmax is set but is
hinted as invalid.XMAX_INVALID is not a "hint". When it's set, the contents of the field
must be presumed to be garbage. Any code failing to adhere to that rule
is broken.We process the table and advanced relfrozenxid;
then, we crash. After recovery, it's possible that the hint bit is
gone (after all, setting hint bits isn't WAL-logged). Now we're in
big trouble, because the next CLOG lookup on that xmax value might not
happen until it's been reused, and we might get a different answer
than before.I believe we have adequate defenses against that, and even if we did
not, that doesn't make the code in question less wrong.
I believe the adequate defense that we have is precisely the logic you
are proposing to change. Regardless of whether you want to call
XMAX_INVALID a hint or, say, a giant tortoise, I am fairly sure that
we don't WAL-log setting it. That means that a bit set before a crash
won't necessarily still be set after a crash. But the corresponding
relfrozenxid advancement will be WAL-logged, leading to the problem
scenario I described.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Feb 2, 2012 at 12:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I believe the adequate defense that we have is precisely the logic you
are proposing to change. Regardless of whether you want to call
XMAX_INVALID a hint or, say, a giant tortoise, I am fairly sure that
we don't WAL-log setting it. That means that a bit set before a crash
won't necessarily still be set after a crash. But the corresponding
relfrozenxid advancement will be WAL-logged, leading to the problem
scenario I described.
To put that another way, the problem isn't that we might have code
somewhere in the system that ignores HEAP_XMAX_INVALID. The problem
is that HEAP_XMAX_INVALID might not still be set on that tuple the
next time somebody looks at it, if a database crash intervenes after
that bit is set and before it is flushed to disk.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company