pgsql: Introduce WAL records to log reuse of btree pages, allowing
Log Message:
-----------
Introduce WAL records to log reuse of btree pages, allowing conflict
resolution during Hot Standby. Page reuse interlock requested by Tom.
Analysis and patch by me.
Modified Files:
--------------
pgsql/src/backend/access/nbtree:
nbtpage.c (r1.118 -> r1.119)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/nbtree/nbtpage.c?r1=1.118&r2=1.119)
nbtxlog.c (r1.60 -> r1.61)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/nbtree/nbtxlog.c?r1=1.60&r2=1.61)
pgsql/src/include/access:
nbtree.h (r1.128 -> r1.129)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/access/nbtree.h?r1=1.128&r2=1.129)
Simon Riggs wrote:
> Introduce WAL records to log reuse of btree pages, allowing conflict
> resolution during Hot Standby. Page reuse interlock requested by Tom.
> Analysis and patch by me.
There's still a theoretical possibility for this to happen:
1. A page is marked as deleted by VACUUM, setting xact field in the opaque
2. Master crashes. WAL replay replays the XLOG_BTREE_DELETE_PAGE record.
It resets the xact field to FrozenTransactionId.
3. The page is recycled. This writes an XLOG_BTREE_REUSE_PAGE record with
FrozenTransactionId as latestRemovedXid.
When the standby replays that, it will call
ResolveRecoveryConflictWithSnapshot with FrozenTransactionId, not the
original xid that was used in the master when the page was deleted.
A straightforward way to fix that is to WAL-log the real xid in the
XLOG_BTREE_DELETE_PAGE records, instead of resetting it to
FrozenTransactionId.
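The three steps above can be traced with a toy model. This is only a sketch, not PostgreSQL code: the page is a plain dict, the function names are made up for illustration, and the model assumes that redo rewrites the xact field exactly as described in step 2. The only value taken from PostgreSQL is FrozenTransactionId, which is 2.

```python
# Toy model of the scenario. All names here are illustrative assumptions,
# not PostgreSQL functions. FROZEN_XID mirrors FrozenTransactionId (2).
FROZEN_XID = 2


def replay_delete_page(logged_xid=None):
    """Model redo of a page-delete record.

    If the record carries no xid, redo falls back to FROZEN_XID, which is
    the behaviour described in step 2. If the real xid was WAL-logged,
    redo restores it instead (the proposed fix).
    """
    xact = FROZEN_XID if logged_xid is None else logged_xid
    return {"deleted": True, "xact": xact}


def reuse_page(page):
    """Model recycling: the reuse record copies the page's xact field."""
    return {"type": "REUSE_PAGE", "latestRemovedXid": page["xact"]}


def scenario(delete_xid, log_real_xid):
    """Run steps 1 to 3 and return the reuse record the standby would see."""
    # 1. VACUUM deletes the page on the master and stamps it with delete_xid.
    #    The in-memory page is lost in the crash, so only the WAL matters.
    logged = delete_xid if log_real_xid else None
    # 2. The master crashes; redo rebuilds the deleted page from the record.
    page = replay_delete_page(logged)
    # 3. The page is recycled and a reuse record is written.
    return reuse_page(page)
```

Calling `scenario(1000, log_real_xid=False)` yields a reuse record whose latestRemovedXid is the frozen value, while `scenario(1000, log_real_xid=True)` preserves the original xid.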
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Thu, 2010-02-18 at 14:23 +0200, Heikki Linnakangas wrote:
> A straightforward way to fix that is to WAL-log the real xid in the
> XLOG_BTREE_DELETE_PAGE records, instead of resetting it to
> FrozenTransactionId.
An even simpler way would be to reset the value to latestCompletedXid
during btree_xlog_delete_page(). That touches less code. I doubt it will
make much difference to conflict recovery, since if pages are being
deleted then btree delete records are likely to be frequent and will
have already killed long running queries.
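A rough way to see the trade-off between the two proposals: latestCompletedXid is never older than the xid stamped at deletion time, so using it can only widen the set of standby queries that conflict. The sketch below assumes a snapshot conflicts when its xmin is at or before latestRemovedXid, uses plain integer comparison (ignoring wraparound), and all xid values are invented for illustration.

```python
def conflicting_queries(snapshot_xmins, latest_removed_xid):
    """Return the xmins of standby snapshots that would conflict.

    Assumption: a snapshot conflicts if its xmin is at or before
    latest_removed_xid. Plain integer comparison, no wraparound handling.
    """
    return [xmin for xmin in snapshot_xmins if xmin <= latest_removed_xid]


# Invented example values.
standby_xmins = [950, 1000, 1040, 1100]
real_delete_xid = 1000        # xid stamped on the page when it was deleted
latest_completed_xid = 1050   # newer value substituted during redo

with_real_xid = conflicting_queries(standby_xmins, real_delete_xid)
with_latest_completed = conflicting_queries(standby_xmins, latest_completed_xid)
```

In this example the substituted value cancels one extra query (the snapshot with xmin 1040). Whether that matters in practice depends on how often earlier btree delete records have already cancelled the same queries, which is the point made above.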
--
Simon Riggs www.2ndQuadrant.com
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Thu, 2010-02-18 at 14:23 +0200, Heikki Linnakangas wrote:
>> A straightforward way to fix that is to WAL-log the real xid in the
>> XLOG_BTREE_DELETE_PAGE records, instead of resetting it to
>> FrozenTransactionId.
> An even simpler way would be to reset the value to latestCompletedXid
> during btree_xlog_delete_page(). That touches less code. I doubt it will
> make much difference to conflict recovery, since if pages are being
> deleted then btree delete records are likely to be frequent and will
> have already killed long running queries.
I'm a bit concerned about XID wraparound if the value doesn't get reset
to FrozenTransactionId. There's no guarantee the page will get reused
promptly ...
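The wraparound concern comes from how xids are compared: normal xids live on a 32-bit circle, so a value left on a page for more than about two billion transactions stops looking "old". The sketch below models that comparison in the style of PostgreSQL's TransactionIdPrecedes(); the Python function name and the example values are assumptions made for illustration.

```python
FROZEN_XID = 2          # FrozenTransactionId
FIRST_NORMAL_XID = 3    # FirstNormalTransactionId


def xid_precedes(a, b):
    """Return True if xid a is logically older than xid b.

    Special xids (below FIRST_NORMAL_XID) compare as plain integers, so
    FROZEN_XID is older than every normal xid. Normal xids compare modulo
    2**32: a precedes b when the signed 32-bit difference a - b is negative.
    """
    if a < FIRST_NORMAL_XID or b < FIRST_NORMAL_XID:
        return a < b
    diff = (a - b) & 0xFFFFFFFF
    return diff >= 0x80000000  # high bit set means a negative signed diff


stored = 100                              # xid left on a deleted page
soon_after = (stored + 1000) & 0xFFFFFFFF
much_later = (stored + 2**31 + 5) & 0xFFFFFFFF
```

With these values, `xid_precedes(stored, soon_after)` is true, but `xid_precedes(stored, much_later)` is false: once more than 2**31 transactions have passed, the stored xid appears to be in the future. A frozen value never has this problem, because `xid_precedes(FROZEN_XID, x)` holds for every normal `x`.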
regards, tom lane
On Thu, 2010-02-18 at 14:17 -0500, Tom Lane wrote:
> I'm a bit concerned about XID wraparound if the value doesn't get reset
> to FrozenTransactionId. There's no guarantee the page will get reused
> promptly ...
I'd be very interested for you to have a look at Hot Standby from a
transaction wraparound perspective. There was some code in there to
handle anti-wraparound in RecordKnownAssignedTransactionId() but it was
removed, though I'm a little hazy on that myself. You've got the best
nose for corner cases and risks.
In this case, I don't see any problem. The xid after recovery will be the
same as or higher than it would have been if the crash had never taken
place, so I can't see any risk that isn't already addressed.
Since we now have to handle cases where blocks have been touched in
pre-9.0 code and are in a state they could never get into in 9.0, we do
still have to handle a value of btpo.xact == FrozenTransactionId. I will
add a special case to the handling of XLOG_BTREE_REUSE_PAGE records also
to allow for that.
Reports of any similar theoretical issues would be most welcome.
--
Simon Riggs www.2ndQuadrant.com
On Thu, 2010-02-18 at 14:23 +0200, Heikki Linnakangas wrote:
> A straightforward way to fix that is to WAL-log the real xid in the
> XLOG_BTREE_DELETE_PAGE records, instead of resetting it to
> FrozenTransactionId.
Bug accepted, proposal implemented and committed.
--
Simon Riggs www.2ndQuadrant.com