Subtransaction commits and Hot Standby
Subtransactions cause a couple of problems for Hot Standby:
* we don't record new subtransactionids in WAL, so we have no direct way
to issue updates to subtrans while in recovery mode as would normally
happen when we assign subtransaction ids (subxids)
* we don't record subtransaction commit in WAL, so we have no direct way
to issue updates to clog to show TRANSACTION_STATUS_SUB_COMMITTED while
in recovery mode
The obvious solutions are to record this in WAL:
(1) have every new subxid create an entry in WAL, either by creating a
whole new record or by augmenting the first WAL record created by the
new subxid
(2) issue a WAL record every time we subcommit
Let's see if we can improve on that:
If we did not have an entry in subtrans, what would happen? We would be
unable to tell whether a subtransaction was visible or not in a snapshot
because TransactionIdIsInProgress() requires it. And so, if an xid was
active at time-of-snapshot yet has now committed we would mistakenly
conclude it was visible. So we must be able to track every xid back to
our snapshot *if* there should be a linkage.
However, we only need to check subtrans if the snapshot cache has
overflowed. Is it possible to only insert entries into subtrans when
they will be required, rather than always? Hmmm, nothing clear, so we
must do (1) it seems.
If we did not have an entry in clog, what would happen? Correct clog
entries are *not* required to prove whether an xid is a subxid and part
of our snapshot, we only need subtrans for that.
XidInMVCCSnapshot() calls SubTransGetTopmostTransaction() which doesn't
touch clog at all. Clog entries are currently required when we issue
TransactionIdDidCommit() on a subxid. If clog entries were missing then
we would only make a mistake about the correct status of a subxid when
we were mid-way through updating the status of an xid to committed.
So clog update might still be needed for subtransactions, but not until
commit time, so we can completely avoid the need to generate WAL records
at time of subcommit for use in Hot Standby. And it would seem that if
we update the clog atomically, we would not need to mark the subcommit
state in clog at all, even in normal running.
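To make the role of the subcommitted state concrete, here is a toy model (not PostgreSQL code) of how a commit-status lookup on a subxid behaves. The status values match clog.h; the arrays toy_clog and toy_subtrans and the function toy_xid_did_commit are hypothetical stand-ins for pg_clog, pg_subtrans and TransactionIdDidCommit(), assumed only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Status values as defined in PostgreSQL's clog.h */
#define TRANSACTION_STATUS_IN_PROGRESS		0x00
#define TRANSACTION_STATUS_COMMITTED		0x01
#define TRANSACTION_STATUS_ABORTED			0x02
#define TRANSACTION_STATUS_SUB_COMMITTED	0x03

#define InvalidTransactionId	((TransactionId) 0)
#define TOY_MAX_XIDS			16	/* assumption: tiny fixed xid space */

/* Hypothetical in-memory stand-ins for pg_clog and pg_subtrans */
int			toy_clog[TOY_MAX_XIDS];
TransactionId toy_subtrans[TOY_MAX_XIDS];	/* parent of each subxid */

/*
 * A subcommitted xid counts as committed only once its parent chain
 * ends in a committed entry; otherwise it is still not committed.
 */
bool
toy_xid_did_commit(TransactionId xid)
{
	assert(xid < TOY_MAX_XIDS);

	while (toy_clog[xid] == TRANSACTION_STATUS_SUB_COMMITTED)
	{
		TransactionId parent = toy_subtrans[xid];

		if (parent == InvalidTransactionId)
			return false;		/* no linkage recorded, cannot confirm */
		assert(parent < TOY_MAX_XIDS);
		xid = parent;
	}

	return toy_clog[xid] == TRANSACTION_STATUS_COMMITTED;
}
```

With a subxid marked subcommitted, the answer depends entirely on the parent's entry, which is why the ordering of clog updates at commit time matters.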
Right now we lock and unlock the clog for each committed subtransaction
at commit time, which is wasteful. A better scheme:
pre-scan the list of xids to derive list of pages
if we have just a single page to update
{
update all entries on page in one action
}
else
{
loop thru xids marking them all as subcommitted
mark top level transaction committed
loop thru xids again marking them all as committed
}
All clog updates would be performed page-at-a-time, in ascending xid
order.
This seems likely to work well since many subtransactions will be on
the same clog page as the top-level xid and the locking will often be more
efficient than the current case of repeated single lock acquisitions. It
also means we can skip RecordSubTransactionCommit() entirely,
significantly reducing clog contention.
Anybody see a problem there?
If not, I will work on separate patches:
* re-work subtrans commit so that we use the form described above. This
should improve performance for both normal and standby modes, hence do
this as a stand-alone patch
* include code in the main hot standby patch to update subtrans during
recovery when a new subtransaction is created
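
A minimal sketch of the pre-scan step from the scheme above, assuming the default 8 kB block size (so 32768 xids per clog page) and a subxid array in ascending order. The helper names xid_to_page, count_clog_pages and run_on_same_page are made up for this illustration and are not part of any patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumes BLCKSZ = 8192 and 2 status bits (4 xids per byte) */
#define CLOG_XACTS_PER_PAGE (8192 * 4)

/* Map an xid to the clog page that holds its status bits */
int
xid_to_page(TransactionId xid)
{
	return (int) (xid / CLOG_XACTS_PER_PAGE);
}

/*
 * Count the distinct clog pages touched by a top-level xid and its
 * subxids.  Relies on subxids being sorted ascending, so a page change
 * can be detected by comparing with the previous page only.
 */
int
count_clog_pages(TransactionId xid, const TransactionId *subxids, int nsubxids)
{
	int			pageno = xid_to_page(xid);
	int			npages = 1;
	int			i;

	for (i = 0; i < nsubxids; i++)
	{
		int			p = xid_to_page(subxids[i]);

		if (p != pageno)
		{
			npages++;
			pageno = p;
		}
	}
	return npages;
}

/* Length of the run of subxids, starting at 'start', on one clog page */
int
run_on_same_page(const TransactionId *subxids, int nsubxids, int start)
{
	int			pageno;
	int			n = 1;

	assert(start >= 0 && start < nsubxids);
	pageno = xid_to_page(subxids[start]);
	while (start + n < nsubxids &&
		   xid_to_page(subxids[start + n]) == pageno)
		n++;
	return n;
}
```

If count_clog_pages() returns 1, everything can be updated in one action under a single lock acquisition; otherwise run_on_same_page() gives the batch size for each page-at-a-time update.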
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
Subtransactions cause a couple of problems for Hot Standby:
Do we need to treat subtransactions any differently from normal
transactions? Just treat all subtransactions as top-level transactions
until commit, and mark them all as committed when you see the commit
record for the top-level transaction.
Right now we lock and unlock the clog for each committed subtransaction
at commit time, which is wasteful. A better scheme:
pre-scan the list of xids to derive list of pages
if we have just a single page to update
{
update all entries on page in one action
}
else
{
loop thru xids marking them all as subcommitted
mark top level transaction committed
loop thru xids again marking them all as committed
}
All clog updates would be performed page-at-a-time, in ascending xid
order.
This seems likely to work well since many subtransactions will be on
the same clog page as the top-level xid and the locking will often be more
efficient than the current case of repeated single lock acquisitions. It
also means we can skip RecordSubTransactionCommit() entirely,
significantly reducing clog contention.
Anybody see a problem there?
Hmm, I don't see anything immediately wrong with that.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
Simon Riggs wrote:
Subtransactions cause a couple of problems for Hot Standby:
Do we need to treat subtransactions any differently from normal
transactions? Just treat all subtransactions as top-level transactions
until commit, and mark them all as committed when you see the commit
record for the top-level transaction.
This could lead to inconsistent results -- some of the subtransactions
could be marked as committed while others are still in progress. Unless
we want to be able to atomically mark them all as committed, but I don't
think that's really an option because it could mean holding the clog
lock for a long time, possibly involving I/O of clog pages.
Right now we lock and unlock the clog for each committed subtransaction
at commit time, which is wasteful. A better scheme:
pre-scan the list of xids to derive list of pages
if we have just a single page to update
{
update all entries on page in one action
}
else
{
loop thru xids marking them all as subcommitted
mark top level transaction committed
loop thru xids again marking them all as committed
}
Hmm, I don't see anything immediately wrong with that.
Neither do I.
I wonder if the improved clog API required to mark multiple transactions
as committed at once would be also useful to TransactionIdCommitTree
which is used in regular transaction commit.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Tue, 2008-09-16 at 17:01 +0300, Heikki Linnakangas wrote:
Simon Riggs wrote:
Subtransactions cause a couple of problems for Hot Standby:
Do we need to treat subtransactions any differently from normal
transactions? Just treat all subtransactions as top-level transactions
until commit, and mark them all as committed when you see the commit
record for the top-level transaction.
If we do that, snapshots become infinitely sized objects though, which
then requires us to invent some way of scrolling it to disk. So having
removed the need for subtrans, I then need to reinvent something similar
(or at least something like a multitrans entry).
Perhaps it is sufficient to throw an error if the subxid cache
overflows? But I suspect that may not be acceptable...
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Tue, 2008-09-16 at 10:11 -0400, Alvaro Herrera wrote:
Heikki Linnakangas wrote:
Simon Riggs wrote:
Subtransactions cause a couple of problems for Hot Standby:
Do we need to treat subtransactions any differently from normal
transactions? Just treat all subtransactions as top-level transactions
until commit, and mark them all as committed when you see the commit
record for the top-level transaction.
This could lead to inconsistent results -- some of the subtransactions
could be marked as committed while others are still in progress. Unless
we want to be able to atomically mark them all as committed, but I don't
think that's really an option because it could mean holding the clog
lock for a long time, possibly involving I/O of clog pages.
If we did that we would need to mark them all subcommitted and then mark
them all committed. So it's possible, but not desirable.
I wonder if the improved clog API required to mark multiple transactions
as committed at once would be also useful to TransactionIdCommitTree
which is used in regular transaction commit.
Yes, I think it's an improvement for regular commits/subcommits also.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
Perhaps it is sufficient to throw an error if the subxid cache
overflows? But I suspect that may not be acceptable...
Certainly not.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, 2008-09-16 at 10:11 -0400, Alvaro Herrera wrote:
Right now we lock and unlock the clog for each committed subtransaction
at commit time, which is wasteful. A better scheme:
pre-scan the list of xids to derive list of pages
if we have just a single page to update
{
update all entries on page in one action
}
else
{
loop thru xids marking them all as subcommitted
mark top level transaction committed
loop thru xids again marking them all as committed
}
Hmm, I don't see anything immediately wrong with that.
Neither do I.
I wonder if the improved clog API required to mark multiple transactions
as committed at once would be also useful to TransactionIdCommitTree
which is used in regular transaction commit.
I enclose a patch to transform what we have now into what I think is
possible. If we agree this is possible, then I will do further work to
optimise transam.c (using clog.c changes also). So this is an
"intermediate" or precursor patch for discussion only.
backend/access/transam/transam.c | 78 ++++++++++++-----------------!
backend/access/transam/twophase.c | 4 !
backend/access/transam/xact.c | 50 -----------------!!!!!!!
include/access/transam.h | 7 !!!
4 files changed, 32 insertions(+), 78 deletions(-), 29 modifications(!)
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Attachments:
atomic_subxids.v1.patch (text/x-patch; charset=utf-8)
Index: src/backend/access/transam/transam.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/transam.c,v
retrieving revision 1.76
diff -c -r1.76 transam.c
*** src/backend/access/transam/transam.c 26 Mar 2008 18:48:59 -0000 1.76
--- src/backend/access/transam/transam.c 16 Sep 2008 16:55:30 -0000
***************
*** 287,317 ****
return false;
}
-
- /*
- * TransactionIdCommit
- * Commits the transaction associated with the identifier.
- *
- * Note:
- * Assumes transaction identifier is valid.
- */
- void
- TransactionIdCommit(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_COMMITTED,
- InvalidXLogRecPtr);
- }
-
- /*
- * TransactionIdAsyncCommit
- * Same as above, but for async commits. The commit record LSN is needed.
- */
- void
- TransactionIdAsyncCommit(TransactionId transactionId, XLogRecPtr lsn)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_COMMITTED, lsn);
- }
-
/*
* TransactionIdAbort
* Aborts the transaction associated with the identifier.
--- 287,292 ----
***************
*** 328,359 ****
}
/*
- * TransactionIdSubCommit
- * Marks the subtransaction associated with the identifier as
- * sub-committed.
- *
- * Note:
- * No async version of this is needed.
- */
- void
- TransactionIdSubCommit(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_SUB_COMMITTED,
- InvalidXLogRecPtr);
- }
-
- /*
* TransactionIdCommitTree
* Marks all the given transaction ids as committed.
*
* The caller has to be sure that this is used only to mark subcommitted
* subtransactions as committed, and only *after* marking the toplevel
* parent as committed. Otherwise there is a race condition against
! * TransactionIdDidCommit.
*/
void
! TransactionIdCommitTree(int nxids, TransactionId *xids)
{
if (nxids > 0)
TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED,
InvalidXLogRecPtr);
--- 303,335 ----
}
/*
* TransactionIdCommitTree
* Marks all the given transaction ids as committed.
*
* The caller has to be sure that this is used only to mark subcommitted
* subtransactions as committed, and only *after* marking the toplevel
* parent as committed. Otherwise there is a race condition against
! * TransactionIdDidCommit since we do not apply changes atomically, yet.
*/
void
! TransactionIdCommitTree(TransactionId commitxid, int nxids, TransactionId *xids)
{
+ /*
+ * Mark top level transaction id as committed first, to avoid
+ * race conditions with TransactionIdDidCommit
+ */
+ TransactionLogUpdate(commitxid, TRANSACTION_STATUS_COMMITTED,
+ InvalidXLogRecPtr);
+
+ /*
+ * If there is more than one subcommit, then we need to mark them
+ * subcommitted first to ensure there is no race condition where
+ * we might see a subtransaction as still in progress when it is
+ * now committed.
+ */
+ if (nxids > 1)
+ TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_SUB_COMMITTED,
+ InvalidXLogRecPtr);
if (nxids > 0)
TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED,
InvalidXLogRecPtr);
***************
*** 364,371 ****
* Same as above, but for async commits. The commit record LSN is needed.
*/
void
! TransactionIdAsyncCommitTree(int nxids, TransactionId *xids, XLogRecPtr lsn)
{
if (nxids > 0)
TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED,
lsn);
--- 340,363 ----
* Same as above, but for async commits. The commit record LSN is needed.
*/
void
! TransactionIdAsyncCommitTree(TransactionId commitxid, int nxids, TransactionId *xids, XLogRecPtr lsn)
{
+ /*
+ * Mark top level transaction id as committed first, to avoid
+ * race conditions with TransactionIdDidCommit
+ */
+ TransactionLogUpdate(commitxid, TRANSACTION_STATUS_COMMITTED,
+ lsn);
+
+ /*
+ * If there is more than one subcommit, then we need to mark them
+ * subcommitted first to ensure there is no race condition where
+ * we might see a subtransaction as still in progress when it is
+ * now committed.
+ */
+ if (nxids > 1)
+ TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_SUB_COMMITTED,
+ lsn);
if (nxids > 0)
TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED,
lsn);
Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.45
diff -c -r1.45 twophase.c
*** src/backend/access/transam/twophase.c 11 Aug 2008 11:05:10 -0000 1.45
--- src/backend/access/transam/twophase.c 16 Sep 2008 16:47:21 -0000
***************
*** 1745,1753 ****
XLogFlush(recptr);
/* Mark the transaction committed in pg_clog */
! TransactionIdCommit(xid);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdCommitTree(nchildren, children);
/* Checkpoint can proceed now */
MyProc->inCommit = false;
--- 1745,1751 ----
XLogFlush(recptr);
/* Mark the transaction committed in pg_clog */
! TransactionIdCommitTree(xid, nchildren, children);
/* Checkpoint can proceed now */
MyProc->inCommit = false;
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.265
diff -c -r1.265 xact.c
*** src/backend/access/transam/xact.c 11 Aug 2008 11:05:10 -0000 1.265
--- src/backend/access/transam/xact.c 16 Sep 2008 16:40:17 -0000
***************
*** 254,260 ****
static TransactionId RecordTransactionAbort(bool isSubXact);
static void StartTransaction(void);
- static void RecordSubTransactionCommit(void);
static void StartSubTransaction(void);
static void CommitSubTransaction(void);
static void AbortSubTransaction(void);
--- 254,259 ----
***************
*** 952,962 ****
* Now we may update the CLOG, if we wrote a COMMIT record above
*/
if (markXidCommitted)
! {
! TransactionIdCommit(xid);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdCommitTree(nchildren, children);
! }
}
else
{
--- 951,957 ----
* Now we may update the CLOG, if we wrote a COMMIT record above
*/
if (markXidCommitted)
! TransactionIdCommitTree(xid, nchildren, children);
}
else
{
***************
*** 974,984 ****
* flushed before the CLOG may be updated.
*/
if (markXidCommitted)
! {
! TransactionIdAsyncCommit(xid, XactLastRecEnd);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdAsyncCommitTree(nchildren, children, XactLastRecEnd);
! }
}
/*
--- 969,975 ----
* flushed before the CLOG may be updated.
*/
if (markXidCommitted)
! TransactionIdAsyncCommitTree(xid, nchildren, children, XactLastRecEnd);
}
/*
***************
*** 1156,1191 ****
s->maxChildXids = 0;
}
- /*
- * RecordSubTransactionCommit
- */
- static void
- RecordSubTransactionCommit(void)
- {
- TransactionId xid = GetCurrentTransactionIdIfAny();
-
- /*
- * We do not log the subcommit in XLOG; it doesn't matter until the
- * top-level transaction commits.
- *
- * We must mark the subtransaction subcommitted in the CLOG if it had a
- * valid XID assigned. If it did not, nobody else will ever know about
- * the existence of this subxact. We don't have to deal with deletions
- * scheduled for on-commit here, since they'll be reassigned to our parent
- * (who might still abort).
- */
- if (TransactionIdIsValid(xid))
- {
- /* XXX does this really need to be a critical section? */
- START_CRIT_SECTION();
-
- /* Record subtransaction subcommit */
- TransactionIdSubCommit(xid);
-
- END_CRIT_SECTION();
- }
- }
-
/* ----------------------------------------------------------------
* AbortTransaction stuff
* ----------------------------------------------------------------
--- 1147,1152 ----
***************
*** 3791,3799 ****
/* Must CCI to ensure commands of subtransaction are seen as done */
CommandCounterIncrement();
- /* Mark subtransaction as subcommitted */
- RecordSubTransactionCommit();
-
/* Post-commit cleanup */
if (TransactionIdIsValid(s->transactionId))
AtSubCommit_childXids();
--- 3752,3757 ----
***************
*** 4259,4269 ****
TransactionId max_xid;
int i;
- TransactionIdCommit(xid);
-
/* Mark committed subtransactions as committed */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdCommitTree(xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
--- 4217,4225 ----
TransactionId max_xid;
int i;
/* Mark committed subtransactions as committed */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdCommitTree(xid, xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
Index: src/include/access/transam.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/transam.h,v
retrieving revision 1.65
diff -c -r1.65 transam.h
*** src/include/access/transam.h 11 Mar 2008 20:20:35 -0000 1.65
--- src/include/access/transam.h 16 Sep 2008 17:05:12 -0000
***************
*** 139,150 ****
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
- extern void TransactionIdCommit(TransactionId transactionId);
- extern void TransactionIdAsyncCommit(TransactionId transactionId, XLogRecPtr lsn);
extern void TransactionIdAbort(TransactionId transactionId);
! extern void TransactionIdSubCommit(TransactionId transactionId);
! extern void TransactionIdCommitTree(int nxids, TransactionId *xids);
! extern void TransactionIdAsyncCommitTree(int nxids, TransactionId *xids, XLogRecPtr lsn);
extern void TransactionIdAbortTree(int nxids, TransactionId *xids);
extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
--- 139,147 ----
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
extern void TransactionIdAbort(TransactionId transactionId);
! extern void TransactionIdCommitTree(TransactionId commitxid, int nxids, TransactionId *xids);
! extern void TransactionIdAsyncCommitTree(TransactionId commitxid, int nxids, TransactionId *xids, XLogRecPtr lsn);
extern void TransactionIdAbortTree(int nxids, TransactionId *xids);
extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
On Tue, 2008-09-16 at 15:38 +0100, Simon Riggs wrote:
On Tue, 2008-09-16 at 17:01 +0300, Heikki Linnakangas wrote:
Simon Riggs wrote:
Subtransactions cause a couple of problems for Hot Standby:
Do we need to treat subtransactions any differently from normal
transactions? Just treat all subtransactions as top-level transactions
until commit, and mark them all as committed when you see the commit
record for the top-level transaction.
If we do that, snapshots become infinitely sized objects though, which
then requires us to invent some way of scrolling it to disk. So having
removed the need for subtrans, I then need to reinvent something similar
(or at least something like a multitrans entry).
Currently we keep track of whether the whole subxid cache has
overflowed, or not. It seems possible to track for overflows of
individual parts of the cache. That makes the code path for subxid
overflow in GetSnapshotData() slightly slower, but it's not the common
case. We save time elsewhere in more common cases.
We would be able to avoid making an entry in subtrans for new subxids
unless our local backend has overflowed its cache. That will reduce
subtrans access frequency considerably and greatly reduce the number of
requests that might need to perform I/O, possibly to zero. It would also
avoid the need for generating WAL records for new subxids for standby.
The path thru XidInMVCCSnapshot() would then require us to *always*
check the subxid cache, even if it has overflowed. If we find the xid
then we don't need to check subtrans at all. That's quite useful because
searching the subxid cache is cheaper than looking in subtrans and the
probability it would be there rather than in subtrans is still good,
even for overflows of up to 3-5 times the subxid cache. It would
increase the cost of subxid checking slightly when running with very
high numbers of subxids.
For Hot Standby, this would mean we could avoid generating WAL records
for new subxids in most cases - only generate them when our backend's
subxid cache has overflowed. On the standby it then means we can store
xids into a fixed size snapshot without worrying about whether it
overflows because the xids all fitted in the snapshot on the master
(whose xids we are emulating), *or* we have a WAL record that tells us
the cache overflowed and we make the insert into subtrans instead. When
we use the standby snapshot we look in subxid cache first and if we
don't find it then we check in subtrans.
Sounds possible?
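
The proposed lookup order could be sketched roughly as follows. ToySnapshot, toy_subtrans_lookup and toy_subxid_in_snapshot are hypothetical and much simpler than the real SnapshotData and XidInMVCCSnapshot(); the subtrans probe is a stub that only counts calls and pretends a single xid is recorded there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical, simplified snapshot; not the real SnapshotData layout */
typedef struct ToySnapshot
{
	const TransactionId *subxip;	/* subxids that fitted in the cache */
	int			subxcnt;		/* number of entries in subxip */
	bool		suboverflowed;	/* did the cache overflow? */
} ToySnapshot;

/* Counts how often the (expensive) subtrans path is taken */
int			toy_subtrans_probes = 0;

/* Stub standing in for a subtrans lookup: only xid 900 is "found" */
bool
toy_subtrans_lookup(TransactionId xid)
{
	toy_subtrans_probes++;
	return xid == 900;
}

/*
 * Always search the subxid cache first, even if it has overflowed.
 * Fall back to subtrans only on a cache miss with the overflow flag set.
 */
bool
toy_subxid_in_snapshot(const ToySnapshot *snap, TransactionId xid)
{
	int			i;

	assert(snap != NULL);

	for (i = 0; i < snap->subxcnt; i++)
	{
		if (snap->subxip[i] == xid)
			return true;		/* cache hit: no subtrans access needed */
	}

	if (!snap->suboverflowed)
		return false;			/* cache is complete, xid is not in it */

	return toy_subtrans_lookup(xid);
}
```

The point of the ordering is that a cache hit never touches subtrans, so the slower path is only paid for xids that spilled past the cache.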
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Tue, 2008-09-16 at 11:08 -0400, Alvaro Herrera wrote:
Simon Riggs wrote:
Perhaps it is sufficient to throw an error if the subxid cache
overflows? But I suspect that may not be acceptable...
Certainly not.
Yeh :-) ... it was just a rhetorical question. I'll try to avoid those.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Tue, 2008-09-16 at 10:11 -0400, Alvaro Herrera wrote:
I wonder if the improved clog API required to mark multiple
transactions as committed at once would be also useful to
TransactionIdCommitTree which is used in regular transaction commit.
I've hacked together this concept patch (WIP).
Not fully tested yet, but it gives a flavour of the API rearrangements
required for atomic clog updates. It passes make check, but that's not
saying enough for a serious review yet. I expect to pick this up again
next week.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Attachments:
atomic_subxids.v3.patch (text/x-patch; charset=utf-8)
Index: src/backend/access/transam/clog.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/clog.c,v
retrieving revision 1.47
diff -c -r1.47 clog.c
*** src/backend/access/transam/clog.c 1 Aug 2008 13:16:08 -0000 1.47
--- src/backend/access/transam/clog.c 17 Sep 2008 20:06:15 -0000
***************
*** 80,89 ****
static bool CLOGPagePrecedes(int page1, int page2);
static void WriteZeroPageXlogRec(int pageno);
static void WriteTruncateXlogRec(int pageno);
!
!
! /*
! * Record the final state of a transaction in the commit log.
*
* lsn must be the WAL location of the commit record when recording an async
* commit. For a synchronous commit it can be InvalidXLogRecPtr, since the
--- 80,105 ----
static bool CLOGPagePrecedes(int page1, int page2);
static void WriteZeroPageXlogRec(int pageno);
static void WriteTruncateXlogRec(int pageno);
! static void TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn, int pageno, bool subcommitted);
! static void TransactionIdSetStatusBit(TransactionId xid, XidStatus status,
! XLogRecPtr lsn, int slotno);
!
! /*
! * TransactionIdSetTreeStatus
! *
! * Record the final state of transaction entries in the commit log for
! * a transaction and its subtransaction tree. Take care to ensure this is
! * both atomic and efficient. Prior to 8.4, this capability was provided
! * by the non-atomic TransactionIdSetStatus, which is replaced by this
! * new atomic version.
! *
! * xid is a single xid to set status for. This will typically be
! * the top level transactionid for a top level commit or abort. It can
! * also be a subtransaction when we record transaction aborts.
! *
! * subxids is an array of xids of length nsubxids, representing subtransactions
! * in the tree of xid. In various cases nsubxids may be zero.
*
* lsn must be the WAL location of the commit record when recording an async
* commit. For a synchronous commit it can be InvalidXLogRecPtr, since the
***************
*** 91,107 ****
* should be InvalidXLogRecPtr for abort cases, too.
*
* NB: this is a low-level routine and is NOT the preferred entry point
! * for most uses; TransactionLogUpdate() in transam.c is the intended caller.
*/
void
! TransactionIdSetStatus(TransactionId xid, XidStatus status, XLogRecPtr lsn)
{
- int pageno = TransactionIdToPage(xid);
- int byteno = TransactionIdToByte(xid);
- int bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
int slotno;
! char *byteptr;
! char byteval;
Assert(status == TRANSACTION_STATUS_COMMITTED ||
status == TRANSACTION_STATUS_ABORTED ||
--- 107,221 ----
* should be InvalidXLogRecPtr for abort cases, too.
*
* NB: this is a low-level routine and is NOT the preferred entry point
! * for most uses; functions in transam.c are the intended callers.
! *
! * Note that no lock requests are made at this level, only lower functions.
! *
! * XXX Think about issuing FADVISE_WILLNEED on pages that we will need,
! * but aren't yet in cache, as well as hinting pages not to fall out of
! * cache yet.
*/
void
! TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn)
! {
! int pageno = TransactionIdToPage(xid); /* get page of parent */
! int i;
! bool subcommitted = false;
!
! /*
! * See how many subxids, if any, are on the same page as the parent, if any.
! */
! for (i = 0; i < nsubxids; i++)
! {
! if (TransactionIdToPage(subxids[i]) != pageno)
! break;
! }
!
! /*
! * Do all items fit on a single page?
! */
! if (i == nsubxids)
! {
! /*
! * Set the parent and any subtransactions on same page as it
! */
! TransactionIdSetPageStatus(xid, nsubxids, subxids, status, lsn,
! pageno, subcommitted);
! }
! else
! {
! int start_subxid = 0;
! int num_on_page = 1;
!
! if (status == TRANSACTION_STATUS_COMMITTED)
! {
! /*
! * If this is a commit then we care about doing this atomically.
! * By here, we know we're updating more than one page of clog,
! * so we must mark entries that are not on the first page so
! * that they show as subcommitted before we then return to
! * update the status to fully committed.
! * We don't mark the first page, because we will be doing that
! * part when we access the first page in our next step.
! */
! TransactionIdSetPageStatus(InvalidTransactionId,
! nsubxids - i, subxids + i,
! TRANSACTION_STATUS_SUB_COMMITTED, lsn, pageno,
! false /* this setting never relevant */);
! }
!
! /*
! * Now set the parent and any subtransactions on same page as it
! */
! TransactionIdSetPageStatus(xid, i, subxids, status, lsn,
! pageno, subcommitted);
!
! /*
! * By now, all subtransactions have been subcommitted, if relevant
! */
! subcommitted = true;
!
! /*
! * Now work through the rest of the subxids one clog page at a time,
! * starting from next subxid.
! */
! start_subxid = i;
! pageno = TransactionIdToPage(subxids[i]);
! for (; i < nsubxids; i++)
! {
! if (TransactionIdToPage(subxids[i]) != pageno)
! {
! TransactionIdSetPageStatus(InvalidTransactionId,
! num_on_page, subxids + start_subxid,
! status, lsn, pageno, true);
! start_subxid = i;
! num_on_page = 1;
! pageno = TransactionIdToPage(subxids[start_subxid]);
! }
! else
! num_on_page++;
! }
!
! /* Write last page */
! TransactionIdSetPageStatus(InvalidTransactionId,
! num_on_page, subxids + start_subxid,
! status, lsn, pageno, true);
! }
! }
!
! /*
! * Record the final state of transaction entries in the commit log for
! * all entries on *one* page only. Atomic only on this page.
! *
! * Otherwise API is same as TransactionIdSetTreeStatus()
! */
! static void
! TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn, int pageno, bool subcommitted)
{
int slotno;
! int i;
Assert(status == TRANSACTION_STATUS_COMMITTED ||
status == TRANSACTION_STATUS_ABORTED ||
***************
*** 116,124 ****
* mustn't let it reach disk until we've done the appropriate WAL flush.
* But when lsn is invalid, it's OK to scribble on a page while it is
* write-busy, since we don't care if the update reaches disk sooner than
! * we think. Hence, pass write_ok = XLogRecPtrIsInvalid(lsn).
*/
slotno = SimpleLruReadPage(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
byteptr = ClogCtl->shared->page_buffer[slotno] + byteno;
/* Current state should be 0, subcommitted or target state */
--- 230,284 ----
* mustn't let it reach disk until we've done the appropriate WAL flush.
* But when lsn is invalid, it's OK to scribble on a page while it is
* write-busy, since we don't care if the update reaches disk sooner than
! * we think.
*/
slotno = SimpleLruReadPage(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
+
+ /*
+ * If we synch commit more than one xid on this page while write busy
+ * we might find that some of the bits go to disk and others don't.
+ * That would break atomicity, so if we haven't already subcommitted
+ * the xids for this commit, we do that first and then come back
+ * to start marking commits. If using async commit then we already
+ * waited for the write I/O to complete by this point, so nothing to do.
+ */
+ if (status == TRANSACTION_STATUS_COMMITTED && !subcommitted && XLogRecPtrIsInvalid(lsn))
+ {
+ for (i = 0; i < nsubxids; i++)
+ {
+ Assert(ClogCtl->shared->page_number[slotno] == TransactionIdToPage(subxids[i]));
+ TransactionIdSetStatusBit(subxids[i],
+ TRANSACTION_STATUS_SUB_COMMITTED, lsn, slotno);
+ }
+ }
+
+ /* Set the main transaction id, if any */
+ if (TransactionIdIsValid(xid))
+ TransactionIdSetStatusBit(xid, status, lsn, slotno);
+
+ /* Set the subtransactions on this page only */
+ for (i = 0; i < nsubxids; i++)
+ {
+ Assert(ClogCtl->shared->page_number[slotno] == TransactionIdToPage(subxids[i]));
+ TransactionIdSetStatusBit(subxids[i], status, lsn, slotno);
+ }
+
+ ClogCtl->shared->page_dirty[slotno] = true;
+
+ LWLockRelease(CLogControlLock);
+ }
+
+ /*
+ * Must be called with CLogControlLock held
+ */
+ static void
+ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+ {
+ int byteno = TransactionIdToByte(xid);
+ int bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
+ char *byteptr;
+ char byteval;
+
byteptr = ClogCtl->shared->page_buffer[slotno] + byteno;
/* Current state should be 0, subcommitted or target state */
***************
*** 132,139 ****
byteval |= (status << bshift);
*byteptr = byteval;
- ClogCtl->shared->page_dirty[slotno] = true;
-
/*
* Update the group LSN if the transaction completion LSN is higher.
*
--- 292,297 ----
***************
*** 149,156 ****
if (XLByteLT(ClogCtl->shared->group_lsn[lsnindex], lsn))
ClogCtl->shared->group_lsn[lsnindex] = lsn;
}
-
- LWLockRelease(CLogControlLock);
}
/*
--- 307,312 ----
Index: src/backend/access/transam/transam.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/transam.c,v
retrieving revision 1.76
diff -c -r1.76 transam.c
*** src/backend/access/transam/transam.c 26 Mar 2008 18:48:59 -0000 1.76
--- src/backend/access/transam/transam.c 17 Sep 2008 17:54:53 -0000
***************
*** 40,54 ****
/* Local functions */
static XidStatus TransactionLogFetch(TransactionId transactionId);
- static void TransactionLogUpdate(TransactionId transactionId,
- XidStatus status, XLogRecPtr lsn);
/* ----------------------------------------------------------------
* Postgres log access method interface
*
* TransactionLogFetch
! * TransactionLogUpdate
* ----------------------------------------------------------------
*/
--- 40,58 ----
/* Local functions */
static XidStatus TransactionLogFetch(TransactionId transactionId);
/* ----------------------------------------------------------------
* Postgres log access method interface
*
* TransactionLogFetch
! *
! * Prior to 8.4, we also had TransactionLogUpdate and
! * TransactionLogMultiUpdate. These have now been merged
! * into a single command TransactionIdSetTreeStatus(),
! * though that is now part of clog.c because of the need
! * for closer integration with clog code to achieve
! * atomic clog updates for subtransactions.
* ----------------------------------------------------------------
*/
***************
*** 100,140 ****
return xidstatus;
}
- /*
- * TransactionLogUpdate
- *
- * Store the new status of a transaction. The commit record LSN must be
- * passed when recording an async commit; else it should be InvalidXLogRecPtr.
- */
- static inline void
- TransactionLogUpdate(TransactionId transactionId,
- XidStatus status, XLogRecPtr lsn)
- {
- /*
- * update the commit log
- */
- TransactionIdSetStatus(transactionId, status, lsn);
- }
-
- /*
- * TransactionLogMultiUpdate
- *
- * Update multiple transaction identifiers to a given status.
- * Don't depend on this being atomic; it's not.
- */
- static inline void
- TransactionLogMultiUpdate(int nxids, TransactionId *xids,
- XidStatus status, XLogRecPtr lsn)
- {
- int i;
-
- Assert(nxids != 0);
-
- for (i = 0; i < nxids; i++)
- TransactionIdSetStatus(xids[i], status, lsn);
- }
-
-
/* ----------------------------------------------------------------
* Interface functions
*
--- 104,109 ----
***************
*** 143,154 ****
* ========
* these functions test the transaction status of
* a specified transaction id.
! *
! * TransactionIdCommit
! * TransactionIdAbort
* ========
* these functions set the transaction status
! * of the specified xid.
*
* See also TransactionIdIsInProgress, which once was in this module
* but now lives in procarray.c.
--- 112,125 ----
* ========
* these functions test the transaction status of
* a specified transaction id.
! *
! * TransactionIdCommitTree
! * TransactionIdAsyncCommitTree
! * TransactionIdAbortTree
* ========
* these functions set the transaction status
! * of the specified transaction tree. As of 8.4, these
! * are now atomic so we set the whole tree in a single call.
*
* See also TransactionIdIsInProgress, which once was in this module
* but now lives in procarray.c.
***************
*** 287,374 ****
return false;
}
-
- /*
- * TransactionIdCommit
- * Commits the transaction associated with the identifier.
- *
- * Note:
- * Assumes transaction identifier is valid.
- */
- void
- TransactionIdCommit(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_COMMITTED,
- InvalidXLogRecPtr);
- }
-
- /*
- * TransactionIdAsyncCommit
- * Same as above, but for async commits. The commit record LSN is needed.
- */
- void
- TransactionIdAsyncCommit(TransactionId transactionId, XLogRecPtr lsn)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_COMMITTED, lsn);
- }
-
- /*
- * TransactionIdAbort
- * Aborts the transaction associated with the identifier.
- *
- * Note:
- * Assumes transaction identifier is valid.
- * No async version of this is needed.
- */
- void
- TransactionIdAbort(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_ABORTED,
- InvalidXLogRecPtr);
- }
-
- /*
- * TransactionIdSubCommit
- * Marks the subtransaction associated with the identifier as
- * sub-committed.
- *
- * Note:
- * No async version of this is needed.
- */
- void
- TransactionIdSubCommit(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_SUB_COMMITTED,
- InvalidXLogRecPtr);
- }
-
/*
* TransactionIdCommitTree
! * Marks all the given transaction ids as committed.
! *
! * The caller has to be sure that this is used only to mark subcommitted
! * subtransactions as committed, and only *after* marking the toplevel
! * parent as committed. Otherwise there is a race condition against
! * TransactionIdDidCommit.
*/
void
! TransactionIdCommitTree(int nxids, TransactionId *xids)
{
! if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED,
! InvalidXLogRecPtr);
}
/*
* TransactionIdAsyncCommitTree
! * Same as above, but for async commits. The commit record LSN is needed.
*/
void
! TransactionIdAsyncCommitTree(int nxids, TransactionId *xids, XLogRecPtr lsn)
{
! if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED,
! lsn);
}
/*
--- 258,284 ----
return false;
}
/*
* TransactionIdCommitTree
! * Marks all the given transaction ids as committed, atomically.
*/
void
! TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids)
{
! TransactionIdSetTreeStatus(xid, nxids, xids,
! TRANSACTION_STATUS_COMMITTED, InvalidXLogRecPtr);
}
/*
* TransactionIdAsyncCommitTree
! * Same as above, but for async commits, atomically. The commit record
! * LSN is needed.
*/
void
! TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn)
{
! TransactionIdSetTreeStatus(xid, nxids, xids,
! TRANSACTION_STATUS_COMMITTED, lsn);
}
/*
***************
*** 379,392 ****
* will consider all the xacts as not-yet-committed anyway.
*/
void
! TransactionIdAbortTree(int nxids, TransactionId *xids)
{
! if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_ABORTED,
! InvalidXLogRecPtr);
}
-
/*
* TransactionIdPrecedes --- is id1 logically < id2?
*/
--- 289,300 ----
* will consider all the xacts as not-yet-committed anyway.
*/
void
! TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
{
! TransactionIdSetTreeStatus(xid, nxids, xids,
! TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
/*
* TransactionIdPrecedes --- is id1 logically < id2?
*/
Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.45
diff -c -r1.45 twophase.c
*** src/backend/access/transam/twophase.c 11 Aug 2008 11:05:10 -0000 1.45
--- src/backend/access/transam/twophase.c 17 Sep 2008 15:01:20 -0000
***************
*** 1745,1753 ****
XLogFlush(recptr);
/* Mark the transaction committed in pg_clog */
! TransactionIdCommit(xid);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdCommitTree(nchildren, children);
/* Checkpoint can proceed now */
MyProc->inCommit = false;
--- 1745,1751 ----
XLogFlush(recptr);
/* Mark the transaction committed in pg_clog */
! TransactionIdCommitTree(xid, nchildren, children);
/* Checkpoint can proceed now */
MyProc->inCommit = false;
***************
*** 1822,1829 ****
* Mark the transaction aborted in clog. This is not absolutely necessary
* but we may as well do it while we are here.
*/
! TransactionIdAbort(xid);
! TransactionIdAbortTree(nchildren, children);
END_CRIT_SECTION();
}
--- 1820,1826 ----
* Mark the transaction aborted in clog. This is not absolutely necessary
* but we may as well do it while we are here.
*/
! TransactionIdAbortTree(xid, nchildren, children);
END_CRIT_SECTION();
}
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.265
diff -c -r1.265 xact.c
*** src/backend/access/transam/xact.c 11 Aug 2008 11:05:10 -0000 1.265
--- src/backend/access/transam/xact.c 17 Sep 2008 18:00:29 -0000
***************
*** 254,260 ****
static TransactionId RecordTransactionAbort(bool isSubXact);
static void StartTransaction(void);
- static void RecordSubTransactionCommit(void);
static void StartSubTransaction(void);
static void CommitSubTransaction(void);
static void AbortSubTransaction(void);
--- 254,259 ----
***************
*** 952,962 ****
* Now we may update the CLOG, if we wrote a COMMIT record above
*/
if (markXidCommitted)
! {
! TransactionIdCommit(xid);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdCommitTree(nchildren, children);
! }
}
else
{
--- 951,957 ----
* Now we may update the CLOG, if we wrote a COMMIT record above
*/
if (markXidCommitted)
! TransactionIdCommitTree(xid, nchildren, children);
}
else
{
***************
*** 974,984 ****
* flushed before the CLOG may be updated.
*/
if (markXidCommitted)
! {
! TransactionIdAsyncCommit(xid, XactLastRecEnd);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdAsyncCommitTree(nchildren, children, XactLastRecEnd);
! }
}
/*
--- 969,975 ----
* flushed before the CLOG may be updated.
*/
if (markXidCommitted)
! TransactionIdAsyncCommitTree(xid, nchildren, children, XactLastRecEnd);
}
/*
***************
*** 1156,1191 ****
s->maxChildXids = 0;
}
- /*
- * RecordSubTransactionCommit
- */
- static void
- RecordSubTransactionCommit(void)
- {
- TransactionId xid = GetCurrentTransactionIdIfAny();
-
- /*
- * We do not log the subcommit in XLOG; it doesn't matter until the
- * top-level transaction commits.
- *
- * We must mark the subtransaction subcommitted in the CLOG if it had a
- * valid XID assigned. If it did not, nobody else will ever know about
- * the existence of this subxact. We don't have to deal with deletions
- * scheduled for on-commit here, since they'll be reassigned to our parent
- * (who might still abort).
- */
- if (TransactionIdIsValid(xid))
- {
- /* XXX does this really need to be a critical section? */
- START_CRIT_SECTION();
-
- /* Record subtransaction subcommit */
- TransactionIdSubCommit(xid);
-
- END_CRIT_SECTION();
- }
- }
-
/* ----------------------------------------------------------------
* AbortTransaction stuff
* ----------------------------------------------------------------
--- 1147,1152 ----
***************
*** 1288,1301 ****
* waiting for already-aborted subtransactions. It is OK to do it without
* having flushed the ABORT record to disk, because in event of a crash
* we'd be assumed to have aborted anyway.
- *
- * The ordering here isn't critical but it seems best to mark the parent
- * first. This assures an atomic transition of all the subtransactions to
- * aborted state from the point of view of concurrent
- * TransactionIdDidAbort calls.
*/
! TransactionIdAbort(xid);
! TransactionIdAbortTree(nchildren, children);
END_CRIT_SECTION();
--- 1249,1256 ----
* waiting for already-aborted subtransactions. It is OK to do it without
* having flushed the ABORT record to disk, because in event of a crash
* we'd be assumed to have aborted anyway.
*/
! TransactionIdAbortTree(xid, nchildren, children);
END_CRIT_SECTION();
***************
*** 3791,3798 ****
/* Must CCI to ensure commands of subtransaction are seen as done */
CommandCounterIncrement();
! /* Mark subtransaction as subcommitted */
! RecordSubTransactionCommit();
/* Post-commit cleanup */
if (TransactionIdIsValid(s->transactionId))
--- 3746,3757 ----
/* Must CCI to ensure commands of subtransaction are seen as done */
CommandCounterIncrement();
! /*
! * Prior to 8.4 we marked subcommit in clog at this point.
! * We now only perform that step, if required, as part of the
! * atomic update of the whole transaction tree at top level
! * commit or abort.
! */
/* Post-commit cleanup */
if (TransactionIdIsValid(s->transactionId))
***************
*** 4259,4269 ****
TransactionId max_xid;
int i;
- TransactionIdCommit(xid);
-
/* Mark committed subtransactions as committed */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdCommitTree(xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
--- 4218,4226 ----
TransactionId max_xid;
int i;
/* Mark committed subtransactions as committed */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdCommitTree(xid, xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
***************
*** 4299,4309 ****
TransactionId max_xid;
int i;
- TransactionIdAbort(xid);
-
/* Mark subtransactions as aborted */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdAbortTree(xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
--- 4256,4264 ----
TransactionId max_xid;
int i;
/* Mark subtransactions as aborted */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdAbortTree(xid, xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
Index: src/include/access/clog.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/clog.h,v
retrieving revision 1.21
diff -c -r1.21 clog.h
*** src/include/access/clog.h 1 Jan 2008 19:45:56 -0000 1.21
--- src/include/access/clog.h 17 Sep 2008 18:14:35 -0000
***************
*** 32,38 ****
#define NUM_CLOG_BUFFERS 8
! extern void TransactionIdSetStatus(TransactionId xid, XidStatus status, XLogRecPtr lsn);
extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
extern Size CLOGShmemSize(void);
--- 32,39 ----
#define NUM_CLOG_BUFFERS 8
! extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
extern Size CLOGShmemSize(void);
Index: src/include/access/transam.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/transam.h,v
retrieving revision 1.65
diff -c -r1.65 transam.h
*** src/include/access/transam.h 11 Mar 2008 20:20:35 -0000 1.65
--- src/include/access/transam.h 17 Sep 2008 16:43:53 -0000
***************
*** 139,151 ****
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
- extern void TransactionIdCommit(TransactionId transactionId);
- extern void TransactionIdAsyncCommit(TransactionId transactionId, XLogRecPtr lsn);
extern void TransactionIdAbort(TransactionId transactionId);
! extern void TransactionIdSubCommit(TransactionId transactionId);
! extern void TransactionIdCommitTree(int nxids, TransactionId *xids);
! extern void TransactionIdAsyncCommitTree(int nxids, TransactionId *xids, XLogRecPtr lsn);
! extern void TransactionIdAbortTree(int nxids, TransactionId *xids);
extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
--- 139,148 ----
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
extern void TransactionIdAbort(TransactionId transactionId);
! extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
! extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
! extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
On Thu, 2008-09-18 at 15:59 +0100, Simon Riggs wrote:
> On Tue, 2008-09-16 at 10:11 -0400, Alvaro Herrera wrote:
>> I wonder if the improved clog API required to mark multiple
>> transactions as committed at once would be also useful to
>> TransactionIdCommitTree which is used in regular transaction commit.
>
> I've hacked together this concept patch (WIP).
> Not fully tested yet, but it gives a flavour of the API rearrangements
> required for atomic clog updates. It passes make check, but that's not
> saying enough for a serious review yet. I expect to pick this up again
> next week.

I've tested this some more and am much happier with it now.
Also added README details; there are no user interface or behaviour
changes.
The patch removes the need for RecordSubTransactionCommit(), which
* reduces response times of subtransaction queries, because we are able
to apply these changes in batches at commit time. This requires a
batch-style API that now works atomically, so there is much change in
transam.c
* reduces the path length for visibility tests for all users viewing
concurrent subtransaction activity, since we are much less likely to
waste time following a long trail to an uncommitted higher-level
transaction
* removes the need for additional WAL logging to implement
subtransaction commits for Hot Standby
So half the patch is refactoring, and half is rearranging the clog access
functions to support batched access.
An early review would greatly help my work on Hot Standby. Thanks.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Attachments:
atomic_subxids.v3a.patch (text/x-patch; charset=utf-8)
Index: src/backend/access/transam/README
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/README,v
retrieving revision 1.11
diff -c -r1.11 README
*** src/backend/access/transam/README 21 Mar 2008 13:23:28 -0000 1.11
--- src/backend/access/transam/README 23 Sep 2008 21:23:02 -0000
***************
*** 342,351 ****
an XID. A transaction can be in progress, committed, aborted, or
"sub-committed". This last state means that it's a subtransaction that's no
longer running, but its parent has not updated its state yet (either it is
! still running, or the backend crashed without updating its status). A
! sub-committed transaction's status will be updated again to the final value as
! soon as the parent commits or aborts, or when the parent is detected to be
! aborted.
Savepoints are implemented using subtransactions. A subtransaction is a
transaction inside a transaction; its commit or abort status is not only
--- 342,360 ----
an XID. A transaction can be in progress, committed, aborted, or
"sub-committed". This last state means that it's a subtransaction that's no
longer running, but its parent has not updated its state yet (either it is
! still running, or the backend crashed without updating its status). Prior
! to 8.4 we updated the status to sub-committed in clog as soon as
! sub-commit had happened. It was later realised that this is not actually
! required for any purpose and the action can be deferred until transaction
! commit. The main role of marking transactions as sub-committed is to
! provide an atomic commit protocol when transaction status is spread across
! multiple clog pages. As a result whenever transaction status spreads
! across multiple pages we must use a two-phase commit protocol. The first
! phase is to mark the subtransactions as sub-committed, then we mark the
! top level transaction and all its subtransactions committed (in that order).
! So in 8.4 sub-committed state still exists, but as a transitory state as
! part of final commit. Subtransaction abort is always marked in clog as
! soon as it occurs, to allow locks to be released.
Savepoints are implemented using subtransactions. A subtransaction is a
transaction inside a transaction; its commit or abort status is not only
Index: src/backend/access/transam/clog.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/clog.c,v
retrieving revision 1.47
diff -c -r1.47 clog.c
*** src/backend/access/transam/clog.c 1 Aug 2008 13:16:08 -0000 1.47
--- src/backend/access/transam/clog.c 23 Sep 2008 20:41:17 -0000
***************
*** 80,89 ****
static bool CLOGPagePrecedes(int page1, int page2);
static void WriteZeroPageXlogRec(int pageno);
static void WriteTruncateXlogRec(int pageno);
!
!
! /*
! * Record the final state of a transaction in the commit log.
*
* lsn must be the WAL location of the commit record when recording an async
* commit. For a synchronous commit it can be InvalidXLogRecPtr, since the
--- 80,105 ----
static bool CLOGPagePrecedes(int page1, int page2);
static void WriteZeroPageXlogRec(int pageno);
static void WriteTruncateXlogRec(int pageno);
! static void TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn, int pageno, bool subcommitted);
! static void TransactionIdSetStatusBit(TransactionId xid, XidStatus status,
! XLogRecPtr lsn, int slotno);
!
! /*
! * TransactionIdSetTreeStatus
! *
! * Record the final state of transaction entries in the commit log for
! * a transaction and its subtransaction tree. Take care to ensure this is
! * both atomic and efficient. Prior to 8.4, this capability was provided
! * by the non-atomic TransactionIdSetStatus, which is replaced by this
! * new atomic version.
! *
! * xid is a single xid to set status for. This will typically be
! * the top level transactionid for a top level commit or abort. It can
! * also be a subtransaction when we record transaction aborts.
! *
! * subxids is an array of xids of length nsubxids, representing subtransactions
! * in the tree of xid. In various cases nsubxids may be zero.
*
* lsn must be the WAL location of the commit record when recording an async
* commit. For a synchronous commit it can be InvalidXLogRecPtr, since the
***************
*** 91,107 ****
* should be InvalidXLogRecPtr for abort cases, too.
*
* NB: this is a low-level routine and is NOT the preferred entry point
! * for most uses; TransactionLogUpdate() in transam.c is the intended caller.
*/
void
! TransactionIdSetStatus(TransactionId xid, XidStatus status, XLogRecPtr lsn)
{
- int pageno = TransactionIdToPage(xid);
- int byteno = TransactionIdToByte(xid);
- int bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
int slotno;
! char *byteptr;
! char byteval;
Assert(status == TRANSACTION_STATUS_COMMITTED ||
status == TRANSACTION_STATUS_ABORTED ||
--- 107,221 ----
* should be InvalidXLogRecPtr for abort cases, too.
*
* NB: this is a low-level routine and is NOT the preferred entry point
! * for most uses; functions in transam.c are the intended callers.
! *
! * Note that no lock requests are made at this level, only lower functions.
! *
! * XXX Think about issuing FADVISE_WILLNEED on pages that we will need,
! * but aren't yet in cache, as well as hinting pages not to fall out of
! * cache yet.
*/
void
! TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn)
! {
! int pageno = TransactionIdToPage(xid); /* get page of parent */
! int i;
! bool subcommitted = false;
!
! /*
! * See how many subxids, if any, are on the same page as the parent, if any.
! */
! for (i = 0; i < nsubxids; i++)
! {
! if (TransactionIdToPage(subxids[i]) != pageno)
! break;
! }
!
! /*
! * Do all items fit on a single page?
! */
! if (i == nsubxids)
! {
! /*
! * Set the parent and any subtransactions on same page as it
! */
! TransactionIdSetPageStatus(xid, nsubxids, subxids, status, lsn,
! pageno, subcommitted);
! }
! else
! {
! int start_subxid = 0;
! int num_on_page = 1;
!
! if (status == TRANSACTION_STATUS_COMMITTED)
! {
! /*
! * If this is a commit then we care about doing this atomically.
! * By here, we know we're updating more than one page of clog,
! * so we must mark entries that are not on the first page so
! * that they show as subcommitted before we then return to
! * update the status to fully committed.
! * We don't mark the first page, because we will be doing that
! * part when we access the first page in our next step.
! */
! TransactionIdSetPageStatus(InvalidTransactionId,
! nsubxids - i, subxids + i,
! TRANSACTION_STATUS_SUB_COMMITTED, lsn, pageno,
! false /* this setting never relevant */);
! }
!
! /*
! * Now set the parent and any subtransactions on same page as it
! */
! TransactionIdSetPageStatus(xid, i, subxids, status, lsn,
! pageno, subcommitted);
!
! /*
! * By now, all subtransactions have been subcommitted, if relevant
! */
! subcommitted = true;
!
! /*
! * Now work through the rest of the subxids one clog page at a time,
! * starting from next subxid.
! */
! start_subxid = i;
! pageno = TransactionIdToPage(subxids[i]);
! for (; i < nsubxids; i++)
! {
! if (TransactionIdToPage(subxids[i]) != pageno)
! {
! TransactionIdSetPageStatus(InvalidTransactionId,
! num_on_page, subxids + start_subxid,
! status, lsn, pageno, true);
! start_subxid = i;
! num_on_page = 1;
! pageno = TransactionIdToPage(subxids[start_subxid]);
! }
! else
! num_on_page++;
! }
!
! /* Write last page */
! TransactionIdSetPageStatus(InvalidTransactionId,
! num_on_page, subxids + start_subxid,
! status, lsn, pageno, true);
! }
! }
!
! /*
! * Record the final state of transaction entries in the commit log for
! * all entries on *one* page only. Atomic only on this page.
! *
! * Otherwise API is same as TransactionIdSetTreeStatus()
! */
! static void
! TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn, int pageno, bool subcommitted)
{
int slotno;
! int i;
Assert(status == TRANSACTION_STATUS_COMMITTED ||
status == TRANSACTION_STATUS_ABORTED ||
***************
*** 116,124 ****
* mustn't let it reach disk until we've done the appropriate WAL flush.
* But when lsn is invalid, it's OK to scribble on a page while it is
* write-busy, since we don't care if the update reaches disk sooner than
! * we think. Hence, pass write_ok = XLogRecPtrIsInvalid(lsn).
*/
slotno = SimpleLruReadPage(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
byteptr = ClogCtl->shared->page_buffer[slotno] + byteno;
/* Current state should be 0, subcommitted or target state */
--- 230,284 ----
* mustn't let it reach disk until we've done the appropriate WAL flush.
* But when lsn is invalid, it's OK to scribble on a page while it is
* write-busy, since we don't care if the update reaches disk sooner than
! * we think.
*/
slotno = SimpleLruReadPage(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
+
+ /*
+ * If we synch commit more than one xid on this page while write busy
+ * we might find that some of the bits go to disk and others don't.
+ * That would break atomicity, so if we haven't already subcommitted
+ * the xids for this commit, we do that first and then come back
+ * to start marking commits. If using async commit then we already
+ * waited for the write I/O to complete by this point, so nothing to do.
+ */
+ if (status == TRANSACTION_STATUS_COMMITTED && !subcommitted && XLogRecPtrIsInvalid(lsn))
+ {
+ for (i = 0; i < nsubxids; i++)
+ {
+ Assert(ClogCtl->shared->page_number[slotno] == TransactionIdToPage(subxids[i]));
+ TransactionIdSetStatusBit(subxids[i],
+ TRANSACTION_STATUS_SUB_COMMITTED, lsn, slotno);
+ }
+ }
+
+ /* Set the main transaction id, if any */
+ if (TransactionIdIsValid(xid))
+ TransactionIdSetStatusBit(xid, status, lsn, slotno);
+
+ /* Set the subtransactions on this page only */
+ for (i = 0; i < nsubxids; i++)
+ {
+ Assert(ClogCtl->shared->page_number[slotno] == TransactionIdToPage(subxids[i]));
+ TransactionIdSetStatusBit(subxids[i], status, lsn, slotno);
+ }
+
+ ClogCtl->shared->page_dirty[slotno] = true;
+
+ LWLockRelease(CLogControlLock);
+ }
+
+ /*
+ * Must be called with CLogControlLock held
+ */
+ static void
+ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+ {
+ int byteno = TransactionIdToByte(xid);
+ int bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
+ char *byteptr;
+ char byteval;
+
byteptr = ClogCtl->shared->page_buffer[slotno] + byteno;
/* Current state should be 0, subcommitted or target state */
***************
*** 132,139 ****
byteval |= (status << bshift);
*byteptr = byteval;
- ClogCtl->shared->page_dirty[slotno] = true;
-
/*
* Update the group LSN if the transaction completion LSN is higher.
*
--- 292,297 ----
***************
*** 149,156 ****
if (XLByteLT(ClogCtl->shared->group_lsn[lsnindex], lsn))
ClogCtl->shared->group_lsn[lsnindex] = lsn;
}
-
- LWLockRelease(CLogControlLock);
}
/*
--- 307,312 ----
Index: src/backend/access/transam/transam.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/transam.c,v
retrieving revision 1.76
diff -c -r1.76 transam.c
*** src/backend/access/transam/transam.c 26 Mar 2008 18:48:59 -0000 1.76
--- src/backend/access/transam/transam.c 23 Sep 2008 20:41:17 -0000
***************
*** 40,54 ****
/* Local functions */
static XidStatus TransactionLogFetch(TransactionId transactionId);
- static void TransactionLogUpdate(TransactionId transactionId,
- XidStatus status, XLogRecPtr lsn);
/* ----------------------------------------------------------------
* Postgres log access method interface
*
* TransactionLogFetch
! * TransactionLogUpdate
* ----------------------------------------------------------------
*/
--- 40,58 ----
/* Local functions */
static XidStatus TransactionLogFetch(TransactionId transactionId);
/* ----------------------------------------------------------------
* Postgres log access method interface
*
* TransactionLogFetch
! *
! * Prior to 8.4, we also had TransactionLogUpdate and
! * TransactionLogMultiUpdate. These have now been merged
! * into a single command TransactionIdSetTreeStatus(),
! * though that is now part of clog.c because of the need
! * for closer integration with clog code to achieve
! * atomic clog updates for subtransactions.
* ----------------------------------------------------------------
*/
***************
*** 100,140 ****
return xidstatus;
}
- /*
- * TransactionLogUpdate
- *
- * Store the new status of a transaction. The commit record LSN must be
- * passed when recording an async commit; else it should be InvalidXLogRecPtr.
- */
- static inline void
- TransactionLogUpdate(TransactionId transactionId,
- XidStatus status, XLogRecPtr lsn)
- {
- /*
- * update the commit log
- */
- TransactionIdSetStatus(transactionId, status, lsn);
- }
-
- /*
- * TransactionLogMultiUpdate
- *
- * Update multiple transaction identifiers to a given status.
- * Don't depend on this being atomic; it's not.
- */
- static inline void
- TransactionLogMultiUpdate(int nxids, TransactionId *xids,
- XidStatus status, XLogRecPtr lsn)
- {
- int i;
-
- Assert(nxids != 0);
-
- for (i = 0; i < nxids; i++)
- TransactionIdSetStatus(xids[i], status, lsn);
- }
-
-
/* ----------------------------------------------------------------
* Interface functions
*
--- 104,109 ----
***************
*** 143,154 ****
* ========
* these functions test the transaction status of
* a specified transaction id.
! *
! * TransactionIdCommit
! * TransactionIdAbort
* ========
* these functions set the transaction status
! * of the specified xid.
*
* See also TransactionIdIsInProgress, which once was in this module
* but now lives in procarray.c.
--- 112,125 ----
* ========
* these functions test the transaction status of
* a specified transaction id.
! *
! * TransactionIdCommitTree
! * TransactionIdAsyncCommitTree
! * TransactionIdAbortTree
* ========
* these functions set the transaction status
! * of the specified transaction tree. As of 8.4, these
! * are now atomic so we set the whole tree in a single call.
*
* See also TransactionIdIsInProgress, which once was in this module
* but now lives in procarray.c.
***************
*** 287,374 ****
return false;
}
-
- /*
- * TransactionIdCommit
- * Commits the transaction associated with the identifier.
- *
- * Note:
- * Assumes transaction identifier is valid.
- */
- void
- TransactionIdCommit(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_COMMITTED,
- InvalidXLogRecPtr);
- }
-
- /*
- * TransactionIdAsyncCommit
- * Same as above, but for async commits. The commit record LSN is needed.
- */
- void
- TransactionIdAsyncCommit(TransactionId transactionId, XLogRecPtr lsn)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_COMMITTED, lsn);
- }
-
- /*
- * TransactionIdAbort
- * Aborts the transaction associated with the identifier.
- *
- * Note:
- * Assumes transaction identifier is valid.
- * No async version of this is needed.
- */
- void
- TransactionIdAbort(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_ABORTED,
- InvalidXLogRecPtr);
- }
-
- /*
- * TransactionIdSubCommit
- * Marks the subtransaction associated with the identifier as
- * sub-committed.
- *
- * Note:
- * No async version of this is needed.
- */
- void
- TransactionIdSubCommit(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_SUB_COMMITTED,
- InvalidXLogRecPtr);
- }
-
/*
* TransactionIdCommitTree
! * Marks all the given transaction ids as committed.
! *
! * The caller has to be sure that this is used only to mark subcommitted
! * subtransactions as committed, and only *after* marking the toplevel
! * parent as committed. Otherwise there is a race condition against
! * TransactionIdDidCommit.
*/
void
! TransactionIdCommitTree(int nxids, TransactionId *xids)
{
! if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED,
! InvalidXLogRecPtr);
}
/*
* TransactionIdAsyncCommitTree
! * Same as above, but for async commits. The commit record LSN is needed.
*/
void
! TransactionIdAsyncCommitTree(int nxids, TransactionId *xids, XLogRecPtr lsn)
{
! if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED,
! lsn);
}
/*
--- 258,284 ----
return false;
}
/*
* TransactionIdCommitTree
! * Marks all the given transaction ids as committed, atomically.
*/
void
! TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids)
{
! TransactionIdSetTreeStatus(xid, nxids, xids,
! TRANSACTION_STATUS_COMMITTED, InvalidXLogRecPtr);
}
/*
* TransactionIdAsyncCommitTree
! * Same as above, but for async commits, atomically. The commit record
! * LSN is needed.
*/
void
! TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn)
{
! TransactionIdSetTreeStatus(xid, nxids, xids,
! TRANSACTION_STATUS_COMMITTED, lsn);
}
/*
***************
*** 379,392 ****
* will consider all the xacts as not-yet-committed anyway.
*/
void
! TransactionIdAbortTree(int nxids, TransactionId *xids)
{
! if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_ABORTED,
! InvalidXLogRecPtr);
}
-
/*
* TransactionIdPrecedes --- is id1 logically < id2?
*/
--- 289,300 ----
* will consider all the xacts as not-yet-committed anyway.
*/
void
! TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
{
! TransactionIdSetTreeStatus(xid, nxids, xids,
! TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
/*
* TransactionIdPrecedes --- is id1 logically < id2?
*/
Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.45
diff -c -r1.45 twophase.c
*** src/backend/access/transam/twophase.c 11 Aug 2008 11:05:10 -0000 1.45
--- src/backend/access/transam/twophase.c 23 Sep 2008 20:41:17 -0000
***************
*** 1745,1753 ****
XLogFlush(recptr);
/* Mark the transaction committed in pg_clog */
! TransactionIdCommit(xid);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdCommitTree(nchildren, children);
/* Checkpoint can proceed now */
MyProc->inCommit = false;
--- 1745,1751 ----
XLogFlush(recptr);
/* Mark the transaction committed in pg_clog */
! TransactionIdCommitTree(xid, nchildren, children);
/* Checkpoint can proceed now */
MyProc->inCommit = false;
***************
*** 1822,1829 ****
* Mark the transaction aborted in clog. This is not absolutely necessary
* but we may as well do it while we are here.
*/
! TransactionIdAbort(xid);
! TransactionIdAbortTree(nchildren, children);
END_CRIT_SECTION();
}
--- 1820,1826 ----
* Mark the transaction aborted in clog. This is not absolutely necessary
* but we may as well do it while we are here.
*/
! TransactionIdAbortTree(xid, nchildren, children);
END_CRIT_SECTION();
}
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.265
diff -c -r1.265 xact.c
*** src/backend/access/transam/xact.c 11 Aug 2008 11:05:10 -0000 1.265
--- src/backend/access/transam/xact.c 23 Sep 2008 20:41:17 -0000
***************
*** 254,260 ****
static TransactionId RecordTransactionAbort(bool isSubXact);
static void StartTransaction(void);
- static void RecordSubTransactionCommit(void);
static void StartSubTransaction(void);
static void CommitSubTransaction(void);
static void AbortSubTransaction(void);
--- 254,259 ----
***************
*** 952,962 ****
* Now we may update the CLOG, if we wrote a COMMIT record above
*/
if (markXidCommitted)
! {
! TransactionIdCommit(xid);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdCommitTree(nchildren, children);
! }
}
else
{
--- 951,957 ----
* Now we may update the CLOG, if we wrote a COMMIT record above
*/
if (markXidCommitted)
! TransactionIdCommitTree(xid, nchildren, children);
}
else
{
***************
*** 974,984 ****
* flushed before the CLOG may be updated.
*/
if (markXidCommitted)
! {
! TransactionIdAsyncCommit(xid, XactLastRecEnd);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdAsyncCommitTree(nchildren, children, XactLastRecEnd);
! }
}
/*
--- 969,975 ----
* flushed before the CLOG may be updated.
*/
if (markXidCommitted)
! TransactionIdAsyncCommitTree(xid, nchildren, children, XactLastRecEnd);
}
/*
***************
*** 1156,1191 ****
s->maxChildXids = 0;
}
- /*
- * RecordSubTransactionCommit
- */
- static void
- RecordSubTransactionCommit(void)
- {
- TransactionId xid = GetCurrentTransactionIdIfAny();
-
- /*
- * We do not log the subcommit in XLOG; it doesn't matter until the
- * top-level transaction commits.
- *
- * We must mark the subtransaction subcommitted in the CLOG if it had a
- * valid XID assigned. If it did not, nobody else will ever know about
- * the existence of this subxact. We don't have to deal with deletions
- * scheduled for on-commit here, since they'll be reassigned to our parent
- * (who might still abort).
- */
- if (TransactionIdIsValid(xid))
- {
- /* XXX does this really need to be a critical section? */
- START_CRIT_SECTION();
-
- /* Record subtransaction subcommit */
- TransactionIdSubCommit(xid);
-
- END_CRIT_SECTION();
- }
- }
-
/* ----------------------------------------------------------------
* AbortTransaction stuff
* ----------------------------------------------------------------
--- 1147,1152 ----
***************
*** 1288,1301 ****
* waiting for already-aborted subtransactions. It is OK to do it without
* having flushed the ABORT record to disk, because in event of a crash
* we'd be assumed to have aborted anyway.
- *
- * The ordering here isn't critical but it seems best to mark the parent
- * first. This assures an atomic transition of all the subtransactions to
- * aborted state from the point of view of concurrent
- * TransactionIdDidAbort calls.
*/
! TransactionIdAbort(xid);
! TransactionIdAbortTree(nchildren, children);
END_CRIT_SECTION();
--- 1249,1256 ----
* waiting for already-aborted subtransactions. It is OK to do it without
* having flushed the ABORT record to disk, because in event of a crash
* we'd be assumed to have aborted anyway.
*/
! TransactionIdAbortTree(xid, nchildren, children);
END_CRIT_SECTION();
***************
*** 3791,3798 ****
/* Must CCI to ensure commands of subtransaction are seen as done */
CommandCounterIncrement();
! /* Mark subtransaction as subcommitted */
! RecordSubTransactionCommit();
/* Post-commit cleanup */
if (TransactionIdIsValid(s->transactionId))
--- 3746,3757 ----
/* Must CCI to ensure commands of subtransaction are seen as done */
CommandCounterIncrement();
! /*
! * Prior to 8.4 we marked subcommit in clog at this point.
! * We now only perform that step, if required, as part of the
! * atomic update of the whole transaction tree at top level
! * commit or abort.
! */
/* Post-commit cleanup */
if (TransactionIdIsValid(s->transactionId))
***************
*** 4259,4269 ****
TransactionId max_xid;
int i;
- TransactionIdCommit(xid);
-
/* Mark committed subtransactions as committed */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdCommitTree(xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
--- 4218,4226 ----
TransactionId max_xid;
int i;
/* Mark committed subtransactions as committed */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdCommitTree(xid, xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
***************
*** 4299,4309 ****
TransactionId max_xid;
int i;
- TransactionIdAbort(xid);
-
/* Mark subtransactions as aborted */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdAbortTree(xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
--- 4256,4264 ----
TransactionId max_xid;
int i;
/* Mark subtransactions as aborted */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdAbortTree(xid, xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
Index: src/include/access/clog.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/clog.h,v
retrieving revision 1.21
diff -c -r1.21 clog.h
*** src/include/access/clog.h 1 Jan 2008 19:45:56 -0000 1.21
--- src/include/access/clog.h 23 Sep 2008 20:41:17 -0000
***************
*** 32,38 ****
#define NUM_CLOG_BUFFERS 8
! extern void TransactionIdSetStatus(TransactionId xid, XidStatus status, XLogRecPtr lsn);
extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
extern Size CLOGShmemSize(void);
--- 32,39 ----
#define NUM_CLOG_BUFFERS 8
! extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
extern Size CLOGShmemSize(void);
Index: src/include/access/transam.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/transam.h,v
retrieving revision 1.65
diff -c -r1.65 transam.h
*** src/include/access/transam.h 11 Mar 2008 20:20:35 -0000 1.65
--- src/include/access/transam.h 23 Sep 2008 20:41:17 -0000
***************
*** 139,151 ****
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
- extern void TransactionIdCommit(TransactionId transactionId);
- extern void TransactionIdAsyncCommit(TransactionId transactionId, XLogRecPtr lsn);
extern void TransactionIdAbort(TransactionId transactionId);
! extern void TransactionIdSubCommit(TransactionId transactionId);
! extern void TransactionIdCommitTree(int nxids, TransactionId *xids);
! extern void TransactionIdAsyncCommitTree(int nxids, TransactionId *xids, XLogRecPtr lsn);
! extern void TransactionIdAbortTree(int nxids, TransactionId *xids);
extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
--- 139,148 ----
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
extern void TransactionIdAbort(TransactionId transactionId);
! extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
! extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
! extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
On Tue, 2008-09-23 at 22:47 +0100, Simon Riggs wrote:
I've tested this some more and am much happier with it now.
The concept is fine, but I've found a coding bug in further testing.
Please wait now for new version before review.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Wed, 2008-09-24 at 13:48 +0100, Simon Riggs wrote:
On Tue, 2008-09-23 at 22:47 +0100, Simon Riggs wrote:
I've tested this some more and am much happier with it now.
The concept is fine, but I've found a coding bug in further testing.
Please wait now for new version before review.
OK, spent a long time testing various batching scenarios for this using a
custom test harness to simulate various spreads of xids in transaction
trees. All looks fine now.
The main work is done in new clog.c functions:
TransactionIdSetTreeStatus(), which sets the whole tree atomically by calling
TransactionIdSetPageStatus(), which in turn calls
TransactionIdSetStatusBit() for each xid status change.
TransactionIdSetPageStatus() performs locking and handles the write_ok
problem, as did the code it replaces. TransactionIdSetPageStatus() is called
the theoretical minimum number of times for any transaction tree.
Patch slightly fumbles diff-ing new and replacement code, so there are
two chunks that appear to show I'm removing locking. I'm not!!
Everything else is just API changes.
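To make the per-page batching described above easier to follow, here is a
minimal standalone sketch (not code from the patch) that only counts how many
page-level calls a commit of one transaction tree would need. It assumes
8192-byte clog pages holding 2 status bits per xid, and that subxids are in
ascending order with any that share the parent's page coming first. The names
xid_to_page(), run_on_page(), plan_tree_commit() and PageCalls are
hypothetical helpers invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumption for illustration: 8192-byte clog pages, 4 xids per byte. */
#define XACTS_PER_PAGE (8192 * 4)

/* Hypothetical result type: number of page-level calls in each phase. */
typedef struct
{
    int subcommit_calls;    /* calls that mark a page sub-committed */
    int final_calls;        /* calls that write the final status */
} PageCalls;

static int
xid_to_page(TransactionId xid)
{
    return (int) (xid / XACTS_PER_PAGE);
}

/*
 * Length of the run of subxids starting at 'start' that live on 'pageno'.
 * The bounds check comes first so we never read past the end of the array.
 */
static size_t
run_on_page(const TransactionId *subxids, size_t nsubxids,
            size_t start, int pageno)
{
    size_t i = start;

    while (i < nsubxids && xid_to_page(subxids[i]) == pageno)
        i++;
    return i - start;
}

/* Count the page-level calls needed to commit xid and its subxids. */
static PageCalls
plan_tree_commit(TransactionId xid, const TransactionId *subxids,
                 size_t nsubxids)
{
    PageCalls calls = {0, 0};
    size_t on_first = run_on_page(subxids, nsubxids, 0, xid_to_page(xid));
    size_t i;

    /* Phase 1: sub-commit every page other than the parent's page. */
    for (i = on_first; i < nsubxids;)
    {
        i += run_on_page(subxids, nsubxids, i, xid_to_page(subxids[i]));
        calls.subcommit_calls++;
    }

    /* Phase 2: the parent plus any subxids sharing its page, in one call. */
    calls.final_calls = 1;

    /* Phase 3: revisit the remaining pages to set the final status. */
    for (i = on_first; i < nsubxids;)
    {
        i += run_on_page(subxids, nsubxids, i, xid_to_page(subxids[i]));
        calls.final_calls++;
    }
    return calls;
}
```

With these assumptions a tree that fits on one page needs a single call, while
a tree spread over three pages needs two sub-commit calls followed by three
final calls, so the parent's page is only ever touched once.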
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Attachments:
atomic_subxids.v4.patch (text/x-patch; charset=utf-8)
Index: src/backend/access/transam/README
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/README,v
retrieving revision 1.11
diff -c -r1.11 README
*** src/backend/access/transam/README 21 Mar 2008 13:23:28 -0000 1.11
--- src/backend/access/transam/README 24 Sep 2008 17:33:23 -0000
***************
*** 342,351 ****
an XID. A transaction can be in progress, committed, aborted, or
"sub-committed". This last state means that it's a subtransaction that's no
longer running, but its parent has not updated its state yet (either it is
! still running, or the backend crashed without updating its status). A
! sub-committed transaction's status will be updated again to the final value as
! soon as the parent commits or aborts, or when the parent is detected to be
! aborted.
Savepoints are implemented using subtransactions. A subtransaction is a
transaction inside a transaction; its commit or abort status is not only
--- 342,360 ----
an XID. A transaction can be in progress, committed, aborted, or
"sub-committed". This last state means that it's a subtransaction that's no
longer running, but its parent has not updated its state yet (either it is
! still running, or the backend crashed without updating its status). Prior
! to 8.4 we updated the status to sub-committed in clog as soon as
! sub-commit had happened. It was later realised that this is not actually
! required for any purpose and the action can be deferred until transaction
! commit. The main role of marking transactions as sub-committed is to
! provide an atomic commit protocol when transaction status is spread across
! multiple clog pages. As a result whenever transaction status spreads
! across multiple pages we must use a two-phase commit protocol. The first
! phase is to mark the subtransactions as sub-committed, then we mark the
! top level transaction and all its subtransactions committed (in that order).
! So in 8.4 sub-committed state still exists, but as a transitory state as
! part of final commit. Subtransaction abort is always marked in clog as
! soon as it occurs, to allow locks to be released.
Savepoints are implemented using subtransactions. A subtransaction is a
transaction inside a transaction; its commit or abort status is not only
Index: src/backend/access/transam/clog.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/clog.c,v
retrieving revision 1.47
diff -c -r1.47 clog.c
*** src/backend/access/transam/clog.c 1 Aug 2008 13:16:08 -0000 1.47
--- src/backend/access/transam/clog.c 24 Sep 2008 21:24:02 -0000
***************
*** 80,89 ****
static bool CLOGPagePrecedes(int page1, int page2);
static void WriteZeroPageXlogRec(int pageno);
static void WriteTruncateXlogRec(int pageno);
!
!
! /*
! * Record the final state of a transaction in the commit log.
*
* lsn must be the WAL location of the commit record when recording an async
* commit. For a synchronous commit it can be InvalidXLogRecPtr, since the
--- 80,105 ----
static bool CLOGPagePrecedes(int page1, int page2);
static void WriteZeroPageXlogRec(int pageno);
static void WriteTruncateXlogRec(int pageno);
! static void TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn, int pageno, bool subcommit);
! static void TransactionIdSetStatusBit(TransactionId xid, XidStatus status,
! XLogRecPtr lsn, int slotno);
!
! /*
! * TransactionIdSetTreeStatus
! *
! * Record the final state of transaction entries in the commit log for
! * a transaction and its subtransaction tree. Take care to ensure this is
! * both atomic and efficient. Prior to 8.4, this capability was provided
! * by the non-atomic TransactionIdSetStatus, which is replaced by this
! * new atomic version.
! *
! * xid is a single xid to set status for. This will typically be
! * the top level transactionid for a top level commit or abort. It can
! * also be a subtransaction when we record transaction aborts.
! *
! * subxids is an array of xids of length nsubxids, representing subtransactions
! * in the tree of xid. In various cases nsubxids may be zero.
*
* lsn must be the WAL location of the commit record when recording an async
* commit. For a synchronous commit it can be InvalidXLogRecPtr, since the
***************
*** 91,107 ****
* should be InvalidXLogRecPtr for abort cases, too.
*
* NB: this is a low-level routine and is NOT the preferred entry point
! * for most uses; TransactionLogUpdate() in transam.c is the intended caller.
*/
void
! TransactionIdSetStatus(TransactionId xid, XidStatus status, XLogRecPtr lsn)
{
- int pageno = TransactionIdToPage(xid);
- int byteno = TransactionIdToByte(xid);
- int bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
int slotno;
! char *byteptr;
! char byteval;
Assert(status == TRANSACTION_STATUS_COMMITTED ||
status == TRANSACTION_STATUS_ABORTED ||
--- 107,238 ----
* should be InvalidXLogRecPtr for abort cases, too.
*
* NB: this is a low-level routine and is NOT the preferred entry point
! * for most uses; functions in transam.c are the intended callers.
! *
! * Note that no lock requests are made at this level, only lower functions.
! *
! * XXX Think about issuing FADVISE_WILLNEED on pages that we will need,
! * but aren't yet in cache, as well as hinting pages not to fall out of
! * cache yet.
*/
void
! TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn)
! {
! int pageno = TransactionIdToPage(xid); /* get page of parent */
! int i;
!
! Assert(status == TRANSACTION_STATUS_COMMITTED ||
! status == TRANSACTION_STATUS_ABORTED);
!
! /*
! * See how many subxids, if any, are on the same page as the parent.
! */
! for (i = 0; i < nsubxids; i++)
! {
! if (TransactionIdToPage(subxids[i]) != pageno)
! break;
! }
!
! /*
! * Do all items fit on a single page?
! */
! if (i == nsubxids)
! {
! /*
! * Set the parent and any subtransactions on same page as it
! */
! TransactionIdSetPageStatus(xid, nsubxids, subxids, status, lsn,
! pageno, true);
! }
! else
! {
! int num_on_first_page = i;
! int num_on_page = 0;
! int offset;
!
! if (status == TRANSACTION_STATUS_COMMITTED)
! {
! /*
! * If this is a commit then we care about doing this atomically.
! * By here, we know we're updating more than one page of clog,
! * so we must mark entries that are *not* on the first page so
! * that they show as subcommitted before we then return to
! * update the status to fully committed.
! * We don't mark the first page because we will be doing that
! * when we mark the main commit, so we wish to avoid touching
! * that page twice.
! */
! num_on_page = 0;
! i = offset = num_on_first_page;
! pageno = TransactionIdToPage(subxids[num_on_first_page]);
! while (i < nsubxids)
! {
! while (i < nsubxids && TransactionIdToPage(subxids[i]) == pageno)
! {
! num_on_page++;
! i++;
! }
!
! TransactionIdSetPageStatus(InvalidTransactionId,
! num_on_page, subxids + offset,
! TRANSACTION_STATUS_SUB_COMMITTED, lsn, pageno, true);
! offset = i;
! num_on_page = 0;
! pageno = TransactionIdToPage(subxids[offset]);
! }
! }
!
! /*
! * Now set the parent and subtransactions on same page as it, if any
! */
! pageno = TransactionIdToPage(xid);
! TransactionIdSetPageStatus(xid, num_on_first_page, subxids, status, lsn,
! pageno, true);
!
! /*
! * By now, all subtransactions have been subcommitted, so all calls
! * to TransactionIdSetPageStatus() will use subcommit=false after
! * this point for this transaction tree.
! */
!
! /*
! * Now work through the rest of the subxids one clog page at a time,
! * starting from the second page onwards, like we did above.
! */
! num_on_page = 0;
! i = offset = num_on_first_page;
! pageno = TransactionIdToPage(subxids[num_on_first_page]);
! while (i < nsubxids)
! {
! while (i < nsubxids && TransactionIdToPage(subxids[i]) == pageno)
! {
! num_on_page++;
! i++;
! }
!
! TransactionIdSetPageStatus(InvalidTransactionId,
! num_on_page, subxids + offset,
! status, lsn, pageno, false);
! offset = i;
! num_on_page = 0;
! pageno = TransactionIdToPage(subxids[offset]);
! }
! }
! }
!
! /*
! * Record the final state of transaction entries in the commit log for
! * all entries on *one* page only. Atomic only on this page.
! *
! * Otherwise API is same as TransactionIdSetTreeStatus()
! */
! static void
! TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn, int pageno, bool subcommit)
{
int slotno;
! int i;
Assert(status == TRANSACTION_STATUS_COMMITTED ||
status == TRANSACTION_STATUS_ABORTED ||
***************
*** 116,124 ****
* mustn't let it reach disk until we've done the appropriate WAL flush.
* But when lsn is invalid, it's OK to scribble on a page while it is
* write-busy, since we don't care if the update reaches disk sooner than
! * we think. Hence, pass write_ok = XLogRecPtrIsInvalid(lsn).
*/
slotno = SimpleLruReadPage(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
byteptr = ClogCtl->shared->page_buffer[slotno] + byteno;
/* Current state should be 0, subcommitted or target state */
--- 247,303 ----
* mustn't let it reach disk until we've done the appropriate WAL flush.
* But when lsn is invalid, it's OK to scribble on a page while it is
* write-busy, since we don't care if the update reaches disk sooner than
! * we think.
*/
slotno = SimpleLruReadPage(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
+
+ /*
+ * If we synch commit more than one xid on this page while write busy
+ * we might find that some of the bits go to disk and others don't.
+ * That would break atomicity, so if we haven't already subcommitted
+ * the xids for this commit, we do that first and then come back
+ * to start marking commits. If using async commit then we already
+ * waited for the write I/O to complete by this point, so nothing to do.
+ */
+ if (subcommit && status == TRANSACTION_STATUS_COMMITTED &&
+ XLogRecPtrIsInvalid(lsn))
+ {
+ for (i = 0; i < nsubxids; i++)
+ {
+ Assert(ClogCtl->shared->page_number[slotno] == TransactionIdToPage(subxids[i]));
+ TransactionIdSetStatusBit(subxids[i],
+ TRANSACTION_STATUS_SUB_COMMITTED, lsn, slotno);
+ }
+ }
+
+
+ /* Set the main transaction id, if any */
+ if (TransactionIdIsValid(xid))
+ TransactionIdSetStatusBit(xid, status, lsn, slotno);
+
+ /* Set the subtransactions on this page only */
+ for (i = 0; i < nsubxids; i++)
+ {
+ Assert(ClogCtl->shared->page_number[slotno] == TransactionIdToPage(subxids[i]));
+ TransactionIdSetStatusBit(subxids[i], status, lsn, slotno);
+ }
+
+ ClogCtl->shared->page_dirty[slotno] = true;
+
+ LWLockRelease(CLogControlLock);
+ }
+
+ /*
+ * Must be called with CLogControlLock held
+ */
+ static void
+ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+ {
+ int byteno = TransactionIdToByte(xid);
+ int bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
+ char *byteptr;
+ char byteval;
+
byteptr = ClogCtl->shared->page_buffer[slotno] + byteno;
/* Current state should be 0, subcommitted or target state */
***************
*** 132,139 ****
byteval |= (status << bshift);
*byteptr = byteval;
- ClogCtl->shared->page_dirty[slotno] = true;
-
/*
* Update the group LSN if the transaction completion LSN is higher.
*
--- 311,316 ----
***************
*** 149,156 ****
if (XLByteLT(ClogCtl->shared->group_lsn[lsnindex], lsn))
ClogCtl->shared->group_lsn[lsnindex] = lsn;
}
-
- LWLockRelease(CLogControlLock);
}
/*
--- 326,331 ----
Index: src/backend/access/transam/transam.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/transam.c,v
retrieving revision 1.76
diff -c -r1.76 transam.c
*** src/backend/access/transam/transam.c 26 Mar 2008 18:48:59 -0000 1.76
--- src/backend/access/transam/transam.c 24 Sep 2008 17:33:23 -0000
***************
*** 40,54 ****
/* Local functions */
static XidStatus TransactionLogFetch(TransactionId transactionId);
- static void TransactionLogUpdate(TransactionId transactionId,
- XidStatus status, XLogRecPtr lsn);
/* ----------------------------------------------------------------
* Postgres log access method interface
*
* TransactionLogFetch
! * TransactionLogUpdate
* ----------------------------------------------------------------
*/
--- 40,58 ----
/* Local functions */
static XidStatus TransactionLogFetch(TransactionId transactionId);
/* ----------------------------------------------------------------
* Postgres log access method interface
*
* TransactionLogFetch
! *
! * Prior to 8.4, we also had TransactionLogUpdate and
! * TransactionLogMultiUpdate. These have now been merged
! * into a single command TransactionIdSetTreeStatus(),
! * though that is now part of clog.c because of the need
! * for closer integration with clog code to achieve
! * atomic clog updates for subtransactions.
* ----------------------------------------------------------------
*/
***************
*** 100,140 ****
return xidstatus;
}
- /*
- * TransactionLogUpdate
- *
- * Store the new status of a transaction. The commit record LSN must be
- * passed when recording an async commit; else it should be InvalidXLogRecPtr.
- */
- static inline void
- TransactionLogUpdate(TransactionId transactionId,
- XidStatus status, XLogRecPtr lsn)
- {
- /*
- * update the commit log
- */
- TransactionIdSetStatus(transactionId, status, lsn);
- }
-
- /*
- * TransactionLogMultiUpdate
- *
- * Update multiple transaction identifiers to a given status.
- * Don't depend on this being atomic; it's not.
- */
- static inline void
- TransactionLogMultiUpdate(int nxids, TransactionId *xids,
- XidStatus status, XLogRecPtr lsn)
- {
- int i;
-
- Assert(nxids != 0);
-
- for (i = 0; i < nxids; i++)
- TransactionIdSetStatus(xids[i], status, lsn);
- }
-
-
/* ----------------------------------------------------------------
* Interface functions
*
--- 104,109 ----
***************
*** 143,154 ****
* ========
* these functions test the transaction status of
* a specified transaction id.
! *
! * TransactionIdCommit
! * TransactionIdAbort
* ========
* these functions set the transaction status
! * of the specified xid.
*
* See also TransactionIdIsInProgress, which once was in this module
* but now lives in procarray.c.
--- 112,125 ----
* ========
* these functions test the transaction status of
* a specified transaction id.
! *
! * TransactionIdCommitTree
! * TransactionIdAsyncCommitTree
! * TransactionIdAbortTree
* ========
* these functions set the transaction status
! * of the specified transaction tree. As of 8.4, these
! * are now atomic so we set the whole tree in a single call.
*
* See also TransactionIdIsInProgress, which once was in this module
* but now lives in procarray.c.
***************
*** 287,374 ****
return false;
}
-
- /*
- * TransactionIdCommit
- * Commits the transaction associated with the identifier.
- *
- * Note:
- * Assumes transaction identifier is valid.
- */
- void
- TransactionIdCommit(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_COMMITTED,
- InvalidXLogRecPtr);
- }
-
- /*
- * TransactionIdAsyncCommit
- * Same as above, but for async commits. The commit record LSN is needed.
- */
- void
- TransactionIdAsyncCommit(TransactionId transactionId, XLogRecPtr lsn)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_COMMITTED, lsn);
- }
-
- /*
- * TransactionIdAbort
- * Aborts the transaction associated with the identifier.
- *
- * Note:
- * Assumes transaction identifier is valid.
- * No async version of this is needed.
- */
- void
- TransactionIdAbort(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_ABORTED,
- InvalidXLogRecPtr);
- }
-
- /*
- * TransactionIdSubCommit
- * Marks the subtransaction associated with the identifier as
- * sub-committed.
- *
- * Note:
- * No async version of this is needed.
- */
- void
- TransactionIdSubCommit(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_SUB_COMMITTED,
- InvalidXLogRecPtr);
- }
-
/*
* TransactionIdCommitTree
! * Marks all the given transaction ids as committed.
! *
! * The caller has to be sure that this is used only to mark subcommitted
! * subtransactions as committed, and only *after* marking the toplevel
! * parent as committed. Otherwise there is a race condition against
! * TransactionIdDidCommit.
*/
void
! TransactionIdCommitTree(int nxids, TransactionId *xids)
{
! if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED,
! InvalidXLogRecPtr);
}
/*
* TransactionIdAsyncCommitTree
! * Same as above, but for async commits. The commit record LSN is needed.
*/
void
! TransactionIdAsyncCommitTree(int nxids, TransactionId *xids, XLogRecPtr lsn)
{
! if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED,
! lsn);
}
/*
--- 258,284 ----
return false;
}
/*
* TransactionIdCommitTree
! * Marks all the given transaction ids as committed, atomically.
*/
void
! TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids)
{
! TransactionIdSetTreeStatus(xid, nxids, xids,
! TRANSACTION_STATUS_COMMITTED, InvalidXLogRecPtr);
}
/*
* TransactionIdAsyncCommitTree
! * Same as above, but for async commits, atomically. The commit record
! * LSN is needed.
*/
void
! TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn)
{
! TransactionIdSetTreeStatus(xid, nxids, xids,
! TRANSACTION_STATUS_COMMITTED, lsn);
}
/*
***************
*** 379,392 ****
* will consider all the xacts as not-yet-committed anyway.
*/
void
! TransactionIdAbortTree(int nxids, TransactionId *xids)
{
! if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_ABORTED,
! InvalidXLogRecPtr);
}
-
/*
* TransactionIdPrecedes --- is id1 logically < id2?
*/
--- 289,300 ----
* will consider all the xacts as not-yet-committed anyway.
*/
void
! TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
{
! TransactionIdSetTreeStatus(xid, nxids, xids,
! TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
/*
* TransactionIdPrecedes --- is id1 logically < id2?
*/
Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.45
diff -c -r1.45 twophase.c
*** src/backend/access/transam/twophase.c 11 Aug 2008 11:05:10 -0000 1.45
--- src/backend/access/transam/twophase.c 24 Sep 2008 17:33:23 -0000
***************
*** 1745,1753 ****
XLogFlush(recptr);
/* Mark the transaction committed in pg_clog */
! TransactionIdCommit(xid);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdCommitTree(nchildren, children);
/* Checkpoint can proceed now */
MyProc->inCommit = false;
--- 1745,1751 ----
XLogFlush(recptr);
/* Mark the transaction committed in pg_clog */
! TransactionIdCommitTree(xid, nchildren, children);
/* Checkpoint can proceed now */
MyProc->inCommit = false;
***************
*** 1822,1829 ****
* Mark the transaction aborted in clog. This is not absolutely necessary
* but we may as well do it while we are here.
*/
! TransactionIdAbort(xid);
! TransactionIdAbortTree(nchildren, children);
END_CRIT_SECTION();
}
--- 1820,1826 ----
* Mark the transaction aborted in clog. This is not absolutely necessary
* but we may as well do it while we are here.
*/
! TransactionIdAbortTree(xid, nchildren, children);
END_CRIT_SECTION();
}
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.265
diff -c -r1.265 xact.c
*** src/backend/access/transam/xact.c 11 Aug 2008 11:05:10 -0000 1.265
--- src/backend/access/transam/xact.c 24 Sep 2008 17:33:23 -0000
***************
*** 254,260 ****
static TransactionId RecordTransactionAbort(bool isSubXact);
static void StartTransaction(void);
- static void RecordSubTransactionCommit(void);
static void StartSubTransaction(void);
static void CommitSubTransaction(void);
static void AbortSubTransaction(void);
--- 254,259 ----
***************
*** 952,962 ****
* Now we may update the CLOG, if we wrote a COMMIT record above
*/
if (markXidCommitted)
! {
! TransactionIdCommit(xid);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdCommitTree(nchildren, children);
! }
}
else
{
--- 951,957 ----
* Now we may update the CLOG, if we wrote a COMMIT record above
*/
if (markXidCommitted)
! TransactionIdCommitTree(xid, nchildren, children);
}
else
{
***************
*** 974,984 ****
* flushed before the CLOG may be updated.
*/
if (markXidCommitted)
! {
! TransactionIdAsyncCommit(xid, XactLastRecEnd);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdAsyncCommitTree(nchildren, children, XactLastRecEnd);
! }
}
/*
--- 969,975 ----
* flushed before the CLOG may be updated.
*/
if (markXidCommitted)
! TransactionIdAsyncCommitTree(xid, nchildren, children, XactLastRecEnd);
}
/*
***************
*** 1156,1191 ****
s->maxChildXids = 0;
}
- /*
- * RecordSubTransactionCommit
- */
- static void
- RecordSubTransactionCommit(void)
- {
- TransactionId xid = GetCurrentTransactionIdIfAny();
-
- /*
- * We do not log the subcommit in XLOG; it doesn't matter until the
- * top-level transaction commits.
- *
- * We must mark the subtransaction subcommitted in the CLOG if it had a
- * valid XID assigned. If it did not, nobody else will ever know about
- * the existence of this subxact. We don't have to deal with deletions
- * scheduled for on-commit here, since they'll be reassigned to our parent
- * (who might still abort).
- */
- if (TransactionIdIsValid(xid))
- {
- /* XXX does this really need to be a critical section? */
- START_CRIT_SECTION();
-
- /* Record subtransaction subcommit */
- TransactionIdSubCommit(xid);
-
- END_CRIT_SECTION();
- }
- }
-
/* ----------------------------------------------------------------
* AbortTransaction stuff
* ----------------------------------------------------------------
--- 1147,1152 ----
***************
*** 1288,1301 ****
* waiting for already-aborted subtransactions. It is OK to do it without
* having flushed the ABORT record to disk, because in event of a crash
* we'd be assumed to have aborted anyway.
- *
- * The ordering here isn't critical but it seems best to mark the parent
- * first. This assures an atomic transition of all the subtransactions to
- * aborted state from the point of view of concurrent
- * TransactionIdDidAbort calls.
*/
! TransactionIdAbort(xid);
! TransactionIdAbortTree(nchildren, children);
END_CRIT_SECTION();
--- 1249,1256 ----
* waiting for already-aborted subtransactions. It is OK to do it without
* having flushed the ABORT record to disk, because in event of a crash
* we'd be assumed to have aborted anyway.
*/
! TransactionIdAbortTree(xid, nchildren, children);
END_CRIT_SECTION();
***************
*** 3791,3798 ****
/* Must CCI to ensure commands of subtransaction are seen as done */
CommandCounterIncrement();
! /* Mark subtransaction as subcommitted */
! RecordSubTransactionCommit();
/* Post-commit cleanup */
if (TransactionIdIsValid(s->transactionId))
--- 3746,3757 ----
/* Must CCI to ensure commands of subtransaction are seen as done */
CommandCounterIncrement();
! /*
! * Prior to 8.4 we marked subcommit in clog at this point.
! * We now only perform that step, if required, as part of the
! * atomic update of the whole transaction tree at top level
! * commit or abort.
! */
/* Post-commit cleanup */
if (TransactionIdIsValid(s->transactionId))
***************
*** 4259,4269 ****
TransactionId max_xid;
int i;
- TransactionIdCommit(xid);
-
/* Mark committed subtransactions as committed */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdCommitTree(xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
--- 4218,4226 ----
TransactionId max_xid;
int i;
/* Mark committed subtransactions as committed */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdCommitTree(xid, xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
***************
*** 4299,4309 ****
TransactionId max_xid;
int i;
- TransactionIdAbort(xid);
-
/* Mark subtransactions as aborted */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdAbortTree(xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
--- 4256,4264 ----
TransactionId max_xid;
int i;
/* Mark subtransactions as aborted */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdAbortTree(xid, xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
Index: src/include/access/clog.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/clog.h,v
retrieving revision 1.21
diff -c -r1.21 clog.h
*** src/include/access/clog.h 1 Jan 2008 19:45:56 -0000 1.21
--- src/include/access/clog.h 24 Sep 2008 17:33:23 -0000
***************
*** 32,38 ****
#define NUM_CLOG_BUFFERS 8
! extern void TransactionIdSetStatus(TransactionId xid, XidStatus status, XLogRecPtr lsn);
extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
extern Size CLOGShmemSize(void);
--- 32,39 ----
#define NUM_CLOG_BUFFERS 8
! extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
extern Size CLOGShmemSize(void);
Index: src/include/access/transam.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/transam.h,v
retrieving revision 1.65
diff -c -r1.65 transam.h
*** src/include/access/transam.h 11 Mar 2008 20:20:35 -0000 1.65
--- src/include/access/transam.h 24 Sep 2008 17:33:23 -0000
***************
*** 139,151 ****
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
- extern void TransactionIdCommit(TransactionId transactionId);
- extern void TransactionIdAsyncCommit(TransactionId transactionId, XLogRecPtr lsn);
extern void TransactionIdAbort(TransactionId transactionId);
! extern void TransactionIdSubCommit(TransactionId transactionId);
! extern void TransactionIdCommitTree(int nxids, TransactionId *xids);
! extern void TransactionIdAsyncCommitTree(int nxids, TransactionId *xids, XLogRecPtr lsn);
! extern void TransactionIdAbortTree(int nxids, TransactionId *xids);
extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
--- 139,148 ----
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
extern void TransactionIdAbort(TransactionId transactionId);
! extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
! extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
! extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
Simon Riggs wrote:
OK, I spent a long time testing various batching scenarios for this, using a
custom test harness to simulate various spreads of xids in transaction
trees. All looks fine now.
I had a look and was mostly rephrasing some comments and the README
(hopefully I didn't make any of them worse than they were), when I
noticed that the code to iterate through pages could be refactored. I
think the change makes the algorithm in TransactionIdSetTreeStatus
easier to follow.
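To make the refactoring easier to picture, here is a minimal standalone sketch of the page-chunking idea, not the code from the patch. It assumes the subxid array is ordered so that xids on the same CLOG page are adjacent, and it assumes 32768 xids per page (8 kB pages, 2 status bits per xid). The names SKETCH_XACTS_PER_PAGE, struct run_log, record_run and split_by_pages are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumption: 8 kB pages and 2 status bits per xid => 32768 xids per page. */
#define SKETCH_XACTS_PER_PAGE 32768u
#define SKETCH_MAX_RUNS 16

/* Hypothetical record of the per-page runs found in an xid array. */
struct run_log
{
    int nruns;
    int pages[SKETCH_MAX_RUNS];
    int counts[SKETCH_MAX_RUNS];
};

static int
sketch_xid_to_page(TransactionId xid)
{
    return (int) (xid / SKETCH_XACTS_PER_PAGE);
}

/* Stand-in for the per-page status update: just remember the run. */
static void
record_run(struct run_log *out, int pageno, int count)
{
    assert(out->nruns < SKETCH_MAX_RUNS);
    out->pages[out->nruns] = pageno;
    out->counts[out->nruns] = count;
    out->nruns++;
}

/*
 * Walk the array once and hand each run of same-page xids to record_run().
 * The bounds check comes before the array access, and the page number is
 * recomputed at the top of each outer iteration, so nothing is ever read
 * past the end of the array.
 */
static void
split_by_pages(const TransactionId *subxids, int nsubxids, struct run_log *out)
{
    int i = 0;

    while (i < nsubxids)
    {
        int pageno = sketch_xid_to_page(subxids[i]);
        int start = i;

        while (i < nsubxids && sketch_xid_to_page(subxids[i]) == pageno)
            i++;

        record_run(out, pageno, i - start);
    }
}
```

Called on the xids 32766, 32767, 32768, 32769 and 65536, this yields three runs, on pages 0, 1 and 2. In the real code each run would be one per-page update made under a single lock acquisition.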
I also noticed that TransactionIdSetPageStatus has a "subcommit" argument
that is unexplained. I sort of understand the point, but I think it's
better that you fill in the explanation in the header comment (marked
with XXX).
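For readers following along, the reason an intermediate sub-committed state is useful at all can be shown with a toy model. This is a sketch under stated assumptions, not the patch's implementation: the "clog" is a plain array, a page holds only 4 xids, the parent is passed explicitly instead of being looked up in subtrans, and every name prefixed toy_ or TOY_ is hypothetical. It also tests each subxid's page individually, whereas the patch only treats the leading run of subxids as being on the parent's page. The status values mirror the ones used by clog.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;

/* Status values as used by clog; the toy_/TOY_ names are illustrative only. */
enum
{
    ST_IN_PROGRESS = 0,
    ST_COMMITTED = 1,
    ST_ABORTED = 2,
    ST_SUB_COMMITTED = 3
};

#define TOY_XACTS_PER_PAGE 4    /* assumption: tiny pages to force a split */
#define TOY_NXIDS 16

static unsigned char toy_clog[TOY_NXIDS];

static void
toy_reset(void)
{
    memset(toy_clog, ST_IN_PROGRESS, sizeof(toy_clog));
}

static int
toy_page(TransactionId xid)
{
    return (int) (xid / TOY_XACTS_PER_PAGE);
}

static void
toy_set(TransactionId xid, int status)
{
    assert(xid < TOY_NXIDS);
    toy_clog[xid] = (unsigned char) status;
}

/*
 * Commit a tree in up to three steps; "steps" lets a caller stop early
 * and inspect what a concurrent reader would see at that point.
 */
static void
toy_commit_tree(TransactionId xid, const TransactionId *subxids,
                int nsubxids, int steps)
{
    int parent_page = toy_page(xid);
    int i;

    /* Step 1: subxids on other pages become sub-committed. */
    if (steps >= 1)
        for (i = 0; i < nsubxids; i++)
            if (toy_page(subxids[i]) != parent_page)
                toy_set(subxids[i], ST_SUB_COMMITTED);

    /* Step 2: parent and same-page subxids become committed together. */
    if (steps >= 2)
    {
        toy_set(xid, ST_COMMITTED);
        for (i = 0; i < nsubxids; i++)
            if (toy_page(subxids[i]) == parent_page)
                toy_set(subxids[i], ST_COMMITTED);
    }

    /* Step 3: the remaining subxids become committed. */
    if (steps >= 3)
        for (i = 0; i < nsubxids; i++)
            if (toy_page(subxids[i]) != parent_page)
                toy_set(subxids[i], ST_COMMITTED);
}

/* Reader: a sub-committed xid counts as committed iff its parent is. */
static int
toy_did_commit(TransactionId xid, TransactionId parent)
{
    assert(xid < TOY_NXIDS && parent < TOY_NXIDS);
    if (toy_clog[xid] == ST_COMMITTED)
        return 1;
    if (toy_clog[xid] == ST_SUB_COMMITTED)
        return toy_clog[parent] == ST_COMMITTED;
    return 0;
}
```

With parent xid 5 and subxids 6, 7, 8, 9, the tree spans toy pages 1 and 2. After step 1 a reader sees nothing as committed. After step 2 it sees the whole tree as committed, because 8 and 9 are sub-committed and their parent is committed. Step 3 only replaces the indirect answer with a direct one.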
I hope I didn't break the code with the new function
set_tree_status_by_pages -- please recheck that part.
I didn't test this beyond regression tests.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachments:
atomic_subxids.v6.patch (text/x-diff; charset=us-ascii)
Index: src/backend/access/transam/README
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/transam/README,v
retrieving revision 1.11
diff -c -p -r1.11 README
*** src/backend/access/transam/README 21 Mar 2008 13:23:28 -0000 1.11
--- src/backend/access/transam/README 5 Oct 2008 18:33:40 -0000
*************** from disk. They also allow information
*** 341,351 ****
pg_clog records the commit status for each transaction that has been assigned
an XID. A transaction can be in progress, committed, aborted, or
"sub-committed". This last state means that it's a subtransaction that's no
! longer running, but its parent has not updated its state yet (either it is
! still running, or the backend crashed without updating its status). A
! sub-committed transaction's status will be updated again to the final value as
! soon as the parent commits or aborts, or when the parent is detected to be
! aborted.
Savepoints are implemented using subtransactions. A subtransaction is a
transaction inside a transaction; its commit or abort status is not only
--- 341,360 ----
pg_clog records the commit status for each transaction that has been assigned
an XID. A transaction can be in progress, committed, aborted, or
"sub-committed". This last state means that it's a subtransaction that's no
! longer running, but its parent has not updated its state yet. It is not
! necessary to update a subtransaction's transaction status to subcommit, so we
! can just defer it until main transaction commit. The main role of marking
! transactions as sub-committed is to provide an atomic commit protocol when
! transaction status is spread across multiple clog pages. As a result, whenever
! transaction status spreads across multiple pages we must use a two-phase commit
! protocol: the first phase is to mark the subtransactions as sub-committed, then
! we mark the top level transaction and all its subtransactions committed (in
! that order). Thus, subtransactions that have not aborted appear as in-progress
! even when they have already finished, and the subcommit status appears as a
! very short transitory state during main transaction commit. Subtransaction
! abort is always marked in clog as soon as it occurs. When the transaction
! status all fit in a single CLOG page, we atomically mark them all as committed
! without bothering with the intermediate sub-commit state.
Savepoints are implemented using subtransactions. A subtransaction is a
transaction inside a transaction; its commit or abort status is not only
Index: src/backend/access/transam/clog.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/transam/clog.c,v
retrieving revision 1.47
diff -c -p -r1.47 clog.c
*** src/backend/access/transam/clog.c 1 Aug 2008 13:16:08 -0000 1.47
--- src/backend/access/transam/clog.c 5 Oct 2008 18:36:12 -0000
*************** static int ZeroCLOGPage(int pageno, bool
*** 80,107 ****
static bool CLOGPagePrecedes(int page1, int page2);
static void WriteZeroPageXlogRec(int pageno);
static void WriteTruncateXlogRec(int pageno);
/*
! * Record the final state of a transaction in the commit log.
*
* lsn must be the WAL location of the commit record when recording an async
* commit. For a synchronous commit it can be InvalidXLogRecPtr, since the
* caller guarantees the commit record is already flushed in that case. It
* should be InvalidXLogRecPtr for abort cases, too.
*
* NB: this is a low-level routine and is NOT the preferred entry point
! * for most uses; TransactionLogUpdate() in transam.c is the intended caller.
*/
void
! TransactionIdSetStatus(TransactionId xid, XidStatus status, XLogRecPtr lsn)
{
- int pageno = TransactionIdToPage(xid);
- int byteno = TransactionIdToByte(xid);
- int bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
int slotno;
! char *byteptr;
! char byteval;
Assert(status == TRANSACTION_STATUS_COMMITTED ||
status == TRANSACTION_STATUS_ABORTED ||
--- 80,249 ----
static bool CLOGPagePrecedes(int page1, int page2);
static void WriteZeroPageXlogRec(int pageno);
static void WriteTruncateXlogRec(int pageno);
+ static void TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
+ TransactionId *subxids, XidStatus status,
+ XLogRecPtr lsn, int pageno, bool subcommit);
+ static void TransactionIdSetStatusBit(TransactionId xid, XidStatus status,
+ XLogRecPtr lsn, int slotno);
+ static void set_tree_status_by_pages(int nsubxids, TransactionId *subxids,
+ XidStatus status, XLogRecPtr lsn, bool subcommit);
/*
! * TransactionIdSetTreeStatus
! *
! * Record the final state of transaction entries in the commit log for
! * a transaction and its subtransaction tree. Take care to ensure this is
! * efficient, and as atomic as possible.
! *
! * xid is a single xid to set status for. This will typically be
! * the top level transactionid for a top level commit or abort. It can
! * also be a subtransaction when we record transaction aborts.
! *
! * subxids is an array of xids of length nsubxids, representing subtransactions
! * in the tree of xid. In various cases nsubxids may be zero.
*
* lsn must be the WAL location of the commit record when recording an async
* commit. For a synchronous commit it can be InvalidXLogRecPtr, since the
* caller guarantees the commit record is already flushed in that case. It
* should be InvalidXLogRecPtr for abort cases, too.
*
+ * In the commit case, atomicity is limited by whether all the subxids are in
+ * the same CLOG page as xid. If they all are, then the lock will be grabbed
+ * only once, and the status will be set to committed directly. Otherwise
+ * we must
+ * 1. set sub-committed all subxids that are not on the same page as the
+ * main xid
+ * 2. atomically set committed the main xid and the subxids on the same page
+ * 3. go over the first bunch again and set them committed
+ * Note that as far as concurrent checkers are concerned, main transaction
+ * commit as a whole is still atomic.
+ *
* NB: this is a low-level routine and is NOT the preferred entry point
! * for most uses; functions in transam.c are the intended callers.
! *
! * XXX Think about issuing FADVISE_WILLNEED on pages that we will need,
! * but aren't yet in cache, as well as hinting pages not to fall out of
! * cache yet.
*/
void
! TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn)
! {
! int pageno = TransactionIdToPage(xid); /* get page of parent */
! int i;
!
! Assert(status == TRANSACTION_STATUS_COMMITTED ||
! status == TRANSACTION_STATUS_ABORTED);
!
! /*
! * See how many subxids, if any, are on the same page as the parent.
! */
! for (i = 0; i < nsubxids; i++)
! {
! if (TransactionIdToPage(subxids[i]) != pageno)
! break;
! }
!
! /*
! * Do all items fit on a single page?
! */
! if (i == nsubxids)
! {
! /*
! * Set the parent and all subtransactions in a single call
! */
! TransactionIdSetPageStatus(xid, nsubxids, subxids, status, lsn,
! pageno, true);
! }
! else
! {
! int num_on_first_page = i;
!
! /*
! * If this is a commit then we care about doing this correctly (i.e.
! * using the subcommitted intermediate status). By here, we know we're
! * updating more than one page of clog, so we must mark entries that
! * are *not* on the first page so that they show as subcommitted before
! * we then return to update the status to fully committed.
! *
! * To avoid touching the first page twice, skip marking subcommitted
! * those subxids in that first page.
! */
! if (status == TRANSACTION_STATUS_COMMITTED)
! set_tree_status_by_pages(nsubxids - num_on_first_page,
! subxids + num_on_first_page,
! TRANSACTION_STATUS_SUB_COMMITTED, lsn, true);
!
! /*
! * Now set the parent and subtransactions on same page as the parent,
! * if any
! */
! pageno = TransactionIdToPage(xid);
! TransactionIdSetPageStatus(xid, num_on_first_page, subxids, status,
! lsn, pageno, true);
!
! /*
! * By now, all subtransactions have been subcommitted, so all calls
! * to TransactionIdSetPageStatus() will use subcommit=false after
! * this point for this transaction tree.
! */
!
! /*
! * Now work through the rest of the subxids one clog page at a time,
! * starting from the second page onwards, like we did above.
! */
! set_tree_status_by_pages(nsubxids - num_on_first_page,
! subxids + num_on_first_page,
! status, lsn, false);
! }
! }
!
! /*
! * Helper for TransactionIdSetTreeStatus: set the status for a bunch of
! * transactions, chunking in the separate CLOG pages involved.
! */
! static void
! set_tree_status_by_pages(int nsubxids, TransactionId *subxids,
! XidStatus status, XLogRecPtr lsn, bool subcommit)
! {
! int pageno = TransactionIdToPage(subxids[0]);
! int offset = 0;
! int i = 0;
!
! while (i < nsubxids)
! {
! int num_on_page = 0;
!
! while (i < nsubxids && TransactionIdToPage(subxids[i]) == pageno)
! {
! num_on_page++;
! i++;
! }
!
! TransactionIdSetPageStatus(InvalidTransactionId,
! num_on_page, subxids + offset,
! status, lsn, pageno, subcommit);
! offset = i;
! /* don't read past the end of the array after the last page */
! if (offset < nsubxids)
! pageno = TransactionIdToPage(subxids[offset]);
! }
! }
!
! /*
! * Record the final state of transaction entries in the commit log for
! * all entries on a single page. Atomic only on this page.
! *
! * Otherwise API is same as TransactionIdSetTreeStatus()
! *
! * XXX describe what "subcommit" is for.
! */
! static void
! TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status,
! XLogRecPtr lsn, int pageno, bool subcommit)
{
int slotno;
! int i;
Assert(status == TRANSACTION_STATUS_COMMITTED ||
status == TRANSACTION_STATUS_ABORTED ||
*************** TransactionIdSetStatus(TransactionId xid
*** 116,124 ****
* mustn't let it reach disk until we've done the appropriate WAL flush.
* But when lsn is invalid, it's OK to scribble on a page while it is
* write-busy, since we don't care if the update reaches disk sooner than
! * we think. Hence, pass write_ok = XLogRecPtrIsInvalid(lsn).
*/
slotno = SimpleLruReadPage(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
byteptr = ClogCtl->shared->page_buffer[slotno] + byteno;
/* Current state should be 0, subcommitted or target state */
--- 258,317 ----
* mustn't let it reach disk until we've done the appropriate WAL flush.
* But when lsn is invalid, it's OK to scribble on a page while it is
* write-busy, since we don't care if the update reaches disk sooner than
! * we think.
*/
slotno = SimpleLruReadPage(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
+
+ /*
+ * If we sync-commit more than one xid on this page while it is being
+ * written out, we might find that some of the bits go to disk and others
+ * don't. That would break atomicity, so if we haven't already
+ * subcommitted the xids for this commit, we do that first and then come
+ * back to start marking commits.
+ *
+ * If using async commit then we already waited for the write I/O to
+ * complete by this point, so nothing to do.
+ */
+ if (subcommit && status == TRANSACTION_STATUS_COMMITTED &&
+ XLogRecPtrIsInvalid(lsn))
+ {
+ for (i = 0; i < nsubxids; i++)
+ {
+ Assert(ClogCtl->shared->page_number[slotno] == TransactionIdToPage(subxids[i]));
+ TransactionIdSetStatusBit(subxids[i],
+ TRANSACTION_STATUS_SUB_COMMITTED, lsn, slotno);
+ }
+ }
+
+ /* Set the main transaction id, if any */
+ if (TransactionIdIsValid(xid))
+ TransactionIdSetStatusBit(xid, status, lsn, slotno);
+
+ /* Set the subtransactions */
+ for (i = 0; i < nsubxids; i++)
+ {
+ Assert(ClogCtl->shared->page_number[slotno] == TransactionIdToPage(subxids[i]));
+ TransactionIdSetStatusBit(subxids[i], status, lsn, slotno);
+ }
+
+ ClogCtl->shared->page_dirty[slotno] = true;
+
+ LWLockRelease(CLogControlLock);
+ }
+
+ /*
+ * Sets the commit status of a single transaction.
+ *
+ * Must be called with CLogControlLock held
+ */
+ static void
+ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+ {
+ int byteno = TransactionIdToByte(xid);
+ int bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
+ char *byteptr;
+ char byteval;
+
byteptr = ClogCtl->shared->page_buffer[slotno] + byteno;
/* Current state should be 0, subcommitted or target state */
*************** TransactionIdSetStatus(TransactionId xid
*** 132,139 ****
byteval |= (status << bshift);
*byteptr = byteval;
- ClogCtl->shared->page_dirty[slotno] = true;
-
/*
* Update the group LSN if the transaction completion LSN is higher.
*
--- 325,330 ----
*************** TransactionIdSetStatus(TransactionId xid
*** 150,156 ****
ClogCtl->shared->group_lsn[lsnindex] = lsn;
}
- LWLockRelease(CLogControlLock);
}
/*
--- 341,346 ----
Index: src/backend/access/transam/transam.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/transam/transam.c,v
retrieving revision 1.76
diff -c -p -r1.76 transam.c
*** src/backend/access/transam/transam.c 26 Mar 2008 18:48:59 -0000 1.76
--- src/backend/access/transam/transam.c 5 Oct 2008 18:46:01 -0000
*************** static const XLogRecPtr InvalidXLogRecPt
*** 40,54 ****
/* Local functions */
static XidStatus TransactionLogFetch(TransactionId transactionId);
- static void TransactionLogUpdate(TransactionId transactionId,
- XidStatus status, XLogRecPtr lsn);
/* ----------------------------------------------------------------
* Postgres log access method interface
*
* TransactionLogFetch
- * TransactionLogUpdate
* ----------------------------------------------------------------
*/
--- 40,51 ----
*************** TransactionLogFetch(TransactionId transa
*** 100,140 ****
return xidstatus;
}
- /*
- * TransactionLogUpdate
- *
- * Store the new status of a transaction. The commit record LSN must be
- * passed when recording an async commit; else it should be InvalidXLogRecPtr.
- */
- static inline void
- TransactionLogUpdate(TransactionId transactionId,
- XidStatus status, XLogRecPtr lsn)
- {
- /*
- * update the commit log
- */
- TransactionIdSetStatus(transactionId, status, lsn);
- }
-
- /*
- * TransactionLogMultiUpdate
- *
- * Update multiple transaction identifiers to a given status.
- * Don't depend on this being atomic; it's not.
- */
- static inline void
- TransactionLogMultiUpdate(int nxids, TransactionId *xids,
- XidStatus status, XLogRecPtr lsn)
- {
- int i;
-
- Assert(nxids != 0);
-
- for (i = 0; i < nxids; i++)
- TransactionIdSetStatus(xids[i], status, lsn);
- }
-
-
/* ----------------------------------------------------------------
* Interface functions
*
--- 97,102 ----
*************** TransactionLogMultiUpdate(int nxids, Tra
*** 144,154 ****
* these functions test the transaction status of
* a specified transaction id.
*
! * TransactionIdCommit
! * TransactionIdAbort
* ========
! * these functions set the transaction status
! * of the specified xid.
*
* See also TransactionIdIsInProgress, which once was in this module
* but now lives in procarray.c.
--- 106,117 ----
* these functions test the transaction status of
* a specified transaction id.
*
! * TransactionIdCommitTree
! * TransactionIdAsyncCommitTree
! * TransactionIdAbortTree
* ========
! * these functions set the transaction status of the specified
! * transaction tree.
*
* See also TransactionIdIsInProgress, which once was in this module
* but now lives in procarray.c.
*************** TransactionIdIsKnownCompleted(Transactio
*** 287,362 ****
return false;
}
-
- /*
- * TransactionIdCommit
- * Commits the transaction associated with the identifier.
- *
- * Note:
- * Assumes transaction identifier is valid.
- */
- void
- TransactionIdCommit(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_COMMITTED,
- InvalidXLogRecPtr);
- }
-
- /*
- * TransactionIdAsyncCommit
- * Same as above, but for async commits. The commit record LSN is needed.
- */
- void
- TransactionIdAsyncCommit(TransactionId transactionId, XLogRecPtr lsn)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_COMMITTED, lsn);
- }
-
- /*
- * TransactionIdAbort
- * Aborts the transaction associated with the identifier.
- *
- * Note:
- * Assumes transaction identifier is valid.
- * No async version of this is needed.
- */
- void
- TransactionIdAbort(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_ABORTED,
- InvalidXLogRecPtr);
- }
-
- /*
- * TransactionIdSubCommit
- * Marks the subtransaction associated with the identifier as
- * sub-committed.
- *
- * Note:
- * No async version of this is needed.
- */
- void
- TransactionIdSubCommit(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_SUB_COMMITTED,
- InvalidXLogRecPtr);
- }
-
/*
* TransactionIdCommitTree
! * Marks all the given transaction ids as committed.
*
! * The caller has to be sure that this is used only to mark subcommitted
! * subtransactions as committed, and only *after* marking the toplevel
! * parent as committed. Otherwise there is a race condition against
! * TransactionIdDidCommit.
*/
void
! TransactionIdCommitTree(int nxids, TransactionId *xids)
{
! if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED,
! InvalidXLogRecPtr);
}
/*
--- 250,271 ----
return false;
}
/*
* TransactionIdCommitTree
! * Marks the given transaction and children as committed
*
! * "xid" is a toplevel transaction commit, and the xids array contains its
! * committed subtransactions.
! *
! * This commit operation is not guaranteed to be atomic, but if not, subxids
! * are correctly marked subcommit first.
*/
void
! TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids)
{
! TransactionIdSetTreeStatus(xid, nxids, xids,
! TRANSACTION_STATUS_COMMITTED,
! InvalidXLogRecPtr);
}
/*
*************** TransactionIdCommitTree(int nxids, Trans
*** 364,392 ****
* Same as above, but for async commits. The commit record LSN is needed.
*/
void
! TransactionIdAsyncCommitTree(int nxids, TransactionId *xids, XLogRecPtr lsn)
{
! if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED,
! lsn);
}
/*
* TransactionIdAbortTree
! * Marks all the given transaction ids as aborted.
*
* We don't need to worry about the non-atomic behavior, since any onlookers
* will consider all the xacts as not-yet-committed anyway.
*/
void
! TransactionIdAbortTree(int nxids, TransactionId *xids)
{
! if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_ABORTED,
! InvalidXLogRecPtr);
}
-
/*
* TransactionIdPrecedes --- is id1 logically < id2?
*/
--- 273,302 ----
* Same as above, but for async commits. The commit record LSN is needed.
*/
void
! TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids,
! XLogRecPtr lsn)
{
! TransactionIdSetTreeStatus(xid, nxids, xids,
! TRANSACTION_STATUS_COMMITTED, lsn);
}
/*
* TransactionIdAbortTree
! * Marks the given transaction and children as aborted.
! *
! * "xid" is the transaction being aborted (toplevel or subtransaction), and
! * the xids array contains its subtransactions.
*
* We don't need to worry about the non-atomic behavior, since any onlookers
* will consider all the xacts as not-yet-committed anyway.
*/
void
! TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
{
! TransactionIdSetTreeStatus(xid, nxids, xids,
! TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
/*
* TransactionIdPrecedes --- is id1 logically < id2?
*/
Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.45
diff -c -p -r1.45 twophase.c
*** src/backend/access/transam/twophase.c 11 Aug 2008 11:05:10 -0000 1.45
--- src/backend/access/transam/twophase.c 5 Oct 2008 18:33:40 -0000
*************** RecordTransactionCommitPrepared(Transact
*** 1745,1753 ****
XLogFlush(recptr);
/* Mark the transaction committed in pg_clog */
! TransactionIdCommit(xid);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdCommitTree(nchildren, children);
/* Checkpoint can proceed now */
MyProc->inCommit = false;
--- 1745,1751 ----
XLogFlush(recptr);
/* Mark the transaction committed in pg_clog */
! TransactionIdCommitTree(xid, nchildren, children);
/* Checkpoint can proceed now */
MyProc->inCommit = false;
*************** RecordTransactionAbortPrepared(Transacti
*** 1822,1829 ****
* Mark the transaction aborted in clog. This is not absolutely necessary
* but we may as well do it while we are here.
*/
! TransactionIdAbort(xid);
! TransactionIdAbortTree(nchildren, children);
END_CRIT_SECTION();
}
--- 1820,1826 ----
* Mark the transaction aborted in clog. This is not absolutely necessary
* but we may as well do it while we are here.
*/
! TransactionIdAbortTree(xid, nchildren, children);
END_CRIT_SECTION();
}
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.265
diff -c -p -r1.265 xact.c
*** src/backend/access/transam/xact.c 11 Aug 2008 11:05:10 -0000 1.265
--- src/backend/access/transam/xact.c 5 Oct 2008 18:49:21 -0000
*************** static void CommitTransaction(void);
*** 254,260 ****
static TransactionId RecordTransactionAbort(bool isSubXact);
static void StartTransaction(void);
- static void RecordSubTransactionCommit(void);
static void StartSubTransaction(void);
static void CommitSubTransaction(void);
static void AbortSubTransaction(void);
--- 254,259 ----
*************** RecordTransactionCommit(void)
*** 952,962 ****
* Now we may update the CLOG, if we wrote a COMMIT record above
*/
if (markXidCommitted)
! {
! TransactionIdCommit(xid);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdCommitTree(nchildren, children);
! }
}
else
{
--- 951,957 ----
* Now we may update the CLOG, if we wrote a COMMIT record above
*/
if (markXidCommitted)
! TransactionIdCommitTree(xid, nchildren, children);
}
else
{
*************** RecordTransactionCommit(void)
*** 974,984 ****
* flushed before the CLOG may be updated.
*/
if (markXidCommitted)
! {
! TransactionIdAsyncCommit(xid, XactLastRecEnd);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdAsyncCommitTree(nchildren, children, XactLastRecEnd);
! }
}
/*
--- 969,975 ----
* flushed before the CLOG may be updated.
*/
if (markXidCommitted)
! TransactionIdAsyncCommitTree(xid, nchildren, children, XactLastRecEnd);
}
/*
*************** AtSubCommit_childXids(void)
*** 1156,1191 ****
s->maxChildXids = 0;
}
- /*
- * RecordSubTransactionCommit
- */
- static void
- RecordSubTransactionCommit(void)
- {
- TransactionId xid = GetCurrentTransactionIdIfAny();
-
- /*
- * We do not log the subcommit in XLOG; it doesn't matter until the
- * top-level transaction commits.
- *
- * We must mark the subtransaction subcommitted in the CLOG if it had a
- * valid XID assigned. If it did not, nobody else will ever know about
- * the existence of this subxact. We don't have to deal with deletions
- * scheduled for on-commit here, since they'll be reassigned to our parent
- * (who might still abort).
- */
- if (TransactionIdIsValid(xid))
- {
- /* XXX does this really need to be a critical section? */
- START_CRIT_SECTION();
-
- /* Record subtransaction subcommit */
- TransactionIdSubCommit(xid);
-
- END_CRIT_SECTION();
- }
- }
-
/* ----------------------------------------------------------------
* AbortTransaction stuff
* ----------------------------------------------------------------
--- 1147,1152 ----
*************** RecordTransactionAbort(bool isSubXact)
*** 1288,1301 ****
* waiting for already-aborted subtransactions. It is OK to do it without
* having flushed the ABORT record to disk, because in event of a crash
* we'd be assumed to have aborted anyway.
- *
- * The ordering here isn't critical but it seems best to mark the parent
- * first. This assures an atomic transition of all the subtransactions to
- * aborted state from the point of view of concurrent
- * TransactionIdDidAbort calls.
*/
! TransactionIdAbort(xid);
! TransactionIdAbortTree(nchildren, children);
END_CRIT_SECTION();
--- 1249,1256 ----
* waiting for already-aborted subtransactions. It is OK to do it without
* having flushed the ABORT record to disk, because in event of a crash
* we'd be assumed to have aborted anyway.
*/
! TransactionIdAbortTree(xid, nchildren, children);
END_CRIT_SECTION();
*************** CommitSubTransaction(void)
*** 3791,3798 ****
/* Must CCI to ensure commands of subtransaction are seen as done */
CommandCounterIncrement();
! /* Mark subtransaction as subcommitted */
! RecordSubTransactionCommit();
/* Post-commit cleanup */
if (TransactionIdIsValid(s->transactionId))
--- 3746,3757 ----
/* Must CCI to ensure commands of subtransaction are seen as done */
CommandCounterIncrement();
! /*
! * Prior to 8.4 we marked subcommit in clog at this point.
! * We now only perform that step, if required, as part of the
! * atomic update of the whole transaction tree at top level
! * commit or abort.
! */
/* Post-commit cleanup */
if (TransactionIdIsValid(s->transactionId))
*************** xact_redo_commit(xl_xact_commit *xlrec,
*** 4259,4269 ****
TransactionId max_xid;
int i;
! TransactionIdCommit(xid);
!
! /* Mark committed subtransactions as committed */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdCommitTree(xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
--- 4218,4226 ----
TransactionId max_xid;
int i;
! /* Mark the transaction committed in pg_clog */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdCommitTree(xid, xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
*************** xact_redo_abort(xl_xact_abort *xlrec, Tr
*** 4299,4309 ****
TransactionId max_xid;
int i;
! TransactionIdAbort(xid);
!
! /* Mark subtransactions as aborted */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdAbortTree(xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
--- 4256,4264 ----
TransactionId max_xid;
int i;
! /* Mark the transaction aborted in pg_clog */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdAbortTree(xid, xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
Index: src/include/access/clog.h
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/include/access/clog.h,v
retrieving revision 1.21
diff -c -p -r1.21 clog.h
*** src/include/access/clog.h 1 Jan 2008 19:45:56 -0000 1.21
--- src/include/access/clog.h 5 Oct 2008 18:33:40 -0000
*************** typedef int XidStatus;
*** 32,38 ****
#define NUM_CLOG_BUFFERS 8
! extern void TransactionIdSetStatus(TransactionId xid, XidStatus status, XLogRecPtr lsn);
extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
extern Size CLOGShmemSize(void);
--- 32,39 ----
#define NUM_CLOG_BUFFERS 8
! extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
extern Size CLOGShmemSize(void);
Index: src/include/access/transam.h
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/include/access/transam.h,v
retrieving revision 1.65
diff -c -p -r1.65 transam.h
*** src/include/access/transam.h 11 Mar 2008 20:20:35 -0000 1.65
--- src/include/access/transam.h 5 Oct 2008 18:33:40 -0000
*************** extern VariableCache ShmemVariableCache;
*** 139,151 ****
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
- extern void TransactionIdCommit(TransactionId transactionId);
- extern void TransactionIdAsyncCommit(TransactionId transactionId, XLogRecPtr lsn);
extern void TransactionIdAbort(TransactionId transactionId);
! extern void TransactionIdSubCommit(TransactionId transactionId);
! extern void TransactionIdCommitTree(int nxids, TransactionId *xids);
! extern void TransactionIdAsyncCommitTree(int nxids, TransactionId *xids, XLogRecPtr lsn);
! extern void TransactionIdAbortTree(int nxids, TransactionId *xids);
extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
--- 139,148 ----
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
extern void TransactionIdAbort(TransactionId transactionId);
! extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
! extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
! extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
On Sun, 2008-10-05 at 14:51 -0400, Alvaro Herrera wrote:
Simon Riggs wrote:
OK, spent long time testing various batching scenarios for this using a
custom test harness to simulate various spreads of xids in transaction
trees. All looks fine now.
I had a look and was mostly rephrasing some comments and the README
(hopefully I didn't make any of them worse than they were), when I
noticed that the code to iterate thru pages could be refactored. I
think the change makes the algorithm in TransactionIdSetTreeStatus
easier to follow.
OK, thanks for the review.
I also noticed that TransactionIdSetPageStatus has a "subcommit" arg
which is unexplained. I sort-of understand the point, but I think it's
better that you fill in the explanation in the header comment (marked
with XXX)
I'll explain some more in the code, and in the README with worked
examples of what we need to do and why.
I hope I didn't break the code with the new function
set_tree_status_by_pages -- please recheck that part.
Eyeballs OK.
I didn't test this beyond regression tests.
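For readers following the page-by-page iteration discussed above, here is a minimal standalone sketch of the chunking idea, not the code from the patch. It assumes the xid array is sorted, and that a clog page holds 32768 xids (the default 8 kB block with 2 status bits per xid). The names Xid, PageRun, xid_to_page and split_into_page_runs are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Xid;           /* hypothetical stand-in for TransactionId */

/* Assumed: default 8 kB block, 2 status bits per xid => 32768 xids per page */
#define TOY_XIDS_PER_PAGE 32768

typedef struct PageRun
{
    int pageno;                 /* clog page all xids of this run live on */
    int start;                  /* index of the run's first xid in the input */
    int count;                  /* number of consecutive xids on that page */
} PageRun;

static int
xid_to_page(Xid xid)
{
    return (int) (xid / TOY_XIDS_PER_PAGE);
}

/*
 * Split xids[0..nxids-1] into runs of consecutive entries that share a
 * clog page.  Returns the number of runs written to runs[].
 */
static int
split_into_page_runs(const Xid *xids, int nxids, PageRun *runs, int max_runs)
{
    int i = 0;
    int nruns = 0;

    while (i < nxids)
    {
        int pageno = xid_to_page(xids[i]);
        int start = i;

        /* Test the bound first so we never read past the end of the array */
        while (i < nxids && xid_to_page(xids[i]) == pageno)
            i++;

        assert(nruns < max_runs);
        runs[nruns].pageno = pageno;
        runs[nruns].start = start;
        runs[nruns].count = i - start;
        nruns++;
    }
    return nruns;
}
```

Each run would then correspond to one call that takes the clog lock once and updates every xid of that run on a single page.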
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Sun, 2008-10-05 at 14:51 -0400, Alvaro Herrera wrote:
Simon Riggs wrote:
OK, spent long time testing various batching scenarios for this using a
custom test harness to simulate various spreads of xids in transaction
trees. All looks fine now.
I had a look and was mostly rephrasing some comments and the README
(hopefully I didn't make any of them worse than they were), when I
noticed that the code to iterate thru pages could be refactored. I
think the change makes the algorithm in TransactionIdSetTreeStatus
easier to follow.
Yes, all fits on one screen when reading it now.
I also noticed that TransactionIdSetPageStatus has a "subcommit" arg
which is unexplained. I sort-of understand the point, but I think it's
better that you fill in the explanation in the header comment (marked
with XXX)
I've changed the logic slightly to remove the need for the subcommit
argument. So no need to explain now.
Added an Assert to check for what should be an impossible call.
Example provided in comments for a complex update.
I hope I didn't break the code with the new function
set_tree_status_by_pages -- please recheck that part.
Renamed to set_status_by_pages because we never use this on the whole
tree. Added comments to say that.
Overall, cleaner and more readable now. Thanks.
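To make the ordering of the complex update easier to picture, here is a toy model of the write sequence, offered as a sketch under stated assumptions rather than the patch's implementation. It uses an in-memory array instead of clog pages, a made-up page size of 4 xids, no locking, and assumes the subxids are sorted ascending. All toy_* names are hypothetical; only the status values match the clog codes.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Xid;                 /* hypothetical stand-in for TransactionId */

/* Same numeric values as the clog status codes */
enum
{
    ST_IN_PROGRESS = 0,
    ST_COMMITTED = 1,
    ST_ABORTED = 2,
    ST_SUB_COMMITTED = 3
};

#define TOY_XIDS_PER_PAGE 4           /* toy value so a small tree spans pages */
#define TOY_MAX_XID       64
#define TOY_MAX_TRACE     64

typedef struct ToyWrite { Xid xid; int status; } ToyWrite;

static unsigned char toy_clog[TOY_MAX_XID];   /* one status per xid */
static ToyWrite toy_trace[TOY_MAX_TRACE];     /* every write, in order */
static int toy_ntrace = 0;

static int toy_page(Xid xid) { return (int) (xid / TOY_XIDS_PER_PAGE); }

static void
toy_set(Xid xid, int status)
{
    assert(xid < TOY_MAX_XID && toy_ntrace < TOY_MAX_TRACE);
    toy_clog[xid] = (unsigned char) status;
    toy_trace[toy_ntrace].xid = xid;
    toy_trace[toy_ntrace].status = status;
    toy_ntrace++;
}

/*
 * Commit xid and its subxids.  Assumes subxids[] is sorted ascending, so
 * any subxids sharing the parent's page come first.
 */
static void
toy_commit_tree(Xid xid, int nsub, const Xid *subxids)
{
    int top_page = toy_page(xid);
    int on_first = 0;
    int i;

    while (on_first < nsub && toy_page(subxids[on_first]) == top_page)
        on_first++;

    /* Step 1: sub-commit everything that lives on other pages */
    for (i = on_first; i < nsub; i++)
        toy_set(subxids[i], ST_SUB_COMMITTED);

    /* Step 2: parent's page: sub-commit, commit parent, commit subxids */
    for (i = 0; i < on_first; i++)
        toy_set(subxids[i], ST_SUB_COMMITTED);
    toy_set(xid, ST_COMMITTED);
    for (i = 0; i < on_first; i++)
        toy_set(subxids[i], ST_COMMITTED);

    /* Step 3: revisit the other pages and mark them committed */
    for (i = on_first; i < nsub; i++)
        toy_set(subxids[i], ST_COMMITTED);
}
```

With t = 1, t1 = 2, t2 = 4, t3 = 5 and t4 = 8 this reproduces the t, t1..t4 layout across three pages. The recorded trace shows that no subxid is written as committed before the parent is, which is the property the intermediate sub-committed state is there to protect.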
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Attachments:
atomic_subxids.v7.patch (text/x-patch; charset=utf-8)
Index: src/backend/access/transam/README
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/README,v
retrieving revision 1.11
diff -c -r1.11 README
*** src/backend/access/transam/README 21 Mar 2008 13:23:28 -0000 1.11
--- src/backend/access/transam/README 7 Oct 2008 12:42:18 -0000
***************
*** 341,351 ****
pg_clog records the commit status for each transaction that has been assigned
an XID. A transaction can be in progress, committed, aborted, or
"sub-committed". This last state means that it's a subtransaction that's no
! longer running, but its parent has not updated its state yet (either it is
! still running, or the backend crashed without updating its status). A
! sub-committed transaction's status will be updated again to the final value as
! soon as the parent commits or aborts, or when the parent is detected to be
! aborted.
Savepoints are implemented using subtransactions. A subtransaction is a
transaction inside a transaction; its commit or abort status is not only
--- 341,360 ----
pg_clog records the commit status for each transaction that has been assigned
an XID. A transaction can be in progress, committed, aborted, or
"sub-committed". This last state means that it's a subtransaction that's no
! longer running, but its parent has not updated its state yet. It is not
! necessary to update a subtransaction's transaction status to subcommit, so we
! can just defer it until main transaction commit. The main role of marking
! transactions as sub-committed is to provide an atomic commit protocol when
! transaction status is spread across multiple clog pages. As a result, whenever
! transaction status spreads across multiple pages we must use a two-phase commit
! protocol: the first phase is to mark the subtransactions as sub-committed, then
! we mark the top level transaction and all its subtransactions committed (in
! that order). Thus, subtransactions that have not aborted appear as in-progress
! even when they have already finished, and the subcommit status appears as a
! very short transitory state during main transaction commit. Subtransaction
! abort is always marked in clog as soon as it occurs. When the transaction
! statuses all fit in a single CLOG page, we atomically mark them all as committed
! without bothering with the intermediate sub-commit state.
Savepoints are implemented using subtransactions. A subtransaction is a
transaction inside a transaction; its commit or abort status is not only
Index: src/backend/access/transam/clog.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/clog.c,v
retrieving revision 1.47
diff -c -r1.47 clog.c
*** src/backend/access/transam/clog.c 1 Aug 2008 13:16:08 -0000 1.47
--- src/backend/access/transam/clog.c 7 Oct 2008 12:44:14 -0000
***************
*** 80,111 ****
static bool CLOGPagePrecedes(int page1, int page2);
static void WriteZeroPageXlogRec(int pageno);
static void WriteTruncateXlogRec(int pageno);
/*
! * Record the final state of a transaction in the commit log.
*
* lsn must be the WAL location of the commit record when recording an async
* commit. For a synchronous commit it can be InvalidXLogRecPtr, since the
* caller guarantees the commit record is already flushed in that case. It
* should be InvalidXLogRecPtr for abort cases, too.
*
* NB: this is a low-level routine and is NOT the preferred entry point
! * for most uses; TransactionLogUpdate() in transam.c is the intended caller.
*/
void
! TransactionIdSetStatus(TransactionId xid, XidStatus status, XLogRecPtr lsn)
{
- int pageno = TransactionIdToPage(xid);
- int byteno = TransactionIdToByte(xid);
- int bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
int slotno;
! char *byteptr;
! char byteval;
Assert(status == TRANSACTION_STATUS_COMMITTED ||
status == TRANSACTION_STATUS_ABORTED ||
! status == TRANSACTION_STATUS_SUB_COMMITTED);
LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
--- 80,261 ----
static bool CLOGPagePrecedes(int page1, int page2);
static void WriteZeroPageXlogRec(int pageno);
static void WriteTruncateXlogRec(int pageno);
+ static void TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
+ TransactionId *subxids, XidStatus status,
+ XLogRecPtr lsn, int pageno);
+ static void TransactionIdSetStatusBit(TransactionId xid, XidStatus status,
+ XLogRecPtr lsn, int slotno);
+ static void set_status_by_pages(int nsubxids, TransactionId *subxids,
+ XidStatus status, XLogRecPtr lsn);
/*
! * TransactionIdSetTreeStatus
! *
! * Record the final state of transaction entries in the commit log for
! * a transaction and its subtransaction tree. Take care to ensure this is
! * efficient, and as atomic as possible.
! *
! * xid is a single xid to set status for. This will typically be
! * the top level transactionid for a top level commit or abort. It can
! * also be a subtransaction when we record transaction aborts.
! *
! * subxids is an array of xids of length nsubxids, representing subtransactions
! * in the tree of xid. In various cases nsubxids may be zero.
*
* lsn must be the WAL location of the commit record when recording an async
* commit. For a synchronous commit it can be InvalidXLogRecPtr, since the
* caller guarantees the commit record is already flushed in that case. It
* should be InvalidXLogRecPtr for abort cases, too.
*
+ * In the commit case, atomicity is limited by whether all the subxids are in
+ * the same CLOG page as xid. If they all are, then the lock will be grabbed
+ * only once, and the status will be set to committed directly. Otherwise
+ * we must
+ * 1. set sub-committed all subxids that are not on the same page as the
+ * main xid
+ * 2. atomically set committed the main xid and the subxids on the same page
+ * 3. go over the first bunch again and set them committed
+ * Note that as far as concurrent checkers are concerned, main transaction
+ * commit as a whole is still atomic.
+ *
+ * Example:
+ * TransactionId t commits and has subxids t1, t2, t3, t4
+ * t is on page p1, t1 is also on p1, t2 and t3 are on p2, t4 is on p3
+ * 1. update pages2-3:
+ * page2: set t2,t3 as sub-committed
+ * page3: set t4 as sub-committed
+ * 2. update page1:
+ * set t1 as sub-committed,
+ * then set t as committed,
+ * then set t1 as committed
+ * 3. update pages2-3:
+ * page2: set t2,t3 as committed
+ * page3: set t4 as committed
+ *
* NB: this is a low-level routine and is NOT the preferred entry point
! * for most uses; functions in transam.c are the intended callers.
! *
! * XXX Think about issuing FADVISE_WILLNEED on pages that we will need,
! * but aren't yet in cache, as well as hinting pages not to fall out of
! * cache yet.
*/
void
! TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn)
! {
! int pageno = TransactionIdToPage(xid); /* get page of parent */
! int i;
!
! Assert(status == TRANSACTION_STATUS_COMMITTED ||
! status == TRANSACTION_STATUS_ABORTED);
!
! /*
! * See how many subxids, if any, are on the same page as the parent.
! */
! for (i = 0; i < nsubxids; i++)
! {
! if (TransactionIdToPage(subxids[i]) != pageno)
! break;
! }
!
! /*
! * Do all items fit on a single page?
! */
! if (i == nsubxids)
! {
! /*
! * Set the parent and all subtransactions in a single call
! */
! TransactionIdSetPageStatus(xid, nsubxids, subxids, status, lsn,
! pageno);
! }
! else
! {
! int nsubxids_on_first_page = i;
!
! /*
! * If this is a commit then we care about doing this correctly (i.e.
! * using the subcommitted intermediate status). By here, we know we're
! * updating more than one page of clog, so we must mark entries that
! * are *not* on the first page so that they show as subcommitted before
! * we then return to update the status to fully committed.
! *
! * To avoid touching the first page twice, skip marking subcommitted
! * for the subxids on that first page.
! */
! if (status == TRANSACTION_STATUS_COMMITTED)
! set_status_by_pages(nsubxids - nsubxids_on_first_page,
! subxids + nsubxids_on_first_page,
! TRANSACTION_STATUS_SUB_COMMITTED, lsn);
!
! /*
! * Now set the parent and subtransactions on same page as the parent,
! * if any
! */
! pageno = TransactionIdToPage(xid);
! TransactionIdSetPageStatus(xid, nsubxids_on_first_page, subxids, status,
! lsn, pageno);
!
! /*
! * Now work through the rest of the subxids one clog page at a time,
! * starting from the second page onwards, like we did above.
! */
! set_status_by_pages(nsubxids - nsubxids_on_first_page,
! subxids + nsubxids_on_first_page,
! status, lsn);
! }
! }
!
! /*
! * Helper for TransactionIdSetTreeStatus: set the status for a bunch of
! * transactions, chunking in the separate CLOG pages involved. We never
! * pass the whole transaction tree to this function, only subtransactions
! * that are on different pages to the top level transaction id.
! */
! static void
! set_status_by_pages(int nsubxids, TransactionId *subxids,
! XidStatus status, XLogRecPtr lsn)
! {
! int pageno = TransactionIdToPage(subxids[0]);
! int offset = 0;
! int i = 0;
!
! while (i < nsubxids)
! {
! int num_on_page = 0;
!
! while (i < nsubxids && TransactionIdToPage(subxids[i]) == pageno)
! {
! num_on_page++;
! i++;
! }
!
! TransactionIdSetPageStatus(InvalidTransactionId,
! num_on_page, subxids + offset,
! status, lsn, pageno);
! offset = i;
! pageno = TransactionIdToPage(subxids[offset]);
! }
! }
!
! /*
! * Record the final state of transaction entries in the commit log for
! * all entries on a single page. Atomic only on this page.
! *
! * Otherwise API is same as TransactionIdSetTreeStatus()
! */
! static void
! TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status,
! XLogRecPtr lsn, int pageno)
{
int slotno;
! int i;
Assert(status == TRANSACTION_STATUS_COMMITTED ||
status == TRANSACTION_STATUS_ABORTED ||
! (status == TRANSACTION_STATUS_SUB_COMMITTED && !TransactionIdIsValid(xid)));
LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
***************
*** 116,124 ****
* mustn't let it reach disk until we've done the appropriate WAL flush.
* But when lsn is invalid, it's OK to scribble on a page while it is
* write-busy, since we don't care if the update reaches disk sooner than
! * we think. Hence, pass write_ok = XLogRecPtrIsInvalid(lsn).
*/
slotno = SimpleLruReadPage(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
byteptr = ClogCtl->shared->page_buffer[slotno] + byteno;
/* Current state should be 0, subcommitted or target state */
--- 266,324 ----
* mustn't let it reach disk until we've done the appropriate WAL flush.
* But when lsn is invalid, it's OK to scribble on a page while it is
* write-busy, since we don't care if the update reaches disk sooner than
! * we think.
*/
slotno = SimpleLruReadPage(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
+
+ if (TransactionIdIsValid(xid))
+ {
+ /*
+ * If we update more than one xid on this page while it is being
+ * written out, we might find that some of the bits go to disk and others
+ * don't. If we are updating commits on the page with the top-level xid
+ * that could break atomicity, so we subcommit the subxids first before
+ * we mark the top-level commit.
+ */
+ if (status == TRANSACTION_STATUS_COMMITTED)
+ {
+ for (i = 0; i < nsubxids; i++)
+ {
+ Assert(ClogCtl->shared->page_number[slotno] == TransactionIdToPage(subxids[i]));
+ TransactionIdSetStatusBit(subxids[i],
+ TRANSACTION_STATUS_SUB_COMMITTED,
+ lsn, slotno);
+ }
+ }
+
+ /* Set the main transaction id, if any */
+ TransactionIdSetStatusBit(xid, status, lsn, slotno);
+ }
+
+ /* Set the subtransactions */
+ for (i = 0; i < nsubxids; i++)
+ {
+ Assert(ClogCtl->shared->page_number[slotno] == TransactionIdToPage(subxids[i]));
+ TransactionIdSetStatusBit(subxids[i], status, lsn, slotno);
+ }
+
+ ClogCtl->shared->page_dirty[slotno] = true;
+
+ LWLockRelease(CLogControlLock);
+ }
+
+ /*
+ * Sets the commit status of a single transaction.
+ *
+ * Must be called with CLogControlLock held
+ */
+ static void
+ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
+ {
+ int byteno = TransactionIdToByte(xid);
+ int bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
+ char *byteptr;
+ char byteval;
+
byteptr = ClogCtl->shared->page_buffer[slotno] + byteno;
/* Current state should be 0, subcommitted or target state */
***************
*** 132,139 ****
byteval |= (status << bshift);
*byteptr = byteval;
- ClogCtl->shared->page_dirty[slotno] = true;
-
/*
* Update the group LSN if the transaction completion LSN is higher.
*
--- 332,337 ----
***************
*** 150,156 ****
ClogCtl->shared->group_lsn[lsnindex] = lsn;
}
- LWLockRelease(CLogControlLock);
}
/*
--- 348,353 ----
Index: src/backend/access/transam/transam.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/transam.c,v
retrieving revision 1.76
diff -c -r1.76 transam.c
*** src/backend/access/transam/transam.c 26 Mar 2008 18:48:59 -0000 1.76
--- src/backend/access/transam/transam.c 7 Oct 2008 11:38:52 -0000
***************
*** 40,54 ****
/* Local functions */
static XidStatus TransactionLogFetch(TransactionId transactionId);
- static void TransactionLogUpdate(TransactionId transactionId,
- XidStatus status, XLogRecPtr lsn);
/* ----------------------------------------------------------------
* Postgres log access method interface
*
* TransactionLogFetch
- * TransactionLogUpdate
* ----------------------------------------------------------------
*/
--- 40,51 ----
***************
*** 100,140 ****
return xidstatus;
}
- /*
- * TransactionLogUpdate
- *
- * Store the new status of a transaction. The commit record LSN must be
- * passed when recording an async commit; else it should be InvalidXLogRecPtr.
- */
- static inline void
- TransactionLogUpdate(TransactionId transactionId,
- XidStatus status, XLogRecPtr lsn)
- {
- /*
- * update the commit log
- */
- TransactionIdSetStatus(transactionId, status, lsn);
- }
-
- /*
- * TransactionLogMultiUpdate
- *
- * Update multiple transaction identifiers to a given status.
- * Don't depend on this being atomic; it's not.
- */
- static inline void
- TransactionLogMultiUpdate(int nxids, TransactionId *xids,
- XidStatus status, XLogRecPtr lsn)
- {
- int i;
-
- Assert(nxids != 0);
-
- for (i = 0; i < nxids; i++)
- TransactionIdSetStatus(xids[i], status, lsn);
- }
-
-
/* ----------------------------------------------------------------
* Interface functions
*
--- 97,102 ----
***************
*** 144,154 ****
* these functions test the transaction status of
* a specified transaction id.
*
! * TransactionIdCommit
! * TransactionIdAbort
* ========
! * these functions set the transaction status
! * of the specified xid.
*
* See also TransactionIdIsInProgress, which once was in this module
* but now lives in procarray.c.
--- 106,117 ----
* these functions test the transaction status of
* a specified transaction id.
*
! * TransactionIdCommitTree
! * TransactionIdAsyncCommitTree
! * TransactionIdAbortTree
* ========
! * these functions set the transaction status of the specified
! * transaction tree.
*
* See also TransactionIdIsInProgress, which once was in this module
* but now lives in procarray.c.
***************
*** 287,362 ****
return false;
}
-
- /*
- * TransactionIdCommit
- * Commits the transaction associated with the identifier.
- *
- * Note:
- * Assumes transaction identifier is valid.
- */
- void
- TransactionIdCommit(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_COMMITTED,
- InvalidXLogRecPtr);
- }
-
- /*
- * TransactionIdAsyncCommit
- * Same as above, but for async commits. The commit record LSN is needed.
- */
- void
- TransactionIdAsyncCommit(TransactionId transactionId, XLogRecPtr lsn)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_COMMITTED, lsn);
- }
-
- /*
- * TransactionIdAbort
- * Aborts the transaction associated with the identifier.
- *
- * Note:
- * Assumes transaction identifier is valid.
- * No async version of this is needed.
- */
- void
- TransactionIdAbort(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_ABORTED,
- InvalidXLogRecPtr);
- }
-
- /*
- * TransactionIdSubCommit
- * Marks the subtransaction associated with the identifier as
- * sub-committed.
- *
- * Note:
- * No async version of this is needed.
- */
- void
- TransactionIdSubCommit(TransactionId transactionId)
- {
- TransactionLogUpdate(transactionId, TRANSACTION_STATUS_SUB_COMMITTED,
- InvalidXLogRecPtr);
- }
-
/*
* TransactionIdCommitTree
! * Marks all the given transaction ids as committed.
*
! * The caller has to be sure that this is used only to mark subcommitted
! * subtransactions as committed, and only *after* marking the toplevel
! * parent as committed. Otherwise there is a race condition against
! * TransactionIdDidCommit.
*/
void
! TransactionIdCommitTree(int nxids, TransactionId *xids)
{
! if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED,
! InvalidXLogRecPtr);
}
/*
--- 250,271 ----
return false;
}
/*
* TransactionIdCommitTree
! * Marks the given transaction and children as committed
*
! * "xid" is a toplevel transaction commit, and the xids array contains its
! * committed subtransactions.
! *
! * This commit operation is not guaranteed to be atomic, but if not, subxids
! * are correctly marked subcommit first.
*/
void
! TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids)
{
! TransactionIdSetTreeStatus(xid, nxids, xids,
! TRANSACTION_STATUS_COMMITTED,
! InvalidXLogRecPtr);
}
/*
***************
*** 364,392 ****
* Same as above, but for async commits. The commit record LSN is needed.
*/
void
! TransactionIdAsyncCommitTree(int nxids, TransactionId *xids, XLogRecPtr lsn)
{
! if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_COMMITTED,
! lsn);
}
/*
* TransactionIdAbortTree
! * Marks all the given transaction ids as aborted.
*
* We don't need to worry about the non-atomic behavior, since any onlookers
* will consider all the xacts as not-yet-committed anyway.
*/
void
! TransactionIdAbortTree(int nxids, TransactionId *xids)
{
! if (nxids > 0)
! TransactionLogMultiUpdate(nxids, xids, TRANSACTION_STATUS_ABORTED,
! InvalidXLogRecPtr);
}
-
/*
* TransactionIdPrecedes --- is id1 logically < id2?
*/
--- 273,302 ----
* Same as above, but for async commits. The commit record LSN is needed.
*/
void
! TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids,
! XLogRecPtr lsn)
{
! TransactionIdSetTreeStatus(xid, nxids, xids,
! TRANSACTION_STATUS_COMMITTED, lsn);
}
/*
* TransactionIdAbortTree
! * Marks the given transaction and children as aborted.
! *
! * "xid" is the toplevel transaction being aborted, and the xids array
! * contains its committed subtransactions.
*
* We don't need to worry about the non-atomic behavior, since any onlookers
* will consider all the xacts as not-yet-committed anyway.
*/
void
! TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
{
! TransactionIdSetTreeStatus(xid, nxids, xids,
! TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
/*
* TransactionIdPrecedes --- is id1 logically < id2?
*/
Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.45
diff -c -r1.45 twophase.c
*** src/backend/access/transam/twophase.c 11 Aug 2008 11:05:10 -0000 1.45
--- src/backend/access/transam/twophase.c 7 Oct 2008 11:38:52 -0000
***************
*** 1745,1753 ****
XLogFlush(recptr);
/* Mark the transaction committed in pg_clog */
! TransactionIdCommit(xid);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdCommitTree(nchildren, children);
/* Checkpoint can proceed now */
MyProc->inCommit = false;
--- 1745,1751 ----
XLogFlush(recptr);
/* Mark the transaction committed in pg_clog */
! TransactionIdCommitTree(xid, nchildren, children);
/* Checkpoint can proceed now */
MyProc->inCommit = false;
***************
*** 1822,1829 ****
* Mark the transaction aborted in clog. This is not absolutely necessary
* but we may as well do it while we are here.
*/
! TransactionIdAbort(xid);
! TransactionIdAbortTree(nchildren, children);
END_CRIT_SECTION();
}
--- 1820,1826 ----
* Mark the transaction aborted in clog. This is not absolutely necessary
* but we may as well do it while we are here.
*/
! TransactionIdAbortTree(xid, nchildren, children);
END_CRIT_SECTION();
}
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.265
diff -c -r1.265 xact.c
*** src/backend/access/transam/xact.c 11 Aug 2008 11:05:10 -0000 1.265
--- src/backend/access/transam/xact.c 7 Oct 2008 11:38:52 -0000
***************
*** 254,260 ****
static TransactionId RecordTransactionAbort(bool isSubXact);
static void StartTransaction(void);
- static void RecordSubTransactionCommit(void);
static void StartSubTransaction(void);
static void CommitSubTransaction(void);
static void AbortSubTransaction(void);
--- 254,259 ----
***************
*** 952,962 ****
* Now we may update the CLOG, if we wrote a COMMIT record above
*/
if (markXidCommitted)
! {
! TransactionIdCommit(xid);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdCommitTree(nchildren, children);
! }
}
else
{
--- 951,957 ----
* Now we may update the CLOG, if we wrote a COMMIT record above
*/
if (markXidCommitted)
! TransactionIdCommitTree(xid, nchildren, children);
}
else
{
***************
*** 974,984 ****
* flushed before the CLOG may be updated.
*/
if (markXidCommitted)
! {
! TransactionIdAsyncCommit(xid, XactLastRecEnd);
! /* to avoid race conditions, the parent must commit first */
! TransactionIdAsyncCommitTree(nchildren, children, XactLastRecEnd);
! }
}
/*
--- 969,975 ----
* flushed before the CLOG may be updated.
*/
if (markXidCommitted)
! TransactionIdAsyncCommitTree(xid, nchildren, children, XactLastRecEnd);
}
/*
***************
*** 1156,1191 ****
s->maxChildXids = 0;
}
- /*
- * RecordSubTransactionCommit
- */
- static void
- RecordSubTransactionCommit(void)
- {
- TransactionId xid = GetCurrentTransactionIdIfAny();
-
- /*
- * We do not log the subcommit in XLOG; it doesn't matter until the
- * top-level transaction commits.
- *
- * We must mark the subtransaction subcommitted in the CLOG if it had a
- * valid XID assigned. If it did not, nobody else will ever know about
- * the existence of this subxact. We don't have to deal with deletions
- * scheduled for on-commit here, since they'll be reassigned to our parent
- * (who might still abort).
- */
- if (TransactionIdIsValid(xid))
- {
- /* XXX does this really need to be a critical section? */
- START_CRIT_SECTION();
-
- /* Record subtransaction subcommit */
- TransactionIdSubCommit(xid);
-
- END_CRIT_SECTION();
- }
- }
-
/* ----------------------------------------------------------------
* AbortTransaction stuff
* ----------------------------------------------------------------
--- 1147,1152 ----
***************
*** 1288,1301 ****
* waiting for already-aborted subtransactions. It is OK to do it without
* having flushed the ABORT record to disk, because in event of a crash
* we'd be assumed to have aborted anyway.
- *
- * The ordering here isn't critical but it seems best to mark the parent
- * first. This assures an atomic transition of all the subtransactions to
- * aborted state from the point of view of concurrent
- * TransactionIdDidAbort calls.
*/
! TransactionIdAbort(xid);
! TransactionIdAbortTree(nchildren, children);
END_CRIT_SECTION();
--- 1249,1256 ----
* waiting for already-aborted subtransactions. It is OK to do it without
* having flushed the ABORT record to disk, because in event of a crash
* we'd be assumed to have aborted anyway.
*/
! TransactionIdAbortTree(xid, nchildren, children);
END_CRIT_SECTION();
***************
*** 3791,3798 ****
/* Must CCI to ensure commands of subtransaction are seen as done */
CommandCounterIncrement();
! /* Mark subtransaction as subcommitted */
! RecordSubTransactionCommit();
/* Post-commit cleanup */
if (TransactionIdIsValid(s->transactionId))
--- 3746,3757 ----
/* Must CCI to ensure commands of subtransaction are seen as done */
CommandCounterIncrement();
! /*
! * Prior to 8.4 we marked subcommit in clog at this point.
! * We now only perform that step, if required, as part of the
! * atomic update of the whole transaction tree at top level
! * commit or abort.
! */
/* Post-commit cleanup */
if (TransactionIdIsValid(s->transactionId))
***************
*** 4259,4269 ****
TransactionId max_xid;
int i;
! TransactionIdCommit(xid);
!
! /* Mark committed subtransactions as committed */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdCommitTree(xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
--- 4218,4226 ----
TransactionId max_xid;
int i;
! /* Mark the transaction committed in pg_clog */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdCommitTree(xid, xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
***************
*** 4299,4309 ****
TransactionId max_xid;
int i;
! TransactionIdAbort(xid);
!
! /* Mark subtransactions as aborted */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdAbortTree(xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
--- 4256,4264 ----
TransactionId max_xid;
int i;
! /* Mark the transaction aborted in pg_clog */
sub_xids = (TransactionId *) &(xlrec->xnodes[xlrec->nrels]);
! TransactionIdAbortTree(xid, xlrec->nsubxacts, sub_xids);
/* Make sure nextXid is beyond any XID mentioned in the record */
max_xid = xid;
Index: src/include/access/clog.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/clog.h,v
retrieving revision 1.21
diff -c -r1.21 clog.h
*** src/include/access/clog.h 1 Jan 2008 19:45:56 -0000 1.21
--- src/include/access/clog.h 7 Oct 2008 11:38:52 -0000
***************
*** 32,38 ****
#define NUM_CLOG_BUFFERS 8
! extern void TransactionIdSetStatus(TransactionId xid, XidStatus status, XLogRecPtr lsn);
extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
extern Size CLOGShmemSize(void);
--- 32,39 ----
#define NUM_CLOG_BUFFERS 8
! extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
extern Size CLOGShmemSize(void);
Index: src/include/access/transam.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/transam.h,v
retrieving revision 1.65
diff -c -r1.65 transam.h
*** src/include/access/transam.h 11 Mar 2008 20:20:35 -0000 1.65
--- src/include/access/transam.h 7 Oct 2008 11:38:52 -0000
***************
*** 139,151 ****
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
- extern void TransactionIdCommit(TransactionId transactionId);
- extern void TransactionIdAsyncCommit(TransactionId transactionId, XLogRecPtr lsn);
extern void TransactionIdAbort(TransactionId transactionId);
! extern void TransactionIdSubCommit(TransactionId transactionId);
! extern void TransactionIdCommitTree(int nxids, TransactionId *xids);
! extern void TransactionIdAsyncCommitTree(int nxids, TransactionId *xids, XLogRecPtr lsn);
! extern void TransactionIdAbortTree(int nxids, TransactionId *xids);
extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
--- 139,148 ----
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
extern void TransactionIdAbort(TransactionId transactionId);
! extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
! extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
! extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
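To make the ordering in the new TransactionIdCommitTree comment concrete ("subxids are correctly marked subcommit first"), here is a minimal toy model in C. It is a sketch under stated assumptions, not the patch's implementation: the arrays toy_clog and toy_parent, every toy_* function and the max_writes cut-off are hypothetical stand-ins for clog, subtrans and a concurrent reader, and the three-step order is one reading of that comment rather than something shown in this diff.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;

/* Hypothetical status values, loosely mirroring the clog states. */
typedef enum
{
	TOY_IN_PROGRESS = 0,
	TOY_COMMITTED,
	TOY_ABORTED,
	TOY_SUB_COMMITTED
} ToyXidStatus;

#define TOY_NXIDS 64

static ToyXidStatus toy_clog[TOY_NXIDS];	/* stand-in for clog */
static TransactionId toy_parent[TOY_NXIDS];	/* stand-in for subtrans; 0 = none */
static int	toy_writes_left = -1;			/* -1 = unlimited */

void
toy_reset(void)
{
	memset(toy_clog, 0, sizeof(toy_clog));
	memset(toy_parent, 0, sizeof(toy_parent));
	toy_writes_left = -1;
}

/* One status write; silently dropped once the write budget is used up. */
void
toy_write(TransactionId xid, ToyXidStatus status)
{
	assert(xid < TOY_NXIDS);
	if (toy_writes_left == 0)
		return;
	if (toy_writes_left > 0)
		toy_writes_left--;
	toy_clog[xid] = status;
}

/*
 * Assumed order for a commit: children -> sub-committed, then the parent,
 * then children -> committed.  max_writes < 0 runs the update to completion;
 * otherwise it stops after that many writes, imitating what a reader could
 * observe part-way through.
 */
void
toy_set_tree_status(TransactionId xid, int nsubxids,
					const TransactionId *subxids,
					ToyXidStatus status, int max_writes)
{
	int			i;

	toy_writes_left = max_writes;
	if (status == TOY_COMMITTED)
		for (i = 0; i < nsubxids; i++)
			toy_write(subxids[i], TOY_SUB_COMMITTED);
	toy_write(xid, status);
	for (i = 0; i < nsubxids; i++)
		toy_write(subxids[i], status);
	toy_writes_left = -1;
}

/* A sub-committed xid takes its answer from its parent. */
int
toy_did_commit(TransactionId xid)
{
	assert(xid < TOY_NXIDS);
	while (toy_clog[xid] == TOY_SUB_COMMITTED && toy_parent[xid] != 0)
	{
		xid = toy_parent[xid];
		assert(xid < TOY_NXIDS);
	}
	return toy_clog[xid] == TOY_COMMITTED;
}

/* Returns 1 if every subxid gives the same answer as the toplevel xid. */
int
toy_tree_agrees(TransactionId xid, int nsubxids, const TransactionId *subxids)
{
	int			i;

	for (i = 0; i < nsubxids; i++)
		if (toy_did_commit(subxids[i]) != toy_did_commit(xid))
			return 0;
	return 1;
}
```

In this model, cutting the update off after any number of writes (max_writes from 0 up to 2 * nsubxids + 1) leaves every subxid agreeing with its parent, which is the property the comment appears to rely on.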
Simon Riggs wrote:
> Renamed to set_status_by_pages because we never use this on the whole
> tree. Added comments to say that.
> Overall, cleaner and more readable now. Thanks.
Committed, thanks.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
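The name set_status_by_pages suggests that the subxid array is processed one clog page at a time. The committed function is not shown in this thread, so what follows is only a guess at the idea, not the real code: it splits an xid array into consecutive runs that fall on the same page. The toy_* names and the ToyRunLog struct are hypothetical, and the 32768 xids-per-page figure assumes the default 8 kB block size with 2 status bits per xid.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumption: 8192-byte pages, 4 xids per byte (2 status bits each). */
#define TOY_XACTS_PER_PAGE 32768
#define TOY_MAX_RUNS 8

/* Hypothetical record of the per-page runs found in an xid array. */
typedef struct ToyRunLog
{
	int			nruns;
	int			pageno[TOY_MAX_RUNS];	/* page each run falls on */
	int			count[TOY_MAX_RUNS];	/* number of xids in each run */
} ToyRunLog;

int
toy_xid_to_page(TransactionId xid)
{
	return (int) (xid / TOY_XACTS_PER_PAGE);
}

/*
 * Split xids[] into consecutive runs sharing one page and record them in
 * *log.  A real implementation would update the status of each run while
 * holding that page, instead of just logging it.  Returns the run count.
 */
int
toy_split_by_pages(int nxids, const TransactionId *xids, ToyRunLog *log)
{
	int			i = 0;

	log->nruns = 0;
	while (i < nxids)
	{
		int			pageno = toy_xid_to_page(xids[i]);
		int			start = i;

		/* extend the run while the following xids stay on the same page */
		while (i < nxids && toy_xid_to_page(xids[i]) == pageno)
			i++;

		assert(log->nruns < TOY_MAX_RUNS);
		log->pageno[log->nruns] = pageno;
		log->count[log->nruns] = i - start;
		log->nruns++;
	}
	return log->nruns;
}
```

Because subxids are normally assigned in increasing order, most trees would yield a single run, and only a tree that straddles a page boundary would need more than one pass.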
On Mon, 2008-10-20 at 16:23 -0300, Alvaro Herrera wrote:
> Simon Riggs wrote:
> > Renamed to set_status_by_pages because we never use this on the whole
> > tree. Added comments to say that.
> > Overall, cleaner and more readable now. Thanks.
> Committed, thanks.
Cheers.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support