nested xacts and phantom Xids
Ok people,
Here I present the nested transactions patch and the phantom Xids patch
that goes with it.
Please review, test and comment.
Missing: rollback of SET CONSTRAINTS and GUC vars.
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Y una voz del caos me habl� y me dijo
"Sonr�e y s� feliz, podr�a ser peor".
Y sonre�. Y fui feliz.
Y fue peor.
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
Here I present the nested transactions patch and the phantom Xids patch
that goes with it.
I looked at the phantom XIDs stuff a bit. I still have little confidence
that the concept is correct :-( but here are some comments on the code
level.
+ * They are marked immediately in pg_subtrans as direct childs of the topmost + * Xid (no matter how deep we are in the transaction tree),
Why is that a good idea --- won't it cause problems when you
commit/abort an intermediate level of subtransaction?
+ * All this happens when HeapTupleHeaderSetXmax is called and the Xmin of the + * tuple is one of the Xids of the current transaction. + * + * Note that HeapTupleHeaderGetXmax/GetXmin return the masqueraded Xmin and + * Xmax, not the real one in the tuple header.
I think that putting this sort of logic into Set/Get Xmin/Xmax is a
horrid idea. They are supposed to be transparent fetch/store macros,
not arbitrarily-complex filter functions. They're also supposed to
be reasonably fast :-(
+ * ... We know to do this when SetXmax is called + * on a tuple that has the HEAP_MARKED_FOR_UPDATE bit set.
Uglier and uglier.
***************
*** 267,274 ****
return true;/* deleting subtransaction aborted */
! if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
! return true;Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)));
--- 268,276 ---- return true;/* deleting subtransaction aborted */
! if (tuple->t_infomask & HEAP_XMIN_IS_PHANTOM)
! if (TransactionPhantomIsCommittedPhantom(HeapTupleHeaderGetPhantomId(tuple)))
! return true;Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)));
Er, what happened to checking for the "deleting subtransaction aborted"
case?
On the whole, I think this was an interesting exercise but also an utter
failure. We'd be far better off to eat the extra 4 bytes per tuple
header and put back the separate Xmax field. The costs in both runtime
and complexity/reliability of this patch are simply not justified by
a 4-byte savings.
regards, tom lane
Tom Lane wrote:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
Here I present the nested transactions patch and the phantom Xids patch
that goes with it.I looked at the phantom XIDs stuff a bit. I still have little confidence
that the concept is correct :-( but here are some comments on the code
level.+ * They are marked immediately in pg_subtrans as direct childs of the topmost + * Xid (no matter how deep we are in the transaction tree),Why is that a good idea --- won't it cause problems when you
commit/abort an intermediate level of subtransaction?
I don't think so. The phantom xid is used only by the outside
transaction waiting for that tuple to be committe or aborted. The
outside tranaction will sleep on the topmost xid completing, then check
the phantom xid status for commit/abort. Within the transaction, I think
he uses command counter to know the creation and destruction sub-xid.
I think right now phantom xids are used only for making parts of a
subtransaction committed or aborted and only in cases where the tuple is
created and destroyed by parts of the same transaction tree.
I don't feel too bad about the runtime cost if only subtransactions are
paying that cost. I know we are really stretching the system here but I
would like to try a little more rather than give up and taking a space
hit for all tuples.
---------------------------------------------------------------------------
+ * All this happens when HeapTupleHeaderSetXmax is called and the Xmin of the + * tuple is one of the Xids of the current transaction. + * + * Note that HeapTupleHeaderGetXmax/GetXmin return the masqueraded Xmin and + * Xmax, not the real one in the tuple header.I think that putting this sort of logic into Set/Get Xmin/Xmax is a
horrid idea. They are supposed to be transparent fetch/store macros,
not arbitrarily-complex filter functions. They're also supposed to
be reasonably fast :-(+ * ... We know to do this when SetXmax is called + * on a tuple that has the HEAP_MARKED_FOR_UPDATE bit set.Uglier and uglier.
***************
*** 267,274 ****
return true;/* deleting subtransaction aborted */
! if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
! return true;Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)));
--- 268,276 ---- return true;/* deleting subtransaction aborted */
! if (tuple->t_infomask & HEAP_XMIN_IS_PHANTOM)
! if (TransactionPhantomIsCommittedPhantom(HeapTupleHeaderGetPhantomId(tuple)))
! return true;Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)));
Er, what happened to checking for the "deleting subtransaction aborted"
case?On the whole, I think this was an interesting exercise but also an utter
failure. We'd be far better off to eat the extra 4 bytes per tuple
header and put back the separate Xmax field. The costs in both runtime
and complexity/reliability of this patch are simply not justified by
a 4-byte savings.regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian wrote:
Tom Lane wrote:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
Here I present the nested transactions patch and the phantom Xids patch
that goes with it.I looked at the phantom XIDs stuff a bit. I still have little confidence
that the concept is correct :-( but here are some comments on the code
level.+ * They are marked immediately in pg_subtrans as direct childs of the topmost + * Xid (no matter how deep we are in the transaction tree),Why is that a good idea --- won't it cause problems when you
commit/abort an intermediate level of subtransaction?I don't think so. The phantom xid is used only by the outside
transaction waiting for that tuple to be committe or aborted. The
outside tranaction will sleep on the topmost xid completing, then check
the phantom xid status for commit/abort. Within the transaction, I think
he uses command counter to know the creation and destruction sub-xid.I think right now phantom xids are used only for making parts of a
subtransaction committed or aborted and only in cases where the tuple is
created and destroyed by parts of the same transaction tree.I don't feel too bad about the runtime cost if only subtransactions are
paying that cost. I know we are really stretching the system here but I
would like to try a little more rather than give up and taking a space
hit for all tuples.
Let me add that outside transactions read those phantom xid only when
they are doing dirty reads. What I don't understand is when do outside
transactions see tuples created inside a transaction? INSERT into a
table with a unique key?
Once the main transaction commits, these phantom tuples should work just
like ordinary tuples except they get their cmin overwritten when they
are expired, I think.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I don't feel too bad about the runtime cost if only subtransactions are
paying that cost.
That's exactly why I'm so exercised about what's been done to the
HeapTupleSet/Get macros. That's significant cost that's paid even when
you're not using *any* of this stuff.
I know we are really stretching the system here but I
would like to try a little more rather than give up and taking a space
hit for all tuples.
I don't even have any confidence that there are no fundamental bugs
in the phantom-xid concept :-(. I'd be willing to play along if an
implementation that seemed acceptable speedwise were being offered,
but this thing is not preferable to four-more-bytes even if it works.
regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Let me add that outside transactions read those phantom xid only when
they are doing dirty reads. What I don't understand is when do outside
transactions see tuples created inside a transaction? INSERT into a
table with a unique key?
Once the main transaction commits, these phantom tuples should work just
like ordinary tuples except they get their cmin overwritten when they
are expired, I think.
You're not doing anything to increase my confidence level in this
concept. You invented it, and even you aren't sure how it works.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Let me add that outside transactions read those phantom xid only when
they are doing dirty reads. What I don't understand is when do outside
transactions see tuples created inside a transaction? INSERT into a
table with a unique key?Once the main transaction commits, these phantom tuples should work just
like ordinary tuples except they get their cmin overwritten when they
are expired, I think.You're not doing anything to increase my confidence level in this
concept. You invented it, and even you aren't sure how it works.
And your point is? I am trying to help Alvaro. I came up with an
idea, and some thought it would help solve the problem. What ideas do
you have to help him?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
Here I present the nested transactions patch and the phantom Xids patch
that goes with it.
I took a very preliminary look through the nested-xacts patch, too.
Missing: rollback of SET CONSTRAINTS and GUC vars.
There's a good deal more than that missing :-(. Here are the modules or
actions that are called in CommitTransaction and/or AbortTransaction
that have not yet been touched by the patch:
shared-inval message queue processing (inval.c)
lo_commit
localbuf.c (refcounts need fixed same as bufmgr)
eoxactcallbacks API needs extension
GUC
namespace (temp namespace cleanup)
files (close/drop temp files)
reset current userid during xact abort
password file updating
SetReindexProcessing disable (can these be nested??)
Of these the one that I think most urgently needs attention is inval.c;
that is complicated code already and figuring out what should happen
for subtrans commit/abort is not trivial. The others look relatively
straightforward to fix, if tedious.
Given patches for inval.c and guc.c, I would say that the patch is
functionally close enough to done that we could commit to including
it in 7.5 --- the other stuff could be wrapped up post-feature-freeze.
BUT (you knew there was a but coming, right?) ... I am really really
disturbed about the potential performance hit from this patch.
I don't think we want to have a nested-transaction feature that imposes
significant overhead on people who aren't actually using nested
transactions. As I looked through the patch I found quite a few places
that would impose such overhead.
The main thing that's bothering me is that TransactionIdIsInProgress
has been turned from a relatively cheap test (just scan the in-memory
PGPROC array) into an exceedingly expensive one. It's gratutitously
badly coded (for N open main transactions it does N searches of the
subtrans tree, when one would do), but I don't really want it going to
the subtrans tree at all during typical cases. We had talked about
keeping a limited number of active subtrans IDs in the PGPROC array,
and I think we're really going to need to do that to buy back
performance here.
(BTW, what is the intended behavior of TransactionIdIsInProgress for
a committed or aborted subtransaction whose parent is still open?
If it has to recognize aborted subtranses as still in progress then
things get very messy, but I think it probably doesn't have to.)
I'm quite unhappy with the wide use of SubTransXidsHaveCommonAncestor to
replace TransactionIdEquals in tqual.c, too. This turns very cheap tests
into very expensive ones --- whether or not you use subtransactions ---
and unfortunately tqual.c is coded on the assumption that these tests
are cheap. We can *not* afford to repeat SubTransXidsHaveCommonAncestor
for every open main transaction every time we check a tuple. In many
ways this is the same problem as with TransactionIdIsInProgress, and
it probably needs a similar solution.
Also I found it annoying that XactLockTableWait waits for a top xact
even when target subxact has already aborted; possibly this is even a
recipe for deadlock. Perhaps should iterate up the tree one level at a
time, waiting for each? Would mean that subxacts have to acquire a
lock on their xid that other people can wait for, but this is not
necessarily bad. (In fact, if we do that then I don't think
XactLockTableWait has to change at all.)
So I think there are several must-fix performance issues as well before
we can make a decision to accept this for 7.5.
I'd feel better about this if we had another month, but with ten days
before feature freeze it's gonna be touch and go.
regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
What ideas do you have to help him?
Eat the four-byte overhead.
Quite frankly, given the other functionality and performance issues
he has to solve in the next ten days (see my other message just now),
I think he'd be a fool to spend one more minute on phantom XIDs.
Maybe for 7.6 someone will have the time to make the idea work.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
What ideas do you have to help him?
Eat the four-byte overhead.
Quite frankly, given the other functionality and performance issues
he has to solve in the next ten days (see my other message just now),
I think he'd be a fool to spend one more minute on phantom XIDs.
Maybe for 7.6 someone will have the time to make the idea work.
Well, the problem with eating the 4-byte overhead is that it has a
performance/storage impact for all installations, not just those that
use nested transactions. You were discussing a performance impact for
nested transactions, but I assume that is an impact when some on the
server are using nested transactions and others are not, while this hit
would be for all sites, even when no one is using nested transactions.
However, given the list you just supplied, I agree Alvaro should focus
on these items and add the 4-bytes to store needed values rather than
continue to hack on phantom xids. We can certainly revisit that later,
even in 7.6, or never. We can put Manfred on it. He did the compaction
in the first place. :-)
One point is that if we want to re-add those 4 bytes, we have to get
community buy-in for it because it is a step backward for those who are
not going to use nested transactions. (I don't want to go the
compile-time switch route.)
As far as cleaning up some of the items after feature freeze, well, I
would like to avoid that, but it seems safe to do because the changes
should be very localized, and because win32 will be cleaning up things
during feature freeze too, I am sure.
However, going into feature freeze knowing a features isn't 100%
functional is something we try to avoid, so this would have to be a
special exception based on the fact that the features is very difficult,
and that Alvaro is working full-time on this. It does take us down the
slippery slope of having the release process dictated by completion of
nested transactions.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Sun, Jun 20, 2004 at 04:37:16PM -0400, Tom Lane wrote:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
Here I present the nested transactions patch and the phantom Xids patch
that goes with it.I looked at the phantom XIDs stuff a bit. I still have little confidence
that the concept is correct :-( but here are some comments on the code
level.
Ok. I for one think this got much more complex than I had originally
thought it would be. I agree the changes to Set/Get Xmin/Xmax are way
beyond what one would want, but the alternative would be to spread the
complexity into their callers and I think that would be much worse.
I don't have a lot of confidence in this either. The patch will be
available in archives if anybody wants to implement this in a cleaner
and safer way; I'll continue working on the rest of the things you
pointed out in the subtransactions patch.
Sadly, something broke in a really strange way between my last cvs
update and today's, because I can't get the patch to initdb. It
compiles without warnings (and I did make distclean), but there's a
weird bug I'll have to pursue.
Regarding the invalidation messages, what I'm currently looking at is to
add a TransactionId to each message, which will be CurrentTransactionId
for each new message. When a subxact commits, all its messages are
relabeled to its parent. When a subxact aborts, all messages labeled
with its TransactionId are locally processed and discarded. This is
tricky because chunks are prepended to the queue, but it also means we
can stop processing as soon as a message belongs to another transaction.
I think GUC vars are easier: when a variable is about to change value
inside a subtransaction, save the previous value in a list from which it
will be restored if the subtransaction later aborts.
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Hay dos momentos en la vida de un hombre en los que no deber�a
especular: cuando puede permit�rselo y cuando no puede" (Mark Twain)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
Regarding the invalidation messages, what I'm currently looking at is to
add a TransactionId to each message, which will be CurrentTransactionId
for each new message. When a subxact commits, all its messages are
relabeled to its parent. When a subxact aborts, all messages labeled
with its TransactionId are locally processed and discarded.
Sounds plausible offhand.
This is tricky because chunks are prepended to the queue, but it also
means we can stop processing as soon as a message belongs to another
transaction.
AFAIR there isn't any essential semantics to the ordering of the queue
entries, so you can probably get away with reordering if that makes life
any easier.
Also, rather than labeling each entry individually, it might be better
to keep a separate list for each level of transaction. Then instead of
relabeling, you'd just concat the subtrans list to its parent's. Seems
like this should be faster and less storage.
regards, tom lane
On Mon, Jun 21, 2004 at 10:28:59PM -0400, Tom Lane wrote:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
This is tricky because chunks are prepended to the queue, but it also
means we can stop processing as soon as a message belongs to another
transaction.AFAIR there isn't any essential semantics to the ordering of the queue
entries, so you can probably get away with reordering if that makes life
any easier.Also, rather than labeling each entry individually, it might be better
to keep a separate list for each level of transaction. Then instead of
relabeling, you'd just concat the subtrans list to its parent's. Seems
like this should be faster and less storage.
Right, but it makes harder to detect when a duplicate relcache entry is
going to be inserted. Judging from the commentary in the file this is
an issue.
Another idea I had was:
1. at subtransaction start, add a special "boundary" message carrying
the new Xid.
2. at subtransaction abort, process the first chunk backwards until the
boundary message with the corresponding Xid is found; if it's not, jump
to the next and repeat.
At subtransaction commit nothing special needs to be done.
This avoids having to label each message and to change the message
appending scheme.
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"The eagle never lost so much time, as
when he submitted to learn of the crow." (William Blake)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
On Mon, Jun 21, 2004 at 10:28:59PM -0400, Tom Lane wrote:
Also, rather than labeling each entry individually, it might be better
to keep a separate list for each level of transaction. Then instead of
relabeling, you'd just concat the subtrans list to its parent's. Seems
like this should be faster and less storage.
Right, but it makes harder to detect when a duplicate relcache entry is
going to be inserted. Judging from the commentary in the file this is
an issue.
It's only a minor efficiency hack; don't get too tense about avoiding dups.
(We don't bother to check for dup catcache-inval entries at all...)
The only thing I'd suggest you need to preserve is the business about
invalidating catcache before relcache; again that's just an efficiency
hack, but it seems simple enough to be worth doing.
regards, tom lane
On Sun, Jun 20, 2004 at 08:49:22PM -0400, Tom Lane wrote:
Of these the one that I think most urgently needs attention is inval.c;
that is complicated code already and figuring out what should happen
for subtrans commit/abort is not trivial. The others look relatively
straightforward to fix, if tedious.
A patch for this is attached. It _seems_ to work ok; conceptually it
seems ok too. I have done some testing which tends to indicate that it
is right, but I'm not sure of all the implications this code has so I
don't know how to test it exhaustively. Of course, regression tests
pass, but this doesn't actually prove anything.
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"�Que diferencia tiene para los muertos, los hu�rfanos, y aquellos que han
perdido su hogar, si la loca destrucci�n ha sido realizada bajo el nombre
del totalitarismo o del santo nombre de la libertad y la democracia?" (Gandhi)
Attachments:
inval.patchtext/plain; charset=us-asciiDownload+215-46
Alvaro Herrera wrote:
On Sun, Jun 20, 2004 at 04:37:16PM -0400, Tom Lane wrote:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
Here I present the nested transactions patch and the phantom Xids patch
that goes with it.I looked at the phantom XIDs stuff a bit. I still have little confidence
that the concept is correct :-( but here are some comments on the code
level.Ok. I for one think this got much more complex than I had originally
thought it would be. I agree the changes to Set/Get Xmin/Xmax are way
beyond what one would want, but the alternative would be to spread the
complexity into their callers and I think that would be much worse.I don't have a lot of confidence in this either. The patch will be
available in archives if anybody wants to implement this in a cleaner
and safer way; I'll continue working on the rest of the things you
pointed out in the subtransactions patch.
I am sorry to have given Alvaro another idea that didn't work. However,
thinking of options, I wonder if instead of phantom xids, we should do
phantom cids. Because only the local backend looks at the command
counter (cid). I think it might be alot cleaner. The phantom cid would
have a tuple bit set indicating that instead of being a cid, it is an
index into an array of cmin/cmax pairs.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
If we add nested transactions for 7.5, are we going to need savepoints
too in the same release? If we don't, are we going to get a lot of
complaints?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Wed, Jun 23, 2004 at 08:58:15AM -0400, Bruce Momjian wrote:
If we add nested transactions for 7.5, are we going to need savepoints
too in the same release? If we don't, are we going to get a lot of
complaints?
It'd be good to have savepoints right now. I'm not sure it'd be good to
expose the nested transactions implementation if we are going to offer
savepoints later, because it means we will have to keep nested
transactions forever.
Maybe it is a good idea to hide the implementation details and offer
only the standard SAVEPOINT feature. I'm not sure how hard it is to
change the syntax; I don't think it'd be a big deal.
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"La conclusi�n que podemos sacar de esos estudios es que
no podemos sacar ninguna conclusi�n de ellos" (Tanenbaum)
On Wed, Jun 23, 2004 at 08:57:11AM -0400, Bruce Momjian wrote:
I am sorry to have given Alvaro another idea that didn't work.
It allowed me to learn a lot of cool tricks, so it wasn't wasted work.
The only think I'm sorry about is that I should have used the time for
something more useful, like tackling the remaining problems in the
nested xacts implementation proper.
However, thinking of options, I wonder if instead of phantom xids, we
should do phantom cids. Because only the local backend looks at the
command counter (cid). I think it might be alot cleaner. The phantom
cid would have a tuple bit set indicating that instead of being a cid,
it is an index into an array of cmin/cmax pairs.
Yeah, maybe this can work. I'm not going to try however, at least not
now. If somebody else wants to try, be my guest.
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"El conflicto es el camino real hacia la uni�n"
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
It'd be good to have savepoints right now. I'm not sure it'd be good to
expose the nested transactions implementation if we are going to offer
savepoints later, because it means we will have to keep nested
transactions forever.
Nested transactions are *good*. We should not hide them.
I don't object to offering spec-compatible savepoints, but the plain
fact of the matter is that that's a dumbed-down API. We should not
feel constrained to offer only that.
regards, tom lane