AW: AW: AW: AW: WAL-based allocation of XIDs is insecure
After thinking about this a little, I believe I see why Vadim did it
the way he did. Suppose we tried to make the code sequence be

    obtain write lock on buffer;
    XLogOriginalPage(buffer);  // copy page to xlog if first since ckpt
    modify buffer;
    XLogInsert(xlog entry for modification);
    mark buffer dirty and release write lock;

so that the saving of the original page is a separate xlog entry from
the modification data. Looks easy, and it'd sure simplify XLogInsert
a lot. The only problem is it's wrong. What if a checkpoint occurs
between the two XLOG records?

The decision whether to log the whole buffer has to be atomic with the
actual entry of the xlog record. Unless we want to hold the xlog insert
lock for the entire time that we're (eg) splitting a btree page, that
means we log the buffer after the modification work is done, not before.
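
The race described above can be sketched with a toy model in C. This is only an illustration under stated assumptions: ToyWal, ToyPage and every function below are hypothetical names, not PostgreSQL's actual structures or API. It assumes that replay starts at the latest checkpoint position and that a page counts as protected only if a full-page image was logged at or after that position.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical types, not PostgreSQL's real structures. */
typedef struct {
    unsigned long next_lsn;   /* position the next record will receive */
    unsigned long redo_ptr;   /* position where crash replay would start */
} ToyWal;

typedef struct {
    unsigned long lsn;        /* last record that touched the page, 0 = none */
    unsigned long image_lsn;  /* last record carrying a full image, 0 = none */
} ToyPage;

static void toy_checkpoint(ToyWal *wal) { wal->redo_ptr = wal->next_lsn; }

/* True if no record has touched the page since the latest checkpoint. */
static bool first_touch_since_ckpt(const ToyWal *wal, const ToyPage *page)
{
    return page->lsn < wal->redo_ptr;
}

/* Append one record for the page, optionally carrying a full-page image. */
static void toy_append(ToyWal *wal, ToyPage *page, bool with_image)
{
    assert(wal->redo_ptr <= wal->next_lsn);
    page->lsn = wal->next_lsn;
    if (with_image)
        page->image_lsn = wal->next_lsn;
    wal->next_lsn++;
}

/* True if replay from redo_ptr would encounter a full image of the page. */
static bool image_replayable(const ToyWal *wal, const ToyPage *page)
{
    return page->image_lsn >= wal->redo_ptr;
}

/* Two-record variant: the image is logged separately, before modifying. */
static bool split_variant(ToyWal *wal, ToyPage *page, bool ckpt_between)
{
    if (first_touch_since_ckpt(wal, page))
        toy_append(wal, page, true);    /* separate image record */
    if (ckpt_between)
        toy_checkpoint(wal);            /* redo_ptr moves past the image */
    toy_append(wal, page, false);       /* modification record, no image */
    return image_replayable(wal, page);
}

/* Atomic variant: modify first, then decide and insert in a single step. */
static bool atomic_variant(ToyWal *wal, ToyPage *page, bool ckpt_before_insert)
{
    if (ckpt_before_insert)
        toy_checkpoint(wal);            /* checkpoint while modifying */
    toy_append(wal, page, first_touch_since_ckpt(wal, page));
    return image_replayable(wal, page);
}
```

Starting from a ToyWal of {1, 1} and an untouched ToyPage of {0, 0}, split_variant with a checkpoint in between leaves the modification record without a replayable image, while atomic_variant logs a fresh image whenever a checkpoint has intervened. In the toy model the decision and the insert are one function call; in a real system that atomicity would have to come from the insert lock.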
Yes, I see. Can't currently come up with a workaround either. Hmm ..
Duplicating the buffer is probably not a workable solution.
I do not however see how the current solution fixes the original problem,
that we don't have a rollback for index modifications.
The index would potentially point to an empty heaptuple slot.
When this slot, because it is marked empty, is reused after startup, the index
points to the wrong record.
Unless of course startup rollforward visits all heap pages pointed at
by index xlog records and inserts a tuple marked deleted into the heap.
Additionally I do not see how this all works for userland index types.
In short I do not think that the current implementation of the "physical log" does
what it was intended to do :-(
Andreas
Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:
I do not however see how the current solution fixes the original problem,
that we don't have a rollback for index modifications.
The index would potentially point to an empty heaptuple slot.
How? There will be an XLOG entry inserting the heap tuple before the
XLOG entry that updates the index. Rollforward will redo both. The
heap tuple might not get committed, but it'll be there.
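
The ordering argument above can be illustrated with a small C sketch. It is a toy model under stated assumptions: ToyRec, ToyDb and the functions are hypothetical names, and the log is reduced to "heap insert" and "index insert" records for a slot number. It only shows that, when the heap record always precedes the index record, replaying any prefix of the log never leaves an index entry pointing at an empty slot.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* Toy model only: hypothetical record and database types. */
typedef enum { TOY_HEAP_INSERT, TOY_INDEX_INSERT } ToyRecType;

typedef struct {
    ToyRecType type;
    int slot;                       /* heap slot the record refers to */
} ToyRec;

typedef struct {
    bool heap_used[TOY_SLOTS];      /* slot holds a tuple, committed or not */
    bool index_points[TOY_SLOTS];   /* index has an entry for this slot */
} ToyDb;

/* Roll forward: reapply the first n log records in order. */
static void toy_redo(ToyDb *db, const ToyRec *log, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (log[i].type == TOY_HEAP_INSERT)
            db->heap_used[log[i].slot] = true;
        else
            db->index_points[log[i].slot] = true;
    }
}

/* True if every index entry points at an occupied heap slot. */
static bool toy_index_consistent(const ToyDb *db)
{
    for (int s = 0; s < TOY_SLOTS; s++)
        if (db->index_points[s] && !db->heap_used[s])
            return false;
    return true;
}
```

Replaying a log of {heap insert 3, index insert 3} in full, or only its first record (as if the crash happened before the index record reached the log), leaves the toy database consistent in both cases. Whether the tuple's transaction committed is a separate question that this sketch does not model.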
Additionally I do not see how this all works for userland index types.
None of it works for index types that don't do XLOG entries (which I
think may currently be true for everything except btree :-( ...). I
don't see how that changes if we alter the way this bit is done.
regards, tom lane