Transaction system (proposal for 6.5)
Hi...
I was thinking about a major rewrite of the PostgreSQL transaction
system, in order to provide less tuple overhead and recoverability.
My first goal is to reduce tuple overhead, getting rid of xmin/xmax and
cmin/cmax. To provide this functionality, I'm planning to keep only a
flag indicating whether the transaction is in progress or not. If, during
a transaction, a certain tuple is affected, this flag will store the
current transaction id. Then, when this tuple is committed, an invalid OID
(say, 0) will be written to this flag.
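A minimal sketch of the single-field idea described above, written in Python for brevity. The names `HeapTuple`, `INVALID_XID`, `touch`, `commit` and `is_committed` are hypothetical and do not come from the PostgreSQL source; the sketch only illustrates a per-tuple field that holds the id of the in-progress transaction and is reset to 0 on commit.

```python
from dataclasses import dataclass

INVALID_XID = 0  # assumed sentinel: no transaction is in progress on this tuple


@dataclass
class HeapTuple:
    data: str
    xid: int = INVALID_XID  # the single per-tuple field that would replace xmin/xmax


def touch(tup, current_xid):
    """Record that the running transaction has affected this tuple."""
    tup.xid = current_xid


def commit(tup):
    """On commit, write the invalid id back into the field."""
    tup.xid = INVALID_XID


def is_committed(tup):
    """A tuple whose field holds the invalid id belongs to no open transaction."""
    return tup.xid == INVALID_XID


row = HeapTuple("hello")
touch(row, current_xid=42)
in_progress = not is_committed(row)
commit(row)
```

Concurrency control between two transactions touching the same tuple is deliberately left out, since the proposal does not describe it.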
The only problem I saw with this approach is if some pages get flushed
during the transaction, because these pages will have to be reloaded from
disk.
To address the problem of non-functional update, I intend to store a
command identifier with the tuple and, during update, check whether the
cid of a tuple is equal to the current cid of this transaction (like we do
today).
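One possible reading of the command-id check, as a hedged Python sketch: an update that appends new tuple versions to the relation must not process the versions it has just written. The `Row` class and `increment_all` function are hypothetical, and the relation is modelled as a plain list.

```python
from dataclasses import dataclass


@dataclass
class Row:
    value: int
    cid: int           # command id of the command that wrote this version (assumed field)
    live: bool = True


def increment_all(rows, current_cid):
    """Add 1 to every row exactly once, skipping versions written by this command."""
    updated = 0
    i = 0
    while i < len(rows):  # the scan also reaches the versions appended below
        old = rows[i]
        if old.live and old.cid != current_cid:
            old.live = False
            rows.append(Row(old.value + 1, current_cid))
            updated += 1
        i += 1
    return updated


table = [Row(10, cid=0), Row(20, cid=0)]
count = increment_all(table, current_cid=1)
live_values = [r.value for r in table if r.live]
```

Without the `old.cid != current_cid` test the loop would keep updating its own output and never finish.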
To keep track of current transactions, there will be a list of tuples
affected by each transaction, and the operation executed. This way,
during commit, we only confirm these operations in the relations (writing
an invalid OID in the current xid of each tuple affected). To roll back, we
delete the new tuples (and mark this operation as a commit) and mark the
old tuples affected as "live" (and leave these committed).
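The bookkeeping described above can be sketched as follows, under the assumption that a relation is a simple in-memory list. `HeapTuple`, `Transaction` and their methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field

INVALID_XID = 0  # assumed sentinel written at the end of a transaction


@dataclass(eq=False)  # identity comparison, so list.remove() drops this exact tuple
class HeapTuple:
    data: str
    xid: int = INVALID_XID
    live: bool = True


@dataclass
class Transaction:
    xid: int
    ops: list = field(default_factory=list)  # (operation, tuple) pairs

    def insert(self, relation, data):
        tup = HeapTuple(data, xid=self.xid)
        relation.append(tup)
        self.ops.append(("insert", tup))
        return tup

    def delete(self, tup):
        tup.xid = self.xid
        tup.live = False
        self.ops.append(("delete", tup))

    def commit(self):
        # Confirm every recorded operation by clearing the xid field.
        for _op, tup in self.ops:
            tup.xid = INVALID_XID
        self.ops.clear()

    def rollback(self, relation):
        # Undo in reverse order: drop new tuples, revive deleted ones.
        for op, tup in reversed(self.ops):
            if op == "insert":
                relation.remove(tup)
            else:
                tup.live = True
            tup.xid = INVALID_XID
        self.ops.clear()


relation = [HeapTuple("old")]
tx = Transaction(xid=7)
tx.delete(relation[0])
tx.insert(relation, "new")
tx.rollback(relation)
```

After the rollback the relation again holds only the original tuple, marked live, with the invalid id in its xid field.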
I'm thinking of assigning a transaction id to each new backend, and the
postmaster will keep track of used transaction ids. This way, there is
no need to keep a list of transactions in shared memory.
For recovery (my second goal), I intend, at postmaster startup, to roll
back all transactions marked as in progress. After that, I'm thinking
about a redo log, but I'm still searching for a way to keep it at the
minimum size possible.
Suggestions? Comments?
Robson.
On Wed, Sep 16, 1998 at 06:35:53PM -0300, Robson Miranda wrote:
I was thinking about a major rewrite of the PostgreSQL transaction
system, in order to provide less tuple overhead and recoverability.
I do not have much of an idea how postgres handles stuff right now, so
forgive me if I'm asking stupid questions.
My first goal is to reduce tuple overhead, getting rid of xmin/xmax and
cmin/cmax. To provide this functionality, I'm planning to keep only a
flag indicating whether the transaction is in progress or not. If, during
a transaction, a certain tuple is affected, this flag will store the
current transaction id. Then, when this tuple is committed, an invalid OID
(say, 0) will be written to this flag.
That means you store one flag per tuple? Does this happen only in memory?
The only problem I saw with this approach is if some pages get flushed
during the transaction, because these pages will have to be reloaded from
disk.
Ah yes, it seems to be in memory only. And you point exactly to one problem.
Any idea how to solve this?
To keep track of current transactions, there will be a list of tuples
affected by each transaction, and the operation executed. This way,
during commit, we only confirm these operations in the relations (writing
an invalid OID in the current xid of each tuple affected). To roll back, we
delete the new tuples (and mark this operation as a commit) and mark the
old tuples affected as "live" (and leave these committed).
That means we always have both in the relation? That is, we write the new
tuple in and keep the old one? Is this done the same way in the current
version? I'd prefer to have a clean cut, with new and old not being in the
same table at the same time.
For recovery (my second goal), I intend, at postmaster startup, to roll
back all transactions marked as in progress. After that, I'm thinking
about a redo log, but I'm still searching for a way to keep it at the
minimum size possible.
Where's the problem with a redo log?
Michael
--
Dr. Michael Meskes | Th.-Heuss-Str. 61, D-41812 Erkelenz | Go SF49ers!
Senior-Consultant | business: Michael.Meskes@mummert.de | Go Rhein Fire!
Mummert+Partner | private: Michael.Meskes@usa.net | Use Debian
Unternehmensberatung AG | Michael.Meskes@gmx.net | GNU/Linux!
Interesting. I know we have talked in the past about the various system
columns and their removal. If you check the hackers archive under cmin,
etc, I think you will find some discussion.
Now, as far as their removal, is it worth removing 8 bytes of tuple
overhead in exchange for having to do a redo log, etc.? I am not sure.
I know many commercial databases have it, but I am not sure how
beneficial it would be.
What I would really like is the ability to re-use superseded tuples
without vacuum. It seems that should be possible, but it has not been
done by anyone yet. That would be a HUGE win, I think.
--
Bruce Momjian | 830 Blythe Avenue
maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026
http://www.op.net/~candle | (610) 353-9879(w)
+ If your life is a hard drive, | (610) 853-3000(h)
+ Christ can be your backup. |
On Sun, 20 Sep 1998, Bruce Momjian wrote:
Interesting. I know we have talked in the past about the various system
columns and their removal. If you check the hackers archive under cmin,
etc, I think you will find some discussion.
Now, as far as their removal, is it worth removing 8 bytes of tuple
overhead in exchange for having to do a redo log, etc.? I am not sure.
I know many commercial databases have it, but I am not sure how
beneficial it would be.
I may be missing something in the original posting that you are
seeing, but I don't see the two as necessarily being inter-related...my
understanding of the Oracle redo logs is that if a database gets corrupted,
you can rebuild it from the last backup + the redo logs to get to the same
point as where the corruption happened...
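As a rough illustration of "last backup + redo logs", here is a Python sketch that replays logged changes on top of a backup copy. The record format `(op, key, value)` and the `recover` function are assumptions made for this example; they do not reflect Oracle's actual redo log format.

```python
import copy


def recover(backup, redo_log):
    """Rebuild the state by replaying logged changes, in order, on a copy of the backup."""
    state = copy.deepcopy(backup)
    for op, key, value in redo_log:
        if op == "put":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown redo record: {op!r}")
    return state


backup = {"a": 1, "b": 2}
redo_log = [("put", "c", 3), ("delete", "a", None), ("put", "b", 20)]
restored = recover(backup, redo_log)
```

The backup itself is never modified, so the replay can be repeated if recovery is interrupted.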
What I would really like is the ability to re-use superseded tuples
without vacuum. It seems that should be possible, but it has not been
done by anyone yet. That would be a HUGE win, I think.
Not sure, but IMHO, having a redo log capability would be a HUGE
win also...consider a mission critical application that doesn't have, in
essence, "live backups" in the form of a redo log...
Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker <scrappy@hub.org> writes:
Not sure, but IMHO, having a redo log capability would be a HUGE
win also...consider a mission critical application that doesn't have, in
essence, "live backups" in the form of a redo log...
Consider that every PostgreSQL application I write has two backups:
o a full database dump,
o an incremental change log,
so I can do exactly that...
--Michael
---- you wrote:
On Wed, Sep 16, 1998 at 06:35:53PM -0300, Robson Miranda wrote:
I was thinking about a major rewrite of the PostgreSQL transaction
system, in order to provide less tuple overhead and recoverability.
I do not have much of an idea how postgres handles stuff right now, so
forgive me if I'm asking stupid questions.
Ok...
My first goal is to reduce tuple overhead, getting rid of xmin/xmax and
cmin/cmax. To provide this functionality, I'm planning to keep only a
flag indicating whether the transaction is in progress or not. If, during
a transaction, a certain tuple is affected, this flag will store the
current transaction id. Then, when this tuple is committed, an invalid OID
(say, 0) will be written to this flag.
That means you store one flag per tuple? Does this happen only in memory?
There will be something to indicate whether the tuple is valid or not. A NULL, for example, can indicate that this tuple is part of a committed transaction. Any other number indicates that this is an in-progress transaction (a valid transaction ID). I was thinking of using an unsigned 31-bit value for the transaction ID and the other bit to flag whether this tuple is valid or not. Thus, a NULL indicates that this line was inserted by a committed xaction, and a number with only the 32nd bit set indicates that this line was deleted by a committed xaction.
This way, if the xaction rolls back, the inserted lines will have only the 32nd bit set, and the deleted lines will have NULL in this value. But, at commit or rollback, the pages with deleted tuples will have this space added to the free space of the page.
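The encoding above can be sketched as bit operations on a 32-bit value. The constant and function names are hypothetical. The message does not say how an in-progress field distinguishes an insert from a delete, so this sketch assumes the operation is known from the transaction's own list and passes it in explicitly.

```python
DEAD_BIT = 1 << 31         # the "32nd bit": tuple is not valid
XID_MASK = DEAD_BIT - 1    # the low 31 bits hold the transaction id

COMMITTED_LIVE = 0         # the "NULL" value in the message
COMMITTED_DEAD = DEAD_BIT  # only the 32nd bit set


def in_progress_xid(field):
    """Return the id of the transaction working on the tuple, or 0 if there is none."""
    return field & XID_MASK


def finish(op, committed):
    """Final field value for a tuple once its transaction ends.

    op is "insert" or "delete"; committed is False for a rollback.
    """
    if op not in ("insert", "delete"):
        raise ValueError(op)
    # An inserted tuple survives a commit; a deleted tuple survives a rollback.
    survives = (op == "insert") == committed
    return COMMITTED_LIVE if survives else COMMITTED_DEAD
```

The four combinations of operation and outcome map onto exactly the two final values named in the message.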
Currently, there are 2 fields (xmin/xmax) and 4 flags (HEAP_XMIN_COMMITTED, HEAP_XMAX_COMMITTED, HEAP_XMIN_INVALID, HEAP_XMAX_INVALID) that are used to determine the visibility of a tuple. If a tuple doesn't have these flags set, then a lookup in the log relation is done to determine the correct status of the transaction, and then the flags are adjusted. These flags operate like a "cache" for the transaction status. Note that these flags can be changed during selects, so these pages will have to be written to disk after a query.
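A simplified Python sketch of the flags acting as a cache in front of the log relation, shown for xmin only. The bit values, the `LogRelation` class and the dictionary layout of a tuple are illustrative assumptions. The sketch also assumes every transaction has already finished, so "not committed" is treated as "aborted".

```python
HEAP_XMIN_COMMITTED = 0x01  # illustrative bit values, not the real ones
HEAP_XMIN_INVALID = 0x02


class LogRelation:
    """Stand-in for the transaction log; counts how often it is consulted."""

    def __init__(self, committed):
        self.committed = set(committed)
        self.lookups = 0

    def did_commit(self, xid):
        self.lookups += 1
        return xid in self.committed


def xmin_committed(tup, log):
    """Check the inserting transaction, caching the answer in the tuple's flags."""
    if tup["flags"] & HEAP_XMIN_COMMITTED:
        return True
    if tup["flags"] & HEAP_XMIN_INVALID:
        return False
    if log.did_commit(tup["xmin"]):
        tup["flags"] |= HEAP_XMIN_COMMITTED  # this write is what dirties the page
        return True
    tup["flags"] |= HEAP_XMIN_INVALID
    return False


log = LogRelation(committed={5})
tup = {"xmin": 5, "flags": 0}
first = xmin_committed(tup, log)
second = xmin_committed(tup, log)
```

The second call is answered from the flags alone, which is why a plain select can end up modifying a page.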
The only problem I saw with this approach is if some pages get flushed
during the transaction, because these pages will have to be reloaded from
disk.
Ah yes, it seems to be in memory only. And you point exactly to one problem.
Any idea how to solve this?
After looking more deeply into the current transaction model, I think my approach will have similar performance, since these pages will have to be reloaded only once (at the end of the transaction). Currently, the log pages may be loaded from disk even during a select.
To keep track of current transactions, there will be a list of tuples
affected by each transaction, and the operation executed. This way,
during commit, we only confirm these operations in the relations (writing
an invalid OID in the current xid of each tuple affected). To roll back, we
delete the new tuples (and mark this operation as a commit) and mark the
old tuples affected as "live" (and leave these committed).
That means we always have both in the relation? That is, we write the new
tuple in and keep the old one? Is this done the same way in the current
version? I'd prefer to have a clean cut, with new and old not being in the
same table at the same time.
No, the invalid tuples will be deleted at the end of the transaction, freeing the space on the page. I'm thinking of keeping the page free space in some special blocks inside the table. With a page size of 8k, I can keep one table of free space every 4096 blocks. This way, each table will hold the free-space entries for the next 4096 blocks.
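The arithmetic behind "one table every 4096 blocks" can be sketched like this. The layout is an assumption: each free-space block sits immediately before the 4096 data blocks it describes, page headers are ignored, and `locate` is a hypothetical helper.

```python
PAGE_SIZE = 8192                          # 8k pages, as in the message
BLOCKS_PER_MAP = 4096                     # data blocks covered by one free-space block
ENTRY_SIZE = PAGE_SIZE // BLOCKS_PER_MAP  # 2 bytes per entry, enough to count up to 8192
GROUP = BLOCKS_PER_MAP + 1                # one map block followed by its data blocks


def locate(block):
    """Return (map_block, byte_offset) of the free-space entry for a data block."""
    group, pos = divmod(block, GROUP)
    if pos == 0:
        raise ValueError("block is itself a free-space map block")
    return group * GROUP, (pos - 1) * ENTRY_SIZE
```

Under this layout block 0 is the first map and describes blocks 1 to 4096, block 4097 is the second map, and so on.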
For recovery (my second goal), I intend, at postmaster startup, to roll
back all transactions marked as in progress. After that, I'm thinking
about a redo log, but I'm still searching for a way to keep it at the
minimum size possible.
Where's the problem with a redo log?
I'm still researching how to store this information so that, during recovery, the right things are done at the right time, and so that it provides on-line backup capability.
Michael
Robson.
Robson Miranda wrote:
I was thinking about a major rewrite of the PostgreSQL transaction
system, in order to provide less tuple overhead and recoverability.
My first goal is to reduce tuple overhead, getting rid of xmin/xmax and
cmin/cmax. To provide this functionality, I'm planning to keep only a
I need xmin & xmax for multi-version concurrency control...
Let's decide what should be implemented in 6.5...
To address the problem of non-functional update, I intend to store a
command identifier with the tuple and, during update, check whether the
cid of a tuple is equal to the current cid of this transaction (like we do
today).
cmin & cmax greatly simplify the implementation of the data change
visibility rules - I'm not sure whether it is even possible to
do this with only one attribute for the command id,
keeping in mind triggers, (PL/)SQL funcs...
Vadim