Nested transactions: low level stuff

Started by Manfred Koizarabout 23 years ago34 messageshackers
Jump to latest
#1Manfred Koizar
mkoi-pg@aon.at

Here is, what I've put together from various messages posted
November/December last year.

. Subtrans trees
. Transaction states
. Tuple visibility
. HeapTupleSatifiesUpdate
. Shortcuts
. Still missing
. Objections and suggestions

Subtrans trees
--------------

A subtrans tree holds information about sub-transaction to parent
transaction relationships. The nodes are transaction ids, edges
denote the fact that the child node represents a sub-transaction of
the transaction represented by the parent node. The root of a
subtrans tree represents a main-transaction.

A subtrans tree is navigable from child to parent. I.e. if you have
a sub-transaction xid you can find its parent efficiently, but there
is no fast and easy way to enumerate all children of a given parent
xid.

The subtrans dependencies have to be visible to all backends.

At least two possible implementations have been discussed.

Together with clog: Store a 30 bit parent transaction offset with the
two state bits. Offset 0 means the transaction is a main-transaction.

In a separate data structure: pg_subtrans is a huge (possibly
sparsely populated) array of parent xids, indexed by child xid. This
array is divided into manageable chunks, represented by files,
"pg_subtrans_NNNN". These files are only created when necessary. At
any time only a tiny part of the whole array is kept in shared
buffers. This concept is similar or almost equal to pg_clog, which is
an array of doublebits. Most of its implementation can be stolen from
the clog code.

We believe that parent xid lookups are far less frequent than
transaction state lookups and therefor favour the second
implementation.

Transaction states
------------------

Let the currently unused fourth state in
pg_clog indicate a committed subtransaction. In pg_clog there are two
bits per transaction, commit and abort, with the following meaning:

a c
0 0 transaction in progress, the owning backend knows whether it is
a main- or a sub-transaction, other backends don't care
1 0 aborted, nobody cares whether main- or sub-transaction
0 1 committed main-transaction or - with shortcut 2 - a sub-
transaction that's known committed to all active transactions
1 1 committed sub-transaction, have to look for parent in
pg_subtrans

Updating the state of a single (main or sub) transaction is atomic,
just like it is now. This solves the atomicity problem without
requiring long locks.

Here is what is to be done for some operations:

BEGIN main transaction:
. Get a new xid (no change to current behaviour).
. pg_clog[xid] is still 00, meaning active.
. pg_subtrans[xid] is still 0, meaning no parent.

BEGIN subtransaction:
. Push current transaction info onto local stack.
. Get a new xid.
. Record parent xid in pg_subtrans[xid].
. pg_clog[xid] is still 00.

ROLLBACK subtransaction:
. Set pg_clog[xid] to 10 (aborted).
. Optionally set clog bits for subsubtransactions to 10 (*).
. Pop transaction info from stack.

COMMIT subtransaction:
. Set pg_clog[xid] to 11 (committed subtrans).
. Don't touch clog bits for subsubtransactions!
. Pop transaction info from stack.

ROLLBACK main transaction:
. Set pg_clog[xid] to 10 (aborted).
. Optionally set clog bits for subtransactions to 10 (*).

COMMIT main transaction:
. Set pg_clog[xid] to 01 (committed).
. Don't touch clog bits for subtransactions!

(*) This can only be done if there is a transaction local data
structure holding information about all subtransactions or we have a
subtrans tree implementation that is navigable from parent to child.
Anyway, whether this is done or not does not influence the consistency
of this proposal.

Tuple visibility
----------------

Visibility check by other transactions: If a tuple is visited and its
XMIN/XMAX_IS_COMMITTED/ABORTED flags are not yet set, pg_clog has to
be consulted to find out the status of the inserting/deleting
transaction xid. If pg_clog[xid] is ...

00: transaction still active

10: aborted

01: committed

11: committed subtransaction, have to check parent

Only in this last case do we have to get parentxid from pg_subtrans.
Now we look at pg_clog[parentxid]. If we find ...

00: parent still active, so xid is considered active, too

10: parent aborted, so xid is considered aborted

01: parent committed, so xid is considered committed

11: recursively check grandparent(s) ...

If a visibility check sees a transaction as committed, it checks
whether the main-transaction xid is visible in the current snapshot.
Fortunately we get the main-transaction xid as a by-product of
looking up the parentxid states.

If we set XMIN/MAX_IS_COMMITTED in a tuple header, we have to replace
a sub-transaction xid in xmin or xmax respectively with the
main-transaction xid at the same time. Otherwise we'd have to look
for the main xid, whenever a tuple is touched.

XXX "at the same time" may need a bit more thought regarding locking.

HeapTupleSatifiesUpdate
-----------------------

If an UPDATE is about to change a tuple touched by another active
transaction, it waits for the other transaction to commit or abort.
We must always wait for the main transaction, not the subtrans.

Shortcuts
---------

Under certain conditions the 1/1 state can be replaced with 0/1 or
1/0. This could save a lot of parent lookups.

Shortcut 1: As soon as a transaction is rolled back all its
subtransactions are considered aborted, too. So their pg_clog bits
can be set to 10 and there is no need to look up their parent any
more. This can be done
. immediately on ROLLBACK (if the implementation has a list of
committed subtransactions handy),
. as a side effect of a later visibility check,
. by VACUUM.

Shortcut 2: If a committed subtransaction has only committed
(grand)parents up to the committed main transaction and the main
transaction is visible to all running transactions, i.e. the main
transaction's xid is before RecentGlobalXmin (*), then the
subtransaction's pg_clog bits can be set to 01 and further parent xid
lookups are not needed any more. This can be done
. as a side effect of a later visibility check,
. by VACUUM.

(*) AFAICS the statements "main transaction's xid is before
RecentGlobalXmin" and "subtransaction's xid is before
RecentGlobalXmin" are equivalent.

The point here is: pg_subtrans is only queried, if we find 11 in
pg_clog; we try to replace 11 a soon as possible, but only if it is
safe to do so. Status 11 is short-lived, it is replaced with
the final status by one or more of

- ROLLBACK of the parent transaction
- a later visibility check (as a side effect)
- VACUUM

pg_subtrans cleanup: A pg_subtrans_NNNN file covers a known range of
transaction ids. As soon as none of these transactions has a pg_clog
status of 11, the pg_subtrans_NNNN file can be removed. VACUUM can do
this, and it won't even have to check the heap.

Still missing
-------------

. Visibility checks for tuples inserted/deleted by a (sub)transaction
belonging to the current transaction tree (have to check local
transaction stack whenever we look at a xid or switch to a parent xid)

. WAL

Objections and suggestions
--------------------------

Tom:

Also, using a 11 state doubles the amount of pg_clog I/O needed to
commit a collection of subtransactions. You have to write 11 as the
state of each commitable subtransaction, then commit the parent (write
01 as its state), then go back and change the state of each
subtransaction to 01. (Whether this last bit is done as part of parent
transaction commit, or during later inspections of the state of the
subtransaction, doesn't change the argument.)

Yes, there may be more pg_clog I/O. But I don't think that it is
doubled. I'd expect some locality; after all one page holds the
status bits for 32K transactions.

Tom:

I think it would be preferable to use only three states: active,
aborted, committed. The parent commit protocol is (1) write 10 as state
of each aborted subtransaction (this should be done as soon as the
subtransaction is known aborted, rather than delaying to parent commit);
(2) write 01 as state of parent (this is the atomic commit); (3) write
01 as state of each committed subtransaction. Readers who see 00 must
check the parent state; if the parent is committed then they have to go
back and recheck the child state (to see if it became "aborted" after
they looked). This halves the write traffic during a commit, at the
cost of additional read traffic when subtransaction state is checked in
a narrow window after the time of parent transaction commit. I believe
it nets out to be faster.

Pro: Saves the fourth state for something different.

Con: Does a pg_subtrans lookup even for main transactions (at least
until xid < RecentGlobalXmin)

Con: Needs subtrans tree navigation from parent to child.

It's noticeably past midnight as I write this ... will reconsider
that approach after I had same sleep.

Tom:

But wait a moment. The child xid is by definition always greater than
(newer than) its parent. So if we consult pg_clog and find the
transaction marked committed, *and* the xid is before the window of XIDs
in our snapshot, then even if it's not a top-level xid, the parent must
be before our window too. Therefore we can conclude the transaction is
visible in our snapshot. So indeed there is a good-size range of xids
for which we'll never need to chase the parent link: everything before
the RecentGlobalXmin computed by GetSnapshotData.

I think this has been integrated into this proposal (cf. shortcut 2).

(We do have to set
subtransactions to committed during parent commit to make this true;
we can't update them lazily. But I think that's okay.)

This is not clear to me ... I'll try again later ...

------------------------------

That's all for now, folks. Fell free to comment.

Sorry for the long post. Would you prefer such kind of stuff on a web
page and just a short note with the URL to the list?

Servus
Manfred

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Manfred Koizar (#1)
Re: Nested transactions: low level stuff

Manfred Koizar <mkoi-pg@aon.at> writes:

If we set XMIN/MAX_IS_COMMITTED in a tuple header, we have to replace
a sub-transaction xid in xmin or xmax respectively with the
main-transaction xid at the same time. Otherwise we'd have to look
for the main xid, whenever a tuple is touched.

This worries me --- it changes a safe operation (OR'ing in a commit bit)
into an unsafe one that requires exclusive lock on the page containing
the tuple. I'm also concerned that we'd now need a WAL entry to record
the xid change (are we dependent on this change occurring for correctness?
or is it only performance?)

Perhaps it would be better to leave the tuple-commit bit unset until we
have been able to change the clog state to 01 ("committed to everyone").

Tom:

I think it would be preferable to use only three states: active,
aborted, committed.

Con: Needs subtrans tree navigation from parent to child.

But only in the backend owning the transaction; there's no need for
shared state that allows it.

Sorry for the long post. Would you prefer such kind of stuff on a web
page and just a short note with the URL to the list?

No. This way it gets into the list archives.

regards, tom lane

#3Manfred Koizar
mkoi-pg@aon.at
In reply to: Tom Lane (#2)
Re: Nested transactions: low level stuff

On Wed, 19 Mar 2003 11:18:38 -0500, Tom Lane <tgl@sss.pgh.pa.us>
wrote:

Manfred Koizar <mkoi-pg@aon.at> writes:

If we set XMIN/MAX_IS_COMMITTED in a tuple header, we have to replace
a sub-transaction xid in xmin or xmax respectively with the
main-transaction xid at the same time. Otherwise we'd have to look
for the main xid, whenever a tuple is touched.

This worries me --- it changes a safe operation (OR'ing in a commit bit)
into an unsafe one that requires exclusive lock on the page containing
the tuple.

[Only talking about xmin here, but everything refers to xmax as well.]
I was hoping we could set xmin atomically without holding a lock. If
we can, we first set xmin to the main xid. The new state is still
consistent; now it looks as if the change has been made directly by
the main transaction and not by one of its subtransactions, which is
ok after the main transaction has committed (we are only talking about
cases where all interesting transactions have committed). As a second
step we update the commit bit which is as safe as it is now.

I see no concurrency problems. If two or more backends visit the same
tuple, they either write the same value to the same position which
doesn't hurt, or one sees the other's changes which is a good thing.

So this boils down to whether setting the value of a properly aligned
32 bit variable in shared memory is an atomic operation on all
supported platforms. I don't know enough about compilers to answer
this question.

I'm also concerned that we'd now need a WAL entry to record
the xid change

If the sequence is "first update xmin, then set the commit bit", we
never have an inconsistent state. And if the change is lost, it can
be redone by the next backend visiting the tuple. So I think we don't
need a WAL entry.

(are we dependent on this change occurring for correctness?
or is it only performance?)

The latter.

Perhaps it would be better to leave the tuple-commit bit unset until we
have been able to change the clog state to 01 ("committed to everyone").

At least we can fall back to this, if we can't find out how to update
the xid safely.

Servus
Manfred

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Manfred Koizar (#3)
Re: Nested transactions: low level stuff

Manfred Koizar <mkoi-pg@aon.at> writes:

If the sequence is "first update xmin, then set the commit bit", we
never have an inconsistent state. And if the change is lost, it can
be redone by the next backend visiting the tuple.

Not if the subtransaction log state has been removed as no longer
needed. I think a WAL entry will be essential. (An alternative
might be to keep subtransaction state as long as we keep pg_clog
state, but that's pretty unpleasant too.)

I think we'd be a lot better off to design this so that we don't need to
alter heap tuple xmin values...

regards, tom lane

#5Vadim Mikheev
vmikheev@reveredata.com
In reply to: Tom Lane (#4)
Re: Nested transactions: low level stuff

I see no concurrency problems. If two or more backends visit the same
tuple, they either write the same value to the same position which
doesn't hurt, or one sees the other's changes which is a good thing.

AFAIR, on multi-CPU platforms it's possible that second transaction could
see COMMITTED state but still old (subtrans id) in xmin: it's not
guaranteed that changes made on CPU1 (V1 was changed first, then V2 was
changed) will appear at the same order on CPU2 (V2 may come first, then V1).

Vadim

_____________________________________________________
Revere Data, LLC, formerly known as Sector Data, LLC, is not affiliated with
Sector, Inc., or SIAC.

#6Vadim Mikheev
vmikheev@reveredata.com
In reply to: Vadim Mikheev (#5)
Re: Nested transactions: low level stuff

I see no concurrency problems. If two or more backends visit the same
tuple, they either write the same value to the same position which
doesn't hurt, or one sees the other's changes which is a good thing.

AFAIR, on multi-CPU platforms it's possible that second transaction could
see COMMITTED state but still old (subtrans id) in xmin: it's not
guaranteed that changes made on CPU1 (V1 was changed first, then V2 was
changed) will appear at the same order on CPU2 (V2 may come first, then V1).

Vadim

_____________________________________________________
Revere Data, LLC, formerly known as Sector Data, LLC, is not affiliated with
Sector, Inc., or SIAC.

#7Manfred Koizar
mkoi-pg@aon.at
In reply to: Tom Lane (#4)
Re: Nested transactions: low level stuff

On Wed, 19 Mar 2003 13:00:07 -0500, Tom Lane <tgl@sss.pgh.pa.us>
wrote:

Manfred Koizar <mkoi-pg@aon.at> writes:

And if the change is lost, it can
be redone by the next backend visiting the tuple.

Not if the subtransaction log state has been removed as no longer
needed.

But this problem is not triggered by a tuple that has its xmin changed
by a visitor and then looses that change again. We'd have the same
problems with tuples that have never been visited (*). So we must
make sure that pg_subtrans segments are not discarded as long as they
are needed.

(*) I guess your argument is: VACUUM makes sure that all tuples have
been visited before it discards pg_subtrans segments.

With my 4-state-proposal VACUUM can decide whether a pg_subtrans
segment is still needed by only looking at pg_clog.

I think a WAL entry will be essential.

I'm still in doubt, but it's moot (see below).

I think we'd be a lot better off to design this so that we don't need to
alter heap tuple xmin values...

If Vadim remembers correctly we cannot safely change xmin, unless we
want to grab a write lock. Ok, we'll not change xmin and we'll not
set the commit bit before xmin is visible to all if xmin is a
subtransaction. We can always add this performance hack later, if
someone finds a safe implementation ...

Servus
Manfred

#8Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Manfred Koizar (#1)
Re: Nested transactions: low level stuff

Sorry I have a basic question.
Was there any consensus we would introduce nested transactions
(or savepoints) in the way currently discussed ?

regards,
Hiroshi Inoue

Manfred Koizar wrote:

On Wed, 19 Mar 2003 13:00:07 -0500, Tom Lane <tgl@sss.pgh.pa.us>
wrote:

Manfred Koizar <mkoi-pg@aon.at> writes:

And if the change is lost, it can
be redone by the next backend visiting the tuple.

Not if the subtransaction log state has been removed as no longer
needed.

But this problem is not triggered by a tuple that has its xmin changed
by a visitor and then looses that change again. We'd have the same
problems with tuples that have never been visited (*). So we must
make sure that pg_subtrans segments are not discarded as long as they
are needed.

(*) I guess your argument is: VACUUM makes sure that all tuples have
been visited before it discards pg_subtrans segments.

With my 4-state-proposal VACUUM can decide whether a pg_subtrans
segment is still needed by only looking at pg_clog.

I think a WAL entry will be essential.

I'm still in doubt, but it's moot (see below).

I think we'd be a lot better off to design this so that we don't need to
alter heap tuple xmin values...

If Vadim remembers correctly we cannot safely change xmin, unless we
want to grab a write lock. Ok, we'll not change xmin and we'll not
set the commit bit before xmin is visible to all if xmin is a
subtransaction. We can always add this performance hack later, if
someone finds a safe implementation ...

Servus
Manfred

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#8)
Re: Nested transactions: low level stuff

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Sorry I have a basic question.
Was there any consensus we would introduce nested transactions
(or savepoints) in the way currently discussed ?

I think we are a long way from saying we can or will actually do it.
Error handling and resource management (eg locks) are a couple of other
huge cans of worms that have yet to be opened. But certainly a solid
design for the transaction logging and tuple validity checking is a
necessary step.

My feeling is that the right way to proceed is to nail down a paper
design for each of the major aspects of the problem, before anyone
actually spends any time coding. There would be little point in
implementing subtransaction logging if we don't know how to do the
other things.

regards, tom lane

#10Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Manfred Koizar (#1)
Re: Nested transactions: low level stuff

Tom Lane wrote:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Sorry I have a basic question.
Was there any consensus we would introduce nested transactions
(or savepoints) in the way currently discussed ?

I think we are a long way from saying we can or will actually do it.
Error handling and resource management (eg locks) are a couple of other
huge cans of worms that have yet to be opened. But certainly a solid
design for the transaction logging and tuple validity checking is a
necessary step.

Is the way to undo data rejected already ?

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/

#11Bruce Momjian
bruce@momjian.us
In reply to: Hiroshi Inoue (#10)
Re: Nested transactions: low level stuff

Hiroshi Inoue wrote:

Tom Lane wrote:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Sorry I have a basic question.
Was there any consensus we would introduce nested transactions
(or savepoints) in the way currently discussed ?

I think we are a long way from saying we can or will actually do it.
Error handling and resource management (eg locks) are a couple of other
huge cans of worms that have yet to be opened. But certainly a solid
design for the transaction logging and tuple validity checking is a
necessary step.

Is the way to undo data rejected already ?

You mean abort subtransactions? Each subtransaction gets its own
transaction id, so we just mark that as aborted --- there is no undo of
tuples, though I had originally suggested that approach years ago.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#12Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Bruce Momjian (#11)
Re: Nested transactions: low level stuff

Bruce Momjian wrote:

Hiroshi Inoue wrote:

Tom Lane wrote:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Sorry I have a basic question.
Was there any consensus we would introduce nested transactions
(or savepoints) in the way currently discussed ?

I think we are a long way from saying we can or will actually do it.
Error handling and resource management (eg locks) are a couple of other
huge cans of worms that have yet to be opened. But certainly a solid
design for the transaction logging and tuple validity checking is a
necessary step.

Is the way to undo data rejected already ?

You mean abort subtransactions? Each subtransaction gets its own
transaction id, so we just mark that as aborted --- there is no undo of
tuples, though I had originally suggested that approach years ago.

Vadim planned to implement the savepoints functionality
using UNDO mechanism. AFAIR it was never denied explicitly.

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/

#13Bruce Momjian
bruce@momjian.us
In reply to: Hiroshi Inoue (#12)
Re: Nested transactions: low level stuff

Hiroshi Inoue wrote:

I think we are a long way from saying we can or will actually do it.
Error handling and resource management (eg locks) are a couple of other
huge cans of worms that have yet to be opened. But certainly a solid
design for the transaction logging and tuple validity checking is a
necessary step.

Is the way to undo data rejected already ?

You mean abort subtransactions? Each subtransaction gets its own
transaction id, so we just mark that as aborted --- there is no undo of
tuples, though I had originally suggested that approach years ago.

Vadim planned to implement the savepoints functionality
using UNDO mechanism. AFAIR it was never denied explicitly.

If you go to the TODO.detail/transactions archive, there was discussion
of using UNDO, and most felt that there were too many problems of having
to manage the undo system, and that assigning a separate transaction id
to every subtransaction was cleaner and more closely matched our
existing system. It also has zero time for undo, which is the case we
have for main transactions now.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#12)
Re: Nested transactions: low level stuff

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Bruce Momjian wrote:

You mean abort subtransactions? Each subtransaction gets its own
transaction id, so we just mark that as aborted --- there is no undo of
tuples, though I had originally suggested that approach years ago.

Vadim planned to implement the savepoints functionality
using UNDO mechanism. AFAIR it was never denied explicitly.

Given all the flak we got about WAL growth during the time we had that
code enabled, I think there's no chance that UNDO will be the preferred
path. It's not workable with big transactions.

There are other problems besides WAL bloat, too. I realized while I was
working on the btree code a few weeks ago that it's fundamentally
unfriendly to UNDO, because there are some operations you'd want to
UNDO (viz, insertion of a leaf item pointing at a heap tuple) and some
you would not (viz, splitting of index pages and subsequent insertion of
items into upper tree levels). But the same WAL entry might include
both kinds of operation. This could be got round, perhaps, but that
code is overcomplicated already ...

regards, tom lane

#15Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#14)
Re: Nested transactions: low level stuff

Tom Lane wrote:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Bruce Momjian wrote:

You mean abort subtransactions? Each subtransaction gets its own
transaction id, so we just mark that as aborted --- there is no undo of
tuples, though I had originally suggested that approach years ago.

Vadim planned to implement the savepoints functionality
using UNDO mechanism. AFAIR it was never denied explicitly.

Given all the flak we got about WAL growth during the time we had that
code enabled, I think there's no chance that UNDO will be the preferred
path. It's not workable with big transactions.

There are other problems besides WAL bloat, too. I realized while I was
working on the btree code a few weeks ago that it's fundamentally
unfriendly to UNDO, because there are some operations you'd want to
UNDO (viz, insertion of a leaf item pointing at a heap tuple) and some
you would not (viz, splitting of index pages and subsequent insertion of
items into upper tree levels). But the same WAL entry might include
both kinds of operation. This could be got round, perhaps, but that
code is overcomplicated already ...

I assumed the UNDO would have had to be in a separate place, or allow
compression of the WAL file to keep needed UNDO stuff but get rid of
unneeded stuff --- it was all quite complicated.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#16Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#15)
Re: Nested transactions: low level stuff

Hiroshi Inoue wrote:

Vadim planned to implement the savepoints functionality
using UNDO mechanism. AFAIR it was never denied explicitly.

If you go to the TODO.detail/transactions archive, there was discussion
of using UNDO, and most felt that there were too many problems of having
to manage the undo system,

This is closely related to the basics of PostgreSQL.
Pleas don't decide it implicitly.

We took a vote and UNDO lost --- do you want to do another vote?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#17Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Bruce Momjian (#13)
Re: Nested transactions: low level stuff

Bruce Momjian wrote:

Hiroshi Inoue wrote:

I think we are a long way from saying we can or will actually do it.
Error handling and resource management (eg locks) are a couple of other
huge cans of worms that have yet to be opened. But certainly a solid
design for the transaction logging and tuple validity checking is a
necessary step.

Is the way to undo data rejected already ?

You mean abort subtransactions? Each subtransaction gets its own
transaction id, so we just mark that as aborted --- there is no undo of
tuples, though I had originally suggested that approach years ago.

Vadim planned to implement the savepoints functionality
using UNDO mechanism. AFAIR it was never denied explicitly.

If you go to the TODO.detail/transactions archive, there was discussion
of using UNDO, and most felt that there were too many problems of having
to manage the undo system,

This is closely related to the basics of PostgreSQL.
Pleas don't decide it implicitly.

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#15)
Re: Nested transactions: low level stuff

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

There are other problems besides WAL bloat, too.

I assumed the UNDO would have had to be in a separate place, or allow
compression of the WAL file to keep needed UNDO stuff but get rid of
unneeded stuff --- it was all quite complicated.

There is a mechanism in the XLOG code to distinguish "in transaction"
from "out of transaction" WAL entries. The thing I had not realized
before working on the btree code is that that mechanism is inadequate
for UNDO.

regards, tom lane

#19Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Bruce Momjian (#16)
Re: Nested transactions: low level stuff

Bruce Momjian wrote:

Hiroshi Inoue wrote:

Vadim planned to implement the savepoints functionality
using UNDO mechanism. AFAIR it was never denied explicitly.

If you go to the TODO.detail/transactions archive, there was discussion
of using UNDO, and most felt that there were too many problems of having
to manage the undo system,

This is closely related to the basics of PostgreSQL.
Pleas don't decide it implicitly.

We took a vote and UNDO lost --- do you want to do another vote?

Sorry I missed the vote. Where is it ?

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/

#20Vadim Mikheev
vmikheev@reveredata.com
In reply to: Bruce Momjian (#11)
Re: Nested transactions: low level stuff

Given all the flak we got about WAL growth during the time we had that
code enabled, I think there's no chance that UNDO will be the preferred
path. It's not workable with big transactions.

Somehow it's working in other DB systems.

There are other problems besides WAL bloat, too. I realized while I was
working on the btree code a few weeks ago that it's fundamentally
unfriendly to UNDO, because there are some operations you'd want to
UNDO (viz, insertion of a leaf item pointing at a heap tuple) and some
you would not (viz, splitting of index pages and subsequent insertion of
items into upper tree levels). But the same WAL entry might include
both kinds of operation. This could be got round, perhaps, but that
code is overcomplicated already ...

Each access-method requires specific UNDO code (like REDO).
Once again, it works in other DB-es.

Vadim

#21Bruce Momjian
bruce@momjian.us
In reply to: Hiroshi Inoue (#19)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vadim Mikheev (#20)
#23Vadim Mikheev
vmikheev@reveredata.com
In reply to: Bruce Momjian (#11)
#24Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Tom Lane (#9)
#25Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#24)
#27Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Tom Lane (#26)
#28Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Bruce Momjian (#21)
#29Bruce Momjian
bruce@momjian.us
In reply to: Hiroshi Inoue (#28)
#30Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#29)
#31Vadim Mikheev
vmikheev@reveredata.com
In reply to: Bruce Momjian (#29)
#32Bruce Momjian
bruce@momjian.us
In reply to: Vadim Mikheev (#31)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#32)
#34Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#33)