Nested transactions: low level stuff

tgl@sss.pgh.pa.us

about 23 years ago

In reply to: Manfred Koizar (#1)

Re: Nested transactions: low level stuff

Manfred Koizar <mkoi-pg@aon.at> writes:

If we set XMIN/MAX_IS_COMMITTED in a tuple header, we have to replace
a sub-transaction xid in xmin or xmax respectively with the
main-transaction xid at the same time. Otherwise we'd have to look
for the main xid, whenever a tuple is touched.

This worries me --- it changes a safe operation (OR'ing in a commit bit)
into an unsafe one that requires exclusive lock on the page containing
the tuple. I'm also concerned that we'd now need a WAL entry to record
the xid change (are we dependent on this change occurring for correctness?
or is it only performance?)

Perhaps it would be better to leave the tuple-commit bit unset until we
have been able to change the clog state to 01 ("committed to everyone").

Tom:

I think it would be preferable to use only three states: active,
aborted, committed.

Con: Needs subtrans tree navigation from parent to child.

But only in the backend owning the transaction; there's no need for
shared state that allows it.

Sorry for the long post. Would you prefer such kind of stuff on a web
page and just a short note with the URL to the list?

No. This way it gets into the list archives.

regards, tom lane

Manfred Koizar

mkoi-pg@aon.at

about 23 years ago

In reply to: Tom Lane (#2)

Re: Nested transactions: low level stuff

On Wed, 19 Mar 2003 11:18:38 -0500, Tom Lane <tgl@sss.pgh.pa.us>
wrote:

Manfred Koizar <mkoi-pg@aon.at> writes:

If we set XMIN/MAX_IS_COMMITTED in a tuple header, we have to replace
a sub-transaction xid in xmin or xmax respectively with the
main-transaction xid at the same time. Otherwise we'd have to look
for the main xid, whenever a tuple is touched.

This worries me --- it changes a safe operation (OR'ing in a commit bit)
into an unsafe one that requires exclusive lock on the page containing
the tuple.

[Only talking about xmin here, but everything refers to xmax as well.]
I was hoping we could set xmin atomically without holding a lock. If
we can, we first set xmin to the main xid. The new state is still
consistent; now it looks as if the change has been made directly by
the main transaction and not by one of its subtransactions, which is
ok after the main transaction has committed (we are only talking about
cases where all interesting transactions have committed). As a second
step we update the commit bit which is as safe as it is now.

I see no concurrency problems. If two or more backends visit the same
tuple, they either write the same value to the same position which
doesn't hurt, or one sees the other's changes which is a good thing.

So this boils down to whether setting the value of a properly aligned
32 bit variable in shared memory is an atomic operation on all
supported platforms. I don't know enough about compilers to answer
this question.

I'm also concerned that we'd now need a WAL entry to record
the xid change

If the sequence is "first update xmin, then set the commit bit", we
never have an inconsistent state. And if the change is lost, it can
be redone by the next backend visiting the tuple. So I think we don't
need a WAL entry.

(are we dependent on this change occurring for correctness?
or is it only performance?)

The latter.

Perhaps it would be better to leave the tuple-commit bit unset until we
have been able to change the clog state to 01 ("committed to everyone").

At least we can fall back to this, if we can't find out how to update
the xid safely.

Servus
Manfred

tgl@sss.pgh.pa.us

about 23 years ago

In reply to: Manfred Koizar (#3)

Re: Nested transactions: low level stuff

Manfred Koizar <mkoi-pg@aon.at> writes:

If the sequence is "first update xmin, then set the commit bit", we
never have an inconsistent state. And if the change is lost, it can
be redone by the next backend visiting the tuple.

Not if the subtransaction log state has been removed as no longer
needed. I think a WAL entry will be essential. (An alternative
might be to keep subtransaction state as long as we keep pg_clog
state, but that's pretty unpleasant too.)

I think we'd be a lot better off to design this so that we don't need to
alter heap tuple xmin values...

regards, tom lane
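
The objection above can be sketched in a few lines. This is only a toy model, assuming a dictionary stands in for the subtransaction log; `SubtransLog`, `visit`, `demo` and the xid values are hypothetical names made up for illustration, not PostgreSQL code. It shows that "redo by the next visitor" works only while the sub-to-main mapping is still on hand.

```python
class SubtransLog:
    """Toy stand-in for the subtransaction log: xid -> parent xid (None = top level)."""

    def __init__(self):
        self._parent = {}

    def record(self, xid, parent=None):
        self._parent[xid] = parent

    def truncate_before(self, cutoff_xid):
        """Discard entries for xids older than cutoff_xid, as if old segments were removed."""
        self._parent = {x: p for x, p in self._parent.items() if x >= cutoff_xid}

    def main_xid(self, xid):
        """Walk up to the top-level xid, or return None if the mapping is gone."""
        while True:
            if xid not in self._parent:
                return None
            parent = self._parent[xid]
            if parent is None:
                return xid
            xid = parent


def visit(tuple_header, subtrans):
    """Replace a subtransaction xmin with its main xid if the mapping is still available."""
    main = subtrans.main_xid(tuple_header["xmin"])
    if main is None:
        return False  # nothing left to redo the change from
    tuple_header["xmin"] = main
    return True


def demo():
    log = SubtransLog()
    log.record(100)              # hypothetical main transaction
    log.record(101, parent=100)  # hypothetical subtransaction
    header = {"xmin": 101}
    first = visit(header, log)   # mapping present: xmin is rewritten
    header["xmin"] = 101         # the unlogged change is lost again
    log.truncate_before(200)     # subtransaction state removed as no longer needed
    second = visit(header, log)  # mapping gone: the change cannot be redone
    return first, second, header["xmin"]
```

In this model the second visit has no way to recover the main xid, which is the case a WAL entry (or longer retention of subtransaction state) would have to cover.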

Vadim Mikheev

vmikheev@reveredata.com

about 23 years ago

In reply to: Tom Lane (#4)

Re: Nested transactions: low level stuff

I see no concurrency problems. If two or more backends visit the same
tuple, they either write the same value to the same position which
doesn't hurt, or one sees the other's changes which is a good thing.

AFAIR, on multi-CPU platforms it's possible that a second transaction could
see the COMMITTED state but still the old value (the subtransaction id) in
xmin: it's not guaranteed that changes made on CPU1 (V1 was changed first,
then V2) will appear in the same order on CPU2 (V2 may arrive first, then V1).

Vadim

_____________________________________________________
Revere Data, LLC, formerly known as Sector Data, LLC, is not affiliated with
Sector, Inc., or SIAC.
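
A small simulation of the ordering hazard described in this message, assuming nothing (no lock, no memory barrier) forces the two stores to become visible in program order. The xid values, the flag value and all function names are illustrative assumptions, not taken from the thread or from PostgreSQL source.

```python
from itertools import permutations

SUB_XID = 1001           # hypothetical subtransaction xid currently in xmin
MAIN_XID = 1000          # hypothetical main transaction xid
XMIN_COMMITTED = 0x0100  # illustrative bit for the xmin commit hint


def store_xmin(header):
    """V1: overwrite the subtransaction xid in xmin with the main xid."""
    new = dict(header)
    new["xmin"] = MAIN_XID
    return new


def store_commit_bit(header):
    """V2: OR the commit hint bit into the tuple's flag word."""
    new = dict(header)
    new["infomask"] |= XMIN_COMMITTED
    return new


def observable_states(visibility_order):
    """Every header state another CPU could read as the stores become visible."""
    state = {"xmin": SUB_XID, "infomask": 0}
    seen = [state]
    for store in visibility_order:
        state = store(state)
        seen.append(state)
    return seen


def is_inconsistent(header):
    """Commit bit set while xmin still holds the subtransaction xid."""
    return bool(header["infomask"] & XMIN_COMMITTED) and header["xmin"] == SUB_XID


def inconsistent_orders():
    """Names of the visibility orders that expose an inconsistent header."""
    program_order = (store_xmin, store_commit_bit)
    bad = []
    for order in permutations(program_order):
        if any(is_inconsistent(s) for s in observable_states(order)):
            bad.append([f.__name__ for f in order])
    return bad
```

Only the order in which the commit bit becomes visible before the new xmin produces the bad state, so the "xmin first, then commit bit" sequence is safe only if other CPUs are guaranteed to observe the stores in that order.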

Vadim Mikheev

vmikheev@reveredata.com

about 23 years ago

In reply to: Vadim Mikheev (#5)

Re: Nested transactions: low level stuff

I see no concurrency problems. If two or more backends visit the same
tuple, they either write the same value to the same position which
doesn't hurt, or one sees the other's changes which is a good thing.

Vadim

Manfred Koizar

mkoi-pg@aon.at

about 23 years ago

In reply to: Tom Lane (#4)

Re: Nested transactions: low level stuff

On Wed, 19 Mar 2003 13:00:07 -0500, Tom Lane <tgl@sss.pgh.pa.us>
wrote:

Manfred Koizar <mkoi-pg@aon.at> writes:

And if the change is lost, it can
be redone by the next backend visiting the tuple.

Not if the subtransaction log state has been removed as no longer
needed.

But this problem is not triggered by a tuple that has its xmin changed
by a visitor and then loses that change again. We'd have the same
problems with tuples that have never been visited (*). So we must
make sure that pg_subtrans segments are not discarded as long as they
are needed.

(*) I guess your argument is: VACUUM makes sure that all tuples have
been visited before it discards pg_subtrans segments.

With my 4-state-proposal VACUUM can decide whether a pg_subtrans
segment is still needed by only looking at pg_clog.

I think a WAL entry will be essential.

I'm still in doubt, but it's moot (see below).

I think we'd be a lot better off to design this so that we don't need to
alter heap tuple xmin values...

If Vadim remembers correctly we cannot safely change xmin, unless we
want to grab a write lock. Ok, we'll not change xmin and we'll not
set the commit bit before xmin is visible to all if xmin is a
subtransaction. We can always add this performance hack later, if
someone finds a safe implementation ...

Servus
Manfred

Inoue@tpf.co.jp

about 23 years ago

In reply to: Manfred Koizar (#1)

Re: Nested transactions: low level stuff

Sorry I have a basic question.
Was there any consensus we would introduce nested transactions
(or savepoints) in the way currently discussed ?

regards,
Hiroshi Inoue

Manfred Koizar wrote:

On Wed, 19 Mar 2003 13:00:07 -0500, Tom Lane <tgl@sss.pgh.pa.us>
wrote:

Manfred Koizar <mkoi-pg@aon.at> writes:

And if the change is lost, it can
be redone by the next backend visiting the tuple.

Not if the subtransaction log state has been removed as no longer
needed.

But this problem is not triggered by a tuple that has its xmin changed
by a visitor and then loses that change again. We'd have the same
problems with tuples that have never been visited (*). So we must
make sure that pg_subtrans segments are not discarded as long as they
are needed.

(*) I guess your argument is: VACUUM makes sure that all tuples have
been visited before it discards pg_subtrans segments.

With my 4-state-proposal VACUUM can decide whether a pg_subtrans
segment is still needed by only looking at pg_clog.

I think a WAL entry will be essential.

I'm still in doubt, but it's moot (see below).

I think we'd be a lot better off to design this so that we don't need to
alter heap tuple xmin values...

If Vadim remembers correctly we cannot safely change xmin, unless we
want to grab a write lock. Ok, we'll not change xmin and we'll not
set the commit bit before xmin is visible to all if xmin is a
subtransaction. We can always add this performance hack later, if
someone finds a safe implementation ...

Servus
Manfred

--
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/

tgl@sss.pgh.pa.us

about 23 years ago

In reply to: Hiroshi Inoue (#8)

Re: Nested transactions: low level stuff

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Sorry I have a basic question.
Was there any consensus we would introduce nested transactions
(or savepoints) in the way currently discussed ?

I think we are a long way from saying we can or will actually do it.
Error handling and resource management (eg locks) are a couple of other
huge cans of worms that have yet to be opened. But certainly a solid
design for the transaction logging and tuple validity checking is a
necessary step.

My feeling is that the right way to proceed is to nail down a paper
design for each of the major aspects of the problem, before anyone
actually spends any time coding. There would be little point in
implementing subtransaction logging if we don't know how to do the
other things.

regards, tom lane

#10

Inoue@tpf.co.jp

about 23 years ago

In reply to: Manfred Koizar (#1)

Re: Nested transactions: low level stuff

Tom Lane wrote:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Sorry I have a basic question.
Was there any consensus we would introduce nested transactions
(or savepoints) in the way currently discussed ?

I think we are a long way from saying we can or will actually do it.
Error handling and resource management (eg locks) are a couple of other
huge cans of worms that have yet to be opened. But certainly a solid
design for the transaction logging and tuple validity checking is a
necessary step.

Is the way to undo data rejected already ?

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/

#11

bruce@momjian.us

about 23 years ago

In reply to: Hiroshi Inoue (#10)

Re: Nested transactions: low level stuff

Hiroshi Inoue wrote:

Tom Lane wrote:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Sorry I have a basic question.
Was there any consensus we would introduce nested transactions
(or savepoints) in the way currently discussed ?

I think we are a long way from saying we can or will actually do it.
Error handling and resource management (eg locks) are a couple of other
huge cans of worms that have yet to be opened. But certainly a solid
design for the transaction logging and tuple validity checking is a
necessary step.

Is the way to undo data rejected already ?

You mean abort subtransactions? Each subtransaction gets its own
transaction id, so we just mark that as aborted --- there is no undo of
tuples, though I had originally suggested that approach years ago.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
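
The approach described in this message (a separate transaction id per subtransaction, abort by marking that id, no undo of tuples) can be sketched as follows. This is a simplified model under stated assumptions: three states per xid, a parent link per subtransaction, and a row counting as visible only when its inserting xid and every ancestor are committed. `TxnLog`, `visible_rows` and `demo` are hypothetical names; real visibility rules involve snapshots and much more.

```python
from enum import Enum


class XidStatus(Enum):
    IN_PROGRESS = 0
    COMMITTED = 1
    ABORTED = 2


class TxnLog:
    """Toy commit log: one status per xid, plus a parent link for subtransactions."""

    def __init__(self):
        self.status = {}
        self.parent = {}
        self._next_xid = 1

    def begin(self, parent=None):
        """Start a transaction (or a subtransaction of `parent`) with its own xid."""
        xid = self._next_xid
        self._next_xid += 1
        self.status[xid] = XidStatus.IN_PROGRESS
        self.parent[xid] = parent
        return xid

    def commit(self, xid):
        self.status[xid] = XidStatus.COMMITTED

    def abort(self, xid):
        # Abort is a single status change; no heap tuple is touched.
        self.status[xid] = XidStatus.ABORTED

    def committed_to_top(self, xid):
        """True only if xid and all of its ancestors are committed."""
        while xid is not None:
            if self.status[xid] is not XidStatus.COMMITTED:
                return False
            xid = self.parent[xid]
        return True


def visible_rows(heap, log):
    """Rows whose inserting xid is committed all the way up to the main transaction."""
    return [row for xmin, row in heap if log.committed_to_top(xmin)]


def demo():
    log = TxnLog()
    main = log.begin()
    heap = [(main, "a")]
    sub1 = log.begin(parent=main)
    heap.append((sub1, "b"))
    sub2 = log.begin(parent=main)
    heap.append((sub2, "c"))
    log.abort(sub2)   # roll back the second subtransaction only
    log.commit(sub1)
    log.commit(main)
    return visible_rows(heap, log), len(heap)
```

The row written by the aborted subtransaction stays in the heap but is filtered out at read time, which is why abort takes no extra work in this scheme.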

#12

Inoue@tpf.co.jp

about 23 years ago

In reply to: Bruce Momjian (#11)

Re: Nested transactions: low level stuff

Bruce Momjian wrote:

Hiroshi Inoue wrote:

Tom Lane wrote:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Sorry I have a basic question.
Was there any consensus we would introduce nested transactions
(or savepoints) in the way currently discussed ?

I think we are a long way from saying we can or will actually do it.
Error handling and resource management (eg locks) are a couple of other
huge cans of worms that have yet to be opened. But certainly a solid
design for the transaction logging and tuple validity checking is a
necessary step.

Is the way to undo data rejected already ?

You mean abort subtransactions? Each subtransaction gets its own
transaction id, so we just mark that as aborted --- there is no undo of
tuples, though I had originally suggested that approach years ago.

Vadim planned to implement the savepoints functionality
using UNDO mechanism. AFAIR it was never denied explicitly.

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/

#13

bruce@momjian.us

about 23 years ago

In reply to: Hiroshi Inoue (#12)

Re: Nested transactions: low level stuff

Hiroshi Inoue wrote:

I think we are a long way from saying we can or will actually do it.
Error handling and resource management (eg locks) are a couple of other
huge cans of worms that have yet to be opened. But certainly a solid
design for the transaction logging and tuple validity checking is a
necessary step.

Is the way to undo data rejected already ?

You mean abort subtransactions? Each subtransaction gets its own
transaction id, so we just mark that as aborted --- there is no undo of
tuples, though I had originally suggested that approach years ago.

Vadim planned to implement the savepoints functionality
using UNDO mechanism. AFAIR it was never denied explicitly.

If you go to the TODO.detail/transactions archive, there was discussion
of using UNDO, and most felt that there were too many problems of having
to manage the undo system, and that assigning a separate transaction id
to every subtransaction was cleaner and more closely matched our
existing system. It also has zero time for undo, which is the case we
have for main transactions now.

#14

tgl@sss.pgh.pa.us

about 23 years ago

In reply to: Hiroshi Inoue (#12)

Re: Nested transactions: low level stuff

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Bruce Momjian wrote:

You mean abort subtransactions? Each subtransaction gets its own
transaction id, so we just mark that as aborted --- there is no undo of
tuples, though I had originally suggested that approach years ago.

Vadim planned to implement the savepoints functionality
using UNDO mechanism. AFAIR it was never denied explicitly.

Given all the flak we got about WAL growth during the time we had that
code enabled, I think there's no chance that UNDO will be the preferred
path. It's not workable with big transactions.

There are other problems besides WAL bloat, too. I realized while I was
working on the btree code a few weeks ago that it's fundamentally
unfriendly to UNDO, because there are some operations you'd want to
UNDO (viz, insertion of a leaf item pointing at a heap tuple) and some
you would not (viz, splitting of index pages and subsequent insertion of
items into upper tree levels). But the same WAL entry might include
both kinds of operation. This could be got round, perhaps, but that
code is overcomplicated already ...

regards, tom lane

#15

bruce@momjian.us

about 23 years ago

In reply to: Tom Lane (#14)

Re: Nested transactions: low level stuff

Tom Lane wrote:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Bruce Momjian wrote:

You mean abort subtransactions? Each subtransaction gets its own
transaction id, so we just mark that as aborted --- there is no undo of
tuples, though I had originally suggested that approach years ago.

Vadim planned to implement the savepoints functionality
using UNDO mechanism. AFAIR it was never denied explicitly.

Given all the flak we got about WAL growth during the time we had that
code enabled, I think there's no chance that UNDO will be the preferred
path. It's not workable with big transactions.

There are other problems besides WAL bloat, too. I realized while I was
working on the btree code a few weeks ago that it's fundamentally
unfriendly to UNDO, because there are some operations you'd want to
UNDO (viz, insertion of a leaf item pointing at a heap tuple) and some
you would not (viz, splitting of index pages and subsequent insertion of
items into upper tree levels). But the same WAL entry might include
both kinds of operation. This could be got round, perhaps, but that
code is overcomplicated already ...

I assumed the UNDO would have had to be in a separate place, or allow
compression of the WAL file to keep needed UNDO stuff but get rid of
unneeded stuff --- it was all quite complicated.

#16

bruce@momjian.us

about 23 years ago

In reply to: Bruce Momjian (#15)

Re: Nested transactions: low level stuff

Hiroshi Inoue wrote:

Vadim planned to implement the savepoints functionality
using UNDO mechanism. AFAIR it was never denied explicitly.

If you go to the TODO.detail/transactions archive, there was discussion
of using UNDO, and most felt that there were too many problems of having
to manage the undo system,

This is closely related to the basics of PostgreSQL.
Please don't decide it implicitly.

We took a vote and UNDO lost --- do you want to do another vote?

#17

Inoue@tpf.co.jp

about 23 years ago

In reply to: Bruce Momjian (#13)

Re: Nested transactions: low level stuff

Bruce Momjian wrote:

Hiroshi Inoue wrote:

I think we are a long way from saying we can or will actually do it.
Error handling and resource management (eg locks) are a couple of other
huge cans of worms that have yet to be opened. But certainly a solid
design for the transaction logging and tuple validity checking is a
necessary step.

Is the way to undo data rejected already ?

You mean abort subtransactions? Each subtransaction gets its own
transaction id, so we just mark that as aborted --- there is no undo of
tuples, though I had originally suggested that approach years ago.

Vadim planned to implement the savepoints functionality
using UNDO mechanism. AFAIR it was never denied explicitly.

If you go to the TODO.detail/transactions archive, there was discussion
of using UNDO, and most felt that there were too many problems of having
to manage the undo system,

This is closely related to the basics of PostgreSQL.
Please don't decide it implicitly.

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/

#18

tgl@sss.pgh.pa.us

about 23 years ago

In reply to: Bruce Momjian (#15)

Re: Nested transactions: low level stuff

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

There are other problems besides WAL bloat, too.

I assumed the UNDO would have had to be in a separate place, or allow
compression of the WAL file to keep needed UNDO stuff but get rid of
unneeded stuff --- it was all quite complicated.

There is a mechanism in the XLOG code to distinguish "in transaction"
from "out of transaction" WAL entries. The thing I had not realized
before working on the btree code is that that mechanism is inadequate
for UNDO.

regards, tom lane

#19

Inoue@tpf.co.jp

about 23 years ago

In reply to: Bruce Momjian (#16)

Re: Nested transactions: low level stuff

Bruce Momjian wrote:

Hiroshi Inoue wrote:

Vadim planned to implement the savepoints functionality
using UNDO mechanism. AFAIR it was never denied explicitly.

If you go to the TODO.detail/transactions archive, there was discussion
of using UNDO, and most felt that there were too many problems of having
to manage the undo system,

This is closely related to the basics of PostgreSQL.
Please don't decide it implicitly.

We took a vote and UNDO lost --- do you want to do another vote?

Sorry I missed the vote. Where is it ?

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/

#20

Vadim Mikheev

vmikheev@reveredata.com

about 23 years ago

In reply to: Bruce Momjian (#11)

Re: Nested transactions: low level stuff

Given all the flak we got about WAL growth during the time we had that
code enabled, I think there's no chance that UNDO will be the preferred
path. It's not workable with big transactions.

Somehow it's working in other DB systems.

There are other problems besides WAL bloat, too. I realized while I was
working on the btree code a few weeks ago that it's fundamentally
unfriendly to UNDO, because there are some operations you'd want to
UNDO (viz, insertion of a leaf item pointing at a heap tuple) and some
you would not (viz, splitting of index pages and subsequent insertion of
items into upper tree levels). But the same WAL entry might include
both kinds of operation. This could be got round, perhaps, but that
code is overcomplicated already ...

Each access-method requires specific UNDO code (like REDO).
Once again, it works in other DB-es.

Vadim

#21

bruce@momjian.us

about 23 years ago

In reply to: Hiroshi Inoue (#19)

#22