mvcc & DML on the same row

Started by matteo durighettoabout 15 years ago5 messages
#1matteo durighetto
desmodemone@gmail.com

Hi,
I have an idea about mvcc and different DML of the same row in the
same transaction.
Normally when a backend do an unpdate on a row ( call it X ) , we done
an insert and logical delete on this row (0,1,2..N are the "version
of the row) :

X0 (delete "old" row)
X1 (insert "new" row)

if we continue the transaction and we do for example another update
on this row (X) , we again redo the same operation:

X0 (deleted "old" row)
X1 (row inserted, NOW deleted) => not needed for rollback
X2 (insert "new" row )

But why we need all these versions of the same row on table, if for
rollback we need only the original row X (X0) ?

So I think we need it in memory, not on physical space of table (ok
there is the cache, but ..) or something similar, or this method is
for transaction with isolation level at "read uncommited"?

Kind Regards

Matteo Durighetto

-----------------------

desmodemone@gmail.com
m.durighetto@miriade.it

#2Robert Haas
robertmhaas@gmail.com
In reply to: matteo durighetto (#1)
Re: mvcc & DML on the same row

On Wed, Dec 15, 2010 at 2:50 PM, matteo durighetto
<desmodemone@gmail.com> wrote:

Hi,
   I have an idea about mvcc and different DML of the same row in the
same transaction.
Normally when a backend do an unpdate on a row ( call it X ) , we done
an insert and logical delete on this row  (0,1,2..N are the "version
of the row) :

  X0  (delete "old" row)
  X1  (insert  "new" row)

if  we continue the transaction and we do for example another update
on this row (X) , we again redo the same operation:

  X0  (deleted "old" row)
  X1  (row inserted, NOW deleted) => not needed for rollback
  X2  (insert "new" row  )

But why we need all these versions of the same row on table, if for
rollback we need only the original row X (X0) ?

The fact that we can't get rid of X1 until after the transaction
commits is an implementation limitation. But you obviously need both
X0 and X2, because the transaction might either commit or abort.

So I think we need it in memory, not on physical space of table (ok
there is the cache, but ..) or something similar, or this method is
for transaction with isolation level at "read uncommited"?

I can't figure out exactly what this part is talking about. It's
completely impractical to add rows to a table without writing them
into shared buffers, which means they will eventually get flushed to
disk if not vacuumed, dropped, etc. first. We don't support read
uncommitted anyway (well, we do, but it's really still snapshot
isolation).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Nicolas Barbier
nicolas.barbier@gmail.com
In reply to: matteo durighetto (#1)
Re: mvcc & DML on the same row

2010/12/15 matteo durighetto <desmodemone@gmail.com>:

But why we need all these versions of the same row on table, if for
rollback we need only the original row X (X0) ?

And the "previous" value of row X during the execution of a statement
(because statements don't see their own changes, think INSERT INTO a
SELECT * FROM a). And any values that we might need to ROLLBACK TO
SAVEPOINT to.

So I think we need it in memory, not on physical space of table (ok
there is the cache, but ..) or something similar

It must be possible to push out those changes from memory to disk
anyway, because there is no limit on how many rows a transaction can
update in PostgreSQL (vs. Oracle's "snapshot too old" problems). But
then, keeping the locally updated rows in some kind of special
per-transaction cache or in the global page cache isn't that
different.

Also, updating the same row many times in one transaction is probably
not regarded a very typical use case.

Note that other DBMSs may implement MVCC more along the lines you
specified; AFAIR, InnoDB uses such an approach. This may mean that
they don't need VACUUM.

I think that the consensus is that there is a trade-off between doing
VACUUM-like things synchronously, or having the possibility to do it
asynchronously at times when load is low. In PostgreSQL, the latter
was chosen.

Btw, this topic has been discussed at length in the past, please check
the archive.

or this method is for transaction with isolation level at "read
uncommited"?

PostgreSQL implements READ UNCOMMITTED as READ COMMITTED (providing a
higher level of isolation than requested is allowed by the standard),
so that is definitely not the reason.

Nicolas

#4Simon Riggs
simon@2ndQuadrant.com
In reply to: matteo durighetto (#1)
Re: mvcc & DML on the same row

On Wed, 2010-12-15 at 20:50 +0100, matteo durighetto wrote:

if we continue the transaction and we do for example another update
on this row (X) , we again redo the same operation:

X0 (deleted "old" row)
X1 (row inserted, NOW deleted) => not needed for rollback
X2 (insert "new" row )

This situation has a simple user-space solution: only make one update to
a row, rather than two.

But why we need all these versions of the same row on table, if for
rollback we need only the original row X (X0) ?

X1 cannot be removed because X0 points to it, via its t_ctid field.

In order to remove X1 we would need to change X0 to point to X2, which
we don't do because we're not allowed to update in place. Even if we
could, I'm not sure this case is frequent enough to be worth the effort.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#5Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#4)
Re: mvcc & DML on the same row

On Thursday 16 December 2010 15:11:01 Simon Riggs wrote:

In order to remove X1 we would need to change X0 to point to X2, which
we don't do because we're not allowed to update in place. Even if we
could, I'm not sure this case is frequent enough to be worth the effort.

Especially as X3 would need to fit into X2's space for it to be beneficial...

Andres