More FOR UPDATE/FOR SHARE problems
This post is a follow-up of an off-list discussion with Nathan Boley.
All references to FOR UPDATE apply to FOR SHARE as well.
create table a(i int, j int);
insert into a values(1, 10);
insert into a values(2, 10);
insert into a values(3, 10);
insert into a values(4, 20);
insert into a values(5, 20);
insert into a values(6, 20);
Session 1:
BEGIN;
UPDATE a SET j = (j - 10) WHERE i = 3 OR i = 4;
Session 2:
SELECT * FROM a WHERE j = 10 FOR UPDATE; -- blocks
Session 1:
COMMIT;
Session 2 (results):
i | j
---+----
1 | 10
2 | 10
(2 rows)
There you see a snapshot of the table that never existed. Either the
snapshot was taken before the UPDATE, in which case i=3 should be
included, or it was taken after the UPDATE, in which case i=4 should be
included. So atomicity is broken for WHERE.
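To make the impossibility concrete: what a consistent snapshot taken either before or after the UPDATE would return can be computed mechanically. A toy Python check of the set logic of the argument (not PostgreSQL code):

```python
# Table contents before and after Session 1's UPDATE
# (UPDATE a SET j = j - 10 WHERE i = 3 OR i = 4).
before = {1: 10, 2: 10, 3: 10, 4: 20, 5: 20, 6: 20}
after = {i: (j - 10 if i in (3, 4) else j) for i, j in before.items()}

def where_j_10(snapshot):
    """Rows a consistent snapshot would return for WHERE j = 10."""
    return sorted(i for i, j in snapshot.items() if j == 10)

snapshot_before = where_j_10(before)   # [1, 2, 3]
snapshot_after = where_j_10(after)     # [1, 2, 4]
observed = [1, 2]                      # what FOR UPDATE actually returned

# The observed result matches neither consistent snapshot.
print(snapshot_before, snapshot_after, observed)
```

Either consistent snapshot contains a third row; the observed result contains neither.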
So, FOR UPDATE produces known incorrect results for:
* WHERE
* ORDER BY:
http://archives.postgresql.org/pgsql-bugs/2009-01/msg00017.php
* LIMIT
* SAVEPOINT/ROLLBACK TO
And I expect we'll find more, as well.
It's not simply that FOR UPDATE works strangely in a couple isolated
edge cases, as the docs imply. It works contrary to the basic
assumptions that people familiar with PostgreSQL rely on. Furthermore,
the people using FOR UPDATE are likely to be the people who care about
these edge cases.
I think that FOR UPDATE deserves a jarring disclaimer in the docs if we
maintain the current behavior. Something along the lines of "this does
not follow normal transactional semantics and will produce incorrect
results". Existing users may find current FOR UPDATE behavior useful to
avoid full-table locks, but they should be properly warned.
If there is a fix, the only thing that I can imagine working (aside from
a full table lock) would be to iteratively acquire new snapshots and
re-run the query until no concurrent transaction interferes.
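For illustration, the retry loop I have in mind looks roughly like this (a Python sketch with hypothetical hook functions; the real thing would live in the executor):

```python
def select_for_update(run_query, take_snapshot, row_changed_since,
                      max_retries=100):
    """Hypothetical retry loop: re-run the query under fresh snapshots
    until no concurrent transaction has invalidated the result."""
    for _ in range(max_retries):
        snapshot = take_snapshot()
        rows = run_query(snapshot)          # evaluate WHERE/ORDER BY/LIMIT
        # lock the rows here, then check whether any row moved under us
        if not any(row_changed_since(row, snapshot) for row in rows):
            return rows                     # snapshot still consistent
        # some row was updated after our snapshot: retry with a new one
    raise RuntimeError("could not obtain a stable snapshot")

# Toy demo: the first snapshot is invalidated once, the second succeeds.
attempts = []
def take_snapshot():
    attempts.append(1)
    return len(attempts)
def run_query(snap):
    return [1, 2]
def row_changed_since(row, snap):
    return snap == 1        # pretend a concurrent commit hit snapshot 1

result = select_for_update(run_query, take_snapshot, row_changed_since)
```

The cost is obvious: each interfering commit forces a full re-execution, which is why it might only be tolerable for selective queries.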
Regards,
Jeff Davis
There already is quite an extensive discussion of how FOR UPDATE
behaves including these kinds of violations.
What you propose is interesting though. It would have been impossible
before subtransactions but it's doable now. Still the performance
might be unusable for complex queries. It's basically generalizing the
logic a serializable transaction would take to a read committed command.
--
Greg
On 24 Jan 2009, at 18:50, Jeff Davis <pgsql@j-davis.com> wrote:
On Sat, 2009-01-24 at 19:45 +0000, Greg Stark wrote:
There already is quite an extensive discussion of how FOR UPDATE
behaves including these kinds of violations.
Not in the documentation, that I can see. And I think it's important
that it be there for the reasons I mentioned.
Can you refer me to the discussion that you're talking about? I don't
remember any discussion that points out that FOR UPDATE/FOR SHARE is
broken in the simple case of a simple WHERE clause.
What you propose is interesting though. It would have been impossible
before subtransactions but it's doable now. Still the performance
might be unusable for complex queries. It's basically generalizing the
logic a serializable transaction would take to a read committed command.
It might be effective for queries that are highly selective on large
tables. Still has strange deadlock possibilities, but I think that's the
case already.
Regards,
Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes:
There you see a snapshot of the table that never existed. Either the
snapshot was taken before the UPDATE, in which case i=3 should be
included, or it was taken after the UPDATE, in which case i=4 should be
included. So atomicity is broken for WHERE.
This assertion is based on a misunderstanding of what FOR UPDATE in
read-committed mode is defined to do. It is supposed to give you the
latest available rows.
regards, tom lane
Jeff Davis <pgsql@j-davis.com> writes:
On Sat, 2009-01-24 at 19:45 +0000, Greg Stark wrote:
There already is quite an extensive discussion of how FOR UPDATE
behaves including these kinds of violations.
Not in the documentation, that I can see. And I think it's important
that it be there for the reasons I mentioned.
Can you refer me to the discussion that you're talking about? I don't
remember any discussion that points out that FOR UPDATE/FOR SHARE is
broken in the simple case of a simple WHERE clause.
http://www.postgresql.org/docs/8.3/static/transaction-iso.html#XACT-READ-COMMITTED
Because of the above rule, it is possible for an updating command to see an
inconsistent snapshot: it can see the effects of concurrent updating commands
that affected the same rows it is trying to update, but it does not see
effects of those commands on other rows in the database. This behavior makes
Read Committed mode unsuitable for commands that involve complex search
conditions. However, it is just right for simpler cases. For example, consider
updating bank balances with transactions like
...
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Davis <pgsql@j-davis.com> writes:
There you see a snapshot of the table that never existed. Either the
snapshot was taken before the UPDATE, in which case i=3 should be
included, or it was taken after the UPDATE, in which case i=4 should be
included. So atomicity is broken for WHERE.
This assertion is based on a misunderstanding of what FOR UPDATE in
read-committed mode is defined to do. It is supposed to give you the
latest available rows.
Well, technically it's violating the Isolation part of ACID, not the
Atomicity, since the UPDATE transaction will either commit or roll
back in its entirety, but another transaction can see it in an
intermediate (partially applied) state. [1]

[1] http://en.wikipedia.org/wiki/ACID
I guess the issue of whether this violation of ACID properties should
be considered a bug or a feature is a separate discussion, but calling
it a feature seems like a hard sell to me.
-Kevin
On Mon, 2009-01-26 at 10:48 -0600, Kevin Grittner wrote:
I guess the issue of whether this violation of ACID properties should
be considered a bug or a feature is a separate discussion, but calling
it a feature seems like a hard sell to me.
I think I understand the other perspective on this now: SELECT FOR
UPDATE/SHARE is an entirely separate command that is more similar (in
transactional semantics) to UPDATE than to SELECT.
In fact, it's probably most similar to UPDATE ... RETURNING, which will
give the same result (that breaks atomicity or isolation, depending on
your point of view), which is correct for READ COMMITTED isolation
level.
Because the command begins with SELECT, I would expect it to follow the
rules of SELECT with the side effect of locking. I would think that the
standard would have something to say about this*.
I certainly don't think it's intuitive behavior.
Regards,
Jeff Davis.
*: It appears that SELECT ... FOR UPDATE is not in the standard, which
would indicate to me that the SELECT statement should still behave
according to SELECT isolation/snapshot rules. But when I guess about the
standard, I'm usually wrong.
Jeff Davis <pgsql@j-davis.com> wrote:
In fact, it's probably most similar to UPDATE ... RETURNING, which
will
give the same result (that breaks atomicity or isolation, depending
on
your point of view), which is correct for READ COMMITTED isolation
level.
READ COMMITTED is not supposed to be able to view the work of a
concurrent transaction as PARTLY applied and PARTLY committed, which
is what's happening here. If one statement in a READ COMMITTED
transaction sees the uncommitted view of the data and the next
statement sees the committed view, that's compliant. It may not
surprise someone who is intimately familiar with PostgreSQL internals
for a single SELECT statement to see PART of a transaction's work, but
it would surprise most users, and is certainly not compliant with the
standard.
-Kevin
On 2009-01-26, at 17:34, Kevin Grittner wrote:
It may not
surprise someone who is intimately familiar with PostgreSQL internals
for a single SELECT statement to see PART of a transaction's work, but
it would surprise most users, and is certainly not compliant with the
standard.
+1000
On Mon, 2009-01-26 at 11:34 -0600, Kevin Grittner wrote:
READ COMMITTED is not supposed to be able to view the work of a
concurrent transaction as PARTLY applied and PARTLY committed, which
is what's happening here. If one statement in a READ COMMITTED
transaction sees the uncommitted view of the data and the next
statement sees the committed view, that's compliant. It may not
surprise someone who is intimately familiar with PostgreSQL internals
for a single SELECT statement to see PART of a transaction's work, but
it would surprise most users, and is certainly not compliant with the
standard.
See 13.2.1:
http://www.postgresql.org/docs/8.3/static/transaction-iso.html
That explanation seems to be the behavior I would expect from UPDATE in
read committed mode. Perhaps I'm just used to PostgreSQL -- what do
other database systems do?
And what does the standard say? I can't find anything in the standard
that handles UPDATEs differently, so that seems to support your
position.
After the concurrent transaction commits, you basically have three
options:
1. Get a new snapshot, and re-run the entire query to find new rows
that might match the search condition that were committed between the
time you took the original snapshot for UPDATE and the time that the
concurrent transaction committed.
2. Blindly proceed with the original snapshot. This would mean that two
concurrent updates like "set i = i+1" might both see the same value,
let's say 5, and both update it to 6, and then commit. However, a serial
execution would produce 7.
3. Find the latest version of the same tuples you found from the
original snapshot in the original search, and if they still match the
search condition, update based on the new version. This is what
PostgreSQL currently does.
I don't think this is PostgreSQL-specific, I think non-MVCC database
systems face this same choice (although the terminology would be
different).
#3 has a certain intuition about it (i.e. "produces the expected result
most of the time in simple cases"), but in my opinion, that intuition
only holds for UPDATE, it doesn't hold for SELECT.
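To make the difference between options 2 and 3 concrete, here is a toy Python model of two racing "i = i + 1" updates (just the arithmetic of the argument, not engine code):

```python
# Two transactions race to apply "SET i = i + 1" to the same row.
initial = 5

# Option 2: blindly proceed on the original snapshot -> lost update.
t1_read = initial
t2_read = initial            # both snapshots see 5
blind_result = t2_read + 1   # the second writer overwrites the first

# Serial execution: the second transaction sees the first one's result.
serial_result = (initial + 1) + 1

# Option 3 (what PostgreSQL does): after the first committer releases
# the row, the second transaction re-reads the *latest* version and
# applies its expression to that.
latest = initial + 1
recheck_result = latest + 1  # matches serial execution in this case

print(blind_result, serial_result, recheck_result)
```

Option 2 yields 6 where any serial execution yields 7; option 3 recovers the serial answer for this simple case, which is exactly the intuition that breaks down once the search condition itself depends on the updated column.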
Regards,
Jeff Davis
Jeff Davis <pgsql@j-davis.com> wrote:
I don't think this is PostgreSQL-specific, I think non-MVCC database
systems face this same choice (although the terminology would be
different).
A somewhat dated description for Sybase (it predates their support of
row level locks and the related predicate locks on indexes) is here:
Simplified, in a READ COMMITTED transaction a SELECT takes a lock
which conflicts with update before reading, and holds it until the
executing statement is done with that table; an UPDATE takes a lock
which conflicts with any access (read or write) before it updates a
row, and holds it until COMMIT or ROLLBACK. This guarantees that the
SELECT doesn't see the results of an incomplete UPDATE, at the expense
of taking locks, blocking, and possible serialization failures (in the
form of deadlocks).
-Kevin
I wrote:
Simplified, in a READ COMMITTED transaction a SELECT takes a lock
which conflicts with update before reading, and holds it until the
executing statement is done with that table;
That should have said "until after the executing statement completes."
Apologies, but I just know someone would have picked up on the
hole my misstatement left....
-Kevin
On Mon, 2009-01-26 at 13:50 -0600, Kevin Grittner wrote:
A somewhat dated description for Sybase (it predates their support of
row level locks and the related predicate locks on indexes) is here:
Simplified, in a READ COMMITTED transaction a SELECT takes a lock
which conflicts with update before reading, and holds it until the
executing statement is done with that table; an UPDATE takes a lock
which conflicts with any access (read or write) before it updates a
row, and holds it until COMMIT or ROLLBACK. This guarantees that the
SELECT doesn't see the results of an incomplete UPDATE, at the expense
of taking locks, blocking, and possible serialization failures (in the
form of deadlocks).
That doesn't quite answer the question about UPDATE specifically.
Pretend for a second that SELECT ... FOR UPDATE is like UPDATE ...
RETURNING. The example in my original email can be easily changed to use
UPDATE ... RETURNING.
The tricky part is when an UPDATE with a search condition reads,
modifies, and writes a value that is used in a search condition for
another UPDATE.
Every DBMS will block waiting for the first UPDATE to finish. Then what?
Do you re-run the query to find new tuples that might now satisfy the
search condition that didn't before? Do you update the new value in all
the tuples you originally found, regardless of whether they still match
the search condition? Do you update the new value in all the tuples that
you found that still match the search condition? Do you update the old
value, perhaps overwriting changes made by the first UPDATE?
What does sybase do in that case?
Regards,
Jeff Davis
Jeff Davis <pgsql@j-davis.com> wrote:
The tricky part is when an UPDATE with a search condition reads,
modifies, and writes a value that is used in a search condition for
another UPDATE.
Every DBMS will block waiting for the first UPDATE to finish. Then
what?
Either it's totally safe to proceed, or the second UPDATE was rolled
back.
Do you re-run the query to find new tuples that might now satisfy
the search condition that didn't before?
There can't be any. Blocks taken during the reading of rows so far
have not been released, and would preclude the update from changing
results read so far.
Do you update the new value in all the tuples you originally found,
regardless of whether they still match the search condition?
They have to still match, your locks would preclude their
modification.
Do you update the new value in all the tuples that
you found that still match the search condition?
They can't have a new value. Locks.
Do you update the old value, perhaps overwriting changes made by the
first UPDATE?
If you haven't gotten to reading rows yet, you're still OK by the time
the other transaction's locks are released. Look over those rules and
try some thought experiments to convince yourself. It really is safe,
if potentially slow.
-Kevin
On Mon, 2009-01-26 at 14:31 -0600, Kevin Grittner wrote:
Do you re-run the query to find new tuples that might now satisfy
the search condition that didn't before?
There can't be any. Blocks taken during the reading of rows so far
have not been released, and would preclude the update from changing
results read so far.
Let's say the sequence is:
Data:
i j
--------
1 10
2 10
3 20
4 20
Session1:
BEGIN;
UPDATE a SET j = (j - 10) WHERE i = 2 OR i = 3;
Session2:
BEGIN;
UPDATE a SET j = j + 100 WHERE j = 10;
Session1:
COMMIT;
Session2:
COMMIT;
In PostgreSQL, the result is:
i | j
---+-----
4 | 20
2 | 0
3 | 10
1 | 110
(4 rows)
Which cannot be obtained by any serial execution. What is the result in
Sybase, Oracle, etc.?
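The "no serial execution" claim is easy to check mechanically. A toy Python model of the two UPDATE statements (just their expressions, no locking):

```python
def s1(t):  # UPDATE a SET j = j - 10 WHERE i = 2 OR i = 3
    return {i: (j - 10 if i in (2, 3) else j) for i, j in t.items()}

def s2(t):  # UPDATE a SET j = j + 100 WHERE j = 10
    return {i: (j + 100 if j == 10 else j) for i, j in t.items()}

data = {1: 10, 2: 10, 3: 20, 4: 20}
serial_12 = s2(s1(data))   # Session1 then Session2
serial_21 = s1(s2(data))   # Session2 then Session1
observed = {1: 110, 2: 0, 3: 10, 4: 20}   # PostgreSQL's actual result
```

Here serial_12 is {1: 110, 2: 0, 3: 110, 4: 20} and serial_21 is {1: 110, 2: 100, 3: 10, 4: 20}; the observed result matches neither.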
It seems like it would be a challenge to know that the tuple with i=3
would be updated to a value that matches the search condition j=10. So
can you tell me a little more about the mechanism by which Sybase solves
this problem?
Regards,
Jeff Davis
Jeff Davis <pgsql@j-davis.com> wrote:
On Mon, 2009-01-26 at 14:31 -0600, Kevin Grittner wrote:
Do you re-run the query to find new tuples that might now satisfy
the search condition that didn't before?
There can't be any. Blocks taken during the reading of rows so far
have not been released, and would preclude the update from changing
results read so far.
Let's say the sequence is:
Data:
i j
--------
1 10
2 10
3 20
4 20
Session1:
BEGIN;
UPDATE a SET j = (j - 10) WHERE i = 2 OR i = 3;
OK, the description would be trivial if we assume page level locks
(the default), so I'll assume row level locks. I'll also assume READ
COMMITTED. (In SERIALIZABLE, no lock is ever released until COMMIT or
ROLLBACK, although the update locks can be downgraded or upgraded.)
Simplified a bit, this places an exclusive lock on rows 2 and 3 and
then updates these two rows in place (no new tuples are created).
Session2:
BEGIN;
UPDATE a SET j = j + 100 WHERE j = 10;
This might update row 1 before blocking on the attempt to read row 2.
Let's say, for sake of argument, that it does. So it is holding an
exclusive lock on row 1 and attempting to acquire an update lock to
read row 2. (It will upgrade this to an exclusive lock if it decides
to update it, or downgrade it to shared if it decides not to do so.)
Session2 is blocked for now.
Session1:
COMMIT;
After the COMMIT succeeds, the locks from Session1 are released.
Session2 acquires its update lock and reads row 2, finds that it
doesn't match its update criteria, downgrades the lock to shared,
acquires an update lock on row 3, finds that it does match the
selection criteria, upgrades the lock to exclusive, updates it,
acquires an update lock on row 4, finds that it doesn't match the
update criteria, downgrades the lock to shared, hits the end of table,
releases the shared locks.
Session2:
COMMIT;
After the COMMIT succeeds, the locks from Session2 are released.
In PostgreSQL, the result is:
i | j
---+-----
4 | 20
2 | 0
3 | 10
1 | 110
(4 rows)
Which cannot be obtained by any serial execution. What is the result
in Sybase, Oracle, etc.?
I can't be sure about Oracle, but I think its results would match
PostgreSQL. In Sybase, with either READ COMMITTED or SERIALIZABLE,
the result would be:
i | j
---+-----
1 | 110
2 | 0
3 | 110
4 | 20
(4 rows)
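For what it's worth, the Sybase result quoted above is exactly what serial Session1-then-Session2 execution produces, which a toy Python model of the two UPDATE expressions confirms (a model of the statements, not of Sybase internals):

```python
def s1(t):  # UPDATE a SET j = j - 10 WHERE i = 2 OR i = 3
    return {i: (j - 10 if i in (2, 3) else j) for i, j in t.items()}

def s2(t):  # UPDATE a SET j = j + 100 WHERE j = 10
    return {i: (j + 100 if j == 10 else j) for i, j in t.items()}

data = {1: 10, 2: 10, 3: 20, 4: 20}
sybase = {1: 110, 2: 0, 3: 110, 4: 20}   # result quoted above
serial_result = s2(s1(data))              # serial Session1 -> Session2
```

Because Session2 blocks until Session1's locks are released and then reads the committed state, the blocking scheme degenerates to serial execution here.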
If that explanation wasn't clear, let me know.
Let me restate -- I don't propose that PostgreSQL implement this
locking scheme. I think it can and should do better in approaching
compliance with the standard, and with ACID properties, without
compromising concurrency and performance to the degree required by
this sort of locking and blocking.
-Kevin
Jeff Davis <pgsql@j-davis.com> writes:
It seems like it would be a challenge to know that the tuple with i=3
would be updated to a value that matches the search condition j=10. So
can you tell me a little more about the mechanism by which Sybase solves
this problem?
This example is a case of the same issue we were discussing earlier involving
"predicate locking". To solve it you need a way to lock records that your
query *isn't* accessing and may not even exist yet to prevent them from being
turned into (or inserted as) records your query should be accessing.
As Kevin described it earlier Sybase locks the index pages containing the key
range you're accessing preventing anyone from inserting new index pointers in
that range. If there's no index it locks the entire table on every select to
prevent any updates or inserts in the table until your transaction finishes.
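A minimal sketch of the predicate-locking idea in Python (illustrative only; the names and structure are invented, and real predicate locking on index ranges is far more granular):

```python
# A reader registers the *predicate* it scanned with; any writer whose
# old or new row version satisfies a registered predicate conflicts,
# even if the row did not exist when the reader ran.
class PredicateLockTable:
    def __init__(self):
        self.locks = []   # list of (owner, predicate) pairs

    def acquire(self, owner, predicate):
        self.locks.append((owner, predicate))

    def write_conflicts(self, writer, old_row, new_row):
        """A write conflicts if either row version matches another
        transaction's registered predicate."""
        return any(owner != writer and (pred(old_row) or pred(new_row))
                   for owner, pred in self.locks)

lock_table = PredicateLockTable()
lock_table.acquire("T1", lambda row: row["j"] == 10)  # T1 ran WHERE j = 10

# T2 tries to turn a non-matching row into a matching one: conflict,
# even though T1 never touched this row.
conflict = lock_table.write_conflicts("T2",
                                      {"i": 3, "j": 50},
                                      {"i": 3, "j": 10})
```

The key point is the check against the *new* row version: that is what catches rows being "turned into" rows the reader's query should have seen.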
In any case note that your example is not *serializable*. (Though in Postgres
it can happen even in serializable mode, so that's not much of a defence.)
I'm unclear whether it manifests any of the phenomena that are
prohibited for READ COMMITTED.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
Gregory Stark <stark@enterprisedb.com> wrote:
This example is a case of the same issue we were discussing earlier
involving "predicate locking". To solve it you need a way to lock
records that your query *isn't* accessing and may not even exist yet
to prevent them from being turned into (or inserted as) records your
query should be accessing.
As Kevin described it earlier Sybase locks the index pages
containing the key range you're accessing preventing anyone from
inserting new index pointers in that range. If there's no index it
locks the entire table on every select to prevent any updates or
inserts in the table until your transaction finishes.
Well, for READ COMMITTED in Sybase it's only until the end of the
statement.
Hmmm.... I'm clearly getting a bit rusty on my Sybase row level
locking rules. I got some details wrong in my example, but the
outcome would be the same. Broader locks though, leading to more
potential blocking.
I'm unclear whether it manifests any of the phenomena that are
prohibited for READ COMMITTED.
Interesting question. It's behavior not possible in 2 phase locking,
but not explicitly prohibited by the standard. Better watch that kind
of talk, though, or they may go and change the standard again. ;-)
-Kevin
On Mon, 2009-01-26 at 09:26 -0800, Jeff Davis wrote:
On Mon, 2009-01-26 at 10:48 -0600, Kevin Grittner wrote:
I guess the issue of whether this violation of ACID properties should
be considered a bug or a feature is a separate discussion, but calling
it a feature seems like a hard sell to me.
I think I understand the other perspective on this now: SELECT FOR
UPDATE/SHARE is an entirely separate command that is more similar (in
transactional semantics) to UPDATE than to SELECT.
You can think of SELECT FOR UPDATE as the first half of an UPDATE
command; the UPDATE is in this case split in two:
SELECT FOR UPDATE
UPDATE WHERE CURRENT
which means that yes, it has to follow UPDATE semantics to be of any use
in the FOR UPDATE case.
In fact, it's probably most similar to UPDATE ... RETURNING, which will
give the same result (that breaks atomicity or isolation, depending on
your point of view), which is correct for READ COMMITTED isolation
level.
Because the command begins with SELECT, I would expect it to follow the
rules of SELECT with the side effect of locking. I would think that the
standard would have something to say about this*.
I certainly don't think it's intuitive behavior.
Regards,
Jeff Davis.
*: It appears that SELECT ... FOR UPDATE is not in the standard, which
would indicate to me that the SELECT statement should still behave
according to SELECT isolation/snapshot rules. But when I guess about the
standard, I'm usually wrong.
--
------------------------------------------
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training
On Mon, 2009-01-26 at 15:46 -0600, Kevin Grittner wrote:
After the COMMIT succeeds, the locks from Session1 are released.
Session2 acquires its update lock and reads row 2, finds that it
doesn't match its update criteria, downgrades the lock to shared,
acquires an update lock on row 3, finds that it does match the
selection criteria, upgrades the lock to exclusive, updates it,
acquires an update lock on row 4, finds that it doesn't match the
update criteria, downgrades the lock to shared, hits the end of table,
releases the shared locks.
This is the part I'm having a problem with. This depends on row 3 being
read after row 2. If that weren't the case (say, with a more complex
update and a more complex search criteria), then the index scan would
have already passed by the value and would never know that it was
updated to a value that does match the search criteria.
Data:
i j
--------
1 20
2 40
3 50
4 80
S1:
BEGIN;
UPDATE a SET j = (j - 10) WHERE i = 2 OR i = 3;
S2:
BEGIN;
UPDATE a SET j = j + 100 WHERE j = 10 or j = 40;
-- Here, the index scan is already past j=10 by the time
-- it blocks on a concurrently-updated tuple
S1:
COMMIT;
S2:
COMMIT;
In PostgreSQL this sequence results in:
i | j
---+----
1 | 20
4 | 80
2 | 30
3 | 40
The second update matched no tuples at all.
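Modeling the recheck logic (option 3 from earlier in the thread) in toy Python shows why: only rows that matched under the original snapshot are re-evaluated against their latest version, so i=3 never becomes a candidate. This is a model of the behavior, not executor code:

```python
# What S2's scan sees under its snapshot, and the committed state
# after S1's UPDATE (i = 2 -> j = 30, i = 3 -> j = 40).
snapshot = {1: 20, 2: 40, 3: 50, 4: 80}
latest = {1: 20, 2: 30, 3: 40, 4: 80}

def matches(j):                 # WHERE j = 10 OR j = 40
    return j in (10, 40)

# Rows the scan found under the snapshot:
candidates = [i for i, j in snapshot.items() if matches(j)]   # only i = 2
# Recheck each candidate against its latest committed version:
updated = [i for i in candidates if matches(latest[i])]       # none survive

# i = 3 now has j = 40 and *would* match, but it was never a candidate,
# so the second UPDATE touches no rows at all.
```

The recheck can only remove rows from the result, never add the ones the scan already passed by; that asymmetry is the heart of the problem.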
Let me restate -- I don't propose that PostgreSQL implement this
locking scheme. I think it can and should do better in approaching
compliance with the standard, and with ACID properties, without
compromising concurrency and performance to the degree required by
this sort of locking and blocking.
I think Greg has it right: without predicate locking we can't really
achieve the behavior you're expecting. So how would we better approach
the semantics you want without it?
Regards,
Jeff Davis