Serializable Isolation without blocking
While discussing potential changes to PostgreSQL documentation of
transaction isolation levels, Emmanuel Cecchet pointed out an
intriguing new paper[1] on a new algorithm to provide true
serializable behavior in an MVCC-based database, with no additional
blocking; although, there being no such thing as a free lunch, there
is an increase in serialization failures under contention. I have
been hesitant to raise the issue while everyone was busy trying to
wrap up release 8.4; but since that is now in beta testing and PGCon
is fast approaching, I wanted to put the issue out there so that
people have a chance to review it and discuss it.
Michael Cahill has given me permission to quote from the paper. He
has also provided a URL to his personal version of the work[2], which
people may directly access for their personal use, although
redistribution is prohibited by the ACM copyright. He has asked to be
copied on the discussion here.
I know that some here have questioned why anyone would want
serializable transactions. Our environment has 21 programmers, 21
business process analysts, and four DBAs. A major part of the time
for this staff is enhancement of existing software and development of
new software. We have many distinct databases, the largest of which
has a schema of over 300 tables. (That's well normalized and not
partitioned -- the structure of the data really is that complex.) We
have over 8,700 queries against these various databases, including
OLTP, reporting, statistics, public access, web queries, etc. If one
were to go through the well-known techniques to identify all possible
interactions between these queries against these tables, it would not
only be a massive undertaking, the results would immediately be
obsolete.
The nice thing about serializable transactions is that if you can show
that a transaction does the right thing when run by itself, you
automatically know that it will function correctly when run in any
mix, or it will fail with a serializable error and can be safely
retried. Our framework is designed so that serialization errors are
automatically detected and the transaction is retried without any user
interaction or application programming needed -- a serialization
failure appears to the application code and the user the same as
simple blocking.
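A minimal sketch of such a retry framework, in Python with invented
names (`SerializationFailure` stands in for the driver surfacing
SQLSTATE 40001; a real framework would roll back and re-begin the
database transaction inside the loop):

```python
class SerializationFailure(Exception):
    """Stands in for the driver raising SQLSTATE 40001."""

def run_transaction(work, max_retries=5):
    """Run `work` and transparently retry on serialization failure.

    To the caller this looks the same as the transaction simply
    having blocked for a while, which is the point of the framework
    described above.
    """
    for attempt in range(max_retries):
        try:
            return work()
        except SerializationFailure:
            continue  # rollback already happened inside `work`; retry
    raise RuntimeError("too many serialization failures")

# Toy workload that fails twice before succeeding.
attempts = {"n": 0}
def work():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure()
    return "committed"

result = run_transaction(work)
```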
Quoting the paper's abstract:
"Many popular database management systems offer snapshot isolation
rather than full serializability. There are well known anomalies
permitted by snapshot isolation that can lead to violations of data
consistency by interleaving transactions that individually maintain
consistency. Until now, the only way to prevent these anomalies was to
modify the applications by introducing artificial locking or update
conflicts, following careful analysis of conflicts between all pairs
of transactions.
"This paper describes a modification to the concurrency control
algorithm of a database management system that automatically detects
and prevents snapshot isolation anomalies at runtime for arbitrary
applications, thus providing serializable isolation. The new algorithm
preserves the properties that make snapshot isolation attractive,
including that readers do not block writers and vice versa. An
implementation and performance study of the algorithm are described,
showing that the throughput approaches that of snapshot isolation in
most cases."
Quoting a portion of the conclusions:
"A prototype of the algorithm has been implemented in Oracle Berkeley
DB and shown to perform significantly better than two-phase locking in
a variety of cases, and often comparably with snapshot isolation.
"One property of Berkeley DB that simplified our implementation was
working with page level locking and versioning. In databases that
version and lock at row-level granularity (or finer), additional
effort would be required to avoid phantoms, analogous to standard two
phase locking approaches such as multigranularity locking."
Quoting a snippet from the implementation section:
"Making these changes to Berkeley DB involved only modest changes to
the source code. In total, only 692 lines of code (LOC) were modified
out of a total of over 200,000 lines of code in Berkeley DB."
Michael J. Cahill has since implemented these techniques in InnoDB as
part of his PhD work. While Microsoft SQL Server does provide full
serializability in an MVCC implementation, I believe they do it with
blocking rather than this newer and faster technique.
The paper is a very interesting read, and I fear that if we don't
pursue these techniques, InnoDB users will have legitimate bragging
rights over PostgreSQL users in an important area.
Oh, and I know that these issues are well known, and I know that the
solution involves predicate locks; although these won't necessarily be
locks which cause blocking.
-Kevin
[1]: Michael J. Cahill, Uwe Röhm, Alan D. Fekete. Serializable
Isolation for Snapshot Databases. In the Proceedings of the 2008 ACM
SIGMOD international conference on management of data, pages 729-738.
http://doi.acm.org/10.1145/1376616.1376690
[2]: http://cahill.net.au/wp-content/uploads/2009/01/real-serializable.pdf
On Tue, May 5, 2009 at 8:50 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
While discussing potential changes to PostgreSQL documentation of
transaction isolation levels, Emmanuel Cecchet pointed out an
intriguing new paper[1] on a new algorithm to provide true
serializable behavior in an MVCC-based database
I agree, this is very interesting work. I blogged about it a while ago[1].
"Making these changes to Berkeley DB involved only modest changes to
the source code. In total, only 692 lines of code (LOC) were modified
out of a total of over 200,000 lines of code in Berkeley DB."
Tracking the read sets of each transaction would be very expensive.
Worse still, that information needs to be kept around after
end-of-transaction, which raises questions about where it should be
stored and how it should be cleaned up. Note that the benchmarks in
the paper involve transactions that perform "a small number of simple
read and update operations", which reduces the bookkeeping overhead.
Neil
[1]: http://everythingisdata.wordpress.com/2009/02/25/february-25-2009/
Neil Conway <neil.conway@gmail.com> wrote:
Tracking the read sets of each transaction would be very expensive.
Worse still, that information needs to be kept around after
end-of-transaction, which raises questions about where it should be
stored and how it should be cleaned up. Note that the benchmarks in
the paper involve transactions that perform "a small number of
simple read and update operations", which reduces the bookkeeping
overhead.
I know that some of the simplifying assumptions listed in 3.1 do not
currently hold for PostgreSQL. A prerequisite for using the algorithm
would be to make them hold for PostgreSQL, or find some way to work
around their absence. I hope people will read the paper and mull it
over, but these assumptions are probably the crux of whether this
enhancement is feasible. I guess it would be best to throw that much
out to the list for discussion.
To quote:
1. For any data item x, we can efficiently get the list of
locks held on x.
2. For any lock l in the system, we can efficiently get
l.owner, the transaction object that requested the lock.
3. For any version xt of a data item in the system, we
can efficiently get xt.creator, the transaction object
that created that version.
4. When finding a version of item x valid at some given
timestamp, we can efficiently get the list of other versions
of x that have later timestamps.
Based on my limited understanding, I don't get the impression 2 or 3
are a problem.
I'm assuming that we would use the structures which back the pg_locks
view for the SIREAD locks implicit in 1, possibly with some scope
escalation as counts for a table rise (row to page to extent to
table). This may require a larger allocation for this information by
those wanting to use serializable transactions.
I'm not sure how 4 could be handled.
I don't know how much work would be required to track the transaction
information listed in section 4.1 (or its functional equivalent).
I'd be happy to hear the opinion of those more familiar with the
internals than I.
-Kevin
Kevin Grittner wrote:
While discussing potential changes to PostgreSQL documentation of
transaction isolation levels, Emmanuel Cecchet pointed out an
intriguing new paper[1] on a new algorithm to provide true
serializable behavior in a MVCC based database, with no additional
blocking; although, there being no such thing as a free lunch, there
is an increase in serialization failures under contention.
I have read through the paper and will share my comments.
I hope I got it right:
To put it in a nutshell, the approach to true serializability presented
in the paper involves "intention locks" which do not block writers,
but cause transactions with conflict potential to be aborted.
The main cost incurred is the maintenance of these intention locks, which
need to be kept for a while even after a transaction has committed.
Moreover, there will potentially be many of these locks (think of
SELECT COUNT(*) FROM ...), so some lock escalation mechanism (to
page or table locks) will be necessary.
I don't know how hard that would be to implement, but I'd argue
that it is only worth considering if it guarantees true serializability.
The paper describes only intention locks for rows in the select list
of a statement and deliberately ignores rows which are examined in
the WHERE clause. The authors argue in section 3.4 that this is no
problem in their implementation since "Berkeley DB does all locking
and versioning on page granularity".
I don't buy that; maybe I misunderstood something.
Consider a function
"makehighlander(personid integer) RETURNS void"
defined like this:
SELECT ishighlander INTO b FROM scots WHERE id=personid;
IF b THEN
RETURN; /* no need to do anything */
END IF;
UPDATE scots SET ishighlander=TRUE WHERE id=personid;
SELECT count(*) INTO n FROM scots WHERE ishighlander;
IF (n > 1) THEN
RAISE EXCEPTION 'There can be only one';
END IF;
If we assume that "ishighlander" is false for all records in
the beginning, and there are two calls to the function with
two personid's of records *in different pages*, then there cannot be
any conflicts since all (write and intention) locks taken by each of
these calls should only affect the one page that contains the one
record that is updated and then found in the subsequent SELECT.
Yet if the two execute concurrently and the two first SELECTs are
executed before the two UPDATEs, then both functions have a snapshot
so that the final SELECT statements will return 1 and both functions will
succeed, leaving the table with two highlanders.
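This anomaly (known as write skew) can be reproduced with a minimal
snapshot-isolation simulation; pure Python, with the `Txn` class and
row layout invented for illustration. Each transaction reads from its
private snapshot and writes a disjoint row, so first-committer-wins
write-conflict detection never fires, yet the serial invariant is
violated:

```python
# Two rows, neither a highlander to start with.
store = {1: {"ishighlander": False}, 2: {"ishighlander": False}}

class Txn:
    def __init__(self, db):
        # Snapshot taken at transaction start.
        self.snapshot = {k: dict(v) for k, v in db.items()}
        self.writes = {}
    def read(self, pid):
        return self.writes.get(pid, self.snapshot[pid])
    def update(self, pid, **cols):
        row = dict(self.read(pid)); row.update(cols)
        self.writes[pid] = row
    def count_highlanders(self):
        return sum(1 for pid in self.snapshot
                   if self.read(pid)["ishighlander"])
    def commit(self, db):
        db.update(self.writes)   # disjoint rows: no ww-conflict raised

t1, t2 = Txn(store), Txn(store)   # both snapshots taken first
t1.update(1, ishighlander=True)
t2.update(2, ishighlander=True)
assert t1.count_highlanders() == 1   # each sees only its own update
assert t2.count_highlanders() == 1
t1.commit(store); t2.commit(store)
highlanders = sum(1 for r in store.values() if r["ishighlander"])
# highlanders == 2: "There can be only one" was never raised
```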
So I think one would have to add intention locks for rows considered
in the WHERE clause to guarantee true serializability.
It would be interesting to know how many lines of code would have
to be added to implement that and how performance would be affected.
I'm afraid that concurrency would suffer because more conflicting
transactions would be aborted.
Another thing I wonder is whether the authors have considered the
possibility that there are serializable and "cursor stability"
transactions at the same time, where the latter would not take
intention locks. Could that affect the considerations about
correctness of the algorithm?
Yours,
Laurenz Albe
"Albe Laurenz" <laurenz.albe@wien.gv.at> writes:
So I think one would have to add intention locks for rows considered
in the WHERE clause to guarantee true serializability.
Does the paper explain how to deal with rows "considered" in the WHERE clause
which don't yet exist? Ie, "SELECT count(*) WHERE foo" needs to take out a
lock which would cause any transaction which inserts a new record where foo is
true to abort.
In MSSQL this requires locking the page of the index where such records would
be inserted (or the entire table if there's no index). In Predicate locking
schemes this requires a separate storage structure for storing such predicates
which can be arbitrarily complex expressions to check any new tuple being
inserted against.
Are these intention locks predicate locks, in that they're not associated with
actual pages or records but with potential records which might be inserted in
the future?
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
On Tue, 2009-05-05 at 10:50 -0500, Kevin Grittner wrote:
"This paper describes a modification to the concurrency control
algorithm of a database management system that automatically detects
and prevents snapshot isolation anomalies at runtime for arbitrary
applications, thus providing serializable isolation. The new algorithm
preserves the properties that make snapshot isolation attractive,
including that readers do not block writers and vice versa. An
implementation and performance study of the algorithm are described,
showing that the throughput approaches that of snapshot isolation in
most cases."
I think this is important work in database theory and a good future
direction for us. It's the right thing to keep an eye on developments
and to consider implementation.
We would need to decide whether intention locks were indeed necessary
when we have MVCC also. Other DBMS without visibility may need to be
more restrictive than we would have to be to implement this. Not sure.
It wouldn't be 692 lines of code and even if it were the impact of that
code would be such that it would need to be optional, since it would
differ in definition from an existing SQL Standard isolation mode and it
would have additional performance implications.
If the use is optional, I would currently prefer the existing mechanism
for implementing serialization, which is to serialize access directly
using either a LOCK statement or an exclusive advisory lock. It's clear
that any new-theory solution will cost significantly more as the number
of users increases, at least O(N^2), whereas simply waiting is only
O(N), AFAICS. (Michael et al don't discuss the algorithmic complexity).
So it seems its use would require some thought and care and possibly
further research to uncover areas of applicability in real usage.
So for me, I would say we leave this be until the SQLStandard changes to
recognise the additional mode. I don't see much advantage for us in
breaking the ground on this feature and it will be costly to implement,
so is a good PhD project.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
It wouldn't be 692 lines of code and even if it were the impact of that
code would be such that it would need to be optional, since it would
differ in definition from an existing SQL Standard isolation mode and it
would have additional performance implications.
I thought it would be equal to the SQL standard Serializable mode,
whereas what we currently call serializable is in fact not as strong as
the SQL standard Serializable mode.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Greg Stark wrote:
So I think one would have to add intention locks for rows considered
in the WHERE clause to guarantee true serializability.

Does the paper explain how to deal with rows "considered" in the WHERE clause
which don't yet exist? Ie, "SELECT count(*) WHERE foo" needs to take out a
lock which would cause any transaction which inserts a new record where foo is
true to abort.
Quote:
"To prevent phantoms in a system with row-level locking and versioning,
the algorithm described here would need to be extended to take SIREAD locks
on larger granules analogously to multigranularity intention locks in
traditional two-phase locking systems."
[...]
"We have not pursued the details in this paper because the phantom
issue does not arise in our prototype implementation, since Oracle
Berkeley DB does all locking and versioning at page granularity."
End quote.
Are these intention locks predicate locks, in that they're not associated with
actual pages or records but with potential records which might be inserted in
the future?
No, they are associated with the page that contains the actual record.
I think that's also meant with the "larger granules" in the above quote:
Take an intention lock on every page which might affect the condition.
Yours,
Laurenz Albe
On Thu, 2009-05-07 at 15:26 +0300, Heikki Linnakangas wrote:
Simon Riggs wrote:
It wouldn't be 692 lines of code and even if it were the impact of that
code would be such that it would need to be optional, since it would
differ in definition from an existing SQL Standard isolation mode and it
would have additional performance implications.

I thought it would be equal to the SQL standard Serializable mode,
whereas what we currently call serializable is in fact not as strong as
the SQL standard Serializable mode.
Our serializable is the same as Oracle's implementation. I think it
would be confusing and non-useful to redefine ours, when it has already
been accepted that the Oracle definition implements the standard
reasonably closely. If that changes, we should also, however.
Perhaps the key point is the O(N^2) complexity of the new algorithm.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
"Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:
maybe I misunderstood something.
Consider a function
"makehighlander(personid integer) RETURNS void"
defined like this:

SELECT ishighlander INTO b FROM scots WHERE id=personid;
IF b THEN
RETURN; /* no need to do anything */
END IF;
UPDATE scots SET ishighlander=TRUE WHERE id=personid;
SELECT count(*) INTO n FROM scots WHERE ishighlander;
IF (n > 1) THEN
RAISE EXCEPTION 'There can be only one';
END IF;

If we assume that "ishighlander" is false for all records in
the beginning, and there are two calls to the function with
two personid's of records *in different pages*, then there cannot be
any conflicts since all (write and intention) locks taken by each of
these calls should only affect the one page that contains the one
record that is updated and then found in the subsequent SELECT.

Yet if the two execute concurrently and the two first SELECTs are
executed before the two UPDATEs, then both functions have a snapshot
so that the final SELECT statements will return 1 and both functions
will succeed, leaving the table with two highlanders.
I do think you misunderstood. If there are two concurrent executions
and each reads one row, there will be an SIREAD lock for each of those
rows. As an example, let's say that one of them (T0) updates its row
and does its count, finds everything looks fine, and commits. In
reading the row the other transaction (T1) modified it sets the
T0.outConflict flag to true and the T1.inConflict flag to true. No
blocking occurs. Now T1 updates its row. Still no problem, because
if it committed there, there would still be a sequence of transactions
(T0 followed by T1) which would be consistent with the results; but it
selects rows which include the one modified by T0, which causes
T0.inConflict and T1.outConflict to be set to true. These would both
be pivots in an unsafe pattern of updates. No mystery which one needs
to be rolled back -- T0 has already committed; so T1 is rolled back
with a serialization failure (probably indicating that it is an unsafe
update versus an update conflict or a deadlock, which are two other
forms of serialization failure). Assuming that the software
recognizes the serialization failure code and retries, it now finds
that there is already a highlander and fails for real.
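A bare-bones sketch of that flag bookkeeping, in pure Python with
invented names; the real algorithm tracks SIREAD locks and row
versions rather than bare booleans, and this simplification aborts the
uncommitted pivot directly:

```python
class Txn:
    def __init__(self, name):
        self.name = name
        self.in_conflict = False    # someone read a version I replaced
        self.out_conflict = False   # I read a version someone replaced
        self.committed = False
        self.aborted = False

def rw_conflict(reader, writer):
    """Record an rw-antidependency: `reader` read a version that
    `writer` replaced. A transaction with both flags set is a pivot
    in a dangerous structure; abort it if it has not yet committed."""
    reader.out_conflict = True
    writer.in_conflict = True
    for t in (reader, writer):
        if t.in_conflict and t.out_conflict and not t.committed:
            t.aborted = True

t0, t1 = Txn("T0"), Txn("T1")
rw_conflict(t0, t1)   # T0's count read a row T1 later updated
t0.committed = True   # T0 commits cleanly; no blocking anywhere
rw_conflict(t1, t0)   # T1's count reads the row T0 updated
# T1 is now a pivot and is rolled back with a serialization failure;
# on retry it sees the committed highlander and fails for real.
```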
-Kevin
Gregory Stark <stark@enterprisedb.com> wrote:
"Albe Laurenz" <laurenz.albe@wien.gv.at> writes:
So I think one would have to add intention locks for rows
considered in the WHERE clause to guarantee true serializability.

Does the paper explain how to deal with rows "considered" in the
WHERE clause which don't yet exist? Ie, "SELECT count(*) WHERE foo"
needs to take out a lock which would cause any transaction which
inserts a new record where foo is true to abort.
The issue is mentioned, along with the note, quoted in my original
post, of why they were able to dodge the issue in the Berkeley DB
implementation.
In MSSQL this requires locking the page of the index where such
records would be inserted (or the entire table if there's no index).
This is the only form of predicate locking I've seen in real-world
production databases which provide true serializable behavior.
In Predicate locking schemes this requires a separate storage
structure for storing such predicates which can be arbitrarily
complex expressions to check any new tuple being inserted against.
I've never seen that done in real-world production databases, although
I'm sure it's pretty in theory.
Are these intention locks predicate locks, in that they're not
associated with actual pages or records but with potential records
which might be inserted in the future?
They are predicate locks in the sense that they detect all conflicts
which could occur based on the actual predicate, though they tend to
indicate conflicts in some situations where a rigorous (and expensive)
analysis of the actual predicates might not; but please note that such
locks would be SIREAD locks, which don't block any data modification,
but are only used to detect dangerous update patterns.
-Kevin
Kevin Grittner wrote:
maybe I misunderstood something.
Consider a function
"makehighlander(personid integer) RETURNS void"
defined like this:

SELECT ishighlander INTO b FROM scots WHERE id=personid;
IF b THEN
RETURN; /* no need to do anything */
END IF;
UPDATE scots SET ishighlander=TRUE WHERE id=personid;
SELECT count(*) INTO n FROM scots WHERE ishighlander;
IF (n > 1) THEN
RAISE EXCEPTION 'There can be only one';
END IF;

If we assume that "ishighlander" is false for all records in
the beginning, and there are two calls to the function with
two personid's of records *in different pages*, then there cannot be
any conflicts since all (write and intention) locks taken by each of
these calls should only affect the one page that contains the one
record that is updated and then found in the subsequent SELECT.

Yet if the two execute concurrently and the two first SELECTs are
executed before the two UPDATEs, then both functions have a snapshot
so that the final SELECT statements will return 1 and both functions
will succeed, leaving the table with two highlanders.

I do think you misunderstood. If there are two concurrent executions
and each reads one row, there will be an SIREAD lock for each of those
rows. As an example, let's say that one of them (T0) updates its row
and does its count, finds everything looks fine, and commits. In
reading the row the other transaction (T1) modified it sets the
T0.outConflict flag to true and the T1.inConflict flag to true.
Where does T0 read the row that T1 modified?
No
blocking occurs. Now T1 updates its row.
Wait a minute, I am confused. I thought T1 had already modified the row
before T0 committed? Or is "modify" not the update?
Still no problem, because
if it committed there, there would still be a sequence of transactions
(T0 followed by T1) which would be consistent with the results; but it
selects rows which include the one modified by T0, which causes
T0.inConflict and T1.outConflict to be set to true.
Where does T1 select rows that were modified by T0? It selects only one
row, the one it modified itself, right?
These would both
be pivots in an unsafe pattern of updates. No mystery which one needs
to be rolled back -- T0 has already committed; so T1 is rolled back
with a serialization failure (probably indicating that it is an unsafe
update versus an update conflict or a deadlock, which are two other
forms of serialization failure). Assuming that the software
recognizes the serialization failure code and retries, it now finds
that there is already a highlander and fails for real.
You see, there must be something fundamental I am getting wrong.
Yours,
Laurenz Albe
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
Simon Riggs wrote:
It wouldn't be 692 lines of code and even if it were the impact of
that code would be such that it would need to be optional, since it
would differ in definition from an existing SQL Standard isolation
mode and it would have additional performance implications.

I thought it would be equal to the SQL standard Serializable mode,
whereas what we currently call serializable is in fact not as strong
as the SQL standard Serializable mode.
Exactly. The standard probably *should* add SNAPSHOT between
REPEATABLE READ and SERIALIZABLE, but so far has not. As of the 2003
version of the SQL spec, they added explicit language that makes it
clear that what you get when you ask for SERIALIZABLE mode in
PostgreSQL is *not* compliant (although it is more than adequate for
REPEATABLE READ).
By the way, the other modes are all optional, as you're allowed to
escalate to a higher level whenever a lower level is requested;
SERIALIZABLE is required by the standard and is specified as the
default.
-Kevin
"Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:
Kevin Grittner wrote:
I do think you misunderstood. If there are two concurrent
executions and each reads one row, there will be an SIREAD lock for
each of those rows. As an example, let's say that one of them (T0)
updates its row and does its count, finds everything looks fine,
and commits. In reading the row the other transaction (T1)
modified it sets the T0.outConflict flag to true and the
T1.inConflict flag to true.

Where does T0 read the row that T1 modified?
As I said in the original post, I think we would need to track SIREAD
locks in the structures which back the pg_locks view.
blocking occurs. Now T1 updates its row.
Wait a minute, I am confused. I thought T1 had already modified the
row before T0 committed? Or is "modify" not the update?
There are so many sequences that I didn't think it was worthwhile to
step through them all. I did say "As an example, let's say that one of
them (T0) updates its row and does its count, finds everything looks
fine, and commits." If you want to work through the case where they
both UPDATE their rows before either commits, OK; it's not that
different. Things are OK as far as the first select of a modified row
by the other transaction; you record inConflict for one and
outConflict for the other. At the point where it goes both
directions, it is clear that there is a dangerous interaction and one
or the other is rolled back.
Where does T1 select rows that were modified by T0? It selects only
one row, the one it modified itself, right?
You have to select it to know whether to count it, right?
You see, there must be something fundamental I am getting wrong.
It is such a radical departure from traditional blocking approaches,
that it can be hard to get your head around it. :)
-Kevin
Kevin Grittner wrote:
Where does T1 select rows that were modified by T0? It selects only
one row, the one it modified itself, right?

You have to select it to know whether to count it, right?
We are getting closer.
So an SIREAD lock is taken for every row that is examined during
the execution of an execution plan?
Ah.
What if there is an index on the "ishighlander" row?
Then an index scan would find only one candidate to examine,
and the other rows would not even be touched by the execution plan.
Then how would they contract an SIREAD lock?
Yours,
Laurenz Albe
"Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:
What if there is an index on the "ishighlander" row?
Then an index scan would find only one candidate to examine,
and the other rows would not even be touched by the execution plan.
I assume you're talking about this line of your function:
SELECT count(*) INTO n FROM scots WHERE ishighlander;
I'm further assuming that you meant an index on the ishighlander
*column*.
I can think of more than one way to handle that. Off the top of my
head, I think it would work to acquire an update lock on both old and
new index entries for a row when it is updated, and to lock the range
of an index used for a scan with the new SIREAD lock. Or perhaps,
since the row must be visited to test visibility, the update lock
could be on the old and new rows, and the index scan would find the
conflict in that way. Or it could keep track of the various tuples
which represent different mutations of a row, and link back from the
"not visible to me" row which has been updated to true, and find that
it is a mutation of a visible row.
These are important implementation details to be worked out (very
carefully!). I don't claim to have worked through all such details
yet, or even to be familiar enough with the PostgreSQL internals to do
so in a reasonable time. :-(
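The index-range variant above could be sketched roughly like this
(pure Python, all names hypothetical): a scan leaves an SIREAD range
lock behind, and a later insert into that range records an rw-conflict
for the SSI machinery to evaluate rather than blocking anyone.

```python
import bisect

class Index:
    def __init__(self):
        self.keys = []
        self.range_locks = []   # (lo, hi, owner) SIREAD range locks

    def scan(self, lo, hi, owner):
        # Remember the key range this transaction's scan covered.
        self.range_locks.append((lo, hi, owner))
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return self.keys[i:j]

    def insert(self, key, owner):
        # The insert always succeeds; conflicts are merely recorded.
        conflicts = [o for (lo, hi, o) in self.range_locks
                     if lo <= key <= hi and o != owner]
        bisect.insort(self.keys, key)
        return conflicts

idx = Index()
idx.scan(lo=0, hi=100, owner="T1")       # T1: SELECT count(*)-style scan
conflicts = idx.insert(42, owner="T2")   # T2 inserts into the scanned range
```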
-Kevin
Simon Riggs <simon@2ndQuadrant.com> wrote:
It wouldn't be 692 lines of code
Agreed. The original implementation was in an MVCC database which
already supported full serializability using strict 2 phase locking
and used page level locks. Both of these made the implementation
simpler than it would be in PostgreSQL. (And that's not even
mentioning sub-transactions and distributed transactions!)
and even if it were the impact of that
code would be such that it would need to be optional
I was thinking perhaps a GUC to allow "traditional" behavior when
SERIALIZABLE is requested versus using snapshot isolation for
REPEATABLE READ and this new technique for SERIALIZABLE. Would that
be sane?
If the use is optional, I would currently prefer the existing
mechanism for implementing serialization, which is to serialize
access directly using either a LOCK statement or an exclusive
advisory lock.
I'm sure many will, particularly where the number of tables is less
than 100 and the number of queries which can be run concurrently is
only a thousand or two. Picking out the potential conflicts and
hand-coding serialization techniques becomes more feasible on a small
scale like that.
That said, there's a lot less room for mistakes here, once this new
technique is implemented and settled in. When I was discussing the
receipting and deposit scenario while trying to clarify the
documentation of current behavior, I received several suggestions from
respected members of this community for how that could be handled with
existing techniques which didn't, in fact, correct the problem. That
just points out to me how tricky it is to solve on an ad hoc basis, as
opposed to a more rigorous technique like the one described in the
paper.
The only suggested fix which *did* work forced actual serialization of
all receipts as well as actual serialization of those with the deposit
report query. The beauty of this new technique is that there would
not be any blocking in the described scenario, and there would be a
rollback with serialization failure if (and only if) there was an
attempt to run the deposit report query while a transaction for a
receipt on the old date was still pending. I suspect that the
concurrency improvements of the new technique over existing safe
techniques would allow it to scale well, at least in our environment.
It's clear that any new-theory solution will cost significantly more
as the number of users increases, at least O(N^2), whereas simply
waiting is only O(N), AFAICS.
I'm not following your reasoning on the O(N^2). Could you explain why
you think it would follow that curve?
So it seems its use would require some thought and care and possibly
further research to uncover areas of applicability in real usage.
Care -- of course. Real usage for serializable transactions -- well
known already. (Or are you just questioning performance here?)
So for me, I would say we leave this be until the SQLStandard
changes to recognise the additional mode.
It already recognizes this mode; it doesn't yet recognize snapshot
isolation (more's the pity).
I don't see much advantage for us in breaking the ground on this
feature and it will be costly to implement, so is a good PhD
project.
Apparently it's already been done as a PhD project -- by Michael
Cahill, against InnoDB.
-Kevin
Please keep Michael Cahill copied on this thread, per his request.
I just noticed the omission on a few messages and will forward them to
him.
-Kevin
On Thu, 2009-05-07 at 11:15 -0500, Kevin Grittner wrote:
Please keep Michael Cahill copied on this thread, per his request.
I just noticed the omission on a few messages and will forward them to
him.
Apologies Michael, I see that my mail did remove you. That was an
unconscious error; I was particularly interested in your comments
regarding my assessment of the algorithmic complexity of the new theory
and existing serialization technique.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Thu, 2009-05-07 at 10:56 -0500, Kevin Grittner wrote:
It's clear that any new-theory solution will cost significantly more
as the number of users increases, at least O(N^2), whereas simply
waiting is only O(N), AFAICS.

I'm not following your reasoning on the O(N^2). Could you explain why
you think it would follow that curve?
Each user must compare against work performed by all other users. O(N).
There are N users, so O(N^2).
With reasonable tuning we can make that work with 10 users each checking
the other's data, but with a 100 we'll end up spending more time
checking for aborts (and aborting) than we would if we had just queued
up for it.
If you want this, the simplest implementation is to quite literally
allow only a single SERIALIZABLE txn onto the system at any time. All
other SERIALIZABLEs queue. Note that simple serialization requires no
special handling for aborted transactions. Implementing that will be
fast, proving it works is trivial and it seems will work better in the
general case.
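As a sketch, that single-admission scheme is nothing more than a
global mutex (illustrative Python only; a real implementation would
route this through the lock manager, and non-SERIALIZABLE transactions
would bypass the gate entirely):

```python
import threading

# One global mutex: at most one SERIALIZABLE transaction runs at a
# time; all the others simply queue behind it. No aborts, no special
# handling for rolled-back transactions.
serializable_gate = threading.Lock()
results = []

def serializable_txn(name):
    with serializable_gate:    # queue here until it is our turn
        results.append(name)   # transaction body runs alone

threads = [threading.Thread(target=serializable_txn, args=(f"txn{i}",))
           for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All five committed exactly once, in some serial order.
```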
Yeh, it sucks for medium arrival rate transactions, but it's great for
low or high arrival rate transactions. The new model is good for medium
arrival rates only and will take a lot of work to implement, correctly
and sufficiently optimally to keep the applicability window wide enough
to justify the effort.
Optimising it would basically entail implementing the equivalent of
block-level locking.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support