Overhead cost of Serializable Snapshot Isolation
I'm looking into upgrading a fairly busy system to 9.1. They use
serializable mode for a few specific things, and suffer through some
serialization errors as a result. While looking over the new
serializable/SSI documentation, one thing that stood out is:
http://www.postgresql.org/docs/current/interactive/transaction-iso.html
"The monitoring of read/write dependencies has a cost, as does the restart of
transactions which are terminated with a serialization failure, but balanced
against the cost and blocking involved in use of explicit locks and SELECT
FOR UPDATE or SELECT FOR SHARE, Serializable transactions are the best
performance choice for some environments."
I agree it is better versus SELECT FOR, but what about repeatable read versus
the new serializable? How much overhead is there in the 'monitoring of
read/write dependencies'? This is my only concern at the moment. Are we
talking insignificant overhead? Minor? Is it measurable? Hard to say without
knowing the number of txns, number of locks, etc.?
--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
On 10.10.2011 21:25, Greg Sabino Mullane wrote:
I agree it is better versus SELECT FOR, but what about repeatable read versus
the new serializable? How much overhead is there in the 'monitoring of
read/write dependencies'? This is my only concern at the moment. Are we
talking insignificant overhead? Minor? Is it measurable? Hard to say without
knowing the number of txns, number of locks, etc.?
I'm sure it does depend heavily on all of those things, but IIRC Kevin
ran some tests earlier in the spring and saw a 5% slowdown. That feels
like a reasonable initial guess to me. If you can run some tests and
measure the overhead in your application, it would be nice to hear about it.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, Oct 10, 2011 at 02:25:59PM -0400, Greg Sabino Mullane wrote:
I agree it is better versus SELECT FOR, but what about repeatable read versus
the new serializable? How much overhead is there in the 'monitoring of
read/write dependencies'? This is my only concern at the moment. Are we
talking insignificant overhead? Minor? Is it measurable? Hard to say without
knowing the number of txns, number of locks, etc.?
I'd expect that in most cases the main cost is not going to be overhead
from the lock manager but rather the cost of having transactions
aborted due to conflicts. (But the rollback rate is extremely
workload-dependent.)
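To make that concrete, here is a minimal sketch (hypothetical table and data,
not from any real workload) of the classic write-skew case: the kind of
conflict that repeatable read lets through but SSI aborts, so a workload full
of these will see a much higher rollback rate under serializable.

    -- setup (hypothetical)
    CREATE TABLE oncall (doctor text PRIMARY KEY, on_call boolean NOT NULL);
    INSERT INTO oncall VALUES ('alice', true), ('bob', true);

    -- session 1, step 1
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT count(*) FROM oncall WHERE on_call;   -- sees 2

    -- session 2, step 2
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT count(*) FROM oncall WHERE on_call;   -- also sees 2

    -- session 1, step 3
    UPDATE oncall SET on_call = false WHERE doctor = 'alice';
    COMMIT;                                      -- succeeds

    -- session 2, step 4
    UPDATE oncall SET on_call = false WHERE doctor = 'bob';
    COMMIT;
    -- Under REPEATABLE READ both transactions commit (write skew); under
    -- SERIALIZABLE this one gets a serialization failure (SQLSTATE 40001),
    -- either at the UPDATE or at COMMIT, and has to be retried.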
We've seen CPU overhead from the lock manager of a few percent on a
CPU-bound workload (in-memory pgbench). Also, if you're using a system
with many cores and a similar workload, SerializableXactHashLock might
become a scalability bottleneck.
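For what it's worth, a cheap way to get a feel for how much work the
predicate lock manager is doing for a given workload is to watch pg_locks
while the workload runs; in 9.1 the SIRead locks show up there. A sketch,
run from a separate session:

    -- Count predicate (SIRead) locks currently held, by granularity.
    -- Many relation-level entries suggest tuple/page locks were promoted,
    -- which tends to raise the false-positive abort rate.
    SELECT locktype, count(*)
      FROM pg_locks
     WHERE mode = 'SIReadLock'
     GROUP BY locktype
     ORDER BY count(*) DESC;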
Dan
--
Dan R. K. Ports MIT CSAIL http://drkp.net/
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
On 10.10.2011 21:25, Greg Sabino Mullane wrote:
I agree it is better versus SELECT FOR, but what about repeatable
read versus the new serializable? How much overhead is there in
the 'monitoring of read/write dependencies'? This is my only
concern at the moment. Are we talking insignificant overhead?
Minor? Is it measurable? Hard to say without knowing the number
of txns, number of locks, etc.?

I'm sure it does depend heavily on all of those things, but IIRC
Kevin ran some tests earlier in the spring and saw a 5% slowdown.
That feels like a reasonable initial guess to me. If you can run
some tests and measure the overhead in your application, it would
be nice to hear about it.
Right: the only real answer is "it depends". At various times I ran
different benchmarks where the overhead ranged from "lost in the
noise" to about 5% for one variety of "worst case". Dan ran DBT-2,
following the instructions on how to measure performance quite
rigorously, and came up with a 2% hit versus repeatable read for
that workload. I rarely found a benchmark where the hit exceeded
2%, but I have a report of a workload where the hit was 20% -- for
constantly overlapping long-running transactions contending for the
same table.
I do have some concern about whether the performance improvements
from reduced LW locking contention elsewhere in the code may (in
whack-a-mole fashion) cause the percentages to go higher in SSI.
The biggest performance issues in some of the SSI benchmarks were on
LW lock contention, so those may become more noticeable as other
contention is reduced. I've been trying to follow along on the
threads regarding Robert's work in that area, with hopes of applying
some of the same techniques to SSI, but it's not clear whether I'll
have time to work on that for the 9.2 release. (It's actually
looking improbable at this point.)
If you give it a try, please optimize using the performance
considerations for SSI in the manual. They can make a big
difference.
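For example, a sketch of a couple of the items from that section of the
manual (the queries and numbers here are hypothetical): declare anything
that doesn't write as READ ONLY, use DEFERRABLE for long-running reports,
and give the predicate lock manager enough room that it doesn't have to
promote to coarser locks.

    -- Read-only serializable transactions take fewer predicate locks and
    -- can often be released from conflict tracking early.
    BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY;
    SELECT count(*) FROM orders;   -- hypothetical report query
    COMMIT;

    -- A long-running report can wait for a "safe" snapshot and then run
    -- with no risk of causing or suffering a serialization failure.
    BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
    SELECT count(*) FROM orders;
    COMMIT;

    -- postgresql.conf (needs a restart): room for more fine-grained
    -- predicate locks before promotion to page/relation granularity.
    -- max_pred_locks_per_transaction = 256    # default is 64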
-Kevin
On Mon, Oct 10, 2011 at 02:59:04PM -0500, Kevin Grittner wrote:
I do have some concern about whether the performance improvements
from reduced LW locking contention elsewhere in the code may (in
whack-a-mole fashion) cause the percentages to go higher in SSI.
The biggest performance issues in some of the SSI benchmarks were on
LW lock contention, so those may become more noticeable as other
contention is reduced. I've been trying to follow along on the
threads regarding Robert's work in that area, with hopes of applying
some of the same techniques to SSI, but it's not clear whether I'll
have time to work on that for the 9.2 release. (It's actually
looking improbable at this point.)
I spent some time thinking about this a while back, but didn't have
time to get very far. The problem isn't contention in the predicate
lock manager (which is partitioned) but the single lock protecting the
active SerializableXact state.
It would probably help things a great deal if we could make that lock
more fine-grained. However, it's tricky to do this without deadlocking
because the serialization failure checks need to examine a node's
neighbors in the dependency graph.
Dan
--
Dan R. K. Ports MIT CSAIL http://drkp.net/
Dan Ports <drkp@csail.mit.edu> wrote:
I spent some time thinking about this a while back, but didn't
have time to get very far. The problem isn't contention in the
predicate lock manager (which is partitioned) but the single lock
protecting the active SerializableXact state.

It would probably help things a great deal if we could make that
lock more fine-grained. However, it's tricky to do this without
deadlocking because the serialization failure checks need to
examine a node's neighbors in the dependency graph.
Did you ever see much contention on
SerializablePredicateLockListLock, or was it just
SerializableXactHashLock? I think the former might be able to use
the non-blocking techniques, but I fear the main issue is with the
latter, which seems like a harder problem.
-Kevin
On Mon, Oct 10, 2011 at 04:10:18PM -0500, Kevin Grittner wrote:
Did you ever see much contention on
SerializablePredicateLockListLock, or was it just
SerializableXactHashLock? I think the former might be able to use
the non-blocking techniques, but I fear the main issue is with the
latter, which seems like a harder problem.
No, not that I recall -- if SerializablePredicateLockListLock was on
the list of contended locks, it was pretty far down.
SerializableXactHashLock was the main bottleneck, and
SerializableXactFinishedListLock was a lesser but still significant
one.
Dan
--
Dan R. K. Ports MIT CSAIL http://drkp.net/
On Mon, Oct 10, 2011 at 8:30 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
On 10.10.2011 21:25, Greg Sabino Mullane wrote:
I agree it is better versus SELECT FOR, but what about repeatable read versus
the new serializable? How much overhead is there in the 'monitoring of
read/write dependencies'? This is my only concern at the moment. Are we
talking insignificant overhead? Minor? Is it measurable? Hard to say without
knowing the number of txns, number of locks, etc.?

I'm sure it does depend heavily on all of those things, but IIRC Kevin ran
some tests earlier in the spring and saw a 5% slowdown. That feels like a
reasonable initial guess to me. If you can run some tests and measure the
overhead in your application, it would be nice to hear about it.
How do we turn it on/off to allow the overhead to be measured?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> wrote:
How do we turn it on/off to allow the overhead to be measured?
Use REPEATABLE READ transactions or SERIALIZABLE transactions. The
easiest way, if you're doing it for all transactions (which I
recommend), is to set default_transaction_isolation.
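Something along these lines (a sketch; the database name is made up):

    -- per session, e.g. for a benchmark run:
    SET default_transaction_isolation = 'serializable';   -- or 'repeatable read'

    -- for a single database (hypothetical name):
    ALTER DATABASE app SET default_transaction_isolation = 'serializable';

    -- cluster-wide, in postgresql.conf:
    -- default_transaction_isolation = 'serializable'

    -- individual transactions can still override the default:
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    -- ... work ...
    COMMIT;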
-Kevin
On Mon, Oct 10, 2011 at 3:59 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
I do have some concern about whether the performance improvements
from reduced LW locking contention elsewhere in the code may (in
whack-a-mole fashion) cause the percentages to go higher in SSI.
The biggest performance issues in some of the SSI benchmarks were on
LW lock contention, so those may become more noticeable as other
contention is reduced. I've been trying to follow along on the
threads regarding Robert's work in that area, with hopes of applying
some of the same techniques to SSI, but it's not clear whether I'll
have time to work on that for the 9.2 release. (It's actually
looking improbable at this point.)
I ran my good old pgbench -S, scale factor 100, shared_buffers = 8GB
test on Nate Boley's box. I ran it on both 9.1 and 9.2dev, and at all
three isolation levels. As usual, I took the median of three 5-minute
runs, which I've generally found adequate to eliminate the noise. On
both 9.1 and 9.2dev, read committed and repeatable read have basically
identical performance; if anything, repeatable read may be slightly
better - which would make sense, if it cuts down the number of
snapshots taken.
Serializable mode is much slower on this test, though. On
REL9_1_STABLE, it's about 8% slower with a single client. At 8
clients, the difference rises to 43%, and at 32 clients, it's 51%
slower. On 9.2devel, serializable's raw performance is somewhat higher
(e.g. +51% at 8 clients), but the performance when not using SSI has
improved so much
that the performance gap between serializable and the other two
isolation levels is now huge: with 32 clients, in serializable mode,
the median result was 21114.577645 tps; in read committed,
218748.929692 tps - that is, read committed is running more than ten
times faster than serializable. Data are attached, in text form and
as a plot. I excluded the repeatable read results from the plot as
they just clutter it up - they're basically on top of the read
committed results.
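(For anyone who wants to reproduce this, the comparison boils down to
something like the following, run once per isolation level; take the
pgbench options as an approximation of the setup described above rather
than the exact commands.)

    -- one-time setup from the shell:  pgbench -i -s 100 pgbench
    -- per run, switch the database's default isolation level:
    ALTER DATABASE pgbench SET default_transaction_isolation = 'serializable';
    -- then a select-only run from the shell, e.g.:
    --   pgbench -S -c 32 -j 32 -T 300 pgbench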
I haven't run this with LWLOCK_STATS, but my seat-of-the-pants guess
is that there's a single lightweight lock that everything is
bottlenecking on. One possible difference between this test case and
the ones you may have used is that this case involves lots and lots of
really short transactions that don't do much. The effect of anything
that only happens once or a few times per transaction is really
magnified in this type of workload (which is why the locking changes
make so much of a difference here - in a longer or heavier-weight
transaction that stuff would be lost in the noise).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
isolation-scaling.png (image/png -- plot of tps versus client count for the isolation levels)