SSI and Hot Standby
Here's an issue for feedback from the community -- do we want to
support truly serializable transactions on hot standby machines?
The best way Dan and I have been able to think to do this is to
build on the SERIALIZABLE READ ONLY DEFERRABLE behavior. We are
able to obtain a snapshot and then check to see if it is at a place
in the transaction processing where it would be guaranteed to be
serializable without participating in predicate locking, rw-conflict
detection, etc. If it's not, we block until a READ WRITE
transaction completes, and then check again. Repeat. We may reach
a point where we determine that the snapshot can't work, and we get
a new one and start over. Due to the somewhat complex rules for
this, you are likely to see a safe snapshot fairly quickly even in a
mix which always has short-lived READ WRITE transactions running,
although a single long-running READ WRITE transaction can block
things until it completes.
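As a rough illustration of the wait-and-retry loop just described, here is a plain Python sketch (not PostgreSQL internals; every function name here is invented for illustration): take a snapshot, check whether it is provably safe, and if not either wait for a READ WRITE transaction to complete and re-check, or discard a snapshot that can never become safe and start over.

```python
def acquire_safe_snapshot(take_snapshot, snapshot_is_safe,
                          snapshot_is_doomed, wait_for_rw_commit):
    """Return a snapshot guaranteed to yield a serializable view,
    without predicate locking or rw-conflict detection."""
    snap = take_snapshot()
    while not snapshot_is_safe(snap):
        if snapshot_is_doomed(snap):
            # This snapshot can never become safe; get a new one
            # and start over.
            snap = take_snapshot()
            continue
        # Block until some concurrent READ WRITE transaction
        # completes, then re-check the same snapshot.
        wait_for_rw_commit()
    return snap
```

With short-lived READ WRITE transactions, the loop should terminate quickly; a single long-running READ WRITE transaction stalls it, as noted above.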
The idea is that whenever we see a valid snapshot which would yield
a truly serializable view of the data for a READ ONLY transaction,
we add a WAL record with that snapshot information. Of course, we
might want some limit of how often they are sent, to avoid WAL
bloat. A hot standby could just keep the most recently received of
these and use it when a SERIALIZABLE transaction is requested.
Perhaps DEFERRABLE in this context could mean that it waits for the
*next* one and uses it, to assure "freshness".
Actually, we could try to get tricky to avoid sending a complete
snapshot by having two WAL messages with no payload -- one would
mean "the snapshot you would get now is being tested for
serializability". If it failed to reach that state we would send
another when we started working on a new snapshot. The other type of
message would mean "the snapshot you built when we last told you we
were starting to test one is good." I *think* that can work, and it
may require less WAL space.
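For illustration only, the standby side of that two-marker protocol might look like the following sketch (the marker names, class, and structure are all invented here, not actual WAL record types): a "pending" marker makes the standby build and stash the snapshot it would hand out at that point, and a "safe" marker promotes the stashed candidate to the most recent known-safe snapshot.

```python
SNAPSHOT_PENDING = "pending"  # "the snapshot you would get now is being tested"
SNAPSHOT_SAFE = "safe"        # "the snapshot built at the last PENDING is good"

class StandbySafeSnapshots:
    def __init__(self):
        self.candidate = None    # snapshot built at the last PENDING marker
        self.latest_safe = None  # most recent snapshot known to be safe

    def on_wal_record(self, record_type, build_snapshot):
        if record_type == SNAPSHOT_PENDING:
            # Build the snapshot we *would* hand out right now; a later
            # PENDING (meaning the previous test failed) just replaces it.
            self.candidate = build_snapshot()
        elif record_type == SNAPSHOT_SAFE:
            # The master verified the candidate is anomaly-free.
            if self.candidate is not None:
                self.latest_safe = self.candidate
```

A SERIALIZABLE request on the standby would then use `latest_safe`; DEFERRABLE could mean waiting for the next promotion instead.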
If we don't do something like this, do we just provide REPEATABLE
READ on the standby as the strictest level of transaction isolation?
If so, do we generate an error on a request for SERIALIZABLE, warn
and provide degraded behavior, or just quietly give them REPEATABLE
READ behavior?
Thoughts?
-Kevin
On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote:
Here's an issue for feedback from the community -- do we want to
support truly serializable transactions on hot standby machines?
In this release? Maybe? In later releases? Yes.
If it blocks your excellent contribution in this release, then from me,
"no". If you can achieve this in this release, yes. However, if this is
difficult or complex, then I would rather say "not yet" quickly now,
than spend months working out the weirdnesses and possibly still get
them wrong.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Simon Riggs <simon@2ndQuadrant.com> wrote:
In this release? Maybe? In later releases? Yes.
If it blocks your excellent contribution in this release, then
from me, "no". If you can achieve this in this release, yes.
However, if this is difficult or complex, then I would rather say
"not yet" quickly now, than spend months working out the
weirdnesses and possibly still get them wrong.
We already have a mechanism for generating a good snapshot, the hard
part (for me at least) would be to get that snapshot over to the hot
standby and have it use the latest one on a request for a
serializable transaction. I have no experience with WAL file
output, and don't know what it would take for hot standby to use it
as I describe.
I agree it's pretty late in the cycle, but I'm going through all the
loose ends and found this one -- which has been hanging out on the
Wiki page as an R&D item for over a full year without discussion.
:-( If we provide the snapshots (which we can safely and easily
do), can someone else who knows what they're doing with WAL and HS
get the rest of it safely into the release? That seems to me to be
the only way it can still happen for 9.1.
If not, I agree this can be 9.2 material. We just have to decide
how to document it and answer the questions near the bottom of my
initial post of the thread.
-Kevin
On Wed, Jan 19, 2011 at 8:34 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
I agree it's pretty late in the cycle, but I'm going through all the
loose ends and found this one -- which has been hanging out on the
Wiki page as an R&D item for over a full year without discussion.
:-( If we provide the snapshots (which we can safely and easily
do), can someone else who knows what they're doing with WAL and HS
get the rest of it safely into the release? That seems to me to be
the only way it can still happen for 9.1.
I think it's way too late to be embarking on what will probably turn
out to be a reasonably complex and possibly controversial new
development arc. I don't have a strong position on what we should do
instead, but let's NOT do that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin's suggestion seems eminently reasonable to me and probably the
best approach one can do for SSI and hot standby. Pulling it off in
time for 9.1 would be a stretch; 9.2 seems quite doable.
It's worth noting that one way or another, the semantics of
SERIALIZABLE transactions on hot standby replicas could be surprising
to some. There's no getting around this; serializability in distributed
systems is just a hard problem in general. Either we go with Kevin's
suggestion of treating SERIALIZABLE transactions as DEFERRABLE (whether
now or for 9.2), causing them to have to use an older snapshot or block
until an acceptable snapshot becomes available; or we require them to
be downgraded to REPEATABLE READ either implicitly or explicitly.
Now, neither of these is as alarming as they might sound, given that
replication lag is a fact of life for hot standby systems and
REPEATABLE READ is exactly the same as the current (9.0) SERIALIZABLE
behavior. But it's definitely something that should be addressed in
documentation.
Dan
--
Dan R. K. Ports MIT CSAIL http://drkp.net/
On Wed, 2011-01-19 at 19:34 -0600, Kevin Grittner wrote:
I agree it's pretty late in the cycle, but I'm going through all the
loose ends and found this one -- which has been hanging out on the
Wiki page as an R&D item for over a full year without discussion.
:-( If we provide the snapshots (which we can safely and easily
do), can someone else who knows what they're doing with WAL and HS
get the rest of it safely into the release? That seems to me to be
the only way it can still happen for 9.1.
I gave you a quick response to let you know that HS need not be a
blocker, for this release. If you are saying you have knowingly ignored
a requirement for a whole year, then I am shocked. How exactly did you
think this would ever be committed?
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Robert Haas wrote:
Kevin Grittner wrote:
I agree it's pretty late in the cycle, but I'm going through all
the loose ends and found this one -- which has been hanging out on
the Wiki page as an R&D item for over a full year without
discussion. :-( If we provide the snapshots (which we can safely
and easily do), can someone else who knows what they're doing with
WAL and HS get the rest of it safely into the release? That seems
to me to be the only way it can still happen for 9.1.
I think it's way too late to be embarking on what will probably
turn out to be a reasonably complex and possibly controversial new
development arc. I don't have a strong position on what we should
do instead, but let's NOT do that.
If that can't reasonably be done for 9.1, well, my next sentence was:
If not, I agree this can be 9.2 material.
It'd be sweet if it could still happen 9.1, but hardly a shock if it
can't. I didn't want to presume to make the call.
Like I said at the start, the alternative is to decide how noisy we
want to be about providing snapshot isolation on hot standbys when
SERIALIZABLE is requested, and figuring out where to document it.
-Kevin
* Simon Riggs (simon@2ndQuadrant.com) wrote:
I gave you a quick response to let you know that HS need not be a
blocker, for this release. If you are saying you have knowingly ignored
a requirement for a whole year, then I am shocked. How exactly did you
think this would ever be committed?
Erm, to be perfectly honest, I think the answer is probably "I was
busy.", and "no one provided any feedback on *how* to deal with it."
Given the amount of work that Kevin's put into this patch (which has
been beyond impressive, imv), I have a hard time finding fault with
him not getting time to implement a solution for Hot Standby for this.
As you say, it's not a blocker, I agree completely with that, regardless
of when it was identified as an issue. What we're talking about is
right now, and right now is too late to fix it for HS, and to be
perfectly frank, fixing it for HS isn't required or even a terribly
important factor in whether it should be committed.
I'll refrain from casting stones about issues brought up nearly a year
ago on certain other patches which are apparently not going to include
what I, at least, consider extremely important to PG acceptance by
others.
Thanks,
Stephen
Simon Riggs wrote:
I gave you a quick response to let you know that HS need not be a
blocker, for this release. If you are saying you have knowingly
ignored a requirement for a whole year, then I am shocked. How
exactly did you think this would ever be committed?
I was asked not to discuss this effort on list for most of that time,
and while it was on the Wiki page, I just lost track of it -- not
maliciously or intentionally. I really apologize. By the time the
9.0 release was out and it was deemed OK for me to discuss things, I
started getting feedback on problems which needed response, and I got
into the mode of reacting to that rather than ticking through my
issues list.
-Kevin
On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote:
If we don't do something like this, do we just provide REPEATABLE
READ on the standby as the strictest level of transaction isolation?
If so, do we generate an error on a request for SERIALIZABLE, warn
and provide degraded behavior, or just quietly give them REPEATABLE
READ behavior?
Thoughts?
Hopefully there is a better option available. We don't want to silently
give wrong results.
Maybe we should bring back the compatibility GUC? It could throw an
error unless the user sets the compatibility GUC to turn "serializable"
into "repeatable read".
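To make the two options concrete, here is a hypothetical sketch of how a SERIALIZABLE request could be resolved during recovery (the function and the compatibility-flag parameter are invented for illustration; this is not an actual GUC or server code):

```python
def effective_isolation(requested, in_recovery, compat_downgrade):
    """Map a requested isolation level to the one actually used.

    On a hot standby (in_recovery), SERIALIZABLE either raises an
    error or, if the compatibility setting is on, is downgraded to
    REPEATABLE READ."""
    if requested == "serializable" and in_recovery:
        if compat_downgrade:
            # Silent downgrade: old (9.0) SERIALIZABLE behavior.
            return "repeatable read"
        raise ValueError("SERIALIZABLE is not available during recovery")
    return requested
```

With the flag on, an application written against 9.0 keeps working unchanged; with it off, the user gets an explicit error rather than silently weaker guarantees.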
Regards,
Jeff Davis
On 20.01.2011 03:05, Kevin Grittner wrote:
If we don't do something like this, do we just provide REPEATABLE
READ on the standby as the strictest level of transaction isolation?
If so, do we generate an error on a request for SERIALIZABLE, warn
and provide degraded behavior, or just quietly give them REPEATABLE
READ behavior?
+1 for generating an error.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
On 20.01.2011 03:05, Kevin Grittner wrote:
If we don't do something like this, do we just provide REPEATABLE
READ on the standby as the strictest level of transaction
isolation? If so, do we generate an error on a request for
SERIALIZABLE, warn and provide degraded behavior, or just quietly
give them REPEATABLE READ behavior?
+1 for generating an error.
Before I go do that, I want to be sure everyone is clear about the
state of things.
If SSI is used to provide data integrity on the master, it will
prevent any serialization anomalies from being persisted on any hot
standby *long term*. For example, at any point where the standby is
at a point in the transaction stream where there were no read/write
transactions active, no anomalies can be observed. (That isn't the
*only* time; it's just the simplest one to describe as an example.)
Queries on the standby can, however, see *transient* anomalies when
they run queries which would cause a serialization failure if run on
the master at the same point in the transaction stream. This can
only occur when, of two concurrent transactions, the one which
*appears* to run second because the other can't read what it wrote,
*commits* first.
The most common and alarming situation where this occurs, in my
opinion, is batch processing. This is extremely common in financial
applications, and tends to show up in a lot of other places, too.
(The receipting query set is an instance of this type of problem,
but I'm going to keep it more general in hopes that people can see
where it impacts them.) Imagine an application which has some small
control record in a table, and inserts to some other table are
assigned to a batch based on the control record. The batches are
normally identified by ascending dates or serial numbers.
Periodically a new batch is opened and the old batch is closed by
updating a "current batch id" column in the control table. If the
batch ID is updated and the transaction in which that update was
executed commits while a transaction which read the old batch ID is
still in flight, a read of the database will show that the batch is
closed, but if you look at the detail of the batch, it will not yet
be complete.
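That interleaving can be sketched as a toy Python timeline (this only mimics the commit ordering described above, not real transactions or snapshot machinery; the data layout is invented):

```python
def run_timeline():
    """Replay the batch anomaly: the 'close batch' update commits
    before the insert into the old batch does."""
    committed = {"current_batch": 1, "receipts": []}
    observations = []

    # T1 (the insert) takes its snapshot: sees current_batch = 1.
    t1_batch = committed["current_batch"]

    # T2 closes the batch and commits immediately.
    committed["current_batch"] = 2

    # T3 (a reader) runs now: batch 1 appears closed, yet the
    # receipt T1 will put there has not arrived.
    observations.append({
        "batch_1_closed": committed["current_batch"] > 1,
        "batch_1_receipts": list(committed["receipts"]),
    })

    # T1 finally commits its insert into the old batch.
    committed["receipts"].append(("batch", t1_batch))

    # A retry of the read after T1 commits sees the complete batch.
    observations.append({
        "batch_1_closed": committed["current_batch"] > 1,
        "batch_1_receipts": list(committed["receipts"]),
    })
    return observations
```

The first observation is the transient anomaly (closed but incomplete batch); the second shows why an immediate retry after SSI cancels the reader would see consistent data.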
Under SSI, one of these transactions will be canceled to prevent
this. Our implementation will always allow the update which closes
the batch to complete, and either the insert or the select of the
detail will be rolled back with a serialization failure, depending
on the timing of the actions inside those transactions. If the insert
fails, it can be retried, and will land in the new batch -- making
the listing of the old batch, which omits it, correct. If the listing of the
batch details is canceled, it will be because the insert into the
old batch committed before it recognized the problem, so an
immediate retry of the select will see the complete batch contents.
A hot standby can't really take part in the predicate locking and
transaction cancellation on the master.
Dan and I have both come to the conclusion that the only reasonable
way to allow hot standby to work with SSI is for the WAL (when
wal_level = hot_standby) to contain information about which
snapshots are guaranteed not to see such a state. In the above
example, barring some throttling mechanism skipping these particular
snapshots, or other problematic conflicts around the same time, the
master would tell the standby that the snapshot before either of the
two problem transactions was OK, and then it would tell them that
the snapshot after both had committed was OK. It would not suggest
using the snapshot available between the commit of the control
record update and the commit of the insert into the batch.
This seems to me to be not completely unrelated to the snapshot
synchronization patch. It is clearly closely related to the READ
ONLY DEFERRABLE mode, which also looks for a snapshot which is
immune to serialization anomalies without predicate locking,
conflict detection, transaction cancellation, etc. Melding these
two things with hot standby seems to be beyond what can reasonably
happen for 9.1 without delaying the release.
If someone is using one feature and not the other, they really don't
have a problem. Like anyone else, if a hot standby user has been
using SERIALIZABLE mode under 9.0 or earlier, they will need to
switch to REPEATABLE READ. A SERIALIZABLE user who doesn't set up
hot standby has no issue. Opinions so far seem to be in favor of
reporting an error on the standby if SERIALIZABLE is requested, so
that people don't silently get less protection than they expect.
The most annoying thing about that is that if the user would *like*
to use truly serializable transactions on the standby, and will do
so when they get it in 9.2, they must switch to REPEATABLE READ now,
and switch back to SERIALIZABLE with the next release.
So, based on a more complete description of the issues, any more
opinions on whether to generate the error, as suggested by Heikki?
Does anyone think this justifies the compatibility GUC as suggested
by Jeff? It seems to me that this deserved documentation in the
MVCC chapter under both the "Serializable Isolation Level" and
"Enforcing Consistency With Serializable Transactions" sections. I
think it probably deserves a note in the SET TRANSACTION reference
page, too. Agreed? Anywhere else?
-Kevin
Kevin,
So, based on a more complete description of the issues, any more
opinions on whether to generate the error, as suggested by Heikki?
If it's a choice between generating an error and letting users see
inconsistent data, I'll take the former.
Does anyone think this justifies the compatibility GUC as suggested
by Jeff?
I think it might, yes. Since someone could simply turn on the backwards
compatibility flag for 9.1 and turn it off for 9.2, rather than trying
to mess with transaction states which might be set in application code.
Unfortunately, people have not responded to our survey :-(
http://www.postgresql.org/community/survey.77
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote:
The idea is that whenever we see a valid snapshot which would yield
a truly serializable view of the data for a READ ONLY transaction,
we add a WAL record with that snapshot information.
You haven't explained why this approach is the way forwards. What other
options have been ruled out, and why. The above approach doesn't sound
particularly viable to me.
It's not clear to me what the reason is that this doesn't just work on
HS already. If you started there it might help.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Simon Riggs <simon@2ndQuadrant.com> wrote:
On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote:
The idea is that whenever we see a valid snapshot which would
yield a truly serializable view of the data for a READ ONLY
transaction, we add a WAL record with that snapshot information.
You haven't explained why this approach is the way forwards. What
other options have been ruled out, and why. The above approach
doesn't sound particularly viable to me.
Why not? We already generate appropriate snapshots for this in SSI,
so is the problem in getting the appropriate information into the
WAL stream, or in having a request for a snapshot within a
serializable transaction while running in hot standby?
It's not clear to me what the reason is that this doesn't just
work on HS already. If you started there it might help.
Because the standby would need to bombard the server with a stream
of predicate lock information, we would need to allow transactions
on the master to be canceled due in part to activity on the standby,
and I don't even know how you would begin to track read/write
conflicts between transactions on two different clusters.
If any of that didn't make sense, it would probably be more
efficient for everyone involved if those interested browsed the
Overview section of the Wiki page than to have me duplicate its
contents in a post.
http://wiki.postgresql.org/wiki/Serializable
-Kevin
On Jan21, 2011, at 00:11 , Simon Riggs wrote:
It's not clear to me what the reason is that this doesn't just work on
HS already. If you started there it might help.
The problem is that snapshots taken on the master sometimes represent a
state of the database which cannot occur under any (valid) serial schedule.
Hence, if you use that snapshot to read the *whole* database, you've
surely violated serializability. If you read only parts of the database,
things may or may not be fine, depending on the parts you read.
To have the same stringent guarantees that SERIALIZABLE provides on the
master also for queries run against the slave, you somehow need to prevent
this. The easiest way is to only use snapshots on the slave which *cannot*
produce such anomalies. We already know how to generate such snapshots -
SERIALIZABLE READ ONLY DEFERRABLE does exactly that. So the open question
is mainly how to transfer such snapshots to the slave, and how often we
transmit a new one.
best regards,
Florian Pflug
I wrote:
Why not? We already generate appropriate snapshots for this in
SSI, so is the problem in getting the appropriate information into
the WAL stream or in having a request for a snapshot within a
serializable transaction while running in hot standby the problem?
I dropped a few words.
That was supposed to ask whether the problem was in getting hot
standby to *use such a snapshot*.
I'm open to other suggestions on how else we might do this. I don't
see any alternatives, but maybe you're seeing some possibility that
eludes me.
-Kevin
Simon Riggs <simon@2ndQuadrant.com> writes:
On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote:
The idea is that whenever we see a valid snapshot which would yield
a truly serializable view of the data for a READ ONLY transaction,
we add a WAL record with that snapshot information.
You haven't explained why this approach is the way forwards. What other
options have been ruled out, and why. The above approach doesn't sound
particularly viable to me.
I'm pretty concerned about the performance implications, too. In
particular that sounds like you could get an unbounded amount of WAL
emitted from a *purely read only* transaction flow. Which is not
going to fly.
regards, tom lane
On Fri, 2011-01-21 at 00:26 +0100, Florian Pflug wrote:
On Jan21, 2011, at 00:11 , Simon Riggs wrote:
It's not clear to me what the reason is that this doesn't just work on
HS already. If you started there it might help.
The problem is that snapshots taken on the master sometimes represent a
state of the database which cannot occur under any (valid) serial schedule.
Hence, if you use that snapshot to read the *whole* database, you've
surely violated serializability. If you read only parts of the database,
things may or may not be fine, depending on the parts you read.
To have the same stringent guarantees that SERIALIZABLE provides on the
master also for queries run against the slave, you somehow need to prevent
this. The easiest way is to only use snapshots on the slave which *cannot*
produce such anomalies. We already know how to generate such snapshots -
SERIALIZABLE READ ONLY DEFERRABLE does exactly that. So the open question
is mainly how to transfer such snapshots to the slave, and how often we
transmit a new one.
Thank you for explaining a little more.
What I'm still not clear on is why HS is different. Whatever rules
apply on the master must also apply on the standby, immutably. Why is it
we need to pass explicit snapshot information from master to standby? We
don't do that, except at startup for normal HS. Why do we need that?
I hear, but do not yet understand, that the SSI transaction sequence on
the master may differ from the WAL transaction sequence. Is it important
that the ordering on the master would differ from the standby?
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Tom Lane wrote:
I'm pretty concerned about the performance implications, too. In
particular that sounds like you could get an unbounded amount of
WAL emitted from a *purely read only* transaction flow.
No. Read only transactions wouldn't create any flow at all. And I
suggested that we might want some kind of throttle on how often we
generate snapshots even from the read write transactions. I'm not
at all clear on how you got to the concerns you have. Is there
something in particular I could clear up for you that isn't already
mentioned in the previous emails?
-Kevin