SSI patch version 14

Started by Kevin Grittnerover 15 years ago119 messageshackers

Kevin.Grittner@wicourts.gov

over 15 years ago

Attached.

Dan and I have spent many, many hours desk-check and testing,
including pounding for many hours in DBT-2 at high concurrency
factors on a 16 processor box. In doing so, we found and fixed a few
issues. Neither of us is aware of any bugs or deficiencies
remaining, even after a solid day of pounding in DBT-2, other than
the failure to extend any new functionality to hot standby.

At Heikki's suggestion I have included a test to throw an error on an
attempt to switch to serializable mode during recovery. Anything
more to address that issue can be a small follow-up patch -- probably
mainly a few notes in the docs.

-Kevin

Jeff Davis

pgsql@j-davis.com

over 15 years ago

In reply to: Kevin Grittner (#1)

Re: SSI patch version 14

On Mon, 2011-01-24 at 21:30 -0600, Kevin Grittner wrote:

Attached.

Dan and I have spent many, many hours desk-check and testing,
including pounding for many hours in DBT-2 at high concurrency
factors on a 16 processor box. In doing so, we found and fixed a few
issues. Neither of us is aware of any bugs or deficiencies
remaining, even after a solid day of pounding in DBT-2, other than
the failure to extend any new functionality to hot standby.

At Heikki's suggestion I have included a test to throw an error on an
attempt to switch to serializable mode during recovery. Anything
more to address that issue can be a small follow-up patch -- probably
mainly a few notes in the docs.

Here is my first crack at a review:

First of all, I am very impressed with the patch. I was pleased to see
that both READ ONLY DEFERRABLE and 2PC work! Also, I found the patch
very usable and did not encounter any bugs or big surprises.

Second, I do not understand this patch entirely, so the following
statements can be considered questions as much as answers.

At a high level, there is a nice conceptual simplicity. Let me try to
summarize it as I understand it:
* RW dependencies are detected using predicate locking.
* RW dependencies are tracked from the reading transaction (as an
"out") conflict; and from the writing transaction (as an "in"
conflict).
* Before committing a transaction, then by looking only at the RW
dependencies (and predicate locks) for current and past
transactions, you can determine if committing this transaction will
result in a cycle (and therefore a serialization anomaly); and if
so, abort it.

That's where the simplicity ends, however ;)

For starters, the above structures require unlimited memory, while we
have fixed amounts of memory. The predicate locks are referenced by a
global shared hash table as well as per-process SHMQueues. It can adapt
memory usage downward in three ways:
* increase lock granularity -- e.g. change N page locks into a
table lock
* "summarization" -- fold multiple locks on the same object across
many old committed transactions into a single lock
* use the SLRU

Tracking of RW conflicts of current and past transactions is more
complex. Obviously, it doesn't keep _all_ past transactions, but only
ones that overlap with a currently-running transaction. It does all of
this management using SHMQueue. There isn't much of an attempt to
gracefully handle OOM here as far as I can tell, it just throws an error
if there's not enough room to track a new transaction (which is
reasonable, considering that it should be quite rare and can be
mitigated by increasing max_connections).

The heavy use of SHMQueue is quite reasonable, but for some reason I
find the API very difficult to read. I think it would help (at least me)
quite a lot to annotate (i.e. with a comment in the struct) the various
lists and links with the types of data being held.

The actual checking of conflicts isn't quite so simple, either, because
we want to detect problems before the victim transaction has done too
much work. So, checking is done along the way in several places, and
it's a little difficult to follow exactly how those smaller checks add
up to the overall serialization-anomaly check (the third point in my
simple model).

There are also optimizations around transactions declared READ ONLY.
Some of these are a little difficult to follow as well, and I haven't
followed them all.

There is READ ONLY DEFERRABLE, which is a nice feature that waits for a
"safe" snapshot, so that the transaction will never be aborted.

Now, on to my code comments (non-exhaustive):

* I see that you use a union as well as using "outLinks" to also be a
free list. I suppose you did this to conserve shared memory space,
right?

* In RegisterSerializableTransactionInt(), for a RO xact, it considers
any concurrent RW xact a possible conflict. It seems like it would be
possible to know whether a RW transaction may have overlapped with any
committed RW transaction (in finishedLink queue), and only add those as
potential conflicts. Would that work? If so, that would make more
snapshots safe.

* When a transaction finishes, then PID should probably be set to zero.
You only use it for waking up a deferrable RO xact waiting for a
snapshot, right?

* Still some compiler warnings:
twophase.c: In function ‘FinishPreparedTransaction’:
twophase.c:1360: warning: implicit declaration of function
‘PredicateLockTwoPhaseFinish’

* You have a function called CreatePredTran. We are already using
"Transaction" and "Xact" -- we probably don't need "Tran" as well.

* HS error has a typo:
"ERROR: cannot set transaction isolationn level to serializable during
recovery"

* I'm a little unclear on summarization and writing to the SLRU. I don't
see where it's determining that the predicate locks can be safely
released. Couldn't the oldest transaction still have relevant predicate
locks?

* In RegisterSerializableTransactionInt, if it doesn't get an sxact, it
goes into summarization. But summarization assumes that it has at least
one finished xact. Is that always true? If you have enough memory to
hold a transaction for each connection, plus max_prepared_xacts, plus
one, I think that's true. But maybe that could be made more clear?

I'll keep working on this patch. I hope I can be of some help getting
this committed, because I'm looking forward to this feature. And I
certainly think that you and Dan have applied the amount of planning,
documentation, and careful implementation necessary for a feature like
this.

Regards,
Jeff Davis

Dan Ports

drkp@csail.mit.edu

over 15 years ago

In reply to: Jeff Davis (#2)

Re: SSI patch version 14

Thanks for working your way through this patch. I'm certainly well
aware that that's not a trivial task!

I'm suffering through a bout of insomnia, so I'll respond to some of
your high-level comments in hopes that serializability will help put me
to sleep (as it often does). I'll leave the more detailed code comments
for later when I'm actually looking at the code, or better yet Kevin
will take care of them and I won't have to. ;-)

On Tue, Jan 25, 2011 at 01:07:39AM -0800, Jeff Davis wrote:

At a high level, there is a nice conceptual simplicity. Let me try to
summarize it as I understand it:
* RW dependencies are detected using predicate locking.
* RW dependencies are tracked from the reading transaction (as an
"out") conflict; and from the writing transaction (as an "in"
conflict).
* Before committing a transaction, then by looking only at the RW
dependencies (and predicate locks) for current and past
transactions, you can determine if committing this transaction will
result in a cycle (and therefore a serialization anomaly); and if
so, abort it.

This summary is right on. I would add one additional detail or
clarification to the last point, which is that rather than checking for
a cycle, we're checking for a transaction with both "in" and "out"
conflicts, which every cycle must contain.

That's where the simplicity ends, however ;)

Indeed!

Tracking of RW conflicts of current and past transactions is more
complex. Obviously, it doesn't keep _all_ past transactions, but only
ones that overlap with a currently-running transaction. It does all of
this management using SHMQueue. There isn't much of an attempt to
gracefully handle OOM here as far as I can tell, it just throws an error
if there's not enough room to track a new transaction (which is
reasonable, considering that it should be quite rare and can be
mitigated by increasing max_connections).

If the OOM condition you're referring to is the same one from the
following comment, then it can't happen: (Apologies if I've
misunderstood what you're referring to.)

* In RegisterSerializableTransactionInt, if it doesn't get an sxact, it
goes into summarization. But summarization assumes that it has at least
one finished xact. Is that always true? If you have enough memory to
hold a transaction for each connection, plus max_prepared_xacts, plus
one, I think that's true. But maybe that could be made more clear?

Yes -- the SerializableXact pool is allocated up front and it
definitely has to be bigger than the number of possible active
transactions. In fact, it's much larger: 10 * (MaxBackends +
max_prepared_xacts) to allow some room for the committed transactions
we still have to track.

* In RegisterSerializableTransactionInt(), for a RO xact, it considers
any concurrent RW xact a possible conflict. It seems like it would be
possible to know whether a RW transaction may have overlapped with any
committed RW transaction (in finishedLink queue), and only add those as
potential conflicts. Would that work? If so, that would make more
snapshots safe.

Interesting idea. That's worth some careful thought. I think it's
related to the condition that the RW xact needs to commit with a
conflict out to a transaction earlier than the RO xact. My first guess
is that this wouldn't make more transactions safe, but could detect
safe snapshots faster.

* When a transaction finishes, then PID should probably be set to zero.
You only use it for waking up a deferrable RO xact waiting for a
snapshot, right?

Correct. It probably wouldn't hurt to clear that field when releasing
the transaction, but we don't use it after.

* I'm a little unclear on summarization and writing to the SLRU. I don't
see where it's determining that the predicate locks can be safely
released. Couldn't the oldest transaction still have relevant predicate
locks?

When a SerializableXact gets summarized to the SLRU, its predicate
locks aren't released; they're transferred to the dummy
OldCommittedSxact.

I'll keep working on this patch. I hope I can be of some help getting
this committed, because I'm looking forward to this feature. And I
certainly think that you and Dan have applied the amount of planning,
documentation, and careful implementation necessary for a feature like
this.

Hopefully my comments here will help clarify the patch. It's not lost
on me that there's no shortage of complexity in the patch, so if you
found anything particularly confusing we should probably add some
documentation to README-SSI.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/

SSI patch version 14

Attachments: