SSI patch version 14

Started by Kevin Grittnerabout 15 years ago119 messageshackers
Jump to latest
#1Kevin Grittner
Kevin.Grittner@wicourts.gov

Attached.

Dan and I have spent many, many hours desk-check and testing,
including pounding for many hours in DBT-2 at high concurrency
factors on a 16 processor box. In doing so, we found and fixed a few
issues. Neither of us is aware of any bugs or deficiencies
remaining, even after a solid day of pounding in DBT-2, other than
the failure to extend any new functionality to hot standby.

At Heikki's suggestion I have included a test to throw an error on an
attempt to switch to serializable mode during recovery. Anything
more to address that issue can be a small follow-up patch -- probably
mainly a few notes in the docs.

-Kevin

Attachments:

ssi-14.patch.gzapplication/octet-stream; name=ssi-14.patch.gzDownload+1-2
#2Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#1)
Re: SSI patch version 14

On Mon, 2011-01-24 at 21:30 -0600, Kevin Grittner wrote:

Attached.

Dan and I have spent many, many hours desk-check and testing,
including pounding for many hours in DBT-2 at high concurrency
factors on a 16 processor box. In doing so, we found and fixed a few
issues. Neither of us is aware of any bugs or deficiencies
remaining, even after a solid day of pounding in DBT-2, other than
the failure to extend any new functionality to hot standby.

At Heikki's suggestion I have included a test to throw an error on an
attempt to switch to serializable mode during recovery. Anything
more to address that issue can be a small follow-up patch -- probably
mainly a few notes in the docs.

Here is my first crack at a review:

First of all, I am very impressed with the patch. I was pleased to see
that both READ ONLY DEFERRABLE and 2PC work! Also, I found the patch
very usable and did not encounter any bugs or big surprises.

Second, I do not understand this patch entirely, so the following
statements can be considered questions as much as answers.

At a high level, there is a nice conceptual simplicity. Let me try to
summarize it as I understand it:
* RW dependencies are detected using predicate locking.
* RW dependencies are tracked from the reading transaction (as an
"out") conflict; and from the writing transaction (as an "in"
conflict).
* Before committing a transaction, then by looking only at the RW
dependencies (and predicate locks) for current and past
transactions, you can determine if committing this transaction will
result in a cycle (and therefore a serialization anomaly); and if
so, abort it.

That's where the simplicity ends, however ;)

For starters, the above structures require unlimited memory, while we
have fixed amounts of memory. The predicate locks are referenced by a
global shared hash table as well as per-process SHMQueues. It can adapt
memory usage downward in three ways:
* increase lock granularity -- e.g. change N page locks into a
table lock
* "summarization" -- fold multiple locks on the same object across
many old committed transactions into a single lock
* use the SLRU

Tracking of RW conflicts of current and past transactions is more
complex. Obviously, it doesn't keep _all_ past transactions, but only
ones that overlap with a currently-running transaction. It does all of
this management using SHMQueue. There isn't much of an attempt to
gracefully handle OOM here as far as I can tell, it just throws an error
if there's not enough room to track a new transaction (which is
reasonable, considering that it should be quite rare and can be
mitigated by increasing max_connections).

The heavy use of SHMQueue is quite reasonable, but for some reason I
find the API very difficult to read. I think it would help (at least me)
quite a lot to annotate (i.e. with a comment in the struct) the various
lists and links with the types of data being held.

The actual checking of conflicts isn't quite so simple, either, because
we want to detect problems before the victim transaction has done too
much work. So, checking is done along the way in several places, and
it's a little difficult to follow exactly how those smaller checks add
up to the overall serialization-anomaly check (the third point in my
simple model).

There are also optimizations around transactions declared READ ONLY.
Some of these are a little difficult to follow as well, and I haven't
followed them all.

There is READ ONLY DEFERRABLE, which is a nice feature that waits for a
"safe" snapshot, so that the transaction will never be aborted.

Now, on to my code comments (non-exhaustive):

* I see that you use a union as well as using "outLinks" to also be a
free list. I suppose you did this to conserve shared memory space,
right?

* In RegisterSerializableTransactionInt(), for a RO xact, it considers
any concurrent RW xact a possible conflict. It seems like it would be
possible to know whether a RW transaction may have overlapped with any
committed RW transaction (in finishedLink queue), and only add those as
potential conflicts. Would that work? If so, that would make more
snapshots safe.

* When a transaction finishes, then PID should probably be set to zero.
You only use it for waking up a deferrable RO xact waiting for a
snapshot, right?

* Still some compiler warnings:
twophase.c: In function ‘FinishPreparedTransaction’:
twophase.c:1360: warning: implicit declaration of function
‘PredicateLockTwoPhaseFinish’

* You have a function called CreatePredTran. We are already using
"Transaction" and "Xact" -- we probably don't need "Tran" as well.

* HS error has a typo:
"ERROR: cannot set transaction isolationn level to serializable during
recovery"

* I'm a little unclear on summarization and writing to the SLRU. I don't
see where it's determining that the predicate locks can be safely
released. Couldn't the oldest transaction still have relevant predicate
locks?

* In RegisterSerializableTransactionInt, if it doesn't get an sxact, it
goes into summarization. But summarization assumes that it has at least
one finished xact. Is that always true? If you have enough memory to
hold a transaction for each connection, plus max_prepared_xacts, plus
one, I think that's true. But maybe that could be made more clear?

I'll keep working on this patch. I hope I can be of some help getting
this committed, because I'm looking forward to this feature. And I
certainly think that you and Dan have applied the amount of planning,
documentation, and careful implementation necessary for a feature like
this.

Regards,
Jeff Davis

#3Dan Ports
drkp@csail.mit.edu
In reply to: Jeff Davis (#2)
Re: SSI patch version 14

Thanks for working your way through this patch. I'm certainly well
aware that that's not a trivial task!

I'm suffering through a bout of insomnia, so I'll respond to some of
your high-level comments in hopes that serializability will help put me
to sleep (as it often does). I'll leave the more detailed code comments
for later when I'm actually looking at the code, or better yet Kevin
will take care of them and I won't have to. ;-)

On Tue, Jan 25, 2011 at 01:07:39AM -0800, Jeff Davis wrote:

At a high level, there is a nice conceptual simplicity. Let me try to
summarize it as I understand it:
* RW dependencies are detected using predicate locking.
* RW dependencies are tracked from the reading transaction (as an
"out") conflict; and from the writing transaction (as an "in"
conflict).
* Before committing a transaction, then by looking only at the RW
dependencies (and predicate locks) for current and past
transactions, you can determine if committing this transaction will
result in a cycle (and therefore a serialization anomaly); and if
so, abort it.

This summary is right on. I would add one additional detail or
clarification to the last point, which is that rather than checking for
a cycle, we're checking for a transaction with both "in" and "out"
conflicts, which every cycle must contain.

That's where the simplicity ends, however ;)

Indeed!

Tracking of RW conflicts of current and past transactions is more
complex. Obviously, it doesn't keep _all_ past transactions, but only
ones that overlap with a currently-running transaction. It does all of
this management using SHMQueue. There isn't much of an attempt to
gracefully handle OOM here as far as I can tell, it just throws an error
if there's not enough room to track a new transaction (which is
reasonable, considering that it should be quite rare and can be
mitigated by increasing max_connections).

If the OOM condition you're referring to is the same one from the
following comment, then it can't happen: (Apologies if I've
misunderstood what you're referring to.)

* In RegisterSerializableTransactionInt, if it doesn't get an sxact, it
goes into summarization. But summarization assumes that it has at least
one finished xact. Is that always true? If you have enough memory to
hold a transaction for each connection, plus max_prepared_xacts, plus
one, I think that's true. But maybe that could be made more clear?

Yes -- the SerializableXact pool is allocated up front and it
definitely has to be bigger than the number of possible active
transactions. In fact, it's much larger: 10 * (MaxBackends +
max_prepared_xacts) to allow some room for the committed transactions
we still have to track.

* In RegisterSerializableTransactionInt(), for a RO xact, it considers
any concurrent RW xact a possible conflict. It seems like it would be
possible to know whether a RW transaction may have overlapped with any
committed RW transaction (in finishedLink queue), and only add those as
potential conflicts. Would that work? If so, that would make more
snapshots safe.

Interesting idea. That's worth some careful thought. I think it's
related to the condition that the RW xact needs to commit with a
conflict out to a transaction earlier than the RO xact. My first guess
is that this wouldn't make more transactions safe, but could detect
safe snapshots faster.

* When a transaction finishes, then PID should probably be set to zero.
You only use it for waking up a deferrable RO xact waiting for a
snapshot, right?

Correct. It probably wouldn't hurt to clear that field when releasing
the transaction, but we don't use it after.

* I'm a little unclear on summarization and writing to the SLRU. I don't
see where it's determining that the predicate locks can be safely
released. Couldn't the oldest transaction still have relevant predicate
locks?

When a SerializableXact gets summarized to the SLRU, its predicate
locks aren't released; they're transferred to the dummy
OldCommittedSxact.

I'll keep working on this patch. I hope I can be of some help getting
this committed, because I'm looking forward to this feature. And I
certainly think that you and Dan have applied the amount of planning,
documentation, and careful implementation necessary for a feature like
this.

Hopefully my comments here will help clarify the patch. It's not lost
on me that there's no shortage of complexity in the patch, so if you
found anything particularly confusing we should probably add some
documentation to README-SSI.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/

#4Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Dan Ports (#3)
Re: SSI patch version 14

Thanks for the review, Jeff!

Dan Ports wrote:

On Tue, Jan 25, 2011 at 01:07:39AM -0800, Jeff Davis wrote:

At a high level, there is a nice conceptual simplicity. Let me
try to summarize it as I understand it:
* RW dependencies are detected using predicate locking.
* RW dependencies are tracked from the reading transaction (as an
"out") conflict; and from the writing transaction (as an "in"
conflict).
* Before committing a transaction, then by looking only at the RW
dependencies (and predicate locks) for current and past
transactions, you can determine if committing this transaction
will result in a cycle (and therefore a serialization anomaly);
and if so, abort it.

This summary is right on. I would add one additional detail or
clarification to the last point, which is that rather than
checking for a cycle, we're checking for a transaction with both
"in" and "out" conflicts, which every cycle must contain.

Yep. For the visual thinkers out there, the whole concept can be
understood by looking at the jpeg file that's in the Wiki page:

http://wiki.postgresql.org/images/e/eb/Serialization-Anomalies-in-Snapshot-Isolation.png

* In RegisterSerializableTransactionInt(), for a RO xact, it
considers any concurrent RW xact a possible conflict. It seems
like it would be possible to know whether a RW transaction may
have overlapped with any committed RW transaction (in
finishedLink queue), and only add those as potential conflicts.
Would that work? If so, that would make more snapshots safe.

Interesting idea. That's worth some careful thought. I think it's
related to the condition that the RW xact needs to commit with a
conflict out to a transaction earlier than the RO xact. My first
guess is that this wouldn't make more transactions safe, but could
detect safe snapshots faster.

Yes, that would work. It would lower one type of overhead a little
and allow RO transactions to be released from SSI tracking earlier.
The question is how to determine it without taking too much time
scanning the finished transaction list for every active read write
transaction every time you start a RO transaction. I don't think
that it's a trivial enough issue to consider for 9.1; it's certainly
one to put on the list to look at for 9.2.

* When a transaction finishes, then PID should probably be set to
zero. You only use it for waking up a deferrable RO xact waiting
for a snapshot, right?

Correct. It probably wouldn't hurt to clear that field when
releasing the transaction, but we don't use it after.

I thought they might show up in the pid column of pg_locks, but I
see they don't. Should they? If so, should we still see the pid
after a commit, for as long as the lock survives?

-Kevin

#5Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#2)
Re: SSI patch version 14

Jeff Davis <pgsql@j-davis.com> wrote:

For starters, the above structures require unlimited memory, while
we have fixed amounts of memory. The predicate locks are
referenced by a global shared hash table as well as per-process
SHMQueues. It can adapt memory usage downward in three ways:
* increase lock granularity -- e.g. change N page locks into a
table lock
* "summarization" -- fold multiple locks on the same object
across many old committed transactions into a single lock
* use the SLRU

Those last two are related -- the summarization process takes the
oldest committed-but-still-significant transaction and does two
things with it:

(1) We move predicate locks to the "dummy" transaction, kept just
for this purpose. We combine locks against the same lock target,
using the more recent commit point to determine when the resulting
lock can be removed.

(2) We use SLRU to keep track of the top level xid of the old
committed transaction, and the earliest commit point of any
transactions to which it had an out-conflict.

The above reduces the information available to avoid serialization
failure in certain corner cases, and is more expensive to access
than the other structures, but it keeps us running in pessimal
circumstances, even if at a degraded level of performance.

The heavy use of SHMQueue is quite reasonable, but for some reason
I find the API very difficult to read. I think it would help (at
least me) quite a lot to annotate (i.e. with a comment in the
struct) the various lists and links with the types of data being
held.

We've tried to comment enough, but when you have your head buried in
code you don't always recognize how mysterious something can look.
Can you suggest some particular places where more comments would be
helpful?

The actual checking of conflicts isn't quite so simple, either,
because we want to detect problems before the victim transaction
has done too much work. So, checking is done along the way in
several places, and it's a little difficult to follow exactly how
those smaller checks add up to the overall serialization-anomaly
check (the third point in my simple model).

There are also optimizations around transactions declared READ
ONLY. Some of these are a little difficult to follow as well, and
I haven't followed them all.

Yeah. After things were basically working correctly, in terms of
not allowing any anomalies, we still had a lot of false positives.
Checks around the order and timing of commits, as well as read-only
status, helped bring these down. The infamous receipting example
was my main guide here. There are 210 permutations in the way the
statements can be interleaved, of which only six can result in
anomalies. At first we only managed to commit a couple dozen
permutations. As we added code to cover optimizations for various
special cases the false positive rate dropped a little at a time,
until that test hit 204 commits, six rollbacks. Although, all the
tests in the dcheck target are useful -- if I made a mistake in
implementing an optimization there would sometimes be just one or
two of the other tests which would fail. Looking at which
permutations got it right and which didn't was a really good way to
figure out what I did wrong.

There is READ ONLY DEFERRABLE, which is a nice feature that waits
for a "safe" snapshot, so that the transaction will never be
aborted.

*And* will not contribute to causing any other transaction to be
rolled back, *and* dodges the overhead of predicate locking and
conflict checking. Glad you like it! ;-)

Now, on to my code comments (non-exhaustive):

* I see that you use a union as well as using "outLinks" to also
be a free list. I suppose you did this to conserve shared memory
space, right?

Yeah, we tried to conserve shared memory space where we could do so
without hurting performance. Some of it gets to be a little bit-
twiddly, but it seemed like a good idea at the time. Does any of it
seem like it creates a confusion factor which isn't worth it
compared to the shared memory savings?

* Still some compiler warnings:
twophase.c: In function *FinishPreparedTransaction*:
twophase.c:1360: warning: implicit declaration of function
*PredicateLockTwoPhaseFinish*

Ouch! That could cause bugs, since the implicit declaration didn't
actually *match* the actual definition. Don't know how we missed
that. I strongly recommend that anyone who wants to test 2PC with
the patch add this commit to it:

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=8e020d97bc7b8c72731688515b6d895f7e243e27

* You have a function called CreatePredTran. We are already using
"Transaction" and "Xact" -- we probably don't need "Tran" as well.

OK. Will rename if you like. Since that creates the PredTran
structure, I assume you want that renamed, too?

* HS error has a typo:
"ERROR: cannot set transaction isolationn level to serializable
during recovery"

Will fix.

I'll keep working on this patch. I hope I can be of some help
getting this committed, because I'm looking forward to this
feature. And I certainly think that you and Dan have applied the
amount of planning, documentation, and careful implementation
necessary for a feature like this.

Thanks much! This effort was driven, for my part, by the needs of
my employer; but I have to admit it was kinda fun to do some serious
coding on innovative ideas again. It's been a while. I'm ready to
kick back and party a bit when this gets done, though. ;-)

-Kevin

#6Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#5)
Re: SSI patch version 14

On Tue, 2011-01-25 at 11:17 -0600, Kevin Grittner wrote:

The heavy use of SHMQueue is quite reasonable, but for some reason
I find the API very difficult to read. I think it would help (at
least me) quite a lot to annotate (i.e. with a comment in the
struct) the various lists and links with the types of data being
held.

We've tried to comment enough, but when you have your head buried in
code you don't always recognize how mysterious something can look.
Can you suggest some particular places where more comments would be
helpful?

I think just annotating RWConflict.in/outLink and
PredTranList.available/activeList with the types of things they hold
would be a help.

Also, you say something about RWConflict and "when the structure is not
in use". Can you elaborate on that a bit?

I'll address the rest of your comments in a later email.

Regards,
Jeff Davis

#7Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#4)
Re: SSI patch version 14

On Tue, 2011-01-25 at 09:41 -0600, Kevin Grittner wrote:

Yep. For the visual thinkers out there, the whole concept can be
understood by looking at the jpeg file that's in the Wiki page:

http://wiki.postgresql.org/images/e/eb/Serialization-Anomalies-in-Snapshot-Isolation.png

Yes, that helped a lot throughout the review process. Good job keeping
it up-to-date!

Yes, that would work. It would lower one type of overhead a little
and allow RO transactions to be released from SSI tracking earlier.
The question is how to determine it without taking too much time
scanning the finished transaction list for every active read write
transaction every time you start a RO transaction. I don't think
that it's a trivial enough issue to consider for 9.1; it's certainly
one to put on the list to look at for 9.2.

It's OK to leave it to 9.2. But if it's a RO deferrable transaction,
it's just going to go to sleep in that case anyway; so why not look for
an opportunity to get a safe snapshot right away?

Regards,
Jeff Davis

#8Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#1)
Re: SSI patch version 14

On 25.01.2011 05:30, Kevin Grittner wrote:

Attached.

The readme says this:

+   4. PostgreSQL supports subtransactions -- an issue not mentioned
+in the papers.

But I don't see any mention anywhere else on how subtransactions are
handled. If a subtransaction aborts, are its predicate locks immediately
released?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#9Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#8)
Re: SSI patch version 14

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

On 25.01.2011 05:30, Kevin Grittner wrote:

The readme says this:

4. PostgreSQL supports subtransactions -- an issue not mentioned
in the papers.

But I don't see any mention anywhere else on how subtransactions
are handled. If a subtransaction aborts, are its predicate locks
immediately released?

No. Here's the reasoning. Within a top level transaction, you
might start a subtransaction, read some data, and then decide based
on what you read that the subtransaction should be rolled back. If
the decision as to what is part of the top level transaction can
depend on what is read in the subtransaction, predicate locks taken
by the subtransaction must survive rollback of the subtransaction.

Does that make sense to you? Is there somewhere you would like to
see that argument documented?

-Kevin

#10Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#6)
Re: SSI patch version 14

Jeff Davis <pgsql@j-davis.com> wrote:

I think just annotating RWConflict.in/outLink and
PredTranList.available/activeList with the types of things they
hold would be a help.

Also, you say something about RWConflict and "when the structure
is not in use". Can you elaborate on that a bit?

Please let me know whether this works for you:

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=325ec55e8c9e5179e2e16ff303af6afc1d6e732b

-Kevin

#11Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#7)
Re: SSI patch version 14

Jeff Davis <pgsql@j-davis.com> wrote:

It's OK to leave it to 9.2. But if it's a RO deferrable
transaction, it's just going to go to sleep in that case anyway;
so why not look for an opportunity to get a safe snapshot right
away?

If you're talking about doing this only for DEFERRABLE transactions
it *might* make sense for 9.1. I'd need to look at what's involved.
We make similar checks for all read only transactions, so they can
withdraw from SSI while running, if their snapshot *becomes* safe.
I don't think I'd want to consider messing with that code at this
point.

-Kevin

#12Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#9)
Re: SSI patch version 14

On 25.01.2011 22:53, Kevin Grittner wrote:

Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:

On 25.01.2011 05:30, Kevin Grittner wrote:

The readme says this:

4. PostgreSQL supports subtransactions -- an issue not mentioned
in the papers.

But I don't see any mention anywhere else on how subtransactions
are handled. If a subtransaction aborts, are its predicate locks
immediately released?

No. Here's the reasoning. Within a top level transaction, you
might start a subtransaction, read some data, and then decide based
on what you read that the subtransaction should be rolled back. If
the decision as to what is part of the top level transaction can
depend on what is read in the subtransaction, predicate locks taken
by the subtransaction must survive rollback of the subtransaction.

Does that make sense to you?

Yes, that's what I suspected. And I gather that all the data structures
in predicate.c work with top-level xids, not subxids. When looking at an
xid that comes from a tuple's xmin or xmax, for example, you always call
SubTransGetTopmostTransaction() before doing much else with it.

Is there somewhere you would like to
see that argument documented?

README-SSI .

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#13Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#1)
Re: SSI patch version 14

On Mon, 2011-01-24 at 21:30 -0600, Kevin Grittner wrote:

Dan and I have spent many, many hours desk-check and testing,
including pounding for many hours in DBT-2 at high concurrency
factors on a 16 processor box. In doing so, we found and fixed a few
issues. Neither of us is aware of any bugs or deficiencies
remaining, even after a solid day of pounding in DBT-2, other than
the failure to extend any new functionality to hot standby.

A couple of comments on this.

I looked at the patch to begin a review and immediately saw "dtest".
There's no docs to explain what it is, but a few comments fill me in a
little more. Can we document that please? And/or explain why its an
essential part of this commit? Could we keep it out of core, or if not,
just commit that part separately? I notice the code is currently
copyright someone else than PGDG.

Pounding for hours on 16 CPU box sounds good. What diagnostics or
instrumentation are included with the patch? How will we know whether
pounding for hours is actually touching all relevant parts of code? I've
done such things myself only to later realise I wasn't actually testing
the right piece of code.

When this runs in production, how will we know whether SSI is stuck or
is consuming too much memory? e.g. Is there a dynamic view e.g.
pg_prepared_xacts, or is there a log mode log_ssi_impact, etc??

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#14Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#13)
Re: SSI patch version 14

On Wed, 2011-01-26 at 11:36 +0000, Simon Riggs wrote:

Pounding for hours on 16 CPU box sounds good. What diagnostics or
instrumentation are included with the patch? How will we know whether
pounding for hours is actually touching all relevant parts of code? I've
done such things myself only to later realise I wasn't actually testing
the right piece of code.

An example of this is the XIDCACHE_DEBUG code used in procarray.c to
validate TransactionIdIsInProgress().

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#15Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#12)
Re: SSI patch version 14

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

On 25.01.2011 22:53, Kevin Grittner wrote:

Is there somewhere you would like to
see that argument documented?

README-SSI .

Done (borrowing some of your language).

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=470abc51c5626cf3c7cbd734b1944342973d0d47

Let me know if you think more is needed.

-Kevin

#16Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Simon Riggs (#14)
Re: SSI patch version 14

Simon Riggs <simon@2ndQuadrant.com> wrote:

On Wed, 2011-01-26 at 11:36 +0000, Simon Riggs wrote:

Pounding for hours on 16 CPU box sounds good. What diagnostics or
instrumentation are included with the patch? How will we know
whether pounding for hours is actually touching all relevant
parts of code? I've done such things myself only to later realise
I wasn't actually testing the right piece of code.

An example of this is the XIDCACHE_DEBUG code used in procarray.c
to validate TransactionIdIsInProgress().

It isn't exactly equivalent, but on a conceptually similar note some
of the hours of DBT-2 pounding were done with #ifdef statements to
force code into code paths which are normally rarely used. We left
one of them in the codebase with the #define commented out, although
I know that's not strictly necessary. (It does provide a convenient
place to put a comment about what it's for, though.)

In looking at it just now, I noticed that after trying it in a
couple different places what was left in the repository was not the
optimal version for code coverage. I've put this back to the
version which did a better job, for reasons described in the commit
comment:

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=8af1bc84318923ba0ec3d4413f374a3beb10bc70

Dan, did you have some others which should maybe be included?

I'm not sure I see any counts we could get from SSI which would be
useful beyond what we might get from a code coverage tool or
profiling, but I'm open to suggestions.

-Kevin

#17Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Simon Riggs (#13)
Re: SSI patch version 14

Simon Riggs <simon@2ndQuadrant.com> wrote:

I looked at the patch to begin a review and immediately saw
"dtest". There's no docs to explain what it is, but a few comments
fill me in a little more. Can we document that please? And/or
explain why its an essential part of this commit? Could we keep it
out of core, or if not, just commit that part separately? I notice
the code is currently copyright someone else than PGDG.

I'm including Markus on this reply, because he's the only one who
can address the copyright issue.

I will say that while the dcheck make target is not required for a
proper build, and the tests run for too long to consider including
this in the check target, I would not recommend that anybody hack on
SSI without regularly running these tests or some equivalent..

When I suggested breaking this out of the patch, everybody who spoke
up said not to do so. How the eventual committer commits it is of
course up to that person.

Pounding for hours on 16 CPU box sounds good. What diagnostics or
instrumentation are included with the patch? How will we know
whether pounding for hours is actually touching all relevant parts
of code? I've done such things myself only to later realise I
wasn't actually testing the right piece of code.

We've looked at distributions of failed transactions by ereport
statement. This confirms that we are indeed exercising the vast
majority of the code. See separate post for how we pushed execution
into the summarization path to test code related to that.

I do have some concern that the 2PC testing hasn't yet covered all
code paths. I don't see how the problem found by Jeff during review
could have survived thorough testing there.

When this runs in production, how will we know whether SSI is
stuck

Stuck? I'm not sure what you mean by that. Other than LW locking
(which I believe is always appropriately brief, with rules for order
of acquisition which should prevent deadlocks), the only blocking
introduced by SSI is when there is an explicit request for
DEFERRABLE READ ONLY transactions. Such a transaction may take a
while to start. Is that what you're talking about?

or is consuming too much memory?

Relevant shared memory is allocated at startup, with strategies for
living within that as suggested by Heikki and summarized in a recent
post by Jeff. It's theoretically possible to run out of certain
objects, in which case there is an ereport, but we haven't seen
anything like that since the mitigation and graceful degradation
changes were implemented.

e.g. Is there a dynamic view e.g. pg_prepared_xacts,

We show predicate locks in the pg_locks view with mode SIReadLock.

is there a log mode log_ssi_impact, etc??

Don't have that. What would you expect that to show?

-Kevin

#18Markus Wanner
markus@bluegap.ch
In reply to: Kevin Grittner (#17)
Re: SSI patch version 14

Kevin,

thanks for your heads-up.

On 01/26/2011 06:07 PM, Kevin Grittner wrote:

Simon Riggs <simon@2ndQuadrant.com> wrote:

I looked at the patch to begin a review and immediately saw
"dtest". There's no docs to explain what it is, but a few comments
fill me in a little more. Can we document that please? And/or
explain why its an essential part of this commit? Could we keep it
out of core, or if not, just commit that part separately? I notice
the code is currently copyright someone else than PGDG.

I'm including Markus on this reply, because he's the only one who
can address the copyright issue.

I certainly agree to change the copyright notice for my parts of the
code in Kevin's SSI to say "Copyright ... Postgres Global Development
Group".

However, it's also worth mentioning that 'make dcheck' requires my
dtester python package to be installed. See [1]dtester project site: http://www.bluegap.ch/projects/dtester/. I put that under the
Boost license, which seems very similar to the Postgres license.

When I suggested breaking this out of the patch, everybody who spoke
up said not to do so. How the eventual committer commits it is of
course up to that person.

If the committer decides not to commit the dtests, I'm happy to add
these dtests to the "official" postgres-dtests repository [2]postgres dtests: http://git.postgres-r.org/?p=postgres-dtests;a=summary. There I
could let it follow the development of dtester.

If Kevin's dtests gets committed, either dtester needs to be backwards
compatible or the Postgres sources need to follow development of
dtester, which I'm not satisfied with, yet. (However, note that I
didn't have any time to work on dtester, recently, so that is somewhat
hypothetical anyway).

If you have any more questions, I'm happy to help.

Regards

Markus Wanner

[1]: dtester project site: http://www.bluegap.ch/projects/dtester/
http://www.bluegap.ch/projects/dtester/

[2]: postgres dtests: http://git.postgres-r.org/?p=postgres-dtests;a=summary
http://git.postgres-r.org/?p=postgres-dtests;a=summary

#19Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#17)
Re: SSI patch version 14

On Wed, 2011-01-26 at 11:07 -0600, Kevin Grittner wrote:

When this runs in production, how will we know whether SSI is
stuck

Stuck? I'm not sure what you mean by that. Other than LW locking
(which I believe is always appropriately brief, with rules for order
of acquisition which should prevent deadlocks), the only blocking
introduced by SSI is when there is an explicit request for
DEFERRABLE READ ONLY transactions. Such a transaction may take a
while to start. Is that what you're talking about?

I'm thinking of a general requirement for diagnostics.

What has been done so far looks great to me, so talking about this
subject is in no way meant to be negative. Everything has bugs and we
find them quicker if there are some ways of getting out more information
about what is happening in the guts of the system.

For HS, I put in a special debug mode and various information functions.
For HOT, I produced a set of page inspection functions.

I'm keen to have some ways where we can find out various metrics about
what is happening, so we can report back to you to check if there are
bugs. I foresee that some applications will be more likely to generate
serialization errors than others. People will ask questions and they may
claim there are bugs. I would like to be able to say "there is no bug -
look at XYZ and see that the errors you are getting are because of ABC".

or is consuming too much memory?

Relevant shared memory is allocated at startup, with strategies for
living within that as suggested by Heikki and summarized in a recent
post by Jeff. It's theoretically possible to run out of certain
objects, in which case there is an ereport, but we haven't seen
anything like that since the mitigation and graceful degradation
changes were implemented.

e.g. Is there a dynamic view e.g. pg_prepared_xacts,

We show predicate locks in the pg_locks view with mode SIReadLock.

OK, that's good.

is there a log mode log_ssi_impact, etc??

Don't have that. What would you expect that to show?

-Kevin

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#20Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Kevin Grittner (#17)
Re: SSI patch version 14

Excerpts from Kevin Grittner's message of mié ene 26 14:07:18 -0300 2011:

Simon Riggs <simon@2ndQuadrant.com> wrote:

Pounding for hours on 16 CPU box sounds good. What diagnostics or
instrumentation are included with the patch? How will we know
whether pounding for hours is actually touching all relevant parts
of code? I've done such things myself only to later realise I
wasn't actually testing the right piece of code.

We've looked at distributions of failed transactions by ereport
statement. This confirms that we are indeed exercising the vast
majority of the code. See separate post for how we pushed execution
into the summarization path to test code related to that.

BTW did you try "make coverage" and friends? See
http://www.postgresql.org/docs/9.0/static/regress-coverage.html
and
http://developer.postgresql.org/~petere/coverage/

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#21Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Alvaro Herrera (#20)
#22Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#21)
#23Dan Ports
drkp@csail.mit.edu
In reply to: Kevin Grittner (#16)
#24Dan Ports
drkp@csail.mit.edu
In reply to: Kevin Grittner (#21)
#25Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Dan Ports (#23)
#26Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#25)
#27Dan Ports
drkp@csail.mit.edu
In reply to: Kevin Grittner (#25)
#28Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#1)
#29Jeff Davis
pgsql@j-davis.com
In reply to: Dan Ports (#3)
#30Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#29)
#31Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#30)
#32Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#10)
#33Dan Ports
drkp@csail.mit.edu
In reply to: Jeff Davis (#29)
#34Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Dan Ports (#33)
#35Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#1)
#36Dan Ports
drkp@csail.mit.edu
In reply to: Kevin Grittner (#34)
#37Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Dan Ports (#36)
#38Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#35)
#39Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#38)
#40Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#37)
#41Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#40)
#42Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#41)
#43Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#40)
#44Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#42)
#45Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#43)
#46Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#45)
#47Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#46)
#48Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#46)
#49Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#40)
#50Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#48)
#51Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#39)
#52Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#50)
#53Jeff Davis
pgsql@j-davis.com
In reply to: Heikki Linnakangas (#51)
#54Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#52)
#55Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#54)
#56Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#55)
#57Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#56)
#58Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#57)
#59Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#13)
#60Markus Wanner
markus@bluegap.ch
In reply to: Heikki Linnakangas (#59)
#61Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#59)
#62Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Markus Wanner (#60)
#63Markus Wanner
markus@bluegap.ch
In reply to: Heikki Linnakangas (#62)
#64Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Markus Wanner (#63)
#65Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#64)
#66Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#65)
#67Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#66)
#68Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#67)
#69Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#68)
#70Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#69)
#71Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#70)
#72Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#71)
#73Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#72)
#74Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#70)
#75Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#74)
#76Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#75)
#77Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#76)
#78Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#77)
#79Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#78)
#80Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#79)
#81Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#80)
#82Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#81)
#83Jeff Davis
pgsql@j-davis.com
In reply to: Kevin Grittner (#79)
#84Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Jeff Davis (#83)
#85Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#84)
#86Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#79)
#87Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#86)
#88Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#82)
#89Magnus Hagander
magnus@hagander.net
In reply to: Heikki Linnakangas (#88)
#90Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#89)
#91Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#88)
#92Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#91)
#93Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#92)
#94Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#92)
#95Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Bruce Momjian (#94)
#96Dan Ports
drkp@csail.mit.edu
In reply to: Heikki Linnakangas (#93)
#97Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Dan Ports (#96)
#98Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#95)
#99Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#97)
#100Dan Ports
drkp@csail.mit.edu
In reply to: Kevin Grittner (#98)
#101Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#98)
#102Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#101)
#103Dan Ports
drkp@csail.mit.edu
In reply to: Dan Ports (#100)
#104Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#101)
#105Dan Ports
drkp@csail.mit.edu
In reply to: Kevin Grittner (#104)
#106Robert Haas
robertmhaas@gmail.com
In reply to: Dan Ports (#105)
#107Dan Ports
drkp@csail.mit.edu
In reply to: Robert Haas (#106)
#108Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#104)
#109David Fetter
david@fetter.org
In reply to: Robert Haas (#106)
#110Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Heikki Linnakangas (#108)
#111Markus Wanner
markus@bluegap.ch
In reply to: David Fetter (#109)
#112Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Dan Ports (#105)
#113Robert Haas
robertmhaas@gmail.com
In reply to: Markus Wanner (#111)
#114A.M.
agentm@themactionfaction.com
In reply to: Robert Haas (#113)
#115Robert Haas
robertmhaas@gmail.com
In reply to: A.M. (#114)
#116Markus Wanner
markus@bluegap.ch
In reply to: Robert Haas (#113)
#117Robert Haas
robertmhaas@gmail.com
In reply to: Markus Wanner (#116)
#118Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kevin Grittner (#112)
#119Andrew Dunstan
andrew@dunslane.net
In reply to: Heikki Linnakangas (#118)