2-phase commit
Hi,
As the 7.4 beta rolls on, I thought now would be a good time to start
talking about the future.
I have a potential need in the future for distributed transactions
(XA). To get that from Postgres, I'd need two-phase commit, I think.
There is someone working on such a project
(<http://snaga.org/pgsql/>), but last time it was discussed here, it
received a rather lukewarm reception (see, e.g., the thread starting
at
<http://archives.postgresql.org/pgsql-hackers/2003-06/msg00752.php>).
While at OSCON, I had a discussion with Joe Conway, Bruce Momjian,
and Greg Sabino Mullane about 2PC. Various people expressed various
opinions on the topic, but I think we agreed on the following. The
relevant folks can correct me if I'm wrong:
Two-phase commit has theoretical problems, but it is implemented in
several "enterprise" RDBMS. 2PC is something needed by certain kinds
of clients (especially those with transaction managers), so if
PostgreSQL doesn't have it, PostgreSQL just won't get supported in
that arena. Someone is already working on 2PC, but may feel unwanted
due to the reactions last heard on the topic, and may not continue
working unless he gets some support. A necessary condition for such
support is to get some idea of what compromises 2PC might impose, and
then to determine which of those compromises, if any, are acceptable.
I think the idea here is that, while in most cases a "pretty-good"
implementation of a desirable feature might get included in the
source on the grounds that it can always be improved upon later,
something like 2PC has the potential to do great harm to an otherwise
reliable transaction manager. So the arguments about what to do need
to be aired in advance.
I (perhaps foolishly) volunteered to collect the arguments on all
sides, on the grounds that I can contribute no code but have skin made
of asbestos. I thought I'd try to
collect some information about what people think the problems and
potentially acceptable compromises are, to see if there is some way
to understand what can and cannot be contemplated for 2PC. I'll
include in any such outline the remarks found in the -hackers thread
referenced above. Any objections?
A
--
----
Andrew Sullivan 204-4141 Yonge Street
Liberty RMS Toronto, Ontario Canada
<andrew@libertyrms.info> M2P 2A8
+1 416 646 3304 x110
In an attempt to throw the authorities off his trail, andrew@libertyrms.info (Andrew Sullivan) transmitted:
As the 7.4 beta rolls on, I thought now would be a good time to start
talking about the future.
I have a potential need in the future for distributed transactions
(XA). To get that from Postgres, I'd need two-phase commit, I think.
There is someone working on such a project
(<http://snaga.org/pgsql/>), but last time it was discussed here, it
received a rather lukewarm reception (see, e.g., the thread starting
at
<http://archives.postgresql.org/pgsql-hackers/2003-06/msg00752.php>).
Interesting/positive news on this front; the XA specification
documents are now all available in PDF form "freely", from the Open
Group, where they used to be fairly pricey.
<http://www.opengroup.org/publications/catalog/tp.htm>
Another notable XA documentation source is here...
<http://www.middleware.net/tuxedo/resources/XA_Documentation.html>
One interesting implication of XA support is that it could create a
"congruence of interests" with two vendors:
- XA is essentially based on the API of BEA Tuxedo. I'm told they
include a simple database system with Tuxedo, but nothing particularly
wonderful. (Who thinks of BEA as a DBMS vendor???) They might have
interest in bundling something better...
- The main Tuxedo reseller that I am aware of is PeopleSoft, who use
it for their "high traffic" clients. Anyone that has seen news lately
knows that they and Oracle aren't exactly "best pals" these days;
having another DB option could be helpful to them...
--
(format nil "~S@~S" "aa454" "freenet.carleton.ca")
http://www3.sympatico.ca/cbbrowne/tpmonitor.html
"In order to make an apple pie from scratch, you must first create the
universe." -- Carl Sagan, Cosmos
(moving to advocacy)
Christopher Browne wrote:
- The main Tuxedo reseller that I am aware of is PeopleSoft, who use
it for their "high traffic" clients. Anyone that has seen news lately
knows that they and Oracle aren't exactly "best pals" these days;
having another DB option could be helpful to them...
That's an interesting observation, because I've long thought PeopleSoft
ought to support Postgres too. From what I recall, their database schema
is *very* database neutral (at least as of PSFT version 7.x) and fairly
simple (we ran it on MSSQL 6.5). It would probably be pretty easily
ported to run on Postgres.
I wonder how we could get them to consider it...
Joe
On Tue, Aug 26, 2003 at 08:04:13PM -0400, Christopher Browne wrote:
Interesting/positive news on this front; the XA specification
documents are now all available in PDF form "freely", from the Open
Group, where they used to be fairly pricey.
A step in the right direction, but AFAIC it's too little, too late.
The impression I get, at least, is that it's as good as dead now: Java
may use it, but it hides the details anyway so it might as well not be
there--the Java way is to standardize the API but nothing that goes "on
the wire".
Lots of proprietary middleware uses XA, but from what I hear there are
enough subtle differences to make mixing-and-matching of products risky
at best--the proprietary way is to bundle products that will work at
least marginally together, and relegate standards to a bullshit point
in the PowerPoint presentations. "Based on industry standard" means
about the same as "based on a true story."
Then there's the fact that the necessary followup standards never got
anywhere, and the fact that XA doesn't cope with threading really well.
Don't get me wrong, XA support may well be a good thing. But at this
stage, personally I'd go for a good 2PC implementation first and worry
about supporting XA later.
Jeroen
Joe Conway wrote:
(moving to advocacy)
Christopher Browne wrote:
- The main Tuxedo reseller that I am aware of is PeopleSoft, who use
it for their "high traffic" clients. Anyone that has seen news lately
knows that they and Oracle aren't exactly "best pals" these days;
having another DB option could be helpful to them...
That's an interesting observation, because I've long thought PeopleSoft
ought to support Postgres too. From what I recall, their database schema
is *very* database neutral (at least as of PSFT version 7.x) and fairly
simple (we ran it on MSSQL 6.5). It would probably be pretty easily
ported to run on Postgres.
I wonder how we could get them to consider it...
Not a bad suggestion. I just went to their site and submitted a quick
brief of benefits/etc via their "Partner Proposal" page:
http://checkers.peoplesoft.com/allconn/ppp.nsf/PPP?OpenForm&Seq=2#_RefreshKW_type
I'm hoping they are read by People With A Clue, and that they in turn
will pass it on to the right group internally.
Worth a shot I guess.
:-)
Regards and best wishes,
Justin Clift
After a long battle with technology, mail@joeconway.com (Joe Conway), an earthling, wrote:
(moving to advocacy)
Christopher Browne wrote:
- The main Tuxedo reseller that I am aware of is PeopleSoft, who use
it for their "high traffic" clients. Anyone that has seen news lately
knows that they and Oracle aren't exactly "best pals" these days;
having another DB option could be helpful to them...
That's an interesting observation, because I've long thought
PeopleSoft ought to support Postgres too. From what I recall, their
database schema is *very* database neutral (at least as of PSFT
version 7.x) and fairly simple (we ran it on MSSQL 6.5). It would
probably be pretty easily ported to run on Postgres.
I wonder how we could get them to consider it...
XA support so that it would "play well" with Tuxedo would be the best
thing I can think of. Arguing that they _should_ consider PostgreSQL
when it doesn't support their "scalability extender" wouldn't seem
likely to me to sell well.
That's _exactly_ why I mentioned both products; congruence of
interests...
--
select 'cbbrowne' || '@' || 'acm.org';
http://cbbrowne.com/info/internet.html
Consciousness - that annoying time between naps.
After a long battle with technology, justin@postgresql.org (Justin Clift), an earthling, wrote:
Worth a shot I guess.
:-)
I'd think that they would take the idea more seriously if PostgreSQL
supported XA and thereby was compatible with Tuxedo. But it probably
doesn't hurt for them to hear the idea multiple times...
--
let name="aa454" and tld="freenet.carleton.ca" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/finances.html
Whatever you do don't mail me at pink-and-wobbly@asdkjlwelkj.com,
because then I'll know you're just an address-harvester, and blacklist
your IP until the end of time
On Wed, Aug 27, 2003 at 22:46:58 -0700,
Joe Conway <mail@joeconway.com> wrote:
That's an interesting observation, because I've long thought PeopleSoft
ought to support Postgres too. From what I recall, their database schema
is *very* database neutral (at least as of PSFT version 7.x) and fairly
simple (we ran it on MSSQL 6.5). It would probably be pretty easily
ported to run on Postgres.
In my opinion it is too database agnostic. They pretty much just use the
DB as a file. From what I have seen of the system it is one big hack.
Their trusted client security model is ridiculous. Fortunately, in
version 8 you don't have to give 2-tier access to anyone except
developer types. (Anyone with 2-tier access owns the system.) I really don't
even trust 3 tier access, because I believe that a fair amount of
security is enforced by the client rather than the app server.
It was annoying that the set of characters usable for passwords in 7.6
was restricted (and the restriction presumably still applies to the
connect ID in 8), because they didn't want to quote the password string,
which is what would have allowed special characters in it.
They aren't big on using referential integrity to keep the data clean.
Bruno Wolff III wrote:
On Wed, Aug 27, 2003 at 22:46:58 -0700,
Joe Conway <mail@joeconway.com> wrote:
That's an interesting observation, because I've long thought PeopleSoft
ought to support Postgres too. From what I recall, their database schema
is *very* database neutral (at least as of PSFT version 7.x) and fairly
simple (we ran it on MSSQL 6.5). It would probably be pretty easily
ported to run on Postgres.
In my opinion it is too database agnostic. They pretty much just use the
DB as a file. From what I have seen of the system it is one big hack.
Yeah, I didn't say I *liked* their schema, just that I thought it would
be easy for them to support Postgres ;-)
Like it or not, they are one of the larger ERP/CRM players (after the
merger with J.D. Edwards, they will be *ahead* of Oracle, second only to
SAP), and having them offer PostgreSQL support would be significant. If
the XA/Tuxedo thing is an issue, they could position it for mid-tier
customers who don't need the transaction manager anyway.
Joe
Oops! bruno@wolff.to (Bruno Wolff III) was seen spray-painting on a wall:
On Wed, Aug 27, 2003 at 22:46:58 -0700,
Joe Conway <mail@joeconway.com> wrote:
That's an interesting observation, because I've long thought
PeopleSoft ought to support Postgres too. From what I recall, their
database schema is *very* database neutral (at least as of PSFT
version 7.x) and fairly simple (we ran it on MSSQL 6.5). It would
probably be pretty easily ported to run on Postgres.
In my opinion it is too database agnostic. They pretty much just use
the DB as a file. From what I have seen of the system it is one big
hack.
Ah, so it's like the way SAP R/3's HR module works. (I expect I'm the
only one around who is more than passingly familiar with "cluster
tables"; quite supremely nonrelational stuff, and quite
bletcherous...)
To a great extent this comes from the nature of the application. HR
is all about collecting together "documents," and these applications
replace "paper" with "pseudopaper."
They aren't big on using referential integrity to keep the data
clean.
Ditto for SAP R/3; "cleanliness" is, there, imposed by only using
their applications to do updates, which includes writing your software
to invoke their functions.
--
(reverse (concatenate 'string "gro.mca" "@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/linuxxian.html
ASSEMBLER is a language. Any language that can take a half-dozen
keystrokes and compile it down to one byte of code is all right in my
books. Though for the REAL programmer, assembler is a waste of
time. Why use a compiler when you can code directly into memory
through a front panel.
I haven't seen any comment on Andrew Sullivan's 2PC email.
From our previous discussion of 2-phase commit, there was concern that
the failure modes of 2-phase commit were not solvable. However, I think
multi-master replication is going to have similar non-solvable failure
modes, yet people still want multi-master replication.
We have had several requests for 2-phase commit in the past month. I
think we should encourage the Japanese group to continue on their
2-phase commit patch to be included in 7.5. Yes, it will have
non-solvable failure modes, but let's discuss them and find an
appropriate way to deal with the failures.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian wrote:
I haven't seen any comment on this email.
From our previous discussion of 2-phase commit, there was concern that
the failure modes of 2-phase commit were not solvable. However, I think
multi-master replication is going to have similar non-solvable failure
modes, yet people still want multi-master replication.
We have had several requests for 2-phase commit in the past month. I
think we should encourage the Japanese group to continue on their
2-phase commit patch to be included in 7.5. Yes, it will have
non-solvable failure modes, but let's discuss them and find an
appropriate way to deal with the failures.
FWIW, Oracle 8's manual for the recovery of a distributed tx where the
coordinator never comes back on line is:
"If a database must be recovered to a point in the past, Oracle's
recovery facilities allow database administrators at other sites to
return their databases to the earlier point in time also. This ensures
that the global database remains consistent."
So it seems, for Oracle 8 at least, PITR (point-in-time recovery) is
the method of recovery for
cohorts after unrecoverable coordinator failure.
Ugly and yet probably a prerequisite.
Mike Mascari
mascarm@mascari.com
Mike Mascari wrote:
Bruce Momjian wrote:
I haven't seen any comment on this email.
From our previous discussion of 2-phase commit, there was concern that
the failure modes of 2-phase commit were not solvable. However, I think
multi-master replication is going to have similar non-solvable failure
modes, yet people still want multi-master replication.
We have had several requests for 2-phase commit in the past month. I
think we should encourage the Japanese group to continue on their
2-phase commit patch to be included in 7.5. Yes, it will have
non-solvable failure modes, but let's discuss them and find an
appropriate way to deal with the failures.
FWIW, Oracle 8's manual for the recovery of a distributed tx where the
coordinator never comes back on line is:
"If a database must be recovered to a point in the past, Oracle's
recovery facilities allow database administrators at other sites to
return their databases to the earlier point in time also. This ensures
that the global database remains consistent."
So it seems, for Oracle 8 at least, PITR is the method of recovery for
cohorts after unrecoverable coordinator failure.
Yep, I assume PITR would be the solution for most failure cases --- very
ugly of course.
--
Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes:
From our previous discussion of 2-phase commit, there was concern that
the failure modes of 2-phase commit were not solvable. However, I think
multi-master replication is going to have similar non-solvable failure
modes, yet people still want multi-master replication.
No. The real problem with 2PC in my mind is that its failure modes
occur *after* you have promised commit to one or more parties. In
multi-master, if you fail you know it before you have told the client
his data is committed.
regards, tom lane
On Tue, Sep 09, 2003 at 08:38:41PM -0400, Bruce Momjian wrote:
Yep, I assume PITR would be the solution for most failure cases --- very
ugly of course.
Anything can be broken in some way, if bad luck is willing to work hard
enough. In at least one, ah, competing company I know of, employees are
allowed by the legal people to say "assured" but not "guaranteed" for
precisely this reason.
The first thing is to settle on an acceptable failure mode; then you try
to reduce its chances of occurring. And if worst comes to worst, one example of an
acceptable failure mode is "when in danger or doubt, run in circles,
scream and shout."
Jeroen
From our previous discussion of 2-phase commit, there was concern that
the failure modes of 2-phase commit were not solvable. However, I think
multi-master replication is going to have similar non-solvable failure
modes, yet people still want multi-master replication.
No. The real problem with 2PC in my mind is that its failure modes
occur *after* you have promised commit to one or more parties. In
multi-master, if you fail you know it before you have told the client
his data is committed.
Hmm? The application cannot take the first-phase commit as its commit
info. It needs to wait for the second-phase commit, and the second phase
is only finished when all coservers have reported back. 2PC is synchronous.
The problems with 2PC arise when, after the second-phase commit has been
sent to all servers and before all of them report back, one of them
becomes unreachable or goes down (did it receive and perform the second
commit or not?). Such a transaction must stay open until the coserver is
reachable again or an administrator commits or aborts it.
It is multi-master replication that usually has an asynchronous mode for
performance, and that is where the trouble starts.
Andreas
Zeugswetter Andreas SB SD wrote:
From our previous discussion of 2-phase commit, there was concern that
the failure modes of 2-phase commit were not solvable. However, I think
multi-master replication is going to have similar non-solvable failure
modes, yet people still want multi-master replication.
No. The real problem with 2PC in my mind is that its failure modes
occur *after* you have promised commit to one or more parties. In
multi-master, if you fail you know it before you have told the client
his data is committed.
Hmm ? The appl cannot take the first phase commit as its commit info. It
needs to wait for the second phase commit. The second phase is only finished
when all coservers have reported back. 2PC is synchronous.
The problems with 2PC are when after second phase commit was sent to all
servers and before all report back one of them becomes unreachable/down ...
(did it receive and do the 2nd commit or not) Such a transaction must stay
open until the coserver is reachable again or an administrator committed/aborted it.
It is multi master replication that usually has an asynchronous mode for
performance, and there the trouble starts.
Let me diagram this so we can see the issues. Normal operation is:
    Master              Slave
    ------              -----
    commit ready-->
                        <--OK
    commit done--->
                        <--OK
    completed
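The normal-operation flow just diagrammed can be sketched as a toy, single-process coordinator. This is a minimal illustration under stated assumptions, not the design of the patch under discussion: `Participant`, `prepare`, `finish` and `two_phase_commit` are hypothetical names, there is no network, logging or crash recovery, and "commit ready" / "commit done" are modelled as the `prepare()` / `finish()` calls.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical in-memory stand-in for a "slave" in the diagram."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> Vote:
        # Phase 1 ("commit ready"): a YES vote is a promise to commit later.
        if self.can_commit:
            self.state = "prepared"
            return Vote.YES
        self.state = "aborted"
        return Vote.NO

    def finish(self, commit: bool) -> str:
        # Phase 2 ("commit done"): apply the decision, then acknowledge.
        if self.state == "prepared":
            self.state = "committed" if commit else "aborted"
        return "OK"


def two_phase_commit(participants: list[Participant]) -> str:
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    commit = all(v is Vote.YES for v in votes)
    # Phase 2: send the decision to everyone and wait for every "OK".
    acks = [p.finish(commit) for p in participants]
    if any(a != "OK" for a in acks):
        raise RuntimeError("missing acknowledgement")
    return "committed" if commit else "aborted"
```

With `[Participant("a"), Participant("b")]` the function returns `"committed"`; if either participant is created with `can_commit=False`, every participant ends up `"aborted"`. Everything interesting in the failure cases below happens between the two list comprehensions, which this sketch assumes never fails.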
One possible failure is:
    Master              Slave
    ------              -----
    commit ready-->
                        <--OK
    commit done--->
                        dies here
    stuck waiting
Another possible failure is:
    Master              Slave
    ------              -----
    commit ready-->
                        <--OK
    dies here
                        stuck waiting
Are these the issues? Can't we just add GUC timeouts to cause the
commit to fail, and the slave to stop waiting? I suppose a problem is:
    Master              Slave
    ------              -----
    commit ready-->
                        <--OK
    sleep
                        stuck waiting, times out
    commit done
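Why a participant-side timeout is dangerous in the "sleep" case can be shown with a deliberately tiny model. Assumptions: the coordinator has already received every YES vote and decided to commit; `decision_delay_s` and `participant_timeout_s` are made-up parameters, not actual GUC variables; and a participant that times out aborts on its own, as the proposed timeout would have it do.

```python
def simulate(decision_delay_s: float, participant_timeout_s: float) -> dict[str, str]:
    """Return the final outcome at each site for the "sleep" scenario.

    The coordinator has collected all YES votes, so its decision is commit,
    but the "commit done" message only arrives after decision_delay_s.
    """
    outcome = {"master": "committed"}
    if decision_delay_s > participant_timeout_s:
        # The slave gave up waiting and aborted unilaterally, even though it
        # had already promised (by voting YES) that it could commit.
        outcome["slave"] = "aborted"
    else:
        outcome["slave"] = "committed"
    return outcome


def is_atomic(outcome: dict[str, str]) -> bool:
    # Atomicity means every site reached the same outcome.
    return len(set(outcome.values())) == 1
```

`simulate(1.0, 30.0)` is atomic, but `simulate(60.0, 30.0)` leaves the master committed and the slave aborted, which is exactly the split outcome 2PC exists to prevent. Since the coordinator's delay is not bounded, any finite timeout can be exceeded.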
Could we allow slaves to check if the backend is still alive, perhaps by
asking the postmaster, similar to what we do with the cancel signal ---
that way, the slave would never time out and always wait if the master
was alive.
--
Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Could we allow slaves to check if the backend is still alive, perhaps by
asking the postmaster, similar to what we do with the cancel signal ---
that way, the slave would never time out and always wait if the master
was alive.
You're not considering the possibility of a transient communication
failure. The fact that you cannot currently contact the other guy
is not proof that he's not still alive.
Example:
    Master              Slave
    ------              -----
    commit ready-->
                        <--OK
    commit done->XX
where "->XX" means the message gets lost due to network failure. Now
what? The slave cannot abort; he promised he could commit, and he does
not know whether the master has committed or not. The master does not
know the slave's state either; maybe he got the second message, and
maybe he didn't. Both sides are forced to keep information about the
open transaction indefinitely. Timing out on either side could yield
the wrong result.
regards, tom lane
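Tom's lost-message case can be expressed as a participant-side state machine. This is a sketch with hypothetical names (`InDoubt`, `PreparedParticipant`, `on_timeout`, `on_decision`); the point it encodes is his: once a participant has voted YES, a timeout leaves it no safe unilateral move, so it must keep the transaction open until the decision reaches it (or an administrator resolves it).

```python
class InDoubt(Exception):
    """Raised when a prepared participant is asked to act without a decision."""


class PreparedParticipant:
    def __init__(self) -> None:
        # The participant has answered "OK" to "commit ready": it voted YES.
        self.state = "prepared"

    def on_timeout(self) -> None:
        if self.state == "prepared":
            # Aborting would break the promise to commit; committing would be a
            # guess, since the coordinator may have decided to abort. Either way
            # the transaction's state must be kept.
            raise InDoubt("decision unknown; transaction must stay open")

    def on_decision(self, commit: bool) -> str:
        # Only the coordinator's decision (or an administrator acting in its
        # place) may move a prepared transaction to a final state.
        if self.state == "prepared":
            self.state = "committed" if commit else "aborted"
        return "OK"
```

A timeout on a prepared participant raises `InDoubt` and leaves `state` as `"prepared"`; only after `on_decision(...)` does a later timeout become harmless.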
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Could we allow slaves to check if the backend is still alive, perhaps by
asking the postmaster, similar to what we do with the cancel signal ---
that way, the slave would never time out and always wait if the master
was alive.
You're not considering the possibility of a transient communication
failure. The fact that you cannot currently contact the other guy
is not proof that he's not still alive.
Example:
    Master              Slave
    ------              -----
    commit ready-->
                        <--OK
    commit done->XX
where "->XX" means the message gets lost due to network failure. Now
what? The slave cannot abort; he promised he could commit, and he does
not know whether the master has committed or not. The master does not
know the slave's state either; maybe he got the second message, and
maybe he didn't. Both sides are forced to keep information about the
open transaction indefinitely. Timing out on either side could yield
the wrong result.
Can't the master re-send the request after a timeout?
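Re-sending does help with lost messages, provided two things hold, which this sketch assumes rather than takes from any existing patch: the coordinator records its decision durably before phase two (a plain dict stands in for that log here), and a participant treats a repeated decision as a no-op that it simply acknowledges again. `losses` is a made-up script of what the network drops on each attempt. Running out of attempts still leaves the transaction in doubt; resending narrows the window but does not close it.

```python
class Participant:
    def __init__(self) -> None:
        self.state = "prepared"  # has voted YES and is waiting

    def receive_decision(self, commit: bool) -> str:
        target = "committed" if commit else "aborted"
        if self.state == "prepared":
            self.state = target
        elif self.state != target:
            raise RuntimeError("conflicting decision for the same transaction")
        # Applying the same decision twice is harmless: just acknowledge again.
        return "OK"


def deliver_decision(participant: Participant, commit: bool,
                     losses: list[str], max_attempts: int) -> int:
    """Resend the decision until acknowledged; return the attempts used.

    losses[i] is "request" (decision lost), "ack" (reply lost) or "" (no loss)
    for attempt i; attempts beyond the end of the list see no loss.
    """
    decision_log = {"commit": commit}  # stand-in for a durable log record
    for attempt in range(max_attempts):
        lost = losses[attempt] if attempt < len(losses) else ""
        if lost == "request":
            continue  # participant never saw it; time out and resend
        ack = participant.receive_decision(decision_log["commit"])
        if lost == "ack":
            continue  # participant applied it, but the reply never arrived
        if ack == "OK":
            return attempt + 1
    raise TimeoutError("no acknowledgement: transaction is still in doubt")
```

With `losses=["request", "ack", ""]` the decision is delivered on the third attempt, and the second delivery (whose reply was lost) does no harm because the participant has already committed.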
--
Bruce Momjian
On Fri, 26 Sep 2003, Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Could we allow slaves to check if the backend is still alive, perhaps by
asking the postmaster, similar to what we do with the cancel signal ---
that way, the slave would never time out and always wait if the master
was alive.
You're not considering the possibility of a transient communication
failure. The fact that you cannot currently contact the other guy
is not proof that he's not still alive.
Example:
    Master              Slave
    ------              -----
    commit ready-->
                        <--OK
    commit done->XX
where "->XX" means the message gets lost due to network failure. Now
'k, but isn't a lot of that a "retry" issue? we're talking TCP here, not
UDP, which I *thought* was designed for transient network problems ... ?
I would think that any implementation would have a timeout/retry GUC
variable associated with it ... 'if no answer in x seconds, retry up to y
times' ...
if we are talking about two computers sitting next to each other on a switch,
you'd expect those to be low ... but if you were talking about two
separate geographical locations (and yes, I realize you are adding lag to
the mix with waiting for responses), you'd expect those #s to rise ...
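The "if no answer in x seconds, retry up to y times" idea might look like the following sketch. `RetryPolicy`, `timeout_s` and `max_retries` are hypothetical names, not existing GUC variables, and `send` is an injected callable so the example runs without a network. The LAN and WAN values are invented purely to illustrate the point about nearby versus geographically separate machines. When retries run out, the result is reported as "in doubt" rather than "aborted", for the reason Tom gave above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RetryPolicy:
    timeout_s: float   # "x seconds" to wait for an answer
    max_retries: int   # "y times" to retry after the first attempt

    def worst_case_wait_s(self) -> float:
        # One initial attempt plus max_retries retries, each waiting timeout_s.
        return self.timeout_s * (1 + self.max_retries)


def send_decision(send: Callable[[float], bool], policy: RetryPolicy) -> str:
    """Call send(timeout_s) until it reports an answer or retries run out."""
    for _ in range(1 + policy.max_retries):
        if send(policy.timeout_s):
            return "acked"
    # Silence is not a "no": the other side may have applied the decision
    # and only its reply was lost, so the outcome is unknown.
    return "in doubt"


# Illustrative values only: short waits on a local switch, long ones across sites.
LAN = RetryPolicy(timeout_s=0.5, max_retries=3)
WAN = RetryPolicy(timeout_s=5.0, max_retries=10)
```

Here `LAN.worst_case_wait_s()` is 2.0 and `WAN.worst_case_wait_s()` is 55.0 seconds: that is how long a commit could block before being reported in doubt, so raising the numbers for distant sites trades fewer false alarms for longer stalls.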