Incorrect response code after XA recovery

Started by Ondrej Chaloupka, over 12 years ago · 13 messages · lists: bugs, general
#1 Ondrej Chaloupka
ochaloup@redhat.com

Hi,

I would like to consult you about a problematic response returned by PostgreSQL after transaction recovery run by Narayana (JBossTS).

I work on tests for Narayana and I hit an issue with PostgreSQL. The DB returns the incorrect code XAException.XA_HEURHAZ when the TM performs recovery after a crash of the JBoss EAP app server.
The exception is as follows:
Caused by: org.postgresql.util.PSQLException: ERROR: prepared transaction with identifier "131072_AAAAAAAAAAAAAP//fwAAAd7TXOBR8jj5AAAAKDE=_AAAAAAAAAAAAAP//fwAAAd7TXOBR8jj5AAAALQAAAAAAAAAA" does not exist

It's run on PostgreSQL 9.2 but the older versions seem to be affected as well.

The problem occurs when the TM runs JTS transactions.

The idea of the test:
The test enlists two resources in a transaction. Prepare is called on the PostgreSQL resource. The app server crashes before prepare is called on the second transaction participant. After the app server restarts, the TM tries to recover the transaction. As the failure occurs during the prepare phase, a rollback is expected.

The OTS specification requires both bottom-up and top-down recovery to be triggered by the recovering resource. As a result, two rollback calls are made against the DB. The DB receives the first rollback call and performs the rollback. On the second call it returns the error code: as the DB has already rolled back the transaction and forgotten about it, it reports that no such transaction exists. But this seems to be against the OTS specification.
There are some more details in the following bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=988724
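
The double-rollback sequence described above can be illustrated with a toy in-memory model. This is only a sketch and not PostgreSQL code: the class PreparedTxTable and its methods are hypothetical names, and the model assumes nothing more than "a rolled-back prepared transaction is forgotten immediately".

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of a resource manager's table of prepared transactions.
// Hypothetical names; it only mimics the "forget after rollback" behaviour.
final class PreparedTxTable {
    private final Set<String> prepared = new HashSet<>();

    // Models PREPARE TRANSACTION 'gid'.
    void prepare(String gid) {
        if (!prepared.add(gid)) {
            throw new IllegalStateException(
                "transaction identifier \"" + gid + "\" is already in use");
        }
    }

    // Models ROLLBACK PREPARED 'gid': the entry is forgotten once rolled back.
    void rollbackPrepared(String gid) {
        if (!prepared.remove(gid)) {
            throw new IllegalStateException(
                "prepared transaction with identifier \"" + gid + "\" does not exist");
        }
    }

    public static void main(String[] args) {
        PreparedTxTable db = new PreparedTxTable();
        db.prepare("tx-1");
        db.rollbackPrepared("tx-1");      // first recovery pass: succeeds
        try {
            db.rollbackPrepared("tx-1");  // second recovery pass: already forgotten
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The second call fails simply because the identifier is gone from the table, which matches the "does not exist" error quoted earlier.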

Do you have any experience with such behaviour? Can I assume this is a problem in PostgreSQL? Or is there already a bug for this issue in the Postgres bug tracking system?

Thank you
Ondra

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ondrej Chaloupka (#1)
Re: [BUGS] Incorrect response code after XA recovery

Ondrej Chaloupka <ochaloup@redhat.com> writes:

The OTS specification requires both bottom-up and top-down recovery to be triggered by the recovering resource. As a result, two rollback calls are made against the DB. The DB receives the first rollback call and performs the rollback. On the second call it returns the error code: as the DB has already rolled back the transaction and forgotten about it, it reports that no such transaction exists. But this seems to be against the OTS specification.

It's not likely that we would consider changing the behavior of ROLLBACK
PREPARED. The alternatives we would have are (1) silently accept a
ROLLBACK against a non-existent transaction ID, or (2) remember every
rolled-back ID forever. Neither seems sane in the least.

It seems to me that this is something client-side code, probably the XA
manager, would need to deal with. The XA manager already has to track
uncommitted 2-phase transactions, and would furthermore have the best
idea of when it would be safe to forget about a rolled-back ID.
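
As a rough illustration of handling this on the client side, the sketch below treats "unknown XID" on a recovery rollback as "already rolled back". It is not Narayana's code: BranchRollback and RecoveryRollback are hypothetical names standing in for a call to XAResource.rollback(Xid), and the sketch assumes the resource reports an unknown branch with the standard XAException.XAER_NOTA code.

```java
import javax.transaction.xa.XAException;

// Hypothetical stand-in for one XAResource.rollback(xid) call on a branch.
interface BranchRollback {
    void rollback() throws XAException;
}

final class RecoveryRollback {
    // Returns true when the branch is known to be gone after the call,
    // either because this call rolled it back or because the resource
    // no longer knows the XID (an earlier recovery pass finished it).
    static boolean rollbackIdempotently(BranchRollback branch) throws XAException {
        try {
            branch.rollback();
            return true;
        } catch (XAException e) {
            if (e.errorCode == XAException.XAER_NOTA) {
                return true; // assumption: unknown XID means already rolled back
            }
            throw e; // any other code is a real failure for the caller to handle
        }
    }
}
```

This only works if the resource signals the condition with XAER_NOTA; any other error code is passed through unchanged.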

Right offhand it appears to me that that Red Hat bug is filed against
the correct component, and you need to push them harder to fix their
bug/shortcoming rather than claim it's our problem.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3 Tom Jenkinson
tom.jenkinson@redhat.com
In reply to: Tom Lane (#2)
Re: Incorrect response code after XA recovery

Hi Tom,

A little bit of information in the linked bugzilla report is that the
exception being returned has an XA error code of XAER_RMERR "An error
occurred in rolling back the transaction branch. The resource manager is
free to forget about the branch when returning this error so long as all
accessing threads of control have been notified of the branch's state."

That does not sound right to me, wouldn't XAER_NOTA "The specified XID
is not known by the resource manager" be more accurate?

Thanks,
Tom


#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Jenkinson (#3)
Re: Incorrect response code after XA recovery

Tom Jenkinson <tom.jenkinson@redhat.com> writes:

A little bit of information in the linked bugzilla report is that the
exception being returned has an XA error code of XAER_RMERR "An error
occurred in rolling back the transaction branch. The resource manager is
free to forget about the branch when returning this error so long as all
accessing threads of control have been notified of the branch's state."

That does not sound right to me, wouldn't XAER_NOTA "The specified XID
is not known by the resource manager" be more accurate?

No idea, but in any case that's outside Postgres' purview. It's barely
possible that the Postgres JDBC driver has something to do with that,
but it sounds more like the XA manager's turf.

regards, tom lane


#5 Tom Jenkinson
tom.jenkinson@redhat.com
In reply to: Tom Lane (#4)
Re: Incorrect response code after XA recovery

Hi Tom,

On Mon 29 Jul 2013 15:46:12 BST, Tom Lane wrote:

Tom Jenkinson <tom.jenkinson@redhat.com> writes:

A little bit of information in the linked bugzilla report is that the
exception being returned has an XA error code of XAER_RMERR "An error
occurred in rolling back the transaction branch. The resource manager is
free to forget about the branch when returning this error so long as all
accessing threads of control have been notified of the branch’s state."

That does not sound right to me, wouldn't XAER_NOTA "The specified XID
is not known by the resource manager" be more accurate?

No idea, but in any case that's outside Postgres' purview. It's barely
possible that the Postgres JDBC driver has something to do with that,
but it sounds more like the XA manager's turf.

I am not sure what you mean here, as I don't know how the Postgres
project is packaged. All I know is that the Postgres JDBC driver
component appears to be returning an XAException with the message
"Error rolling back prepared transaction" and an errorCode of
XAException.XAER_RMERR rather than XAER_NOTA.

Is there a different component within your bug tracking system we
should be using to raise this against the JDBC driver instead?

Thanks,
Tom


#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Jenkinson (#5)
Re: Incorrect response code after XA recovery

Tom Jenkinson <tom.jenkinson@redhat.com> writes:

On Mon 29 Jul 2013 15:46:12 BST, Tom Lane wrote:

No idea, but in any case that's outside Postgres' purview. It's barely
possible that the Postgres JDBC driver has something to do with that,
but it sounds more like the XA manager's turf.

I am not sure what you mean here, as I don't know how the Postgres
project is packaged. All I know is that the Postgres JDBC driver
component appears to be returning an XAException with the message
"Error rolling back prepared transaction" and an errorCode of
XAException.XAER_RMERR rather than XAER_NOTA.

Is there a different component within your bug tracking system we
should be using to raise this against the JDBC driver instead?

The folk who would fix anything in the JDBC driver tend to read
pgsql-jdbc sooner than pgsql-bugs, so cc'ing there for comment.

regards, tom lane


#7 Alban Hertroys
haramrae@gmail.com
In reply to: Tom Jenkinson (#5)
Re: [GENERAL] Incorrect response code after XA recovery

On Jul 29, 2013, at 16:57, Tom Jenkinson <tom.jenkinson@redhat.com> wrote:

Hi Tom,

On Mon 29 Jul 2013 15:46:12 BST, Tom Lane wrote:

Tom Jenkinson <tom.jenkinson@redhat.com> writes:

A little bit of information in the linked bugzilla report is that the
exception being returned has an XA error code of XAER_RMERR "An error
occurred in rolling back the transaction branch. The resource manager is
free to forget about the branch when returning this error so long as all
accessing threads of control have been notified of the branch’s state."

That does not sound right to me, wouldn't XAER_NOTA "The specified XID
is not known by the resource manager" be more accurate?

No idea, but in any case that's outside Postgres' purview. It's barely
possible that the Postgres JDBC driver has something to do with that,
but it sounds more like the XA manager's turf.

I am not sure what you mean here, as I don't know how the Postgres project is packaged. All I know is that the Postgres JDBC driver component appears to be returning an XAException with the message "Error rolling back prepared transaction" and an errorCode of XAException.XAER_RMERR rather than XAER_NOTA.

Looking at the error codes, it appears that it isn't even the Postgres JDBC driver returning that error, but the XA manager you're using, which is not a part of Postgres (nor is the JDBC driver, for that matter - that's a separate project).

The errors you're quoting are from the XA manager and are about XA manager stuff. For all we know, the actual error appears to be occurring in the XA manager and not in Postgres. It's possible that the XA manager error is a result of an error that Postgres returned, but since the XA manager prints its own error message and not the original one, you'll need to uncover those error messages before we can help you with them.

For all we know at this point, the error is with your XA manager, not with Postgres.

If you want to be sure, grep the source of the JDBC driver for those error codes; I doubt you'll find them in there.
Google was kind enough to point me here: http://jdbc.postgresql.org/development/git.html

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.


#8 Tom Jenkinson
tom.jenkinson@redhat.com
In reply to: Alban Hertroys (#7)
Re: [GENERAL] Incorrect response code after XA recovery

Hi Alban,

I stripped down the code to a raw XA example using the latest postgres
driver available in maven central. It demonstrates that regardless of
what the codebase might suggest, it is certainly the case that postgres
is returning XAER_RMERR in the scenario where the resource manager no
longer knows about the Xid.

The code is available here:
https://github.com/tomjenkinson/xa-recovery/commit/944d45e86a91eacb9489843acfbf6a80f1b4b820

I hope that this helps,
Tom


#9 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Jenkinson (#8)
Re: [BUGS] Incorrect response code after XA recovery

Tom Jenkinson wrote:

Hi Alban,

I stripped down the code to a raw XA example using the latest
postgres driver available in maven central. It demonstrates that
regardless of what the codebase might suggest, it is certainly the
case that postgres is returning XAER_RMERR in the scenario where the
resource manager no longer knows about the Xid.

The code is available here:
https://github.com/tomjenkinson/xa-recovery/commit/944d45e86a91eacb9489843acfbf6a80f1b4b820

Those error codes do certainly appear in the PGXAConnection.java source
in the pgjdbc git.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#10 Jeremy Whiting
jwhiting@redhat.com
In reply to: Tom Lane (#6)
Re: [BUGS] Incorrect response code after XA recovery

Hi Tom,
The driver currently doesn't report back to the calling client (tm)
XAException.XAER_NOTA code as Ondrej and Tom Jenkinson have identified.
Instead it returns XAException.XAER_RMERR. See line 416

https://github.com/pgjdbc/pgjdbc/blob/master/org/postgresql/xa/PGXAConnection.java#416

which imo is used for general errors in the resource manager.

I've written a test case that can be pulled into the pgjdbc testsuite
that will make verifying this error easier. It is based on the example
code Tom Jenkinson provided. A pull request has been created which can
be found here...

https://github.com/pgjdbc/pgjdbc/pull/73

I am currently coding up a change to the driver in anticipation there
is agreement in the pgjdbc group to change the rollback method. Another
pull request will be created for that. Let's see what discussion and
decision is made by the more active members in pgjdbc.

Regards,
Jeremy


--
Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-jdbc

#11 Jeremy Whiting
jwhiting@redhat.com
In reply to: Jeremy Whiting (#10)
Re: [BUGS] Incorrect response code after XA recovery

Hi pgjdbc,
The code change to fix the failing test case
(https://github.com/whitingjr/pgjdbc/tree/xa-not-found-testcase) is on
this branch...
https://github.com/whitingjr/pgjdbc/tree/handle-rollback-xid-not-found

By checking the SQLException.SQLState field the driver will now throw
an exception using XAER_NOTA as the reason. The SQLState in question is
42704 ("undefined_object"), as listed in
http://www.postgresql.org/docs/9.2/static/errcodes-appendix.html
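
The kind of mapping described above might look roughly like the following. This is a sketch under stated assumptions and not the code on the linked branch: the class RollbackErrorMapping and the method toXAException are hypothetical names, and it assumes the mapping is only applied to errors raised while rolling back a prepared transaction.

```java
import java.sql.SQLException;
import javax.transaction.xa.XAException;

// Hypothetical helper; not the actual pgjdbc implementation.
final class RollbackErrorMapping {
    // SQLSTATE 42704 is "undefined_object" in the PostgreSQL error code appendix.
    static final String UNDEFINED_OBJECT = "42704";

    // Translates a failure of a rollback of a prepared transaction into an
    // XAException: unknown transaction -> XAER_NOTA, anything else -> XAER_RMERR.
    static XAException toXAException(SQLException cause) {
        int code = UNDEFINED_OBJECT.equals(cause.getSQLState())
                ? XAException.XAER_NOTA
                : XAException.XAER_RMERR;
        XAException xa = new XAException(code);
        xa.initCause(cause); // keep the original server error for diagnostics
        return xa;
    }
}
```

For example, a SQLException carrying SQLState 42704 would surface to the transaction manager with errorCode XAER_NOTA, while a connection failure would still surface as XAER_RMERR.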

Comments on the implementation are welcome though probably premature
because wider discussion has yet to happen for $subject.

Regards,
Jeremy

--
Jeremy Whiting
Senior Software Engineer, Performance Team
Red Hat


#12 Jeremy Whiting
jwhiting@redhat.com
In reply to: Jeremy Whiting (#10)
Re: [JDBC] Incorrect response code after XA recovery

Hello Tom,
A quick update on progress. A second PR was created to provide a patch.

https://github.com/pgjdbc/pgjdbc/pull/76

Regards,
Jeremy


#13 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jeremy Whiting (#12)
Re: [JDBC] Incorrect response code after XA recovery

On 05.08.2013 17:58, Jeremy Whiting wrote:

Hello Tom,
A quick update on progress. A second PR was created to provide a patch.

https://github.com/pgjdbc/pgjdbc/pull/76

Thanks. Looks good to me.

I wish the backend would throw a more specific error code for this,
42704 is used for many other errors as well. But at COMMIT/ROLLBACK
PREPARED, it's probably safe to assume that it means that the
transaction does not exist.

- Heikki
