[PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() in TBLOCK_ABORT_END

Started by Dave Sharpealmost 6 years ago3 messages
#1Dave Sharpe
dave.sharpe@lzlabs.com
6 attachment(s)

Hi pghackers,
This is my first time posting here ... Gilles Darold and I are developing a new FDW which is based on the contrib/postgres_fdw. The postgres_fdw logic uses a RegisterXactCallback function to send local transactions remote. But I found that a registered XactCallback is not always called for a successful client ROLLBACK statement. So, a successful local ROLLBACK does not get sent remote by FDW, and now the local and remote transaction states are messed up, out of sync. The local database is "eating" the successful rollback.

I attach a git format-patch file 0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch
The patch fixes the problem and is ready to commit as far as I can tell. It adds some comment lines and three lines of code to CommitTransactionCommand() in the TBLOCK_ABORT_END case. Line (1) to reset the transaction's blockState back to TBLOCK_ABORT, ahead of (2) a new call to callXactCallbacks(). If the callback returns successful (which is usually the case), (3) the new code switches back to TBLOCK_ABORT_END, then continues with old code to CleanupTransaction() as before. If any callback does error out, the TBLOCK_ABORT was the correct block state for an error.

I have not added a regression test since my scenario involves a C module ... I didn't see such a regression test, but somebody can teach me where to put the C module and Makefile if a regression test should be added. Heads up that the scenario to hit this is a bit complex (to me) ... I attach the following unit test files:
1) eat_rollback.c, _PG_init() installs my_utility_hook() for INFO log for debugging.
RegisterSubXactCallback(mySubtransactionCallback) which injects some logging and an ERROR for savepoints, i.e., negative testing, e.g., like a remote FDW savepoint failed.
And RegisterXactTransaction(myTransactionCallback) with info logging.
2) Makefile, to make the eat_rollback.so
3) eat_rollback2.sql, drives the fail sequence, especially the SAVEPOINT, which errors out, then later a successful ROLLBACK gets incorrectly eaten (no CALLBACK info logging, demonstrates the bug), then another successful ROLLBACK (now there is CALLBACK info logging).
4) eat_rollback2.out, output without the fix, the rollback at line 68 is successful but there is not myTransactionCallback() INFO output
5) eat_rollback2.fixed, output after the fix, the rollback at line 68 is successful, and now there is a myTransactionCallback() INFO log. Success!

It first failed for me in v11.1, I suspect it failed since before then too. And it is still failing in current master.

Bye the way, we worked around the bug in our FDW by handling sub/xact in the utility hook. A transaction callback is still needed for implicit, internal rollbacks. Another advantage of the workaround is (I think) it reduces the code complexity and improves performance because the entire subxact stack is not unwound to drive each and every subtransaction level to remote. Also sub/transaction statements are sent remote as they arrive (local and remote are more "transactionally" synced, not stacked by FDW for later). And of course, the workaround doesn't hit the above bug, since our utility hook correctly handles the client ROLLBACK. If it makes sense to the community, I could try and patch contrib/postgres_fdw to do what we are doing. But note that I don't need it myself: we have our own new FDW for remote DB2 z/OS (!) ... at LzLabs we are a building a software defined mainframe with PostgreSQL (of all things).

Hope it helps!

Dave Sharpe
I don't necessarily agree with everything I say. (MM)
www.lzlabs.com

Attachments:

0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patchapplication/octet-stream; name=0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patchDownload
From 6b669f9ebc71980851ae7ce5ec8119e0bfa9df02 Mon Sep 17 00:00:00 2001
From: Dave Sharpe <davesharpe13@gmail.com>
Date: Tue, 24 Mar 2020 19:58:56 -0400
Subject: [PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() in
 TBLOCK_ABORT_END

---
 src/backend/access/transam/xact.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index e3c60f23cd..b745007102 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2973,10 +2973,15 @@ CommitTransactionCommand(void)
 
 			/*
 			 * Here we were in an aborted transaction block and we just got
-			 * the ROLLBACK command from the user, so clean up the
+			 * the ROLLBACK command from the user, so first return blockState
+			 * to TBLOCK_ABORT in case of callXactCallbacks() failure, run any
+			 * callback and restore blockState on success. Now clean up the
 			 * already-aborted transaction and return to the idle state.
 			 */
 		case TBLOCK_ABORT_END:
+			s->blockState = TBLOCK_ABORT;
+			CallXactCallbacks(XACT_EVENT_ABORT);
+			s->blockState = TBLOCK_ABORT_END;
 			CleanupTransaction();
 			s->blockState = TBLOCK_DEFAULT;
 			if (s->chain)
-- 
2.16.2

eat_rollback.capplication/octet-stream; name=eat_rollback.cDownload
Makefileapplication/octet-stream; name=MakefileDownload
eat_rollback2.sqlapplication/octet-stream; name=eat_rollback2.sqlDownload
eat_rollback2.outapplication/octet-stream; name=eat_rollback2.outDownload
eat_rollback2.fixedapplication/octet-stream; name=eat_rollback2.fixedDownload
#2Gilles Darold
gilles@darold.net
In reply to: Dave Sharpe (#1)
Re: [PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() in TBLOCK_ABORT_END

Le 25/03/2020 à 03:49, Dave Sharpe a écrit :

Hi pghackers,

This is my first time posting here ...  Gilles Darold and I are
developing a new FDW which is based on the contrib/postgres_fdw. The
postgres_fdw logic uses a RegisterXactCallback function to send local
transactions remote. But I found that a registered XactCallback is not
always called for a successful client ROLLBACK statement. So, a
successful local ROLLBACK does not get sent remote by FDW, and now the
local and remote transaction states are messed up, out of sync. The
local database is "eating" the successful rollback.

 

I attach a git format-patch file
0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch

The patch fixes the problem and is ready to commit as far as I can
tell. It adds some comment lines and three lines of code to
CommitTransactionCommand() in the TBLOCK_ABORT_END case. Line (1) to
reset the transaction's blockState back to TBLOCK_ABORT, ahead of (2)
a new call to callXactCallbacks(). If the callback returns successful
(which is usually the case), (3) the new code switches back to
TBLOCK_ABORT_END, then continues with old code to CleanupTransaction()
as before. If any callback does error out, the TBLOCK_ABORT was the
correct block state for an error.

 

I have not added a regression test since my scenario involves a C
module ... I didn't see such a regression test, but somebody can teach
me where to put the C module and Makefile if a regression test should
be added. Heads up that the scenario to hit this is a bit complex (to
me) ... I attach the following unit test files:

1) eat_rollback.c, _/PG_init() installs my/_utility_hook() for INFO
log for debugging.

RegisterSubXactCallback(mySubtransactionCallback) which injects some
logging and an ERROR for savepoints, i.e., negative testing, e.g.,
like a remote FDW savepoint failed.

And RegisterXactTransaction(myTransactionCallback) with info logging.

2) Makefile, to make the eat_rollback.so

3) eat_rollback2.sql, drives the fail sequence, especially the
SAVEPOINT, which errors out, then later a successful ROLLBACK gets
incorrectly eaten (no CALLBACK info logging, demonstrates the bug),
then another successful ROLLBACK (now there is CALLBACK info logging).

4) eat_rollback2.out, output without the fix, the rollback at line 68
is successful but there is not myTransactionCallback() INFO output

5) eat_rollback2.fixed, output after the fix, the rollback at line 68
is successful, and now there is a myTransactionCallback() INFO log.
Success!

 

It first failed for me in v11.1, I suspect it failed since before then
too. And it is still failing in current master.

 

Bye the way, we worked around the bug in our FDW by handling sub/xact
in the utility hook. A transaction callback is still needed for
implicit, internal rollbacks. Another advantage of the workaround is
(I think) it reduces the code complexity and improves performance
because the entire subxact stack is not unwound to drive each and
every subtransaction level to remote. Also sub/transaction statements
are sent remote as they arrive (local and remote are more
"transactionally" synced, not stacked by FDW for later). And of
course, the workaround doesn't hit the above bug, since our utility
hook correctly handles the client ROLLBACK. If it makes sense to the
community, I could try and patch contrib/postgres_fdw to do what we
are doing. But note that I don't need it myself: we have our own new
FDW for remote DB2 z/OS (!) ... at LzLabs we are a building a software
defined mainframe with PostgreSQL (of all things).

 

Hope it helps!

 

Dave Sharpe

/I don't necessarily agree with everything I say./(MM)

www.lzlabs.com

Hi,

As I'm involved in this thread I have given a review to this bug report
and I don't think that there is a problem here but as a disclaimer my
knowledge on internal transaction management is probably not enough to
have a definitive opinion.

Actually the callback function is called when the error is thrown:

psql:eat_rollback2.sql:20: INFO:  00000: myTransactionCallback()
XactEvent 2 (is abort) level 1 <-------------------
LOCATION:  myTransactionCallback, eat_rollback.c:52
psql:eat_rollback2.sql:20: ERROR:  XX000: no no no
LOCATION:  mySubtransactionCallback, eat_rollback.c:65

this is probably why the callback is not called on the subsequent
ROLLBACK execution because abort processing is already done
(src/backend/access/transam/xact.c:3890).

With this simple test and the debug extension:

BEGIN;
DROP INDEX "index2", "index3"; -- will fail indexes doesn't exist
ROLLBACK;

the callback function is also called at error level, not on the ROLLBACK:

---------------------------------------------------------------------------------BEGIN
psql:eat_rollback-2/sql/eat_rollback3.sql:39: INFO:  00000:
my_utility_hook() utility transaction kind 0 (is not rollback)
LOCATION:  my_utility_hook, eat_rollback.c:28
BEGIN
---------------------------------------------------------------------------------DROP
INDEX + error
psql:eat_rollback-2/sql/eat_rollback3.sql:41: INFO:  00000:
myTransactionCallback() XactEvent 2 (is abort) level 1 <---------------
LOCATION:  myTransactionCallback, eat_rollback.c:52
psql:eat_rollback-2/sql/eat_rollback3.sql:41: ERROR:  42704: index
"concur_index2" does not exist
LOCATION:  DropErrorMsgNonExistent, tablecmds.c:1209
---------------------------------------------------------------------------------ROLLBACK
psql:eat_rollback-2/sql/eat_rollback3.sql:43: INFO:  00000:
my_utility_hook() utility transaction kind 3 (is rollback)
LOCATION:  my_utility_hook, eat_rollback.c:28
ROLLBACK

So this is probably the expected behavior , however the proposed patch
will throw the callback function two times which might not be what we want.

---------------------------------------------------------------------------------BEGIN
psql:eat_rollback-2/sql/eat_rollback3.sql:39: INFO:  00000:
my_utility_hook() utility transaction kind 0 (is not rollback)
LOCATION:  my_utility_hook, eat_rollback.c:28
BEGIN
---------------------------------------------------------------------------------DROP
INDEX + error
psql:eat_rollback-2/sql/eat_rollback3.sql:41: INFO:  00000:
myTransactionCallback() XactEvent 2 (is abort) level 1
<-------------- first call
LOCATION:  myTransactionCallback, eat_rollback.c:52
psql:eat_rollback-2/sql/eat_rollback3.sql:41: ERROR:  42704: index
"concur_index2" does not exist
LOCATION:  DropErrorMsgNonExistent, tablecmds.c:1209
---------------------------------------------------------------------------------ROLLBACK
psql:eat_rollback-2/sql/eat_rollback3.sql:43: INFO:  00000:
my_utility_hook() utility transaction kind 3 (is
rollback)<-------------- second call
LOCATION:  my_utility_hook, eat_rollback.c:28
psql:eat_rollback-2/sql/eat_rollback3.sql:43: INFO:  00000:
myTransactionCallback() XactEvent 2 (is abort) level 1
LOCATION:  myTransactionCallback, eat_rollback.c:52
ROLLBACK

But again I may be wrong.

--
Gilles Darold
http://www.darold.net/

#3Dave Sharpe
dave.sharpe@lzlabs.com
In reply to: Gilles Darold (#2)
Re: [PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() in TBLOCK_ABORT_END

From Gilles Darold <gilles@darold.net> on 2020-03-26T16:09:04.

Actually the callback function is called when the error is thrown:

psql:eat_rollback2.sql:20: INFO: 00000: myTransactionCallback() XactEvent 2 (is abort) level 1 <-----------------
LOCATION: myTransactionCallback, eat_rollback.c:52
psql:eat_rollback2.sql:20: ERROR: XX000: no no no
LOCATION: mySubtransactionCallback, eat_rollback.c:65

this is probably why the callback is not called on the subsequent ROLLBACK execution because abort processing is
already done (src/backend/access/transam/xact.c:3890).

So I withdraw this patch and fix. The callback during the error will drive the ROLLBACK remote, as required in the fdw.
Great catch, thanks Gilles!

Cheers,
Dave