ERROR after writing PREPARE WAL record
Hello
Cancel/terminate requests are held off during "PREPARE TRANSACTION"
processing in function PrepareTransaction(). However, a subroutine invoked
by PrepareTransaction() may perform elog(ERROR) or elog(FATAL).
If that happens after the PREPARE WAL record is written and before the
transaction state is cleaned up, normal abort processing is triggered, i.e.
AbortTransaction(). It is not correct to run the abort workflow against a
transaction that is already marked as prepared. A prepared transaction
should only be finished using a "COMMIT PREPARED" or "ROLLBACK PREPARED"
operation.
I tried injecting an elog(ERROR) at the end of EndPrepare() and that
resulted in a PANIC at some point.
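
The failure window described above can be modeled with a small toy, shown
here only as an illustration. This is not PostgreSQL source: every name in
it (ToyXactState, toy_prepare, toy_elog_error and so on) is hypothetical,
and setjmp/longjmp merely stands in for the elog(ERROR) machinery. The
point is that the same error leads to a clean abort before the PREPARE
record is written, but has no valid recovery path after it.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Toy model only; these are not PostgreSQL types or functions. */
typedef enum
{
    XACT_IN_PROGRESS,
    XACT_PREPARED,
    XACT_ABORTED,
    XACT_PANIC
} ToyXactState;

static jmp_buf toy_error_context;   /* stands in for the error handler stack */

/* Stand-in for elog(ERROR): jump back to the handler, never return. */
static void
toy_elog_error(void)
{
    longjmp(toy_error_context, 1);
}

/* Abort is only legal for a transaction that has not been prepared yet. */
static ToyXactState
toy_abort(ToyXactState state)
{
    assert(state == XACT_IN_PROGRESS || state == XACT_PREPARED);
    return (state == XACT_PREPARED) ? XACT_PANIC : XACT_ABORTED;
}

ToyXactState
toy_prepare(bool fail_before_wal, bool fail_after_wal)
{
    /* volatile: modified between setjmp() and longjmp() */
    volatile ToyXactState state = XACT_IN_PROGRESS;

    if (setjmp(toy_error_context) != 0)
        return toy_abort(state);    /* "normal abort processing" */

    if (fail_before_wal)
        toy_elog_error();           /* safe: nothing durable yet */

    state = XACT_PREPARED;          /* PREPARE WAL record is now written */

    if (fail_after_wal)
        toy_elog_error();           /* the problematic window */

    return state;
}
```

In this model, toy_prepare(true, false) ends in XACT_ABORTED while
toy_prepare(false, true) ends in XACT_PANIC, which mirrors the injected
elog(ERROR) at the end of EndPrepare().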
Before delving into more details, I want to ascertain that this is a valid
problem to solve. Is the above problem worth worrying about?
Asim
Asim R P <apraveen@pivotal.io> writes:
> Cancel/terminate requests are held off during "PREPARE TRANSACTION"
> processing in function PrepareTransaction(). However, a subroutine invoked
> by PrepareTransaction() may perform elog(ERROR) or elog(FATAL).

Doing anything that's likely to fail in the post-commit code path is
a Bad Idea (TM). There's no good recovery avenue, so the fact that
you generally end up at a PANIC is expected/intentional.
The correct response, if you notice code doing that, is to fix it so
it doesn't do that. Typically the right answer is to move the
failure-prone operation to pre-commit processing.
regards, tom lane
On Wed, Jul 17, 2019 at 7:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Asim R P <apraveen@pivotal.io> writes:
> > Cancel/terminate requests are held off during "PREPARE TRANSACTION"
> > processing in function PrepareTransaction(). However, a subroutine invoked
> > by PrepareTransaction() may perform elog(ERROR) or elog(FATAL).
>
> The correct response, if you notice code doing that, is to fix it so
> it doesn't do that. Typically the right answer is to move the
> failure-prone operation to pre-commit processing.

Thank you for the response. Nothing looks particularly alarming. There
is one case in LWLockAcquire() that may error out, if (num_held_lwlocks >=
MAX_SIMUL_LWLOCKS). This problem also exists in the CommitTransaction() and
AbortTransaction() code paths. Then there is arbitrary add-on code
registered as transaction callbacks (Xact_callbacks), which may raise
errors of its own.
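
For readers unfamiliar with that check, here is a toy sketch of the
held-lock limit. It is not the lwlock.c code: the toy_ names are
hypothetical, the limit of 200 is an assumption made for the example, and
the function returns false where the real LWLockAcquire() would raise
elog(ERROR).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_SIMUL_LWLOCKS 200   /* assumed value, for illustration */

static int toy_num_held = 0;        /* mirrors the idea of num_held_lwlocks */

/*
 * Returns false where the real code would raise an error. If that error
 * were raised after the PREPARE or COMMIT record is written, it would land
 * in the window discussed in this thread.
 */
bool
toy_lwlock_acquire(void)
{
    if (toy_num_held >= TOY_MAX_SIMUL_LWLOCKS)
        return false;               /* "too many locks held" */
    toy_num_held++;
    return true;
}

void
toy_lwlock_release(void)
{
    assert(toy_num_held > 0);       /* releasing a lock we do not hold is a bug */
    toy_num_held--;
}

int
toy_num_held_lwlocks(void)
{
    return toy_num_held;
}
```

The limit is only reached when a backend already holds an unusually large
number of locks at once, which is why this case is listed as possible
rather than alarming.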
SyncRepWaitForLSN() checks ProcDiePending and QueryCancelPending directly,
without going through CHECK_FOR_INTERRUPTS(), and that is for good reason.
Moreover, it only emits a WARNING, so there is no problem there.
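
The shape of that wait loop can be sketched as follows. This is a
simplified model with hypothetical toy_ names, not the syncrep.c code: the
pending flags are polled directly, and a pending request ends the wait with
a warning counted instead of an error thrown.

```c
#include <assert.h>
#include <signal.h>

typedef enum
{
    TOY_WAIT_ACKED,                 /* standby acknowledged in time */
    TOY_WAIT_GAVE_UP_WITH_WARNING   /* wait ended early, no error thrown */
} ToyWaitResult;

/* Stand-ins for ProcDiePending and QueryCancelPending. */
static volatile sig_atomic_t toy_proc_die_pending = 0;
static volatile sig_atomic_t toy_query_cancel_pending = 0;
static int toy_warnings_emitted = 0;

void
toy_request_cancel(void)
{
    toy_query_cancel_pending = 1;
}

int
toy_warning_count(void)
{
    return toy_warnings_emitted;
}

ToyWaitResult
toy_sync_rep_wait(int polls_until_ack)
{
    assert(polls_until_ack >= 0);

    for (int poll = 0; poll < polls_until_ack; poll++)
    {
        /* Check the flags directly; never jump out via an error. */
        if (toy_proc_die_pending || toy_query_cancel_pending)
        {
            toy_query_cancel_pending = 0;   /* cancel request consumed */
            toy_warnings_emitted++;         /* stands in for a WARNING */
            return TOY_WAIT_GAVE_UP_WITH_WARNING;
        }
    }
    return TOY_WAIT_ACKED;
}
```

Because the function returns normally in both cases, the caller's
post-commit cleanup still runs to completion.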
Asim