Do we need to handle orphaned prepared transactions in the server?

Started by Hamid Akhtar · about 6 years ago · 38 messages · hackers
#1 Hamid Akhtar
hamid.akhtar@gmail.com

Hello Everyone,

I have been thinking about the orphaned prepared transaction problem in
PostgreSQL and pondering ways of handling it.

A prepared transaction can be left unfinished (neither committed nor
rolled back) if the client has disappeared. This can happen for various
reasons, including a client crash, or a server crash that terminates the
client's connection with the client never returning. Another way a
prepared transaction can be left unfinished is if a backup is restored that
carried the preparation steps, but not the steps closing the transaction.

Needless to say, this hampers maintenance work, including the vacuuming
of dead tuples.

First and foremost, we need to define what an orphaned transaction is. At
this stage, I believe any prepared transaction that has been there for more
than X time may be considered an orphan. X may be defined as an integer in
seconds (a GUC perhaps). Maybe there are better ways to define this.
Please feel free to chime in.

This leads to the question of whether, at the server level, we need to be
better at managing these orphaned prepared transactions. There are
obviously other ways of identifying such transactions, simply by querying
pg_prepared_xacts and checking the transaction start date, which raises
the question of whether there is a real need to change the server to
either terminate these transactions (perhaps something similar to
idle_in_transaction_session_timeout) or notify an administrator (less
preferred, as I believe notifications should be handled by external
tools, not by the server).
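
For illustration, a query along these lines would list candidate orphans
from pg_prepared_xacts (the one-hour threshold is just a placeholder for
whatever X we settle on):

SELECT gid, owner, database, prepared,
       now() - prepared AS prepared_for,
       age(transaction) AS xid_age
FROM pg_prepared_xacts
WHERE prepared < now() - interval '1 hour'
ORDER BY prepared;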

I see three potential solutions for this:
(1) Only check for any prepared transactions when the server is starting or
restarting (maybe after a crash)
(2) Have a background process that checks prepared transactions against an
idle timeout
(3) Do not make any change in the server and let the administrator handle
this with a client or an external tool

Option (1) IMHO seems to be the least suitable one, as I'd expect that when
a server is being started (or restarted), perhaps after a crash, it is done
manually and the user can see the server startup logs. So it is very likely
that the user will notice any prepared transactions that were created when
the server was previously running and take any necessary actions.

Option (3) is to let users manage it on their own, however they wish. This
is the simplest and easiest way, as we don't need to do anything here.

Option (2) is probably the best solution IMHO, though it does require
changes in the server, which might not be an undertaking we wish to
pursue for this problem.

So in case we wish to move forward with Option (2), this will require a
change in the server. One potential place is autovacuum, adding a change
similar to what was done for idle_in_transaction_session_timeout, except
that rather than terminating the connection, in this case we simply
abort/roll back the transaction. We could have a similar GUC for a
prepared transaction timeout. To be able to do that, though, we obviously
need a backend process that can monitor the timer, which will add
overhead to an existing background process like autovacuum, or require
the creation of a new background process (which is not such a good idea
IMHO) that will add even more overhead.

At this stage, I'm not sure of the scale of changes this will require,
however, I wanted to get an understanding and consensus on whether (a) this
is something we should work on, and (b) whether an approach to implementing
a timeout makes sense.

Please feel free to share your thoughts here.

Regards.

--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus

#2 Thomas Kellerer
shammat@gmx.net
In reply to: Hamid Akhtar (#1)
Re: Do we need to handle orphaned prepared transactions in the server?

First and foremost is to define what an orphaned transaction is. At
this stage, I believe any prepared transaction that has been there
for more than X time may be considered as an orphan. X may be defined
as an integer in seconds (a GUC perhaps). May be there are better
ways to define this. Please feel free to chime in.

What about specifying a timeout when starting the prepared transaction?

I can imagine situations where a timeout of hours might be needed/anticipated
(e.g. really slow external systems) and situations where the developer
knows that the other systems are never slower than a few seconds.

Something like:

prepare transaction 42 timeout interval '2 days';

or

prepare transaction 42 timeout interval '30 second';

Or maybe even with a fixed timestamp instead of an interval?

prepare transaction 42 timeout timestamp '2020-01-30 14:00:00';

Thomas

#3 Ants Aasma
ants.aasma@cybertec.at
In reply to: Hamid Akhtar (#1)
Re: Do we need to handle orphaned prepared transactions in the server?

On Wed, 22 Jan 2020 at 09:02, Hamid Akhtar <hamid.akhtar@gmail.com> wrote:

At this stage, I'm not sure of the scale of changes this will require, however, I wanted to get an understanding and consensus on whether (a) this is something we should work on, and (b) whether an approach to implementing a timeout makes sense.

Please feel free to share your thoughts here.

The intended use case of two phase transactions is ensuring atomic
durability of transactions across multiple database systems. This
necessarily means that there needs to be a failure tolerant agent that
ensures there is consensus about the status of the transaction and
then executes that consensus across all systems. In other words, there
needs to be a transaction manager for prepared transactions to actually
fulfil their purpose. Therefore I think that unilaterally timing out
prepared transactions is just shifting the consequences of a broken
client from availability to durability. But if durability was never a
concern, why is the client even using prepared transactions?

Citing the documentation:

PREPARE TRANSACTION is not intended for use in applications or interactive sessions. Its purpose is to allow an external transaction manager to perform atomic global transactions across multiple databases or other transactional resources. Unless you're writing a transaction manager, you probably shouldn't be using PREPARE TRANSACTION.

Regards,
Ants Aasma

#4 Craig Ringer
craig@2ndquadrant.com
In reply to: Ants Aasma (#3)
Re: Do we need to handle orphaned prepared transactions in the server?

On Wed, 22 Jan 2020 at 16:45, Ants Aasma <ants@cybertec.at> wrote:

The intended use case of two phase transactions is ensuring atomic
durability of transactions across multiple database systems.

Exactly. I was trying to find a good way to say this.

It doesn't make much sense to embed a 2PC resolver in Pg unless it's
an XA coordinator or similar. And generally it doesn't make sense for
the distributed transaction coordinator to reside alongside one of the
datasources being managed anyway, especially where failover and HA are
in the picture.

I *can* see it being useful, albeit rather heavyweight, to implement
an XA coordinator on top of PostgreSQL. Mostly for HA and replication
reasons. But generally you'd use postgres instances for the HA
coordinator and the DB(s) in which 2PC txns are being managed. While
you could run them in the same instance it'd probably mostly be for
toy-scale PoC/demo/testing use.

So I don't really see the point of doing anything with 2PC xacts
within Pg proper. It's the job of the app that prepares the 2PC xacts,
and if that app is unable to resolve them for some reason there's no
generally-correct action to take without administrator action.

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#4)
Re: Do we need to handle orphaned prepared transactions in the server?

Craig Ringer <craig@2ndquadrant.com> writes:

So I don't really see the point of doing anything with 2PC xacts
within Pg proper. It's the job of the app that prepares the 2PC xacts,
and if that app is unable to resolve them for some reason there's no
generally-correct action to take without administrator action.

Right. It's the XA transaction manager's job not to forget uncommitted
transactions. Reasoning as though no TM exists is not only not very
relevant, but it might lead you to put in features that actually
make the TM's job harder. In particular, a timeout (or any other
mechanism that leads PG to abort or commit a prepared transaction
of its own accord) does that.

Or another way to put it: the fundamental premise of a prepared
transaction is that it will be possible to commit it on-demand with
extremely low chance of failure. Designing in a reason why we'd
fail to be able to do that would be an anti-feature.

regards, tom lane

#6 Thomas Kellerer
shammat@gmx.net
In reply to: Tom Lane (#5)
Re: Do we need to handle orphaned prepared transactions in the server?

Tom Lane schrieb am 22.01.2020 um 16:05:

Craig Ringer <craig@2ndquadrant.com> writes:

So I don't really see the point of doing anything with 2PC xacts
within Pg proper. It's the job of the app that prepares the 2PC xacts,
and if that app is unable to resolve them for some reason there's no
generally-correct action to take without administrator action.

Right. It's the XA transaction manager's job not to forget uncommitted
transactions. Reasoning as though no TM exists is not only not very
relevant, but it might lead you to put in features that actually
make the TM's job harder. In particular, a timeout (or any other
mechanism that leads PG to abort or commit a prepared transaction
of its own accord) does that.

Or another way to put it: the fundamental premise of a prepared
transaction is that it will be possible to commit it on-demand with
extremely low chance of failure. Designing in a reason why we'd
fail to be able to do that would be an anti-feature.

That's a fair point, but the reality is that not all XA transaction managers
do a good job with that.

Having something on the database side that can handle that in
exceptional cases would be very welcome.

(In Oracle you sometimes can't even run DML on tables where you have
orphaned XA transactions, which is extremely annoying because by default
only the DBA can clean that up.)

Thomas

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Kellerer (#6)
Re: Do we need to handle orphaned prepared transactions in the server?

Thomas Kellerer <shammat@gmx.net> writes:

Tom Lane schrieb am 22.01.2020 um 16:05:

Right. It's the XA transaction manager's job not to forget uncommitted
transactions. Reasoning as though no TM exists is not only not very
relevant, but it might lead you to put in features that actually
make the TM's job harder. In particular, a timeout (or any other
mechanism that leads PG to abort or commit a prepared transaction
of its own accord) does that.

That's a fair point, but the reality is that not all XA transaction managers
do a good job with that.

If you've got a crappy XA manager, you should get a better one, not
ask us to put in features that make PG unsafe to use with well-designed
XA managers.

regards, tom lane

#8 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#7)
Re: Do we need to handle orphaned prepared transactions in the server?

On Wed, Jan 22, 2020 at 10:22:21AM -0500, Tom Lane wrote:

Thomas Kellerer <shammat@gmx.net> writes:

Tom Lane schrieb am 22.01.2020 um 16:05:

Right. It's the XA transaction manager's job not to forget uncommitted
transactions. Reasoning as though no TM exists is not only not very
relevant, but it might lead you to put in features that actually
make the TM's job harder. In particular, a timeout (or any other
mechanism that leads PG to abort or commit a prepared transaction
of its own accord) does that.

That's a fair point, but the reality is that not all XA transaction managers
do a good job with that.

If you've got a crappy XA manager, you should get a better one, not
ask us to put in features that make PG unsafe to use with well-designed
XA managers.

I think the big question is whether we want to make active prepared
transactions more visible to administrators, either during server start
or idle duration.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#8)
Re: Do we need to handle orphaned prepared transactions in the server?

Bruce Momjian <bruce@momjian.us> writes:

I think the big question is whether we want to make active prepared
transactions more visible to administrators, either during server start
or idle duration.

There's already the pg_prepared_xacts view ...

regards, tom lane

#10 Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#7)
Re: Do we need to handle orphaned prepared transactions in the server?

On Wed, 22 Jan 2020 at 23:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Kellerer <shammat@gmx.net> writes:

Tom Lane schrieb am 22.01.2020 um 16:05:

Right. It's the XA transaction manager's job not to forget uncommitted
transactions. Reasoning as though no TM exists is not only not very
relevant, but it might lead you to put in features that actually
make the TM's job harder. In particular, a timeout (or any other
mechanism that leads PG to abort or commit a prepared transaction
of its own accord) does that.

That's a fair point, but the reality is that not all XA transaction managers
do a good job with that.

If you've got a crappy XA manager, you should get a better one, not
ask us to put in features that make PG unsafe to use with well-designed
XA managers.

Agreed. Or use some bespoke script that does the cleanup that you
think is appropriate for your particular environment and set of bugs.
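
As a rough sketch only (not a recommendation), assuming the administrator
has decided that prepared transactions older than some site-specific
threshold, say one day, are safe to discard, such a script could be as
simple as running the following from psql. Note that ROLLBACK PREPARED
cannot run inside a transaction block, hence \gexec rather than a DO block:

SELECT format('ROLLBACK PREPARED %L', gid)
FROM pg_prepared_xacts
WHERE database = current_database()
  AND prepared < now() - interval '1 day'
\gexec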

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

#11 Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#9)
Re: Do we need to handle orphaned prepared transactions in the server?

On Thu, 23 Jan 2020 at 01:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bruce Momjian <bruce@momjian.us> writes:

I think the big question is whether we want to make active prepared
transactions more visible to administrators, either during server start
or idle duration.

There's already the pg_prepared_xacts view ...

I think Bruce has a point here. We shouldn't go around "resolving"
prepared xacts, but their visibility is a problem for users.
I've seen that myself quite enough times, even now that they cannot be
used by default.

Our monitoring and admin views are not keeping up with Pg's
complexity. Resource retention is one area where that's becoming a
usability and admin challenge. If a user has growing bloat (and has
managed to figure that out, since we don't make it easy to do that
either) or unexpected WAL retention, they may find it hard to quickly
work out why.

We could definitely improve on that by exposing a view that integrates
everything that holds down xmin and catalog_xmin. It'd show

* the datfrozenxid and datminmxid for the oldest database
* if that database is the current database, the relation(s) with the
oldest relfrozenxid and relminmxid
* ... and the catalog relation(s) with the oldest relfrozenxid and
relminmxid if greater
* the absolute xid and xid-age positions of entries in pg_replication_slots
* pg_stat_replication connections (joined to pg_stat_replication if
connected) with their feedback xmin
* pg_stat_activity backend_xid and backend_xmin for the backend(s)
with oldest values; this may be different sets of backends
* pg_prepared_xacts entries by oldest xid

... probably sorted by xid age.

It'd be good to expose some internal state too, which would usually
correspond to the oldest values found in the above, but is useful for
cross-checking:

* RecentGlobalXmin and RecentGlobalDataXmin to show the xmin and
catalog_xmin actually used
* procArray->replication_slot_xmin and procArray->replication_slot_catalog_xmin

I'm not sure whether WAL retention (lsn tracking) should be in the
same view or a different one, but I lean toward different.
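
As a rough illustration only, the sort of query such a view could wrap
might look like the sketch below, covering just a few of the holders
listed above (prepared xacts, replication slots, and backends):

SELECT 'prepared_xact' AS holder, gid AS name,
       transaction AS xmin, age(transaction) AS xid_age
FROM pg_prepared_xacts
UNION ALL
SELECT 'replication_slot', slot_name, xmin, age(xmin)
FROM pg_replication_slots
WHERE xmin IS NOT NULL
UNION ALL
SELECT 'backend', pid::text, backend_xmin, age(backend_xmin)
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xid_age DESC;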

I already have another TODO kicking around for me to write a view that
generates a blocking locks graph, since pg_locks is really more of a
building block than a directly useful view for admins to understand
the system's state. And if that's not enough I also want to write a
decent bloat-checking view to include in the system views, since IMO
lock-blocking, bloat, and resource retention are real monitoring pain
points right now.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

#12 Michael Paquier
michael@paquier.xyz
In reply to: Craig Ringer (#11)
Re: Do we need to handle orphaned prepared transactions in the server?

On Thu, Jan 23, 2020 at 12:56:41PM +0800, Craig Ringer wrote:

We could definitely improve on that by exposing a view that integrates
everything that holds down xmin and catalog_xmin. It'd show

* the datfrozenxid and datminmxid for the oldest database
* if that database is the current database, the relation(s) with the
oldest relfrozenxid and relminmxd
* ... and the catalog relation(s) with the oldest relfrozenxid and
relminmxid if greater
* the absolute xid and xid-age positions of entries in pg_replication_slots
* pg_stat_replication connections (joined to pg_stat_replication if
connected) with their feedback xmin
* pg_stat_activity backend_xid and backend_xmin for the backend(s)
with oldest values; this may be different sets of backends
* pg_prepared_xacts entries by oldest xid

It seems to me that what you are describing here is a set of
properties good for a monitoring tool that we don't necessarily need
to maintain in core. There are already tools able to do that in ways
I think are better than what we could ever design, like
check_pgactivity and such. And there are years of experience behind
that from the maintainers of such scripts and/or extensions.

The argument about Postgres getting more and more complex is true as
the code grows, but I am not really convinced that we need to make it
grow further with more layers that we think are good, because we may
end up piling up stuff that is not actually that good in the long
term. I'd rather just focus, in the core code, on the basics, with views
that map directly to what we have in memory and/or on disk.
--
Michael

#13 Craig Ringer
craig@2ndquadrant.com
In reply to: Michael Paquier (#12)
Re: Do we need to handle orphaned prepared transactions in the server?

On Thu, 23 Jan 2020 at 15:04, Michael Paquier <michael@paquier.xyz> wrote:

It seems to me that what you are describing here is a set of
properties good for a monitoring tool that we don't necessarily need
to maintain in core. There are already tools able to do that in ways
I think are better than what we could ever design, like
check_pgactivity and such.

I really have to disagree here.

Relying on external tools gives users who already have to piece
together a lot of fragments even more moving parts to keep track of.
It introduces more places where new server releases may not be
supported in a timely manner by various tools users rely on. More
places where users may get wrong or incomplete information from
outdated or incorrect tools. I cite the monstrosity that
"check_postgres.pl" has become as a specific example of why pushing
our complexity onto external tools is not always the right answer.

We already have a number of views that prettify information to help
administrators operate the server. You could argue that
pg_stat_activity and pg_stat_replication are unnecessary for example;
users should use external tools to query pg_stat_get_activity(),
pg_stat_get_wal_senders(), pg_authid and pg_database directly to get
the information they need. Similarly, we could do away with
pg_stat_user_indexes and the like, as they're just convenience views
over lower level information exposed by the server.

But can you really imagine using postgres day to day without pg_stat_activity?

It is my firm opinion that visibility into locking behaviour and lock
waits is of a similar level of importance. So is giving users some way
to get insight into table and index bloat on our MVCC database. With
the enormous uptake of various forms of replication and HA it's also
important that users also be able to see what's affecting resource
retention - holding down vacuum, retaining WAL, etc.

The server knows more than any tools. Views in the server can also be
maintained along with the server to address changes in how it manages
things like resource retention, so external tools get a more
consistent insight into server behaviour.

I'd rather just focus in the core code on the basics with views
that map directly to what we have in memory and/or disk.

Per above, I just can't agree with this. PostgreSQL is a system with
end users who need to interact with it, most of whom will not know how
its innards work. If we're going to position it even more as a
component in some larger stack such that it's not expected to really
be used standalone, then we should make some effort to guide users
toward the other components they will need *in our own documentation*
and ensure they're tested and maintained.

Proposals to do that with HA and failover tooling, backup tooling etc
have never got off the ground. I think we do users a great disservice
there personally. I don't expect any proposal to bless specific
monitoring tools to be any more successful.

More importantly, I fail to see why every monitoring tool should
reinvent the same information collection queries and views, each with
their own unique bugs and quirks, when we can provide information
users need directly from the server.

In any case I guess it's all hot air unless I pony up a patch to show
how I think it should work.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

#14 Hamid Akhtar
hamid.akhtar@gmail.com
In reply to: Craig Ringer (#13)
Re: Do we need to handle orphaned prepared transactions in the server?

Having seen the feedback on this thread, and as I tend to agree with most
of what has been said here, I also agree that the server core isn't really
the ideal place to handle orphaned prepared transactions.

Ideally, these should be handled by a transaction manager; however, I do
believe that we cannot let the database suffer for the failings of external
software, and we made a similar change with the introduction of the
idle-in-transaction timeout behavior. That said, implementing something
similar for this feature would be too much of an overhead, both in terms
of code complexity and resource utilisation.

I'm currently working on other options to tackle this problem.

On Tue, 28 Jan 2020 at 9:04 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

On Thu, 23 Jan 2020 at 15:04, Michael Paquier <michael@paquier.xyz> wrote:

It seems to me that what you are describing here is a set of
properties good for a monitoring tool that we don't necessarily need
to maintain in core. There are already tools able to do that in ways
I think are better than what we could ever design, like
check_pgactivity and such.

I really have to disagree here.

Relying on external tools gives users who already have to piece
together a lot of fragments even more moving parts to keep track of.
It introduces more places where new server releases may not be
supported in a timely manner by various tools users rely on. More
places where users may get wrong or incomplete information from
outdated or incorrect tools. I cite the monstrosity that
"check_postgres.pl" has become as a specific example of why pushing
our complexity onto external tools is not always the right answer.

We already have a number of views that prettify information to help
administrators operate the server. You could argue that
pg_stat_activity and pg_stat_replication are unnecessary for example;
users should use external tools to query pg_stat_get_activity(),
pg_stat_get_wal_senders(), pg_authid and pg_database directly to get
the information they need. Similarly, we could do away with
pg_stat_user_indexes and the like, as they're just convenience views
over lower level information exposed by the server.

But can you really imagine using postgres day to day without
pg_stat_activity?

It is my firm opinion that visibility into locking behaviour and lock
waits is of a similar level of importance. So is giving users some way
to get insight into table and index bloat on our MVCC database. With
the enormous uptake of various forms of replication and HA it's also
important that users also be able to see what's affecting resource
retention - holding down vacuum, retaining WAL, etc.

The server knows more than any tools. Views in the server can also be
maintained along with the server to address changes in how it manages
things like resource retention, so external tools get a more
consistent insight into server behaviour.

I'd rather just focus in the core code on the basics with views
that map directly to what we have in memory and/or disk.

Per above, I just can't agree with this. PostgreSQL is a system with
end users who need to interact with it, most of whom will not know how
its innards work. If we're going to position it even more as a
component in some larger stack such that it's not expected to really
be used standalone, then we should make some effort to guide users
toward the other components they will need *in our own documentation*
and ensure they're tested and maintained.

Proposals to do that with HA and failover tooling, backup tooling etc
have never got off the ground. I think we do users a great disservice
there personally. I don't expect any proposal to bless specific
monitoring tools to be any more successful.

More importantly, I fail to see why every monitoring tool should
reinvent the same information collection queries and views, each with
their own unique bugs and quirks, when we can provide information
users need directly from the server.

In any case I guess it's all hot air unless I pony up a patch to show
how I think it should work.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

--

Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus

#15 Craig Ringer
craig@2ndquadrant.com
In reply to: Hamid Akhtar (#14)
Re: Do we need to handle orphaned prepared transactions in the server?

On Thu, 30 Jan 2020 at 02:04, Hamid Akhtar <hamid.akhtar@gmail.com> wrote:

So having seen the feedback on this thread, and I tend to agree with most of what has been said here, I also agree that the server core isn't really the ideal place to handle the orphan prepared transactions.

Ideally, these must be handled by a transaction manager, however, I do believe that we cannot let database suffer for failing of an external software, and we did a similar change through introduction of idle in transaction timeout behavior.

The difference, IMO, is that idle-in-transaction aborts don't affect
anything we've promised to be durable.

Once you PREPARE TRANSACTION the DB has made a promise that that txn
is durable. We don't have any consistent feedback channel to back to
applications and say "Hey, if you're not going to finish this up we
need to get rid of it soon, ok?". If a distributed transaction manager
gets consensus for commit and goes to COMMIT PREPARED a previously
prepared txn only to find that it has vanished, that's a major
problem, and one that may bring the entire DTM to a halt until the
admin can intervene.

This isn't like idle-in-transaction aborts. It's closer to something
like uncommitting a previously committed transaction.

I do think it'd make sense to ensure that the documentation clearly
highlights the impact of abandoned prepared xacts on server resource
retention and performance, preferably with pointers to appropriate
views. I haven't reviewed the docs to see how clear that is already.

I can also see an argument for a periodic log message (maybe from
vacuum?) warning when old prepared xacts hold xmin down. Including one
sent to the client application when an explicit VACUUM is executed.
(In fact, it'd make sense to generalise that for all xmin-retention).

But I'm really not a fan of aborting such txns. If you operate with
some kind of broken global transaction manager that can forget or
abandon prepared xacts, then fix it, or adopt site-local periodic
cleanup tasks that understand your site's needs.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

#16 Hamid Akhtar
hamid.akhtar@gmail.com
In reply to: Craig Ringer (#15)
Re: Do we need to handle orphaned prepared transactions in the server?

On Thu, Jan 30, 2020 at 8:28 AM Craig Ringer <craig@2ndquadrant.com> wrote:

On Thu, 30 Jan 2020 at 02:04, Hamid Akhtar <hamid.akhtar@gmail.com> wrote:

So having seen the feedback on this thread, and I tend to agree with

most of what has been said here, I also agree that the server core isn't
really the ideal place to handle the orphan prepared transactions.

Ideally, these must be handled by a transaction manager, however, I do

believe that we cannot let database suffer for failing of an external
software, and we did a similar change through introduction of idle in
transaction timeout behavior.

The difference, IMO, is that idle-in-transaction aborts don't affect
anything we've promised to be durable.

Once you PREPARE TRANSACTION the DB has made a promise that that txn
is durable. We don't have any consistent feedback channel to back to
applications and say "Hey, if you're not going to finish this up we
need to get rid of it soon, ok?". If a distributed transaction manager
gets consensus for commit and goes to COMMIT PREPARED a previously
prepared txn only to find that it has vanished, that's a major
problem, and one that may bring the entire DTM to a halt until the
admin can intervene.

This isn't like idle-in-transaction aborts. It's closer to something
like uncommitting a previously committed transaction.

I do think it'd make sense to ensure that the documentation clearly
highlights the impact of abandoned prepared xacts on server resource
retention and performance, preferably with pointers to appropriate
views. I haven't reviewed the docs to see how clear that is already.

Having seen the documentation, IMHO the document does contain enough
information for users to understand what issues can be caused by these
orphaned prepared transactions.

I can also see an argument for a periodic log message (maybe from
vacuum?) warning when old prepared xacts hold xmin down. Including one
sent to the client application when an explicit VACUUM is executed.
(In fact, it'd make sense to generalise that for all xmin-retention).

I think that opens up the debate on what we really mean by "old" and
whether that requires a syntax change when creating a prepared
transaction, as Thomas Kellerer suggested earlier.

I agree that vacuum should periodically throw warnings for any prepared
xacts that are holding xmin down.

Generalising it for all xmin-retention is a fair idea IMHO, though that
does increase the overall scope here. A vacuum process should (ideally)
periodically throw out warnings for anything that is preventing it
(including
orphaned prepared transactions) from doing its routine work so that
somebody can take necessary actions.

But I'm really not a fan of aborting such txns. If you operate with
some kind of broken global transaction manager that can forget or
abandon prepared xacts, then fix it, or adopt site-local periodic
cleanup tasks that understand your site's needs.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus

#17 Hamid Akhtar
hamid.akhtar@gmail.com
In reply to: Hamid Akhtar (#16)
Re: Do we need to handle orphaned prepared transactions in the server?

All,

Attached is version 1 of a POC patch for notifying of orphaned
prepared transactions via warnings emitted to a client
application and/or the log file. It applies to the PostgreSQL
"master" branch on top of commit "e2e02191".

I've tried to keep the patch as minimally invasive as I could, with
little impact on the vacuum processes, so the performance impact
and the changes in that area of PostgreSQL core are small.

- What's in this Patch:

This patch throws warnings when an autovacuum worker encounters
an orphaned prepared transaction. It also throws warnings to a
client when a vacuum command is issued. This patch also
introduces two new GUCs:

(1) max_age_prepared_xacts
- The age, measured from creation, after which a prepared transaction
will be considered an orphan.

(2) prepared_xacts_vacuum_warn_timeout
- The timeout period for autovacuum (essentially any of its
workers) to check for orphaned prepared transactions and throw
warnings if any are found.

- What This Patch Does:

If the GUCs are enabled (set to a value higher than -1), an
autovacuum worker running in the background checks if the
timeout has expired. If so, it checks if there are any orphaned
prepared transactions (i.e. their age has exceeded
max_age_prepared_xacts). If it finds any, it throws a warning for
every such transaction. It also emits the total number of orphaned
prepared transactions if one or more are found.

When a vacuum command is issued from within a client, say psql,
we skip the vacuum timeout check and simply scan for any orphaned
prepared transactions. Warnings are emitted to the client and the
log file if any are found.

- About the New GUCs:

= max_age_prepared_xacts:
Sets the maximum age after which a prepared transaction is considered an
orphan. It applies when "prepared transactions" are enabled. The
age of a transaction is calculated from the time it was created to
the current time. If this value is specified without units, it is taken
as milliseconds. The default value is -1, which allows prepared
transactions to live forever.

= prepared_xacts_vacuum_warn_timeout:
Sets the timeout after which vacuum starts throwing warnings for every
prepared transaction that has exceeded the maximum age defined by
"max_age_prepared_xacts". If this value is specified without units,
it is taken as milliseconds. The default value of -1 disables
this warning mechanism. Setting too small a value could potentially fill
up the log with orphaned prepared transaction warnings, so this
parameter should be set to a value that is large enough not to
fill up the log file, but small enough to give notice of long-running
and potentially orphaned prepared transactions. There is no additional
timer or worker introduced with this change. Whenever a vacuum
worker runs, it first checks for any orphaned prepared transactions.
So at best, this GUC serves as a guideline for a vacuum worker as to
whether a warning should be thrown to the log file or to a client
issuing a vacuum command.
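
For illustration only (assuming the patch is applied), the two GUCs could
be set in postgresql.conf along these lines; the values are arbitrary
examples rather than recommendations:

max_age_prepared_xacts = '5min'               # prepared xacts older than this are treated as orphans
prepared_xacts_vacuum_warn_timeout = '10min'  # vacuum workers warn about orphans at most this often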

- What this Patch Does Not Cover:

The warning is not thrown when the user either runs vacuumdb or passes
individual relations to be vacuumed. Specifically, in the case of
vacuumdb, it breaks a vacuum command down into attribute-wise vacuum
commands, so the vacuum command is indirectly run many times.
Considering that we want to emit warnings for every manual vacuum
command, this simply floods the terminal and log with orphaned prepared
transaction warnings. We could potentially handle that, but the overhead
seemed too much to me (and I've not invested any time looking into
fixing it either). Hence, warnings are not thrown when the user runs
vacuumdb or a relation-specific vacuum.

On Fri, Jan 31, 2020 at 7:02 PM Hamid Akhtar <hamid.akhtar@gmail.com> wrote:

On Thu, Jan 30, 2020 at 8:28 AM Craig Ringer <craig@2ndquadrant.com>
wrote:

On Thu, 30 Jan 2020 at 02:04, Hamid Akhtar <hamid.akhtar@gmail.com>
wrote:

So having seen the feedback on this thread, and I tend to agree with

most of what has been said here, I also agree that the server core isn't
really the ideal place to handle the orphan prepared transactions.

Ideally, these must be handled by a transaction manager, however, I do

believe that we cannot let database suffer for failing of an external
software, and we did a similar change through introduction of idle in
transaction timeout behavior.

The difference, IMO, is that idle-in-transaction aborts don't affect
anything we've promised to be durable.

Once you PREPARE TRANSACTION the DB has made a promise that that txn
is durable. We don't have any consistent feedback channel to back to
applications and say "Hey, if you're not going to finish this up we
need to get rid of it soon, ok?". If a distributed transaction manager
gets consensus for commit and goes to COMMIT PREPARED a previously
prepared txn only to find that it has vanished, that's a major
problem, and one that may bring the entire DTM to a halt until the
admin can intervene.

This isn't like idle-in-transaction aborts. It's closer to something
like uncommitting a previously committed transaction.

I do think it'd make sense to ensure that the documentation clearly
highlights the impact of abandoned prepared xacts on server resource
retention and performance, preferably with pointers to appropriate
views. I haven't reviewed the docs to see how clear that is already.

Having seen the documentation, IMHO the document does contain enough
information for users to understand what issues can be caused by these
orphaned prepared transactions.

I can also see an argument for a periodic log message (maybe from
vacuum?) warning when old prepared xacts hold xmin down. Including one
sent to the client application when an explicit VACUUM is executed.
(In fact, it'd make sense to generalise that for all xmin-retention).

I think that opens up the debate on what we really mean by "old" and
whether that requires a syntax change when creating a prepared
transactions as Thomas Kellerer suggested earlier?

I agree that vacuum should periodically throw warnings for any prepared
xacts that are holding xmin down.

Generalising it for all xmin-retention is a fair idea IMHO, though that
does increase the overall scope here. A vacuum process should (ideally)
periodically throw out warnings for anything that is preventing it
(including
orphaned prepared transactions) from doing its routine work so that
somebody can take necessary actions.

But I'm really not a fan of aborting such txns. If you operate with
some kind of broken global transaction manager that can forget or
abandon prepared xacts, then fix it, or adopt site-local periodic
cleanup tasks that understand your site's needs.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus

--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus

Attachments:

orphaned_pxact.v1.patch (application/octet-stream, +219 -0)

#18 Hamid Akhtar
hamid.akhtar@gmail.com
In reply to: Hamid Akhtar (#17)
Re: Do we need to handle orphaned prepared transactions in the server?

Here is the v2 of the same patch after rebasing it and running it through
pgindent. There are no other code changes.

On Wed, Feb 19, 2020 at 8:04 PM Hamid Akhtar <hamid.akhtar@gmail.com> wrote:

All,

Attached is version 1 of POC patch for notifying of orphaned
prepared transactions via warnings emitted to a client
application and/or log file. It applies to PostgreSQL branch
"master" on top of "e2e02191" commit.

I've tried to keep the patch as less invasive as I could with
minimal impact on vacuum processes, so the performance impact
and the changes are minimal in that area of PostgreSQL core.

- What's in this Patch:

This patch throws warnings when an autovacuum worker encounters
an orphaned prepared transaction. It also throws warnings to a
client when a vacuum command is issued. This patch also
introduces two new GUCs:

(1) max_age_prepared_xacts
- The age after creation of a prepared transaction after which it
will be considered an orphan.

(2) prepared_xacts_vacuum_warn_timeout
- The timeout period for an autovacuum (essentially any of its
worker) to check for orphaned prepared transactions and throw
warnings if any are found.

- What This Patch Does:

If the GUCs are enabled (set to a value higher than -1), an
autovacuum worker running in the background checks if the
timeout has expired. If so, it checks if there are any orphaned
prepared transactions (i.e. their age has exceeded
max_age_prepared_xacts). If it finds any, it throws a warning for
every such transaction. It also emits the total number of orphaned
prepared transactions if one or more are found.

When a vacuum command is issued from within a client, say psql,
in that case, we skip the vacuum timeout check and simply scan
for any orphaned prepared transactions. Warnings are emitted to
the client and log file if any are found.

- About the New GUCs:

= max_age_prepared_xacts:
Sets maximum age after which a prepared transaction is considered an
orphan. It applies when "prepared transactions" are enabled. The
age for a transaction is calculated from the time it was created to
the current time. If this value is specified without units, it is taken
as milliseconds. The default value is -1 which allows prepared
transactions to live forever.

= prepared_xacts_vacuum_warn_timeout:
Sets timeout after which vacuum starts throwing warnings for every
prepared transactions that has exceeded maximum age defined by
"max_age_prepared_xacts". If this value is specified without units,
it is taken as milliseconds. The default value of -1 will disable
this warning mechanism. Setting a too value could potentially fill
up log with orphaned prepared transaction warnings, so this
parameter must be set to a value that is reasonably large to not
fill up log file, but small enough to notify of long running and
potential orphaned prepared transactions. There is no additional
timer or worker introduced with this change. Whenever a vacuum
worker runs, it first checks for any orphaned prepared transactions.
So at best, this GUC serves as a guideline for a vacuum worker
if a warning should be thrown to log file or a client issuing
vacuum command.

- What this Patch Does Not Cover:

The warning is not thrown when user either runs vacuumdb or passes
individual relations to be vacuum. Specifically in case of vacuumdb,
it breaks down a vacuum command to an attribute-wise vacuum command.
So the vacuum command is indirectly run many times. Considering that
we want to emit warnings for every manual vacuum command, this simply
floods the terminal and log with orphaned prepared transactions
warnings. We could potentially handle that, but the overhead of
that seemed too much to me (and I've not invested any time looking
to fix that either). Hence, warnings are not thrown when user runs
vacuumdb and relation specific vacuum.

On Fri, Jan 31, 2020 at 7:02 PM Hamid Akhtar <hamid.akhtar@gmail.com>
wrote:

On Thu, Jan 30, 2020 at 8:28 AM Craig Ringer <craig@2ndquadrant.com>
wrote:

On Thu, 30 Jan 2020 at 02:04, Hamid Akhtar <hamid.akhtar@gmail.com>
wrote:

So having seen the feedback on this thread, and I tend to agree with

most of what has been said here, I also agree that the server core isn't
really the ideal place to handle the orphan prepared transactions.

Ideally, these must be handled by a transaction manager, however, I do

believe that we cannot let database suffer for failing of an external
software, and we did a similar change through introduction of idle in
transaction timeout behavior.

The difference, IMO, is that idle-in-transaction aborts don't affect
anything we've promised to be durable.

Once you PREPARE TRANSACTION the DB has made a promise that that txn
is durable. We don't have any consistent feedback channel to back to
applications and say "Hey, if you're not going to finish this up we
need to get rid of it soon, ok?". If a distributed transaction manager
gets consensus for commit and goes to COMMIT PREPARED a previously
prepared txn only to find that it has vanished, that's a major
problem, and one that may bring the entire DTM to a halt until the
admin can intervene.

This isn't like idle-in-transaction aborts. It's closer to something
like uncommitting a previously committed transaction.

I do think it'd make sense to ensure that the documentation clearly
highlights the impact of abandoned prepared xacts on server resource
retention and performance, preferably with pointers to appropriate
views. I haven't reviewed the docs to see how clear that is already.

Having seen the documentation, IMHO the document does contain enough
information for users to understand what issues can be caused by these
orphaned prepared transactions.

I can also see an argument for a periodic log message (maybe from
vacuum?) warning when old prepared xacts hold xmin down. Including one
sent to the client application when an explicit VACUUM is executed.
(In fact, it'd make sense to generalise that for all xmin-retention).

I think that opens up the debate on what we really mean by "old" and
whether that requires a syntax change when creating a prepared
transactions as Thomas Kellerer suggested earlier?

I agree that vacuum should periodically throw warnings for any prepared
xacts that are holding xmin down.

Generalising it for all xmin-retention is a fair idea IMHO, though that
does increase the overall scope here. A vacuum process should (ideally)
periodically throw out warnings for anything that is preventing it
(including
orphaned prepared transactions) from doing its routine work so that
somebody can take necessary actions.

But I'm really not a fan of aborting such txns. If you operate with
some kind of broken global transaction manager that can forget or
abandon prepared xacts, then fix it, or adopt site-local periodic
cleanup tasks that understand your site's needs.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus

--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus

--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca
SKYPE: engineeredvirus

Attachments:

orphaned_pxact.v2.patch (application/octet-stream, +219 -0)

#19 Bruce Momjian
bruce@momjian.us
In reply to: Hamid Akhtar (#18)
Re: Do we need to handle orphaned prepared transactions in the server?

On Mon, Mar 2, 2020 at 05:42:11PM +0500, Hamid Akhtar wrote:

Here is the v2 of the same patch after rebasing it and running it through
pgindent. There are no other code changes.

The paragraph about max_age_prepared_xacts doesn't define what the effect
of treating a transaction as orphaned is.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

#20 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Hamid Akhtar (#18)
Re: Do we need to handle orphaned prepared transactions in the server?

On Mon, 2 Mar 2020 at 21:42, Hamid Akhtar <hamid.akhtar@gmail.com> wrote:

Here is the v2 of the same patch after rebasing it and running it through pgindent. There are no other code changes.

Thank you for working on this. I think what this patch tries to
achieve would be helpful for informing users about orphaned prepared
transactions, which can be a cause of inefficient vacuum.

As far as I read the patch, configuring this feature using the newly added
parameters seems complicated to me. IIUC, even if a prepared
transaction is much older than max_age_prepared_xacts, we don't
warn if prepared_xacts_vacuum_warn_timeout has not elapsed since
the "first" prepared transaction was created, where the first
prepared transaction means the first entry in
TwoPhaseStateData->prepXacts. Therefore, if there is always more than
one prepared transaction, we don't warn for at least
prepared_xacts_vacuum_warn_timeout seconds, even if the first added
prepared transaction has already been removed. So I'm not sure how to
reason about the setting of prepared_xacts_vacuum_warn_timeout.

Regarding the warning message, I wonder if the current message is too
detailed. If we want to inform users that there are orphaned prepared
transactions, it seems to me that it's enough to report their
existence (and possibly the number of orphaned prepared transactions),
rather than individual details.

Given the above, we can simplify this feature: have only
max_age_prepared_xacts, and let autovacuum check the minimum
prepared_at of the prepared transactions and compare it to
max_age_prepared_xacts. We can warn if (CurrentTimestamp -
min(prepared_at)) > max_age_prepared_xacts. In addition, if we also
want to control this behavior by xid age, we can have another
GUC parameter for comparing age(min(xid of prepared transactions))
to that value.
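
In SQL terms, against pg_prepared_xacts rather than the in-memory state
the patch inspects, and with '5 minutes' standing in for
max_age_prepared_xacts, the check I have in mind amounts to something
like the following, warning only when the count is non-zero:

SELECT count(*) AS orphaned,
       now() - min(prepared) AS oldest_age
FROM pg_prepared_xacts
WHERE prepared < now() - interval '5 minutes';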

Finally, regarding the naming of the parameters: when we mention the age
of a transaction, it means the age of the transaction's xid, not the time.
Please refer to other GUC parameters having "age" in their names, such as
autovacuum_freeze_max_age and vacuum_freeze_min_age. The patch adds
max_age_prepared_xacts, but I think it should be renamed; for example,
prepared_xact_warn_min_duration for time and
prepared_xact_warn_max_age for age.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#21 Hamid Akhtar
hamid.akhtar@gmail.com
In reply to: Masahiko Sawada (#20)
#22 Robert Haas
robertmhaas@gmail.com
In reply to: Hamid Akhtar (#17)
#23 Craig Ringer
craig@2ndquadrant.com
In reply to: Robert Haas (#22)
#24 Craig Ringer
craig@2ndquadrant.com
In reply to: Craig Ringer (#23)
#25 Hamid Akhtar
hamid.akhtar@gmail.com
In reply to: Craig Ringer (#24)
#26 Robert Haas
robertmhaas@gmail.com
In reply to: Hamid Akhtar (#25)
#27 Hamid Akhtar
hamid.akhtar@gmail.com
In reply to: Robert Haas (#26)
#28 Robert Haas
robertmhaas@gmail.com
In reply to: Hamid Akhtar (#27)
#29 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#28)
#30 Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#29)
#31 Hamid Akhtar
hamid.akhtar@gmail.com
In reply to: Michael Paquier (#30)
#32 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#29)
#33 Michael Paquier
michael@paquier.xyz
In reply to: Bruce Momjian (#32)
#34 Bruce Momjian
bruce@momjian.us
In reply to: Michael Paquier (#33)
#35 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#34)
#36 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#35)
#37 Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#36)
#38 Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#37)