Problem with txid_snapshot_in/out() functionality

Started by Jan Wieckabout 12 years ago31 messageshackers
Jump to latest
#1Jan Wieck
JanWieck@Yahoo.com

Hackers,

the Slony team has been getting seldom reports of a problem with the
txid_snapshot data type.

The symptom is that a txid_snapshot on output lists the same txid
multiple times in the xip list part of the external representation. This
string is later rejected by txid_snapshot_in() when trying to determine
if a particular txid is visible in that snapshot using the
txid_visible_in_snapshot() function.

I was not yet able to reproduce this problem in a lab environment. It
might be related to subtransactions and/or two phase commit (at least
one user is using both of them). The reported PostgreSQL version
involved in that case was 9.1.

At this point I would find it extremely helpful to "sanitize" the
external representation in txid_snapshot_out() while emitting some
NOTICE level logging when this actually happens. I am aware that this
does amount to a functional change for a back release, but considering
that the _out() generated external representation of an existing binary
datum won't pass the type's _in() function, I argue that such change is
warranted. Especially since this problem could possibly corrupt a dump.

Comments?

Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jan Wieck (#1)
Re: Problem with txid_snapshot_in/out() functionality

On 04/12/2014 12:07 AM, Jan Wieck wrote:

Hackers,

the Slony team has been getting seldom reports of a problem with the
txid_snapshot data type.

The symptom is that a txid_snapshot on output lists the same txid
multiple times in the xip list part of the external representation. This
string is later rejected by txid_snapshot_in() when trying to determine
if a particular txid is visible in that snapshot using the
txid_visible_in_snapshot() function.

I was not yet able to reproduce this problem in a lab environment. It
might be related to subtransactions and/or two phase commit (at least
one user is using both of them). The reported PostgreSQL version
involved in that case was 9.1.

It's two-phase commit. When preparing a transaction, the state of the
transaction is first transfered to a dummy PGXACT entry, and then the
PGXACT entry of the backend is cleared. There is a transient state when
both PGXACT entries have the same xid.

You can reproduce that by putting a sleep or breakpoint in
PrepareTransaction(), just before the
"ProcArrayClearTransaction(MyProc);" call. If you call
txid_current_snapshot() from another session at that point, it will
output two duplicate xids. (you will have to also commit one more
unrelated transaction to bump up xmax).

At this point I would find it extremely helpful to "sanitize" the
external representation in txid_snapshot_out() while emitting some
NOTICE level logging when this actually happens. I am aware that this
does amount to a functional change for a back release, but considering
that the _out() generated external representation of an existing binary
datum won't pass the type's _in() function, I argue that such change is
warranted. Especially since this problem could possibly corrupt a dump.

Hmm. Do we snapshots to be stored in tables, and included in a dump? I
don't think we can guarantee that will work, at least not across
versions, as the way we handle snapshot internally can change.

But yeah, we probably should do something about that. The most
straightforward fix would be to scan the array in
txid_current_snapshot() and remove any duplicates.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Jan Wieck
JanWieck@Yahoo.com
In reply to: Heikki Linnakangas (#2)
Re: Problem with txid_snapshot_in/out() functionality

On 04/12/14 03:27, Heikki Linnakangas wrote:

On 04/12/2014 12:07 AM, Jan Wieck wrote:

Hackers,

the Slony team has been getting seldom reports of a problem with the
txid_snapshot data type.

The symptom is that a txid_snapshot on output lists the same txid
multiple times in the xip list part of the external representation. This
string is later rejected by txid_snapshot_in() when trying to determine
if a particular txid is visible in that snapshot using the
txid_visible_in_snapshot() function.

I was not yet able to reproduce this problem in a lab environment. It
might be related to subtransactions and/or two phase commit (at least
one user is using both of them). The reported PostgreSQL version
involved in that case was 9.1.

It's two-phase commit. When preparing a transaction, the state of the
transaction is first transfered to a dummy PGXACT entry, and then the
PGXACT entry of the backend is cleared. There is a transient state when
both PGXACT entries have the same xid.

You can reproduce that by putting a sleep or breakpoint in
PrepareTransaction(), just before the
"ProcArrayClearTransaction(MyProc);" call. If you call
txid_current_snapshot() from another session at that point, it will
output two duplicate xids. (you will have to also commit one more
unrelated transaction to bump up xmax).

Thanks, that explains it.

At this point I would find it extremely helpful to "sanitize" the
external representation in txid_snapshot_out() while emitting some
NOTICE level logging when this actually happens. I am aware that this
does amount to a functional change for a back release, but considering
that the _out() generated external representation of an existing binary
datum won't pass the type's _in() function, I argue that such change is
warranted. Especially since this problem could possibly corrupt a dump.

Hmm. Do we snapshots to be stored in tables, and included in a dump? I
don't think we can guarantee that will work, at least not across
versions, as the way we handle snapshot internally can change.

At least Londiste and Slony do store snapshots as well as xids in tables
and assuming that the txid epoch is properly bumped, that information is
useful and valid after a restore.

But yeah, we probably should do something about that. The most
straightforward fix would be to scan the array in
txid_current_snapshot() and remove any duplicates.

The code in txid_snapshot_in() checks that the xip list is ascending.
txid_snapshot_out() does not sort the list, so it must already be sorted
when the snapshot itself is created. That scan would be fairly simple.

Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#2)
Re: Problem with txid_snapshot_in/out() functionality

On 2014-04-12 10:27:16 +0300, Heikki Linnakangas wrote:

On 04/12/2014 12:07 AM, Jan Wieck wrote:

The symptom is that a txid_snapshot on output lists the same txid
multiple times in the xip list part of the external representation. This
string is later rejected by txid_snapshot_in() when trying to determine
if a particular txid is visible in that snapshot using the
txid_visible_in_snapshot() function.

It's two-phase commit. When preparing a transaction, the state of the
transaction is first transfered to a dummy PGXACT entry, and then the PGXACT
entry of the backend is cleared. There is a transient state when both PGXACT
entries have the same xid.

Which I find to be a pretty bad idea independent of this bug. But I
think that's nothing fixable in the back branches.

Hmm. Do we snapshots to be stored in tables, and included in a dump? I don't
think we can guarantee that will work, at least not across versions, as the
way we handle snapshot internally can change.

Hm. I don't think we'll earn much love changing that - there's at the
very least slony and londiste out there using it... IIRC both store the
result in tables.

But yeah, we probably should do something about that. The most
straightforward fix would be to scan the array in txid_current_snapshot()
and remove any duplicates.

Since it's sorted there, that should be fairly straightforward. Won't
fix already created and stored datums tho. Maybe _in()/parse_snapshot()
should additionally skip over duplicate values? Looks easy enough.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Jan Wieck
JanWieck@Yahoo.com
In reply to: Andres Freund (#4)
Re: Problem with txid_snapshot_in/out() functionality

On 04/12/14 08:38, Andres Freund wrote:

Since it's sorted there, that should be fairly straightforward. Won't
fix already created and stored datums tho. Maybe _in()/parse_snapshot()
should additionally skip over duplicate values? Looks easy enough.

There is the sort ... missed that when glancing over the code earlier.

Right, that is easy enough and looks like an acceptable fix for back
branches too.

Thanks,
Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#2)
Re: Problem with txid_snapshot_in/out() functionality

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

On 04/12/2014 12:07 AM, Jan Wieck wrote:

the Slony team has been getting seldom reports of a problem with the
txid_snapshot data type.
The symptom is that a txid_snapshot on output lists the same txid
multiple times in the xip list part of the external representation.

It's two-phase commit. When preparing a transaction, the state of the
transaction is first transfered to a dummy PGXACT entry, and then the
PGXACT entry of the backend is cleared. There is a transient state when
both PGXACT entries have the same xid.

Hm, yeah, but why is that intermediate state visible to anyone else?
Don't we have exclusive lock on the PGPROC array while we're doing this?
If we don't, aren't we letting other backends see non-self-consistent
state in regards to who holds which locks, for example?

I'm worried that the proposed fix is just band-aiding one particular
symptom of inadequate locking.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#6)
Re: Problem with txid_snapshot_in/out() functionality

On 2014-04-12 09:47:24 -0400, Tom Lane wrote:

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

On 04/12/2014 12:07 AM, Jan Wieck wrote:

the Slony team has been getting seldom reports of a problem with the
txid_snapshot data type.
The symptom is that a txid_snapshot on output lists the same txid
multiple times in the xip list part of the external representation.

It's two-phase commit. When preparing a transaction, the state of the
transaction is first transfered to a dummy PGXACT entry, and then the
PGXACT entry of the backend is cleared. There is a transient state when
both PGXACT entries have the same xid.

Hm, yeah, but why is that intermediate state visible to anyone else?
Don't we have exclusive lock on the PGPROC array while we're doing this?

It's done outside the remit of ProcArray lock :(. And documented to lead
to duplicate xids in PGXACT.
EndPrepare():
/*
* Mark the prepared transaction as valid. As soon as xact.c marks
* MyPgXact as not running our XID (which it will do immediately after
* this function returns), others can commit/rollback the xact.
*
* NB: a side effect of this is to make a dummy ProcArray entry for the
* prepared XID. This must happen before we clear the XID from MyPgXact,
* else there is a window where the XID is not running according to
* TransactionIdIsInProgress, and onlookers would be entitled to assume
* the xact crashed. Instead we have a window where the same XID appears
* twice in ProcArray, which is OK.
*/
MarkAsPrepared(gxact);

It doesn't sound too hard to essentially move PrepareTransaction()'s
ProcArrayClearTransaction() into MarkAsPrepared() and rejigger the
locking to remove the intermediate state. But I think it'll lead to
bigger changes than we'd be comfortable backpatching.

If we don't, aren't we letting other backends see non-self-consistent
state in regards to who holds which locks, for example?

I think that actually works out ok, because the locks aren't owned by
xids/xacts, but procs. Otherwise we'd be in deep trouble in
CommitTransaction() as well where ProcArrayEndTransaction() clearing
that state.
After the whole xid transfer, there's PostPrepare_Locks() transferring
the locks.

Brr.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Bruce Momjian
bruce@momjian.us
In reply to: Jan Wieck (#3)
Re: Problem with txid_snapshot_in/out() functionality

On 12 Apr 2014 08:35, "Jan Wieck" <jan@wi3ck.info> wrote:

On 04/12/14 03:27, Heikki Linnakangas wrote:

On 04/12/2014 12:07 AM, Jan Wieck wrote:

Hackers,

Hmm. Do we snapshots to be stored in tables, and included in a dump? I
don't think we can guarantee that will work, at least not across
versions, as the way we handle snapshot internally can change.

At least Londiste and Slony do store snapshots as well as xids in tables

and assuming that the txid epoch is properly bumped, that information is
useful and valid after a restore.

As I understand it the epoch increments whenever the xid wraps.

A physical restore would continue the same xid space in the same epoch
which should work fine as long as no system stores any txids outside the
database from the "future".

A pg_restore would start a new xid space from FirstNormalXid which would
obviously not work with any stored txids.

#9Jan Wieck
JanWieck@Yahoo.com
In reply to: Bruce Momjian (#8)
Re: Problem with txid_snapshot_in/out() functionality

On 04/12/14 10:09, Greg Stark wrote:

A pg_restore would start a new xid space from FirstNormalXid which would
obviously not work with any stored txids.

http://www.postgresql.org/docs/9.3/static/app-pgresetxlog.html

Regards,
Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Andres Freund
andres@anarazel.de
In reply to: Jan Wieck (#9)
Re: Problem with txid_snapshot_in/out() functionality

On 2014-04-12 11:15:09 -0400, Jan Wieck wrote:

On 04/12/14 10:09, Greg Stark wrote:

A pg_restore would start a new xid space from FirstNormalXid which would
obviously not work with any stored txids.

http://www.postgresql.org/docs/9.3/static/app-pgresetxlog.html

Using that as part of any sort of routine task IMNSHO is a seriously bad
idea.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Jan Wieck
JanWieck@Yahoo.com
In reply to: Andres Freund (#10)
Re: Problem with txid_snapshot_in/out() functionality

On 04/12/14 11:18, Andres Freund wrote:

On 2014-04-12 11:15:09 -0400, Jan Wieck wrote:

On 04/12/14 10:09, Greg Stark wrote:

A pg_restore would start a new xid space from FirstNormalXid which would
obviously not work with any stored txids.

http://www.postgresql.org/docs/9.3/static/app-pgresetxlog.html

Using that as part of any sort of routine task IMNSHO is a seriously bad
idea.

Nobody is advocating doing so.

Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Jan Wieck
JanWieck@Yahoo.com
In reply to: Andres Freund (#7)
Re: Problem with txid_snapshot_in/out() functionality

On 04/12/14 10:03, Andres Freund wrote:

On 2014-04-12 09:47:24 -0400, Tom Lane wrote:

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

On 04/12/2014 12:07 AM, Jan Wieck wrote:

the Slony team has been getting seldom reports of a problem with the
txid_snapshot data type.
The symptom is that a txid_snapshot on output lists the same txid
multiple times in the xip list part of the external representation.

It's two-phase commit. When preparing a transaction, the state of the
transaction is first transfered to a dummy PGXACT entry, and then the
PGXACT entry of the backend is cleared. There is a transient state when
both PGXACT entries have the same xid.

Hm, yeah, but why is that intermediate state visible to anyone else?
Don't we have exclusive lock on the PGPROC array while we're doing this?

It's done outside the remit of ProcArray lock :(. And documented to lead
to duplicate xids in PGXACT.
EndPrepare():
/*
* Mark the prepared transaction as valid. As soon as xact.c marks
* MyPgXact as not running our XID (which it will do immediately after
* this function returns), others can commit/rollback the xact.
*
* NB: a side effect of this is to make a dummy ProcArray entry for the
* prepared XID. This must happen before we clear the XID from MyPgXact,
* else there is a window where the XID is not running according to
* TransactionIdIsInProgress, and onlookers would be entitled to assume
* the xact crashed. Instead we have a window where the same XID appears
* twice in ProcArray, which is OK.
*/
MarkAsPrepared(gxact);

It doesn't sound too hard to essentially move PrepareTransaction()'s
ProcArrayClearTransaction() into MarkAsPrepared() and rejigger the
locking to remove the intermediate state. But I think it'll lead to
bigger changes than we'd be comfortable backpatching.

If we don't, aren't we letting other backends see non-self-consistent
state in regards to who holds which locks, for example?

I think that actually works out ok, because the locks aren't owned by
xids/xacts, but procs. Otherwise we'd be in deep trouble in
CommitTransaction() as well where ProcArrayEndTransaction() clearing
that state.
After the whole xid transfer, there's PostPrepare_Locks() transferring
the locks.

Since it doesn't seem to produce any side effects, I'd think that making
the snapshot unique within txid_current_snapshot() and filtering
duplicates on input should be sufficient and eligible for backpatching.

The attached patch adds a unique loop to the internal sort_snapshot()
function and skips duplicates on input. The git commit is here:

https://github.com/wieck/postgres/commit/a88a2b2c25b856478d7e2b012fc718106338fe00

Regards,
Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info

Attachments:

txid_snapshot_filter_duplicate_xip.difftext/plain; charset=us-ascii; name=txid_snapshot_filter_duplicate_xip.diffDownload+33-16
#13Marko Kreen
markokr@gmail.com
In reply to: Jan Wieck (#12)
Re: Problem with txid_snapshot_in/out() functionality

On Sat, Apr 12, 2014 at 02:10:13PM -0400, Jan Wieck wrote:

Since it doesn't seem to produce any side effects, I'd think that
making the snapshot unique within txid_current_snapshot() and
filtering duplicates on input should be sufficient and eligible for
backpatching.

Agreed.

The attached patch adds a unique loop to the internal
sort_snapshot() function and skips duplicates on input. The git
commit is here:

https://github.com/wieck/postgres/commit/a88a2b2c25b856478d7e2b012fc718106338fe00

static void
sort_snapshot(TxidSnapshot *snap)
{
+ 	txid	last = 0;
+ 	int		nxip, idx1, idx2;
+ 
if (snap->nxip > 1)
+ 	{
qsort(snap->xip, snap->nxip, sizeof(txid), cmp_txid);
+ 		nxip = snap->nxip;
+ 		idx1 = idx2 = 0;
+ 		while (idx1 < nxip)
+ 		{
+ 			if (snap->xip[idx1] != last)
+ 				last = snap->xip[idx2++] = snap->xip[idx1];
+ 			else
+ 				snap->nxip--;
+ 			idx1++;
+ 		}
+ 	}
}

I think you need to do SET_VARSIZE also here. Alternative is to
move SET_VARSIZE after sort_snapshot().

And it seems the drop-double-txid logic should be added also to
txid_snapshot_recv(). It seems weird to have it behave differently
from txid_snapshot_in().

--
marko

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Jan Wieck
JanWieck@Yahoo.com
In reply to: Marko Kreen (#13)
Re: Problem with txid_snapshot_in/out() functionality

On 04/13/14 08:27, Marko Kreen wrote:

On Sat, Apr 12, 2014 at 02:10:13PM -0400, Jan Wieck wrote:

Since it doesn't seem to produce any side effects, I'd think that
making the snapshot unique within txid_current_snapshot() and
filtering duplicates on input should be sufficient and eligible for
backpatching.

Agreed.

The attached patch adds a unique loop to the internal
sort_snapshot() function and skips duplicates on input. The git
commit is here:

https://github.com/wieck/postgres/commit/a88a2b2c25b856478d7e2b012fc718106338fe00

static void
sort_snapshot(TxidSnapshot *snap)
{
+ 	txid	last = 0;
+ 	int		nxip, idx1, idx2;
+
if (snap->nxip > 1)
+ 	{
qsort(snap->xip, snap->nxip, sizeof(txid), cmp_txid);
+ 		nxip = snap->nxip;
+ 		idx1 = idx2 = 0;
+ 		while (idx1 < nxip)
+ 		{
+ 			if (snap->xip[idx1] != last)
+ 				last = snap->xip[idx2++] = snap->xip[idx1];
+ 			else
+ 				snap->nxip--;
+ 			idx1++;
+ 		}
+ 	}
}

I think you need to do SET_VARSIZE also here. Alternative is to
move SET_VARSIZE after sort_snapshot().

And it seems the drop-double-txid logic should be added also to
txid_snapshot_recv(). It seems weird to have it behave differently
from txid_snapshot_in().

Thanks,

yes on both issues. Will create another patch.

Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#7)
Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: Problem with txid_snapshot_in/out() functionality)

On 04/12/2014 05:03 PM, Andres Freund wrote:

If we don't, aren't we letting other backends see non-self-consistent
state in regards to who holds which locks, for example?

I think that actually works out ok, because the locks aren't owned by
xids/xacts, but procs. Otherwise we'd be in deep trouble in
CommitTransaction() as well where ProcArrayEndTransaction() clearing
that state.
After the whole xid transfer, there's PostPrepare_Locks() transferring
the locks.

Right.

However, I just noticed that there's a race condition between PREPARE
TRANSACTION and COMMIT/ROLLBACK PREPARED. PostPrepare_Locks runs after
the prepared transaction is already marked as fully prepared. That means
that by the time we get to PostPrepare_Locks, another backend might
already have finished and removed the prepared transaction. That leads
to a PANIC (put a breakpoint just before PostPrepare_Locks):

postgres=# commit prepared 'foo';
PANIC: failed to re-find shared proclock object
PANIC: failed to re-find shared proclock object
The connection to the server was lost. Attempting reset: Failed.

FinishPrepareTransaction reads the list of locks from the two-phase
state file, but PANICs when it doesn't find the corresponding locks in
the lock manager (because PostPrepare_Locks hasn't transfered them to
the dummy PGPROC yet).

I think we'll need to transfer of the locks earlier, before the
transaction is marked as fully prepared. I'll take a closer look at this
tomorrow.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Jan Wieck
JanWieck@Yahoo.com
In reply to: Jan Wieck (#14)
Re: Problem with txid_snapshot_in/out() functionality

On 04/13/14 14:22, Jan Wieck wrote:

On 04/13/14 08:27, Marko Kreen wrote:

I think you need to do SET_VARSIZE also here. Alternative is to
move SET_VARSIZE after sort_snapshot().

And it seems the drop-double-txid logic should be added also to
txid_snapshot_recv(). It seems weird to have it behave differently
from txid_snapshot_in().

Thanks,

yes on both issues. Will create another patch.

New patch attached.

New github commit is
https://github.com/wieck/postgres/commit/b8fd0d2eb78791e5171e34aecd233fd06218890d

Thanks again,
Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info

Attachments:

txid_snapshot_filter_duplicate_xip_v2.difftext/plain; charset=us-ascii; name=txid_snapshot_filter_duplicate_xip_v2.diffDownload+56-27
#17Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#15)
Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: Problem with txid_snapshot_in/out() functionality)

On 04/13/2014 11:39 PM, Heikki Linnakangas wrote:

However, I just noticed that there's a race condition between PREPARE
TRANSACTION and COMMIT/ROLLBACK PREPARED. PostPrepare_Locks runs after
the prepared transaction is already marked as fully prepared. That means
that by the time we get to PostPrepare_Locks, another backend might
already have finished and removed the prepared transaction. That leads
to a PANIC (put a breakpoint just before PostPrepare_Locks):

postgres=# commit prepared 'foo';
PANIC: failed to re-find shared proclock object
PANIC: failed to re-find shared proclock object
The connection to the server was lost. Attempting reset: Failed.

FinishPrepareTransaction reads the list of locks from the two-phase
state file, but PANICs when it doesn't find the corresponding locks in
the lock manager (because PostPrepare_Locks hasn't transfered them to
the dummy PGPROC yet).

I think we'll need to transfer of the locks earlier, before the
transaction is marked as fully prepared. I'll take a closer look at this
tomorrow.

Here's a patch to do that. It's very straightforward, I just moved the
calls to transfer locks earlier, before ProcArrayClearTransaction.
PostPrepare_MultiXact had a similar race - it also transfer state from
the old PGPROC entry to the new, and needs to be done before allowing
another backend to remove the new PGPROC entry. I changed the names of
the functions to distinguish them from the other PostPrepare_* functions
that now happen at a different time.

The patch is simple, but it's a bit scary to change the order of things
like this. Looking at all the calls that now happen after transferring
the locks, I believe this is OK. The change also applies to the
callbacks called by the RegisterXactCallback mechanism, which means that
in theory there might be a 3rd party extension out there that's
affected. All the callbacks in contrib and plpgsql are OK, and it's
questionable to do anything complicated that would depend on
heavy-weight locks to be held in those callbacks, so I think this is OK.
Warrants a note in the release notes, though.

- Heikki

Attachments:

fix-prepare-transaction-race-1.patchtext/x-diff; name=fix-prepare-transaction-race-1.patchDownload+27-11
#18Marko Kreen
markokr@gmail.com
In reply to: Jan Wieck (#16)
Re: Problem with txid_snapshot_in/out() functionality

On Sun, Apr 13, 2014 at 05:46:20PM -0400, Jan Wieck wrote:

On 04/13/14 14:22, Jan Wieck wrote:

On 04/13/14 08:27, Marko Kreen wrote:

I think you need to do SET_VARSIZE also here. Alternative is to
move SET_VARSIZE after sort_snapshot().

And it seems the drop-double-txid logic should be added also to
txid_snapshot_recv(). It seems weird to have it behave differently
from txid_snapshot_in().

Thanks,

yes on both issues. Will create another patch.

New patch attached.

New github commit is https://github.com/wieck/postgres/commit/b8fd0d2eb78791e5171e34aecd233fd06218890d

Looks OK to me.

--
marko

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#7)
Re: Problem with txid_snapshot_in/out() functionality

On 04/12/2014 05:03 PM, Andres Freund wrote:

On 2014-04-12 09:47:24 -0400, Tom Lane wrote:

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

On 04/12/2014 12:07 AM, Jan Wieck wrote:

the Slony team has been getting seldom reports of a problem with the
txid_snapshot data type.
The symptom is that a txid_snapshot on output lists the same txid
multiple times in the xip list part of the external representation.

It's two-phase commit. When preparing a transaction, the state of the
transaction is first transfered to a dummy PGXACT entry, and then the
PGXACT entry of the backend is cleared. There is a transient state when
both PGXACT entries have the same xid.

Hm, yeah, but why is that intermediate state visible to anyone else?
Don't we have exclusive lock on the PGPROC array while we're doing this?

It's done outside the remit of ProcArray lock :(. And documented to lead
to duplicate xids in PGXACT.
EndPrepare():
/*
* Mark the prepared transaction as valid. As soon as xact.c marks
* MyPgXact as not running our XID (which it will do immediately after
* this function returns), others can commit/rollback the xact.
*
* NB: a side effect of this is to make a dummy ProcArray entry for the
* prepared XID. This must happen before we clear the XID from MyPgXact,
* else there is a window where the XID is not running according to
* TransactionIdIsInProgress, and onlookers would be entitled to assume
* the xact crashed. Instead we have a window where the same XID appears
* twice in ProcArray, which is OK.
*/
MarkAsPrepared(gxact);

It doesn't sound too hard to essentially move PrepareTransaction()'s
ProcArrayClearTransaction() into MarkAsPrepared() and rejigger the
locking to remove the intermediate state. But I think it'll lead to
bigger changes than we'd be comfortable backpatching.

Hmm. There's a field in GlobalTransactionData called locking_xid, which
is used to mark the XID of the transaction that's currently operating on
the prepared transaction. At prepare, that ensures that the transaction
cannot be committed or rolled back by another backend until the original
backend has cleared its PGPROC entry. At COMMIT/ROLLBACK PREPARED, it
ensures that only one backend can commit/rollback the transaction.

I wonder why we don't use a VirtualTransactionId there. AFAICS there is
no reason for COMMIT/ROLLBACK PREPARED to be assigned an XID of its own.
And if we used a VirtualTransactionId there, prepare could clear the xid
field of the PGPROC entry earlier.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#19)
Re: Problem with txid_snapshot_in/out() functionality

On 2014-04-14 12:15:30 +0300, Heikki Linnakangas wrote:

Hmm. There's a field in GlobalTransactionData called locking_xid, which is
used to mark the XID of the transaction that's currently operating on the
prepared transaction. At prepare, that ensures that the transaction cannot
be committed or rolled back by another backend until the original backend
has cleared its PGPROC entry. At COMMIT/ROLLBACK PREPARED, it ensures that
only one backend can commit/rollback the transaction.

I wonder why we don't use a VirtualTransactionId there.

I wondered about that previously as well. My bet it's because the 2pc
support arrived before the virtualxact stuff...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#17)
#22Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#21)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#22)
#24Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#23)
#25Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#21)
#26Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#25)
#27Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#26)
#28Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#27)
#29Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#28)
#30Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Marko Kreen (#18)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#29)