StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

Started by Tom Laneover 8 years ago23 messages

tgl@sss.pgh.pa.us

over 8 years ago

Now that we've got consistent failure reports about the 009_twophase.pl
recovery test, I set out to find out why it's failing. It looks to me
like the reason is that this (twophase.c:2145):

SubTransSetParent(xid, subxid, overwriteOK);

ought to be this:

SubTransSetParent(subxid, xid, overwriteOK);

because the definition of SubTransSetParent is

void
SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK)

not the other way 'round.

While "git blame" blames this line on the recent commit 728bd991c,
that just moved the call from somewhere else. AFAICS this has actually
been wrong since StandbyRecoverPreparedTransactions was written,
in 361bd1662 of 2010-04-13.

It's not clear to me how much potential this has to create user data
corruption, but it doesn't look good at first glance. Discuss.

Also, when I fix that, it gets further but still crashes at the same
Assert in SubTransSetParent. The proximate cause this time seems to be
that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
it's computing that as "false", but in reality the subtrans link in
question has already been set.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andres Freund

andres@anarazel.de

over 8 years ago

In reply to: Tom Lane (#1)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

On 2017-04-22 19:55:18 -0400, Tom Lane wrote:

Now that we've got consistent failure reports about the 009_twophase.pl
recovery test, I set out to find out why it's failing. It looks to me
like the reason is that this (twophase.c:2145):

SubTransSetParent(xid, subxid, overwriteOK);

ought to be this:

SubTransSetParent(subxid, xid, overwriteOK);

because the definition of SubTransSetParent is

void
SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK)

not the other way 'round.

While "git blame" blames this line on the recent commit 728bd991c,
that just moved the call from somewhere else. AFAICS this has actually
been wrong since StandbyRecoverPreparedTransactions was written,
in 361bd1662 of 2010-04-13.

Also, when I fix that, it gets further but still crashes at the same
Assert in SubTransSetParent. The proximate cause this time seems to be
that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
it's computing that as "false", but in reality the subtrans link in
question has already been set.

Yikes. This is clearly way undertested. It's also pretty scary that
the code has recently been whacked out quite heavily (both 9.6 and
master), without hitting anything around this - doesn't seem to bode
well for how in-depth the testing was.

It's not clear to me how much potential this has to create user data
corruption, but it doesn't look good at first glance. Discuss.

Hm. I think it can cause wrong tqual.c results in some edge cases.
During HS, lastOverflowedXid will be set in some cases, and then
XidInMVCCSnapshot etc will not find the actual toplevel xid, which'll in
turn cause lookups snapshot->subxip (where all HS xids reside)
potentially return wrong results.

I was about to say that I don't see how it could result in persistent
corruption however - the subtrans lookups are only done for
(snapshot->xmin, snapshot->xmax] and subtrans is truncated
regularly. But these days CHECKPOINT_END_OF_RECOVERY isn't obligatory
anymore, so that might be delayed. Hm.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Simon Riggs

simon@2ndquadrant.com

over 8 years ago

In reply to: Tom Lane (#1)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

On 23 April 2017 at 00:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Now that we've got consistent failure reports about the 009_twophase.pl
recovery test, I set out to find out why it's failing. It looks to me
like the reason is that this (twophase.c:2145):

SubTransSetParent(xid, subxid, overwriteOK);

ought to be this:

SubTransSetParent(subxid, xid, overwriteOK);

because the definition of SubTransSetParent is

void
SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK)

not the other way 'round.

While "git blame" blames this line on the recent commit 728bd991c,
that just moved the call from somewhere else. AFAICS this has actually
been wrong since StandbyRecoverPreparedTransactions was written,
in 361bd1662 of 2010-04-13.

Thanks for finding that.

It's not clear to me how much potential this has to create user data
corruption, but it doesn't look good at first glance. Discuss.

Well, strangely for different reason I was looking at that particular
call a couple of days back. I didn't spot that issue, but I was
thinking why we even make that call at that time. My conclusion was
that it could be optimized away mostly, since it isn't needed very
often, but its not really worth optimizing.

SubTransSetParent() only matters for the case where we are half-way
through a commit with a large commit. Since we do atomic updates of
commit and subcommmit when on same page, this problem would only
matter when top level xid and other subxids were separated across
multiple pages and the issue would only affect a read only query
checking visibility during the commit for that prepared transaction.
Furthermore, the nature of the corruption is that we set the xid to
point to the subxid; since we never mark a top-level commit as
subcommitted, subtrans would never be consulted and so the corruption
would not lead to any incorrect answer even during the window of
exposure. So it looks to me like this error shouldn't cause a problem.

We can fix that, but...

Also, when I fix that, it gets further but still crashes at the same
Assert in SubTransSetParent. The proximate cause this time seems to be
that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
it's computing that as "false", but in reality the subtrans link in
question has already been set.

Not sure about that, investigating.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Simon Riggs

simon@2ndquadrant.com

over 8 years ago

In reply to: Andres Freund (#2)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

On 23 April 2017 at 01:19, Andres Freund <andres@anarazel.de> wrote:

On 2017-04-22 19:55:18 -0400, Tom Lane wrote:

Now that we've got consistent failure reports about the 009_twophase.pl
recovery test, I set out to find out why it's failing. It looks to me
like the reason is that this (twophase.c:2145):

SubTransSetParent(xid, subxid, overwriteOK);

ought to be this:

SubTransSetParent(subxid, xid, overwriteOK);

because the definition of SubTransSetParent is

void
SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK)

not the other way 'round.

While "git blame" blames this line on the recent commit 728bd991c,
that just moved the call from somewhere else. AFAICS this has actually
been wrong since StandbyRecoverPreparedTransactions was written,
in 361bd1662 of 2010-04-13.

Also, when I fix that, it gets further but still crashes at the same
Assert in SubTransSetParent. The proximate cause this time seems to be
that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
it's computing that as "false", but in reality the subtrans link in
question has already been set.

Yikes. This is clearly way undertested. It's also pretty scary that
the code has recently been whacked out quite heavily (both 9.6 and
master), without hitting anything around this - doesn't seem to bode
well for how in-depth the testing was.

Obviously if there is a bug it's because tests didn't find it and
therefore it is by definition undertested for that specific bug.

I'm not sure what other conclusion you wish to draw, if any?

It's not clear to me how much potential this has to create user data
corruption, but it doesn't look good at first glance. Discuss.

Hm. I think it can cause wrong tqual.c results in some edge cases.
During HS, lastOverflowedXid will be set in some cases, and then
XidInMVCCSnapshot etc will not find the actual toplevel xid, which'll in
turn cause lookups snapshot->subxip (where all HS xids reside)
potentially return wrong results.

I was about to say that I don't see how it could result in persistent
corruption however - the subtrans lookups are only done for
(snapshot->xmin, snapshot->xmax] and subtrans is truncated
regularly. But these days CHECKPOINT_END_OF_RECOVERY isn't obligatory
anymore, so that might be delayed. Hm.

I've not found any reason, yet, to believe we return wrong answers in
any case, even though the transient data structure pg_subtrans is
corrupted by the switched call Tom discovers.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Simon Riggs (#3)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

Simon Riggs <simon@2ndquadrant.com> writes:

Also, when I fix that, it gets further but still crashes at the same
Assert in SubTransSetParent. The proximate cause this time seems to be
that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
it's computing that as "false", but in reality the subtrans link in
question has already been set.

Not sure about that, investigating.

As a quick hack, I just hotwired RecoverPreparedTransactions to set
overwriteOK = true always, and with that and the SubTransSetParent
argument-order fix, HEAD passes the recovery tests. Maybe we can
be smarter than that, but this might be a good short-term fix to get
the buildfarm green again.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Simon Riggs (#3)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

Simon Riggs <simon@2ndquadrant.com> writes:

On 23 April 2017 at 00:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's not clear to me how much potential this has to create user data
corruption, but it doesn't look good at first glance. Discuss.

SubTransSetParent() only matters for the case where we are half-way
through a commit with a large commit. Since we do atomic updates of
commit and subcommmit when on same page, this problem would only
matter when top level xid and other subxids were separated across
multiple pages and the issue would only affect a read only query
checking visibility during the commit for that prepared transaction.
Furthermore, the nature of the corruption is that we set the xid to
point to the subxid; since we never mark a top-level commit as
subcommitted, subtrans would never be consulted and so the corruption
would not lead to any incorrect answer even during the window of
exposure. So it looks to me like this error shouldn't cause a problem.

Still trying to wrap my head around this argument ... I agree that
incorrectly setting the parent's pg_subtrans entry can't cause a
visible problem, because it will never be consulted. However, failing
to set the child's entry seems like it could cause transient problems.
There would only be a risk during the eventual commit of the prepared
transaction, and only when the pg_xact entries span multiple pages,
but then there would be a window where the child xact is marked
subcommitted but it has a zero entry in pg_subtrans. That would
result in a WARNING from TransactionIdDidCommit, and a false result,
which depending on timing might mean that commit of the overall
transaction appears non-atomic to onlookers. (That is, the parent
xact might already appear committed to them, but the subtransaction
not yet.)

I can't find any indication in the archives that anyone's ever reported
seeing that WARNING, though that might only mean that they'd not noticed
it. But in any case, it seems like the worst consequence is a narrow
window for seeing non-atomic commit of a prepared transaction on a standby
server. That seems pretty unlikely to cause real-world problems.

The other point about overwriteOK not being set when it needs to be
also seems like it could not cause any problems in production builds,
since that flag is only examined by an Assert.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Simon Riggs

simon@2ndquadrant.com

over 8 years ago

In reply to: Tom Lane (#5)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

On 23 April 2017 at 17:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

Also, when I fix that, it gets further but still crashes at the same
Assert in SubTransSetParent. The proximate cause this time seems to be
that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
it's computing that as "false", but in reality the subtrans link in
question has already been set.

Not sure about that, investigating.

As a quick hack, I just hotwired RecoverPreparedTransactions to set
overwriteOK = true always, and with that and the SubTransSetParent
argument-order fix, HEAD passes the recovery tests. Maybe we can
be smarter than that, but this might be a good short-term fix to get
the buildfarm green again.

That would work. I've been looking into a fix I can explain, but "do
it always" may actually be it.

OK, I'll do that.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Simon Riggs

simon@2ndquadrant.com

over 8 years ago

In reply to: Simon Riggs (#7)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

On 23 April 2017 at 18:41, Simon Riggs <simon@2ndquadrant.com> wrote:

On 23 April 2017 at 17:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

Also, when I fix that, it gets further but still crashes at the same
Assert in SubTransSetParent. The proximate cause this time seems to be
that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
it's computing that as "false", but in reality the subtrans link in
question has already been set.

Not sure about that, investigating.

As a quick hack, I just hotwired RecoverPreparedTransactions to set
overwriteOK = true always, and with that and the SubTransSetParent
argument-order fix, HEAD passes the recovery tests. Maybe we can
be smarter than that, but this might be a good short-term fix to get
the buildfarm green again.

That would work. I've been looking into a fix I can explain, but "do
it always" may actually be it.

OK, I'll do that.

Done, tests pass again for me now.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andres Freund

andres@anarazel.de

over 8 years ago

In reply to: Simon Riggs (#4)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

Hi,

On 2017-04-23 11:05:44 +0100, Simon Riggs wrote:

Yikes. This is clearly way undertested. It's also pretty scary that
the code has recently been whacked out quite heavily (both 9.6 and
master), without hitting anything around this - doesn't seem to bode
well for how in-depth the testing was.

Obviously if there is a bug it's because tests didn't find it and
therefore it is by definition undertested for that specific bug.

Sure. But there's bugs that are hard to catch, and there's bugs that are
easily testable. This seems to largely fall into the "relatively easy to
test" camp.

I'm not sure what other conclusion you wish to draw, if any?

That the changes around whacking around twophase.c in several of the
last releases, didn't yield enough verified testing infrastructure.

It's not clear to me how much potential this has to create user data
corruption, but it doesn't look good at first glance. Discuss.

Hm. I think it can cause wrong tqual.c results in some edge cases.
During HS, lastOverflowedXid will be set in some cases, and then
XidInMVCCSnapshot etc will not find the actual toplevel xid, which'll in
turn cause lookups snapshot->subxip (where all HS xids reside)
potentially return wrong results.

I was about to say that I don't see how it could result in persistent
corruption however - the subtrans lookups are only done for
(snapshot->xmin, snapshot->xmax] and subtrans is truncated
regularly. But these days CHECKPOINT_END_OF_RECOVERY isn't obligatory
anymore, so that might be delayed. Hm.

I've not found any reason, yet, to believe we return wrong answers in
any case, even though the transient data structure pg_subtrans is
corrupted by the switched call Tom discovers.

I think I pointed out a danger above. Consider what happens if query on
a standby has a suboverflowed snapshot:
Snapshot
GetSnapshotData(Snapshot snapshot)
{
...
if (TransactionIdPrecedesOrEquals(xmin, procArray->lastOverflowedXid))
suboverflowed = true;
}
..
snapshot->suboverflowed = suboverflowed;
}

In that case we rely on pg_subtrans for visibility determinations:
bool
HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
Buffer buffer)
{
...
if (!HeapTupleHeaderXminCommitted(tuple))
{
...
else if (XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
return false;

and
static bool
XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
{
...
if (!snapshot->takenDuringRecovery)
{
...
else
{
...
if (snapshot->suboverflowed)
{
/*
* Snapshot overflowed, so convert xid to top-level. This is safe
* because we eliminated too-old XIDs above.
*/
xid = SubTransGetTopmostTransaction(xid);

if the subxid->xid mapping doesn't actually exist - as it's the case
with this bug afaics - we'll not get the correct toplevel
transaction. Which'll mean the following block:
/*
* We now have either a top-level xid higher than xmin or an
* indeterminate xid. We don't know whether it's top level or subxact
* but it doesn't matter. If it's present, the xid is visible.
*/
for (j = 0; j < snapshot->subxcnt; j++)
{
if (TransactionIdEquals(xid, snapshot->subxip[j]))
return true;
}
won't work correctly if suboverflowed.

I don't see anything prevent wrong results here?

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Simon Riggs

simon@2ndquadrant.com

over 8 years ago

In reply to: Andres Freund (#9)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

On 24 April 2017 at 00:25, Andres Freund <andres@anarazel.de> wrote:

It's not clear to me how much potential this has to create user data
corruption, but it doesn't look good at first glance. Discuss.

Hm. I think it can cause wrong tqual.c results in some edge cases.
During HS, lastOverflowedXid will be set in some cases, and then
XidInMVCCSnapshot etc will not find the actual toplevel xid, which'll in
turn cause lookups snapshot->subxip (where all HS xids reside)
potentially return wrong results.

I was about to say that I don't see how it could result in persistent
corruption however - the subtrans lookups are only done for
(snapshot->xmin, snapshot->xmax] and subtrans is truncated
regularly. But these days CHECKPOINT_END_OF_RECOVERY isn't obligatory
anymore, so that might be delayed. Hm.

I've not found any reason, yet, to believe we return wrong answers in
any case, even though the transient data structure pg_subtrans is
corrupted by the switched call Tom discovers.

I think I pointed out a danger above. Consider what happens if query on
a standby has a suboverflowed snapshot:
Snapshot
GetSnapshotData(Snapshot snapshot)
{
...
if (TransactionIdPrecedesOrEquals(xmin, procArray->lastOverflowedXid))
suboverflowed = true;
}
..
snapshot->suboverflowed = suboverflowed;
}

In that case we rely on pg_subtrans for visibility determinations:
bool
HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
Buffer buffer)
{
...
if (!HeapTupleHeaderXminCommitted(tuple))
{
...
else if (XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
return false;

and
static bool
XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
{
...
if (!snapshot->takenDuringRecovery)
{
...
else
{
...
if (snapshot->suboverflowed)
{
/*
* Snapshot overflowed, so convert xid to top-level. This is safe
* because we eliminated too-old XIDs above.
*/
xid = SubTransGetTopmostTransaction(xid);

if the subxid->xid mapping doesn't actually exist - as it's the case
with this bug afaics - we'll not get the correct toplevel
transaction.

The nature of the corruption is that in some cases
* a subxid will point to nothing (even though in most cases it was
already set correctly)
* the parent will point to a subxid

So both wrong.

Which'll mean the following block:
/*
* We now have either a top-level xid higher than xmin or an
* indeterminate xid. We don't know whether it's top level or subxact
* but it doesn't matter. If it's present, the xid is visible.
*/
for (j = 0; j < snapshot->subxcnt; j++)
{
if (TransactionIdEquals(xid, snapshot->subxip[j]))
return true;
}
won't work correctly if suboverflowed.

Your example of snapshots taken during recovery is not correct.

Note that SubTransGetTopmostTransaction() returns a valid, running
xid, even though it is the wrong one.

Snapshots work differently on standbys - we store all known running
xids, so the test still passes correctly, even when overflowed.

I'd call that just plain luck. We behave correctly, but for the wrong
reasons, at least in this case.

I don't see anything prevent wrong results here?

I've had an even better look around now and I think I've found
something but need to turn it into a repeatable test case so I can
double-check this before reporting in full, so I don't cry wolf.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Nikhil Sontakke

nikhils@2ndquadrant.com

over 8 years ago

In reply to: Simon Riggs (#8)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

Also, when I fix that, it gets further but still crashes at the same
Assert in SubTransSetParent. The proximate cause this time seems to

be

that RecoverPreparedTransactions's calculation of overwriteOK is

wrong:

it's computing that as "false", but in reality the subtrans link in
question has already been set.

Not sure about that, investigating.

As a quick hack, I just hotwired RecoverPreparedTransactions to set
overwriteOK = true always, and with that and the SubTransSetParent
argument-order fix, HEAD passes the recovery tests. Maybe we can
be smarter than that, but this might be a good short-term fix to get
the buildfarm green again.

That would work. I've been looking into a fix I can explain, but "do
it always" may actually be it.

Here's what's happening:

On Master:

BEGIN;

INSERT INTO t_009_tbl VALUES (42);

SAVEPOINT s1;

INSERT INTO t_009_tbl VALUES (43);

PREPARE TRANSACTION 'xact_009_1';
On Master:

Do a fast shutdown

On Slave:

Restart the slave. This causes StandbyRecoverPreparedTransactions() to be
invoked which causes ProcessTwoPhaseBuffer() to be invoked with setParent
set to true. This sets up parent xid for the above subtransaction.

On Slave:

Promote the slave. This causes RecoverPreparedTransactions() to be invoked
which ends up calling SubTransSetParent() for the above subxid. The
overwriteOK bool remains false since we don't go over the
PGPROC_MAX_CACHED_SUBXIDS limit. Since the parent xid was already set on
restart above, we hit the assert.

-------------------

I am wondering if StandbyRecoverPreparedTransactions() needs the parent
linkage at all? I don't see SubTransGetParent() and related functions being
called on the standby but we need to be careful here. As a quick test, If I
don't call SubTransSetParent() as part of
StandbyRecoverPreparedTransactions() then all recovery tests pass ok. But
the fact that StandbyRecoverPreparedTransactions() takes overwriteOK as a
parameter means it might not be so simple..

Regards,
Nikhils
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services

#12

Andres Freund

andres@anarazel.de

over 8 years ago

In reply to: Simon Riggs (#10)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

On 2017-04-24 13:29:11 +0100, Simon Riggs wrote:

On 24 April 2017 at 00:25, Andres Freund <andres@anarazel.de> wrote:

if the subxid->xid mapping doesn't actually exist - as it's the case
with this bug afaics - we'll not get the correct toplevel
transaction.

The nature of the corruption is that in some cases
* a subxid will point to nothing (even though in most cases it was
already set correctly)
* the parent will point to a subxid

Right. Those cases aren't that different from the point of trying to
find the parent of an subxid.

Which'll mean the following block:
/*
* We now have either a top-level xid higher than xmin or an
* indeterminate xid. We don't know whether it's top level or subxact
* but it doesn't matter. If it's present, the xid is visible.
*/
for (j = 0; j < snapshot->subxcnt; j++)
{
if (TransactionIdEquals(xid, snapshot->subxip[j]))
return true;
}
won't work correctly if suboverflowed.

Your example of snapshots taken during recovery is not correct.

Oh?

Note that SubTransGetTopmostTransaction() returns a valid, running
xid, even though it is the wrong one.

Sure.

Snapshots work differently on standbys - we store all known running
xids, so the test still passes correctly, even when overflowed.

I don't think that's generally true. Isn't that precisely what
ProcArrayStruct->lastOverflowedXid is about? If we have a snapshot
that's suboverflowed due to the lastOverflowedXid cutoff, then we the
subxip array does *not* contain all known running xids anymore, we rely
on pg_subtrans to only guarantee that toplevel xids are stored in the
KnownAssignedXids machinery.

See:
* When we throw away subXIDs from KnownAssignedXids, we need to keep track of
* that, similarly to tracking overflow of a PGPROC's subxids array. We do
* that by remembering the lastOverflowedXID, ie the last thrown-away subXID.
* As long as that is within the range of interesting XIDs, we have to assume
* that subXIDs are missing from snapshots. (Note that subXID overflow occurs
* on primary when 65th subXID arrives, whereas on standby it occurs when 64th
* subXID arrives - that is not an error.)

/*
* Highest subxid that has been removed from KnownAssignedXids array to
* prevent overflow; or InvalidTransactionId if none. We track this for
* similar reasons to tracking overflowing cached subxids in PGXACT
* entries. Must hold exclusive ProcArrayLock to change this, and shared
* lock to read it.
*/
TransactionId lastOverflowedXid;

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Simon Riggs

simon@2ndquadrant.com

over 8 years ago

In reply to: Tom Lane (#5)

1 attachment(s)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

On 23 April 2017 at 17:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

Also, when I fix that, it gets further but still crashes at the same
Assert in SubTransSetParent. The proximate cause this time seems to be
that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
it's computing that as "false", but in reality the subtrans link in
question has already been set.

Not sure about that, investigating.

As a quick hack, I just hotwired RecoverPreparedTransactions to set
overwriteOK = true always, and with that and the SubTransSetParent
argument-order fix, HEAD passes the recovery tests. Maybe we can
be smarter than that, but this might be a good short-term fix to get
the buildfarm green again.

So my thoughts are now that this is indeed the long term fix.

I can't think of why it would be necessary to call SubTransSetParent()
twice with two different values. A subtransaction's top level xid
never changes, so pg_subtrans should be either zero or the xid of the
parent and never anything else.

I can't see any reason now why overwriteOK should exist at all. I'm
guessing that the whole "overwriteOK" idea was an incorrect response
to xids appearing where they shouldn't have done because of the
mistake you just corrected. So I will now remove the parameter from
the call.

(I'll address the discussion of the impact of that bug in separate post).

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

2PC_remove_overwriteOK.v1.patchapplication/octet-stream; name=2PC_remove_overwriteOK.v1.patchDownload

diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 70828e5170..12fc3d9abe 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -68,11 +68,9 @@ static bool SubTransPagePrecedes(int page1, int page2);
 
 /*
  * Record the parent of a subtransaction in the subtrans log.
- *
- * In some cases we may need to overwrite an existing value.
  */
 void
-SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK)
+SubTransSetParent(TransactionId xid, TransactionId parent)
 {
 	int			pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
@@ -88,12 +86,13 @@ SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK)
 	ptr += entryno;
 
 	/* Current state should be 0 */
-	Assert(*ptr == InvalidTransactionId ||
-		   (*ptr == parent && overwriteOK));
-
-	*ptr = parent;
+	Assert(*ptr == InvalidTransactionId || *ptr == parent);
 
-	SubTransCtl->shared->page_dirty[slotno] = true;
+	if (*ptr == InvalidTransactionId)
+	{
+		*ptr = parent;
+		SubTransCtl->shared->page_dirty[slotno] = true;
+	}
 
 	LWLockRelease(SubtransControlLock);
 }
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index fa7124d903..d11986cc76 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -221,8 +221,7 @@ static void RemoveGXact(GlobalTransaction gxact);
 static void XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len);
 static char *ProcessTwoPhaseBuffer(TransactionId xid,
 							XLogRecPtr	prepare_start_lsn,
-							bool fromdisk, bool overwriteOK, bool setParent,
-							bool setNextXid);
+							bool fromdisk, bool setParent, bool setNextXid);
 static void MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid,
 				const char *gid, TimestampTz prepared_at, Oid owner,
 				Oid databaseid);
@@ -1743,8 +1742,7 @@ restoreTwoPhaseData(void)
 			xid = (TransactionId) strtoul(clde->d_name, NULL, 16);
 
 			buf = ProcessTwoPhaseBuffer(xid, InvalidXLogRecPtr,
-										true, false, false,
-										false);
+										true, false, false);
 			if (buf == NULL)
 				continue;
 
@@ -1804,8 +1802,7 @@ PrescanPreparedTransactions(TransactionId **xids_p, int *nxids_p)
 
 		buf = ProcessTwoPhaseBuffer(xid,
 				gxact->prepare_start_lsn,
-				gxact->ondisk, false, false,
-				true);
+				gxact->ondisk, false, true);
 
 		if (buf == NULL)
 			continue;
@@ -1858,12 +1855,12 @@ PrescanPreparedTransactions(TransactionId **xids_p, int *nxids_p)
  * This is never called at the end of recovery - we use
  * RecoverPreparedTransactions() at that point.
  *
- * Currently we simply call SubTransSetParent() for any subxids of prepared
- * transactions. If overwriteOK is true, it's OK if some XIDs have already
- * been marked in pg_subtrans.
+ * The lack of calls to SubTransSetParent() calls here is by design;
+ * those calls are made by RecoverPreparedTransactions() at the end of recovery
+ * for those xacts that need this.
  */
 void
-StandbyRecoverPreparedTransactions(bool overwriteOK)
+StandbyRecoverPreparedTransactions(void)
 {
 	int			i;
 
@@ -1880,8 +1877,7 @@ StandbyRecoverPreparedTransactions(bool overwriteOK)
 
 		buf = ProcessTwoPhaseBuffer(xid,
 				gxact->prepare_start_lsn,
-				gxact->ondisk, overwriteOK, true,
-				false);
+				gxact->ondisk, true, false);
 		if (buf != NULL)
 			pfree(buf);
 	}
@@ -1895,6 +1891,13 @@ StandbyRecoverPreparedTransactions(bool overwriteOK)
  * each prepared transaction (reacquire locks, etc).
  *
  * This is run during database startup.
+ *
+ * At the end of recovery the way we take snapshots will change. We now need
+ * to mark all running transactions with their full SubTransSetParent() info
+ * to allow normal snapshots to work correctly if snapshots overflow.
+ * We do this here because by definition prepared transactions are the only
+ * type of write transaction still running, so this is necessary and
+ * complete.
  */
 void
 RecoverPreparedTransactions(void)
@@ -1913,15 +1916,13 @@ RecoverPreparedTransactions(void)
 		TwoPhaseFileHeader *hdr;
 		TransactionId *subxids;
 		const char *gid;
-		bool		overwriteOK = false;
 		int			i;
 
 		xid = gxact->xid;
 
 		buf = ProcessTwoPhaseBuffer(xid,
 				gxact->prepare_start_lsn,
-				gxact->ondisk, false, false,
-				false);
+				gxact->ondisk, false, false);
 		if (buf == NULL)
 			continue;
 
@@ -1940,23 +1941,16 @@ RecoverPreparedTransactions(void)
 		bufptr += MAXALIGN(hdr->ninvalmsgs * sizeof(SharedInvalidationMessage));
 
 		/*
-		 * It's possible that SubTransSetParent has been set before, if
-		 * the prepared transaction generated xid assignment records. Test
-		 * here must match one used in AssignTransactionId().
-		 */
-		if (InHotStandby && (hdr->nsubxacts >= PGPROC_MAX_CACHED_SUBXIDS ||
-							 XLogLogicalInfoActive()))
-			overwriteOK = true;
-
-		/*
 		 * Reconstruct subtrans state for the transaction --- needed
 		 * because pg_subtrans is not preserved over a restart.  Note that
 		 * we are linking all the subtransactions directly to the
 		 * top-level XID; there may originally have been a more complex
 		 * hierarchy, but there's no need to restore that exactly.
+		 * It's possible that SubTransSetParent has been set before, if
+		 * the prepared transaction generated xid assignment records.
 		 */
 		for (i = 0; i < hdr->nsubxacts; i++)
-			SubTransSetParent(subxids[i], xid, true);
+			SubTransSetParent(subxids[i], xid);
 
 		/*
 		 * Recreate its GXACT and dummy PGPROC. But, check whether
@@ -2006,8 +2000,7 @@ RecoverPreparedTransactions(void)
  * Given a transaction id, read it either from disk or read it directly
  * via shmem xlog record pointer using the provided "prepare_start_lsn".
  *
- * If setParent is true, then use the overwriteOK parameter to set up
- * subtransaction parent linkages.
+ * If setParent is true, set up subtransaction parent linkages.
  *
  * If setNextXid is true, set ShmemVariableCache->nextXid to the newest
  * value scanned.
@@ -2015,7 +2008,7 @@ RecoverPreparedTransactions(void)
 static char *
 ProcessTwoPhaseBuffer(TransactionId xid,
 					  XLogRecPtr prepare_start_lsn,
-					  bool fromdisk, bool overwriteOK,
+					  bool fromdisk, 
 					  bool setParent, bool setNextXid)
 {
 	TransactionId origNextXid = ShmemVariableCache->nextXid;
@@ -2142,7 +2135,7 @@ ProcessTwoPhaseBuffer(TransactionId xid,
 		}
 
 		if (setParent)
-			SubTransSetParent(subxid, xid, overwriteOK);
+			SubTransSetParent(subxid, xid);
 	}
 
 	return buf;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 92b263aea1..605639b0c3 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -559,7 +559,7 @@ AssignTransactionId(TransactionState s)
 		XactTopTransactionId = s->transactionId;
 
 	if (isSubXact)
-		SubTransSetParent(s->transactionId, s->parent->transactionId, false);
+		SubTransSetParent(s->transactionId, s->parent->transactionId);
 
 	/*
 	 * If it's a top-level transaction, the predicate locking system needs to
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c7667879c6..a89d99838a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6930,7 +6930,7 @@ StartupXLOG(void)
 
 				ProcArrayApplyRecoveryInfo(&running);
 
-				StandbyRecoverPreparedTransactions(false);
+				StandbyRecoverPreparedTransactions();
 			}
 		}
 
@@ -9692,7 +9692,7 @@ xlog_redo(XLogReaderState *record)
 
 			ProcArrayApplyRecoveryInfo(&running);
 
-			StandbyRecoverPreparedTransactions(true);
+			StandbyRecoverPreparedTransactions();
 		}
 
 		/* ControlFile->checkPointCopy always tracks the latest ckpt XID */
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index ebf6a92923..4976bb03c7 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -943,7 +943,7 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
 	 * have attempted to SubTransSetParent().
 	 */
 	for (i = 0; i < nsubxids; i++)
-		SubTransSetParent(subxids[i], topxid, false);
+		SubTransSetParent(subxids[i], topxid);
 
 	/* KnownAssignedXids isn't maintained yet, so we're done for now */
 	if (standbyState == STANDBY_INITIALIZED)
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index bd30f5861f..847359873a 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -14,7 +14,7 @@
 /* Number of SLRU buffers to use for subtrans */
 #define NUM_SUBTRANS_BUFFERS	32
 
-extern void SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK);
+extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
 
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 4d547c5553..80ec4ca4a5 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -46,7 +46,7 @@ extern bool StandbyTransactionIdIsPrepared(TransactionId xid);
 
 extern TransactionId PrescanPreparedTransactions(TransactionId **xids_p,
 							int *nxids_p);
-extern void StandbyRecoverPreparedTransactions(bool overwriteOK);
+extern void StandbyRecoverPreparedTransactions(void);
 extern void RecoverPreparedTransactions(void);
 
 extern void CheckPointTwoPhase(XLogRecPtr redo_horizon);

#14

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Simon Riggs (#13)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

Simon Riggs <simon@2ndquadrant.com> writes:

I can't see any reason now why overwriteOK should exist at all. I'm
guessing that the whole "overwriteOK" idea was an incorrect response
to xids appearing where they shouldn't have done because of the
mistake you just corrected. So I will now remove the parameter from
the call.

Seems reasonable, but I don't like the logic change you made in
SubTransSetParent; you broke the former invariant, for non-Assert
builds, that the target pg_subtrans entry is guaranteed to have
the correct value on exit. I do like fixing it to not dirty the
page unnecessarily, but I'd suggest that we write it like

if (*ptr != parent)
{
Assert(*ptr == InvalidTransactionId);
*ptr = parent;
SubTransCtl->shared->page_dirty[slotno] = true;
}

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Simon Riggs

simon@2ndquadrant.com

over 8 years ago

In reply to: Tom Lane (#14)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

On 25 April 2017 at 16:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

I can't see any reason now why overwriteOK should exist at all. I'm
guessing that the whole "overwriteOK" idea was an incorrect response
to xids appearing where they shouldn't have done because of the
mistake you just corrected. So I will now remove the parameter from
the call.

Seems reasonable, but I don't like the logic change you made in
SubTransSetParent; you broke the former invariant, for non-Assert
builds, that the target pg_subtrans entry is guaranteed to have
the correct value on exit. I do like fixing it to not dirty the
page unnecessarily, but I'd suggest that we write it like

if (*ptr != parent)
{
Assert(*ptr == InvalidTransactionId);
*ptr = parent;
SubTransCtl->shared->page_dirty[slotno] = true;
}

OK, thanks. I'll commit that tomorrow.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Simon Riggs

simon@2ndquadrant.com

over 8 years ago

In reply to: Andres Freund (#12)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

On 24 April 2017 at 19:59, Andres Freund <andres@anarazel.de> wrote:

I don't think that's generally true.

If you think that, from a risk perspective it is enough for me to
continue to investigate and I have been doing that.

As I said before I thought I had found a problem.

I'm suggesting we take the approach that if there is a problem we can
recreate it as a way of exploring what conditions are required and
therefore work out the impact. Nikhil Sontakke appears to have
re-created something, but not quite what I had expected. I think he
will post here tomorrow with an update for us to discuss.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Andres Freund

andres@anarazel.de

over 8 years ago

In reply to: Simon Riggs (#16)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

On 2017-04-25 21:22:44 +0100, Simon Riggs wrote:

On 24 April 2017 at 19:59, Andres Freund <andres@anarazel.de> wrote:

I don't think that's generally true.

If you think that, from a risk perspective it is enough for me to
continue to investigate and I have been doing that.

I'm not sure it's worth digging it all that much - I think we just
shouldn't claim in the release notes that the bug doesn't have any
consequences, but that it's though to only matter in unlikely corner
cases.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Michael Paquier

michael.paquier@gmail.com

over 8 years ago

In reply to: Simon Riggs (#15)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

On Wed, Apr 26, 2017 at 4:53 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 25 April 2017 at 16:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

I can't see any reason now why overwriteOK should exist at all. I'm
guessing that the whole "overwriteOK" idea was an incorrect response
to xids appearing where they shouldn't have done because of the
mistake you just corrected. So I will now remove the parameter from
the call.

Seems reasonable, but I don't like the logic change you made in
SubTransSetParent; you broke the former invariant, for non-Assert
builds, that the target pg_subtrans entry is guaranteed to have
the correct value on exit. I do like fixing it to not dirty the
page unnecessarily, but I'd suggest that we write it like

if (*ptr != parent)
{
Assert(*ptr == InvalidTransactionId);
*ptr = parent;
SubTransCtl->shared->page_dirty[slotno] = true;
}

OK, thanks. I'll commit that tomorrow.

A couple of comments about the last patch of this thread.

Nit: there is a complain about a trailing whitespace:
$ git diff --check
src/backend/access/transam/twophase.c:2011: trailing whitespace.
+ bool fromdisk,

+        * It's possible that SubTransSetParent has been set before, if
+        * the prepared transaction generated xid assignment records.
         */
        for (i = 0; i < hdr->nsubxacts; i++)
-           SubTransSetParent(subxids[i], xid, true);
+           SubTransSetParent(subxids[i], xid);
You can remove this part in RecoverPreparedTransactions() and just
switch ProcessTwoPhaseBuffer()'s setParent to true to do the same
thing. The existing comment block should be moved before
ProcessTwoPhaseBuffer() call. It is critical to mention that
pg_subtrans is not kept after a restart.
-- 
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Nikhil Sontakke

nikhils@2ndquadrant.com

over 8 years ago

In reply to: Simon Riggs (#16)

1 attachment(s)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

I'm suggesting we take the approach that if there is a problem we can
recreate it as a way of exploring what conditions are required and
therefore work out the impact. Nikhil Sontakke appears to have
re-created something, but not quite what I had expected. I think he
will post here tomorrow with an update for us to discuss.

So, I reverted commit 0874d4f3e183757ba15a4b3f3bf563e0393dd9c2 to go back
to the earlier bad swapped arguments to SubTransSetParent resulting in
incorrect parent linkages and used the attached TAP test patch.

The test prepares a 2PC with more than 64 subtransactions. It then stops
the master and promotes the standby.

A SELECT query on the newly promoted master on any of the tables involved
in the 2PC hangs. The hang is due to a loop in
SubTransGetTopmostTransaction(). Due to incorrect linkages, we get a
circular reference in parentxid <-> subxid inducing the infinite loop.

Any further DML on these objects which will need to check visibility of
these tuples hangs as well. All unrelated objects and new transactions are
ok AFAICS.

I do not see any data loss, which is good. However tables involved in the
2PC are inaccessible till after a hard restart.

The attached TAP test patch can be considered for commit to test handling
2PC with large subtransactions on promoted standby instances.

Regards,
Nikhils
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services

Attachments:

subxid_bug_with_test_case_v1.0.patchapplication/octet-stream; name=subxid_bug_with_test_case_v1.0.patchDownload

diff --git a/src/test/recovery/t/009_twophase.pl b/src/test/recovery/t/009_twophase.pl
old mode 100644
new mode 100755
index be7f00b..d340953
--- a/src/test/recovery/t/009_twophase.pl
+++ b/src/test/recovery/t/009_twophase.pl
@@ -4,7 +4,7 @@ use warnings;
 
 use PostgresNode;
 use TestLib;
-use Test::More tests => 13;
+use Test::More tests => 21;
 
 # Setup master node
 my $node_master = get_new_node("master");
@@ -320,3 +320,88 @@ $node_master->psql('postgres', "
 $node_master->psql('postgres', "SELECT count(*) FROM t_009_tbl",
 	  stdout => \$psql_out);
 is($psql_out, '6', "Check nextXid handling for prepared subtransactions");
+
+###############################################################################
+# Check that replay will correctly set 2PC with more than
+# PGPROC_MAX_CACHED_SUBXIDS subtransations and also show data properly
+# on promotion
+###############################################################################
+$node_master->psql('postgres', "DELETE FROM t_009_tbl");
+
+# Function borrowed from src/test/regress/sql/hs_primary_extremes.sql
+$node_master->psql('postgres', "
+    CREATE OR REPLACE FUNCTION hs_subxids (n integer)
+    RETURNS void
+    LANGUAGE plpgsql
+    AS \$\$
+    BEGIN
+        IF n <= 0 THEN RETURN; END IF;
+        INSERT INTO t_009_tbl VALUES (n);
+        PERFORM hs_subxids(n - 1);
+        RETURN;
+    EXCEPTION WHEN raise_exception THEN NULL; END;
+    \$\$;");
+$node_master->psql('postgres', "
+	BEGIN;
+	SELECT hs_subxids(127);
+	PREPARE TRANSACTION 'xact_009_1';");
+$node_master->wait_for_catchup($node_slave, 'replay', $node_master->lsn('insert'));
+$node_slave->psql('postgres', "SELECT coalesce(sum(id),-1) FROM t_009_tbl",
+	  stdout => \$psql_out);
+is($psql_out, '-1', "Not visible");
+$node_master->stop;
+$node_slave->promote;
+$node_slave->poll_query_until('postgres',
+	"SELECT NOT pg_is_in_recovery()")
+  or die "Timed out while waiting for promotion of standby";
+
+$node_slave->psql('postgres', "SELECT coalesce(sum(id),-1) FROM t_009_tbl",
+	  stdout => \$psql_out);
+is($psql_out, '-1', "Not visible");
+
+# restore state
+($node_master, $node_slave) = ($node_slave, $node_master);
+$node_slave->enable_streaming($node_master);
+$node_slave->append_conf('recovery.conf', qq(
+recovery_target_timeline='latest'
+));
+$node_slave->start;
+$psql_rc = $node_master->psql('postgres', "COMMIT PREPARED 'xact_009_1'");
+is($psql_rc, '0', "Restore of PGPROC_MAX_CACHED_SUBXIDS+ prepared transaction on promoted slave");
+
+$node_master->psql('postgres', "SELECT coalesce(sum(id),-1) FROM t_009_tbl",
+	  stdout => \$psql_out);
+is($psql_out, '8128', "Visible");
+
+$node_master->psql('postgres', "DELETE FROM t_009_tbl");
+$node_master->psql('postgres', "
+	BEGIN;
+	SELECT hs_subxids(201);
+	PREPARE TRANSACTION 'xact_009_1';");
+$node_master->wait_for_catchup($node_slave, 'replay', $node_master->lsn('insert'));
+$node_slave->psql('postgres', "SELECT coalesce(sum(id),-1) FROM t_009_tbl",
+	  stdout => \$psql_out);
+is($psql_out, '-1', "Not visible");
+$node_master->stop;
+$node_slave->promote;
+$node_slave->poll_query_until('postgres',
+	"SELECT NOT pg_is_in_recovery()")
+  or die "Timed out while waiting for promotion of standby";
+
+$node_slave->psql('postgres', "SELECT coalesce(sum(id),-1) FROM t_009_tbl",
+	  stdout => \$psql_out);
+is($psql_out, '-1', "Not visible");
+
+# restore state
+($node_master, $node_slave) = ($node_slave, $node_master);
+$node_slave->enable_streaming($node_master);
+$node_slave->append_conf('recovery.conf', qq(
+recovery_target_timeline='latest'
+));
+$node_slave->start;
+$psql_rc = $node_master->psql('postgres', "ROLLBACK PREPARED 'xact_009_1'");
+is($psql_rc, '0', "Rollback of PGPROC_MAX_CACHED_SUBXIDS+ prepared transaction on promoted slave");
+
+$node_master->psql('postgres', "SELECT coalesce(sum(id),-1) FROM t_009_tbl",
+	  stdout => \$psql_out);
+is($psql_out, '-1', "Not visible");

#20

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Nikhil Sontakke (#19)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

Nikhil Sontakke <nikhils@2ndquadrant.com> writes:

A SELECT query on the newly promoted master on any of the tables involved
in the 2PC hangs. The hang is due to a loop in
SubTransGetTopmostTransaction(). Due to incorrect linkages, we get a
circular reference in parentxid <-> subxid inducing the infinite loop.

I'm inclined to propose that

(1) SubTransSetParent should contain an assert that
Assert(TransactionIdFollows(xid, parent));
There is a similar assertion near one of the call sites, but that's
obviously too far away from the action.

(2) SubTransGetTopmostTransaction ought to contain an actual runtime
test and ereport (not just Assert) that the parent XID it got from
pg_subtrans precedes the child XID that was looked up. This would
protect us against similar infinite loops in the future. Even without
bugs, such a loop could arise due to corrupted data.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21

Simon Riggs

simon@2ndquadrant.com

over 8 years ago

In reply to: Tom Lane (#20)

1 attachment(s)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

On 26 April 2017 at 15:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Nikhil Sontakke <nikhils@2ndquadrant.com> writes:

A SELECT query on the newly promoted master on any of the tables involved
in the 2PC hangs. The hang is due to a loop in
SubTransGetTopmostTransaction(). Due to incorrect linkages, we get a
circular reference in parentxid <-> subxid inducing the infinite loop.

I'm inclined to propose that

(1) SubTransSetParent should contain an assert that
Assert(TransactionIdFollows(xid, parent));
There is a similar assertion near one of the call sites, but that's
obviously too far away from the action.

(2) SubTransGetTopmostTransaction ought to contain an actual runtime
test and ereport (not just Assert) that the parent XID it got from
pg_subtrans precedes the child XID that was looked up. This would
protect us against similar infinite loops in the future. Even without
bugs, such a loop could arise due to corrupted data.

Pretty much what I was thinking.

I've added code following Michael and Tom's comments to the previous
patch. New patch attached.

Passes with and without Nikhil's new test.

I plan to apply both patches tomorrow morning my time to give time for
further comments.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

2PC_rework.v2.patchapplication/octet-stream; name=2PC_rework.v2.patchDownload

diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 70828e5170..44efc65e02 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -68,11 +68,9 @@ static bool SubTransPagePrecedes(int page1, int page2);
 
 /*
  * Record the parent of a subtransaction in the subtrans log.
- *
- * In some cases we may need to overwrite an existing value.
  */
 void
-SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK)
+SubTransSetParent(TransactionId xid, TransactionId parent)
 {
 	int			pageno = TransactionIdToPage(xid);
 	int			entryno = TransactionIdToEntry(xid);
@@ -80,6 +78,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK)
 	TransactionId *ptr;
 
 	Assert(TransactionIdIsValid(parent));
+	Assert(TransactionIdFollows(xid, parent));
 
 	LWLockAcquire(SubtransControlLock, LW_EXCLUSIVE);
 
@@ -87,13 +86,12 @@ SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK)
 	ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
 	ptr += entryno;
 
-	/* Current state should be 0 */
-	Assert(*ptr == InvalidTransactionId ||
-		   (*ptr == parent && overwriteOK));
-
-	*ptr = parent;
-
-	SubTransCtl->shared->page_dirty[slotno] = true;
+	if (*ptr != parent)
+	{
+		Assert(*ptr == InvalidTransactionId);
+		*ptr = parent;
+		SubTransCtl->shared->page_dirty[slotno] = true;
+	}
 
 	LWLockRelease(SubtransControlLock);
 }
@@ -157,6 +155,9 @@ SubTransGetTopmostTransaction(TransactionId xid)
 		if (TransactionIdPrecedes(parentXid, TransactionXmin))
 			break;
 		parentXid = SubTransGetParent(parentXid);
+		if (TransactionIdFollowsOrEquals(parentXid, previousXid))
+			elog(FATAL, "pg_subtrans contains invalid entry: xid %u points to parent xid %u",
+							previousXid, parentXid);
 	}
 
 	Assert(TransactionIdIsValid(previousXid));
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index fa7124d903..c9fff42991 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -221,8 +221,7 @@ static void RemoveGXact(GlobalTransaction gxact);
 static void XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len);
 static char *ProcessTwoPhaseBuffer(TransactionId xid,
 							XLogRecPtr	prepare_start_lsn,
-							bool fromdisk, bool overwriteOK, bool setParent,
-							bool setNextXid);
+							bool fromdisk, bool setParent, bool setNextXid);
 static void MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid,
 				const char *gid, TimestampTz prepared_at, Oid owner,
 				Oid databaseid);
@@ -1743,8 +1742,7 @@ restoreTwoPhaseData(void)
 			xid = (TransactionId) strtoul(clde->d_name, NULL, 16);
 
 			buf = ProcessTwoPhaseBuffer(xid, InvalidXLogRecPtr,
-										true, false, false,
-										false);
+										true, false, false);
 			if (buf == NULL)
 				continue;
 
@@ -1804,8 +1802,7 @@ PrescanPreparedTransactions(TransactionId **xids_p, int *nxids_p)
 
 		buf = ProcessTwoPhaseBuffer(xid,
 				gxact->prepare_start_lsn,
-				gxact->ondisk, false, false,
-				true);
+				gxact->ondisk, false, true);
 
 		if (buf == NULL)
 			continue;
@@ -1858,12 +1855,12 @@ PrescanPreparedTransactions(TransactionId **xids_p, int *nxids_p)
  * This is never called at the end of recovery - we use
  * RecoverPreparedTransactions() at that point.
  *
- * Currently we simply call SubTransSetParent() for any subxids of prepared
- * transactions. If overwriteOK is true, it's OK if some XIDs have already
- * been marked in pg_subtrans.
+ * The lack of calls to SubTransSetParent() calls here is by design;
+ * those calls are made by RecoverPreparedTransactions() at the end of recovery
+ * for those xacts that need this.
  */
 void
-StandbyRecoverPreparedTransactions(bool overwriteOK)
+StandbyRecoverPreparedTransactions(void)
 {
 	int			i;
 
@@ -1880,8 +1877,7 @@ StandbyRecoverPreparedTransactions(bool overwriteOK)
 
 		buf = ProcessTwoPhaseBuffer(xid,
 				gxact->prepare_start_lsn,
-				gxact->ondisk, overwriteOK, true,
-				false);
+				gxact->ondisk, false, false);
 		if (buf != NULL)
 			pfree(buf);
 	}
@@ -1895,6 +1891,13 @@ StandbyRecoverPreparedTransactions(bool overwriteOK)
  * each prepared transaction (reacquire locks, etc).
  *
  * This is run during database startup.
+ *
+ * At the end of recovery the way we take snapshots will change. We now need
+ * to mark all running transactions with their full SubTransSetParent() info
+ * to allow normal snapshots to work correctly if snapshots overflow.
+ * We do this here because by definition prepared transactions are the only
+ * type of write transaction still running, so this is necessary and
+ * complete.
  */
 void
 RecoverPreparedTransactions(void)
@@ -1913,15 +1916,21 @@ RecoverPreparedTransactions(void)
 		TwoPhaseFileHeader *hdr;
 		TransactionId *subxids;
 		const char *gid;
-		bool		overwriteOK = false;
-		int			i;
 
 		xid = gxact->xid;
 
+		/*
+		 * Reconstruct subtrans state for the transaction --- needed
+		 * because pg_subtrans is not preserved over a restart.  Note that
+		 * we are linking all the subtransactions directly to the
+		 * top-level XID; there may originally have been a more complex
+		 * hierarchy, but there's no need to restore that exactly.
+		 * It's possible that SubTransSetParent has been set before, if
+		 * the prepared transaction generated xid assignment records.
+		 */
 		buf = ProcessTwoPhaseBuffer(xid,
 				gxact->prepare_start_lsn,
-				gxact->ondisk, false, false,
-				false);
+				gxact->ondisk, true, false);
 		if (buf == NULL)
 			continue;
 
@@ -1940,25 +1949,6 @@ RecoverPreparedTransactions(void)
 		bufptr += MAXALIGN(hdr->ninvalmsgs * sizeof(SharedInvalidationMessage));
 
 		/*
-		 * It's possible that SubTransSetParent has been set before, if
-		 * the prepared transaction generated xid assignment records. Test
-		 * here must match one used in AssignTransactionId().
-		 */
-		if (InHotStandby && (hdr->nsubxacts >= PGPROC_MAX_CACHED_SUBXIDS ||
-							 XLogLogicalInfoActive()))
-			overwriteOK = true;
-
-		/*
-		 * Reconstruct subtrans state for the transaction --- needed
-		 * because pg_subtrans is not preserved over a restart.  Note that
-		 * we are linking all the subtransactions directly to the
-		 * top-level XID; there may originally have been a more complex
-		 * hierarchy, but there's no need to restore that exactly.
-		 */
-		for (i = 0; i < hdr->nsubxacts; i++)
-			SubTransSetParent(subxids[i], xid, true);
-
-		/*
 		 * Recreate its GXACT and dummy PGPROC. But, check whether
 		 * it was added in redo and already has a shmem entry for
 		 * it.
@@ -2006,8 +1996,7 @@ RecoverPreparedTransactions(void)
  * Given a transaction id, read it either from disk or read it directly
  * via shmem xlog record pointer using the provided "prepare_start_lsn".
  *
- * If setParent is true, then use the overwriteOK parameter to set up
- * subtransaction parent linkages.
+ * If setParent is true, set up subtransaction parent linkages.
  *
  * If setNextXid is true, set ShmemVariableCache->nextXid to the newest
  * value scanned.
@@ -2015,7 +2004,7 @@ RecoverPreparedTransactions(void)
 static char *
 ProcessTwoPhaseBuffer(TransactionId xid,
 					  XLogRecPtr prepare_start_lsn,
-					  bool fromdisk, bool overwriteOK,
+					  bool fromdisk,
 					  bool setParent, bool setNextXid)
 {
 	TransactionId origNextXid = ShmemVariableCache->nextXid;
@@ -2142,7 +2131,7 @@ ProcessTwoPhaseBuffer(TransactionId xid,
 		}
 
 		if (setParent)
-			SubTransSetParent(subxid, xid, overwriteOK);
+			SubTransSetParent(subxid, xid);
 	}
 
 	return buf;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 92b263aea1..605639b0c3 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -559,7 +559,7 @@ AssignTransactionId(TransactionState s)
 		XactTopTransactionId = s->transactionId;
 
 	if (isSubXact)
-		SubTransSetParent(s->transactionId, s->parent->transactionId, false);
+		SubTransSetParent(s->transactionId, s->parent->transactionId);
 
 	/*
 	 * If it's a top-level transaction, the predicate locking system needs to
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c7667879c6..a89d99838a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6930,7 +6930,7 @@ StartupXLOG(void)
 
 				ProcArrayApplyRecoveryInfo(&running);
 
-				StandbyRecoverPreparedTransactions(false);
+				StandbyRecoverPreparedTransactions();
 			}
 		}
 
@@ -9692,7 +9692,7 @@ xlog_redo(XLogReaderState *record)
 
 			ProcArrayApplyRecoveryInfo(&running);
 
-			StandbyRecoverPreparedTransactions(true);
+			StandbyRecoverPreparedTransactions();
 		}
 
 		/* ControlFile->checkPointCopy always tracks the latest ckpt XID */
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index ebf6a92923..4976bb03c7 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -943,7 +943,7 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
 	 * have attempted to SubTransSetParent().
 	 */
 	for (i = 0; i < nsubxids; i++)
-		SubTransSetParent(subxids[i], topxid, false);
+		SubTransSetParent(subxids[i], topxid);
 
 	/* KnownAssignedXids isn't maintained yet, so we're done for now */
 	if (standbyState == STANDBY_INITIALIZED)
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
index bd30f5861f..847359873a 100644
--- a/src/include/access/subtrans.h
+++ b/src/include/access/subtrans.h
@@ -14,7 +14,7 @@
 /* Number of SLRU buffers to use for subtrans */
 #define NUM_SUBTRANS_BUFFERS	32
 
-extern void SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK);
+extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
 extern TransactionId SubTransGetTopmostTransaction(TransactionId xid);
 
diff --git a/src/include/access/twophase.h b/src/include/access/twophase.h
index 4d547c5553..80ec4ca4a5 100644
--- a/src/include/access/twophase.h
+++ b/src/include/access/twophase.h
@@ -46,7 +46,7 @@ extern bool StandbyTransactionIdIsPrepared(TransactionId xid);
 
 extern TransactionId PrescanPreparedTransactions(TransactionId **xids_p,
 							int *nxids_p);
-extern void StandbyRecoverPreparedTransactions(bool overwriteOK);
+extern void StandbyRecoverPreparedTransactions(void);
 extern void RecoverPreparedTransactions(void);
 
 extern void CheckPointTwoPhase(XLogRecPtr redo_horizon);

#22

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Simon Riggs (#21)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

Simon Riggs <simon@2ndquadrant.com> writes:

I've added code following Michael and Tom's comments to the previous
patch. New patch attached.

Couple of minor suggestions:

* Rather than deleting the comment for SubTransSetParent entirely,
maybe make it say "It's possible that the parent was already recorded.
However, we should never be asked to change an already-set entry to
something else."

* In SubTransGetTopmostTransaction, maybe it'd be better to spell
"TransactionIdFollowsOrEquals" as "!TransactionIdPrecedes" to make
it look more like the test just above. Matter of taste though.

* Less a matter of taste is that I think that should be just elog(ERROR);
there's no good reason to make it FATAL.

* Also, I think there should be a comment there, along the lines of
"check for reversed linkage to ensure this isn't an infinite loop".

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23

Michael Paquier

michael.paquier@gmail.com

over 8 years ago

In reply to: Tom Lane (#22)

Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

On Thu, Apr 27, 2017 at 2:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

I've added code following Michael and Tom's comments to the previous
patch. New patch attached.

Couple of minor suggestions:

* Rather than deleting the comment for SubTransSetParent entirely,
maybe make it say "It's possible that the parent was already recorded.
However, we should never be asked to change an already-set entry to
something else."

* In SubTransGetTopmostTransaction, maybe it'd be better to spell
"TransactionIdFollowsOrEquals" as "!TransactionIdPrecedes" to make
it look more like the test just above. Matter of taste though.

* Less a matter of taste is that I think that should be just elog(ERROR);
there's no good reason to make it FATAL.

* Also, I think there should be a comment there, along the lines of
"check for reversed linkage to ensure this isn't an infinite loop".

No more comments from here, thanks for working on the patch.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers