out-of-order XID insertion in KnownAssignedXids
Hi hackers,
It looks like there is a bug in the logging of running transaction XIDs with
prepared transactions. One of our customers gets the error "FATAL:
out-of-order XID insertion in KnownAssignedXids" while trying to apply a
backup.
WAL contains the following record:
rmgr: Standby len (rec/tot): 98/ 98, tx: 0, lsn:
1418/A9A76C90, prev 1418/A9A76C48, desc: RUNNING_XACTS nextXid
2004495309 latestCompletedXid 2004495307 oldestRunningXid 2004495290; 3
xacts: 2004495290 2004495308 2004495308
As you can notice, XID 2004495308 is encountered twice, which causes an error
in KnownAssignedXidsAdd:

	if (head > tail &&
		TransactionIdFollowsOrEquals(KnownAssignedXids[head - 1],
									 from_xid))
	{
		KnownAssignedXidsDisplay(LOG);
		elog(ERROR, "out-of-order XID insertion in KnownAssignedXids");
	}
The probability of this error is very small, but it can be quite easily
reproduced: just set a breakpoint in a debugger after the call to
MarkAsPrepared in twophase.c and then try to prepare any transaction.
MarkAsPrepared will add the GXACT to the proc array, and at this moment there
will be two entries in procarray with the same XID:
(gdb) p procArray->numProcs
$2 = 4
(gdb) p allPgXact[procArray->pgprocnos[0]]
$4 = {xid = 513976717, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0
'\000', delayChkpt = 1 '\001', nxids = 0 '\000',
used = 0 '\000', parent = 0x0}
(gdb) p allPgXact[procArray->pgprocnos[1]]
$5 = {xid = 0, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0 '\000',
delayChkpt = 0 '\000', nxids = 0 '\000',
used = 0 '\000', parent = 0x0}
(gdb) p allPgXact[procArray->pgprocnos[2]]
$6 = {xid = 0, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0 '\000',
delayChkpt = 0 '\000', nxids = 0 '\000',
used = 0 '\000', parent = 0x0}
(gdb) p allPgXact[procArray->pgprocnos[3]]
$7 = {xid = 513976717, xmin = 0, vacuumFlags = 0 '\000', overflowed = 0
'\000', delayChkpt = 0 '\000', nxids = 0 '\000',
used = 0 '\000', parent = 0x0}
Then you just have to wait until the checkpoint timeout is triggered and it
logs a snapshot:
(gdb) bt
#0 0x00000000007f3dab in GetRunningTransactionData () at procarray.c:2240
#1 0x00000000007fab22 in LogStandbySnapshot () at standby.c:943
#2 0x000000000077cde8 in BackgroundWriterMain () at bgwriter.c:331
#3 0x00000000005377f3 in AuxiliaryProcessMain (argc=2,
argv=0x7ffe00aa00e0) at bootstrap.c:446
#4 0x000000000078e07e in StartChildProcess (type=BgWriterProcess) at
postmaster.c:5323
#5 0x000000000078b6f0 in reaper (postgres_signal_arg=17) at
postmaster.c:2948
#6 <signal handler called>
#7 0x00007f1356d665b3 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:84
#8 0x0000000000789931 in ServerLoop () at postmaster.c:1765
#9 0x000000000078906c in PostmasterMain (argc=3, argv=0x1902640) at
postmaster.c:1406
#10 0x00000000006d0e4f in main (argc=3, argv=0x1902640) at main.c:228
Now the generated RUNNING_XACTS record contains duplicated XIDs.
I want to ask the community's opinion about the best way to fix this
problem.
Should we avoid storing duplicated XIDs in procarray (by invalidating the
XID in the original pgxact), or eliminate/change the check for duplicates in
KnownAssignedXidsAdd (for example, just ignore duplicates)?
On Fri, Oct 05, 2018 at 10:06:45AM +0300, Konstantin Knizhnik wrote:
As you can notice, XID 2004495308 is encountered twice which causes an error
in KnownAssignedXidsAdd. [snip]
Now generated RUNNING_XACTS record contains duplicated XIDs.
So, I have been doing exactly that, and if you trigger a manual
checkpoint then things happen quite correctly if you let the first
session finish:
rmgr: Standby len (rec/tot): 58/ 58, tx: 0, lsn:
0/016150F8, prev 0/01615088, desc: RUNNING_XACTS nextXid 608
latestCompletedXid 605 oldestRunningXid 606; 2 xacts: 607 606
If you are still holding the session in the debugger after calling
MarkAsPrepared, then the manual checkpoint blocks. Now if you actually keep
the debugger attached and wait for a checkpoint timeout to happen, then I can
see the incorrect record. It is impressive that your customer was able to see
that first, and then that you were able to get into that state with simple
steps.
I want to ask opinion of community about the best way of fixing this
problem. Should we avoid storing duplicated XIDs in procarray (by
invalidating XID in original pgaxct) or eliminate/change check for
duplicate in KnownAssignedXidsAdd (for example just ignore
duplicates)?
Hmmmmm... Please let me think through that first. It seems to me that
the record should not be generated to begin with. At least I am able to
confirm what you see.
--
Michael
On 05.10.2018 11:04, Michael Paquier wrote:
So, I have been doing exactly that, and if you trigger a manual checkpoint
then things happen quite correctly if you let the first session finish.
[snip] It is impressive that your customer has been able to see that first,
and then that you have been able to get into that state with simple steps.
There are about 1000 active clients performing 2PC transactions, so if you
perform a backup (which does a checkpoint), the probability seems to be large
enough.
I have reproduced this problem without using gdb by just running many 2PC
transactions and checkpoints in parallel:

for ((i=1;i<10;i++))
do
    pgbench -n -T 300000 -M prepared -f t$i.sql postgres > t$i.log &
done
pgbench -n -T 300000 -f checkpoint.sql postgres > checkpoint.log &
wait
------------------------------
tN.sql:
begin;
update t set val=val+1 where pk=N;
prepare transaction 'tN';
commit prepared 'tN';
------------------------------
checkpoint.sql:
checkpoint;
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 05.10.2018 11:04, Michael Paquier wrote:
Hmmmmm... Please let me think through that first. It seems to me that
the record should not be generated to begin with. At least I am able to
confirm what you see.
The simplest way to fix the problem is to ignore duplicates before
adding them to KnownAssignedXids.
We perform a sort in this place in any case...
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment: runnings_xids.patch (text/x-patch)
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index bd20497..a2ea233 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -807,9 +807,12 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 	/*
 	 * Add the sorted snapshot into KnownAssignedXids
 	 */
-	for (i = 0; i < nxids; i++)
-		KnownAssignedXidsAdd(xids[i], xids[i], true);
-
+	KnownAssignedXidsAdd(xids[0], xids[0], true);
+	for (i = 1; i < nxids; i++)
+	{
+		if (xids[i] != xids[i-1])
+			KnownAssignedXidsAdd(xids[i], xids[i], true);
+	}
 	KnownAssignedXidsDisplay(trace_recovery(DEBUG3));
 }
On Mon, Oct 08, 2018 at 12:04:28PM +0300, Konstantin Knizhnik wrote:
The simplest way to fix the problem is to ignore duplicates before adding
them to KnownAssignedXids.
We perform a sort in this place in any case...
I may of course be missing something, but shouldn't we avoid having
duplicates in the first place?
--
Michael
On 08.10.2018 12:14, Michael Paquier wrote:
On Mon, Oct 08, 2018 at 12:04:28PM +0300, Konstantin Knizhnik wrote:
The simplest way to fix the problem is to ignore duplicates before adding
them to KnownAssignedXids. We perform a sort in this place in any case...

I may of course be missing something, but shouldn't we avoid having
duplicates in the first place?
The reason for the duplicated XIDs appearing in the 2PC case seems to be
clear.
It may be possible to eliminate it by clearing the XID of MyPgXact for the
prepared transaction.
But there are two problems with it:
1. I am not sure that it will not break something.
2. There is an obvious race condition between adding the GXACT with
ProcArrayAdd and invalidating the XID of the current transaction.
If it is cleared before calling ProcArrayAdd, then there will be some
moment when the XID is not present in procarray.
If it is done after calling ProcArrayAdd, then it is still possible to
see a duplicated XID in procarray.
From my point of view it is easier and less invasive to exclude
duplicates while replaying the RUNNING_XACTS record.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On October 8, 2018 2:04:28 AM PDT, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
The simplest way to fix the problem is to ignore duplicates before
adding them to KnownAssignedXids.
We perform a sort in this place in any case...
I vehemently object to that as the proper course.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 08.10.2018 18:24, Andres Freund wrote:
I vehemently object to that as the proper course.
And what about adding qsort to GetRunningTransactionData or
LogCurrentRunningXacts and excluding duplicates there?
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 2018-10-08 18:28:52 +0300, Konstantin Knizhnik wrote:
And what about adding qsort to GetRunningTransactionData or
LogCurrentRunningXacts and excluding duplicates there?
Sounds less terrible, but still pretty bad. I think we should fix the
underlying data inconsistency, not paper over it a couple hundred meters
away.
Greetings,
Andres Freund
On Mon, Oct 08, 2018 at 09:30:49AM -0700, Andres Freund wrote:
Sounds less terrible, but still pretty bad. I think we should fix the
underlying data inconsistency, not paper over it a couple hundred meters
away.
+1. I am looking at how to make that possible.
--
Michael
On Tue, Oct 09, 2018 at 02:59:00PM +0900, Michael Paquier wrote:
+1. I am looking at how to make that possible.
And so... going through a bit of investigation, the problem is that each
2PC transaction being prepared finishes by putting the procarray in a state
where there are two entries referring to the same running transaction, as
MarkAsPrepared adds an extra entry on top of the one already existing. This
is expected, though, per the comments in ProcArrayClearTransaction(), which
assumes that MyProc can be safely cleaned and that there is a duplicated
entry. Then GetRunningTransactionData() ignores the fact that those
duplicate entries can exist, which makes the snapshot data taken incorrect,
generating those incorrect XLOG_RUNNING_XACTS records.
The simplest fix I can think of first is to tweak the origin of the problem
in GetRunningTransactionData() so that those duplicate XIDs are ignored if
found, as there has to be a window between the time MarkAsPrepared() inserts
the 2PC entry in the procarray and the time ProcArrayClearTransaction() gets
called.
--
Michael
On 09.10.2018 10:52, Michael Paquier wrote:
The simplest fix I can think of first is to tweak the origin of the problem
in GetRunningTransactionData() so that those duplicate XIDs are ignored if
found. [snip]
Right now GetRunningTransactionData is used only once in the Postgres code:
in LogStandbySnapshot, to log the RUNNING_XACTS record.
So as long as this situation does not change, there is no difference whether
we perform the sort and exclude duplicates in LogStandbySnapshot or in
GetRunningTransactionData.
But GetRunningTransactionData may be used in some other cases... For
example, I have used it in my snapfs Postgres extension (fast file-level
snapshots) to check that there are no more active transactions in the
system. In my case the presence of duplicates is not important and ordering
of XIDs is not needed. Performance of GetRunningTransactionData is also not
critical for me, but I can suspect that there are cases where it is critical
and doing an unneeded qsort is not desired: there may be thousands of active
transactions, and sorting their XIDs is not instantaneous.
Also please take into account that in most cases the XIDs in the
RUNNING_XACTS record are not processed at all (only if we perform recovery
from a backup and have not yet reached the consistency point). This is why
this bug was not detected before.
So I completely understand the argument that it is better to eliminate the
source of the problem (the presence of duplicates in the RUNNING_XACTS
record), but I am not sure that in this particular case it is really the
best solution. If the presence of duplicates is considered acceptable for
procarray, why can we not accept it for the RUNNING_XACTS record?
I am not insisting that skipping duplicates in ProcArrayApplyRecoveryInfo is
the right place (but at least it seems to be the most efficient
alternative). But I also do not feel that moving the sort to
GetRunningTransactionData and eliminating duplicates there (which requires
one more scan and copying of the whole array) is fundamentally better...
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Oct 09, 2018 at 05:26:49PM +0300, Konstantin Knizhnik wrote:
But GetRunningTransactionData may be used in some other cases... For
example I have used it in my snapfs Postgres extensions (fast
file-level snapshots) to check if there are no more active
transactions in the system. In my case presence of duplicates is not
important and ordering of XIDs is not needed. But performance of
GetRunningTransactionData is also not critical. But I can suspect that
there can be cases when it is critical and doing unneeded qsort is not
desired. There may be thousands of active transactions and sorting
their XIDs is not instantaneous.
I am not sure the performance argument is really that compelling (it could
be as you say), but another thing we could argue is that the presence of
duplicate entries in GetRunningTransactionData() can be used as a way to
detect that 2PC transactions are running, and that of the two entries, one
is a dummy entry while the other is pending to be cleared.
So I completely understand the argument that it is better to eliminate
source of the problem (presence of duplicates in RUNNING_XACTS record)
but not sure that in this particular case it is really the best
solution. If presence of duplicates is considered to be acceptable for
procarray, why we can not accept it for RUNNING_XACTS record?
This won't solve the case where records have already been generated and
won't be processed correctly, but given how unlikely that is, we can rather
safely say that fixing only the source is fine.
I am not insisting that skipping duplicates in ProcArrayApplyRecoveryInfo is
the right place (but at least it seems to be the most efficient
alternative). But I also do not feel that moving the sort to
GetRunningTransactionData and eliminating duplicates there (requires one
more scan and copying of the whole array) is fundamentally better...
Making recovery a couple of instructions shorter is always a good thing in
my opinion. Users don't care much if backups take a long time, and sweat
less if restores take a shorter time.
Hence, what about doing roughly the following:
1) Document that GetRunningTransactionData() fetches information also
about 2PC entries, like this:
* GetRunningTransactionData -- returns information about running transactions.
*
* Similar to GetSnapshotData but returns more information. We include
- * all PGXACTs with an assigned TransactionId, even VACUUM processes.
+ * all PGXACTs with an assigned TransactionId, even VACUUM processes and
+ * dummy two-phase related entries created when preparing the transaction.
2) Update LogStandbySnapshot so that the list of XIDs fetched is sorted.
This can then be used as a sanity check at recovery via
ProcArrayApplyRecoveryInfo().
--
Michael
On Wed, Oct 10, 2018 at 11:22:45AM +0900, Michael Paquier wrote:
I am not sure if the performance argument is actually this much
sensible, it could be as you say, but another thing that we could argue
about is that the presence of duplicate entries in
GetRunningTransactionData() can be used as a point to understand that
2PC transactions are running, and that among the two, one of them is a
dummy entry while the other is pending for being cleared.
Actually there would be a performance impact for any deployment if we were
to do so, as ProcArrayLock is held and we would need to scan procArray four
times instead of twice. Many people already complain (justly) that ProcArray
is a performance bottleneck; it would be a bad idea to make things worse...
1) Document that GetRunningTransactionData() fetches information also about
2PC entries. [snip]
2) Update LogStandbySnapshot so as the list of XIDs fetched is sorted and
ordered. This can be used as a sanity check for recovery via
ProcArrayApplyRecoveryInfo().
This also would increase the pressure on ProcArrayLock with wal_level =
logical, as the WAL record needs to be built while holding the lock.
So let's do as Konstantin is suggesting by skipping duplicated XIDs
after applying the qsort(). The current code is also doing a bad
documentation job, so we should mention that GetRunningTransactionData
may return duplicated XIDs because of the dummy 2PC entries which
overlap with the active ones, and also add a proper comment in
ProcArrayApplyRecoveryInfo(). Konstantin, do you want to give it a try
with a patch? Or should I?
--
Michael
On 11.10.2018 12:06, Michael Paquier wrote:
So let's do as Konstantin is suggesting by skipping duplicated XIDs after
applying the qsort(). [snip] Konstantin, do you want to give it a try with a
patch? Or should I?
Proposed patch is attached.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment: runnings_xids-2.patch (text/x-patch)
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index bf2f4db..bdf1f65 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -793,13 +793,8 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 	}
 
 	/*
-	 * Sort the array so that we can add them safely into
-	 * KnownAssignedXids.
-	 */
-	qsort(xids, nxids, sizeof(TransactionId), xidComparator);
-
-	/*
-	 * Add the sorted snapshot into KnownAssignedXids
+	 * Add the sorted snapshot into KnownAssignedXids,
+	 * xids are already sorted by LogCurrentRunningXacts
 	 */
 	for (i = 0; i < nxids; i++)
 		KnownAssignedXidsAdd(xids[i], xids[i], true);
@@ -1898,7 +1893,8 @@ ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc)
  * GetRunningTransactionData -- returns information about running transactions.
  *
  * Similar to GetSnapshotData but returns more information. We include
- * all PGXACTs with an assigned TransactionId, even VACUUM processes.
+ * all PGXACTs with an assigned TransactionId, even VACUUM processes and
+ * dummy two-phase related entries created when preparing the transaction.
  *
  * We acquire XidGenLock and ProcArrayLock, but the caller is responsible for
  * releasing them. Acquiring XidGenLock ensures that no new XIDs enter the proc
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index c9bb3e9..57b0e26 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -29,6 +29,7 @@
 #include "storage/procarray.h"
 #include "storage/sinvaladt.h"
 #include "storage/standby.h"
+#include "utils/builtins.h"
 #include "utils/hsearch.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -977,9 +978,20 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts)
 
 	/* array of TransactionIds */
 	if (xlrec.xcnt > 0)
-		XLogRegisterData((char *) CurrRunningXacts->xids,
-						 (xlrec.xcnt + xlrec.subxcnt) * sizeof(TransactionId));
+	{
+		int			i, j, nxids = xlrec.xcnt + xlrec.subxcnt;
+		TransactionId *xids = CurrRunningXacts->xids;
+
+		/* Sort XIDs array and exclude duplicated XIDs caused by two-phase related entries */
+		qsort(xids, nxids, sizeof(TransactionId), xidComparator);
+		for (i = 0, j = 1; j < nxids; j++)
+			if (xids[i] != xids[j])
+				xids[++i] = xids[j];
+
+		nxids = i + 1;
+		XLogRegisterData((char *) xids, nxids * sizeof(TransactionId));
+	}
 
 	recptr = XLogInsert(RM_STANDBY_ID, XLOG_RUNNING_XACTS);
 
 	if (CurrRunningXacts->subxid_overflow)
On Thu, Oct 11, 2018 at 08:04:11PM +0300, Konstantin Knizhnik wrote:
Proposed patch is attached.
The problem I have with this patch doing the duplicate removal and
qsort() work in LogCurrentRunningXacts is that it would still hold
ProcArrayLock until the WAL record has been written with wal_level =
logical. With many sessions and 2PC transactions running around, this
would be a performance hit for all deployments every time a checkpoint
happens or every time the bgwriter decides to log a standby snapshot.
Hence I would instead go with the attached, which does the legwork at
recovery, a one-time code path as you mentioned. Okay, this makes
recovery a bit longer, but that's way better than impacting all
deployments of Postgres, even those not using 2PC in normal operation.
And as the sorting phase already happens there, we just need to do
something like the attached.
One thing we could also do for HEAD is to record in
RunningTransactionsData whether a transaction comes from a 2PC entry,
allowing recovery to do more sanity checks. This would require a WAL
format change. The proposed patch needs to be back-patched down to
9.3.
--
Michael
Attachments:
duplicated-xids-recovery.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index bf2f4dbed2..6317b856a1 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -799,10 +799,20 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
qsort(xids, nxids, sizeof(TransactionId), xidComparator);
/*
- * Add the sorted snapshot into KnownAssignedXids
+ * Add the sorted snapshot into KnownAssignedXids. The running-xacts
+ * snapshot may include duplicated xids because of prepared
+ * transactions, so ignore them.
*/
for (i = 0; i < nxids; i++)
+ {
+ if (i > 0 && TransactionIdEquals(xids[i - 1], xids[i]))
+ {
+ elog(DEBUG1, "found duplicated transaction %u from prepared transaction for KnownAssignedXids",
+ xids[i]);
+ continue;
+ }
KnownAssignedXidsAdd(xids[i], xids[i], true);
+ }
KnownAssignedXidsDisplay(trace_recovery(DEBUG3));
}
@@ -1898,7 +1908,8 @@ ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc)
* GetRunningTransactionData -- returns information about running transactions.
*
* Similar to GetSnapshotData but returns more information. We include
- * all PGXACTs with an assigned TransactionId, even VACUUM processes.
+ * all PGXACTs with an assigned TransactionId, even VACUUM processes and
+ * prepared transactions.
*
* We acquire XidGenLock and ProcArrayLock, but the caller is responsible for
* releasing them. Acquiring XidGenLock ensures that no new XIDs enter the proc
@@ -1912,6 +1923,11 @@ ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc)
* This is never executed during recovery so there is no need to look at
* KnownAssignedXids.
*
+ * Dummy PGXACTs from prepared transaction are included, meaning that this
+ * may return entries with duplicated TransactionId values coming from
+ * transaction finishing to prepare. Nothing is done about duplicated
+ * entries here to not hold on ProcArrayLock more than necessary.
+ *
* We don't worry about updating other counters, we want to keep this as
* simple as possible and leave GetSnapshotData() as the primary code for
* that bookkeeping.