pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

Started by Michael Paquier · about 7 years ago · 14 messages
#1 Michael Paquier
michael@paquier.xyz

Avoid duplicate XIDs at recovery when building initial snapshot

On a primary, sets of XLOG_RUNNING_XACTS records are generated on a
periodic basis to allow recovery to build the initial state of
transactions for a hot standby. The set of transaction IDs is created
by scanning all the entries in ProcArray. However, this logic never
accounted for the fact that a two-phase transaction finishing its
prepare phase can put ProcArray in a state where two entries carry the
same transaction ID: one for the initial transaction, which gets
cleared when the prepare finishes, and a second, dummy, entry to track
that the transaction is still running after the prepare finishes. This
ensures a continuous presence of the transaction, so that callers of,
for example, TransactionIdIsInProgress() are always able to see it as
alive.

So, if an XLOG_RUNNING_XACTS record takes a standby snapshot while a
two-phase transaction finishes preparing, the record can end up with
duplicated XIDs, which is a state expected by design. If this record
gets applied on a standby to initialize its recovery state, the standby
would simply fail; still, the odds of facing this failure are very low
in practice. It would be tempting to change the generation of
XLOG_RUNNING_XACTS so that duplicates are removed at the source, but
this requires holding ProcArrayLock for longer, which would impact all
workloads, particularly those using two-phase transactions heavily.

As XLOG_RUNNING_XACTS is actually used only to initialize the standby
state at recovery, the solution taken instead is to discard duplicates
when applying the initial snapshot.

Diagnosed-by: Konstantin Knizhnik
Author: Michael Paquier
Discussion: /messages/by-id/0c96b653-4696-d4b4-6b5d-78143175d113@postgrespro.ru
Backpatch-through: 9.3

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/1df21ddb19c6e764fc9378c900515a5d642ad820

Modified Files
--------------
src/backend/storage/ipc/procarray.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)

#2 Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#1)
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

On 2018-10-14 13:26:24 +0000, Michael Paquier wrote:

Avoid duplicate XIDs at recovery when building initial snapshot

On a primary, sets of XLOG_RUNNING_XACTS records are generated on a
periodic basis to allow recovery to build the initial state of
transactions for a hot standby. The set of transaction IDs is created
by scanning all the entries in ProcArray. However, this logic never
accounted for the fact that a two-phase transaction finishing its
prepare phase can put ProcArray in a state where two entries carry the
same transaction ID: one for the initial transaction, which gets
cleared when the prepare finishes, and a second, dummy, entry to track
that the transaction is still running after the prepare finishes. This
ensures a continuous presence of the transaction, so that callers of,
for example, TransactionIdIsInProgress() are always able to see it as
alive.

So, if an XLOG_RUNNING_XACTS record takes a standby snapshot while a
two-phase transaction finishes preparing, the record can end up with
duplicated XIDs, which is a state expected by design. If this record
gets applied on a standby to initialize its recovery state, the standby
would simply fail; still, the odds of facing this failure are very low
in practice. It would be tempting to change the generation of
XLOG_RUNNING_XACTS so that duplicates are removed at the source, but
this requires holding ProcArrayLock for longer, which would impact all
workloads, particularly those using two-phase transactions heavily.

As XLOG_RUNNING_XACTS is actually used only to initialize the standby
state at recovery, the solution taken instead is to discard duplicates
when applying the initial snapshot.

Diagnosed-by: Konstantin Knizhnik
Author: Michael Paquier
Discussion: /messages/by-id/0c96b653-4696-d4b4-6b5d-78143175d113@postgrespro.ru
Backpatch-through: 9.3

I'm unhappy this approach was taken over objections. Without a real
warning. Even leaving the crummyness aside, did you check other users
of XLOG_RUNNING_XACTS, e.g. logical decoding?

Greetings,

Andres Freund

#3 Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#2)
1 attachment(s)
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

(moving to -hackers)

On Sun, Oct 14, 2018 at 10:42:40AM -0700, Andres Freund wrote:

I'm unhappy this approach was taken over objections. Without a real
warning.

Oops, that was not clear to me. Sorry about that! I did not see you
objecting again after the last arguments I raised. The end of PREPARE
TRANSACTION has been designed this way since it was introduced, so
changing the way the dummy GXACT is inserted before the main entry is
cleared of its XID does not sound wise to me. The current design is
also present for a couple of reasons; please see this thread:
/messages/by-id/13905.1119109281@sss.pgh.pa.us
This has resulted in e26b0abd.

Among the options I considered are:
- Clearing the XID at the same time the dummy entry is added, which
actually means holding ProcArrayLock longer while doing more work at the
end of prepare. I actually don't think you can do that cleanly without
endangering transaction visibility for other backends, and syncrep
may cause the window to get wider.
- Changing GetRunningTransactionData() so that duplicates are removed at
this stage. However, this also requires holding ProcArrayLock for
longer. For most deployments, if no dummy entries from 2PC transactions
are present it would be possible to bypass the checks removing the
duplicated entries, but if at least one dummy entry is found the whole
ProcArray would need to be scanned again, which would most likely happen
at each checkpoint with workloads like the one Konstantin mentioned
in the original report. And ProcArrayLock is already a point of
contention for many OLTP workloads with small transactions, so the
performance argument worries me.

Speaking of which, I have looked at the performance of qsort, and for a
couple of thousand entries we may not see any impact. But I am not
confident enough to say that it would be OK on all platforms each time a
standby snapshot is taken, even when sorting 4-byte elements, so the
performance argument from Konstantin seems quite sensible to me (see
the quickly-hacked qsort_perf.c attached).

Even leaving the crummyness aside, did you check other users of
XLOG_RUNNING_XACTS, e.g. logical decoding?

I actually spent some time checking that, so it is not innocent.
SnapBuildWaitSnapshot() waits for transactions to commit or abort based
on the XIDs in the record. And that's the only place where those XIDs
are used so it won't matter to wait twice for the same transaction to
finish. The error callback would be used only once.
--
Michael

Attachments:

qsort_perf.c (text/x-csrc; charset=us-ascii)
#4 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#2)
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

On 2018-Oct-14, Andres Freund wrote:

On 2018-10-14 13:26:24 +0000, Michael Paquier wrote:

Avoid duplicate XIDs at recovery when building initial snapshot

I'm unhappy this approach was taken over objections. Without a real
warning. Even leaving the crummyness aside, did you check other users
of XLOG_RUNNING_XACTS, e.g. logical decoding?

Mumble. Is there a real problem here -- I mean, did this break logical
decoding? Maybe we need some more tests to ensure this stuff works
sanely for logical decoding.

FWIW and not directly related, I recently became aware (because of a
customer question) that txid_current_snapshot() is oblivious of
XLOG_RUNNING_XACTS in a standby. AFAICS it's not a really serious
concern, but it was surprising.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5 Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#4)
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

Hi,

On 2018-10-22 12:36:25 -0300, Alvaro Herrera wrote:

On 2018-Oct-14, Andres Freund wrote:

On 2018-10-14 13:26:24 +0000, Michael Paquier wrote:

Avoid duplicate XIDs at recovery when building initial snapshot

I'm unhappy this approach was taken over objections. Without a real
warning. Even leaving the crummyness aside, did you check other users
of XLOG_RUNNING_XACTS, e.g. logical decoding?

Mumble. Is there a real problem here -- I mean, did this break logical
decoding? Maybe we need some more tests to ensure this stuff works
sanely for logical decoding.

Hm? My point is that this fix just puts a band-aid onto *one* of the
places that read XLOG_RUNNING_XACTS, which still leaves the contents
of the WAL record corrupted. There's not even a note at the WAL record's
definition or its logging denoting that the contents are not what you'd
expect. I don't mean that the fix would break logical decoding, but
that it's possible that an equivalent of the problem affecting hot
standby also affects logical decoding. And even leaving those two users
aside, it's possible that there will be further vulnerable internal
users or extensions parsing the WAL.

FWIW and not directly related, I recently became aware (because of a
customer question) that txid_current_snapshot() is oblivious of
XLOG_RUNNING_XACTS in a standby. AFAICS it's not a really serious
concern, but it was surprising.

That's more fundamental than just XLOG_RUNNING_XACTS though, no? The
whole way running transactions (i.e. including those that are just
detected by looking at their xid) are handled in the known xids struct
and in snapshots seems incompatible with that, no?

Greetings,

Andres Freund

#6 Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#3)
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

Hi,

On 2018-10-22 12:03:26 +0900, Michael Paquier wrote:

(moving to -hackers)

On Sun, Oct 14, 2018 at 10:42:40AM -0700, Andres Freund wrote:

I'm unhappy this approach was taken over objections. Without a real
warning.

Oops, that was not clear to me. Sorry about that! I did not see you
objecting again after the last arguments I raised. The end of PREPARE
TRANSACTION has been designed this way since it was introduced, so
changing the way the dummy GXACT is inserted before the main entry is
cleared of its XID does not sound wise to me. The current design is
also present for a couple of reasons; please see this thread:
/messages/by-id/13905.1119109281@sss.pgh.pa.us
This has resulted in e26b0abd.

None of them explains why having "corrupt" WAL that's later fixed up is
a good idea.

Among the options I considered are:
- Clearing the XID at the same time the dummy entry is added, which
actually means holding ProcArrayLock longer while doing more work at the
end of prepare. I actually don't think you can do that cleanly without
endangering transaction visibility for other backends, and syncrep
may cause the window to get wider.

- Changing GetRunningTransactionData() so that duplicates are removed at
this stage. However, this also requires holding ProcArrayLock for
longer.

That's *MUCH* better than what we have right
now. GetRunningTransactionData() isn't called all that often, for one
thing, and for another the WAL then is correct.

Even leaving the crummyness aside, did you check other users of
XLOG_RUNNING_XACTS, e.g. logical decoding?

I actually spent some time checking that, so it is not innocent.

"innocent"?

SnapBuildWaitSnapshot() waits for transactions to commit or abort based
on the XIDs in the record. And that's the only place where those XIDs
are used so it won't matter to wait twice for the same transaction to
finish. The error callback would be used only once.

Right. We used to use it more (and it'd probably caused problems), but
since 955a684e0401954a58e956535107bc4b7136d952 it should be ok...

Greetings,

Andres Freund

#7 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#5)
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

On 2018-Oct-22, Andres Freund wrote:

Hi,

On 2018-10-22 12:36:25 -0300, Alvaro Herrera wrote:

On 2018-Oct-14, Andres Freund wrote:

On 2018-10-14 13:26:24 +0000, Michael Paquier wrote:

Avoid duplicate XIDs at recovery when building initial snapshot

I'm unhappy this approach was taken over objections. Without a real
warning. Even leaving the crummyness aside, did you check other users
of XLOG_RUNNING_XACTS, e.g. logical decoding?

Mumble. Is there a real problem here -- I mean, did this break logical
decoding? Maybe we need some more tests to ensure this stuff works
sanely for logical decoding.

Hm? My point is that this fix just puts a band-aid onto *one* of the
places that read XLOG_RUNNING_XACTS, which still leaves the contents
of the WAL record corrupted. There's not even a note at the WAL record's
definition or its logging denoting that the contents are not what you'd
expect. I don't mean that the fix would break logical decoding, but
that it's possible that an equivalent of the problem affecting hot
standby also affects logical decoding. And even leaving those two users
aside, it's possible that there will be further vulnerable internal
users or extensions parsing the WAL.

Ah! I misinterpreted what you were saying. I agree we shouldn't let
the WAL message have wrong data. Of course we shouldn't just add a
code comment stating that the data is wrong :-)

FWIW and not directly related, I recently became aware (because of a
customer question) that txid_current_snapshot() is oblivious of
XLOG_RUNNING_XACTS in a standby. AFAICS it's not a really serious
concern, but it was surprising.

That's more fundamental than just XLOG_RUNNING_XACTS though, no? The
whole way running transactions (i.e. including those that are just
detected by looking at their xid) are handled in the known xids struct
and in snapshots seems incompatible with that, no?

hmm ... as I recall, txid_current_snapshot also only considers xacts by
xid, so read-only xacts are not considered -- seems to me that if you
think of snapshots in a general way, you're right, but for whatever you
want txid_current_snapshot for (and I don't quite remember what that is)
then it's not really important.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8 Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#7)
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

On Mon, Oct 22, 2018 at 07:15:38PM -0300, Alvaro Herrera wrote:

On 2018-Oct-22, Andres Freund wrote:

Hm? My point is that this fix just puts a band-aid onto *one* of the
places that read XLOG_RUNNING_XACTS, which still leaves the contents
of the WAL record corrupted. There's not even a note at the WAL record's
definition or its logging denoting that the contents are not what you'd
expect. I don't mean that the fix would break logical decoding, but
that it's possible that an equivalent of the problem affecting hot
standby also affects logical decoding. And even leaving those two users
aside, it's possible that there will be further vulnerable internal
users or extensions parsing the WAL.

Ah! I misinterpreted what you were saying. I agree we shouldn't let
the WAL message have wrong data. Of course we shouldn't just add a
code comment stating that the data is wrong :-)

Well, following the same line of thought, txid_current_snapshot() uses
sort_snapshot() to remove all the duplicates after fetching its data
from GetSnapshotData(), so wouldn't we want to do something about
removing duplicates when dummy PGXACT entries are found while scanning
the ProcArray in this case as well? What I think we should do is patch
not only GetRunningTransactionData() but also GetSnapshotData() so that
we don't get duplicates in that case either, and do things in such a way
that both code paths use the same logic, removing the need for
sort_snapshot() entirely. That would be more costly though...
--
Michael

#9 Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#8)
1 attachment(s)
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

On Tue, Oct 23, 2018 at 10:43:38AM +0900, Michael Paquier wrote:

Well, following the same line of thought, txid_current_snapshot() uses
sort_snapshot() to remove all the duplicates after fetching its data
from GetSnapshotData(), so wouldn't we want to do something about
removing duplicates when dummy PGXACT entries are found while scanning
the ProcArray in this case as well? What I think we should do is patch
not only GetRunningTransactionData() but also GetSnapshotData() so that
we don't get duplicates in that case either, and do things in such a way
that both code paths use the same logic, removing the need for
sort_snapshot() entirely. That would be more costly though...

My apologies, this took a bit longer than I thought. I have had a patch
on my desk for a couple of days, and finally took the time to finish
something which addresses the concerns raised here. As long as we don't
reach more than hundreds of thousands of entries, there is not going to
be any performance impact. So what I do in the attached is to revert
1df21ddb, and then have GetRunningTransactionData() sort the XIDs in the
snapshot and remove duplicates, for both xids and subxids, only if at
least one dummy proc entry is found while scanning. This way, there is
no need to impact most deployments with the extra sort/removal phase, as
most don't use two-phase transactions. The sorting done at recovery when
initializing the standby snapshot still needs to happen, of course.

The patch is added to the upcoming CF for review.

Thanks,
--
Michael

Attachments:

duplicate-xid-snapshot-v1.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 908f62d37e..625c8ddf48 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -799,21 +799,10 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 		qsort(xids, nxids, sizeof(TransactionId), xidComparator);
 
 		/*
-		 * Add the sorted snapshot into KnownAssignedXids.  The running-xacts
-		 * snapshot may include duplicated xids because of prepared
-		 * transactions, so ignore them.
+		 * Add the sorted snapshot into KnownAssignedXids.
 		 */
 		for (i = 0; i < nxids; i++)
-		{
-			if (i > 0 && TransactionIdEquals(xids[i - 1], xids[i]))
-			{
-				elog(DEBUG1,
-					 "found duplicated transaction %u for KnownAssignedXids insertion",
-					 xids[i]);
-				continue;
-			}
 			KnownAssignedXidsAdd(xids[i], xids[i], true);
-		}
 
 		KnownAssignedXidsDisplay(trace_recovery(DEBUG3));
 	}
@@ -1924,10 +1913,9 @@ ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc)
  * This is never executed during recovery so there is no need to look at
  * KnownAssignedXids.
  *
- * Dummy PGXACTs from prepared transaction are included, meaning that this
- * may return entries with duplicated TransactionId values coming from
- * transaction finishing to prepare.  Nothing is done about duplicated
- * entries here to not hold on ProcArrayLock more than necessary.
+ * If dummy entries for prepared transactions are found in the proc array,
+ * make sure to discard any duplicates, as a transaction finishing its
+ * prepare phase may leave two entries with the same TransactionId.
  *
  * We don't worry about updating other counters, we want to keep this as
  * simple as possible and leave GetSnapshotData() as the primary code for
@@ -1951,6 +1939,7 @@ GetRunningTransactionData(void)
 	int			count;
 	int			subcount;
 	bool		suboverflowed;
+	bool		found_dummy = false;
 
 	Assert(!RecoveryInProgress());
 
@@ -1999,6 +1988,7 @@ GetRunningTransactionData(void)
 	{
 		int			pgprocno = arrayP->pgprocnos[index];
 		volatile PGXACT *pgxact = &allPgXact[pgprocno];
+		volatile PGPROC *proc = &allProcs[pgprocno];
 		TransactionId xid;
 
 		/* Fetch xid just once - see GetNewTransactionId */
@@ -2022,6 +2012,13 @@ GetRunningTransactionData(void)
 		if (pgxact->overflowed)
 			suboverflowed = true;
 
+		/*
+		 * Mark that a dummy entry has been found, and that it is necessary to
+		 * check for duplicate entries.
+		 */
+		if (proc->pid == 0)
+			found_dummy = true;
+
 		/*
 		 * If we wished to exclude xids this would be the right place for it.
 		 * Procs with the PROC_IN_VACUUM flag set don't usually assign xids,
@@ -2033,6 +2030,34 @@ GetRunningTransactionData(void)
 		xids[count++] = xid;
 	}
 
+	/*
+	 * If some dummy entries have been found, it is possible that a two-phase
+	 * transaction which just finished preparing still has its original entry
+	 * in the ProcArray next to its dummy entry, creating a duplicate.  There
+	 * is no need to remove any duplicates if no dummy entries were found.
+	 */
+	if (found_dummy && count > 1)
+	{
+		TransactionId last = 0;
+		int			idx1,
+					idx2;
+		int			former_count;
+
+		qsort(xids, count, sizeof(TransactionId), xidComparator);
+
+		/* remove duplicate xids */
+		former_count = count;
+		idx1 = idx2 = 0;
+		while (idx1 < former_count)
+		{
+			if (!TransactionIdEquals(xids[idx1], last))
+				last = xids[idx2++] = xids[idx1];
+			else
+				count--;
+			idx1++;
+		}
+	}
+
 	/*
 	 * Spin over procArray collecting all subxids, but only if there hasn't
 	 * been a suboverflow.
@@ -2065,6 +2090,40 @@ GetRunningTransactionData(void)
 				 */
 			}
 		}
+
+		/*
+		 * As the first scan of the array found dummy proc entries, sort the
+		 * subxid range and scan it again, this time removing duplicate
+		 * subxids.
+		 */
+		if (found_dummy && subcount > 1)
+		{
+			TransactionId last = 0;
+			int			idx1,
+						idx2;
+			int			former_count;
+
+			qsort(&xids[count - subcount], subcount,
+				  sizeof(TransactionId), xidComparator);
+
+			/*
+			 * Remove duplicate subxids, beginning from the position they have
+			 * been added.
+			 */
+			former_count = count;
+			idx1 = idx2 = count - subcount;
+			while (idx1 < former_count)
+			{
+				if (!TransactionIdEquals(xids[idx1], last))
+					last = xids[idx2++] = xids[idx1];
+				else
+				{
+					count--;
+					subcount--;
+				}
+				idx1++;
+			}
+		}
 	}
 
 	/*
#10 Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Michael Paquier (#9)
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

On Thu, Nov 1, 2018 at 7:09 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Oct 23, 2018 at 10:43:38AM +0900, Michael Paquier wrote:

Well, following the same line of thought, txid_current_snapshot() uses
sort_snapshot() to remove all the duplicates after fetching its data
from GetSnapshotData(), so wouldn't we want to do something about
removing duplicates when dummy PGXACT entries are found while scanning
the ProcArray in this case as well? What I think we should do is patch
not only GetRunningTransactionData() but also GetSnapshotData() so that
we don't get duplicates in that case either, and do things in such a way
that both code paths use the same logic, removing the need for
sort_snapshot() entirely. That would be more costly though...

My apologies, this took a bit longer than I thought. I have had a patch
on my desk for a couple of days, and finally took the time to finish
something which addresses the concerns raised here. As long as we don't
reach more than hundreds of thousands of entries, there is not going to
be any performance impact. So what I do in the attached is to revert
1df21ddb, and then have GetRunningTransactionData() sort the XIDs in the
snapshot and remove duplicates, for both xids and subxids, only if at
least one dummy proc entry is found while scanning. This way, there is
no need to impact most deployments with the extra sort/removal phase, as
most don't use two-phase transactions. The sorting done at recovery when
initializing the standby snapshot still needs to happen, of course.

The patch is added to the upcoming CF for review.

Unfortunately, the patch has some conflicts with current master. Could you
please post a rebased version?

#11 Simon Riggs
simon@2ndquadrant.com
In reply to: Michael Paquier (#9)
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

On Thu, 1 Nov 2018 at 06:09, Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Oct 23, 2018 at 10:43:38AM +0900, Michael Paquier wrote:

Well, following the same line of thought, txid_current_snapshot() uses
sort_snapshot() to remove all the duplicates after fetching its data
from GetSnapshotData(), so wouldn't we want to do something about
removing duplicates when dummy PGXACT entries are found while scanning
the ProcArray in this case as well? What I think we should do is patch
not only GetRunningTransactionData() but also GetSnapshotData() so that
we don't get duplicates in that case either, and do things in such a way
that both code paths use the same logic, removing the need for
sort_snapshot() entirely. That would be more costly though...

My apologies, this took a bit longer than I thought. I have had a patch
on my desk for a couple of days, and finally took the time to finish
something which addresses the concerns raised here. As long as we don't
reach more than hundreds of thousands of entries, there is not going to
be any performance impact. So what I do in the attached is to revert
1df21ddb, and then have GetRunningTransactionData() sort the XIDs in the
snapshot and remove duplicates, for both xids and subxids, only if at
least one dummy proc entry is found while scanning. This way, there is
no need to impact most deployments with the extra sort/removal phase, as
most don't use two-phase transactions. The sorting done at recovery when
initializing the standby snapshot still needs to happen, of course.

The patch is added to the upcoming CF for review.

1df21ddb looks OK to me and was simple enough to backpatch safely.

Seems excessive to say that the WAL record is corrupt; it just contains
duplicates, just as exported snapshots do. There are a few other
imprecise things in WAL, which is why we need the RunningXact data in
the first place. So we have a choice of whether to remove the
duplicates eagerly or lazily.

For GetRunningTransactionData(), we can do eager or lazy, since it's
not a foreground process. I don't object to changing it to be eager in
this path, but this patch is more complex than 1df21ddb and I don't
think we should backpatch this change, assuming it is acceptable.

This patch doesn't do it, but the suggestion that we touch
GetSnapshotData() in the same way so we de-duplicate eagerly is a different
matter and would need careful performance testing to ensure we don't slow
down 2PC users.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12 Michael Paquier
michael@paquier.xyz
In reply to: Simon Riggs (#11)
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

On Fri, Nov 30, 2018 at 02:55:47PM +0000, Simon Riggs wrote:

1df21ddb looks OK to me and was simple enough to backpatch safely.

Thanks for the feedback!

Seems excessive to say that the WAL record is corrupt; it just contains
duplicates, just as exported snapshots do. There are a few other
imprecise things in WAL, which is why we need the RunningXact data in
the first place. So we have a choice of whether to remove the
duplicates eagerly or lazily.

For GetRunningTransactionData(), we can do eager or lazy, since it's
not a foreground process. I don't object to changing it to be eager in
this path, but this patch is more complex than 1df21ddb and I don't
think we should backpatch this change, assuming it is acceptable.

Yes, I would avoid a backpatch for this more complicated one, and
we need a solution for already-generated WAL. It is not complicated to
handle duplicates for xacts and subxacts; however, holding ProcArrayLock
for a longer time worries me, as it is already a bottleneck.

This patch doesn't do it, but the suggestion that we touch
GetSnapshotData() in the same way so we de-duplicate eagerly is a different
matter and would need careful performance testing to ensure we don't slow
down 2PC users.

Definitely. That's a much hotter code path. I am not completely sure
that's an effort worth pursuing, either...
--
Michael

#13 Simon Riggs
simon@2ndquadrant.com
In reply to: Michael Paquier (#12)
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

On Fri, 30 Nov 2018 at 23:08, Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Nov 30, 2018 at 02:55:47PM +0000, Simon Riggs wrote:

1df21ddb looks OK to me and was simple enough to backpatch safely.

Thanks for the feedback!

Seems excessive to say that the WAL record is corrupt; it just contains
duplicates, just as exported snapshots do. There are a few other
imprecise things in WAL, which is why we need the RunningXact data in
the first place. So we have a choice of whether to remove the
duplicates eagerly or lazily.

For GetRunningTransactionData(), we can do eager or lazy, since it's
not a foreground process. I don't object to changing it to be eager in
this path, but this patch is more complex than 1df21ddb and I don't
think we should backpatch this change, assuming it is acceptable.

Yes, I would avoid a backpatch for this more complicated one, and
we need a solution for already-generated WAL.

Yes, that is an important reason not to backpatch.

It is not complicated to
handle duplicates for xacts and subxacts; however, holding ProcArrayLock
for a longer time worries me, as it is already a bottleneck.

I hadn't realised this patch holds ProcArrayLock while removing
duplicates. Now that I know, I vote against applying this patch unless
someone can show that the performance effects of doing so are
negligible, which I doubt.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#14 Michael Paquier
michael@paquier.xyz
In reply to: Simon Riggs (#13)
Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

On Sat, Dec 01, 2018 at 10:51:10AM +0000, Simon Riggs wrote:

On Fri, 30 Nov 2018 at 23:08, Michael Paquier <michael@paquier.xyz> wrote:

It is not complicated to
handle duplicates for xacts and subxacts; however, holding ProcArrayLock
for a longer time worries me, as it is already a bottleneck.

I hadn't realised this patch holds ProcArrayLock while removing
duplicates. Now that I know, I vote against applying this patch unless
someone can show that the performance effects of doing so are
negligible, which I doubt.

Me too, after more thought on this. Please note that I have marked the
patch as returned with feedback for now.
--
Michael