From f3a3f0d088576e52534f13fc3cd3c23b409d8a9f Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig@2ndquadrant.com>
Date: Thu, 9 Mar 2017 10:04:20 +0800
Subject: [PATCH 1/3] Fix off-by-one in logical slot resource retention

START_REPLICATION ... LOGICAL, the snapshot builder's
SnapBuildXactNeedsSkip(...), etc treat a LSN that exactly equals the start of a
commit record as being inclusive, i.e. we should process that commit and replay
it to the client. If the client does not specify a commit, the start value used
is the replication slot's confirmed_flush lsn, so "confirmed flush" really
means "confirmed flushed up to and not including" - we're replayng the
transaction so we're assuming the client does not already have it.

This is consistent with usage elsewhere, such as walsender's sentPtr meaning
"sent up to but not including this point".

However, LogicalIncreaseXminForSlot(...) and
LogicalIncreaseRestartDecodingForSlot(...) treat confirmed_flush as meaning
"client has flushed everything up to and including this LSN" so they may remove
resources needed by a commit that started at exactly confirmed_flush.

This isn't a big issue in practice. Typically a walsender-using client will
promptly send standby status updates that advance confirmed_flush past the
start-of-commit LSN. The SQL interface advances the replication identifier
to the LSN of end-of-commit + 1, so it is unaffected.
---
 src/backend/replication/logical/logical.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 5529ac8..7e03f33 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -778,8 +778,11 @@ LogicalIncreaseXminForSlot(XLogRecPtr current_lsn, TransactionId xmin)
 	 * If the client has already confirmed up to this lsn, we directly can
 	 * mark this as accepted. This can happen if we restart decoding in a
 	 * slot.
+	 *
+	 * confirmed_flush actually points to the first unconfirmed LSN, so we must
+	 * not remove resources exactly at confirmed_flush.
 	 */
-	else if (current_lsn <= slot->data.confirmed_flush)
+	else if (current_lsn < slot->data.confirmed_flush)
 	{
 		slot->candidate_catalog_xmin = xmin;
 		slot->candidate_xmin_lsn = current_lsn;
@@ -833,8 +836,12 @@ LogicalIncreaseRestartDecodingForSlot(XLogRecPtr current_lsn, XLogRecPtr restart
 	/*
 	 * We might have already flushed far enough to directly accept this lsn,
 	 * in this case there is no need to check for existing candidate LSNs
+	 *
+	 * The < is intentional, confirmed_flush points to the LSN immediately
+	 * after the point confirmed on-disk by the client so we mustn't remove
+	 * resources for commits exactly at confirmed_flush.
 	 */
-	else if (current_lsn <= slot->data.confirmed_flush)
+	else if (current_lsn < slot->data.confirmed_flush)
 	{
 		slot->candidate_restart_valid = current_lsn;
 		slot->candidate_restart_lsn = restart_lsn;
-- 
2.5.5

