A few patches to clarify snapshot management

Started by Heikki Linnakangas, about 1 year ago, 13 messages

hlinnaka@iki.fi

about 1 year ago

4 attachment(s)

While working on the CSN snapshot patch, I got sidetracked looking
closer into the snapshot tracking in snapmgr.c. Attached are a few
patches to clarify some things.

# Patch 1: Remove unnecessary GetTransactionSnapshot() calls and FIXME comments

In commit dc7420c2c927, Andres added FIXME comments like these in a few
places:

autovacuum.c, get_database_list(void):

/*
* Start a transaction so we can access pg_database, and get a snapshot.
* We don't have a use for the snapshot itself, but we're interested in
* the secondary effect that it sets RecentGlobalXmin. (This is critical
* for anything that reads heap pages, because HOT may decide to prune
* them even if the process doesn't attempt to modify any tuples.)
*
* FIXME: This comment is inaccurate / the code buggy. A snapshot that is
* not pushed/active does not reliably prevent HOT pruning (->xmin could
* e.g. be cleared when cache invalidations are processed).
*/
StartTransactionCommand();
(void) GetTransactionSnapshot();

Those GetTransactionSnapshot() calls are unnecessary, because we hold
onto a registered copy of CatalogSnapshot throughout the catalog scans.
This patch removes those unnecessary calls, and the FIXMEs.

# Patch 2: Assert that a snapshot is active or registered before it's used

GetTransactionSnapshot() comment said:

* Note that the return value may point at static storage that will be modified
* by future calls and by CommandCounterIncrement(). Callers should call
* RegisterSnapshot or PushActiveSnapshot on the returned snap if it is to be
* used very long.

That's pretty vague. Firstly, it says the returned value _may_ point to
static storage, but ISTM it _always_ does, if you interpret "static
storage" liberally. Some callers actually rely on the fact that you can
call GetTransactionSnapshot() and throw away the result without having a
leak. So I propose rewording that to "return value points at static
storage", rather than just "may point".

In REPEATABLE READ mode, the returned CurrentSnapshot is palloc'd, not a
pointer directly to a static variable, but all calls within the same
transaction return the same palloc'd Snapshot pointer, and will be
modified by CommandCounterIncrement(). From the caller's point of view,
it's like a static.
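To make the aliasing concrete, here is a toy model in plain C. It is not
PostgreSQL code: the ToySnapshot type and all toy_* functions are made up
for illustration, and the model behaves like READ COMMITTED, where each
call takes a fresh snapshot into the same static struct. It only shows
why a pointer to the static copy goes stale while a registered copy does
not.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for SnapshotData; only the fields the demo needs. */
typedef struct ToySnapshotData
{
	unsigned int xmin;
	int			regd_count;
	int			copied;
} ToySnapshotData;
typedef ToySnapshotData *ToySnapshot;

/* Plays the role of CurrentSnapshotData (hypothetical). */
static ToySnapshotData toy_current_data;

/* Hypothetical source of ever-increasing xmin values. */
static unsigned int toy_next_xmin = 100;

/* Every call overwrites and returns the same static struct. */
ToySnapshot
toy_get_transaction_snapshot(void)
{
	toy_current_data.xmin = toy_next_xmin++;
	toy_current_data.regd_count = 0;
	toy_current_data.copied = 0;
	return &toy_current_data;
}

/* Registering makes a private copy that later calls cannot overwrite. */
ToySnapshot
toy_register_snapshot(ToySnapshot snap)
{
	ToySnapshot copy = malloc(sizeof(ToySnapshotData));

	if (copy == NULL)
		abort();
	memcpy(copy, snap, sizeof(ToySnapshotData));
	copy->copied = 1;
	copy->regd_count = 1;
	return copy;
}

/* Drop one reference; free the copy when the last one goes away. */
void
toy_unregister_snapshot(ToySnapshot snap)
{
	if (--snap->regd_count == 0)
		free(snap);
}
```

In this model, two calls to toy_get_transaction_snapshot() return the
same pointer and the second call changes what the first pointer sees,
whereas the result of toy_register_snapshot() keeps its original xmin.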

Secondly, what exactly is "used very long"? It means until the next call
of any of the Get*Snapshot() functions, CommandCounterIncrement(), or
anything that might call SnapshotResetXmin() like PopActiveSnapshot().
Given how complicated that gets, I feel it's dangerous to do pretty much
anything other than immediately call PushActiveSnapshot() or
RegisterSnapshot() with it. To try to enforce that, this patch adds an
assertion in HeapTupleSatisfiesMVCC() that the snapshot must be
registered or pushed active. That's not a very accurate check of that
stricter rule: some callers were violating the new assertion and had
comments to explain why it was safe, and OTOH it won't catch calls to
those invalidating functions that don't involve visibility checks.

We were violating that assertion in a few places, which were not wrong
and had explaining comments, but this patch changes them to just
register the snapshot instead of explaining why it's safe to skip it.
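As a rough sketch of what the new check amounts to, again as a toy in
plain C rather than the real code: the struct, the toy_* names and the
one-line visibility rule are all simplifications I made up for
illustration. The only part taken from the patch is the condition
regd_count > 0 || active_count > 0. Where the patch uses Assert(), the
toy returns -1 so that the behaviour is easy to observe.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for SnapshotData (hypothetical, for illustration only). */
typedef struct ToySnapshotData
{
	unsigned int xmin;
	int			regd_count;		/* references held by registrations */
	int			active_count;	/* references held by the active stack */
} ToySnapshotData;

/* Same condition as the assertion added to HeapTupleSatisfiesMVCC(). */
bool
toy_snapshot_is_pinned(const ToySnapshotData *snap)
{
	return snap->regd_count > 0 || snap->active_count > 0;
}

/*
 * Grossly simplified visibility test: a tuple counts as visible if its
 * inserting xid is older than the snapshot's xmin.  Real MVCC visibility
 * involves much more than this.
 *
 * Returns 1 if visible, 0 if not, and -1 if the snapshot is neither
 * registered nor active (the patch would trip an Assert instead).
 */
int
toy_tuple_visible(unsigned int tuple_xmin, const ToySnapshotData *snap)
{
	if (!toy_snapshot_is_pinned(snap))
		return -1;
	return tuple_xmin < snap->xmin ? 1 : 0;
}
```

Note that such a check only fires when a visibility test actually runs,
which is why it cannot catch every misuse of an unregistered snapshot.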

# Patch 3: Add comment with more details on active snapshots

Now that I have this swapped in my head, I wrote a few paragraphs on how
the active snapshot stack works at high level.

# Patch 4: Add checks that no snapshots are "leaked"

This patch is not to be committed right now, just for discussion.

I'm not very happy with how GetTransactionSnapshot() and friends return
a statically allocated snapshot. The whole "return value should not be
used very long" thing is just so vague. If we changed it to return a
palloc'd snapshot, would we introduce leaks? This patch adds assertions
that every call to GetTransactionSnapshot() is paired with a
PushActiveSnapshot() or RegisterSnapshot() call, and changes a few
places that were violating that stricter rule. Some of those changes
seem nice anyway, like registering the snapshot in verify_heapam(), even
though they're not strictly necessary today.

A perhaps better way to enforce that would be to replace
GetTransactionSnapshot() with functions that also push or register the
snapshot:

RegisterSnapshot(GetTransactionSnapshot()) -> RegisterTransactionSnapshot()

PushActiveSnapshot(GetTransactionSnapshot()) -> PushTransactionSnapshot()

Those function signatures would eliminate the concept of a returned
statically-allocated snapshot, and with it the whole question of what
"used very long" means in GetTransactionSnapshot(). Thoughts on that?
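For discussion, the shape of such combined functions could look roughly
like the toy below. This is a self-contained sketch in plain C, not a
patch: the ToySnapshot type, the fixed-size stack and every toy_* name
are assumptions made for illustration, and the real functions would of
course be built on GetSnapshotData(), resource owners and the existing
active snapshot stack. The point is only that the static snapshot never
escapes to the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for SnapshotData (hypothetical). */
typedef struct ToySnapshotData
{
	unsigned int xmin;
	int			regd_count;
	int			active_count;
} ToySnapshotData;
typedef ToySnapshotData *ToySnapshot;

#define TOY_MAX_ACTIVE 8

static ToySnapshotData toy_static_snapshot;	/* never handed to callers */
static unsigned int toy_next_xmin = 100;	/* hypothetical xmin source */
static ToySnapshot toy_active_stack[TOY_MAX_ACTIVE];
static int	toy_active_depth = 0;

/* Refresh the static snapshot and return a private heap copy of it. */
static ToySnapshot
toy_take_copy(void)
{
	ToySnapshot copy = malloc(sizeof(ToySnapshotData));

	if (copy == NULL)
		abort();
	toy_static_snapshot.xmin = toy_next_xmin++;
	*copy = toy_static_snapshot;
	copy->regd_count = 0;
	copy->active_count = 0;
	return copy;
}

/* Toy counterpart of the proposed RegisterTransactionSnapshot(). */
ToySnapshot
toy_register_transaction_snapshot(void)
{
	ToySnapshot snap = toy_take_copy();

	snap->regd_count = 1;
	return snap;
}

void
toy_unregister_snapshot(ToySnapshot snap)
{
	assert(snap->regd_count > 0);
	if (--snap->regd_count == 0 && snap->active_count == 0)
		free(snap);
}

/* Toy counterpart of the proposed PushTransactionSnapshot(). */
void
toy_push_transaction_snapshot(void)
{
	ToySnapshot snap;

	assert(toy_active_depth < TOY_MAX_ACTIVE);
	snap = toy_take_copy();
	snap->active_count = 1;
	toy_active_stack[toy_active_depth++] = snap;
}

/* Top of the toy active stack, or NULL if it is empty. */
ToySnapshot
toy_get_active_snapshot(void)
{
	return toy_active_depth > 0 ? toy_active_stack[toy_active_depth - 1] : NULL;
}

void
toy_pop_active_snapshot(void)
{
	ToySnapshot snap;

	assert(toy_active_depth > 0);
	snap = toy_active_stack[--toy_active_depth];
	if (--snap->active_count == 0 && snap->regd_count == 0)
		free(snap);
}
```

With an interface like this, every snapshot a caller can get hold of is
already registered or active, so there is no window in which it could be
invalidated behind the caller's back.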

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

0001-Remove-unnecessary-GetTransactionSnapshot-calls-and-.patch (text/x-patch; charset=UTF-8)

From 7a32da753d05819c991d93cce3e3174f5a142238 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 4 Dec 2024 18:10:40 +0200
Subject: [PATCH 1/5] Remove unnecessary GetTransactionSnapshot() calls and
 FIXME comments

In get_database_list() and get_subscription_list(), the
GetTransactionSnapshot() call is not required because the catalog
table scans use the catalog snapshot, which is held until the end of
the scan. See table_beginscan_catalog(), which calls
RegisterSnapshot(GetCatalogSnapshot(relid)).

In InitPostgres, it's a little less obvious that it's not required,
but still true I believe. All the catalog lookups in InitPostgres()
also use the catalog snapshot, and looked up values are copied.

Furthermore, as the removed comments said, calling
GetTransactionSnapshot() didn't really prevent MyProc->xmin from being
reset anyway.
---
 src/backend/postmaster/autovacuum.c        | 11 +----------
 src/backend/replication/logical/launcher.c | 11 +----------
 src/backend/utils/init/postinit.c          | 13 +------------
 3 files changed, 3 insertions(+), 32 deletions(-)

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index dc3cf87abab..8078eeef62e 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1799,18 +1799,9 @@ get_database_list(void)
 	resultcxt = CurrentMemoryContext;
 
 	/*
-	 * Start a transaction so we can access pg_database, and get a snapshot.
-	 * We don't have a use for the snapshot itself, but we're interested in
-	 * the secondary effect that it sets RecentGlobalXmin.  (This is critical
-	 * for anything that reads heap pages, because HOT may decide to prune
-	 * them even if the process doesn't attempt to modify any tuples.)
-	 *
-	 * FIXME: This comment is inaccurate / the code buggy. A snapshot that is
-	 * not pushed/active does not reliably prevent HOT pruning (->xmin could
-	 * e.g. be cleared when cache invalidations are processed).
+	 * Start a transaction so we can access pg_database.
 	 */
 	StartTransactionCommand();
-	(void) GetTransactionSnapshot();
 
 	rel = table_open(DatabaseRelationId, AccessShareLock);
 	scan = table_beginscan_catalog(rel, 0, NULL);
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index e5fdca8bbf6..8b196420445 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -121,18 +121,9 @@ get_subscription_list(void)
 	resultcxt = CurrentMemoryContext;
 
 	/*
-	 * Start a transaction so we can access pg_database, and get a snapshot.
-	 * We don't have a use for the snapshot itself, but we're interested in
-	 * the secondary effect that it sets RecentGlobalXmin.  (This is critical
-	 * for anything that reads heap pages, because HOT may decide to prune
-	 * them even if the process doesn't attempt to modify any tuples.)
-	 *
-	 * FIXME: This comment is inaccurate / the code buggy. A snapshot that is
-	 * not pushed/active does not reliably prevent HOT pruning (->xmin could
-	 * e.g. be cleared when cache invalidations are processed).
+	 * Start a transaction so we can access pg_subscription.
 	 */
 	StartTransactionCommand();
-	(void) GetTransactionSnapshot();
 
 	rel = table_open(SubscriptionRelationId, AccessShareLock);
 	scan = table_beginscan_catalog(rel, 0, NULL);
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 5b657a3f135..770ab6906e7 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -813,16 +813,7 @@ InitPostgres(const char *in_dbname, Oid dboid,
 	}
 
 	/*
-	 * Start a new transaction here before first access to db, and get a
-	 * snapshot.  We don't have a use for the snapshot itself, but we're
-	 * interested in the secondary effect that it sets RecentGlobalXmin. (This
-	 * is critical for anything that reads heap pages, because HOT may decide
-	 * to prune them even if the process doesn't attempt to modify any
-	 * tuples.)
-	 *
-	 * FIXME: This comment is inaccurate / the code buggy. A snapshot that is
-	 * not pushed/active does not reliably prevent HOT pruning (->xmin could
-	 * e.g. be cleared when cache invalidations are processed).
+	 * Start a new transaction here before first access to db.
 	 */
 	if (!bootstrap)
 	{
@@ -837,8 +828,6 @@ InitPostgres(const char *in_dbname, Oid dboid,
 		 * Fortunately, "read committed" is plenty good enough.
 		 */
 		XactIsoLevel = XACT_READ_COMMITTED;
-
-		(void) GetTransactionSnapshot();
 	}
 
 	/*
-- 
2.39.5

0002-Assert-that-a-snapshot-is-active-or-registered-befor.patch (text/x-patch; charset=UTF-8)

From f6d9c033a1e686b1600a435c857e728f98901997 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 13 Dec 2024 14:26:07 +0200
Subject: [PATCH 2/5] Assert that a snapshot is active or registered before
 it's used

This is to catch potential bugs where a snapshot might get invalidated
while it's still in use. No such bugs were found by this, but the
advice in GetTransactionSnapshot() that you "should call
RegisterSnapshot or PushActiveSnapshot on the returned snap if it is
to be used very long" felt too unclear to me.

Fix a few cases that were playing fast and loose with that and just
assumed that the snapshot cannot be invalidated during a scan. Those
assumptions were not wrong, but they're not performance critical, so
let's drop the excuses and just register the snapshot. This allows us
to have an assertion in HeapTupleSatisfiesMVCC that the snapshot is
appropriately registered.

Adjust the comment in GetTransactionSnapshot() a little, because in a
few places we rely on the fact that GetTransactionSnapshot() returns a
statically allocated Snapshot; we'd have leaks otherwise.
---
 src/backend/access/heap/heapam_visibility.c |  9 +++++++++
 src/backend/access/index/genam.c            |  8 ++------
 src/backend/commands/dbcommands.c           |  3 ++-
 src/backend/utils/cache/relcache.c          | 15 +++++++++------
 src/backend/utils/time/snapmgr.c            |  8 ++++----
 5 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 9243feed01f..6c0f872c730 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -962,6 +962,15 @@ HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
 {
 	HeapTupleHeader tuple = htup->t_data;
 
+	/*
+	 * Assert that the caller has registered the snapshot. This function
+	 * doesn't care about the registration as such, but in general you
+	 * shouldn't try to use a snapshot without registration because it might
+	 * get invalidated while it's still in use, and this is a convenient place
+	 * to check for that.
+	 */
+	Assert(snapshot->regd_count > 0 || snapshot->active_count > 0);
+
 	Assert(ItemPointerIsValid(&htup->t_self));
 	Assert(htup->t_tableOid != InvalidOid);
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 4b4ebff6a17..b12bb6c8d70 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -576,17 +576,13 @@ systable_recheck_tuple(SysScanDesc sysscan, HeapTuple tup)
 
 	Assert(tup == ExecFetchSlotHeapTuple(sysscan->slot, false, NULL));
 
-	/*
-	 * Trust that table_tuple_satisfies_snapshot() and its subsidiaries
-	 * (commonly LockBuffer() and HeapTupleSatisfiesMVCC()) do not themselves
-	 * acquire snapshots, so we need not register the snapshot.  Those
-	 * facilities are too low-level to have any business scanning tables.
-	 */
 	freshsnap = GetCatalogSnapshot(RelationGetRelid(sysscan->heap_rel));
+	freshsnap = RegisterSnapshot(freshsnap);
 
 	result = table_tuple_satisfies_snapshot(sysscan->heap_rel,
 											sysscan->slot,
 											freshsnap);
+	UnregisterSnapshot(freshsnap);
 
 	/*
 	 * Handle the concurrent abort while fetching the catalog tuple during
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index aa91a396967..034c5938c66 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -288,7 +288,7 @@ ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath)
 	 * snapshot - or the active snapshot - might not be new enough for that,
 	 * but the return value of GetLatestSnapshot() should work fine.
 	 */
-	snapshot = GetLatestSnapshot();
+	snapshot = RegisterSnapshot(GetLatestSnapshot());
 
 	/* Process the relation block by block. */
 	for (blkno = 0; blkno < nblocks; blkno++)
@@ -313,6 +313,7 @@ ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath)
 
 		UnlockReleaseBuffer(buf);
 	}
+	UnregisterSnapshot(snapshot);
 
 	/* Release relation lock. */
 	UnlockRelationId(&relid, AccessShareLock);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 422509f18d7..dc299ccb760 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -371,14 +371,13 @@ ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic)
 	pg_class_desc = table_open(RelationRelationId, AccessShareLock);
 
 	/*
-	 * The caller might need a tuple that's newer than the one the historic
-	 * snapshot; currently the only case requiring to do so is looking up the
-	 * relfilenumber of non mapped system relations during decoding. That
-	 * snapshot can't change in the midst of a relcache build, so there's no
-	 * need to register the snapshot.
+	 * The caller might need a tuple that's newer than what's visible to the
+	 * historic snapshot; currently the only case requiring to do so is
+	 * looking up the relfilenumber of non mapped system relations during
+	 * decoding.
 	 */
 	if (force_non_historic)
-		snapshot = GetNonHistoricCatalogSnapshot(RelationRelationId);
+		snapshot = RegisterSnapshot(GetNonHistoricCatalogSnapshot(RelationRelationId));
 
 	pg_class_scan = systable_beginscan(pg_class_desc, ClassOidIndexId,
 									   indexOK && criticalRelcachesBuilt,
@@ -395,6 +394,10 @@ ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic)
 
 	/* all done */
 	systable_endscan(pg_class_scan);
+
+	if (snapshot)
+		UnregisterSnapshot(snapshot);
+
 	table_close(pg_class_desc, AccessShareLock);
 
 	return pg_class_tuple;
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index a1a0c2adeb6..3c408762728 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -203,10 +203,10 @@ typedef struct SerializedSnapshotData
  * GetTransactionSnapshot
  *		Get the appropriate snapshot for a new query in a transaction.
  *
- * Note that the return value may point at static storage that will be modified
- * by future calls and by CommandCounterIncrement().  Callers should call
- * RegisterSnapshot or PushActiveSnapshot on the returned snap if it is to be
- * used very long.
+ * Note that the return value points at static storage that will be modified
+ * by future calls and by CommandCounterIncrement().  Callers must call
+ * RegisterSnapshot or PushActiveSnapshot on the returned snap before doing
+ * any other non-trivial work that could invalidate it.
  */
 Snapshot
 GetTransactionSnapshot(void)
-- 
2.39.5

0003-Add-comment-with-more-details-on-active-snapshots.patch (text/x-patch; charset=UTF-8)

From 0d56ca03b2f290ab8e38e775e8d0b19631233ea0 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 11 Dec 2024 14:22:54 +0200
Subject: [PATCH 3/5] Add comment with more details on active snapshots

---
 src/backend/utils/time/snapmgr.c | 55 ++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 3c408762728..05f16666192 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -3,11 +3,66 @@
  * snapmgr.c
  *		PostgreSQL snapshot manager
  *
+ * The following functions return a snapshot that can be used in visibility
+ * checks:
+ *
+ * - GetTransactionSnapshot
+ * - GetLatestSnapshot
+ * - GetCatalogSnapshot
+ * - GetNonHistoricCatalogSnapshot
+ *
+ * All of these functions return a reference to a statically allocated
+ * snapshot, which must be copied and registered by calling
+ * PushActiveSnapshot() or RegisterSnapshot() before use.
+ *
+ * In addition to the above, there are some special snapshots, like
+ * SnapshotSelf, SnapshotAny, and "dirty" snapshots.
+ *
  * We keep track of snapshots in two ways: those "registered" by resowner.c,
  * and the "active snapshot" stack.  All snapshots in either of them live in
  * persistent memory.  When a snapshot is no longer in any of these lists
  * (tracked by separate refcounts on each snapshot), its memory can be freed.
  *
+ * ActiveSnapshot stack
+ * --------------------
+ *
+ * Most visibility checks use the current "active snapshot".  When running
+ * normal queries, the active snapshot is set when query execution begins,
+ * depending on transaction isolation level.
+ *
+ * The active snapshot is tracked in a stack, so that the currently active one
+ * is at the top of the stack. It mirrors the process call stack: whenever we
+ * recurse or switch context to fetch rows from a different portal for
+ * example, the appropriate snapshot is pushed to become the active snapshot,
+ * and popped on return.  Once upon a time, ActiveSnapshot was just a global
+ * variable that was saved and restored similar to CurrentMemoryContext, but
+ * nowadays it's managed as a separate data structure so that we can keep
+ * track of which snapshots are in use and reset MyProc->xmin when there is no
+ * active snapshot.
+ *
+ * However, there are a couple of exceptions where the active snapshot stack
+ * does not strictly mirror the call stack:
+ *
+ * - VACUUM and a few other utility commands manage their own transactions,
+ *   which take their own snapshots.  They are called with an active snapshot
+ *   set, like most utility commands, but they pop the active snapshot that
+ *   was pushed by the caller. PortalRunUtility knows about the possibility
+ *   that the snapshot it pushed is no longer active on return.
+ *
+ * - When COMMIT or ROLLBACK is executed within a procedure or DO-block, the
+ *   active snapshot stack is destroyed, and re-established later when
+ *   subsequent statements in the procedure are executed.  There are many
+ *   limitations on when in-procedure COMMIT/ROLLBACK is allowed; one such
+ *   limitation is that all the snapshots on the active snapshot stack are
+ *   known to portals that are being executed, which makes it safe to reset
+ *   the stack.  See EnsurePortalSnapshotExists().
+ *
+ * Registered snapshots
+ * --------------------
+ *
+ * In addition to snapshots pushed to the active snapshot stack, a snapshot
+ * can be registered with a resource owner.
+ *
  * The FirstXactSnapshot, if any, is treated a bit specially: we increment its
  * regd_count and list it in RegisteredSnapshots, but this reference is not
  * tracked by a resource owner. We used to use the TopTransactionResourceOwner
-- 
2.39.5

0004-WIP-Add-checks-that-no-snapshots-are-leaked.patch (text/x-patch; charset=UTF-8)

From 6b66d743b5113fe504260b6a230ae23d002c4d66 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 11 Dec 2024 23:59:49 +0200
Subject: [PATCH 4/5] WIP: Add checks that no snapshots are "leaked"

GetTransactionSnapshot() and friends currently return a pointer to a
statically allocated SnapshotData. That makes it OK to call
GetTransactionSnapshot() and throw away the result, without leaking
the snapshot. The comment on GetTransactionSnapshot() said that
"the returned value *may* point to static storage", but we were
actually relying on that in a few places, to not leak.

This adds an assertion that every call to GetTransactionSnapshot() is
paired with a call to PushActiveSnapshot() or RegisterSnapshot(). With
this, GetTransactionSnapshot() could return a dynamically allocated
Snapshot without leaking.
---
 contrib/amcheck/verify_heapam.c        | 16 +++++++----
 src/backend/access/transam/parallel.c  |  4 ++-
 src/backend/executor/execReplication.c |  6 ++--
 src/backend/executor/spi.c             |  3 +-
 src/backend/storage/lmgr/predicate.c   |  7 +++--
 src/backend/tcop/pquery.c              | 24 ++++++++--------
 src/backend/utils/adt/ri_triggers.c    |  9 ++++--
 src/backend/utils/time/snapmgr.c       | 39 +++++++++++++++++++++++++-
 8 files changed, 81 insertions(+), 27 deletions(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index e16557aca36..b0412e3064e 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -231,6 +231,7 @@ verify_heapam(PG_FUNCTION_ARGS)
 	BlockNumber last_block;
 	BlockNumber nblocks;
 	const char *skip;
+	Snapshot	snapshot;
 
 	/* Check supplied arguments */
 	if (PG_ARGISNULL(0))
@@ -272,12 +273,6 @@ verify_heapam(PG_FUNCTION_ARGS)
 	ctx.cached_xid = InvalidTransactionId;
 	ctx.toasted_attributes = NIL;
 
-	/*
-	 * Any xmin newer than the xmin of our snapshot can't become all-visible
-	 * while we're running.
-	 */
-	ctx.safe_xmin = GetTransactionSnapshot()->xmin;
-
 	/*
 	 * If we report corruption when not examining some individual attribute,
 	 * we need attnum to be reported as NULL.  Set that up before any
@@ -338,6 +333,13 @@ verify_heapam(PG_FUNCTION_ARGS)
 		PG_RETURN_NULL();
 	}
 
+	/*
+	 * Any xmin newer than the xmin of our snapshot can't become all-visible
+	 * while we're running.
+	 */
+	snapshot = RegisterSnapshot(GetTransactionSnapshot());
+	ctx.safe_xmin = snapshot->xmin;
+
 	ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
 	ctx.buffer = InvalidBuffer;
 	ctx.page = NULL;
@@ -802,6 +804,8 @@ verify_heapam(PG_FUNCTION_ARGS)
 	if (vmbuffer != InvalidBuffer)
 		ReleaseBuffer(vmbuffer);
 
+	UnregisterSnapshot(snapshot);
+
 	/* Close the associated toast table and indexes, if any. */
 	if (ctx.toast_indexes)
 		toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 0a1e089ec1d..7d6d9636e7e 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -222,7 +222,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	int			i;
 	FixedParallelState *fps;
 	dsm_handle	session_dsm_handle = DSM_HANDLE_INVALID;
-	Snapshot	transaction_snapshot = GetTransactionSnapshot();
+	Snapshot	transaction_snapshot = RegisterSnapshot(GetTransactionSnapshot());
 	Snapshot	active_snapshot = GetActiveSnapshot();
 
 	/* We might be running in a very short-lived memory context. */
@@ -494,6 +494,8 @@ InitializeParallelDSM(ParallelContext *pcxt)
 
 	/* Restore previous memory context. */
 	MemoryContextSwitchTo(oldcontext);
+
+	UnregisterSnapshot(transaction_snapshot);
 }
 
 /*
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 68deea50f66..5a1efe9c3a7 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -285,7 +285,7 @@ retry:
 
 		PushActiveSnapshot(GetLatestSnapshot());
 
-		res = table_tuple_lock(rel, &(outslot->tts_tid), GetLatestSnapshot(),
+		res = table_tuple_lock(rel, &(outslot->tts_tid), GetActiveSnapshot(),
 							   outslot,
 							   GetCurrentCommandId(false),
 							   lockmode,
@@ -443,7 +443,7 @@ retry:
 
 		PushActiveSnapshot(GetLatestSnapshot());
 
-		res = table_tuple_lock(rel, &(outslot->tts_tid), GetLatestSnapshot(),
+		res = table_tuple_lock(rel, &(outslot->tts_tid), GetActiveSnapshot(),
 							   outslot,
 							   GetCurrentCommandId(false),
 							   lockmode,
@@ -500,7 +500,7 @@ retry:
 
 	PushActiveSnapshot(GetLatestSnapshot());
 
-	res = table_tuple_lock(rel, &conflictTid, GetLatestSnapshot(),
+	res = table_tuple_lock(rel, &conflictTid, GetActiveSnapshot(),
 						   *conflictslot,
 						   GetCurrentCommandId(false),
 						   LockTupleShare,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index c1d8fd08c6c..0cc83de5870 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1752,7 +1752,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
 	else
 	{
 		CommandCounterIncrement();
-		snapshot = GetTransactionSnapshot();
+		/* let PortalStart call GetTransactionSnapshot() */
+		snapshot = InvalidSnapshot;
 	}
 
 	/*
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 2030322f957..a365a4359d2 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -3950,6 +3950,8 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 	LWLockRelease(SerializableXactHashLock);
 }
 
+extern List *dangling_snapshots;
+
 /*
  * Tests whether the given top level transaction is concurrent with
  * (overlaps) our current transaction.
@@ -3967,6 +3969,7 @@ XidIsConcurrent(TransactionId xid)
 	Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
 
 	snap = GetTransactionSnapshot();
+	dangling_snapshots = list_delete_ptr(dangling_snapshots, snap);
 
 	if (TransactionIdPrecedes(xid, snap->xmin))
 		return false;
@@ -4214,7 +4217,7 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		}
 		else if (!SxactIsDoomed(sxact)
 				 && (!SxactIsCommitted(sxact)
-					 || TransactionIdPrecedes(GetTransactionSnapshot()->xmin,
+					 || TransactionIdPrecedes(TransactionXmin,
 											  sxact->finishedBefore))
 				 && !RWConflictExists(sxact, MySerializableXact))
 		{
@@ -4227,7 +4230,7 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 			 */
 			if (!SxactIsDoomed(sxact)
 				&& (!SxactIsCommitted(sxact)
-					|| TransactionIdPrecedes(GetTransactionSnapshot()->xmin,
+					|| TransactionIdPrecedes(TransactionXmin,
 											 sxact->finishedBefore))
 				&& !RWConflictExists(sxact, MySerializableXact))
 			{
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 89d704df8d1..4ce8417f63b 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -1242,18 +1242,20 @@ PortalRunMulti(Portal portal,
 				{
 					snapshot = RegisterSnapshot(snapshot);
 					portal->holdSnapshot = snapshot;
-				}
 
-				/*
-				 * We can't have the holdSnapshot also be the active one,
-				 * because UpdateActiveSnapshotCommandId would complain.  So
-				 * force an extra snapshot copy.  Plain PushActiveSnapshot
-				 * would have copied the transaction snapshot anyway, so this
-				 * only adds a copy step when setHoldSnapshot is true.  (It's
-				 * okay for the command ID of the active snapshot to diverge
-				 * from what holdSnapshot has.)
-				 */
-				PushCopiedSnapshot(snapshot);
+					/* XXX
+					 * We can't have the holdSnapshot also be the active one,
+					 * because UpdateActiveSnapshotCommandId would complain.  So
+					 * force an extra snapshot copy.  Plain PushActiveSnapshot
+					 * would have copied the transaction snapshot anyway, so this
+					 * only adds a copy step when setHoldSnapshot is true.  (It's
+					 * okay for the command ID of the active snapshot to diverge
+					 * from what holdSnapshot has.)
+					 */
+					PushCopiedSnapshot(snapshot);
+				}
+				else
+					PushActiveSnapshot(snapshot);
 
 				/*
 				 * As for PORTAL_ONE_SELECT portals, it does not seem
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 3185f48afa6..bb6fa20ee03 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -2465,8 +2465,8 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 	if (IsolationUsesXactSnapshot() && detectNewRows)
 	{
 		CommandCounterIncrement();	/* be sure all my own work is visible */
-		test_snapshot = GetLatestSnapshot();
-		crosscheck_snapshot = GetTransactionSnapshot();
+		test_snapshot = RegisterSnapshot(GetLatestSnapshot());
+		crosscheck_snapshot = RegisterSnapshot(GetTransactionSnapshot());
 	}
 	else
 	{
@@ -2495,6 +2495,11 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 									  test_snapshot, crosscheck_snapshot,
 									  false, false, limit);
 
+	if (test_snapshot != NULL)
+		UnregisterSnapshot(test_snapshot);
+	if (crosscheck_snapshot != NULL)
+		UnregisterSnapshot(crosscheck_snapshot);
+
 	/* Restore UID and security context */
 	SetUserIdAndSecContext(save_userid, save_sec_context);
 
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 05f16666192..f4831ed989c 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -254,6 +254,20 @@ typedef struct SerializedSnapshotData
 	CommandId	curcid;
 } SerializedSnapshotData;
 
+List *dangling_snapshots = NIL;
+
+static inline Snapshot
+add_dangling(Snapshot snapshot)
+{
+	MemoryContext save_cxt = CurrentMemoryContext;
+
+	MemoryContextSwitchTo(TopMemoryContext);
+	dangling_snapshots = lappend(dangling_snapshots, snapshot);
+	MemoryContextSwitchTo(save_cxt);
+	return snapshot;
+}
+
+
 /*
  * GetTransactionSnapshot
  *		Get the appropriate snapshot for a new query in a transaction.
@@ -275,6 +289,7 @@ GetTransactionSnapshot(void)
 	if (HistoricSnapshotActive())
 	{
 		Assert(!FirstSnapshotSet);
+		add_dangling(HistoricSnapshot);
 		return HistoricSnapshot;
 	}
 
@@ -319,17 +334,22 @@ GetTransactionSnapshot(void)
 			CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
 
 		FirstSnapshotSet = true;
+		add_dangling(CurrentSnapshot);
 		return CurrentSnapshot;
 	}
 
 	if (IsolationUsesXactSnapshot())
+	{
+		add_dangling(CurrentSnapshot);
 		return CurrentSnapshot;
+	}
 
 	/* Don't allow catalog snapshot to be older than xact snapshot. */
 	InvalidateCatalogSnapshot();
 
 	CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
 
+	add_dangling(CurrentSnapshot);
 	return CurrentSnapshot;
 }
 
@@ -357,10 +377,12 @@ GetLatestSnapshot(void)
 
 	/* If first call in transaction, go ahead and set the xact snapshot */
 	if (!FirstSnapshotSet)
+	{
 		return GetTransactionSnapshot();
+	}
 
 	SecondarySnapshot = GetSnapshotData(&SecondarySnapshotData);
-
+	add_dangling(SecondarySnapshot);
 	return SecondarySnapshot;
 }
 
@@ -380,7 +402,10 @@ GetCatalogSnapshot(Oid relid)
 	 * finishing decoding.
 	 */
 	if (HistoricSnapshotActive())
+	{
+		add_dangling(HistoricSnapshot);
 		return HistoricSnapshot;
+	}
 
 	return GetNonHistoricCatalogSnapshot(relid);
 }
@@ -426,6 +451,7 @@ GetNonHistoricCatalogSnapshot(Oid relid)
 		pairingheap_add(&RegisteredSnapshots, &CatalogSnapshot->ph_node);
 	}
 
+	add_dangling(CatalogSnapshot);
 	return CatalogSnapshot;
 }
 
@@ -684,6 +710,8 @@ PushActiveSnapshotWithLevel(Snapshot snapshot, int snap_level)
 {
 	ActiveSnapshotElt *newactive;
 
+	dangling_snapshots = list_delete_ptr(dangling_snapshots, snapshot);
+
 	Assert(snapshot != InvalidSnapshot);
 	Assert(ActiveSnapshot == NULL || snap_level >= ActiveSnapshot->as_level);
 
@@ -695,7 +723,9 @@ PushActiveSnapshotWithLevel(Snapshot snapshot, int snap_level)
 	 */
 	if (snapshot == CurrentSnapshot || snapshot == SecondarySnapshot ||
 		!snapshot->copied)
+	{
 		newactive->as_snap = CopySnapshot(snapshot);
+	}
 	else
 		newactive->as_snap = snapshot;
 
@@ -828,6 +858,8 @@ RegisterSnapshotOnOwner(Snapshot snapshot, ResourceOwner owner)
 	if (snapshot == InvalidSnapshot)
 		return InvalidSnapshot;
 
+	dangling_snapshots = list_delete_ptr(dangling_snapshots, snapshot);
+
 	/* Static snapshot?  Create a persistent copy */
 	snap = snapshot->copied ? snapshot : CopySnapshot(snapshot);
 
@@ -1019,6 +1051,11 @@ AtEOXact_Snapshot(bool isCommit, bool resetXmin)
 	}
 	FirstXactSnapshot = NULL;
 
+	foreach_ptr (Snapshot, snapshot, dangling_snapshots)
+	{
+		elog(PANIC, "had a dangling snapshot %p", snapshot);
+	}
+
 	/*
 	 * If we exported any snapshots, clean them up.
 	 */
-- 
2.39.5
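
The bookkeeping in the debugging patch above can be modelled outside the backend. The following is a minimal sketch, not the patch itself: it assumes a fixed-size array instead of a PostgreSQL List, and every `toy_` name is hypothetical. A snapshot pointer is recorded when it is handed out, dropped when it is pushed or registered, and whatever remains at end of transaction was never properly tracked. Where the patch raises PANIC, the sketch only returns a count.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DANGLING 16

/* Hypothetical fixed-size stand-in for the List used in the patch. */
static const void *toy_dangling[TOY_MAX_DANGLING];
static int	toy_ndangling = 0;

/* Called wherever a Get*Snapshot()-style function hands out a snapshot. */
static const void *
toy_add_dangling(const void *snapshot)
{
	assert(toy_ndangling < TOY_MAX_DANGLING);
	toy_dangling[toy_ndangling++] = snapshot;
	return snapshot;
}

/* Called on push/register: drop one matching entry, if there is any. */
static void
toy_clear_dangling(const void *snapshot)
{
	for (int i = 0; i < toy_ndangling; i++)
	{
		if (toy_dangling[i] == snapshot)
		{
			/* Order does not matter here, so fill the hole with the last entry. */
			toy_dangling[i] = toy_dangling[--toy_ndangling];
			return;
		}
	}
}

/* End-of-transaction check: how many snapshots were never pushed or registered? */
static int
toy_count_dangling(void)
{
	return toy_ndangling;
}
```

A nonzero count at end of transaction corresponds to the case where the patch would report "had a dangling snapshot".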

Nathan Bossart

nathandbossart@gmail.com

about 1 year ago

In reply to: Heikki Linnakangas (#1)

Re: A few patches to clarify snapshot management

On Mon, Dec 16, 2024 at 12:06:33PM +0200, Heikki Linnakangas wrote:

While working on the CSN snapshot patch, I got sidetracked looking closer
into the snapshot tracking in snapmgr.c. Attached are a few patches to
clarify some things.

I haven't yet looked closely at what you are proposing, but big +1 from me
for the general idea. I recently found myself wishing for a lot more
commentary about this stuff [0].

[0]: /messages/by-id/Z0dB1ld2iPcS6nC9@nathan

--
nathan

Heikki Linnakangas

hlinnaka@iki.fi

about 1 year ago

In reply to: Nathan Bossart (#2)

1 attachment(s)

Re: A few patches to clarify snapshot management

On 16/12/2024 23:56, Nathan Bossart wrote:

On Mon, Dec 16, 2024 at 12:06:33PM +0200, Heikki Linnakangas wrote:

While working on the CSN snapshot patch, I got sidetracked looking closer
into the snapshot tracking in snapmgr.c. Attached are a few patches to
clarify some things.

I haven't yet looked closely at what you are proposing, but big +1 from me
for the general idea. I recently found myself wishing for a lot more
commentary about this stuff [0].

[0] /messages/by-id/Z0dB1ld2iPcS6nC9@nathan

While playing around some more with this, I noticed that this code in
GetTransactionSnapshot() is never reached, and AFAICS has always been
dead code:

Snapshot
GetTransactionSnapshot(void)
{
/*
* Return historic snapshot if doing logical decoding. We'll never need a
* non-historic transaction snapshot in this (sub-)transaction, so there's
* no need to be careful to set one up for later calls to
* GetTransactionSnapshot().
*/
if (HistoricSnapshotActive())
{
Assert(!FirstSnapshotSet);
return HistoricSnapshot;
}

When you think about it, that's good, because it doesn't really make
sense to call GetTransactionSnapshot() during logical decoding. We jump
through hoops to make catalog access possible with historic snapshots,
tracking subtransactions that modify catalogs and WAL-logging command
IDs, but those snapshots are not suitable for general-purpose queries.
So I think we should turn that into an error, per the attached patch.

Another observation is that we only ever use regular MVCC snapshots as
active snapshots. I added an "Assert(snapshot->snapshot_type ==
SNAPSHOT_MVCC);" to PushActiveSnapshotWithLevel() and all regression
tests passed. That's also good, because we assumed that much in a few
places anyway: there are a couple of calls that amount to
"XidInMVCCSnapshot(..., GetActiveSnapshot())", in
find_inheritance_children_extended() and RelationGetPartitionDesc(). We
could add comments and that assertion to make that assumption explicit.

And that thought takes me deeper down the rabbit hole:

/*
* Struct representing all kind of possible snapshots.
*
* There are several different kinds of snapshots:
* * Normal MVCC snapshots
* * MVCC snapshots taken during recovery (in Hot-Standby mode)
* * Historic MVCC snapshots used during logical decoding
* * snapshots passed to HeapTupleSatisfiesDirty()
* * snapshots passed to HeapTupleSatisfiesNonVacuumable()
* * snapshots used for SatisfiesAny, Toast, Self where no members are
* accessed.
*
* TODO: It's probably a good idea to split this struct using a NodeTag
* similar to how parser and executor nodes are handled, with one type for
* each different kind of snapshot to avoid overloading the meaning of
* individual fields.
*/
typedef struct SnapshotData

I'm thinking of implementing that TODO, splitting SnapshotData into
separate structs like MVCCSnapshotData, SnapshotDirtyData, etc. It seems
to me most places can assume that you're dealing with MVCC snapshots,
and if we had separate types for them, could be using MVCCSnapshot
instead of the generic Snapshot. Only the table and index AM functions
need to deal with non-MVCC snapshots.
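
To illustrate the general shape of such a split, here is a minimal sketch. It is not the proposed patch: all `Toy` names are hypothetical, the fields are reduced to a bare minimum, and the visibility rule is deliberately oversimplified. Each kind of snapshot gets its own struct beginning with a common type tag; code that only handles MVCC snapshots takes the narrow type, and only the AM-level entry point takes the generic one and dispatches on the tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;

typedef enum ToySnapshotType
{
	TOY_SNAPSHOT_MVCC,
	TOY_SNAPSHOT_DIRTY,
	TOY_SNAPSHOT_ANY
} ToySnapshotType;

/* Every kind starts with the tag, so generic code can inspect it. */
typedef struct ToyMVCCSnapshotData
{
	ToySnapshotType snapshot_type;
	ToyXid		xmin;
	ToyXid		xmax;
} ToyMVCCSnapshotData;

typedef struct ToyDirtySnapshotData
{
	ToySnapshotType snapshot_type;
	ToyXid		xmin;			/* output field in this toy model */
	ToyXid		xmax;			/* output field in this toy model */
} ToyDirtySnapshotData;

/* The generic type is a union of the specific kinds. */
typedef union ToySnapshotData
{
	ToySnapshotType snapshot_type;
	ToyMVCCSnapshotData mvcc;
	ToyDirtySnapshotData dirty;
} ToySnapshotData;

/* Takes the narrow type: no way to pass a dirty snapshot by accident. */
static bool
toy_mvcc_sees(const ToyMVCCSnapshotData *snap, ToyXid xid)
{
	/* Grossly simplified: ignores in-progress XIDs and XID wraparound. */
	return xid < snap->xmin;
}

/* AM-level entry point: accepts any kind and dispatches on the tag. */
static bool
toy_satisfies_visibility(const ToySnapshotData *snap, ToyXid xid)
{
	switch (snap->snapshot_type)
	{
		case TOY_SNAPSHOT_MVCC:
			return toy_mvcc_sees(&snap->mvcc, xid);
		case TOY_SNAPSHOT_DIRTY:
		case TOY_SNAPSHOT_ANY:
			return true;		/* toy rule: these see everything */
	}
	return false;				/* keep compiler quiet */
}
```

The benefit of this layout is that the compiler rejects a non-MVCC snapshot wherever a function is declared to take the narrow MVCC type, rather than relying on a runtime assertion.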

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

0001-Don-t-allow-GetTransactionSnapshot-in-logical-decodi.patch (text/x-patch; charset=UTF-8)

From ec248c69cb42a0747ecc6a63ac4e4682cce2ee93 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 20 Dec 2024 18:37:44 +0200
Subject: [PATCH 1/1] Don't allow GetTransactionSnapshot() in logical decoding

A historic snapshot should only be used for catalog access, not
general queries. We never call GetTransactionSnapshot() during logical
decoding, which is good because it wouldn't be very sensible, so the
code to deal with that was unreachable and untested. Turn it into an
error, to avoid doing that in the future either.
---
 src/backend/utils/time/snapmgr.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index e60360338d5..3717869f736 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -212,16 +212,12 @@ Snapshot
 GetTransactionSnapshot(void)
 {
 	/*
-	 * Return historic snapshot if doing logical decoding. We'll never need a
-	 * non-historic transaction snapshot in this (sub-)transaction, so there's
-	 * no need to be careful to set one up for later calls to
-	 * GetTransactionSnapshot().
+	 * This should not be called while doing logical decoding.  Historic
+	 * snapshots are only usable for catalog access, not for general-purpose
+	 * queries.
 	 */
 	if (HistoricSnapshotActive())
-	{
-		Assert(!FirstSnapshotSet);
-		return HistoricSnapshot;
-	}
+		elog(ERROR, "cannot take query snapshot during logical decoding");
 
 	/* First call in transaction? */
 	if (!FirstSnapshotSet)
-- 
2.39.5

Heikki Linnakangas

hlinnaka@iki.fi

about 1 year ago

In reply to: Heikki Linnakangas (#3)

3 attachment(s)

Re: A few patches to clarify snapshot management

On 20/12/2024 19:31, Heikki Linnakangas wrote:

/*
* Struct representing all kind of possible snapshots.
*
* There are several different kinds of snapshots:
* * Normal MVCC snapshots
* * MVCC snapshots taken during recovery (in Hot-Standby mode)
* * Historic MVCC snapshots used during logical decoding
* * snapshots passed to HeapTupleSatisfiesDirty()
* * snapshots passed to HeapTupleSatisfiesNonVacuumable()
* * snapshots used for SatisfiesAny, Toast, Self where no members are
* accessed.
*
* TODO: It's probably a good idea to split this struct using a NodeTag
* similar to how parser and executor nodes are handled, with one type for
* each different kind of snapshot to avoid overloading the meaning of
* individual fields.
*/
typedef struct SnapshotData

I'm thinking of implementing that TODO, splitting SnapshotData into
separate structs like MVCCSnapshotData, SnapshotDirtyData, etc. It seems
to me most places can assume that you're dealing with MVCC snapshots,
and if we had separate types for them, could be using MVCCSnapshot
instead of the generic Snapshot. Only the table and index AM functions
need to deal with non-MVCC snapshots.

Here's a draft of that. Going through this exercise clarified a few
things that I hadn't realized before:

- The executor only deals with MVCC snapshots. Special snapshots are
only for the lower-level AM interfaces.
- Only MVCC snapshots can be pushed to the active stack.
- Only MVCC or historic MVCC snapshots can be registered with a resource
owner.
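
The last two rules in the list above can be written down as two small predicates. This is only a sketch under stated assumptions: the `Toy` enum is a hypothetical, shortened stand-in for the real SnapshotType, and the functions return a bool where the backend would use an Assert.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, shortened stand-in for the backend's SnapshotType. */
typedef enum ToySnapshotType
{
	TOY_SNAPSHOT_MVCC,
	TOY_SNAPSHOT_HISTORIC_MVCC,
	TOY_SNAPSHOT_DIRTY,
	TOY_SNAPSHOT_ANY
} ToySnapshotType;

/* Only regular MVCC snapshots may go on the active snapshot stack. */
static bool
toy_can_push_active(ToySnapshotType type)
{
	return type == TOY_SNAPSHOT_MVCC;
}

/* MVCC and historic MVCC snapshots may be registered with a resource owner. */
static bool
toy_can_register(ToySnapshotType type)
{
	return type == TOY_SNAPSHOT_MVCC ||
		type == TOY_SNAPSHOT_HISTORIC_MVCC;
}
```

Special snapshots such as dirty or "any" fail both checks, which matches the first rule: they are only meaningful at the AM level.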

Thoughts?

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v1-0001-Add-comment-with-more-details-on-active-snapshots.patch (text/x-patch; charset=UTF-8)

From 481c6c9828b4975d7523e6a57073b6d7fead1bb2 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 11 Dec 2024 14:22:54 +0200
Subject: [PATCH v1 1/3] Add comment with more details on active snapshots

---
 src/backend/utils/time/snapmgr.c | 55 ++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 8f1508b1ee2..3900e1452ca 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -3,11 +3,66 @@
  * snapmgr.c
  *		PostgreSQL snapshot manager
  *
+ * The following functions return a snapshot that can be used in visibility
+ * checks:
+ *
+ * - GetTransactionSnapshot
+ * - GetLatestSnapshot
+ * - GetCatalogSnapshot
+ * - GetNonHistoricCatalogSnapshot
+ *
+ * All of these functions return a reference to a statically allocated
+ * snapshot, which must be copied and registered by calling
+ * PushActiveSnapshot() or RegisterSnapshot() before use.
+ *
+ * In addition to the above, there are some special snapshots, like
+ * SnapshotSelf, SnapshotAny, and "dirty" snapshots.
+ *
  * We keep track of snapshots in two ways: those "registered" by resowner.c,
  * and the "active snapshot" stack.  All snapshots in either of them live in
  * persistent memory.  When a snapshot is no longer in any of these lists
  * (tracked by separate refcounts on each snapshot), its memory can be freed.
  *
+ * ActiveSnapshot stack
+ * --------------------
+ *
+ * Most visibility checks use the current "active snapshot".  When running
+ * normal queries, the active snapshot is set when query execution begins,
+ * depending on transaction isolation level.
+ *
+ * The active snapshot is tracked in a stack, so that the currently active one
+ * is at the top of the stack. It mirrors the process call stack: whenever we
+ * recurse or switch context to fetch rows from a different portal for
+ * example, the appropriate snapshot is pushed to become the active snapshot,
+ * and popped on return.  Once upon a time, ActiveSnapshot was just a global
+ * variable that was saved and restored similar to CurrentMemoryContext, but
+ * nowadays it's managed as a separate data structure so that we can keep
+ * track of which snapshots are in use and reset MyProc->xmin when there is no
+ * active snapshot.
+ *
+ * However, there are a couple of exceptions where the active snapshot stack
+ * does not strictly mirror the call stack:
+ *
+ * - VACUUM and a few other utility commands manage their own transactions,
+ *   which take their own snapshots.  They are called with an active snapshot
+ *   set, like most utility commands, but they pop the active snapshot that
+ *   was pushed by the caller. PortalRunUtility knows about the possibility
+ *   that the snapshot it pushed is no longer active on return.
+ *
+ * - When COMMIT or ROLLBACK is executed within a procedure or DO-block, the
+ *   active snapshot stack is destroyed, and re-established later when
+ *   subsequent statements in the procedure are executed.  There are many
+ *   limitations on when in-procedure COMMIT/ROLLBACK is allowed; one such
+ *   limitation is that all the snapshots on the active snapshot stack are
+ *   known to portals that are being executed, which makes it safe to reset
+ *   the stack.  See EnsurePortalSnapshotExists().
+ *
+ * Registered snapshots
+ * --------------------
+ *
+ * In addition to snapshots pushed to the active snapshot stack, a snapshot
+ * can be registered with a resource owner.
+ *
  * The FirstXactSnapshot, if any, is treated a bit specially: we increment its
  * regd_count and list it in RegisteredSnapshots, but this reference is not
  * tracked by a resource owner. We used to use the TopTransactionResourceOwner
-- 
2.39.5

v1-0002-Add-assertions.patch (text/x-patch; charset=UTF-8)

From b2770262caaf5c4beadf16ed168746b475f00d68 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 20 Dec 2024 00:36:33 +0200
Subject: [PATCH v1 2/3] Add assertions

---
 src/backend/utils/time/snapmgr.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 3900e1452ca..76992eb094f 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -595,6 +595,7 @@ CopySnapshot(Snapshot snapshot)
 	Size		size;
 
 	Assert(snapshot != InvalidSnapshot);
+	Assert(snapshot->snapshot_type == SNAPSHOT_MVCC || snapshot->snapshot_type == SNAPSHOT_HISTORIC_MVCC);
 
 	/* We allocate any XID arrays needed in the same palloc block. */
 	size = subxipoff = sizeof(SnapshotData) +
@@ -680,6 +681,8 @@ PushActiveSnapshotWithLevel(Snapshot snapshot, int snap_level)
 {
 	ActiveSnapshotElt *newactive;
 
+	Assert(snapshot->snapshot_type == SNAPSHOT_MVCC);
+
 	Assert(snapshot != InvalidSnapshot);
 	Assert(ActiveSnapshot == NULL || snap_level >= ActiveSnapshot->as_level);
 
@@ -824,6 +827,8 @@ RegisterSnapshotOnOwner(Snapshot snapshot, ResourceOwner owner)
 	if (snapshot == InvalidSnapshot)
 		return InvalidSnapshot;
 
+	Assert(snapshot->snapshot_type == SNAPSHOT_MVCC || snapshot->snapshot_type == SNAPSHOT_HISTORIC_MVCC);
+
 	/* Static snapshot?  Create a persistent copy */
 	snap = snapshot->copied ? snapshot : CopySnapshot(snapshot);
 
-- 
2.39.5

v1-0003-wip-Split-SnapshotData-into-multiple-structs.patch (text/x-patch; charset=UTF-8)

From 5b48cf9bec8a9f1b0bacde30acf9b28e7e9d34e5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 6 Jan 2025 23:29:41 +0200
Subject: [PATCH v1 3/3] wip: Split SnapshotData into multiple structs

---
 contrib/amcheck/verify_nbtree.c               |   6 +-
 contrib/pgrowlocks/pgrowlocks.c               |   2 +-
 src/backend/access/brin/brin.c                |   2 +-
 src/backend/access/heap/heapam.c              |   3 +-
 src/backend/access/heap/heapam_handler.c      |  16 +-
 src/backend/access/heap/heapam_visibility.c   |  20 +-
 src/backend/access/index/genam.c              |   4 +-
 src/backend/access/index/indexam.c            |  13 +-
 src/backend/access/nbtree/nbtinsert.c         |   6 +-
 src/backend/access/nbtree/nbtsort.c           |   2 +-
 src/backend/access/table/tableam.c            |  11 +-
 src/backend/access/transam/parallel.c         |   8 +-
 src/backend/catalog/index.c                   |   2 +-
 src/backend/catalog/pg_inherits.c             |   2 +-
 src/backend/catalog/pg_largeobject.c          |   2 +-
 src/backend/commands/async.c                  |   8 +-
 src/backend/commands/copyto.c                 |   4 +-
 src/backend/commands/createas.c               |   2 +-
 src/backend/commands/dbcommands.c             |   4 +-
 src/backend/commands/explain.c                |   2 +-
 src/backend/commands/indexcmds.c              |  16 +-
 src/backend/commands/matview.c                |   2 +-
 src/backend/commands/tablecmds.c              |   6 +-
 src/backend/commands/trigger.c                |   2 +-
 src/backend/commands/typecmds.c               |   4 +-
 src/backend/executor/execIndexing.c           |  12 +-
 src/backend/executor/execMain.c               |   8 +-
 src/backend/executor/execParallel.c           |   2 +-
 src/backend/executor/execReplication.c        |  18 +-
 src/backend/executor/execUtils.c              |   4 +-
 src/backend/executor/functions.c              |   2 +-
 src/backend/executor/nodeBitmapHeapscan.c     |   2 +-
 src/backend/executor/nodeBitmapIndexscan.c    |   2 +-
 src/backend/executor/nodeIndexonlyscan.c      |   4 +-
 src/backend/executor/nodeIndexscan.c          |   4 +-
 src/backend/executor/nodeLockRows.c           |   2 +-
 src/backend/executor/nodeModifyTable.c        |  18 +-
 src/backend/executor/nodeSamplescan.c         |   2 +-
 src/backend/executor/nodeSeqscan.c            |   6 +-
 src/backend/executor/nodeTidrangescan.c       |   2 +-
 src/backend/executor/nodeTidscan.c            |   4 +-
 src/backend/executor/spi.c                    |  64 +++---
 src/backend/libpq/be-fsstubs.c                |   2 +-
 src/backend/partitioning/partbounds.c         |   2 +-
 src/backend/partitioning/partdesc.c           |   2 +-
 src/backend/replication/logical/decode.c      |   2 +-
 .../replication/logical/reorderbuffer.c       |  76 +++----
 src/backend/replication/logical/snapbuild.c   |  99 ++++-----
 src/backend/replication/walsender.c           |   2 +-
 src/backend/storage/ipc/procarray.c           |   6 +-
 src/backend/storage/large_object/inv_api.c    |  16 +-
 src/backend/storage/lmgr/predicate.c          |  30 +--
 src/backend/tcop/postgres.c                   |   4 +-
 src/backend/tcop/pquery.c                     |  26 +--
 src/backend/utils/adt/acl.c                   |   2 +-
 src/backend/utils/adt/ri_triggers.c           |  12 +-
 src/backend/utils/adt/ruleutils.c             |   2 +-
 src/backend/utils/adt/tid.c                   |   2 +-
 src/backend/utils/adt/xid8funcs.c             |   2 +-
 src/backend/utils/init/postinit.c             |   2 +-
 src/backend/utils/mmgr/portalmem.c            |   4 +-
 src/backend/utils/time/snapmgr.c              | 197 ++++++++++--------
 src/include/access/genam.h                    |   4 +-
 src/include/access/heapam.h                   |   2 +-
 src/include/access/relscan.h                  |   6 +-
 src/include/executor/execdesc.h               |   8 +-
 src/include/executor/spi.h                    |   4 +-
 src/include/nodes/execnodes.h                 |   4 +-
 src/include/replication/reorderbuffer.h       |  12 +-
 src/include/replication/snapbuild.h           |   6 +-
 src/include/replication/snapbuild_internal.h  |   2 +-
 src/include/storage/large_object.h            |   2 +-
 src/include/storage/predicate.h               |   4 +-
 src/include/storage/procarray.h               |   2 +-
 src/include/tcop/pquery.h                     |   2 +-
 src/include/utils/portal.h                    |   4 +-
 src/include/utils/snapmgr.h                   |  33 +--
 src/include/utils/snapshot.h                  | 161 +++++++++-----
 src/tools/pgindent/typedefs.list              |   4 +
 79 files changed, 578 insertions(+), 477 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 7f7b55d902a..2bf9c3aa903 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -565,7 +565,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
 		 */
 		if (!state->readonly)
 		{
-			snapshot = RegisterSnapshot(GetTransactionSnapshot());
+			snapshot = (Snapshot) RegisterSnapshot(GetTransactionSnapshot());
 
 			/*
 			 * GetTransactionSnapshot() always acquires a new MVCC snapshot in
@@ -582,7 +582,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
 			 */
 			if (IsolationUsesXactSnapshot() && rel->rd_index->indcheckxmin &&
 				!TransactionIdPrecedes(HeapTupleHeaderGetXmin(rel->rd_indextuple->t_data),
-									   snapshot->xmin))
+									   snapshot->mvcc.xmin))
 				ereport(ERROR,
 						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 						 errmsg("index \"%s\" cannot be verified using transaction snapshot",
@@ -603,7 +603,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
 			if (snapshot != SnapshotAny)
 				state->snapshot = snapshot;
 			else
-				state->snapshot = RegisterSnapshot(GetTransactionSnapshot());
+				state->snapshot = (Snapshot) RegisterSnapshot(GetTransactionSnapshot());
 		}
 	}
 
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index 7e40ab21dda..2c5d2964a55 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -111,7 +111,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, (Snapshot) GetActiveSnapshot(), 0, NULL);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 9a984547578..47dd092fc41 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2393,7 +2393,7 @@ _brin_begin_parallel(BrinBuildState *buildstate, Relation heap, Relation index,
 	if (!isconcurrent)
 		snapshot = SnapshotAny;
 	else
-		snapshot = RegisterSnapshot(GetTransactionSnapshot());
+		snapshot = (Snapshot) RegisterSnapshot(GetTransactionSnapshot());
 
 	/*
 	 * Estimate size for our own PARALLEL_KEY_BRIN_SHARED workspace.
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 485525f4d64..ab3f07b7ffc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -536,7 +536,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	 * full page write. Until we can prove that beyond doubt, let's check each
 	 * tuple for visibility the hard way.
 	 */
-	all_visible = PageIsAllVisible(page) && !snapshot->takenDuringRecovery;
+	all_visible = PageIsAllVisible(page) &&
+		(snapshot->snapshot_type != SNAPSHOT_MVCC || !snapshot->mvcc.takenDuringRecovery);
 	check_serializable =
 		CheckForSerializableConflictOutNeeded(scan->rs_base.rs_rd, snapshot);
 
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index e817f8f8f84..6d537fa6ab8 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -431,11 +431,11 @@ tuple_lock_retry:
 					}
 
 					/* otherwise xmin should not be dirty... */
-					if (TransactionIdIsValid(SnapshotDirty.xmin))
+					if (TransactionIdIsValid(SnapshotDirty.dirty.xmin))
 						ereport(ERROR,
 								(errcode(ERRCODE_DATA_CORRUPTED),
 								 errmsg_internal("t_xmin %u is uncommitted in tuple (%u,%u) to be updated in table \"%s\"",
-												 SnapshotDirty.xmin,
+												 SnapshotDirty.dirty.xmin,
 												 ItemPointerGetBlockNumber(&tuple->t_self),
 												 ItemPointerGetOffsetNumber(&tuple->t_self),
 												 RelationGetRelationName(relation))));
@@ -444,23 +444,23 @@ tuple_lock_retry:
 					 * If tuple is being updated by other transaction then we
 					 * have to wait for its commit/abort, or die trying.
 					 */
-					if (TransactionIdIsValid(SnapshotDirty.xmax))
+					if (TransactionIdIsValid(SnapshotDirty.dirty.xmax))
 					{
 						ReleaseBuffer(buffer);
 						switch (wait_policy)
 						{
 							case LockWaitBlock:
-								XactLockTableWait(SnapshotDirty.xmax,
+								XactLockTableWait(SnapshotDirty.dirty.xmax,
 												  relation, &tuple->t_self,
 												  XLTW_FetchUpdated);
 								break;
 							case LockWaitSkip:
-								if (!ConditionalXactLockTableWait(SnapshotDirty.xmax))
+								if (!ConditionalXactLockTableWait(SnapshotDirty.dirty.xmax))
 									/* skip instead of waiting */
 									return TM_WouldBlock;
 								break;
 							case LockWaitError:
-								if (!ConditionalXactLockTableWait(SnapshotDirty.xmax))
+								if (!ConditionalXactLockTableWait(SnapshotDirty.dirty.xmax))
 									ereport(ERROR,
 											(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
 											 errmsg("could not obtain lock on row in relation \"%s\"",
@@ -1250,7 +1250,7 @@ heapam_index_build_range_scan(Relation heapRelation,
 		 */
 		if (!TransactionIdIsValid(OldestXmin))
 		{
-			snapshot = RegisterSnapshot(GetTransactionSnapshot());
+			snapshot = (Snapshot) RegisterSnapshot(GetTransactionSnapshot());
 			need_unregister_snapshot = true;
 		}
 		else
@@ -2440,7 +2440,7 @@ heapam_scan_sample_next_tuple(TableScanDesc scan, SampleScanState *scanstate,
 
 	page = (Page) BufferGetPage(hscan->rs_cbuf);
 	all_visible = PageIsAllVisible(page) &&
-		!scan->rs_snapshot->takenDuringRecovery;
+		(scan->rs_snapshot->snapshot_type != SNAPSHOT_MVCC || !scan->rs_snapshot->mvcc.takenDuringRecovery);
 	maxoffset = PageGetMaxOffsetNumber(page);
 
 	for (;;)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index e146605bd57..9d7d1abd929 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -740,7 +740,7 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
  * token is also returned in snapshot->speculativeToken.
  */
 static bool
-HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
+HeapTupleSatisfiesDirty(HeapTuple htup, DirtySnapshotData *snapshot,
 						Buffer buffer)
 {
 	HeapTupleHeader tuple = htup->t_data;
@@ -957,7 +957,7 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
  * and more contention on ProcArrayLock.
  */
 static bool
-HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
+HeapTupleSatisfiesMVCC(HeapTuple htup, MVCCSnapshot snapshot,
 					   Buffer buffer)
 {
 	HeapTupleHeader tuple = htup->t_data;
@@ -1426,7 +1426,7 @@ HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer, TransactionId *de
  *	snapshot->vistest must have been set up with the horizon to use.
  */
 static bool
-HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
+HeapTupleSatisfiesNonVacuumable(HeapTuple htup, NonVacuumableSnapshotData *snapshot,
 								Buffer buffer)
 {
 	TransactionId dead_after = InvalidTransactionId;
@@ -1584,7 +1584,7 @@ TransactionIdInArray(TransactionId xid, TransactionId *xip, Size num)
  * complicated than when dealing "only" with the present.
  */
 static bool
-HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
+HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, HistoricMVCCSnapshot snapshot,
 							   Buffer buffer)
 {
 	HeapTupleHeader tuple = htup->t_data;
@@ -1660,7 +1660,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 		return false;
 	}
 	/* check if it's a committed transaction in [xmin, xmax) */
-	else if (TransactionIdInArray(xmin, snapshot->xip, snapshot->xcnt))
+	else if (TransactionIdInArray(xmin, snapshot->committed_xids, snapshot->xcnt))
 	{
 		/* fall through */
 	}
@@ -1746,7 +1746,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 	else if (TransactionIdFollowsOrEquals(xmax, snapshot->xmax))
 		return true;
 	/* xmax is between [xmin, xmax), check known committed array */
-	else if (TransactionIdInArray(xmax, snapshot->xip, snapshot->xcnt))
+	else if (TransactionIdInArray(xmax, snapshot->committed_xids, snapshot->xcnt))
 		return false;
 	/* xmax is between [xmin, xmax), but known not to have committed yet */
 	else
@@ -1769,7 +1769,7 @@ HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot, Buffer buffer)
 	switch (snapshot->snapshot_type)
 	{
 		case SNAPSHOT_MVCC:
-			return HeapTupleSatisfiesMVCC(htup, snapshot, buffer);
+			return HeapTupleSatisfiesMVCC(htup, &snapshot->mvcc, buffer);
 		case SNAPSHOT_SELF:
 			return HeapTupleSatisfiesSelf(htup, snapshot, buffer);
 		case SNAPSHOT_ANY:
@@ -1777,11 +1777,11 @@ HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot, Buffer buffer)
 		case SNAPSHOT_TOAST:
 			return HeapTupleSatisfiesToast(htup, snapshot, buffer);
 		case SNAPSHOT_DIRTY:
-			return HeapTupleSatisfiesDirty(htup, snapshot, buffer);
+			return HeapTupleSatisfiesDirty(htup, &snapshot->dirty, buffer);
 		case SNAPSHOT_HISTORIC_MVCC:
-			return HeapTupleSatisfiesHistoricMVCC(htup, snapshot, buffer);
+			return HeapTupleSatisfiesHistoricMVCC(htup, &snapshot->historic_mvcc, buffer);
 		case SNAPSHOT_NON_VACUUMABLE:
-			return HeapTupleSatisfiesNonVacuumable(htup, snapshot, buffer);
+			return HeapTupleSatisfiesNonVacuumable(htup, &snapshot->nonvacuumable, buffer);
 	}
 
 	return false;				/* keep compiler quiet */
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 07bae342e25..9d1312abbd0 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -410,7 +410,7 @@ systable_beginscan(Relation heapRelation,
 	{
 		Oid			relid = RelationGetRelid(heapRelation);
 
-		snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
+		snapshot = RegisterCatalogSnapshot(GetCatalogSnapshot(relid));
 		sysscan->snapshot = snapshot;
 	}
 	else
@@ -680,7 +680,7 @@ systable_beginscan_ordered(Relation heapRelation,
 	{
 		Oid			relid = RelationGetRelid(heapRelation);
 
-		snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
+		snapshot = RegisterCatalogSnapshot(GetCatalogSnapshot(relid));
 		sysscan->snapshot = snapshot;
 	}
 	else
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 8b1f555435b..0bcf06ea68b 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -451,12 +451,10 @@ index_restrpos(IndexScanDesc scan)
  */
 Size
 index_parallelscan_estimate(Relation indexRelation, int nkeys, int norderbys,
-							Snapshot snapshot)
+							MVCCSnapshot snapshot)
 {
 	Size		nbytes;
 
-	Assert(snapshot != InvalidSnapshot);
-
 	RELATION_CHECKS;
 
 	nbytes = offsetof(ParallelIndexScanDescData, ps_snapshot_data);
@@ -488,12 +486,10 @@ index_parallelscan_estimate(Relation indexRelation, int nkeys, int norderbys,
  */
 void
 index_parallelscan_initialize(Relation heapRelation, Relation indexRelation,
-							  Snapshot snapshot, ParallelIndexScanDesc target)
+							  MVCCSnapshot snapshot, ParallelIndexScanDesc target)
 {
 	Size		offset;
 
-	Assert(snapshot != InvalidSnapshot);
-
 	RELATION_CHECKS;
 
 	offset = add_size(offsetof(ParallelIndexScanDescData, ps_snapshot_data),
@@ -541,14 +537,15 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel, int nkeys,
 						 int norderbys, ParallelIndexScanDesc pscan)
 {
+	MVCCSnapshot mvccsnapshot;
 	Snapshot	snapshot;
 	IndexScanDesc scan;
 
 	Assert(RelFileLocatorEquals(heaprel->rd_locator, pscan->ps_locator));
 	Assert(RelFileLocatorEquals(indexrel->rd_locator, pscan->ps_indexlocator));
 
-	snapshot = RestoreSnapshot(pscan->ps_snapshot_data);
-	RegisterSnapshot(snapshot);
+	mvccsnapshot = RestoreSnapshot(pscan->ps_snapshot_data);
+	snapshot = (Snapshot) RegisterSnapshot(mvccsnapshot);
 	scan = index_beginscan_internal(indexrel, nkeys, norderbys, snapshot,
 									pscan, true);
 
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 3eddbcf3a82..d8b26072012 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -583,15 +583,15 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 					 * If this tuple is being updated by other transaction
 					 * then we have to wait for its commit/abort.
 					 */
-					xwait = (TransactionIdIsValid(SnapshotDirty.xmin)) ?
-						SnapshotDirty.xmin : SnapshotDirty.xmax;
+					xwait = (TransactionIdIsValid(SnapshotDirty.dirty.xmin)) ?
+						SnapshotDirty.dirty.xmin : SnapshotDirty.dirty.xmax;
 
 					if (TransactionIdIsValid(xwait))
 					{
 						if (nbuf != InvalidBuffer)
 							_bt_relbuf(rel, nbuf);
 						/* Tell _bt_doinsert to wait... */
-						*speculativeToken = SnapshotDirty.speculativeToken;
+						*speculativeToken = SnapshotDirty.dirty.speculativeToken;
 						/* Caller releases lock on buf immediately */
 						insertstate->bounds_valid = false;
 						return xwait;
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 7aba852db90..956d073988c 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1437,7 +1437,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	if (!isconcurrent)
 		snapshot = SnapshotAny;
 	else
-		snapshot = RegisterSnapshot(GetTransactionSnapshot());
+		snapshot = (Snapshot) RegisterSnapshot(GetTransactionSnapshot());
 
 	/*
 	 * Estimate size for our own PARALLEL_KEY_BTREE_SHARED workspace, and
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index e18a8f8250f..a4175ec7c88 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -114,7 +114,7 @@ table_beginscan_catalog(Relation relation, int nkeys, struct ScanKeyData *key)
 	uint32		flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | SO_TEMP_SNAPSHOT;
 	Oid			relid = RelationGetRelid(relation);
-	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
+	Snapshot	snapshot = RegisterCatalogSnapshot(GetCatalogSnapshot(relid));
 
 	return relation->rd_tableam->scan_begin(relation, snapshot, nkeys, key,
 											NULL, flags);
@@ -132,7 +132,7 @@ table_parallelscan_estimate(Relation rel, Snapshot snapshot)
 	Size		sz = 0;
 
 	if (IsMVCCSnapshot(snapshot))
-		sz = add_size(sz, EstimateSnapshotSpace(snapshot));
+		sz = add_size(sz, EstimateSnapshotSpace(&snapshot->mvcc));
 	else
 		Assert(snapshot == SnapshotAny);
 
@@ -151,7 +151,7 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 
 	if (IsMVCCSnapshot(snapshot))
 	{
-		SerializeSnapshot(snapshot, (char *) pscan + pscan->phs_snapshot_off);
+		SerializeSnapshot(&snapshot->mvcc, (char *) pscan + pscan->phs_snapshot_off);
 		pscan->phs_snapshot_any = false;
 	}
 	else
@@ -164,6 +164,7 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 TableScanDesc
 table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 {
+	MVCCSnapshot mvccsnapshot;
 	Snapshot	snapshot;
 	uint32		flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
@@ -173,8 +174,8 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	if (!pscan->phs_snapshot_any)
 	{
 		/* Snapshot was serialized -- restore it */
-		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
-		RegisterSnapshot(snapshot);
+		mvccsnapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
+		snapshot = (Snapshot) RegisterSnapshot(mvccsnapshot);
 		flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 7817bedc2ef..fa0cb9d7543 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -222,8 +222,8 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	int			i;
 	FixedParallelState *fps;
 	dsm_handle	session_dsm_handle = DSM_HANDLE_INVALID;
-	Snapshot	transaction_snapshot = GetTransactionSnapshot();
-	Snapshot	active_snapshot = GetActiveSnapshot();
+	MVCCSnapshot transaction_snapshot = GetTransactionSnapshot();
+	MVCCSnapshot active_snapshot = GetActiveSnapshot();
 
 	/* We might be running in a very short-lived memory context. */
 	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
@@ -1309,8 +1309,8 @@ ParallelWorkerMain(Datum main_arg)
 	char	   *uncommittedenumsspace;
 	char	   *clientconninfospace;
 	char	   *session_dsm_handle_space;
-	Snapshot	tsnapshot;
-	Snapshot	asnapshot;
+	MVCCSnapshot tsnapshot;
+	MVCCSnapshot asnapshot;
 
 	/* Set flag to indicate that we're initializing a parallel worker. */
 	InitializingParallelWorker = true;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 221fbb4e286..5b11109e332 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3200,7 +3200,7 @@ IndexCheckExclusion(Relation heapRelation,
 	/*
 	 * Scan all live tuples in the base relation.
 	 */
-	snapshot = RegisterSnapshot(GetLatestSnapshot());
+	snapshot = (Snapshot) RegisterSnapshot(GetLatestSnapshot());
 	scan = table_beginscan_strat(heapRelation,	/* relation */
 								 snapshot,	/* snapshot */
 								 0, /* number of keys */
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 929bb53b620..af5876c8608 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -143,7 +143,7 @@ find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
 			if (omit_detached && ActiveSnapshotSet())
 			{
 				TransactionId xmin;
-				Snapshot	snap;
+				MVCCSnapshot snap;
 
 				xmin = HeapTupleHeaderGetXmin(inheritsTuple->t_data);
 				snap = GetActiveSnapshot();
diff --git a/src/backend/catalog/pg_largeobject.c b/src/backend/catalog/pg_largeobject.c
index 0a477a8e8a9..a082796c0c7 100644
--- a/src/backend/catalog/pg_largeobject.c
+++ b/src/backend/catalog/pg_largeobject.c
@@ -177,7 +177,7 @@ LargeObjectExistsWithSnapshot(Oid loid, Snapshot snapshot)
 
 	sd = systable_beginscan(pg_lo_meta,
 							LargeObjectMetadataOidIndexId, true,
-							snapshot, 1, skey);
+							(Snapshot) snapshot, 1, skey);
 
 	tuple = systable_getnext(sd);
 	if (HeapTupleIsValid(tuple))
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 4bd37d5beb5..b6489c19681 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -449,7 +449,7 @@ static void asyncQueueReadAllNotifications(void);
 static bool asyncQueueProcessPageEntries(volatile QueuePosition *current,
 										 QueuePosition stop,
 										 char *page_buffer,
-										 Snapshot snapshot);
+										 MVCCSnapshot snapshot);
 static void asyncQueueAdvanceTail(void);
 static void ProcessIncomingNotify(bool flush);
 static bool AsyncExistsPendingNotify(Notification *n);
@@ -1852,7 +1852,7 @@ asyncQueueReadAllNotifications(void)
 {
 	volatile QueuePosition pos;
 	QueuePosition head;
-	Snapshot	snapshot;
+	MVCCSnapshot snapshot;
 
 	/* page_buffer must be adequately aligned, so use a union */
 	union
@@ -1993,7 +1993,7 @@ asyncQueueReadAllNotifications(void)
 	PG_END_TRY();
 
 	/* Done with snapshot */
-	UnregisterSnapshot(snapshot);
+	UnregisterSnapshot((Snapshot) snapshot);
 }
 
 /*
@@ -2016,7 +2016,7 @@ static bool
 asyncQueueProcessPageEntries(volatile QueuePosition *current,
 							 QueuePosition stop,
 							 char *page_buffer,
-							 Snapshot snapshot)
+							 MVCCSnapshot snapshot)
 {
 	bool		reachedStop = false;
 	bool		reachedEndOfPage;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 99cb23cb347..3b74cbcb432 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
 		/* Create a QueryDesc requesting no output */
 		cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
 											GetActiveSnapshot(),
-											InvalidSnapshot,
+											InvalidMVCCSnapshot,
 											dest, NULL, NULL, 0);
 
 		/*
@@ -852,7 +852,7 @@ DoCopyTo(CopyToState cstate)
 		TupleTableSlot *slot;
 		TableScanDesc scandesc;
 
-		scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
+		scandesc = table_beginscan(cstate->rel, (Snapshot) GetActiveSnapshot(), 0, NULL);
 		slot = table_slot_create(cstate->rel, NULL);
 
 		processed = 0;
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 23cecd99c9e..74d6caaa4a6 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -333,7 +333,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
 
 		/* Create a QueryDesc, redirecting output to our tuple receiver */
 		queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
-									GetActiveSnapshot(), InvalidSnapshot,
+									GetActiveSnapshot(), InvalidMVCCSnapshot,
 									dest, params, queryEnv, 0);
 
 		/* call ExecutorStart to prepare the plan for execution */
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 46310add459..35ad13af985 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -257,7 +257,7 @@ ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath)
 	Page		page;
 	List	   *rlocatorlist = NIL;
 	LockRelId	relid;
-	Snapshot	snapshot;
+	MVCCSnapshot snapshot;
 	SMgrRelation smgr;
 	BufferAccessStrategy bstrategy;
 
@@ -309,7 +309,7 @@ ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath)
 		/* Append relevant pg_class tuples for current page to rlocatorlist. */
 		rlocatorlist = ScanSourceDatabasePgClassPage(page, buf, tbid, dbid,
 													 srcpath, rlocatorlist,
-													 snapshot);
+													 (Snapshot) snapshot);
 
 		UnlockReleaseBuffer(buf);
 	}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c24e66f82e1..d77e1e5c04b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -698,7 +698,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 
 	/* Create a QueryDesc for the query */
 	queryDesc = CreateQueryDesc(plannedstmt, queryString,
-								GetActiveSnapshot(), InvalidSnapshot,
+								GetActiveSnapshot(), InvalidMVCCSnapshot,
 								dest, params, queryEnv, instrument_option);
 
 	/* Select execution options */
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index d6e23caef17..c5f23ced82d 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -587,7 +587,7 @@ DefineIndex(Oid tableId,
 	LockRelId	heaprelid;
 	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
-	Snapshot	snapshot;
+	MVCCSnapshot snapshot;
 	Oid			root_save_userid;
 	int			root_save_sec_context;
 	int			root_save_nestlevel;
@@ -1722,13 +1722,13 @@ DefineIndex(Oid tableId,
 	 * We also set ActiveSnapshot to this snap, since functions in indexes may
 	 * need a snapshot.
 	 */
-	snapshot = RegisterSnapshot(GetTransactionSnapshot());
+	snapshot = (MVCCSnapshot) RegisterSnapshot(GetTransactionSnapshot());
 	PushActiveSnapshot(snapshot);
 
 	/*
 	 * Scan the index and the heap, insert any missing index entries.
 	 */
-	validate_index(tableId, indexRelationId, snapshot);
+	validate_index(tableId, indexRelationId, (Snapshot) snapshot);
 
 	/*
 	 * Drop the reference snapshot.  We must do this before waiting out other
@@ -1740,7 +1740,7 @@ DefineIndex(Oid tableId,
 	limitXmin = snapshot->xmin;
 
 	PopActiveSnapshot();
-	UnregisterSnapshot(snapshot);
+	UnregisterSnapshot((Snapshot) snapshot);
 
 	/*
 	 * The snapshot subsystem could still contain registered snapshots that
@@ -4128,7 +4128,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 	{
 		ReindexIndexInfo *newidx = lfirst(lc);
 		TransactionId limitXmin;
-		Snapshot	snapshot;
+		MVCCSnapshot snapshot;
 
 		StartTransactionCommand();
 
@@ -4147,7 +4147,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		 * Take the "reference snapshot" that will be used by validate_index()
 		 * to filter candidate tuples.
 		 */
-		snapshot = RegisterSnapshot(GetTransactionSnapshot());
+		snapshot = (MVCCSnapshot) RegisterSnapshot(GetTransactionSnapshot());
 		PushActiveSnapshot(snapshot);
 
 		/*
@@ -4161,7 +4161,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		progress_vals[3] = newidx->amId;
 		pgstat_progress_update_multi_param(4, progress_index, progress_vals);
 
-		validate_index(newidx->tableId, newidx->indexId, snapshot);
+		validate_index(newidx->tableId, newidx->indexId, (Snapshot) snapshot);
 
 		/*
 		 * We can now do away with our active snapshot, we still need to save
@@ -4170,7 +4170,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		limitXmin = snapshot->xmin;
 
 		PopActiveSnapshot();
-		UnregisterSnapshot(snapshot);
+		UnregisterSnapshot((Snapshot) snapshot);
 
 		/*
 		 * To ensure no deadlocks, we must commit and start yet another
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 4b3d4822872..33783cf6050 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -439,7 +439,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 
 	/* Create a QueryDesc, redirecting output to our tuple receiver */
 	queryDesc = CreateQueryDesc(plan, queryString,
-								GetActiveSnapshot(), InvalidSnapshot,
+								GetActiveSnapshot(), InvalidMVCCSnapshot,
 								dest, NULL, NULL, 0);
 
 	/* call ExecutorStart to prepare the plan for execution */
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 33ea619224b..2f7e87ab111 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6222,7 +6222,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 		 * Scan through the rows, generating a new row if needed and then
 		 * checking all the constraints.
 		 */
-		snapshot = RegisterSnapshot(GetLatestSnapshot());
+		snapshot = (Snapshot) RegisterSnapshot(GetLatestSnapshot());
 		scan = table_beginscan(oldrel, snapshot, 0, NULL);
 
 		/*
@@ -12565,7 +12565,7 @@ validateForeignKeyConstraint(char *conname,
 	 * if that tuple had just been inserted.  If any of those fail, it should
 	 * ereport(ERROR) and that's that.
 	 */
-	snapshot = RegisterSnapshot(GetLatestSnapshot());
+	snapshot = (Snapshot) RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
 	scan = table_beginscan(rel, snapshot, 0, NULL);
 
@@ -20132,7 +20132,7 @@ ATExecDetachPartitionFinalize(Relation rel, RangeVar *name)
 {
 	Relation	partRel;
 	ObjectAddress address;
-	Snapshot	snap = GetActiveSnapshot();
+	MVCCSnapshot snap = GetActiveSnapshot();
 
 	partRel = table_openrv(name, AccessExclusiveLock);
 
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 32f25f4d911..709b1feb178 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3312,7 +3312,7 @@ GetTupleForTrigger(EState *estate,
 		 */
 		if (!IsolationUsesXactSnapshot())
 			lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-		test = table_tuple_lock(relation, tid, estate->es_snapshot, oldslot,
+		test = table_tuple_lock(relation, tid, (Snapshot) estate->es_snapshot, oldslot,
 								estate->es_output_cid,
 								lockmode, LockWaitBlock,
 								lockflags,
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 6b1d2383514..beb459c6dd0 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3171,7 +3171,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 		Snapshot	snapshot;
 
 		/* Scan all tuples in this relation */
-		snapshot = RegisterSnapshot(GetLatestSnapshot());
+		snapshot = (Snapshot) RegisterSnapshot(GetLatestSnapshot());
 		scan = table_beginscan(testrel, snapshot, 0, NULL);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
@@ -3247,7 +3247,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin)
 		Snapshot	snapshot;
 
 		/* Scan all tuples in this relation */
-		snapshot = RegisterSnapshot(GetLatestSnapshot());
+		snapshot = (Snapshot) RegisterSnapshot(GetLatestSnapshot());
 		scan = table_beginscan(testrel, snapshot, 0, NULL);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 7c87f012c30..6942b148dd8 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -863,21 +863,21 @@ retry:
 		 * happen often enough to be worth trying harder, and anyway we don't
 		 * want to hold any index internal locks while waiting.
 		 */
-		xwait = TransactionIdIsValid(DirtySnapshot.xmin) ?
-			DirtySnapshot.xmin : DirtySnapshot.xmax;
+		xwait = TransactionIdIsValid(DirtySnapshot.dirty.xmin) ?
+			DirtySnapshot.dirty.xmin : DirtySnapshot.dirty.xmax;
 
 		if (TransactionIdIsValid(xwait) &&
 			(waitMode == CEOUC_WAIT ||
 			 (waitMode == CEOUC_LIVELOCK_PREVENTING_WAIT &&
-			  DirtySnapshot.speculativeToken &&
+			  DirtySnapshot.dirty.speculativeToken &&
 			  TransactionIdPrecedes(GetCurrentTransactionId(), xwait))))
 		{
 			reason_wait = indexInfo->ii_ExclusionOps ?
 				XLTW_RecheckExclusionConstr : XLTW_InsertIndex;
 			index_endscan(index_scan);
-			if (DirtySnapshot.speculativeToken)
-				SpeculativeInsertionWait(DirtySnapshot.xmin,
-										 DirtySnapshot.speculativeToken);
+			if (DirtySnapshot.dirty.speculativeToken)
+				SpeculativeInsertionWait(DirtySnapshot.dirty.xmin,
+										 DirtySnapshot.dirty.speculativeToken);
 			else
 				XactLockTableWait(xwait, heap,
 								  &existing_slot->tts_tid, reason_wait);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index a06295b6ba7..613a545d0a0 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -239,8 +239,8 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	/*
 	 * Copy other important information into the EState
 	 */
-	estate->es_snapshot = RegisterSnapshot(queryDesc->snapshot);
-	estate->es_crosscheck_snapshot = RegisterSnapshot(queryDesc->crosscheck_snapshot);
+	estate->es_snapshot = (MVCCSnapshot) RegisterSnapshot(queryDesc->snapshot);
+	estate->es_crosscheck_snapshot = (MVCCSnapshot) RegisterSnapshot(queryDesc->crosscheck_snapshot);
 	estate->es_top_eflags = eflags;
 	estate->es_instrument = queryDesc->instrument_options;
 	estate->es_jit_flags = queryDesc->plannedstmt->jitFlags;
@@ -501,8 +501,8 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
 	ExecEndPlan(queryDesc->planstate, estate);
 
 	/* do away with our snapshots */
-	UnregisterSnapshot(estate->es_snapshot);
-	UnregisterSnapshot(estate->es_crosscheck_snapshot);
+	UnregisterSnapshot((Snapshot) estate->es_snapshot);
+	UnregisterSnapshot((Snapshot) estate->es_crosscheck_snapshot);
 
 	/*
 	 * Must switch out of context before destroying it
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ff4d9dd1bb3..3683aa3c232 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1259,7 +1259,7 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
 	/* Create a QueryDesc for the query. */
 	return CreateQueryDesc(pstmt,
 						   queryString,
-						   GetActiveSnapshot(), InvalidSnapshot,
+						   GetActiveSnapshot(), InvalidMVCCSnapshot,
 						   receiver, paramLI, NULL, instrument_options);
 }
 
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index e3e4e41ac38..45eacdad96f 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -259,8 +259,8 @@ retry:
 
 		ExecMaterializeSlot(outslot);
 
-		xwait = TransactionIdIsValid(snap.xmin) ?
-			snap.xmin : snap.xmax;
+		xwait = TransactionIdIsValid(snap.dirty.xmin) ?
+			snap.dirty.xmin : snap.dirty.xmax;
 
 		/*
 		 * If the tuple is locked, wait for locking transaction to finish and
@@ -285,7 +285,7 @@ retry:
 
 		PushActiveSnapshot(GetLatestSnapshot());
 
-		res = table_tuple_lock(rel, &(outslot->tts_tid), GetLatestSnapshot(),
+		res = table_tuple_lock(rel, &(outslot->tts_tid), (Snapshot) GetActiveSnapshot(),
 							   outslot,
 							   GetCurrentCommandId(false),
 							   lockmode,
@@ -418,8 +418,8 @@ retry:
 		found = true;
 		ExecCopySlot(outslot, scanslot);
 
-		xwait = TransactionIdIsValid(snap.xmin) ?
-			snap.xmin : snap.xmax;
+		xwait = TransactionIdIsValid(snap.dirty.xmin) ?
+			snap.dirty.xmin : snap.dirty.xmax;
 
 		/*
 		 * If the tuple is locked, wait for locking transaction to finish and
@@ -443,7 +443,7 @@ retry:
 
 		PushActiveSnapshot(GetLatestSnapshot());
 
-		res = table_tuple_lock(rel, &(outslot->tts_tid), GetLatestSnapshot(),
+		res = table_tuple_lock(rel, &(outslot->tts_tid), (Snapshot) GetLatestSnapshot(),
 							   outslot,
 							   GetCurrentCommandId(false),
 							   lockmode,
@@ -500,7 +500,7 @@ retry:
 
 	PushActiveSnapshot(GetLatestSnapshot());
 
-	res = table_tuple_lock(rel, &conflictTid, GetLatestSnapshot(),
+	res = table_tuple_lock(rel, &conflictTid, (Snapshot) GetLatestSnapshot(),
 						   *conflictslot,
 						   GetCurrentCommandId(false),
 						   LockTupleShare,
@@ -687,7 +687,7 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
 		if (rel->rd_rel->relispartition)
 			ExecPartitionCheck(resultRelInfo, slot, estate, true);
 
-		simple_table_tuple_update(rel, tid, slot, estate->es_snapshot,
+		simple_table_tuple_update(rel, tid, slot, (Snapshot) estate->es_snapshot,
 								  &update_indexes);
 
 		conflictindexes = resultRelInfo->ri_onConflictArbiterIndexes;
@@ -746,7 +746,7 @@ ExecSimpleRelationDelete(ResultRelInfo *resultRelInfo,
 	if (!skip_tuple)
 	{
 		/* OK, delete the tuple */
-		simple_table_tuple_delete(rel, tid, estate->es_snapshot);
+		simple_table_tuple_delete(rel, tid, (Snapshot) estate->es_snapshot);
 
 		/* AFTER ROW DELETE Triggers */
 		ExecARDeleteTriggers(estate, resultRelInfo,
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index f71899463b8..494dfbc015d 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -110,8 +110,8 @@ CreateExecutorState(void)
 	 * Initialize all fields of the Executor State structure
 	 */
 	estate->es_direction = ForwardScanDirection;
-	estate->es_snapshot = InvalidSnapshot;	/* caller must initialize this */
-	estate->es_crosscheck_snapshot = InvalidSnapshot;	/* no crosscheck */
+	estate->es_snapshot = InvalidMVCCSnapshot;	/* caller must initialize this */
+	estate->es_crosscheck_snapshot = InvalidMVCCSnapshot;	/* no crosscheck */
 	estate->es_range_table = NIL;
 	estate->es_range_table_size = 0;
 	estate->es_relations = NULL;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 757f8068e21..6733f06b3fe 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
 	es->qd = CreateQueryDesc(es->stmt,
 							 fcache->src,
 							 GetActiveSnapshot(),
-							 InvalidSnapshot,
+							 InvalidMVCCSnapshot,
 							 dest,
 							 fcache->paramLI,
 							 es->qd ? es->qd->queryEnv : NULL,
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index be616683f98..f8b20c12bb8 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -172,7 +172,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
 						   node->ss.ps.plan->targetlist != NIL);
 
 			scan = table_beginscan_bm(node->ss.ss_currentRelation,
-									  node->ss.ps.state->es_snapshot,
+									  (Snapshot) node->ss.ps.state->es_snapshot,
 									  0,
 									  NULL,
 									  need_tuples);
diff --git a/src/backend/executor/nodeBitmapIndexscan.c b/src/backend/executor/nodeBitmapIndexscan.c
index d3ef5a00040..54def3c4c94 100644
--- a/src/backend/executor/nodeBitmapIndexscan.c
+++ b/src/backend/executor/nodeBitmapIndexscan.c
@@ -301,7 +301,7 @@ ExecInitBitmapIndexScan(BitmapIndexScan *node, EState *estate, int eflags)
 	 */
 	indexstate->biss_ScanDesc =
 		index_beginscan_bitmap(indexstate->biss_RelationDesc,
-							   estate->es_snapshot,
+							   (Snapshot) estate->es_snapshot,
 							   indexstate->biss_NumScanKeys);
 
 	/*
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index e6635233155..1809571e3f6 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -91,7 +91,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
-								   estate->es_snapshot,
+								   (Snapshot) estate->es_snapshot,
 								   node->ioss_NumScanKeys,
 								   node->ioss_NumOrderByKeys);
 
@@ -245,7 +245,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		if (!tuple_from_heap)
 			PredicateLockPage(scandesc->heapRelation,
 							  ItemPointerGetBlockNumber(tid),
-							  estate->es_snapshot);
+							  (Snapshot) estate->es_snapshot);
 
 		return slot;
 	}
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 865aba08e8a..39d1da8cb57 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -108,7 +108,7 @@ IndexNext(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   estate->es_snapshot,
+								   (Snapshot) estate->es_snapshot,
 								   node->iss_NumScanKeys,
 								   node->iss_NumOrderByKeys);
 
@@ -203,7 +203,7 @@ IndexNextWithReorder(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   estate->es_snapshot,
+								   (Snapshot) estate->es_snapshot,
 								   node->iss_NumScanKeys,
 								   node->iss_NumOrderByKeys);
 
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 4e4e3db0b38..02e98660df6 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -182,7 +182,7 @@ lnext:
 		if (!IsolationUsesXactSnapshot())
 			lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
 
-		test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
+		test = table_tuple_lock(erm->relation, &tid, (Snapshot) estate->es_snapshot,
 								markSlot, estate->es_output_cid,
 								lockmode, erm->waitPolicy,
 								lockflags,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 1af8c9caf6c..bd0fa98564b 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -292,7 +292,7 @@ ExecCheckTupleVisible(EState *estate,
 	if (!IsolationUsesXactSnapshot())
 		return;
 
-	if (!table_tuple_satisfies_snapshot(rel, slot, estate->es_snapshot))
+	if (!table_tuple_satisfies_snapshot(rel, slot, (Snapshot) estate->es_snapshot))
 	{
 		Datum		xminDatum;
 		TransactionId xmin;
@@ -1354,8 +1354,8 @@ ExecDeleteAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 
 	return table_tuple_delete(resultRelInfo->ri_RelationDesc, tupleid,
 							  estate->es_output_cid,
-							  estate->es_snapshot,
-							  estate->es_crosscheck_snapshot,
+							  (Snapshot) estate->es_snapshot,
+							  (Snapshot) estate->es_crosscheck_snapshot,
 							  true /* wait for commit */ ,
 							  &context->tmfd,
 							  changingPart);
@@ -1568,7 +1568,7 @@ ldelete:
 												 resultRelInfo->ri_RangeTableIndex);
 
 					result = table_tuple_lock(resultRelationDesc, tupleid,
-											  estate->es_snapshot,
+											  (Snapshot) estate->es_snapshot,
 											  inputslot, estate->es_output_cid,
 											  LockTupleExclusive, LockWaitBlock,
 											  TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
@@ -2114,8 +2114,8 @@ lreplace:
 	 */
 	result = table_tuple_update(resultRelationDesc, tupleid, slot,
 								estate->es_output_cid,
-								estate->es_snapshot,
-								estate->es_crosscheck_snapshot,
+								(Snapshot) estate->es_snapshot,
+								(Snapshot) estate->es_crosscheck_snapshot,
 								true /* wait for commit */ ,
 								&context->tmfd, &updateCxt->lockmode,
 								&updateCxt->updateIndexes);
@@ -2404,7 +2404,7 @@ redo_act:
 												 resultRelInfo->ri_RangeTableIndex);
 
 					result = table_tuple_lock(resultRelationDesc, tupleid,
-											  estate->es_snapshot,
+											  (Snapshot) estate->es_snapshot,
 											  inputslot, estate->es_output_cid,
 											  updateCxt.lockmode, LockWaitBlock,
 											  TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
@@ -2558,7 +2558,7 @@ ExecOnConflictUpdate(ModifyTableContext *context,
 	 * true anymore.
 	 */
 	test = table_tuple_lock(relation, conflictTid,
-							context->estate->es_snapshot,
+							(Snapshot) context->estate->es_snapshot,
 							existing, context->estate->es_output_cid,
 							lockmode, LockWaitBlock, 0,
 							&tmfd);
@@ -3188,7 +3188,7 @@ lmerge_matched:
 						inputslot = resultRelInfo->ri_oldTupleSlot;
 
 					result = table_tuple_lock(resultRelationDesc, tupleid,
-											  estate->es_snapshot,
+											  (Snapshot) estate->es_snapshot,
 											  inputslot, estate->es_output_cid,
 											  lockmode, LockWaitBlock,
 											  TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b3db7548ed..9e19bbd11a9 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -293,7 +293,7 @@ tablesample_init(SampleScanState *scanstate)
 	{
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
-									 scanstate->ss.ps.state->es_snapshot,
+									 (Snapshot) scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index fa2d522b25f..8112bd621ab 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -69,7 +69,7 @@ SeqNext(SeqScanState *node)
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
+								   (Snapshot) estate->es_snapshot,
 								   0, NULL);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
@@ -241,7 +241,7 @@ ExecSeqScanEstimate(SeqScanState *node,
 	EState	   *estate = node->ss.ps.state;
 
 	node->pscan_len = table_parallelscan_estimate(node->ss.ss_currentRelation,
-												  estate->es_snapshot);
+												  (Snapshot) estate->es_snapshot);
 	shm_toc_estimate_chunk(&pcxt->estimator, node->pscan_len);
 	shm_toc_estimate_keys(&pcxt->estimator, 1);
 }
@@ -262,7 +262,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
-								  estate->es_snapshot);
+								  (Snapshot) estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index ab2eab9596e..626ad9fd6b9 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -241,7 +241,7 @@ TidRangeNext(TidRangeScanState *node)
 		if (scandesc == NULL)
 		{
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
-												estate->es_snapshot,
+												(Snapshot) estate->es_snapshot,
 												&node->trss_mintid,
 												&node->trss_maxtid);
 			node->ss.ss_currentScanDesc = scandesc;
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 5e56e29a15f..cf4a15eae31 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -148,7 +148,7 @@ TidListEval(TidScanState *tidstate)
 	if (tidstate->ss.ss_currentScanDesc == NULL)
 		tidstate->ss.ss_currentScanDesc =
 			table_beginscan_tid(tidstate->ss.ss_currentRelation,
-								tidstate->ss.ps.state->es_snapshot);
+								(Snapshot) tidstate->ss.ps.state->es_snapshot);
 	scan = tidstate->ss.ss_currentScanDesc;
 
 	/*
@@ -326,7 +326,7 @@ TidNext(TidScanState *node)
 	 */
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
-	snapshot = estate->es_snapshot;
+	snapshot = (Snapshot) estate->es_snapshot;
 	heapRelation = node->ss.ss_currentRelation;
 	slot = node->ss.ss_ScanTupleSlot;
 
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index ecb2e4ccaa1..7ac1da62ef8 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -64,7 +64,7 @@ static void _SPI_prepare_plan(const char *src, SPIPlanPtr plan);
 static void _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan);
 
 static int	_SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
-							  Snapshot snapshot, Snapshot crosscheck_snapshot,
+							  MVCCSnapshot snapshot, MVCCSnapshot crosscheck_snapshot,
 							  bool fire_triggers);
 
 static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
@@ -618,7 +618,7 @@ SPI_execute(const char *src, bool read_only, long tcount)
 	options.tcount = tcount;
 
 	res = _SPI_execute_plan(&plan, &options,
-							InvalidSnapshot, InvalidSnapshot,
+							InvalidMVCCSnapshot, InvalidMVCCSnapshot,
 							true);
 
 	_SPI_end_call(true);
@@ -660,7 +660,7 @@ SPI_execute_extended(const char *src,
 	_SPI_prepare_oneshot_plan(src, &plan);
 
 	res = _SPI_execute_plan(&plan, options,
-							InvalidSnapshot, InvalidSnapshot,
+							InvalidMVCCSnapshot, InvalidMVCCSnapshot,
 							true);
 
 	_SPI_end_call(true);
@@ -692,7 +692,7 @@ SPI_execute_plan(SPIPlanPtr plan, Datum *Values, const char *Nulls,
 	options.tcount = tcount;
 
 	res = _SPI_execute_plan(plan, &options,
-							InvalidSnapshot, InvalidSnapshot,
+							InvalidMVCCSnapshot, InvalidMVCCSnapshot,
 							true);
 
 	_SPI_end_call(true);
@@ -721,7 +721,7 @@ SPI_execute_plan_extended(SPIPlanPtr plan,
 		return res;
 
 	res = _SPI_execute_plan(plan, options,
-							InvalidSnapshot, InvalidSnapshot,
+							InvalidMVCCSnapshot, InvalidMVCCSnapshot,
 							true);
 
 	_SPI_end_call(true);
@@ -749,7 +749,7 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params,
 	options.tcount = tcount;
 
 	res = _SPI_execute_plan(plan, &options,
-							InvalidSnapshot, InvalidSnapshot,
+							InvalidMVCCSnapshot, InvalidMVCCSnapshot,
 							true);
 
 	_SPI_end_call(true);
@@ -766,13 +766,13 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params,
  * This is currently not documented in spi.sgml because it is only intended
  * for use by RI triggers.
  *
- * Passing snapshot == InvalidSnapshot will select the normal behavior of
+ * Passing snapshot == InvalidMVCCSnapshot will select the normal behavior of
  * fetching a new snapshot for each query.
  */
 int
 SPI_execute_snapshot(SPIPlanPtr plan,
 					 Datum *Values, const char *Nulls,
-					 Snapshot snapshot, Snapshot crosscheck_snapshot,
+					 MVCCSnapshot snapshot, MVCCSnapshot crosscheck_snapshot,
 					 bool read_only, bool fire_triggers, long tcount)
 {
 	SPIExecuteOptions options;
@@ -849,7 +849,7 @@ SPI_execute_with_args(const char *src,
 	options.tcount = tcount;
 
 	res = _SPI_execute_plan(&plan, &options,
-							InvalidSnapshot, InvalidSnapshot,
+							InvalidMVCCSnapshot, InvalidMVCCSnapshot,
 							true);
 
 	_SPI_end_call(true);
@@ -1581,7 +1581,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
 	CachedPlan *cplan;
 	List	   *stmt_list;
 	char	   *query_string;
-	Snapshot	snapshot;
+	MVCCSnapshot snapshot;
 	MemoryContext oldcontext;
 	Portal		portal;
 	SPICallbackArg spicallbackarg;
@@ -2389,15 +2389,15 @@ _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan)
  *		if NULL, CurrentResourceOwner is used (ignored for non-saved plan)
  *
  * Additional, only-internally-accessible options:
- * snapshot: query snapshot to use, or InvalidSnapshot for the normal
+ * snapshot: query snapshot to use, or InvalidMVCCSnapshot for the normal
  *		behavior of taking a new snapshot for each query.
- * crosscheck_snapshot: for RI use, all others pass InvalidSnapshot
+ * crosscheck_snapshot: for RI use, all others pass InvalidMVCCSnapshot
  * fire_triggers: true to fire AFTER triggers at end of query (normal case);
  *		false means any AFTER triggers are postponed to end of outer query
  */
 static int
 _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
-				  Snapshot snapshot, Snapshot crosscheck_snapshot,
+				  MVCCSnapshot snapshot, MVCCSnapshot crosscheck_snapshot,
 				  bool fire_triggers)
 {
 	int			my_res = 0;
@@ -2434,31 +2434,31 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
 	/*
 	 * We support four distinct snapshot management behaviors:
 	 *
-	 * snapshot != InvalidSnapshot, read_only = true: use exactly the given
-	 * snapshot.
+	 * snapshot != InvalidMVCCSnapshot, read_only = true: use exactly the
+	 * given snapshot.
 	 *
-	 * snapshot != InvalidSnapshot, read_only = false: use the given snapshot,
-	 * modified by advancing its command ID before each querytree.
+	 * snapshot != InvalidMVCCSnapshot, read_only = false: use the given
+	 * snapshot, modified by advancing its command ID before each querytree.
 	 *
-	 * snapshot == InvalidSnapshot, read_only = true: do nothing for queries
-	 * that require no snapshot.  For those that do, ensure that a Portal
-	 * snapshot exists; then use that, or use the entry-time ActiveSnapshot if
-	 * that exists and is different.
+	 * snapshot == InvalidMVCCSnapshot, read_only = true: do nothing for
+	 * queries that require no snapshot.  For those that do, ensure that a
+	 * Portal snapshot exists; then use that, or use the entry-time
+	 * ActiveSnapshot if that exists and is different.
 	 *
-	 * snapshot == InvalidSnapshot, read_only = false: do nothing for queries
-	 * that require no snapshot.  For those that do, ensure that a Portal
-	 * snapshot exists; then, in atomic execution (!allow_nonatomic) take a
-	 * full new snapshot for each user command, and advance its command ID
-	 * before each querytree within the command.  In allow_nonatomic mode we
-	 * just use the Portal snapshot unmodified.
+	 * snapshot == InvalidMVCCSnapshot, read_only = false: do nothing for
+	 * queries that require no snapshot.  For those that do, ensure that a
+	 * Portal snapshot exists; then, in atomic execution (!allow_nonatomic)
+	 * take a full new snapshot for each user command, and advance its command
+	 * ID before each querytree within the command.  In allow_nonatomic mode
+	 * we just use the Portal snapshot unmodified.
 	 *
 	 * In the first two cases, we can just push the snap onto the stack once
 	 * for the whole plan list.
 	 *
-	 * Note that snapshot != InvalidSnapshot implies an atomic execution
+	 * Note that snapshot != InvalidMVCCSnapshot implies an atomic execution
 	 * context.
 	 */
-	if (snapshot != InvalidSnapshot)
+	if (snapshot != InvalidMVCCSnapshot)
 	{
 		/* this intentionally tests the options field not the derived value */
 		Assert(!options->allow_nonatomic);
@@ -2583,7 +2583,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
 		 * If we weren't given a specific snapshot to use, and the statement
 		 * list requires a snapshot, set that up.
 		 */
-		if (snapshot == InvalidSnapshot &&
+		if (snapshot == InvalidMVCCSnapshot &&
 			(list_length(stmt_list) > 1 ||
 			 (list_length(stmt_list) == 1 &&
 			  PlannedStmtRequiresSnapshot(linitial_node(PlannedStmt,
@@ -2682,12 +2682,12 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
 			if (stmt->utilityStmt == NULL)
 			{
 				QueryDesc  *qdesc;
-				Snapshot	snap;
+				MVCCSnapshot snap;
 
 				if (ActiveSnapshotSet())
 					snap = GetActiveSnapshot();
 				else
-					snap = InvalidSnapshot;
+					snap = InvalidMVCCSnapshot;
 
 				qdesc = CreateQueryDesc(stmt,
 										plansource->query_string,
diff --git a/src/backend/libpq/be-fsstubs.c b/src/backend/libpq/be-fsstubs.c
index a272e82b850..12273a8b8d3 100644
--- a/src/backend/libpq/be-fsstubs.c
+++ b/src/backend/libpq/be-fsstubs.c
@@ -725,7 +725,7 @@ closeLOfd(int fd)
 	cookies[fd] = NULL;
 
 	if (lobj->snapshot)
-		UnregisterSnapshotFromOwner(lobj->snapshot,
+		UnregisterSnapshotFromOwner((Snapshot) lobj->snapshot,
 									TopTransactionResourceOwner);
 	inv_close(lobj);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 4bdc2941efb..4c1795cda05 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3367,7 +3367,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		partqualstate = ExecPrepareExpr(partition_constraint, estate);
 
 		econtext = GetPerTupleExprContext(estate);
-		snapshot = RegisterSnapshot(GetLatestSnapshot());
+		snapshot = (Snapshot) RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
 		scan = table_beginscan(part_rel, snapshot, 0, NULL);
 
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index 328b4d450e4..d2047f4037e 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -97,7 +97,7 @@ RelationGetPartitionDesc(Relation rel, bool omit_detached)
 		rel->rd_partdesc_nodetached &&
 		ActiveSnapshotSet())
 	{
-		Snapshot	activesnap;
+		MVCCSnapshot activesnap;
 
 		Assert(TransactionIdIsValid(rel->rd_partdesc_nodetached_xmin));
 		activesnap = GetActiveSnapshot();
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 0bff0f10652..aa69cd06fec 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -586,7 +586,7 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	TransactionId xid = XLogRecGetXid(r);
 	uint8		info = XLogRecGetInfo(r) & ~XLR_INFO_MASK;
 	RepOriginId origin_id = XLogRecGetOrigin(r);
-	Snapshot	snapshot = NULL;
+	HistoricMVCCSnapshot snapshot = NULL;
 	xl_logical_message *message;
 
 	if (info != XLOG_LOGICAL_MESSAGE)
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 79b60df7cf0..f16ac73e9f5 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -265,9 +265,9 @@ static void ReorderBufferSerializedPath(char *path, ReplicationSlot *slot,
 										TransactionId xid, XLogSegNo segno);
 static int	ReorderBufferTXNSizeCompare(const pairingheap_node *a, const pairingheap_node *b, void *arg);
 
-static void ReorderBufferFreeSnap(ReorderBuffer *rb, Snapshot snap);
-static Snapshot ReorderBufferCopySnap(ReorderBuffer *rb, Snapshot orig_snap,
-									  ReorderBufferTXN *txn, CommandId cid);
+static void ReorderBufferFreeSnap(ReorderBuffer *rb, HistoricMVCCSnapshot snap);
+static HistoricMVCCSnapshot ReorderBufferCopySnap(ReorderBuffer *rb, HistoricMVCCSnapshot orig_snap,
+												  ReorderBufferTXN *txn, CommandId cid);
 
 /*
  * ---------------------------------------
@@ -849,7 +849,7 @@ ReorderBufferQueueChange(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn,
  */
 void
 ReorderBufferQueueMessage(ReorderBuffer *rb, TransactionId xid,
-						  Snapshot snap, XLogRecPtr lsn,
+						  HistoricMVCCSnapshot snap, XLogRecPtr lsn,
 						  bool transactional, const char *prefix,
 						  Size message_size, const char *message)
 {
@@ -883,7 +883,7 @@ ReorderBufferQueueMessage(ReorderBuffer *rb, TransactionId xid,
 	else
 	{
 		ReorderBufferTXN *txn = NULL;
-		volatile Snapshot snapshot_now = snap;
+		volatile HistoricMVCCSnapshot snapshot_now = snap;
 
 		/* Non-transactional changes require a valid snapshot. */
 		Assert(snapshot_now);
@@ -1829,35 +1829,35 @@ ReorderBufferBuildTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
  * that catalog modifying transactions can look into intermediate catalog
  * states.
  */
-static Snapshot
-ReorderBufferCopySnap(ReorderBuffer *rb, Snapshot orig_snap,
+static HistoricMVCCSnapshot
+ReorderBufferCopySnap(ReorderBuffer *rb, HistoricMVCCSnapshot orig_snap,
 					  ReorderBufferTXN *txn, CommandId cid)
 {
-	Snapshot	snap;
+	HistoricMVCCSnapshot snap;
 	dlist_iter	iter;
 	int			i = 0;
 	Size		size;
 
-	size = sizeof(SnapshotData) +
+	size = sizeof(HistoricMVCCSnapshotData) +
 		sizeof(TransactionId) * orig_snap->xcnt +
 		sizeof(TransactionId) * (txn->nsubtxns + 1);
 
 	snap = MemoryContextAllocZero(rb->context, size);
-	memcpy(snap, orig_snap, sizeof(SnapshotData));
+	memcpy(snap, orig_snap, sizeof(HistoricMVCCSnapshotData));
 
 	snap->copied = true;
-	snap->active_count = 1;		/* mark as active so nobody frees it */
+	snap->refcount = 1;			/* mark as active so nobody frees it */
 	snap->regd_count = 0;
-	snap->xip = (TransactionId *) (snap + 1);
+	snap->committed_xids = (TransactionId *) (snap + 1);
 
-	memcpy(snap->xip, orig_snap->xip, sizeof(TransactionId) * snap->xcnt);
+	memcpy(snap->committed_xids, orig_snap->committed_xids, sizeof(TransactionId) * snap->xcnt);
 
 	/*
 	 * snap->subxip contains all txids that belong to our transaction which we
 	 * need to check via cmin/cmax. That's why we store the toplevel
 	 * transaction in there as well.
 	 */
-	snap->subxip = snap->xip + snap->xcnt;
+	snap->subxip = snap->committed_xids + snap->xcnt;
 	snap->subxip[i++] = txn->xid;
 
 	/*
@@ -1889,7 +1889,7 @@ ReorderBufferCopySnap(ReorderBuffer *rb, Snapshot orig_snap,
  * Free a previously ReorderBufferCopySnap'ed snapshot
  */
 static void
-ReorderBufferFreeSnap(ReorderBuffer *rb, Snapshot snap)
+ReorderBufferFreeSnap(ReorderBuffer *rb, HistoricMVCCSnapshot snap)
 {
 	if (snap->copied)
 		pfree(snap);
@@ -2040,7 +2040,7 @@ ReorderBufferApplyMessage(ReorderBuffer *rb, ReorderBufferTXN *txn,
  */
 static inline void
 ReorderBufferSaveTXNSnapshot(ReorderBuffer *rb, ReorderBufferTXN *txn,
-							 Snapshot snapshot_now, CommandId command_id)
+							 HistoricMVCCSnapshot snapshot_now, CommandId command_id)
 {
 	txn->command_id = command_id;
 
@@ -2061,7 +2061,7 @@ ReorderBufferSaveTXNSnapshot(ReorderBuffer *rb, ReorderBufferTXN *txn,
  */
 static void
 ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
-					  Snapshot snapshot_now,
+					  HistoricMVCCSnapshot snapshot_now,
 					  CommandId command_id,
 					  XLogRecPtr last_lsn,
 					  ReorderBufferChange *specinsert)
@@ -2108,7 +2108,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 static void
 ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 						XLogRecPtr commit_lsn,
-						volatile Snapshot snapshot_now,
+						volatile HistoricMVCCSnapshot snapshot_now,
 						volatile CommandId command_id,
 						bool streaming)
 {
@@ -2682,7 +2682,7 @@ ReorderBufferReplay(ReorderBufferTXN *txn,
 					TimestampTz commit_time,
 					RepOriginId origin_id, XLogRecPtr origin_lsn)
 {
-	Snapshot	snapshot_now;
+	HistoricMVCCSnapshot snapshot_now;
 	CommandId	command_id = FirstCommandId;
 
 	txn->final_lsn = commit_lsn;
@@ -3142,7 +3142,7 @@ ReorderBufferProcessXid(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
  */
 void
 ReorderBufferAddSnapshot(ReorderBuffer *rb, TransactionId xid,
-						 XLogRecPtr lsn, Snapshot snap)
+						 XLogRecPtr lsn, HistoricMVCCSnapshot snap)
 {
 	ReorderBufferChange *change = ReorderBufferGetChange(rb);
 
@@ -3160,7 +3160,7 @@ ReorderBufferAddSnapshot(ReorderBuffer *rb, TransactionId xid,
  */
 void
 ReorderBufferSetBaseSnapshot(ReorderBuffer *rb, TransactionId xid,
-							 XLogRecPtr lsn, Snapshot snap)
+							 XLogRecPtr lsn, HistoricMVCCSnapshot snap)
 {
 	ReorderBufferTXN *txn;
 	bool		is_new;
@@ -3920,12 +3920,12 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
 			{
-				Snapshot	snap;
+				HistoricMVCCSnapshot snap;
 				char	   *data;
 
 				snap = change->data.snapshot;
 
-				sz += sizeof(SnapshotData) +
+				sz += sizeof(HistoricMVCCSnapshotData) +
 					sizeof(TransactionId) * snap->xcnt +
 					sizeof(TransactionId) * snap->subxcnt;
 
@@ -3935,12 +3935,12 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				/* might have been reallocated above */
 				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
 
-				memcpy(data, snap, sizeof(SnapshotData));
-				data += sizeof(SnapshotData);
+				memcpy(data, snap, sizeof(HistoricMVCCSnapshotData));
+				data += sizeof(HistoricMVCCSnapshotData);
 
 				if (snap->xcnt)
 				{
-					memcpy(data, snap->xip,
+					memcpy(data, snap->committed_xids,
 						   sizeof(TransactionId) * snap->xcnt);
 					data += sizeof(TransactionId) * snap->xcnt;
 				}
@@ -4054,7 +4054,7 @@ ReorderBufferCanStartStreaming(ReorderBuffer *rb)
 static void
 ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 {
-	Snapshot	snapshot_now;
+	HistoricMVCCSnapshot snapshot_now;
 	CommandId	command_id;
 	Size		stream_bytes;
 	bool		txn_is_streamed;
@@ -4222,11 +4222,11 @@ ReorderBufferChangeSize(ReorderBufferChange *change)
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
 			{
-				Snapshot	snap;
+				HistoricMVCCSnapshot snap;
 
 				snap = change->data.snapshot;
 
-				sz += sizeof(SnapshotData) +
+				sz += sizeof(HistoricMVCCSnapshotData) +
 					sizeof(TransactionId) * snap->xcnt +
 					sizeof(TransactionId) * snap->subxcnt;
 
@@ -4506,13 +4506,13 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
 			{
-				Snapshot	oldsnap;
-				Snapshot	newsnap;
+				HistoricMVCCSnapshot oldsnap;
+				HistoricMVCCSnapshot newsnap;
 				Size		size;
 
-				oldsnap = (Snapshot) data;
+				oldsnap = (HistoricMVCCSnapshot) data;
 
-				size = sizeof(SnapshotData) +
+				size = sizeof(HistoricMVCCSnapshotData) +
 					sizeof(TransactionId) * oldsnap->xcnt +
 					sizeof(TransactionId) * (oldsnap->subxcnt + 0);
 
@@ -4521,9 +4521,9 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				newsnap = change->data.snapshot;
 
 				memcpy(newsnap, data, size);
-				newsnap->xip = (TransactionId *)
-					(((char *) newsnap) + sizeof(SnapshotData));
-				newsnap->subxip = newsnap->xip + newsnap->xcnt;
+				newsnap->committed_xids = (TransactionId *)
+					(((char *) newsnap) + sizeof(HistoricMVCCSnapshotData));
+				newsnap->subxip = newsnap->committed_xids + newsnap->xcnt;
 				newsnap->copied = true;
 				break;
 			}
@@ -5194,7 +5194,7 @@ file_sort_by_lsn(const ListCell *a_p, const ListCell *b_p)
  * transaction for relid.
  */
 static void
-UpdateLogicalMappings(HTAB *tuplecid_data, Oid relid, Snapshot snapshot)
+UpdateLogicalMappings(HTAB *tuplecid_data, Oid relid, HistoricMVCCSnapshot snapshot)
 {
 	DIR		   *mapping_dir;
 	struct dirent *mapping_de;
@@ -5273,7 +5273,7 @@ UpdateLogicalMappings(HTAB *tuplecid_data, Oid relid, Snapshot snapshot)
  */
 bool
 ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data,
-							  Snapshot snapshot,
+							  HistoricMVCCSnapshot snapshot,
 							  HeapTuple htup, Buffer buffer,
 							  CommandId *cmin, CommandId *cmax)
 {
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index bbedd3de318..f2fc352c634 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -155,11 +155,11 @@ static bool ExportInProgress = false;
 static void SnapBuildPurgeOlderTxn(SnapBuild *builder);
 
 /* snapshot building/manipulation/distribution functions */
-static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder);
+static HistoricMVCCSnapshot SnapBuildBuildSnapshot(SnapBuild *builder);
 
-static void SnapBuildFreeSnapshot(Snapshot snap);
+static void SnapBuildFreeSnapshot(HistoricMVCCSnapshot snap);
 
-static void SnapBuildSnapIncRefcount(Snapshot snap);
+static void SnapBuildSnapIncRefcount(HistoricMVCCSnapshot snap);
 
 static void SnapBuildDistributeNewCatalogSnapshot(SnapBuild *builder, XLogRecPtr lsn);
 
@@ -249,23 +249,21 @@ FreeSnapshotBuilder(SnapBuild *builder)
  * Free an unreferenced snapshot that has previously been built by us.
  */
 static void
-SnapBuildFreeSnapshot(Snapshot snap)
+SnapBuildFreeSnapshot(HistoricMVCCSnapshot snap)
 {
 	/* make sure we don't get passed an external snapshot */
 	Assert(snap->snapshot_type == SNAPSHOT_HISTORIC_MVCC);
 
 	/* make sure nobody modified our snapshot */
 	Assert(snap->curcid == FirstCommandId);
-	Assert(!snap->suboverflowed);
-	Assert(!snap->takenDuringRecovery);
 	Assert(snap->regd_count == 0);
 
 	/* slightly more likely, so it's checked even without c-asserts */
 	if (snap->copied)
 		elog(ERROR, "cannot free a copied snapshot");
 
-	if (snap->active_count)
-		elog(ERROR, "cannot free an active snapshot");
+	if (snap->refcount)
+		elog(ERROR, "cannot free a snapshot that's in use");
 
 	pfree(snap);
 }
@@ -313,9 +311,9 @@ SnapBuildXactNeedsSkip(SnapBuild *builder, XLogRecPtr ptr)
  * adding a Snapshot as builder->snapshot.
  */
 static void
-SnapBuildSnapIncRefcount(Snapshot snap)
+SnapBuildSnapIncRefcount(HistoricMVCCSnapshot snap)
 {
-	snap->active_count++;
+	snap->refcount++;
 }
 
 /*
@@ -325,26 +323,23 @@ SnapBuildSnapIncRefcount(Snapshot snap)
  * IncRef'ed Snapshot can adjust its refcount easily.
  */
 void
-SnapBuildSnapDecRefcount(Snapshot snap)
+SnapBuildSnapDecRefcount(HistoricMVCCSnapshot snap)
 {
 	/* make sure we don't get passed an external snapshot */
 	Assert(snap->snapshot_type == SNAPSHOT_HISTORIC_MVCC);
 
 	/* make sure nobody modified our snapshot */
 	Assert(snap->curcid == FirstCommandId);
-	Assert(!snap->suboverflowed);
-	Assert(!snap->takenDuringRecovery);
 
+	Assert(snap->refcount > 0);
 	Assert(snap->regd_count == 0);
 
-	Assert(snap->active_count > 0);
-
 	/* slightly more likely, so it's checked even without casserts */
 	if (snap->copied)
 		elog(ERROR, "cannot free a copied snapshot");
 
-	snap->active_count--;
-	if (snap->active_count == 0)
+	snap->refcount--;
+	if (snap->refcount == 0)
 		SnapBuildFreeSnapshot(snap);
 }
 
@@ -356,15 +351,15 @@ SnapBuildSnapDecRefcount(Snapshot snap)
  * these snapshots; they have to copy them and fill in appropriate ->curcid
  * and ->subxip/subxcnt values.
  */
-static Snapshot
+static HistoricMVCCSnapshot
 SnapBuildBuildSnapshot(SnapBuild *builder)
 {
-	Snapshot	snapshot;
+	HistoricMVCCSnapshot snapshot;
 	Size		ssize;
 
 	Assert(builder->state >= SNAPBUILD_FULL_SNAPSHOT);
 
-	ssize = sizeof(SnapshotData)
+	ssize = sizeof(HistoricMVCCSnapshotData)
 		+ sizeof(TransactionId) * builder->committed.xcnt
 		+ sizeof(TransactionId) * 1 /* toplevel xid */ ;
 
@@ -400,15 +395,15 @@ SnapBuildBuildSnapshot(SnapBuild *builder)
 	snapshot->xmax = builder->xmax;
 
 	/* store all transactions to be treated as committed by this snapshot */
-	snapshot->xip =
-		(TransactionId *) ((char *) snapshot + sizeof(SnapshotData));
+	snapshot->committed_xids =
+		(TransactionId *) ((char *) snapshot + sizeof(HistoricMVCCSnapshotData));
 	snapshot->xcnt = builder->committed.xcnt;
-	memcpy(snapshot->xip,
+	memcpy(snapshot->committed_xids,
 		   builder->committed.xip,
 		   builder->committed.xcnt * sizeof(TransactionId));
 
 	/* sort so we can bsearch() */
-	qsort(snapshot->xip, snapshot->xcnt, sizeof(TransactionId), xidComparator);
+	qsort(snapshot->committed_xids, snapshot->xcnt, sizeof(TransactionId), xidComparator);
 
 	/*
 	 * Initially, subxip is empty, i.e. it's a snapshot to be used by
@@ -418,13 +413,10 @@ SnapBuildBuildSnapshot(SnapBuild *builder)
 	snapshot->subxcnt = 0;
 	snapshot->subxip = NULL;
 
-	snapshot->suboverflowed = false;
-	snapshot->takenDuringRecovery = false;
 	snapshot->copied = false;
 	snapshot->curcid = FirstCommandId;
-	snapshot->active_count = 0;
+	snapshot->refcount = 0;
 	snapshot->regd_count = 0;
-	snapshot->snapXactCompletionCount = 0;
 
 	return snapshot;
 }
@@ -436,13 +428,13 @@ SnapBuildBuildSnapshot(SnapBuild *builder)
  * The snapshot will be usable directly in current transaction or exported
  * for loading in different transaction.
  */
-Snapshot
+MVCCSnapshot
 SnapBuildInitialSnapshot(SnapBuild *builder)
 {
-	Snapshot	snap;
+	HistoricMVCCSnapshot historicsnap;
+	MVCCSnapshot mvccsnap;
 	TransactionId xid;
 	TransactionId safeXid;
-	TransactionId *newxip;
 	int			newxcnt = 0;
 
 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
@@ -464,10 +456,10 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	if (TransactionIdIsValid(MyProc->xmin))
 		elog(ERROR, "cannot build an initial slot snapshot when MyProc->xmin already is valid");
 
-	snap = SnapBuildBuildSnapshot(builder);
+	historicsnap = SnapBuildBuildSnapshot(builder);
 
 	/*
-	 * We know that snap->xmin is alive, enforced by the logical xmin
+	 * We know that historicsnap->xmin is alive, enforced by the logical xmin
 	 * mechanism. Due to that we can do this without locks, we're only
 	 * changing our own value.
 	 *
@@ -479,15 +471,18 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	safeXid = GetOldestSafeDecodingTransactionId(false);
 	LWLockRelease(ProcArrayLock);
 
-	if (TransactionIdFollows(safeXid, snap->xmin))
+	if (TransactionIdFollows(safeXid, historicsnap->xmin))
 		elog(ERROR, "cannot build an initial slot snapshot as oldest safe xid %u follows snapshot's xmin %u",
-			 safeXid, snap->xmin);
+			 safeXid, historicsnap->xmin);
 
-	MyProc->xmin = snap->xmin;
+	MyProc->xmin = historicsnap->xmin;
 
 	/* allocate in transaction context */
-	newxip = (TransactionId *)
-		palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+	mvccsnap = palloc(sizeof(MVCCSnapshotData) + sizeof(TransactionId) * GetMaxSnapshotXidCount());
+	mvccsnap->snapshot_type = SNAPSHOT_MVCC;
+	mvccsnap->xmin = historicsnap->xmin;
+	mvccsnap->xmax = historicsnap->xmax;
+	mvccsnap->xip = (TransactionId *) ((char *) mvccsnap + sizeof(MVCCSnapshotData));
 
 	/*
 	 * snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -495,7 +490,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	 * classical snapshot by marking all non-committed transactions as
 	 * in-progress. This can be expensive.
 	 */
-	for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+	for (xid = historicsnap->xmin; NormalTransactionIdPrecedes(xid, historicsnap->xmax);)
 	{
 		void	   *test;
 
@@ -503,7 +498,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 		 * Check whether transaction committed using the decoding snapshot
 		 * meaning of ->xip.
 		 */
-		test = bsearch(&xid, snap->xip, snap->xcnt,
+		test = bsearch(&xid, historicsnap->committed_xids, historicsnap->xcnt,
 					   sizeof(TransactionId), xidComparator);
 
 		if (test == NULL)
@@ -513,18 +508,26 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 						 errmsg("initial slot snapshot too large")));
 
-			newxip[newxcnt++] = xid;
+			mvccsnap->xip[newxcnt++] = xid;
 		}
 
 		TransactionIdAdvance(xid);
 	}
 
 	/* adjust remaining snapshot fields as needed */
-	snap->snapshot_type = SNAPSHOT_MVCC;
-	snap->xcnt = newxcnt;
-	snap->xip = newxip;
-
-	return snap;
+	mvccsnap->xcnt = newxcnt;
+
+	mvccsnap->subxip = NULL;
+	mvccsnap->subxcnt = 0;
+	mvccsnap->suboverflowed = false;
+	mvccsnap->takenDuringRecovery = false;
+	mvccsnap->copied = false;
+	mvccsnap->curcid = FirstCommandId;
+	mvccsnap->active_count = 0;
+	mvccsnap->regd_count = 0;
+	mvccsnap->snapXactCompletionCount = 0;
+
+	return mvccsnap;
 }
 
 /*
@@ -538,7 +541,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 const char *
 SnapBuildExportSnapshot(SnapBuild *builder)
 {
-	Snapshot	snap;
+	MVCCSnapshot snap;
 	char	   *snapname;
 
 	if (IsTransactionOrTransactionBlock())
@@ -575,7 +578,7 @@ SnapBuildExportSnapshot(SnapBuild *builder)
 /*
  * Ensure there is a snapshot and if not build one for current transaction.
  */
-Snapshot
+HistoricMVCCSnapshot
 SnapBuildGetOrBuildSnapshot(SnapBuild *builder)
 {
 	Assert(builder->state == SNAPBUILD_CONSISTENT);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index a0782b1bbf6..7e208e34784 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1305,7 +1305,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 		}
 		else if (snapshot_action == CRS_USE_SNAPSHOT)
 		{
-			Snapshot	snap;
+			MVCCSnapshot snap;
 
 			snap = SnapBuildInitialSnapshot(ctx->snapshot_builder);
 			RestoreTransactionSnapshot(snap, MyProc);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 2e54c11f880..b2751dfa63b 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2092,7 +2092,7 @@ GetMaxSnapshotSubxidCount(void)
  * least in the case we already hold a snapshot), but that's for another day.
  */
 static bool
-GetSnapshotDataReuse(Snapshot snapshot)
+GetSnapshotDataReuse(MVCCSnapshot snapshot)
 {
 	uint64		curXactCompletionCount;
 
@@ -2171,8 +2171,8 @@ GetSnapshotDataReuse(Snapshot snapshot)
  * Note: this function should probably not be called with an argument that's
  * not statically allocated (see xip allocation below).
  */
-Snapshot
-GetSnapshotData(Snapshot snapshot)
+MVCCSnapshot
+GetSnapshotData(MVCCSnapshot snapshot)
 {
 	ProcArrayStruct *arrayP = procArray;
 	TransactionId *other_xids = ProcGlobal->xids;
diff --git a/src/backend/storage/large_object/inv_api.c b/src/backend/storage/large_object/inv_api.c
index 68b76f2cc18..5bbfd0abb65 100644
--- a/src/backend/storage/large_object/inv_api.c
+++ b/src/backend/storage/large_object/inv_api.c
@@ -215,7 +215,7 @@ LargeObjectDesc *
 inv_open(Oid lobjId, int flags, MemoryContext mcxt)
 {
 	LargeObjectDesc *retval;
-	Snapshot	snapshot = NULL;
+	MVCCSnapshot snapshot = NULL;
 	int			descflags = 0;
 
 	/*
@@ -241,7 +241,7 @@ inv_open(Oid lobjId, int flags, MemoryContext mcxt)
 		snapshot = GetActiveSnapshot();
 
 	/* Can't use LargeObjectExists here because we need to specify snapshot */
-	if (!LargeObjectExistsWithSnapshot(lobjId, snapshot))
+	if (!LargeObjectExistsWithSnapshot(lobjId, (Snapshot) snapshot))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_OBJECT),
 				 errmsg("large object %u does not exist", lobjId)));
@@ -253,7 +253,7 @@ inv_open(Oid lobjId, int flags, MemoryContext mcxt)
 			pg_largeobject_aclcheck_snapshot(lobjId,
 											 GetUserId(),
 											 ACL_SELECT,
-											 snapshot) != ACLCHECK_OK)
+											 (Snapshot) snapshot) != ACLCHECK_OK)
 			ereport(ERROR,
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("permission denied for large object %u",
@@ -265,7 +265,7 @@ inv_open(Oid lobjId, int flags, MemoryContext mcxt)
 			pg_largeobject_aclcheck_snapshot(lobjId,
 											 GetUserId(),
 											 ACL_UPDATE,
-											 snapshot) != ACLCHECK_OK)
+											 (Snapshot) snapshot) != ACLCHECK_OK)
 			ereport(ERROR,
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("permission denied for large object %u",
@@ -354,7 +354,7 @@ inv_getsize(LargeObjectDesc *obj_desc)
 				ObjectIdGetDatum(obj_desc->id));
 
 	sd = systable_beginscan_ordered(lo_heap_r, lo_index_r,
-									obj_desc->snapshot, 1, skey);
+									(Snapshot) obj_desc->snapshot, 1, skey);
 
 	/*
 	 * Because the pg_largeobject index is on both loid and pageno, but we
@@ -484,7 +484,7 @@ inv_read(LargeObjectDesc *obj_desc, char *buf, int nbytes)
 				Int32GetDatum(pageno));
 
 	sd = systable_beginscan_ordered(lo_heap_r, lo_index_r,
-									obj_desc->snapshot, 2, skey);
+									(Snapshot) obj_desc->snapshot, 2, skey);
 
 	while ((tuple = systable_getnext_ordered(sd, ForwardScanDirection)) != NULL)
 	{
@@ -604,7 +604,7 @@ inv_write(LargeObjectDesc *obj_desc, const char *buf, int nbytes)
 				Int32GetDatum(pageno));
 
 	sd = systable_beginscan_ordered(lo_heap_r, lo_index_r,
-									obj_desc->snapshot, 2, skey);
+									(Snapshot) obj_desc->snapshot, 2, skey);
 
 	oldtuple = NULL;
 	olddata = NULL;
@@ -797,7 +797,7 @@ inv_truncate(LargeObjectDesc *obj_desc, int64 len)
 				Int32GetDatum(pageno));
 
 	sd = systable_beginscan_ordered(lo_heap_r, lo_index_r,
-									obj_desc->snapshot, 2, skey);
+									(Snapshot) obj_desc->snapshot, 2, skey);
 
 	/*
 	 * If possible, get the page the truncation point is in. The truncation
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 5b21a053981..5adf0b1ffe9 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -449,10 +449,10 @@ static void SerialSetActiveSerXmin(TransactionId xid);
 
 static uint32 predicatelock_hash(const void *key, Size keysize);
 static void SummarizeOldestCommittedSxact(void);
-static Snapshot GetSafeSnapshot(Snapshot origSnapshot);
-static Snapshot GetSerializableTransactionSnapshotInt(Snapshot snapshot,
-													  VirtualTransactionId *sourcevxid,
-													  int sourcepid);
+static MVCCSnapshot GetSafeSnapshot(MVCCSnapshot origSnapshot);
+static MVCCSnapshot GetSerializableTransactionSnapshotInt(MVCCSnapshot snapshot,
+														  VirtualTransactionId *sourcevxid,
+														  int sourcepid);
 static bool PredicateLockExists(const PREDICATELOCKTARGETTAG *targettag);
 static bool GetParentPredicateLockTag(const PREDICATELOCKTARGETTAG *tag,
 									  PREDICATELOCKTARGETTAG *parent);
@@ -1544,10 +1544,10 @@ SummarizeOldestCommittedSxact(void)
  *		for), the passed-in Snapshot pointer should reference a static data
  *		area that can safely be passed to GetSnapshotData.
  */
-static Snapshot
-GetSafeSnapshot(Snapshot origSnapshot)
+static MVCCSnapshot
+GetSafeSnapshot(MVCCSnapshot origSnapshot)
 {
-	Snapshot	snapshot;
+	MVCCSnapshot snapshot;
 
 	Assert(XactReadOnly && XactDeferrable);
 
@@ -1668,8 +1668,8 @@ GetSafeSnapshotBlockingPids(int blocked_pid, int *output, int output_size)
  * always this same pointer; no new snapshot data structure is allocated
  * within this function.
  */
-Snapshot
-GetSerializableTransactionSnapshot(Snapshot snapshot)
+MVCCSnapshot
+GetSerializableTransactionSnapshot(MVCCSnapshot snapshot)
 {
 	Assert(IsolationIsSerializable());
 
@@ -1709,7 +1709,7 @@ GetSerializableTransactionSnapshot(Snapshot snapshot)
  * read-only.
  */
 void
-SetSerializableTransactionSnapshot(Snapshot snapshot,
+SetSerializableTransactionSnapshot(MVCCSnapshot snapshot,
 								   VirtualTransactionId *sourcevxid,
 								   int sourcepid)
 {
@@ -1750,8 +1750,8 @@ SetSerializableTransactionSnapshot(Snapshot snapshot,
  * source xact is still running after we acquire SerializableXactHashLock.
  * We do that by calling ProcArrayInstallImportedXmin.
  */
-static Snapshot
-GetSerializableTransactionSnapshotInt(Snapshot snapshot,
+static MVCCSnapshot
+GetSerializableTransactionSnapshotInt(MVCCSnapshot snapshot,
 									  VirtualTransactionId *sourcevxid,
 									  int sourcepid)
 {
@@ -3961,7 +3961,7 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 static bool
 XidIsConcurrent(TransactionId xid)
 {
-	Snapshot	snap;
+	MVCCSnapshot snap;
 
 	Assert(TransactionIdIsValid(xid));
 	Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
@@ -4214,7 +4214,7 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		}
 		else if (!SxactIsDoomed(sxact)
 				 && (!SxactIsCommitted(sxact)
-					 || TransactionIdPrecedes(GetTransactionSnapshot()->xmin,
+					 || TransactionIdPrecedes(TransactionXmin,
 											  sxact->finishedBefore))
 				 && !RWConflictExists(sxact, MySerializableXact))
 		{
@@ -4227,7 +4227,7 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 			 */
 			if (!SxactIsDoomed(sxact)
 				&& (!SxactIsCommitted(sxact)
-					|| TransactionIdPrecedes(GetTransactionSnapshot()->xmin,
+					|| TransactionIdPrecedes(TransactionXmin,
 											 sxact->finishedBefore))
 				&& !RWConflictExists(sxact, MySerializableXact))
 			{
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index c01cff9d650..939a402540c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1229,7 +1229,7 @@ exec_simple_query(const char *query_string)
 		/*
 		 * Start the portal.  No parameters here.
 		 */
-		PortalStart(portal, NULL, 0, InvalidSnapshot);
+		PortalStart(portal, NULL, 0, InvalidMVCCSnapshot);
 
 		/*
 		 * Select the appropriate output format: text unless we are doing a
@@ -2034,7 +2034,7 @@ exec_bind_message(StringInfo input_message)
 	/*
 	 * And we're ready to start portal execution.
 	 */
-	PortalStart(portal, params, 0, InvalidSnapshot);
+	PortalStart(portal, params, 0, InvalidMVCCSnapshot);
 
 	/*
 	 * Apply the result format requests to the portal.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 6f22496305a..2a4142d65ea 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -66,8 +66,8 @@ static void DoPortalRewind(Portal portal);
 QueryDesc *
 CreateQueryDesc(PlannedStmt *plannedstmt,
 				const char *sourceText,
-				Snapshot snapshot,
-				Snapshot crosscheck_snapshot,
+				MVCCSnapshot snapshot,
+				MVCCSnapshot crosscheck_snapshot,
 				DestReceiver *dest,
 				ParamListInfo params,
 				QueryEnvironment *queryEnv,
@@ -78,9 +78,9 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
 	qd->operation = plannedstmt->commandType;	/* operation */
 	qd->plannedstmt = plannedstmt;	/* plan */
 	qd->sourceText = sourceText;	/* query text */
-	qd->snapshot = RegisterSnapshot(snapshot);	/* snapshot */
+	qd->snapshot = (MVCCSnapshot) RegisterSnapshot(snapshot);	/* snapshot */
 	/* RI check snapshot */
-	qd->crosscheck_snapshot = RegisterSnapshot(crosscheck_snapshot);
+	qd->crosscheck_snapshot = (MVCCSnapshot) RegisterSnapshot(crosscheck_snapshot);
 	qd->dest = dest;			/* output dest */
 	qd->params = params;		/* parameter values passed into query */
 	qd->queryEnv = queryEnv;
@@ -108,8 +108,8 @@ FreeQueryDesc(QueryDesc *qdesc)
 	Assert(qdesc->estate == NULL);
 
 	/* forget our snapshots */
-	UnregisterSnapshot(qdesc->snapshot);
-	UnregisterSnapshot(qdesc->crosscheck_snapshot);
+	UnregisterSnapshot((Snapshot) qdesc->snapshot);
+	UnregisterSnapshot((Snapshot) qdesc->crosscheck_snapshot);
 
 	/* Only the QueryDesc itself need be freed */
 	pfree(qdesc);
@@ -146,7 +146,7 @@ ProcessQuery(PlannedStmt *plan,
 	 * Create the QueryDesc object
 	 */
 	queryDesc = CreateQueryDesc(plan, sourceText,
-								GetActiveSnapshot(), InvalidSnapshot,
+								GetActiveSnapshot(), InvalidMVCCSnapshot,
 								dest, params, queryEnv, 0);
 
 	/*
@@ -431,7 +431,7 @@ FetchStatementTargetList(Node *stmt)
  */
 void
 PortalStart(Portal portal, ParamListInfo params,
-			int eflags, Snapshot snapshot)
+			int eflags, MVCCSnapshot snapshot)
 {
 	Portal		saveActivePortal;
 	ResourceOwner saveResourceOwner;
@@ -495,7 +495,7 @@ PortalStart(Portal portal, ParamListInfo params,
 				queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
 											portal->sourceText,
 											GetActiveSnapshot(),
-											InvalidSnapshot,
+											InvalidMVCCSnapshot,
 											None_Receiver,
 											params,
 											portal->queryEnv,
@@ -1127,12 +1127,12 @@ PortalRunUtility(Portal portal, PlannedStmt *pstmt,
 	 */
 	if (PlannedStmtRequiresSnapshot(pstmt))
 	{
-		Snapshot	snapshot = GetTransactionSnapshot();
+		MVCCSnapshot snapshot = GetTransactionSnapshot();
 
 		/* If told to, register the snapshot we're using and save in portal */
 		if (setHoldSnapshot)
 		{
-			snapshot = RegisterSnapshot(snapshot);
+			snapshot = (MVCCSnapshot) RegisterSnapshot(snapshot);
 			portal->holdSnapshot = snapshot;
 		}
 
@@ -1235,12 +1235,12 @@ PortalRunMulti(Portal portal,
 			 */
 			if (!active_snapshot_set)
 			{
-				Snapshot	snapshot = GetTransactionSnapshot();
+				MVCCSnapshot snapshot = GetTransactionSnapshot();
 
 				/* If told to, register the snapshot and save in portal */
 				if (setHoldSnapshot)
 				{
-					snapshot = RegisterSnapshot(snapshot);
+					snapshot = (MVCCSnapshot) RegisterSnapshot(snapshot);
 					portal->holdSnapshot = snapshot;
 				}
 
diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index 6a76550a5e2..ac253f7330c 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -4692,7 +4692,7 @@ has_lo_priv_byid(Oid roleid, Oid lobjId, AclMode priv, bool *is_missing)
 	if (priv & ACL_UPDATE)
 		snapshot = NULL;
 	else
-		snapshot = GetActiveSnapshot();
+		snapshot = (Snapshot) GetActiveSnapshot();
 
 	if (!LargeObjectExistsWithSnapshot(lobjId, snapshot))
 	{
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 0d8b53d1b75..cab86849a13 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -1639,7 +1639,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	spi_result = SPI_execute_snapshot(qplan,
 									  NULL, NULL,
 									  GetLatestSnapshot(),
-									  InvalidSnapshot,
+									  InvalidMVCCSnapshot,
 									  true, false, 1);
 
 	/* Check result */
@@ -1878,7 +1878,7 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	spi_result = SPI_execute_snapshot(qplan,
 									  NULL, NULL,
 									  GetLatestSnapshot(),
-									  InvalidSnapshot,
+									  InvalidMVCCSnapshot,
 									  true, false, 1);
 
 	/* Check result */
@@ -2400,8 +2400,8 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 	Relation	query_rel,
 				source_rel;
 	bool		source_is_pk;
-	Snapshot	test_snapshot;
-	Snapshot	crosscheck_snapshot;
+	MVCCSnapshot test_snapshot;
+	MVCCSnapshot crosscheck_snapshot;
 	int			limit;
 	int			spi_result;
 	Oid			save_userid;
@@ -2471,8 +2471,8 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 	else
 	{
 		/* the default SPI behavior is okay */
-		test_snapshot = InvalidSnapshot;
-		crosscheck_snapshot = InvalidSnapshot;
+		test_snapshot = InvalidMVCCSnapshot;
+		crosscheck_snapshot = InvalidMVCCSnapshot;
 	}
 
 	/*
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 2089b52d575..e4e5cdb7cae 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -2192,7 +2192,7 @@ pg_get_constraintdef_worker(Oid constraintId, bool fullCommand,
 	StringInfoData buf;
 	SysScanDesc scandesc;
 	ScanKeyData scankey[1];
-	Snapshot	snapshot = RegisterSnapshot(GetTransactionSnapshot());
+	Snapshot	snapshot = (Snapshot) RegisterSnapshot(GetTransactionSnapshot());
 	Relation	relation = table_open(ConstraintRelationId, AccessShareLock);
 
 	ScanKeyInit(&scankey[0],
diff --git a/src/backend/utils/adt/tid.c b/src/backend/utils/adt/tid.c
index 1b0df111717..de2ded46a1e 100644
--- a/src/backend/utils/adt/tid.c
+++ b/src/backend/utils/adt/tid.c
@@ -320,7 +320,7 @@ currtid_internal(Relation rel, ItemPointer tid)
 
 	ItemPointerCopy(tid, result);
 
-	snapshot = RegisterSnapshot(GetLatestSnapshot());
+	snapshot = (Snapshot) RegisterSnapshot(GetLatestSnapshot());
 	scan = table_beginscan_tid(rel, snapshot);
 	table_tuple_get_latest_tid(scan, result);
 	table_endscan(scan);
diff --git a/src/backend/utils/adt/xid8funcs.c b/src/backend/utils/adt/xid8funcs.c
index 4736755b298..4303acbc664 100644
--- a/src/backend/utils/adt/xid8funcs.c
+++ b/src/backend/utils/adt/xid8funcs.c
@@ -409,7 +409,7 @@ pg_current_snapshot(PG_FUNCTION_ARGS)
 	pg_snapshot *snap;
 	uint32		nxip,
 				i;
-	Snapshot	cur;
+	MVCCSnapshot cur;
 	FullTransactionId next_fxid = ReadNextFullTransactionId();
 
 	cur = GetActiveSnapshot();
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 01bb6a410cb..ea1a548d573 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -1286,7 +1286,7 @@ process_settings(Oid databaseid, Oid roleid)
 	relsetting = table_open(DbRoleSettingRelationId, AccessShareLock);
 
 	/* read all the settings under the same snapshot for efficiency */
-	snapshot = RegisterSnapshot(GetCatalogSnapshot(DbRoleSettingRelationId));
+	snapshot = RegisterCatalogSnapshot(GetCatalogSnapshot(DbRoleSettingRelationId));
 
 	/* Later settings are ignored if set earlier. */
 	ApplySetting(snapshot, databaseid, roleid, relsetting, PGC_S_DATABASE_USER);
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 0be1c2b0fff..d4e10a74c79 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -526,7 +526,7 @@ PortalDrop(Portal portal, bool isTopCommit)
 	if (portal->holdSnapshot)
 	{
 		if (portal->resowner)
-			UnregisterSnapshotFromOwner(portal->holdSnapshot,
+			UnregisterSnapshotFromOwner((Snapshot) portal->holdSnapshot,
 										portal->resowner);
 		portal->holdSnapshot = NULL;
 	}
@@ -709,7 +709,7 @@ PreCommit_Portals(bool isPrepare)
 			if (portal->holdSnapshot)
 			{
 				if (portal->resowner)
-					UnregisterSnapshotFromOwner(portal->holdSnapshot,
+					UnregisterSnapshotFromOwner((Snapshot) portal->holdSnapshot,
 												portal->resowner);
 				portal->holdSnapshot = NULL;
 			}
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 76992eb094f..6ffe5ebc459 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -133,18 +133,18 @@
  * These SnapshotData structs are static to simplify memory allocation
  * (see the hack in GetSnapshotData to avoid repeated malloc/free).
  */
-static SnapshotData CurrentSnapshotData = {SNAPSHOT_MVCC};
-static SnapshotData SecondarySnapshotData = {SNAPSHOT_MVCC};
-static SnapshotData CatalogSnapshotData = {SNAPSHOT_MVCC};
+static MVCCSnapshotData CurrentSnapshotData = {SNAPSHOT_MVCC};
+static MVCCSnapshotData SecondarySnapshotData = {SNAPSHOT_MVCC};
+static MVCCSnapshotData CatalogSnapshotData = {SNAPSHOT_MVCC};
 SnapshotData SnapshotSelfData = {SNAPSHOT_SELF};
 SnapshotData SnapshotAnyData = {SNAPSHOT_ANY};
 SnapshotData SnapshotToastData = {SNAPSHOT_TOAST};
 
 /* Pointers to valid snapshots */
-static Snapshot CurrentSnapshot = NULL;
-static Snapshot SecondarySnapshot = NULL;
-static Snapshot CatalogSnapshot = NULL;
-static Snapshot HistoricSnapshot = NULL;
+static MVCCSnapshot CurrentSnapshot = NULL;
+static MVCCSnapshot SecondarySnapshot = NULL;
+static MVCCSnapshot CatalogSnapshot = NULL;
+static HistoricMVCCSnapshot HistoricSnapshot = NULL;
 
 /*
  * These are updated by GetSnapshotData.  We initialize them this way
@@ -167,7 +167,7 @@ static HTAB *tuplecid_data = NULL;
  */
 typedef struct ActiveSnapshotElt
 {
-	Snapshot	as_snap;
+	MVCCSnapshot as_snap;
 	int			as_level;
 	struct ActiveSnapshotElt *as_next;
 } ActiveSnapshotElt;
@@ -192,7 +192,7 @@ bool		FirstSnapshotSet = false;
  * FirstSnapshotSet in combination with IsolationUsesXactSnapshot(), because
  * GUC may be reset before us, changing the value of IsolationUsesXactSnapshot.
  */
-static Snapshot FirstXactSnapshot = NULL;
+static MVCCSnapshot FirstXactSnapshot = NULL;
 
 /* Define pathname of exported-snapshot files */
 #define SNAPSHOT_EXPORT_DIR "pg_snapshots"
@@ -201,16 +201,16 @@ static Snapshot FirstXactSnapshot = NULL;
 typedef struct ExportedSnapshot
 {
 	char	   *snapfile;
-	Snapshot	snapshot;
+	MVCCSnapshot snapshot;
 } ExportedSnapshot;
 
 /* Current xact's exported snapshots (a list of ExportedSnapshot structs) */
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
-static Snapshot CopySnapshot(Snapshot snapshot);
+static MVCCSnapshot CopyMVCCSnapshot(MVCCSnapshot snapshot);
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
-static void FreeSnapshot(Snapshot snapshot);
+static void FreeMVCCSnapshot(MVCCSnapshot snapshot);
 static void SnapshotResetXmin(void);
 
 /* ResourceOwner callbacks to track snapshot references */
@@ -263,7 +263,7 @@ typedef struct SerializedSnapshotData
  * RegisterSnapshot or PushActiveSnapshot on the returned snap if it is to be
  * used very long.
  */
-Snapshot
+MVCCSnapshot
 GetTransactionSnapshot(void)
 {
 	/*
@@ -304,8 +304,9 @@ GetTransactionSnapshot(void)
 				CurrentSnapshot = GetSerializableTransactionSnapshot(&CurrentSnapshotData);
 			else
 				CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
+
 			/* Make a saved copy */
-			CurrentSnapshot = CopySnapshot(CurrentSnapshot);
+			CurrentSnapshot = CopyMVCCSnapshot(CurrentSnapshot);
 			FirstXactSnapshot = CurrentSnapshot;
 			/* Mark it as "registered" in FirstXactSnapshot */
 			FirstXactSnapshot->regd_count++;
@@ -334,7 +335,7 @@ GetTransactionSnapshot(void)
  *		Get a snapshot that is up-to-date as of the current instant,
  *		even if we are executing in transaction-snapshot mode.
  */
-Snapshot
+MVCCSnapshot
 GetLatestSnapshot(void)
 {
 	/*
@@ -376,7 +377,7 @@ GetCatalogSnapshot(Oid relid)
 	 * finishing decoding.
 	 */
 	if (HistoricSnapshotActive())
-		return HistoricSnapshot;
+		return (Snapshot) HistoricSnapshot;
 
 	return GetNonHistoricCatalogSnapshot(relid);
 }
@@ -422,7 +423,7 @@ GetNonHistoricCatalogSnapshot(Oid relid)
 		pairingheap_add(&RegisteredSnapshots, &CatalogSnapshot->ph_node);
 	}
 
-	return CatalogSnapshot;
+	return (Snapshot) CatalogSnapshot;
 }
 
 /*
@@ -491,7 +492,7 @@ SnapshotSetCommandId(CommandId curcid)
  * in GetTransactionSnapshot.
  */
 static void
-SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
+SetTransactionSnapshot(MVCCSnapshot sourcesnap, VirtualTransactionId *sourcevxid,
 					   int sourcepid, PGPROC *sourceproc)
 {
 	/* Caller should have checked this already */
@@ -570,7 +571,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
 			SetSerializableTransactionSnapshot(CurrentSnapshot, sourcevxid,
 											   sourcepid);
 		/* Make a saved copy */
-		CurrentSnapshot = CopySnapshot(CurrentSnapshot);
+		CurrentSnapshot = CopyMVCCSnapshot(CurrentSnapshot);
 		FirstXactSnapshot = CurrentSnapshot;
 		/* Mark it as "registered" in FirstXactSnapshot */
 		FirstXactSnapshot->regd_count++;
@@ -581,30 +582,27 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
 }
 
 /*
- * CopySnapshot
+ * CopyMVCCSnapshot
  *		Copy the given snapshot.
  *
  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
  * to 0.  The returned snapshot has the copied flag set.
  */
-static Snapshot
-CopySnapshot(Snapshot snapshot)
+static MVCCSnapshot
+CopyMVCCSnapshot(MVCCSnapshot snapshot)
 {
-	Snapshot	newsnap;
+	MVCCSnapshot newsnap;
 	Size		subxipoff;
 	Size		size;
 
-	Assert(snapshot != InvalidSnapshot);
-	Assert(snapshot->snapshot_type == SNAPSHOT_MVCC || snapshot->snapshot_type == SNAPSHOT_HISTORIC_MVCC);
-
 	/* We allocate any XID arrays needed in the same palloc block. */
-	size = subxipoff = sizeof(SnapshotData) +
+	size = subxipoff = sizeof(MVCCSnapshotData) +
 		snapshot->xcnt * sizeof(TransactionId);
 	if (snapshot->subxcnt > 0)
 		size += snapshot->subxcnt * sizeof(TransactionId);
 
-	newsnap = (Snapshot) MemoryContextAlloc(TopTransactionContext, size);
-	memcpy(newsnap, snapshot, sizeof(SnapshotData));
+	newsnap = (MVCCSnapshot) MemoryContextAlloc(TopTransactionContext, size);
+	memcpy(newsnap, snapshot, sizeof(MVCCSnapshotData));
 
 	newsnap->regd_count = 0;
 	newsnap->active_count = 0;
@@ -641,11 +639,11 @@ CopySnapshot(Snapshot snapshot)
 }
 
 /*
- * FreeSnapshot
+ * FreeMVCCSnapshot
  *		Free the memory associated with a snapshot.
  */
 static void
-FreeSnapshot(Snapshot snapshot)
+FreeMVCCSnapshot(MVCCSnapshot snapshot)
 {
 	Assert(snapshot->regd_count == 0);
 	Assert(snapshot->active_count == 0);
@@ -663,7 +661,7 @@ FreeSnapshot(Snapshot snapshot)
  * with active refcount=1.  Otherwise, only increment the refcount.
  */
 void
-PushActiveSnapshot(Snapshot snapshot)
+PushActiveSnapshot(MVCCSnapshot snapshot)
 {
 	PushActiveSnapshotWithLevel(snapshot, GetCurrentTransactionNestLevel());
 }
@@ -677,13 +675,12 @@ PushActiveSnapshot(Snapshot snapshot)
  * must not be deeper than the current top of the snapshot stack.
  */
 void
-PushActiveSnapshotWithLevel(Snapshot snapshot, int snap_level)
+PushActiveSnapshotWithLevel(MVCCSnapshot snapshot, int snap_level)
 {
 	ActiveSnapshotElt *newactive;
 
 	Assert(snapshot->snapshot_type == SNAPSHOT_MVCC);
 
-	Assert(snapshot != InvalidSnapshot);
 	Assert(ActiveSnapshot == NULL || snap_level >= ActiveSnapshot->as_level);
 
 	newactive = MemoryContextAlloc(TopTransactionContext, sizeof(ActiveSnapshotElt));
@@ -694,7 +691,7 @@ PushActiveSnapshotWithLevel(Snapshot snapshot, int snap_level)
 	 */
 	if (snapshot == CurrentSnapshot || snapshot == SecondarySnapshot ||
 		!snapshot->copied)
-		newactive->as_snap = CopySnapshot(snapshot);
+		newactive->as_snap = CopyMVCCSnapshot(snapshot);
 	else
 		newactive->as_snap = snapshot;
 
@@ -715,9 +712,9 @@ PushActiveSnapshotWithLevel(Snapshot snapshot, int snap_level)
  * The new snapshot will be released when popped from the stack.
  */
 void
-PushCopiedSnapshot(Snapshot snapshot)
+PushCopiedSnapshot(MVCCSnapshot snapshot)
 {
-	PushActiveSnapshot(CopySnapshot(snapshot));
+	PushActiveSnapshot(CopyMVCCSnapshot(snapshot));
 }
 
 /*
@@ -770,7 +767,7 @@ PopActiveSnapshot(void)
 
 	if (ActiveSnapshot->as_snap->active_count == 0 &&
 		ActiveSnapshot->as_snap->regd_count == 0)
-		FreeSnapshot(ActiveSnapshot->as_snap);
+		FreeMVCCSnapshot(ActiveSnapshot->as_snap);
 
 	pfree(ActiveSnapshot);
 	ActiveSnapshot = newstack;
@@ -782,7 +779,7 @@ PopActiveSnapshot(void)
  * GetActiveSnapshot
  *		Return the topmost snapshot in the Active stack.
  */
-Snapshot
+MVCCSnapshot
 GetActiveSnapshot(void)
 {
 	Assert(ActiveSnapshot != NULL);
@@ -806,11 +803,11 @@ ActiveSnapshotSet(void)
  *
  * If InvalidSnapshot is passed, it is not registered.
  */
-Snapshot
-RegisterSnapshot(Snapshot snapshot)
+MVCCSnapshot
+RegisterSnapshot(MVCCSnapshot snapshot)
 {
-	if (snapshot == InvalidSnapshot)
-		return InvalidSnapshot;
+	if (snapshot == InvalidMVCCSnapshot)
+		return InvalidMVCCSnapshot;
 
 	return RegisterSnapshotOnOwner(snapshot, CurrentResourceOwner);
 }
@@ -819,28 +816,47 @@ RegisterSnapshot(Snapshot snapshot)
  * RegisterSnapshotOnOwner
  *		As above, but use the specified resource owner
  */
-Snapshot
-RegisterSnapshotOnOwner(Snapshot snapshot, ResourceOwner owner)
+MVCCSnapshot
+RegisterSnapshotOnOwner(MVCCSnapshot snapshot, ResourceOwner owner)
 {
-	Snapshot	snap;
-
-	if (snapshot == InvalidSnapshot)
-		return InvalidSnapshot;
-
-	Assert(snapshot->snapshot_type == SNAPSHOT_MVCC || snapshot->snapshot_type == SNAPSHOT_HISTORIC_MVCC);
+	if (snapshot == InvalidMVCCSnapshot)
+		return InvalidMVCCSnapshot;
 
 	/* Static snapshot?  Create a persistent copy */
-	snap = snapshot->copied ? snapshot : CopySnapshot(snapshot);
+	snapshot = snapshot->copied ? snapshot : CopyMVCCSnapshot(snapshot);
 
 	/* and tell resowner.c about it */
 	ResourceOwnerEnlarge(owner);
-	snap->regd_count++;
-	ResourceOwnerRememberSnapshot(owner, snap);
+	snapshot->regd_count++;
+	ResourceOwnerRememberSnapshot(owner, (Snapshot) snapshot);
+
+	if (snapshot->regd_count == 1)
+		pairingheap_add(&RegisteredSnapshots, &snapshot->ph_node);
+
+	return snapshot;
+}
+
+/*
+ * RegisterCatalogSnapshot
+ *		Like RegisterSnapshot(), but also works for historic snapshots
+ */
+Snapshot
+RegisterCatalogSnapshot(Snapshot snapshot)
+{
+	if (snapshot->snapshot_type == SNAPSHOT_MVCC)
+		return (Snapshot) RegisterSnapshot(&snapshot->mvcc);
+	else if (snapshot->snapshot_type == SNAPSHOT_HISTORIC_MVCC)
+	{
+		HistoricMVCCSnapshot historicsnap = &snapshot->historic_mvcc;
 
-	if (snap->regd_count == 1)
-		pairingheap_add(&RegisteredSnapshots, &snap->ph_node);
+		ResourceOwnerEnlarge(CurrentResourceOwner);
+		historicsnap->regd_count++;
+		ResourceOwnerRememberSnapshot(CurrentResourceOwner, (Snapshot) historicsnap);
 
-	return snap;
+		return (Snapshot) historicsnap;
+	}
+	else
+		elog(ERROR, "cannot register non-MVCC snapshot");
 }
 
 /*
@@ -876,18 +892,32 @@ UnregisterSnapshotFromOwner(Snapshot snapshot, ResourceOwner owner)
 static void
 UnregisterSnapshotNoOwner(Snapshot snapshot)
 {
-	Assert(snapshot->regd_count > 0);
-	Assert(!pairingheap_is_empty(&RegisteredSnapshots));
+	if (snapshot->snapshot_type == SNAPSHOT_MVCC)
+	{
+		MVCCSnapshot mvccsnap = &snapshot->mvcc;
+
+		Assert(mvccsnap->regd_count > 0);
+		Assert(!pairingheap_is_empty(&RegisteredSnapshots));
 
-	snapshot->regd_count--;
-	if (snapshot->regd_count == 0)
-		pairingheap_remove(&RegisteredSnapshots, &snapshot->ph_node);
+		mvccsnap->regd_count--;
+		if (mvccsnap->regd_count == 0)
+			pairingheap_remove(&RegisteredSnapshots, &mvccsnap->ph_node);
 
-	if (snapshot->regd_count == 0 && snapshot->active_count == 0)
+		if (mvccsnap->regd_count == 0 && mvccsnap->active_count == 0)
+		{
+			FreeMVCCSnapshot(mvccsnap);
+			SnapshotResetXmin();
+		}
+	}
+	else if (snapshot->snapshot_type == SNAPSHOT_HISTORIC_MVCC)
 	{
-		FreeSnapshot(snapshot);
-		SnapshotResetXmin();
+		HistoricMVCCSnapshot historicsnap = &snapshot->historic_mvcc;
+
+		Assert(historicsnap->regd_count > 0);
+		historicsnap->regd_count--;
 	}
+	else
+		elog(ERROR, "registered snapshot has unexpected type");
 }
 
 /*
@@ -897,8 +927,8 @@ UnregisterSnapshotNoOwner(Snapshot snapshot)
 static int
 xmin_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg)
 {
-	const SnapshotData *asnap = pairingheap_const_container(SnapshotData, ph_node, a);
-	const SnapshotData *bsnap = pairingheap_const_container(SnapshotData, ph_node, b);
+	const MVCCSnapshotData *asnap = pairingheap_const_container(MVCCSnapshotData, ph_node, a);
+	const MVCCSnapshotData *bsnap = pairingheap_const_container(MVCCSnapshotData, ph_node, b);
 
 	if (TransactionIdPrecedes(asnap->xmin, bsnap->xmin))
 		return 1;
@@ -924,7 +954,7 @@ xmin_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg)
 static void
 SnapshotResetXmin(void)
 {
-	Snapshot	minSnapshot;
+	MVCCSnapshot minSnapshot;
 
 	if (ActiveSnapshot != NULL)
 		return;
@@ -935,7 +965,7 @@ SnapshotResetXmin(void)
 		return;
 	}
 
-	minSnapshot = pairingheap_container(SnapshotData, ph_node,
+	minSnapshot = pairingheap_container(MVCCSnapshotData, ph_node,
 										pairingheap_first(&RegisteredSnapshots));
 
 	if (TransactionIdPrecedes(MyProc->xmin, minSnapshot->xmin))
@@ -985,7 +1015,7 @@ AtSubAbort_Snapshot(int level)
 
 		if (ActiveSnapshot->as_snap->active_count == 0 &&
 			ActiveSnapshot->as_snap->regd_count == 0)
-			FreeSnapshot(ActiveSnapshot->as_snap);
+			FreeMVCCSnapshot(ActiveSnapshot->as_snap);
 
 		/* and free the stack element */
 		pfree(ActiveSnapshot);
@@ -1007,7 +1037,7 @@ AtEOXact_Snapshot(bool isCommit, bool resetXmin)
 	 * In transaction-snapshot mode we must release our privately-managed
 	 * reference to the transaction snapshot.  We must remove it from
 	 * RegisteredSnapshots to keep the check below happy.  But we don't bother
-	 * to do FreeSnapshot, for two reasons: the memory will go away with
+	 * to do FreeMVCCSnapshot, for two reasons: the memory will go away with
 	 * TopTransactionContext anyway, and if someone has left the snapshot
 	 * stacked as active, we don't want the code below to be chasing through a
 	 * dangling pointer.
@@ -1100,7 +1130,7 @@ AtEOXact_Snapshot(bool isCommit, bool resetXmin)
  *		snapshot.
  */
 char *
-ExportSnapshot(Snapshot snapshot)
+ExportSnapshot(MVCCSnapshot snapshot)
 {
 	TransactionId topXid;
 	TransactionId *children;
@@ -1164,7 +1194,7 @@ ExportSnapshot(Snapshot snapshot)
 	 * ensure that the snapshot's xmin is honored for the rest of the
 	 * transaction.
 	 */
-	snapshot = CopySnapshot(snapshot);
+	snapshot = CopyMVCCSnapshot(snapshot);
 
 	oldcxt = MemoryContextSwitchTo(TopTransactionContext);
 	esnap = (ExportedSnapshot *) palloc(sizeof(ExportedSnapshot));
@@ -1385,7 +1415,7 @@ ImportSnapshot(const char *idstr)
 	Oid			src_dbid;
 	int			src_isolevel;
 	bool		src_readonly;
-	SnapshotData snapshot;
+	MVCCSnapshotData snapshot;
 
 	/*
 	 * Must be at top level of a fresh transaction.  Note in particular that
@@ -1654,7 +1684,7 @@ HaveRegisteredOrActiveSnapshot(void)
  * Needed for logical decoding.
  */
 void
-SetupHistoricSnapshot(Snapshot historic_snapshot, HTAB *tuplecids)
+SetupHistoricSnapshot(HistoricMVCCSnapshot historic_snapshot, HTAB *tuplecids)
 {
 	Assert(historic_snapshot != NULL);
 
@@ -1697,11 +1727,10 @@ HistoricSnapshotGetTupleCids(void)
  * SerializedSnapshotData.
  */
 Size
-EstimateSnapshotSpace(Snapshot snapshot)
+EstimateSnapshotSpace(MVCCSnapshot snapshot)
 {
 	Size		size;
 
-	Assert(snapshot != InvalidSnapshot);
 	Assert(snapshot->snapshot_type == SNAPSHOT_MVCC);
 
 	/* We allocate any XID arrays needed in the same palloc block. */
@@ -1721,7 +1750,7 @@ EstimateSnapshotSpace(Snapshot snapshot)
  *		memory location at start_address.
  */
 void
-SerializeSnapshot(Snapshot snapshot, char *start_address)
+SerializeSnapshot(MVCCSnapshot snapshot, char *start_address)
 {
 	SerializedSnapshotData serialized_snapshot;
 
@@ -1777,12 +1806,12 @@ SerializeSnapshot(Snapshot snapshot, char *start_address)
  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
  * to 0.  The returned snapshot has the copied flag set.
  */
-Snapshot
+MVCCSnapshot
 RestoreSnapshot(char *start_address)
 {
 	SerializedSnapshotData serialized_snapshot;
 	Size		size;
-	Snapshot	snapshot;
+	MVCCSnapshot snapshot;
 	TransactionId *serialized_xids;
 
 	memcpy(&serialized_snapshot, start_address,
@@ -1791,12 +1820,12 @@ RestoreSnapshot(char *start_address)
 		(start_address + sizeof(SerializedSnapshotData));
 
 	/* We allocate any XID arrays needed in the same palloc block. */
-	size = sizeof(SnapshotData)
+	size = sizeof(MVCCSnapshotData)
 		+ serialized_snapshot.xcnt * sizeof(TransactionId)
 		+ serialized_snapshot.subxcnt * sizeof(TransactionId);
 
 	/* Copy all required fields */
-	snapshot = (Snapshot) MemoryContextAlloc(TopTransactionContext, size);
+	snapshot = (MVCCSnapshot) MemoryContextAlloc(TopTransactionContext, size);
 	snapshot->snapshot_type = SNAPSHOT_MVCC;
 	snapshot->xmin = serialized_snapshot.xmin;
 	snapshot->xmax = serialized_snapshot.xmax;
@@ -1841,7 +1870,7 @@ RestoreSnapshot(char *start_address)
  * the declaration for PGPROC.
  */
 void
-RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc)
+RestoreTransactionSnapshot(MVCCSnapshot snapshot, void *source_pgproc)
 {
 	SetTransactionSnapshot(snapshot, NULL, InvalidPid, source_pgproc);
 }
@@ -1857,7 +1886,7 @@ RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc)
  * XID could not be ours anyway.
  */
 bool
-XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
+XidInMVCCSnapshot(TransactionId xid, MVCCSnapshot snapshot)
 {
 	/*
 	 * Make a quick range check to eliminate most XIDs without looking at the
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1be8739573f..a39f5c90538 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -168,9 +168,9 @@ extern void index_endscan(IndexScanDesc scan);
 extern void index_markpos(IndexScanDesc scan);
 extern void index_restrpos(IndexScanDesc scan);
 extern Size index_parallelscan_estimate(Relation indexRelation,
-										int nkeys, int norderbys, Snapshot snapshot);
+										int nkeys, int norderbys, MVCCSnapshot snapshot);
 extern void index_parallelscan_initialize(Relation heapRelation,
-										  Relation indexRelation, Snapshot snapshot,
+										  Relation indexRelation, MVCCSnapshot snapshot,
 										  ParallelIndexScanDesc target);
 extern void index_parallelrescan(IndexScanDesc scan);
 extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 7d06dad83fc..650c77eaea2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -423,7 +423,7 @@ extern bool HeapTupleIsSurelyDead(HeapTuple htup,
  */
 struct HTAB;
 extern bool ResolveCminCmaxDuringDecoding(struct HTAB *tuplecid_data,
-										  Snapshot snapshot,
+										  HistoricMVCCSnapshot snapshot,
 										  HeapTuple htup,
 										  Buffer buffer,
 										  CommandId *cmin, CommandId *cmax);
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index dc6e0184284..1c098f75e5b 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -34,7 +34,7 @@ typedef struct TableScanDescData
 {
 	/* scan parameters */
 	Relation	rs_rd;			/* heap relation descriptor */
-	struct SnapshotData *rs_snapshot;	/* snapshot to see */
+	union SnapshotData *rs_snapshot;	/* snapshot to see */
 	int			rs_nkeys;		/* number of scan keys */
 	struct ScanKeyData *rs_key; /* array of scan key descriptors */
 
@@ -133,7 +133,7 @@ typedef struct IndexScanDescData
 	/* scan parameters */
 	Relation	heapRelation;	/* heap relation descriptor, or NULL */
 	Relation	indexRelation;	/* index relation descriptor */
-	struct SnapshotData *xs_snapshot;	/* snapshot to see */
+	union SnapshotData *xs_snapshot;	/* snapshot to see */
 	int			numberOfKeys;	/* number of index qualifier conditions */
 	int			numberOfOrderBys;	/* number of ordering operators */
 	struct ScanKeyData *keyData;	/* array of index qualifier descriptors */
@@ -201,7 +201,7 @@ typedef struct SysScanDescData
 	Relation	irel;			/* NULL if doing heap scan */
 	struct TableScanDescData *scan; /* only valid in storage-scan case */
 	struct IndexScanDescData *iscan;	/* only valid in index-scan case */
-	struct SnapshotData *snapshot;	/* snapshot to unregister at end of scan */
+	union SnapshotData *snapshot;	/* snapshot to unregister at end of scan */
 	struct TupleTableSlot *slot;
 }			SysScanDescData;
 
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 86db3dc8d0d..2ce0b23ba49 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -36,8 +36,8 @@ typedef struct QueryDesc
 	CmdType		operation;		/* CMD_SELECT, CMD_UPDATE, etc. */
 	PlannedStmt *plannedstmt;	/* planner's output (could be utility, too) */
 	const char *sourceText;		/* source text of the query */
-	Snapshot	snapshot;		/* snapshot to use for query */
-	Snapshot	crosscheck_snapshot;	/* crosscheck for RI update/delete */
+	MVCCSnapshot snapshot;		/* snapshot to use for query */
+	MVCCSnapshot crosscheck_snapshot;	/* crosscheck for RI update/delete */
 	DestReceiver *dest;			/* the destination for tuple output */
 	ParamListInfo params;		/* param values being passed in */
 	QueryEnvironment *queryEnv; /* query environment passed in */
@@ -58,8 +58,8 @@ typedef struct QueryDesc
 /* in pquery.c */
 extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
 								  const char *sourceText,
-								  Snapshot snapshot,
-								  Snapshot crosscheck_snapshot,
+								  MVCCSnapshot snapshot,
+								  MVCCSnapshot crosscheck_snapshot,
 								  DestReceiver *dest,
 								  ParamListInfo params,
 								  QueryEnvironment *queryEnv,
diff --git a/src/include/executor/spi.h b/src/include/executor/spi.h
index d064d1a9b76..eb804f7c3e4 100644
--- a/src/include/executor/spi.h
+++ b/src/include/executor/spi.h
@@ -123,8 +123,8 @@ extern int	SPI_execp(SPIPlanPtr plan, Datum *Values, const char *Nulls,
 					  long tcount);
 extern int	SPI_execute_snapshot(SPIPlanPtr plan,
 								 Datum *Values, const char *Nulls,
-								 Snapshot snapshot,
-								 Snapshot crosscheck_snapshot,
+								 MVCCSnapshot snapshot,
+								 MVCCSnapshot crosscheck_snapshot,
 								 bool read_only, bool fire_triggers, long tcount);
 extern int	SPI_execute_with_args(const char *src,
 								  int nargs, Oid *argtypes,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 29127416076..5ff7f394973 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -629,8 +629,8 @@ typedef struct EState
 
 	/* Basic state for all query types: */
 	ScanDirection es_direction; /* current scan direction */
-	Snapshot	es_snapshot;	/* time qual to use */
-	Snapshot	es_crosscheck_snapshot; /* crosscheck time qual for RI */
+	MVCCSnapshot es_snapshot;	/* time qual to use */
+	MVCCSnapshot es_crosscheck_snapshot;	/* crosscheck time qual for RI */
 	List	   *es_range_table; /* List of RangeTblEntry */
 	Index		es_range_table_size;	/* size of the range table arrays */
 	Relation   *es_relations;	/* Array of per-range-table-entry Relation
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index a669658b3f1..1dd4e2f7527 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -127,7 +127,7 @@ typedef struct ReorderBufferChange
 		}			msg;
 
 		/* New snapshot, set when action == *_INTERNAL_SNAPSHOT */
-		Snapshot	snapshot;
+		HistoricMVCCSnapshot snapshot;
 
 		/*
 		 * New command id for existing snapshot in a catalog changing tx. Set
@@ -330,7 +330,7 @@ typedef struct ReorderBufferTXN
 	 * transaction modifies the catalog, or another catalog-modifying
 	 * transaction commits.
 	 */
-	Snapshot	base_snapshot;
+	HistoricMVCCSnapshot base_snapshot;
 	XLogRecPtr	base_snapshot_lsn;
 	dlist_node	base_snapshot_node; /* link in txns_by_base_snapshot_lsn */
 
@@ -338,7 +338,7 @@ typedef struct ReorderBufferTXN
 	 * Snapshot/CID from the previous streaming run. Only valid for already
 	 * streamed transactions (NULL/InvalidCommandId otherwise).
 	 */
-	Snapshot	snapshot_now;
+	HistoricMVCCSnapshot snapshot_now;
 	CommandId	command_id;
 
 	/*
@@ -678,7 +678,7 @@ extern void ReorderBufferQueueChange(ReorderBuffer *rb, TransactionId xid,
 									 XLogRecPtr lsn, ReorderBufferChange *change,
 									 bool toast_insert);
 extern void ReorderBufferQueueMessage(ReorderBuffer *rb, TransactionId xid,
-									  Snapshot snap, XLogRecPtr lsn,
+									  HistoricMVCCSnapshot snap, XLogRecPtr lsn,
 									  bool transactional, const char *prefix,
 									  Size message_size, const char *message);
 extern void ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
@@ -702,9 +702,9 @@ extern void ReorderBufferForget(ReorderBuffer *rb, TransactionId xid, XLogRecPtr
 extern void ReorderBufferInvalidate(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn);
 
 extern void ReorderBufferSetBaseSnapshot(ReorderBuffer *rb, TransactionId xid,
-										 XLogRecPtr lsn, Snapshot snap);
+										 XLogRecPtr lsn, HistoricMVCCSnapshot snap);
 extern void ReorderBufferAddSnapshot(ReorderBuffer *rb, TransactionId xid,
-									 XLogRecPtr lsn, Snapshot snap);
+									 XLogRecPtr lsn, HistoricMVCCSnapshot snap);
 extern void ReorderBufferAddNewCommandId(ReorderBuffer *rb, TransactionId xid,
 										 XLogRecPtr lsn, CommandId cid);
 extern void ReorderBufferAddNewTupleCids(ReorderBuffer *rb, TransactionId xid,
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 44031dcf6e3..5930ffb55a8 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -70,15 +70,15 @@ extern SnapBuild *AllocateSnapshotBuilder(struct ReorderBuffer *reorder,
 										  XLogRecPtr two_phase_at);
 extern void FreeSnapshotBuilder(SnapBuild *builder);
 
-extern void SnapBuildSnapDecRefcount(Snapshot snap);
+extern void SnapBuildSnapDecRefcount(HistoricMVCCSnapshot snap);
 
-extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern MVCCSnapshot SnapBuildInitialSnapshot(SnapBuild *builder);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
 extern void SnapBuildResetExportedSnapshotState(void);
 
 extern SnapBuildState SnapBuildCurrentState(SnapBuild *builder);
-extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder);
+extern HistoricMVCCSnapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder);
 
 extern bool SnapBuildXactNeedsSkip(SnapBuild *builder, XLogRecPtr ptr);
 extern XLogRecPtr SnapBuildGetTwoPhaseAt(SnapBuild *builder);
diff --git a/src/include/replication/snapbuild_internal.h b/src/include/replication/snapbuild_internal.h
index 081b01b890a..ef00d125155 100644
--- a/src/include/replication/snapbuild_internal.h
+++ b/src/include/replication/snapbuild_internal.h
@@ -74,7 +74,7 @@ struct SnapBuild
 	/*
 	 * Snapshot that's valid to see the catalog state seen at this moment.
 	 */
-	Snapshot	snapshot;
+	HistoricMVCCSnapshot snapshot;
 
 	/*
 	 * LSN of the last location we are sure a snapshot has been serialized to.
diff --git a/src/include/storage/large_object.h b/src/include/storage/large_object.h
index 6fecf442446..5cb3ebb6a9f 100644
--- a/src/include/storage/large_object.h
+++ b/src/include/storage/large_object.h
@@ -39,7 +39,7 @@
 typedef struct LargeObjectDesc
 {
 	Oid			id;				/* LO's identifier */
-	Snapshot	snapshot;		/* snapshot to use */
+	MVCCSnapshot snapshot;		/* snapshot to use */
 	SubTransactionId subid;		/* owning subtransaction ID */
 	uint64		offset;			/* current seek pointer */
 	int			flags;			/* see flag bits below */
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 267d5d90e94..6a78dfeac96 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -47,8 +47,8 @@ extern void CheckPointPredicate(void);
 extern bool PageIsPredicateLocked(Relation relation, BlockNumber blkno);
 
 /* predicate lock maintenance */
-extern Snapshot GetSerializableTransactionSnapshot(Snapshot snapshot);
-extern void SetSerializableTransactionSnapshot(Snapshot snapshot,
+extern MVCCSnapshot GetSerializableTransactionSnapshot(MVCCSnapshot snapshot);
+extern void SetSerializableTransactionSnapshot(MVCCSnapshot snapshot,
 											   VirtualTransactionId *sourcevxid,
 											   int sourcepid);
 extern void RegisterPredicateLockingXid(TransactionId xid);
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index ef0b733ebe8..7f5727c2586 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -44,7 +44,7 @@ extern void KnownAssignedTransactionIdsIdleMaintenance(void);
 extern int	GetMaxSnapshotXidCount(void);
 extern int	GetMaxSnapshotSubxidCount(void);
 
-extern Snapshot GetSnapshotData(Snapshot snapshot);
+extern MVCCSnapshot GetSnapshotData(MVCCSnapshot snapshot);
 
 extern bool ProcArrayInstallImportedXmin(TransactionId xmin,
 										 VirtualTransactionId *sourcevxid);
diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h
index fa3cc5f2dfc..1e85e09cec9 100644
--- a/src/include/tcop/pquery.h
+++ b/src/include/tcop/pquery.h
@@ -30,7 +30,7 @@ extern List *FetchPortalTargetList(Portal portal);
 extern List *FetchStatementTargetList(Node *stmt);
 
 extern void PortalStart(Portal portal, ParamListInfo params,
-						int eflags, Snapshot snapshot);
+						int eflags, MVCCSnapshot snapshot);
 
 extern void PortalSetResultFormat(Portal portal, int nFormats,
 								  int16 *formats);
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 0b62143af8b..dd27de9139d 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -166,7 +166,7 @@ typedef struct PortalData
 	 * This ensures that TOAST references in query results can be detoasted,
 	 * and helps to reduce thrashing of the process's exposed xmin.
 	 */
-	Snapshot	portalSnapshot; /* active snapshot, or NULL if none */
+	MVCCSnapshot portalSnapshot;	/* active snapshot, or NULL if none */
 
 	/*
 	 * Where we store tuples for a held cursor or a PORTAL_ONE_RETURNING,
@@ -184,7 +184,7 @@ typedef struct PortalData
 	 * belonging to them.  In the case of a held cursor, we avoid needing to
 	 * keep such a snapshot by forcibly detoasting the data.
 	 */
-	Snapshot	holdSnapshot;	/* registered snapshot, or NULL if none */
+	MVCCSnapshot holdSnapshot;	/* registered snapshot, or NULL if none */
 
 	/*
 	 * atStart, atEnd and portalPos indicate the current cursor position.
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index d346be71642..9ee9acf3d50 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -49,15 +49,15 @@ extern PGDLLIMPORT SnapshotData SnapshotToastData;
  */
 #define InitNonVacuumableSnapshot(snapshotdata, vistestp)  \
 	((snapshotdata).snapshot_type = SNAPSHOT_NON_VACUUMABLE, \
-	 (snapshotdata).vistest = (vistestp))
+	 (snapshotdata).nonvacuumable.vistest = (vistestp))
 
 /* This macro encodes the knowledge of which snapshots are MVCC-safe */
 #define IsMVCCSnapshot(snapshot)  \
 	((snapshot)->snapshot_type == SNAPSHOT_MVCC || \
 	 (snapshot)->snapshot_type == SNAPSHOT_HISTORIC_MVCC)
 
-extern Snapshot GetTransactionSnapshot(void);
-extern Snapshot GetLatestSnapshot(void);
+extern MVCCSnapshot GetTransactionSnapshot(void);
+extern MVCCSnapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
 
 extern Snapshot GetCatalogSnapshot(Oid relid);
@@ -65,17 +65,18 @@ extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
 extern void InvalidateCatalogSnapshot(void);
 extern void InvalidateCatalogSnapshotConditionally(void);
 
-extern void PushActiveSnapshot(Snapshot snapshot);
-extern void PushActiveSnapshotWithLevel(Snapshot snapshot, int snap_level);
-extern void PushCopiedSnapshot(Snapshot snapshot);
+extern void PushActiveSnapshot(MVCCSnapshot snapshot);
+extern void PushActiveSnapshotWithLevel(MVCCSnapshot snapshot, int snap_level);
+extern void PushCopiedSnapshot(MVCCSnapshot snapshot);
 extern void UpdateActiveSnapshotCommandId(void);
 extern void PopActiveSnapshot(void);
-extern Snapshot GetActiveSnapshot(void);
+extern MVCCSnapshot GetActiveSnapshot(void);
 extern bool ActiveSnapshotSet(void);
 
-extern Snapshot RegisterSnapshot(Snapshot snapshot);
+extern MVCCSnapshot RegisterSnapshot(MVCCSnapshot snapshot);
+extern Snapshot RegisterCatalogSnapshot(Snapshot snapshot);
 extern void UnregisterSnapshot(Snapshot snapshot);
-extern Snapshot RegisterSnapshotOnOwner(Snapshot snapshot, ResourceOwner owner);
+extern MVCCSnapshot RegisterSnapshotOnOwner(MVCCSnapshot snapshot, ResourceOwner owner);
 extern void UnregisterSnapshotFromOwner(Snapshot snapshot, ResourceOwner owner);
 
 extern void AtSubCommit_Snapshot(int level);
@@ -89,7 +90,7 @@ extern void WaitForOlderSnapshots(TransactionId limitXmin, bool progress);
 extern bool ThereAreNoPriorRegisteredSnapshots(void);
 extern bool HaveRegisteredOrActiveSnapshot(void);
 
-extern char *ExportSnapshot(Snapshot snapshot);
+extern char *ExportSnapshot(MVCCSnapshot snapshot);
 
 /*
  * These live in procarray.c because they're intimately linked to the
@@ -105,18 +106,18 @@ extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 /*
  * Utility functions for implementing visibility routines in table AMs.
  */
-extern bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
+extern bool XidInMVCCSnapshot(TransactionId xid, MVCCSnapshot snapshot);
 
 /* Support for catalog timetravel for logical decoding */
 struct HTAB;
 extern struct HTAB *HistoricSnapshotGetTupleCids(void);
-extern void SetupHistoricSnapshot(Snapshot historic_snapshot, struct HTAB *tuplecids);
+extern void SetupHistoricSnapshot(HistoricMVCCSnapshot historic_snapshot, struct HTAB *tuplecids);
 extern void TeardownHistoricSnapshot(bool is_error);
 extern bool HistoricSnapshotActive(void);
 
-extern Size EstimateSnapshotSpace(Snapshot snapshot);
-extern void SerializeSnapshot(Snapshot snapshot, char *start_address);
-extern Snapshot RestoreSnapshot(char *start_address);
-extern void RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc);
+extern Size EstimateSnapshotSpace(MVCCSnapshot snapshot);
+extern void SerializeSnapshot(MVCCSnapshot snapshot, char *start_address);
+extern MVCCSnapshot RestoreSnapshot(char *start_address);
+extern void RestoreTransactionSnapshot(MVCCSnapshot snapshot, void *source_pgproc);
 
 #endif							/* SNAPMGR_H */
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index 0e546ec1497..19af828ea45 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -17,8 +17,8 @@
 
 
 /*
- * The different snapshot types.  We use SnapshotData structures to represent
- * both "regular" (MVCC) snapshots and "special" snapshots that have non-MVCC
+ * The different snapshot types.  We use SnapshotData union to represent both
+ * "regular" (MVCC) snapshots and "special" snapshots that have non-MVCC
  * semantics.  The specific semantics of a snapshot are encoded by its type.
  *
  * The behaviour of each type of snapshot should be documented alongside its
@@ -27,6 +27,9 @@
  * The reason the snapshot type rather than a callback as it used to be is
  * that that allows to use the same snapshot for different table AMs without
  * having one callback per AM.
+ *
+ * The executor uses MVCC snapshots, and hence uses MVCCSnapshot directly.  The
+ * table AM APIs also support the special snapshots.
  */
 typedef enum SnapshotType
 {
@@ -100,7 +103,9 @@ typedef enum SnapshotType
 	/*
 	 * A tuple is visible iff it follows the rules of SNAPSHOT_MVCC, but
 	 * supports being called in timetravel context (for decoding catalog
-	 * contents in the context of logical decoding).
+	 * contents in the context of logical decoding).  A historic MVCC snapshot
+	 * should only be used on catalog tables, as we only track XIDs that
+	 * modify catalogs during logical decoding.
 	 */
 	SNAPSHOT_HISTORIC_MVCC,
 
@@ -114,37 +119,18 @@ typedef enum SnapshotType
 	SNAPSHOT_NON_VACUUMABLE,
 } SnapshotType;
 
-typedef struct SnapshotData *Snapshot;
-
-#define InvalidSnapshot		((Snapshot) NULL)
-
 /*
- * Struct representing all kind of possible snapshots.
+ * Struct representing a normal MVCC snapshot.
  *
- * There are several different kinds of snapshots:
- * * Normal MVCC snapshots
- * * MVCC snapshots taken during recovery (in Hot-Standby mode)
- * * Historic MVCC snapshots used during logical decoding
- * * snapshots passed to HeapTupleSatisfiesDirty()
- * * snapshots passed to HeapTupleSatisfiesNonVacuumable()
- * * snapshots used for SatisfiesAny, Toast, Self where no members are
- *	 accessed.
- *
- * TODO: It's probably a good idea to split this struct using a NodeTag
- * similar to how parser and executor nodes are handled, with one type for
- * each different kind of snapshot to avoid overloading the meaning of
- * individual fields.
+ * MVCC snapshots come in two variants: those taken during recovery in hot
+ * standby mode, and "normal" MVCC snapshots.  They are distinguished by
+ * takenDuringRecovery.
  */
-typedef struct SnapshotData
+typedef struct MVCCSnapshotData
 {
-	SnapshotType snapshot_type; /* type of snapshot */
+	SnapshotType snapshot_type; /* type of snapshot, must be first */
 
 	/*
-	 * The remaining fields are used only for MVCC snapshots, and are normally
-	 * just zeroes in special snapshots.  (But xmin and xmax are used
-	 * specially by HeapTupleSatisfiesDirty, and xmin is used specially by
-	 * HeapTupleSatisfiesNonVacuumable.)
-	 *
 	 * An MVCC snapshot can never see the effects of XIDs >= xmax. It can see
 	 * the effects of all older XIDs except those listed in the snapshot. xmin
 	 * is stored as an optimization to avoid needing to search the XID arrays
@@ -154,10 +140,8 @@ typedef struct SnapshotData
 	TransactionId xmax;			/* all XID >= xmax are invisible to me */
 
 	/*
-	 * For normal MVCC snapshot this contains the all xact IDs that are in
-	 * progress, unless the snapshot was taken during recovery in which case
-	 * it's empty. For historic MVCC snapshots, the meaning is inverted, i.e.
-	 * it contains *committed* transactions between xmin and xmax.
+	 * xip contains the all xact IDs that are in progress, unless the snapshot
+	 * was taken during recovery in which case it's empty.
 	 *
 	 * note: all ids in xip[] satisfy xmin <= xip[i] < xmax
 	 */
@@ -165,10 +149,8 @@ typedef struct SnapshotData
 	uint32		xcnt;			/* # of xact ids in xip[] */
 
 	/*
-	 * For non-historic MVCC snapshots, this contains subxact IDs that are in
-	 * progress (and other transactions that are in progress if taken during
-	 * recovery). For historic snapshot it contains *all* xids assigned to the
-	 * replayed transaction, including the toplevel xid.
+	 * This contains subxact IDs that are in progress (and other transactions
+	 * that are in progress if taken during recovery).
 	 *
 	 * note: all ids in subxip[] are >= xmin, but we don't bother filtering
 	 * out any that are >= xmax
@@ -182,18 +164,6 @@ typedef struct SnapshotData
 
 	CommandId	curcid;			/* in my xact, CID < curcid are visible */
 
-	/*
-	 * An extra return value for HeapTupleSatisfiesDirty, not used in MVCC
-	 * snapshots.
-	 */
-	uint32		speculativeToken;
-
-	/*
-	 * For SNAPSHOT_NON_VACUUMABLE (and hopefully more in the future) this is
-	 * used to determine whether row could be vacuumed.
-	 */
-	struct GlobalVisState *vistest;
-
 	/*
 	 * Book-keeping information, used by the snapshot manager
 	 */
@@ -207,6 +177,101 @@ typedef struct SnapshotData
 	 * transactions completed since the last GetSnapshotData().
 	 */
 	uint64		snapXactCompletionCount;
+} MVCCSnapshotData;
+
+typedef struct MVCCSnapshotData *MVCCSnapshot;
+
+#define InvalidMVCCSnapshot ((MVCCSnapshot) NULL)
+
+/*
+ * Struct representing a "historic" MVCC snapshot during logical decoding.
+ * These are constructed by src/backend/replication/logical/snapbuild.c.
+ */
+typedef struct HistoricMVCCSnapshotData
+{
+	SnapshotType snapshot_type; /* type of snapshot, must be first */
+
+	/*
+	 * xmin and xmax like in a normal MVCC snapshot.
+	 */
+	TransactionId xmin;			/* all XID < xmin are visible to me */
+	TransactionId xmax;			/* all XID >= xmax are invisible to me */
+
+	/*
+	 * committed_xids contains *committed* transactions between xmin and xmax.
+	 * (This is the inverse of 'xip' in normal MVCC snapshots, which contains
+	 * all non-committed transactions.)  The array is sorted by XID to allow
+	 * binary search.
+	 *
+	 * note: all ids in committed_xids[] satisfy xmin <= committed_xids[i] <
+	 * xmax
+	 */
+	TransactionId *committed_xids;
+	uint32		xcnt;			/* # of xact ids in committed_xids[] */
+
+	/*
+	 * subxip contains *all* xids assigned to the replayed transaction,
+	 * including the toplevel xid. (This is different from the subxip in a
+	 * normal MVCC snapshot, where it doesn't include the top-level xid. Also,
+	 * there's no 'suboverflowed')
+	 *
+	 * note: all ids in subxip[] are >= xmin, but we don't bother filtering
+	 * out any that are >= xmax. FIXME: is that true for historic snapshots?
+	 */
+	TransactionId *subxip;
+	int32		subxcnt;		/* # of xact ids in subxip[] */
+
+	CommandId	curcid;			/* in my xact, CID < curcid are visible */
+
+	bool		copied;			/* false if it's a static snapshot */
+
+	uint32		refcount;		/* refcount managed by snapbuild.c  */
+	uint32		regd_count;		/* refcount registered with resource owners */
+
+} HistoricMVCCSnapshotData;
+
+typedef struct HistoricMVCCSnapshotData *HistoricMVCCSnapshot;
+
+/*
+ * Struct representing a special "snapshot" which sees all tuples as visible
+ * if they are visible to anyone, i.e. if they are not vacuumable.  This is
+ * used with SNAPSHOT_NON_VACUUMABLE.
+ */
+typedef struct NonVacuumableSnapshotData
+{
+	SnapshotType snapshot_type; /* type of snapshot, must be first */
+
+	/* This is used to determine whether row could be vacuumed. */
+	struct GlobalVisState *vistest;
+} NonVacuumableSnapshotData;
+
+/*
+ * Return values to the caller of HeapTupleSatisfiesDirty.
+ */
+typedef struct DirtySnapshotData
+{
+	SnapshotType snapshot_type; /* type of snapshot, must be first */
+
+	TransactionId xmin;
+	TransactionId xmax;
+	uint32		speculativeToken;
+} DirtySnapshotData;
+
+/*
+ * Generic union representing all kinds of possible snapshots.  Some have
+ * type-specific structs.
+ */
+typedef union SnapshotData
+{
+	SnapshotType snapshot_type; /* type of snapshot */
+	struct MVCCSnapshotData mvcc;
+	struct DirtySnapshotData dirty;
+	struct HistoricMVCCSnapshotData historic_mvcc;
+	struct NonVacuumableSnapshotData nonvacuumable;
 } SnapshotData;
 
+typedef union SnapshotData *Snapshot;
+
+#define InvalidSnapshot		((Snapshot) NULL)
+
 #endif							/* SNAPSHOT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e1c4f913f84..a28e6817270 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -617,6 +617,7 @@ DictThesaurus
 DimensionInfo
 DirectoryMethodData
 DirectoryMethodFile
+DirtySnapshotData
 DisableTimeoutParams
 DiscardMode
 DiscardStmt
@@ -1159,6 +1160,7 @@ HeapTupleFreeze
 HeapTupleHeader
 HeapTupleHeaderData
 HeapTupleTableSlot
+HistoricMVCCSnapshotData
 HistControl
 HotStandbyState
 I32
@@ -1605,6 +1607,7 @@ MINIDUMPWRITEDUMP
 MINIDUMP_TYPE
 MJEvalResult
 MTTargetRelLookup
+MVCCSnapshotData
 MVDependencies
 MVDependency
 MVNDistinct
@@ -1703,6 +1706,7 @@ NextValueExpr
 Node
 NodeTag
 NonEmptyRange
+NonVacuumableSnapshotData
 Notification
 NotificationList
 NotifyStmt
-- 
2.39.5

Andres Freund

andres@anarazel.de

about 1 year ago

In reply to: Heikki Linnakangas (#3)

Re: A few patches to clarify snapshot management

On 2024-12-20 19:31:01 +0200, Heikki Linnakangas wrote:

On 16/12/2024 23:56, Nathan Bossart wrote:

On Mon, Dec 16, 2024 at 12:06:33PM +0200, Heikki Linnakangas wrote:

While working on the CSN snapshot patch, I got sidetracked looking closer
into the snapshot tracking in snapmgr.c. Attached are a few patches to
clarify some things.

I haven't yet looked closely at what you are proposing, but big +1 from me
for the general idea. I recently found myself wishing for a lot more
commentary about this stuff [0].

[0] /messages/by-id/Z0dB1ld2iPcS6nC9@nathan

While playing around some more with this, I noticed that this code in
GetTransactionSnapshot() is never reached, and AFAICS has always been dead
code:

Snapshot
GetTransactionSnapshot(void)
{
    /*
     * Return historic snapshot if doing logical decoding. We'll never need a
     * non-historic transaction snapshot in this (sub-)transaction, so there's
     * no need to be careful to set one up for later calls to
     * GetTransactionSnapshot().
     */
    if (HistoricSnapshotActive())
    {
        Assert(!FirstSnapshotSet);
        return HistoricSnapshot;
    }

When you think about it, that's good, because it doesn't really make sense
to call GetTransactionSnapshot() during logical decoding. We jump through
hoops to make the historic catalog decoding possible with historic
snapshots, tracking subtransactions that modify catalogs and WAL-logging
command ids, but they're not suitable for general purpose queries. So I
think we should turn that into an error, per attached patch.

Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C
code in an output function calling GetTransactionSnapshot() or such to do
some internal lookups?

Greetings,

Andres Freund

Heikki Linnakangas

hlinnaka@iki.fi

about 1 year ago

In reply to: Andres Freund (#5)

Re: A few patches to clarify snapshot management

On 07/01/2025 00:00, Andres Freund wrote:

On 2024-12-20 19:31:01 +0200, Heikki Linnakangas wrote:

While playing around some more with this, I noticed that this code in
GetTransactionSnapshot() is never reached, and AFAICS has always been dead
code:

Snapshot
GetTransactionSnapshot(void)
{
    /*
     * Return historic snapshot if doing logical decoding. We'll never need a
     * non-historic transaction snapshot in this (sub-)transaction, so there's
     * no need to be careful to set one up for later calls to
     * GetTransactionSnapshot().
     */
    if (HistoricSnapshotActive())
    {
        Assert(!FirstSnapshotSet);
        return HistoricSnapshot;
    }

When you think about it, that's good, because it doesn't really make sense
to call GetTransactionSnapshot() during logical decoding. We jump through
hoops to make the historic catalog decoding possible with historic
snapshots, tracking subtransactions that modify catalogs and WAL-logging
command ids, but they're not suitable for general purpose queries. So I
think we should turn that into an error, per attached patch.

Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C
code in an output function calling GetTransactionSnapshot() or such to do
some internal lookups?

I haven't seen any. And I don't think that would work correctly while
doing logical decoding anyway, because historical snapshots only track
XIDs that modify catalogs. regclassout and enumout do work because they
use the catalog snapshot rather than GetTransactionSnapshot().

(I committed that change in commit 1585ff7387 already, but discussion is
still welcome of course)
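
To illustrate the distinction, here is a toy, standalone model of the behaviour described above. It is only a sketch and not the code from the commit: every `toy_*` name is hypothetical, returning NULL stands in for raising an error, and the assumption that catalog lookups are served from the historic snapshot while decoding reflects my reading of the catalog snapshot code path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a snapshot; the real structs carry XID arrays and more. */
typedef struct ToySnapshot
{
	const char *kind;
} ToySnapshot;

static ToySnapshot toy_xact_snapshot = {"transaction"};
static ToySnapshot toy_catalog_snapshot = {"catalog"};
static ToySnapshot toy_historic_snapshot = {"historic"};

/* Models HistoricSnapshotActive(): true while logical decoding is running. */
static bool toy_decoding_active = false;

/*
 * Models the stricter GetTransactionSnapshot(): refuse to hand out a query
 * snapshot while decoding.  Returning NULL stands in for raising an error.
 */
static ToySnapshot *
toy_get_transaction_snapshot(void)
{
	if (toy_decoding_active)
		return NULL;
	return &toy_xact_snapshot;
}

/*
 * Models the snapshot used for catalog lookups: while decoding, the historic
 * snapshot is used (assumption, see above), otherwise a regular catalog
 * snapshot.  This is why catalog-only output functions keep working.
 */
static ToySnapshot *
toy_get_catalog_snapshot(void)
{
	if (toy_decoding_active)
		return &toy_historic_snapshot;
	return &toy_catalog_snapshot;
}
```

In this model, a caller that only needs catalog lookups is unaffected by the decoding flag, while a caller that asks for a general-purpose query snapshot gets a hard failure instead of a snapshot that cannot answer the question correctly.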

--
Heikki Linnakangas
Neon (https://neon.tech)

Heikki Linnakangas

hlinnaka@iki.fi

10 months ago

In reply to: Heikki Linnakangas (#4)

2 attachment(s)

Re: A few patches to clarify snapshot management

On 06/01/2025 23:30, Heikki Linnakangas wrote:

On 20/12/2024 19:31, Heikki Linnakangas wrote:

/*
 * Struct representing all kind of possible snapshots.
 *
 * There are several different kinds of snapshots:
 * * Normal MVCC snapshots
 * * MVCC snapshots taken during recovery (in Hot-Standby mode)
 * * Historic MVCC snapshots used during logical decoding
 * * snapshots passed to HeapTupleSatisfiesDirty()
 * * snapshots passed to HeapTupleSatisfiesNonVacuumable()
 * * snapshots used for SatisfiesAny, Toast, Self where no members are
 *   accessed.
 *
 * TODO: It's probably a good idea to split this struct using a NodeTag
 * similar to how parser and executor nodes are handled, with one type for
 * each different kind of snapshot to avoid overloading the meaning of
 * individual fields.
 */
typedef struct SnapshotData

I'm thinking of implementing that TODO, splitting SnapshotData into
separate structs like MVCCSnapshotData, SnapshotDirtyData, etc. It
seems to me most places can assume that you're dealing with MVCC
snapshots, and if we had separate types for them, could be using
MVCCSnapshot instead of the generic Snapshot. Only the table and index
AM functions need to deal with non-MVCC snapshots.

Here's a draft of that. Going through this exercise clarified a few
things to me that I didn't realize before:

- The executor only deals with MVCC snapshots. Special snapshots are
only for the lower-level AM interfaces.
- Only MVCC snapshots can be pushed to the active stack
- Only MVCC or historic MVCC snapshots can be registered with a resource
owner
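
For readers who want to see the shape of the split without reading the whole patch, here is a minimal standalone sketch of the tagged-union layout. It is a simplification under stated assumptions, not the patch itself: the struct and union names mirror the patch, but the field sets are trimmed, `TransactionId` is a local stand-in typedef, and `snapshot_as_mvcc()` and `xid_in_range()` are hypothetical helpers invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;	/* local stand-in for the real typedef */

typedef enum SnapshotType
{
	SNAPSHOT_MVCC,
	SNAPSHOT_DIRTY
} SnapshotType;

/* Each variant starts with snapshot_type, so the tag is always at offset 0. */
typedef struct MVCCSnapshotData
{
	SnapshotType snapshot_type; /* must be first */
	TransactionId xmin;
	TransactionId xmax;
} MVCCSnapshotData;

typedef struct DirtySnapshotData
{
	SnapshotType snapshot_type; /* must be first */
	TransactionId xmin;
	TransactionId xmax;
	uint32_t	speculativeToken;
} DirtySnapshotData;

/* Generic union: what the low-level AM interfaces would accept. */
typedef union SnapshotData
{
	SnapshotType snapshot_type;
	MVCCSnapshotData mvcc;
	DirtySnapshotData dirty;
} SnapshotData;

typedef MVCCSnapshotData *MVCCSnapshot;
typedef SnapshotData *Snapshot;

/* Hypothetical helper: checked downcast from the generic type. */
static MVCCSnapshot
snapshot_as_mvcc(Snapshot snapshot)
{
	if (snapshot == NULL || snapshot->snapshot_type != SNAPSHOT_MVCC)
		return NULL;
	return &snapshot->mvcc;
}

/*
 * Hypothetical "executor-level" function: it takes MVCCSnapshot, so passing
 * a dirty snapshot by mistake is flagged at compile time as an incompatible
 * pointer type instead of silently misreading fields.
 */
static bool
xid_in_range(TransactionId xid, MVCCSnapshot snapshot)
{
	assert(snapshot != NULL && snapshot->snapshot_type == SNAPSHOT_MVCC);
	return xid >= snapshot->xmin && xid < snapshot->xmax;
}
```

The point of the layout is that code which only ever sees MVCC snapshots can use the narrow pointer type throughout, while the table and index AM entry points keep the generic union and dispatch on `snapshot_type`.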

I committed the patches adding comments on Tuesday. Here's an updated
version of the patch to split SnapshotData into different structs.

The second, new patch simplifies the historic snapshot reference
counting during logical decoding. It's in principle independent from the
first patch, but it was hard to see the opportunity before splitting
the structs.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v2-0001-Split-SnapshotData-into-separate-structs-for-each.patch (text/x-patch; charset=UTF-8)

From 987036f64f95862312cc141fa797bda79c33106e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 20 Dec 2024 00:36:33 +0200
Subject: [PATCH v2 1/2] Split SnapshotData into separate structs for each kind
 of snapshot

The SnapshotData fields were repurposed for different uses depending on
the kind of snapshot. Split it into separate structs for different
kinds of snapshots, so that it is more clear which fields are used
with which snapshot kind, and the fields can have more descriptive
names.
---
 contrib/amcheck/verify_heapam.c               |   2 +-
 contrib/amcheck/verify_nbtree.c               |   2 +-
 src/backend/access/heap/heapam.c              |   3 +-
 src/backend/access/heap/heapam_handler.c      |   6 +-
 src/backend/access/heap/heapam_visibility.c   |  24 +--
 src/backend/access/index/indexam.c            |  11 +-
 src/backend/access/nbtree/nbtinsert.c         |   4 +-
 src/backend/access/spgist/spgvacuum.c         |   2 +-
 src/backend/access/table/tableam.c            |   8 +-
 src/backend/access/transam/parallel.c         |  14 +-
 src/backend/catalog/pg_inherits.c             |   2 +-
 src/backend/commands/async.c                  |   4 +-
 src/backend/commands/indexcmds.c              |   4 +-
 src/backend/commands/tablecmds.c              |   2 +-
 src/backend/executor/execIndexing.c           |   4 +-
 src/backend/executor/execReplication.c        |   8 +-
 src/backend/partitioning/partdesc.c           |   2 +-
 src/backend/replication/logical/decode.c      |   2 +-
 src/backend/replication/logical/origin.c      |   4 +-
 .../replication/logical/reorderbuffer.c       | 114 +++++-----
 src/backend/replication/logical/snapbuild.c   | 114 +++++-----
 src/backend/replication/walsender.c           |   2 +-
 src/backend/storage/ipc/procarray.c           |   6 +-
 src/backend/storage/lmgr/predicate.c          |  32 +--
 src/backend/utils/adt/xid8funcs.c             |   4 +-
 src/backend/utils/time/snapmgr.c              | 198 +++++++++++-------
 src/include/access/heapam.h                   |   2 +-
 src/include/access/relscan.h                  |   6 +-
 src/include/replication/reorderbuffer.h       |  12 +-
 src/include/replication/snapbuild.h           |   6 +-
 src/include/replication/snapbuild_internal.h  |   2 +-
 src/include/storage/predicate.h               |   4 +-
 src/include/storage/procarray.h               |   2 +-
 src/include/utils/snapmgr.h                   |  16 +-
 src/include/utils/snapshot.h                  | 155 +++++++++-----
 src/tools/pgindent/typedefs.list              |   4 +
 36 files changed, 451 insertions(+), 336 deletions(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 827312306f6..2b25b281d80 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -276,7 +276,7 @@ verify_heapam(PG_FUNCTION_ARGS)
 	 * Any xmin newer than the xmin of our snapshot can't become all-visible
 	 * while we're running.
 	 */
-	ctx.safe_xmin = GetTransactionSnapshot()->xmin;
+	ctx.safe_xmin = GetTransactionSnapshot()->mvcc.xmin;
 
 	/*
 	 * If we report corruption when not examining some individual attribute,
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 825b677c47c..c7f312959c4 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -582,7 +582,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
 			 */
 			if (IsolationUsesXactSnapshot() && rel->rd_index->indcheckxmin &&
 				!TransactionIdPrecedes(HeapTupleHeaderGetXmin(rel->rd_indextuple->t_data),
-									   snapshot->xmin))
+									   snapshot->mvcc.xmin))
 				ereport(ERROR,
 						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 						 errmsg("index \"%s\" cannot be verified using transaction snapshot",
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fa7935a0ed3..493aad2e8de 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -538,7 +538,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	 * full page write. Until we can prove that beyond doubt, let's check each
 	 * tuple for visibility the hard way.
 	 */
-	all_visible = PageIsAllVisible(page) && !snapshot->takenDuringRecovery;
+	all_visible = PageIsAllVisible(page) &&
+		(snapshot->snapshot_type != SNAPSHOT_MVCC || !snapshot->mvcc.takenDuringRecovery);
 	check_serializable =
 		CheckForSerializableConflictOutNeeded(scan->rs_base.rs_rd, snapshot);
 
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d74f0fbc5cd..10f36fc1f93 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -386,7 +386,7 @@ tuple_lock_retry:
 
 		if (!ItemPointerEquals(&tmfd->ctid, &tuple->t_self))
 		{
-			SnapshotData SnapshotDirty;
+			DirtySnapshotData SnapshotDirty;
 			TransactionId priorXmax;
 
 			/* it was updated, so look at the updated version */
@@ -411,7 +411,7 @@ tuple_lock_retry:
 							 errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
 
 				tuple->t_self = *tid;
-				if (heap_fetch(relation, &SnapshotDirty, tuple, &buffer, true))
+				if (heap_fetch(relation, (Snapshot) &SnapshotDirty, tuple, &buffer, true))
 				{
 					/*
 					 * If xmin isn't what we're expecting, the slot must have
@@ -2457,7 +2457,7 @@ heapam_scan_sample_next_tuple(TableScanDesc scan, SampleScanState *scanstate,
 
 	page = (Page) BufferGetPage(hscan->rs_cbuf);
 	all_visible = PageIsAllVisible(page) &&
-		!scan->rs_snapshot->takenDuringRecovery;
+		(scan->rs_snapshot->snapshot_type != SNAPSHOT_MVCC || !scan->rs_snapshot->mvcc.takenDuringRecovery);
 	maxoffset = PageGetMaxOffsetNumber(page);
 
 	for (;;)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..f5d69b558f1 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -740,7 +740,7 @@ HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
  * token is also returned in snapshot->speculativeToken.
  */
 static bool
-HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
+HeapTupleSatisfiesDirty(HeapTuple htup, DirtySnapshotData *snapshot,
 						Buffer buffer)
 {
 	HeapTupleHeader tuple = htup->t_data;
@@ -957,7 +957,7 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
  * and more contention on ProcArrayLock.
  */
 static bool
-HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
+HeapTupleSatisfiesMVCC(HeapTuple htup, MVCCSnapshot snapshot,
 					   Buffer buffer)
 {
 	HeapTupleHeader tuple = htup->t_data;
@@ -1435,7 +1435,7 @@ HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer, TransactionId *de
  *	snapshot->vistest must have been set up with the horizon to use.
  */
 static bool
-HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
+HeapTupleSatisfiesNonVacuumable(HeapTuple htup, NonVacuumableSnapshotData *snapshot,
 								Buffer buffer)
 {
 	TransactionId dead_after = InvalidTransactionId;
@@ -1593,7 +1593,7 @@ TransactionIdInArray(TransactionId xid, TransactionId *xip, Size num)
  * complicated than when dealing "only" with the present.
  */
 static bool
-HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
+HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, HistoricMVCCSnapshot snapshot,
 							   Buffer buffer)
 {
 	HeapTupleHeader tuple = htup->t_data;
@@ -1610,7 +1610,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 		return false;
 	}
 	/* check if it's one of our txids, toplevel is also in there */
-	else if (TransactionIdInArray(xmin, snapshot->subxip, snapshot->subxcnt))
+	else if (TransactionIdInArray(xmin, snapshot->curxip, snapshot->curxcnt))
 	{
 		bool		resolved;
 		CommandId	cmin = HeapTupleHeaderGetRawCommandId(tuple);
@@ -1669,7 +1669,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 		return false;
 	}
 	/* check if it's a committed transaction in [xmin, xmax) */
-	else if (TransactionIdInArray(xmin, snapshot->xip, snapshot->xcnt))
+	else if (TransactionIdInArray(xmin, snapshot->committed_xids, snapshot->xcnt))
 	{
 		/* fall through */
 	}
@@ -1702,7 +1702,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 	}
 
 	/* check if it's one of our txids, toplevel is also in there */
-	if (TransactionIdInArray(xmax, snapshot->subxip, snapshot->subxcnt))
+	if (TransactionIdInArray(xmax, snapshot->curxip, snapshot->curxcnt))
 	{
 		bool		resolved;
 		CommandId	cmin;
@@ -1755,7 +1755,7 @@ HeapTupleSatisfiesHistoricMVCC(HeapTuple htup, Snapshot snapshot,
 	else if (TransactionIdFollowsOrEquals(xmax, snapshot->xmax))
 		return true;
 	/* xmax is between [xmin, xmax), check known committed array */
-	else if (TransactionIdInArray(xmax, snapshot->xip, snapshot->xcnt))
+	else if (TransactionIdInArray(xmax, snapshot->committed_xids, snapshot->xcnt))
 		return false;
 	/* xmax is between [xmin, xmax), but known not to have committed yet */
 	else
@@ -1778,7 +1778,7 @@ HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot, Buffer buffer)
 	switch (snapshot->snapshot_type)
 	{
 		case SNAPSHOT_MVCC:
-			return HeapTupleSatisfiesMVCC(htup, snapshot, buffer);
+			return HeapTupleSatisfiesMVCC(htup, &snapshot->mvcc, buffer);
 		case SNAPSHOT_SELF:
 			return HeapTupleSatisfiesSelf(htup, snapshot, buffer);
 		case SNAPSHOT_ANY:
@@ -1786,11 +1786,11 @@ HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot, Buffer buffer)
 		case SNAPSHOT_TOAST:
 			return HeapTupleSatisfiesToast(htup, snapshot, buffer);
 		case SNAPSHOT_DIRTY:
-			return HeapTupleSatisfiesDirty(htup, snapshot, buffer);
+			return HeapTupleSatisfiesDirty(htup, &snapshot->dirty, buffer);
 		case SNAPSHOT_HISTORIC_MVCC:
-			return HeapTupleSatisfiesHistoricMVCC(htup, snapshot, buffer);
+			return HeapTupleSatisfiesHistoricMVCC(htup, &snapshot->historic_mvcc, buffer);
 		case SNAPSHOT_NON_VACUUMABLE:
-			return HeapTupleSatisfiesNonVacuumable(htup, snapshot, buffer);
+			return HeapTupleSatisfiesNonVacuumable(htup, &snapshot->nonvacuumable, buffer);
 	}
 
 	return false;				/* keep compiler quiet */
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 55ec4c10352..769170a37d5 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -469,7 +469,7 @@ index_parallelscan_estimate(Relation indexRelation, int nkeys, int norderbys,
 	RELATION_CHECKS;
 
 	nbytes = offsetof(ParallelIndexScanDescData, ps_snapshot_data);
-	nbytes = add_size(nbytes, EstimateSnapshotSpace(snapshot));
+	nbytes = add_size(nbytes, EstimateSnapshotSpace(&snapshot->mvcc));
 	nbytes = MAXALIGN(nbytes);
 
 	if (instrument)
@@ -517,16 +517,17 @@ index_parallelscan_initialize(Relation heapRelation, Relation indexRelation,
 	Assert(instrument || parallel_aware);
 
 	RELATION_CHECKS;
+	Assert(snapshot->snapshot_type == SNAPSHOT_MVCC);
 
 	offset = add_size(offsetof(ParallelIndexScanDescData, ps_snapshot_data),
-					  EstimateSnapshotSpace(snapshot));
+					  EstimateSnapshotSpace((MVCCSnapshot) snapshot));
 	offset = MAXALIGN(offset);
 
 	target->ps_locator = heapRelation->rd_locator;
 	target->ps_indexlocator = indexRelation->rd_locator;
 	target->ps_offset_ins = 0;
 	target->ps_offset_am = 0;
-	SerializeSnapshot(snapshot, target->ps_snapshot_data);
+	SerializeSnapshot((MVCCSnapshot) snapshot, target->ps_snapshot_data);
 
 	if (instrument)
 	{
@@ -590,8 +591,8 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	Assert(RelFileLocatorEquals(heaprel->rd_locator, pscan->ps_locator));
 	Assert(RelFileLocatorEquals(indexrel->rd_locator, pscan->ps_indexlocator));
 
-	snapshot = RestoreSnapshot(pscan->ps_snapshot_data);
-	RegisterSnapshot(snapshot);
+	snapshot = (Snapshot) RestoreSnapshot(pscan->ps_snapshot_data);
+	snapshot = RegisterSnapshot(snapshot);
 	scan = index_beginscan_internal(indexrel, nkeys, norderbys, snapshot,
 									pscan, true);
 
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index aa82cede30a..714e4ee3f0b 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -413,7 +413,7 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 	IndexTuple	curitup = NULL;
 	ItemId		curitemid = NULL;
 	BTScanInsert itup_key = insertstate->itup_key;
-	SnapshotData SnapshotDirty;
+	DirtySnapshotData SnapshotDirty;
 	OffsetNumber offset;
 	OffsetNumber maxoff;
 	Page		page;
@@ -558,7 +558,7 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
 				 * index entry for the entire chain.
 				 */
 				else if (table_index_fetch_tuple_check(heapRel, &htid,
-													   &SnapshotDirty,
+													   (Snapshot) &SnapshotDirty,
 													   &all_dead))
 				{
 					TransactionId xwait;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index eeddacd0d52..524374a7dd5 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -811,7 +811,7 @@ spgvacuumscan(spgBulkDeleteState *bds)
 	/* Finish setting up spgBulkDeleteState */
 	initSpGistState(&bds->spgstate, index);
 	bds->pendingList = NULL;
-	bds->myXmin = GetActiveSnapshot()->xmin;
+	bds->myXmin = GetActiveSnapshot()->mvcc.xmin;
 	bds->lastFilledBlock = SPGIST_LAST_FIXED_BLKNO;
 
 	/*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..4eb81e40d99 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -133,7 +133,7 @@ table_parallelscan_estimate(Relation rel, Snapshot snapshot)
 	Size		sz = 0;
 
 	if (IsMVCCSnapshot(snapshot))
-		sz = add_size(sz, EstimateSnapshotSpace(snapshot));
+		sz = add_size(sz, EstimateSnapshotSpace((MVCCSnapshot) snapshot));
 	else
 		Assert(snapshot == SnapshotAny);
 
@@ -152,7 +152,7 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 
 	if (IsMVCCSnapshot(snapshot))
 	{
-		SerializeSnapshot(snapshot, (char *) pscan + pscan->phs_snapshot_off);
+		SerializeSnapshot((MVCCSnapshot) snapshot, (char *) pscan + pscan->phs_snapshot_off);
 		pscan->phs_snapshot_any = false;
 	}
 	else
@@ -174,8 +174,8 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	if (!pscan->phs_snapshot_any)
 	{
 		/* Snapshot was serialized -- restore it */
-		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
-		RegisterSnapshot(snapshot);
+		snapshot = (Snapshot) RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
+		snapshot = RegisterSnapshot(snapshot);
 		flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 94db1ec3012..8046e14abf7 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -275,10 +275,10 @@ InitializeParallelDSM(ParallelContext *pcxt)
 		shm_toc_estimate_chunk(&pcxt->estimator, combocidlen);
 		if (IsolationUsesXactSnapshot())
 		{
-			tsnaplen = EstimateSnapshotSpace(transaction_snapshot);
+			tsnaplen = EstimateSnapshotSpace((MVCCSnapshot) transaction_snapshot);
 			shm_toc_estimate_chunk(&pcxt->estimator, tsnaplen);
 		}
-		asnaplen = EstimateSnapshotSpace(active_snapshot);
+		asnaplen = EstimateSnapshotSpace((MVCCSnapshot) active_snapshot);
 		shm_toc_estimate_chunk(&pcxt->estimator, asnaplen);
 		tstatelen = EstimateTransactionStateSpace();
 		shm_toc_estimate_chunk(&pcxt->estimator, tstatelen);
@@ -400,14 +400,14 @@ InitializeParallelDSM(ParallelContext *pcxt)
 		if (IsolationUsesXactSnapshot())
 		{
 			tsnapspace = shm_toc_allocate(pcxt->toc, tsnaplen);
-			SerializeSnapshot(transaction_snapshot, tsnapspace);
+			SerializeSnapshot((MVCCSnapshot) transaction_snapshot, tsnapspace);
 			shm_toc_insert(pcxt->toc, PARALLEL_KEY_TRANSACTION_SNAPSHOT,
 						   tsnapspace);
 		}
 
 		/* Serialize the active snapshot. */
 		asnapspace = shm_toc_allocate(pcxt->toc, asnaplen);
-		SerializeSnapshot(active_snapshot, asnapspace);
+		SerializeSnapshot((MVCCSnapshot) active_snapshot, asnapspace);
 		shm_toc_insert(pcxt->toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, asnapspace);
 
 		/* Provide the handle for per-session segment. */
@@ -1493,9 +1493,9 @@ ParallelWorkerMain(Datum main_arg)
 	 */
 	asnapspace = shm_toc_lookup(toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, false);
 	tsnapspace = shm_toc_lookup(toc, PARALLEL_KEY_TRANSACTION_SNAPSHOT, true);
-	asnapshot = RestoreSnapshot(asnapspace);
-	tsnapshot = tsnapspace ? RestoreSnapshot(tsnapspace) : asnapshot;
-	RestoreTransactionSnapshot(tsnapshot,
+	asnapshot = (Snapshot) RestoreSnapshot(asnapspace);
+	tsnapshot = tsnapspace ? (Snapshot) RestoreSnapshot(tsnapspace) : asnapshot;
+	RestoreTransactionSnapshot((MVCCSnapshot) tsnapshot,
 							   fps->parallel_leader_pgproc);
 	PushActiveSnapshot(asnapshot);
 
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 929bb53b620..b658601bf77 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -148,7 +148,7 @@ find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
 				xmin = HeapTupleHeaderGetXmin(inheritsTuple->t_data);
 				snap = GetActiveSnapshot();
 
-				if (!XidInMVCCSnapshot(xmin, snap))
+				if (!XidInMVCCSnapshot(xmin, (MVCCSnapshot) snap))
 				{
 					if (detached_xmin)
 					{
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 4bd37d5beb5..1ffb6f5fa70 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -2022,6 +2022,8 @@ asyncQueueProcessPageEntries(volatile QueuePosition *current,
 	bool		reachedEndOfPage;
 	AsyncQueueEntry *qe;
 
+	Assert(snapshot->snapshot_type == SNAPSHOT_MVCC);
+
 	do
 	{
 		QueuePosition thisentry = *current;
@@ -2041,7 +2043,7 @@ asyncQueueProcessPageEntries(volatile QueuePosition *current,
 		/* Ignore messages destined for other databases */
 		if (qe->dboid == MyDatabaseId)
 		{
-			if (XidInMVCCSnapshot(qe->xid, snapshot))
+			if (XidInMVCCSnapshot(qe->xid, (MVCCSnapshot) snapshot))
 			{
 				/*
 				 * The source transaction is still in progress, so we can't
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 32ff3ca9a28..06d1f4d0bd5 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1767,7 +1767,7 @@ DefineIndex(Oid tableId,
 	 * they must wait for.  But first, save the snapshot's xmin to use as
 	 * limitXmin for GetCurrentVirtualXIDs().
 	 */
-	limitXmin = snapshot->xmin;
+	limitXmin = snapshot->mvcc.xmin;
 
 	PopActiveSnapshot();
 	UnregisterSnapshot(snapshot);
@@ -4162,7 +4162,7 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		 * We can now do away with our active snapshot, we still need to save
 		 * the xmin limit to wait for older snapshots.
 		 */
-		limitXmin = snapshot->xmin;
+		limitXmin = snapshot->mvcc.xmin;
 
 		PopActiveSnapshot();
 		UnregisterSnapshot(snapshot);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 18ff8956577..ef92cc06812 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -20676,7 +20676,7 @@ ATExecDetachPartitionFinalize(Relation rel, RangeVar *name)
 	 * all such queries are complete (otherwise we would present them with an
 	 * inconsistent view of catalogs).
 	 */
-	WaitForOlderSnapshots(snap->xmin, false);
+	WaitForOlderSnapshots(snap->mvcc.xmin, false);
 
 	DetachPartitionFinalize(rel, partRel, true, InvalidOid);
 
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index e3fe9b78bb5..a3955792729 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -717,7 +717,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 	int			indnkeyatts = IndexRelationGetNumberOfKeyAttributes(index);
 	IndexScanDesc index_scan;
 	ScanKeyData scankeys[INDEX_MAX_KEYS];
-	SnapshotData DirtySnapshot;
+	DirtySnapshotData DirtySnapshot;
 	int			i;
 	bool		conflict;
 	bool		found_self;
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, (Snapshot) &DirtySnapshot, NULL, indnkeyatts, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 0a9b880d250..3b817fb3a79 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -184,7 +184,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	ScanKeyData skey[INDEX_MAX_KEYS];
 	int			skey_attoff;
 	IndexScanDesc scan;
-	SnapshotData snap;
+	DirtySnapshotData snap;
 	TransactionId xwait;
 	Relation	idxrel;
 	bool		found;
@@ -202,7 +202,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, (Snapshot) &snap, NULL, skey_attoff, 0);
 
 retry:
 	found = false;
@@ -357,7 +357,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 {
 	TupleTableSlot *scanslot;
 	TableScanDesc scan;
-	SnapshotData snap;
+	DirtySnapshotData snap;
 	TypeCacheEntry **eq;
 	TransactionId xwait;
 	bool		found;
@@ -369,7 +369,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, (Snapshot) &snap, 0, NULL);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index 328b4d450e4..7c15c634181 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -102,7 +102,7 @@ RelationGetPartitionDesc(Relation rel, bool omit_detached)
 		Assert(TransactionIdIsValid(rel->rd_partdesc_nodetached_xmin));
 		activesnap = GetActiveSnapshot();
 
-		if (!XidInMVCCSnapshot(rel->rd_partdesc_nodetached_xmin, activesnap))
+		if (!XidInMVCCSnapshot(rel->rd_partdesc_nodetached_xmin, &activesnap->mvcc))
 			return rel->rd_partdesc_nodetached;
 	}
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 78f9a0a11c4..6a428e9720e 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -586,7 +586,7 @@ logicalmsg_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 	TransactionId xid = XLogRecGetXid(r);
 	uint8		info = XLogRecGetInfo(r) & ~XLR_INFO_MASK;
 	RepOriginId origin_id = XLogRecGetOrigin(r);
-	Snapshot	snapshot = NULL;
+	HistoricMVCCSnapshot snapshot = NULL;
 	xl_logical_message *message;
 
 	if (info != XLOG_LOGICAL_MESSAGE)
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index c3c1d7a2a51..45e87e7b672 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -259,7 +259,7 @@ replorigin_create(const char *roname)
 	HeapTuple	tuple = NULL;
 	Relation	rel;
 	Datum		roname_d;
-	SnapshotData SnapshotDirty;
+	DirtySnapshotData SnapshotDirty;
 	SysScanDesc scan;
 	ScanKeyData key;
 
@@ -301,7 +301,7 @@ replorigin_create(const char *roname)
 
 		scan = systable_beginscan(rel, ReplicationOriginIdentIndex,
 								  true /* indexOK */ ,
-								  &SnapshotDirty,
+								  (Snapshot) &SnapshotDirty,
 								  1, &key);
 
 		collides = HeapTupleIsValid(systable_getnext(scan));
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 977fbcd2474..e8196a8d5d5 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -268,9 +268,9 @@ static void ReorderBufferSerializedPath(char *path, ReplicationSlot *slot,
 										TransactionId xid, XLogSegNo segno);
 static int	ReorderBufferTXNSizeCompare(const pairingheap_node *a, const pairingheap_node *b, void *arg);
 
-static void ReorderBufferFreeSnap(ReorderBuffer *rb, Snapshot snap);
-static Snapshot ReorderBufferCopySnap(ReorderBuffer *rb, Snapshot orig_snap,
-									  ReorderBufferTXN *txn, CommandId cid);
+static void ReorderBufferFreeSnap(ReorderBuffer *rb, HistoricMVCCSnapshot snap);
+static HistoricMVCCSnapshot ReorderBufferCopySnap(ReorderBuffer *rb, HistoricMVCCSnapshot orig_snap,
+												  ReorderBufferTXN *txn, CommandId cid);
 
 /*
  * ---------------------------------------
@@ -852,7 +852,7 @@ ReorderBufferQueueChange(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn,
  */
 void
 ReorderBufferQueueMessage(ReorderBuffer *rb, TransactionId xid,
-						  Snapshot snap, XLogRecPtr lsn,
+						  HistoricMVCCSnapshot snap, XLogRecPtr lsn,
 						  bool transactional, const char *prefix,
 						  Size message_size, const char *message)
 {
@@ -886,7 +886,7 @@ ReorderBufferQueueMessage(ReorderBuffer *rb, TransactionId xid,
 	else
 	{
 		ReorderBufferTXN *txn = NULL;
-		volatile Snapshot snapshot_now = snap;
+		volatile	HistoricMVCCSnapshot snapshot_now = snap;
 
 		/* Non-transactional changes require a valid snapshot. */
 		Assert(snapshot_now);
@@ -1886,55 +1886,55 @@ ReorderBufferBuildTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
  * that catalog modifying transactions can look into intermediate catalog
  * states.
  */
-static Snapshot
-ReorderBufferCopySnap(ReorderBuffer *rb, Snapshot orig_snap,
+static HistoricMVCCSnapshot
+ReorderBufferCopySnap(ReorderBuffer *rb, HistoricMVCCSnapshot orig_snap,
 					  ReorderBufferTXN *txn, CommandId cid)
 {
-	Snapshot	snap;
+	HistoricMVCCSnapshot snap;
 	dlist_iter	iter;
 	int			i = 0;
 	Size		size;
 
-	size = sizeof(SnapshotData) +
+	size = sizeof(HistoricMVCCSnapshotData) +
 		sizeof(TransactionId) * orig_snap->xcnt +
 		sizeof(TransactionId) * (txn->nsubtxns + 1);
 
 	snap = MemoryContextAllocZero(rb->context, size);
-	memcpy(snap, orig_snap, sizeof(SnapshotData));
+	memcpy(snap, orig_snap, sizeof(HistoricMVCCSnapshotData));
 
 	snap->copied = true;
-	snap->active_count = 1;		/* mark as active so nobody frees it */
+	snap->refcount = 1;			/* mark as active so nobody frees it */
 	snap->regd_count = 0;
-	snap->xip = (TransactionId *) (snap + 1);
+	snap->committed_xids = (TransactionId *) (snap + 1);
 
-	memcpy(snap->xip, orig_snap->xip, sizeof(TransactionId) * snap->xcnt);
+	memcpy(snap->committed_xids, orig_snap->committed_xids, sizeof(TransactionId) * snap->xcnt);
 
 	/*
-	 * snap->subxip contains all txids that belong to our transaction which we
+	 * snap->curxip contains all txids that belong to our transaction which we
 	 * need to check via cmin/cmax. That's why we store the toplevel
 	 * transaction in there as well.
 	 */
-	snap->subxip = snap->xip + snap->xcnt;
-	snap->subxip[i++] = txn->xid;
+	snap->curxip = snap->committed_xids + snap->xcnt;
+	snap->curxip[i++] = txn->xid;
 
 	/*
 	 * txn->nsubtxns isn't decreased when subtransactions abort, so count
 	 * manually. Since it's an upper boundary it is safe to use it for the
 	 * allocation above.
 	 */
-	snap->subxcnt = 1;
+	snap->curxcnt = 1;
 
 	dlist_foreach(iter, &txn->subtxns)
 	{
 		ReorderBufferTXN *sub_txn;
 
 		sub_txn = dlist_container(ReorderBufferTXN, node, iter.cur);
-		snap->subxip[i++] = sub_txn->xid;
-		snap->subxcnt++;
+		snap->curxip[i++] = sub_txn->xid;
+		snap->curxcnt++;
 	}
 
 	/* sort so we can bsearch() later */
-	qsort(snap->subxip, snap->subxcnt, sizeof(TransactionId), xidComparator);
+	qsort(snap->curxip, snap->curxcnt, sizeof(TransactionId), xidComparator);
 
 	/* store the specified current CommandId */
 	snap->curcid = cid;
@@ -1946,7 +1946,7 @@ ReorderBufferCopySnap(ReorderBuffer *rb, Snapshot orig_snap,
  * Free a previously ReorderBufferCopySnap'ed snapshot
  */
 static void
-ReorderBufferFreeSnap(ReorderBuffer *rb, Snapshot snap)
+ReorderBufferFreeSnap(ReorderBuffer *rb, HistoricMVCCSnapshot snap)
 {
 	if (snap->copied)
 		pfree(snap);
@@ -2099,7 +2099,7 @@ ReorderBufferApplyMessage(ReorderBuffer *rb, ReorderBufferTXN *txn,
  */
 static inline void
 ReorderBufferSaveTXNSnapshot(ReorderBuffer *rb, ReorderBufferTXN *txn,
-							 Snapshot snapshot_now, CommandId command_id)
+							 HistoricMVCCSnapshot snapshot_now, CommandId command_id)
 {
 	txn->command_id = command_id;
 
@@ -2144,7 +2144,7 @@ ReorderBufferMaybeMarkTXNStreamed(ReorderBuffer *rb, ReorderBufferTXN *txn)
  */
 static void
 ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
-					  Snapshot snapshot_now,
+					  HistoricMVCCSnapshot snapshot_now,
 					  CommandId command_id,
 					  XLogRecPtr last_lsn,
 					  ReorderBufferChange *specinsert)
@@ -2191,7 +2191,7 @@ ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 static void
 ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 						XLogRecPtr commit_lsn,
-						volatile Snapshot snapshot_now,
+						volatile HistoricMVCCSnapshot snapshot_now,
 						volatile CommandId command_id,
 						bool streaming)
 {
@@ -2779,7 +2779,7 @@ ReorderBufferReplay(ReorderBufferTXN *txn,
 					TimestampTz commit_time,
 					RepOriginId origin_id, XLogRecPtr origin_lsn)
 {
-	Snapshot	snapshot_now;
+	HistoricMVCCSnapshot snapshot_now;
 	CommandId	command_id = FirstCommandId;
 
 	txn->final_lsn = commit_lsn;
@@ -3251,7 +3251,7 @@ ReorderBufferProcessXid(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn)
  */
 void
 ReorderBufferAddSnapshot(ReorderBuffer *rb, TransactionId xid,
-						 XLogRecPtr lsn, Snapshot snap)
+						 XLogRecPtr lsn, HistoricMVCCSnapshot snap)
 {
 	ReorderBufferChange *change = ReorderBufferAllocChange(rb);
 
@@ -3269,7 +3269,7 @@ ReorderBufferAddSnapshot(ReorderBuffer *rb, TransactionId xid,
  */
 void
 ReorderBufferSetBaseSnapshot(ReorderBuffer *rb, TransactionId xid,
-							 XLogRecPtr lsn, Snapshot snap)
+							 XLogRecPtr lsn, HistoricMVCCSnapshot snap)
 {
 	ReorderBufferTXN *txn;
 	bool		is_new;
@@ -4043,14 +4043,14 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
 			{
-				Snapshot	snap;
+				HistoricMVCCSnapshot snap;
 				char	   *data;
 
 				snap = change->data.snapshot;
 
-				sz += sizeof(SnapshotData) +
+				sz += sizeof(HistoricMVCCSnapshotData) +
 					sizeof(TransactionId) * snap->xcnt +
-					sizeof(TransactionId) * snap->subxcnt;
+					sizeof(TransactionId) * snap->curxcnt;
 
 				/* make sure we have enough space */
 				ReorderBufferSerializeReserve(rb, sz);
@@ -4058,21 +4058,21 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				/* might have been reallocated above */
 				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
 
-				memcpy(data, snap, sizeof(SnapshotData));
-				data += sizeof(SnapshotData);
+				memcpy(data, snap, sizeof(HistoricMVCCSnapshotData));
+				data += sizeof(HistoricMVCCSnapshotData);
 
 				if (snap->xcnt)
 				{
-					memcpy(data, snap->xip,
+					memcpy(data, snap->committed_xids,
 						   sizeof(TransactionId) * snap->xcnt);
 					data += sizeof(TransactionId) * snap->xcnt;
 				}
 
-				if (snap->subxcnt)
+				if (snap->curxcnt)
 				{
-					memcpy(data, snap->subxip,
-						   sizeof(TransactionId) * snap->subxcnt);
-					data += sizeof(TransactionId) * snap->subxcnt;
+					memcpy(data, snap->curxip,
+						   sizeof(TransactionId) * snap->curxcnt);
+					data += sizeof(TransactionId) * snap->curxcnt;
 				}
 				break;
 			}
@@ -4177,7 +4177,7 @@ ReorderBufferCanStartStreaming(ReorderBuffer *rb)
 static void
 ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 {
-	Snapshot	snapshot_now;
+	HistoricMVCCSnapshot snapshot_now;
 	CommandId	command_id;
 	Size		stream_bytes;
 	bool		txn_is_streamed;
@@ -4196,10 +4196,10 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	 * After that we need to reuse the snapshot from the previous run.
 	 *
 	 * Unlike DecodeCommit which adds xids of all the subtransactions in
-	 * snapshot's xip array via SnapBuildCommitTxn, we can't do that here but
-	 * we do add them to subxip array instead via ReorderBufferCopySnap. This
-	 * allows the catalog changes made in subtransactions decoded till now to
-	 * be visible.
+	 * snapshot's committed_xids array via SnapBuildCommitTxn, we can't do
+	 * that here but we do add them to curxip array instead via
+	 * ReorderBufferCopySnap. This allows the catalog changes made in
+	 * subtransactions decoded till now to be visible.
 	 */
 	if (txn->snapshot_now == NULL)
 	{
@@ -4345,13 +4345,13 @@ ReorderBufferChangeSize(ReorderBufferChange *change)
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
 			{
-				Snapshot	snap;
+				HistoricMVCCSnapshot snap;
 
 				snap = change->data.snapshot;
 
-				sz += sizeof(SnapshotData) +
+				sz += sizeof(HistoricMVCCSnapshotData) +
 					sizeof(TransactionId) * snap->xcnt +
-					sizeof(TransactionId) * snap->subxcnt;
+					sizeof(TransactionId) * snap->curxcnt;
 
 				break;
 			}
@@ -4629,24 +4629,24 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			}
 		case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
 			{
-				Snapshot	oldsnap;
-				Snapshot	newsnap;
+				HistoricMVCCSnapshot oldsnap;
+				HistoricMVCCSnapshot newsnap;
 				Size		size;
 
-				oldsnap = (Snapshot) data;
+				oldsnap = (HistoricMVCCSnapshot) data;
 
-				size = sizeof(SnapshotData) +
+				size = sizeof(HistoricMVCCSnapshotData) +
 					sizeof(TransactionId) * oldsnap->xcnt +
-					sizeof(TransactionId) * (oldsnap->subxcnt + 0);
+					sizeof(TransactionId) * (oldsnap->curxcnt + 0);
 
 				change->data.snapshot = MemoryContextAllocZero(rb->context, size);
 
 				newsnap = change->data.snapshot;
 
 				memcpy(newsnap, data, size);
-				newsnap->xip = (TransactionId *)
-					(((char *) newsnap) + sizeof(SnapshotData));
-				newsnap->subxip = newsnap->xip + newsnap->xcnt;
+				newsnap->committed_xids = (TransactionId *)
+					(((char *) newsnap) + sizeof(HistoricMVCCSnapshotData));
+				newsnap->curxip = newsnap->committed_xids + newsnap->xcnt;
 				newsnap->copied = true;
 				break;
 			}
@@ -5316,7 +5316,7 @@ file_sort_by_lsn(const ListCell *a_p, const ListCell *b_p)
  * transaction for relid.
  */
 static void
-UpdateLogicalMappings(HTAB *tuplecid_data, Oid relid, Snapshot snapshot)
+UpdateLogicalMappings(HTAB *tuplecid_data, Oid relid, HistoricMVCCSnapshot snapshot)
 {
 	DIR		   *mapping_dir;
 	struct dirent *mapping_de;
@@ -5364,7 +5364,7 @@ UpdateLogicalMappings(HTAB *tuplecid_data, Oid relid, Snapshot snapshot)
 			continue;
 
 		/* not for our transaction */
-		if (!TransactionIdInArray(f_mapped_xid, snapshot->subxip, snapshot->subxcnt))
+		if (!TransactionIdInArray(f_mapped_xid, snapshot->curxip, snapshot->curxcnt))
 			continue;
 
 		/* ok, relevant, queue for apply */
@@ -5383,7 +5383,7 @@ UpdateLogicalMappings(HTAB *tuplecid_data, Oid relid, Snapshot snapshot)
 		RewriteMappingFile *f = (RewriteMappingFile *) lfirst(file);
 
 		elog(DEBUG1, "applying mapping: \"%s\" in %u", f->fname,
-			 snapshot->subxip[0]);
+			 snapshot->curxip[0]);
 		ApplyLogicalMappingFile(tuplecid_data, relid, f->fname);
 		pfree(f);
 	}
@@ -5395,7 +5395,7 @@ UpdateLogicalMappings(HTAB *tuplecid_data, Oid relid, Snapshot snapshot)
  */
 bool
 ResolveCminCmaxDuringDecoding(HTAB *tuplecid_data,
-							  Snapshot snapshot,
+							  HistoricMVCCSnapshot snapshot,
 							  HeapTuple htup, Buffer buffer,
 							  CommandId *cmin, CommandId *cmax)
 {
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index b64e53de017..7a341418a74 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -155,11 +155,11 @@ static bool ExportInProgress = false;
 static void SnapBuildPurgeOlderTxn(SnapBuild *builder);
 
 /* snapshot building/manipulation/distribution functions */
-static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder);
+static HistoricMVCCSnapshot SnapBuildBuildSnapshot(SnapBuild *builder);
 
-static void SnapBuildFreeSnapshot(Snapshot snap);
+static void SnapBuildFreeSnapshot(HistoricMVCCSnapshot snap);
 
-static void SnapBuildSnapIncRefcount(Snapshot snap);
+static void SnapBuildSnapIncRefcount(HistoricMVCCSnapshot snap);
 
 static void SnapBuildDistributeNewCatalogSnapshot(SnapBuild *builder, XLogRecPtr lsn);
 
@@ -249,23 +249,21 @@ FreeSnapshotBuilder(SnapBuild *builder)
  * Free an unreferenced snapshot that has previously been built by us.
  */
 static void
-SnapBuildFreeSnapshot(Snapshot snap)
+SnapBuildFreeSnapshot(HistoricMVCCSnapshot snap)
 {
 	/* make sure we don't get passed an external snapshot */
 	Assert(snap->snapshot_type == SNAPSHOT_HISTORIC_MVCC);
 
 	/* make sure nobody modified our snapshot */
 	Assert(snap->curcid == FirstCommandId);
-	Assert(!snap->suboverflowed);
-	Assert(!snap->takenDuringRecovery);
 	Assert(snap->regd_count == 0);
 
 	/* slightly more likely, so it's checked even without c-asserts */
 	if (snap->copied)
 		elog(ERROR, "cannot free a copied snapshot");
 
-	if (snap->active_count)
-		elog(ERROR, "cannot free an active snapshot");
+	if (snap->refcount)
+		elog(ERROR, "cannot free a snapshot that's in use");
 
 	pfree(snap);
 }
@@ -313,9 +311,9 @@ SnapBuildXactNeedsSkip(SnapBuild *builder, XLogRecPtr ptr)
  * adding a Snapshot as builder->snapshot.
  */
 static void
-SnapBuildSnapIncRefcount(Snapshot snap)
+SnapBuildSnapIncRefcount(HistoricMVCCSnapshot snap)
 {
-	snap->active_count++;
+	snap->refcount++;
 }
 
 /*
@@ -325,26 +323,23 @@ SnapBuildSnapIncRefcount(Snapshot snap)
  * IncRef'ed Snapshot can adjust its refcount easily.
  */
 void
-SnapBuildSnapDecRefcount(Snapshot snap)
+SnapBuildSnapDecRefcount(HistoricMVCCSnapshot snap)
 {
 	/* make sure we don't get passed an external snapshot */
 	Assert(snap->snapshot_type == SNAPSHOT_HISTORIC_MVCC);
 
 	/* make sure nobody modified our snapshot */
 	Assert(snap->curcid == FirstCommandId);
-	Assert(!snap->suboverflowed);
-	Assert(!snap->takenDuringRecovery);
 
+	Assert(snap->refcount > 0);
 	Assert(snap->regd_count == 0);
 
-	Assert(snap->active_count > 0);
-
 	/* slightly more likely, so it's checked even without casserts */
 	if (snap->copied)
 		elog(ERROR, "cannot free a copied snapshot");
 
-	snap->active_count--;
-	if (snap->active_count == 0)
+	snap->refcount--;
+	if (snap->refcount == 0)
 		SnapBuildFreeSnapshot(snap);
 }
 
@@ -356,15 +351,15 @@ SnapBuildSnapDecRefcount(Snapshot snap)
  * these snapshots; they have to copy them and fill in appropriate ->curcid
  * and ->subxip/subxcnt values.
  */
-static Snapshot
+static HistoricMVCCSnapshot
 SnapBuildBuildSnapshot(SnapBuild *builder)
 {
-	Snapshot	snapshot;
+	HistoricMVCCSnapshot snapshot;
 	Size		ssize;
 
 	Assert(builder->state >= SNAPBUILD_FULL_SNAPSHOT);
 
-	ssize = sizeof(SnapshotData)
+	ssize = sizeof(HistoricMVCCSnapshotData)
 		+ sizeof(TransactionId) * builder->committed.xcnt
 		+ sizeof(TransactionId) * 1 /* toplevel xid */ ;
 
@@ -400,31 +395,28 @@ SnapBuildBuildSnapshot(SnapBuild *builder)
 	snapshot->xmax = builder->xmax;
 
 	/* store all transactions to be treated as committed by this snapshot */
-	snapshot->xip =
-		(TransactionId *) ((char *) snapshot + sizeof(SnapshotData));
+	snapshot->committed_xids =
+		(TransactionId *) ((char *) snapshot + sizeof(HistoricMVCCSnapshotData));
 	snapshot->xcnt = builder->committed.xcnt;
-	memcpy(snapshot->xip,
+	memcpy(snapshot->committed_xids,
 		   builder->committed.xip,
 		   builder->committed.xcnt * sizeof(TransactionId));
 
 	/* sort so we can bsearch() */
-	qsort(snapshot->xip, snapshot->xcnt, sizeof(TransactionId), xidComparator);
+	qsort(snapshot->committed_xids, snapshot->xcnt, sizeof(TransactionId), xidComparator);
 
 	/*
-	 * Initially, subxip is empty, i.e. it's a snapshot to be used by
+	 * Initially, curxip is empty, i.e. it's a snapshot to be used by
 	 * transactions that don't modify the catalog. Will be filled by
 	 * ReorderBufferCopySnap() if necessary.
 	 */
-	snapshot->subxcnt = 0;
-	snapshot->subxip = NULL;
+	snapshot->curxcnt = 0;
+	snapshot->curxip = NULL;
 
-	snapshot->suboverflowed = false;
-	snapshot->takenDuringRecovery = false;
 	snapshot->copied = false;
 	snapshot->curcid = FirstCommandId;
-	snapshot->active_count = 0;
+	snapshot->refcount = 0;
 	snapshot->regd_count = 0;
-	snapshot->snapXactCompletionCount = 0;
 
 	return snapshot;
 }
@@ -436,13 +428,13 @@ SnapBuildBuildSnapshot(SnapBuild *builder)
  * The snapshot will be usable directly in current transaction or exported
  * for loading in different transaction.
  */
-Snapshot
+MVCCSnapshot
 SnapBuildInitialSnapshot(SnapBuild *builder)
 {
-	Snapshot	snap;
+	HistoricMVCCSnapshot historicsnap;
+	MVCCSnapshot mvccsnap;
 	TransactionId xid;
 	TransactionId safeXid;
-	TransactionId *newxip;
 	int			newxcnt = 0;
 
 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
@@ -464,10 +456,10 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	if (TransactionIdIsValid(MyProc->xmin))
 		elog(ERROR, "cannot build an initial slot snapshot when MyProc->xmin already is valid");
 
-	snap = SnapBuildBuildSnapshot(builder);
+	historicsnap = SnapBuildBuildSnapshot(builder);
 
 	/*
-	 * We know that snap->xmin is alive, enforced by the logical xmin
+	 * We know that historicsnap->xmin is alive, enforced by the logical xmin
 	 * mechanism. Due to that we can do this without locks, we're only
 	 * changing our own value.
 	 *
@@ -479,15 +471,18 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	safeXid = GetOldestSafeDecodingTransactionId(false);
 	LWLockRelease(ProcArrayLock);
 
-	if (TransactionIdFollows(safeXid, snap->xmin))
+	if (TransactionIdFollows(safeXid, historicsnap->xmin))
 		elog(ERROR, "cannot build an initial slot snapshot as oldest safe xid %u follows snapshot's xmin %u",
-			 safeXid, snap->xmin);
+			 safeXid, historicsnap->xmin);
 
-	MyProc->xmin = snap->xmin;
+	MyProc->xmin = historicsnap->xmin;
 
 	/* allocate in transaction context */
-	newxip = (TransactionId *)
-		palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+	mvccsnap = palloc(sizeof(MVCCSnapshotData) + sizeof(TransactionId) * GetMaxSnapshotXidCount());
+	mvccsnap->snapshot_type = SNAPSHOT_MVCC;
+	mvccsnap->xmin = historicsnap->xmin;
+	mvccsnap->xmax = historicsnap->xmax;
+	mvccsnap->xip = (TransactionId *) ((char *) mvccsnap + sizeof(MVCCSnapshotData));
 
 	/*
 	 * snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -495,15 +490,15 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	 * classical snapshot by marking all non-committed transactions as
 	 * in-progress. This can be expensive.
 	 */
-	for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+	for (xid = historicsnap->xmin; NormalTransactionIdPrecedes(xid, historicsnap->xmax);)
 	{
 		void	   *test;
 
 		/*
-		 * Check whether transaction committed using the decoding snapshot
-		 * meaning of ->xip.
+		 * Check whether transaction committed using the decoding snapshot's
+		 * committed_xids array.
 		 */
-		test = bsearch(&xid, snap->xip, snap->xcnt,
+		test = bsearch(&xid, historicsnap->committed_xids, historicsnap->xcnt,
 					   sizeof(TransactionId), xidComparator);
 
 		if (test == NULL)
@@ -513,18 +508,27 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 						(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 						 errmsg("initial slot snapshot too large")));
 
-			newxip[newxcnt++] = xid;
+			mvccsnap->xip[newxcnt++] = xid;
 		}
 
 		TransactionIdAdvance(xid);
 	}
-
-	/* adjust remaining snapshot fields as needed */
-	snap->snapshot_type = SNAPSHOT_MVCC;
-	snap->xcnt = newxcnt;
-	snap->xip = newxip;
-
-	return snap;
+	mvccsnap->xcnt = newxcnt;
+
+	/* Initialize remaining MVCCSnapshot fields */
+	mvccsnap->subxip = NULL;
+	mvccsnap->subxcnt = 0;
+	mvccsnap->suboverflowed = false;
+	mvccsnap->takenDuringRecovery = false;
+	mvccsnap->copied = true;
+	mvccsnap->curcid = FirstCommandId;
+	mvccsnap->active_count = 0;
+	mvccsnap->regd_count = 0;
+	mvccsnap->snapXactCompletionCount = 0;
+
+	pfree(historicsnap);
+
+	return mvccsnap;
 }
 
 /*
@@ -538,7 +542,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 const char *
 SnapBuildExportSnapshot(SnapBuild *builder)
 {
-	Snapshot	snap;
+	MVCCSnapshot snap;
 	char	   *snapname;
 
 	if (IsTransactionOrTransactionBlock())
@@ -575,7 +579,7 @@ SnapBuildExportSnapshot(SnapBuild *builder)
 /*
  * Ensure there is a snapshot and if not build one for current transaction.
  */
-Snapshot
+HistoricMVCCSnapshot
 SnapBuildGetOrBuildSnapshot(SnapBuild *builder)
 {
 	Assert(builder->state == SNAPBUILD_CONSISTENT);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index d96121b3aad..8a749c89af1 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1305,7 +1305,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 		}
 		else if (snapshot_action == CRS_USE_SNAPSHOT)
 		{
-			Snapshot	snap;
+			MVCCSnapshot snap;
 
 			snap = SnapBuildInitialSnapshot(ctx->snapshot_builder);
 			RestoreTransactionSnapshot(snap, MyProc);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 2e54c11f880..b2751dfa63b 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2092,7 +2092,7 @@ GetMaxSnapshotSubxidCount(void)
  * least in the case we already hold a snapshot), but that's for another day.
  */
 static bool
-GetSnapshotDataReuse(Snapshot snapshot)
+GetSnapshotDataReuse(MVCCSnapshot snapshot)
 {
 	uint64		curXactCompletionCount;
 
@@ -2171,8 +2171,8 @@ GetSnapshotDataReuse(Snapshot snapshot)
  * Note: this function should probably not be called with an argument that's
  * not statically allocated (see xip allocation below).
  */
-Snapshot
-GetSnapshotData(Snapshot snapshot)
+MVCCSnapshot
+GetSnapshotData(MVCCSnapshot snapshot)
 {
 	ProcArrayStruct *arrayP = procArray;
 	TransactionId *other_xids = ProcGlobal->xids;
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 5b21a053981..dd52782ff22 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -449,10 +449,10 @@ static void SerialSetActiveSerXmin(TransactionId xid);
 
 static uint32 predicatelock_hash(const void *key, Size keysize);
 static void SummarizeOldestCommittedSxact(void);
-static Snapshot GetSafeSnapshot(Snapshot origSnapshot);
-static Snapshot GetSerializableTransactionSnapshotInt(Snapshot snapshot,
-													  VirtualTransactionId *sourcevxid,
-													  int sourcepid);
+static MVCCSnapshot GetSafeSnapshot(MVCCSnapshot origSnapshot);
+static MVCCSnapshot GetSerializableTransactionSnapshotInt(MVCCSnapshot snapshot,
+														  VirtualTransactionId *sourcevxid,
+														  int sourcepid);
 static bool PredicateLockExists(const PREDICATELOCKTARGETTAG *targettag);
 static bool GetParentPredicateLockTag(const PREDICATELOCKTARGETTAG *tag,
 									  PREDICATELOCKTARGETTAG *parent);
@@ -1544,10 +1544,10 @@ SummarizeOldestCommittedSxact(void)
  *		for), the passed-in Snapshot pointer should reference a static data
  *		area that can safely be passed to GetSnapshotData.
  */
-static Snapshot
-GetSafeSnapshot(Snapshot origSnapshot)
+static MVCCSnapshot
+GetSafeSnapshot(MVCCSnapshot origSnapshot)
 {
-	Snapshot	snapshot;
+	MVCCSnapshot snapshot;
 
 	Assert(XactReadOnly && XactDeferrable);
 
@@ -1668,8 +1668,8 @@ GetSafeSnapshotBlockingPids(int blocked_pid, int *output, int output_size)
  * always this same pointer; no new snapshot data structure is allocated
  * within this function.
  */
-Snapshot
-GetSerializableTransactionSnapshot(Snapshot snapshot)
+MVCCSnapshot
+GetSerializableTransactionSnapshot(MVCCSnapshot snapshot)
 {
 	Assert(IsolationIsSerializable());
 
@@ -1709,7 +1709,7 @@ GetSerializableTransactionSnapshot(Snapshot snapshot)
  * read-only.
  */
 void
-SetSerializableTransactionSnapshot(Snapshot snapshot,
+SetSerializableTransactionSnapshot(MVCCSnapshot snapshot,
 								   VirtualTransactionId *sourcevxid,
 								   int sourcepid)
 {
@@ -1750,8 +1750,8 @@ SetSerializableTransactionSnapshot(Snapshot snapshot,
  * source xact is still running after we acquire SerializableXactHashLock.
  * We do that by calling ProcArrayInstallImportedXmin.
  */
-static Snapshot
-GetSerializableTransactionSnapshotInt(Snapshot snapshot,
+static MVCCSnapshot
+GetSerializableTransactionSnapshotInt(MVCCSnapshot snapshot,
 									  VirtualTransactionId *sourcevxid,
 									  int sourcepid)
 {
@@ -3961,12 +3961,12 @@ ReleaseOneSerializableXact(SERIALIZABLEXACT *sxact, bool partial,
 static bool
 XidIsConcurrent(TransactionId xid)
 {
-	Snapshot	snap;
+	MVCCSnapshot snap;
 
 	Assert(TransactionIdIsValid(xid));
 	Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny()));
 
-	snap = GetTransactionSnapshot();
+	snap = (MVCCSnapshot) GetTransactionSnapshot();
 
 	if (TransactionIdPrecedes(xid, snap->xmin))
 		return false;
@@ -4214,7 +4214,7 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		}
 		else if (!SxactIsDoomed(sxact)
 				 && (!SxactIsCommitted(sxact)
-					 || TransactionIdPrecedes(GetTransactionSnapshot()->xmin,
+					 || TransactionIdPrecedes(TransactionXmin,
 											  sxact->finishedBefore))
 				 && !RWConflictExists(sxact, MySerializableXact))
 		{
@@ -4227,7 +4227,7 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 			 */
 			if (!SxactIsDoomed(sxact)
 				&& (!SxactIsCommitted(sxact)
-					|| TransactionIdPrecedes(GetTransactionSnapshot()->xmin,
+					|| TransactionIdPrecedes(TransactionXmin,
 											 sxact->finishedBefore))
 				&& !RWConflictExists(sxact, MySerializableXact))
 			{
diff --git a/src/backend/utils/adt/xid8funcs.c b/src/backend/utils/adt/xid8funcs.c
index 88d798fbf4b..0a27f0dd8a0 100644
--- a/src/backend/utils/adt/xid8funcs.c
+++ b/src/backend/utils/adt/xid8funcs.c
@@ -372,10 +372,10 @@ pg_current_snapshot(PG_FUNCTION_ARGS)
 	pg_snapshot *snap;
 	uint32		nxip,
 				i;
-	Snapshot	cur;
+	MVCCSnapshot cur;
 	FullTransactionId next_fxid = ReadNextFullTransactionId();
 
-	cur = GetActiveSnapshot();
+	cur = (MVCCSnapshot) GetActiveSnapshot();
 	if (cur == NULL)
 		elog(ERROR, "no active snapshot set");
 
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index ea35f30f494..78adb6d575a 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -137,18 +137,18 @@
  * These SnapshotData structs are static to simplify memory allocation
  * (see the hack in GetSnapshotData to avoid repeated malloc/free).
  */
-static SnapshotData CurrentSnapshotData = {SNAPSHOT_MVCC};
-static SnapshotData SecondarySnapshotData = {SNAPSHOT_MVCC};
-static SnapshotData CatalogSnapshotData = {SNAPSHOT_MVCC};
+static MVCCSnapshotData CurrentSnapshotData = {SNAPSHOT_MVCC};
+static MVCCSnapshotData SecondarySnapshotData = {SNAPSHOT_MVCC};
+static MVCCSnapshotData CatalogSnapshotData = {SNAPSHOT_MVCC};
 SnapshotData SnapshotSelfData = {SNAPSHOT_SELF};
 SnapshotData SnapshotAnyData = {SNAPSHOT_ANY};
 SnapshotData SnapshotToastData = {SNAPSHOT_TOAST};
 
 /* Pointers to valid snapshots */
-static Snapshot CurrentSnapshot = NULL;
-static Snapshot SecondarySnapshot = NULL;
-static Snapshot CatalogSnapshot = NULL;
-static Snapshot HistoricSnapshot = NULL;
+static MVCCSnapshot CurrentSnapshot = NULL;
+static MVCCSnapshot SecondarySnapshot = NULL;
+static MVCCSnapshot CatalogSnapshot = NULL;
+static HistoricMVCCSnapshot HistoricSnapshot = NULL;
 
 /*
  * These are updated by GetSnapshotData.  We initialize them this way
@@ -171,7 +171,7 @@ static HTAB *tuplecid_data = NULL;
  */
 typedef struct ActiveSnapshotElt
 {
-	Snapshot	as_snap;
+	MVCCSnapshot as_snap;
 	int			as_level;
 	struct ActiveSnapshotElt *as_next;
 } ActiveSnapshotElt;
@@ -196,7 +196,7 @@ bool		FirstSnapshotSet = false;
  * FirstSnapshotSet in combination with IsolationUsesXactSnapshot(), because
  * GUC may be reset before us, changing the value of IsolationUsesXactSnapshot.
  */
-static Snapshot FirstXactSnapshot = NULL;
+static MVCCSnapshot FirstXactSnapshot = NULL;
 
 /* Define pathname of exported-snapshot files */
 #define SNAPSHOT_EXPORT_DIR "pg_snapshots"
@@ -205,16 +205,16 @@ static Snapshot FirstXactSnapshot = NULL;
 typedef struct ExportedSnapshot
 {
 	char	   *snapfile;
-	Snapshot	snapshot;
+	MVCCSnapshot snapshot;
 } ExportedSnapshot;
 
 /* Current xact's exported snapshots (a list of ExportedSnapshot structs) */
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
-static Snapshot CopySnapshot(Snapshot snapshot);
+static MVCCSnapshot CopyMVCCSnapshot(MVCCSnapshot snapshot);
 static void UnregisterSnapshotNoOwner(Snapshot snapshot);
-static void FreeSnapshot(Snapshot snapshot);
+static void FreeMVCCSnapshot(MVCCSnapshot snapshot);
 static void SnapshotResetXmin(void);
 
 /* ResourceOwner callbacks to track snapshot references */
@@ -308,8 +308,9 @@ GetTransactionSnapshot(void)
 				CurrentSnapshot = GetSerializableTransactionSnapshot(&CurrentSnapshotData);
 			else
 				CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
+
 			/* Make a saved copy */
-			CurrentSnapshot = CopySnapshot(CurrentSnapshot);
+			CurrentSnapshot = CopyMVCCSnapshot(CurrentSnapshot);
 			FirstXactSnapshot = CurrentSnapshot;
 			/* Mark it as "registered" in FirstXactSnapshot */
 			FirstXactSnapshot->regd_count++;
@@ -319,18 +320,18 @@ GetTransactionSnapshot(void)
 			CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
 
 		FirstSnapshotSet = true;
-		return CurrentSnapshot;
+		return (Snapshot) CurrentSnapshot;
 	}
 
 	if (IsolationUsesXactSnapshot())
-		return CurrentSnapshot;
+		return (Snapshot) CurrentSnapshot;
 
 	/* Don't allow catalog snapshot to be older than xact snapshot. */
 	InvalidateCatalogSnapshot();
 
 	CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
 
-	return CurrentSnapshot;
+	return (Snapshot) CurrentSnapshot;
 }
 
 /*
@@ -361,7 +362,7 @@ GetLatestSnapshot(void)
 
 	SecondarySnapshot = GetSnapshotData(&SecondarySnapshotData);
 
-	return SecondarySnapshot;
+	return (Snapshot) SecondarySnapshot;
 }
 
 /*
@@ -380,7 +381,7 @@ GetCatalogSnapshot(Oid relid)
 	 * finishing decoding.
 	 */
 	if (HistoricSnapshotActive())
-		return HistoricSnapshot;
+		return (Snapshot) HistoricSnapshot;
 
 	return GetNonHistoricCatalogSnapshot(relid);
 }
@@ -426,7 +427,7 @@ GetNonHistoricCatalogSnapshot(Oid relid)
 		pairingheap_add(&RegisteredSnapshots, &CatalogSnapshot->ph_node);
 	}
 
-	return CatalogSnapshot;
+	return (Snapshot) CatalogSnapshot;
 }
 
 /*
@@ -495,7 +496,7 @@ SnapshotSetCommandId(CommandId curcid)
  * in GetTransactionSnapshot.
  */
 static void
-SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
+SetTransactionSnapshot(MVCCSnapshot sourcesnap, VirtualTransactionId *sourcevxid,
 					   int sourcepid, PGPROC *sourceproc)
 {
 	/* Caller should have checked this already */
@@ -574,7 +575,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
 			SetSerializableTransactionSnapshot(CurrentSnapshot, sourcevxid,
 											   sourcepid);
 		/* Make a saved copy */
-		CurrentSnapshot = CopySnapshot(CurrentSnapshot);
+		CurrentSnapshot = CopyMVCCSnapshot(CurrentSnapshot);
 		FirstXactSnapshot = CurrentSnapshot;
 		/* Mark it as "registered" in FirstXactSnapshot */
 		FirstXactSnapshot->regd_count++;
@@ -585,29 +586,27 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
 }
 
 /*
- * CopySnapshot
+ * CopyMVCCSnapshot
  *		Copy the given snapshot.
  *
  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
  * to 0.  The returned snapshot has the copied flag set.
  */
-static Snapshot
-CopySnapshot(Snapshot snapshot)
+static MVCCSnapshot
+CopyMVCCSnapshot(MVCCSnapshot snapshot)
 {
-	Snapshot	newsnap;
+	MVCCSnapshot newsnap;
 	Size		subxipoff;
 	Size		size;
 
-	Assert(snapshot != InvalidSnapshot);
-
 	/* We allocate any XID arrays needed in the same palloc block. */
-	size = subxipoff = sizeof(SnapshotData) +
+	size = subxipoff = sizeof(MVCCSnapshotData) +
 		snapshot->xcnt * sizeof(TransactionId);
 	if (snapshot->subxcnt > 0)
 		size += snapshot->subxcnt * sizeof(TransactionId);
 
-	newsnap = (Snapshot) MemoryContextAlloc(TopTransactionContext, size);
-	memcpy(newsnap, snapshot, sizeof(SnapshotData));
+	newsnap = (MVCCSnapshot) MemoryContextAlloc(TopTransactionContext, size);
+	memcpy(newsnap, snapshot, sizeof(MVCCSnapshotData));
 
 	newsnap->regd_count = 0;
 	newsnap->active_count = 0;
@@ -644,11 +643,11 @@ CopySnapshot(Snapshot snapshot)
 }
 
 /*
- * FreeSnapshot
+ * FreeMVCCSnapshot
  *		Free the memory associated with a snapshot.
  */
 static void
-FreeSnapshot(Snapshot snapshot)
+FreeMVCCSnapshot(MVCCSnapshot snapshot)
 {
 	Assert(snapshot->regd_count == 0);
 	Assert(snapshot->active_count == 0);
@@ -664,6 +663,8 @@ FreeSnapshot(Snapshot snapshot)
  * If the passed snapshot is a statically-allocated one, or it is possibly
  * subject to a future command counter update, create a new long-lived copy
  * with active refcount=1.  Otherwise, only increment the refcount.
+ *
+ * Only regular MVCC snapshots can be used as the active snapshot.
  */
 void
 PushActiveSnapshot(Snapshot snapshot)
@@ -682,9 +683,12 @@ PushActiveSnapshot(Snapshot snapshot)
 void
 PushActiveSnapshotWithLevel(Snapshot snapshot, int snap_level)
 {
+	MVCCSnapshot origsnap;
 	ActiveSnapshotElt *newactive;
 
-	Assert(snapshot != InvalidSnapshot);
+	Assert(snapshot->snapshot_type == SNAPSHOT_MVCC);
+	origsnap = &snapshot->mvcc;
+
 	Assert(ActiveSnapshot == NULL || snap_level >= ActiveSnapshot->as_level);
 
 	newactive = MemoryContextAlloc(TopTransactionContext, sizeof(ActiveSnapshotElt));
@@ -693,11 +697,11 @@ PushActiveSnapshotWithLevel(Snapshot snapshot, int snap_level)
 	 * Checking SecondarySnapshot is probably useless here, but it seems
 	 * better to be sure.
 	 */
-	if (snapshot == CurrentSnapshot || snapshot == SecondarySnapshot ||
-		!snapshot->copied)
-		newactive->as_snap = CopySnapshot(snapshot);
+	if (origsnap == CurrentSnapshot || origsnap == SecondarySnapshot ||
+		!origsnap->copied)
+		newactive->as_snap = CopyMVCCSnapshot(origsnap);
 	else
-		newactive->as_snap = snapshot;
+		newactive->as_snap = origsnap;
 
 	newactive->as_next = ActiveSnapshot;
 	newactive->as_level = snap_level;
@@ -718,7 +722,8 @@ PushActiveSnapshotWithLevel(Snapshot snapshot, int snap_level)
 void
 PushCopiedSnapshot(Snapshot snapshot)
 {
-	PushActiveSnapshot(CopySnapshot(snapshot));
+	Assert(snapshot->snapshot_type == SNAPSHOT_MVCC);
+	PushActiveSnapshot((Snapshot) CopyMVCCSnapshot(&snapshot->mvcc));
 }
 
 /*
@@ -771,7 +776,7 @@ PopActiveSnapshot(void)
 
 	if (ActiveSnapshot->as_snap->active_count == 0 &&
 		ActiveSnapshot->as_snap->regd_count == 0)
-		FreeSnapshot(ActiveSnapshot->as_snap);
+		FreeMVCCSnapshot(ActiveSnapshot->as_snap);
 
 	pfree(ActiveSnapshot);
 	ActiveSnapshot = newstack;
@@ -788,7 +793,7 @@ GetActiveSnapshot(void)
 {
 	Assert(ActiveSnapshot != NULL);
 
-	return ActiveSnapshot->as_snap;
+	return (Snapshot) ActiveSnapshot->as_snap;
 }
 
 /*
@@ -805,7 +810,8 @@ ActiveSnapshotSet(void)
  * RegisterSnapshot
  *		Register a snapshot as being in use by the current resource owner
  *
- * If InvalidSnapshot is passed, it is not registered.
+ * Only regular MVCC snapshots and "historic" MVCC snapshots can be registered.
+ * InvalidSnapshot is also accepted, as a no-op.
  */
 Snapshot
 RegisterSnapshot(Snapshot snapshot)
@@ -821,25 +827,39 @@ RegisterSnapshot(Snapshot snapshot)
  *		As above, but use the specified resource owner
  */
 Snapshot
-RegisterSnapshotOnOwner(Snapshot snapshot, ResourceOwner owner)
+RegisterSnapshotOnOwner(Snapshot orig_snapshot, ResourceOwner owner)
 {
-	Snapshot	snap;
+	MVCCSnapshot snapshot;
 
-	if (snapshot == InvalidSnapshot)
+	if (orig_snapshot == InvalidSnapshot)
 		return InvalidSnapshot;
 
+	if (orig_snapshot->snapshot_type == SNAPSHOT_HISTORIC_MVCC)
+	{
+		HistoricMVCCSnapshot historicsnap = &orig_snapshot->historic_mvcc;
+
+		ResourceOwnerEnlarge(owner);
+		historicsnap->regd_count++;
+		ResourceOwnerRememberSnapshot(owner, (Snapshot) historicsnap);
+
+		return (Snapshot) historicsnap;
+	}
+
+	Assert(orig_snapshot->snapshot_type == SNAPSHOT_MVCC);
+	snapshot = &orig_snapshot->mvcc;
+
 	/* Static snapshot?  Create a persistent copy */
-	snap = snapshot->copied ? snapshot : CopySnapshot(snapshot);
+	snapshot = snapshot->copied ? snapshot : CopyMVCCSnapshot(snapshot);
 
 	/* and tell resowner.c about it */
 	ResourceOwnerEnlarge(owner);
-	snap->regd_count++;
-	ResourceOwnerRememberSnapshot(owner, snap);
+	snapshot->regd_count++;
+	ResourceOwnerRememberSnapshot(owner, (Snapshot) snapshot);
 
-	if (snap->regd_count == 1)
-		pairingheap_add(&RegisteredSnapshots, &snap->ph_node);
+	if (snapshot->regd_count == 1)
+		pairingheap_add(&RegisteredSnapshots, &snapshot->ph_node);
 
-	return snap;
+	return (Snapshot) snapshot;
 }
 
 /*
@@ -875,18 +895,41 @@ UnregisterSnapshotFromOwner(Snapshot snapshot, ResourceOwner owner)
 static void
 UnregisterSnapshotNoOwner(Snapshot snapshot)
 {
-	Assert(snapshot->regd_count > 0);
-	Assert(!pairingheap_is_empty(&RegisteredSnapshots));
+	if (snapshot->snapshot_type == SNAPSHOT_MVCC)
+	{
+		MVCCSnapshot mvccsnap = &snapshot->mvcc;
+
+		Assert(mvccsnap->regd_count > 0);
+		Assert(!pairingheap_is_empty(&RegisteredSnapshots));
 
-	snapshot->regd_count--;
-	if (snapshot->regd_count == 0)
-		pairingheap_remove(&RegisteredSnapshots, &snapshot->ph_node);
+		mvccsnap->regd_count--;
+		if (mvccsnap->regd_count == 0)
+			pairingheap_remove(&RegisteredSnapshots, &mvccsnap->ph_node);
 
-	if (snapshot->regd_count == 0 && snapshot->active_count == 0)
+		if (mvccsnap->regd_count == 0 && mvccsnap->active_count == 0)
+		{
+			FreeMVCCSnapshot(mvccsnap);
+			SnapshotResetXmin();
+		}
+	}
+	else if (snapshot->snapshot_type == SNAPSHOT_HISTORIC_MVCC)
 	{
-		FreeSnapshot(snapshot);
-		SnapshotResetXmin();
+		HistoricMVCCSnapshot historicsnap = &snapshot->historic_mvcc;
+
+		/*
+		 * Historic snapshots don't rely on the resource owner machinery for
+		 * cleanup; the snapbuild.c machinery ensures that whenever a historic
+		 * snapshot is in use, it has a non-zero refcount.  Registration is
+		 * only supported so that the callers don't need to treat regular MVCC
+		 * catalog snapshots and historic snapshots differently.
+		 */
+		Assert(historicsnap->refcount > 0);
+
+		Assert(historicsnap->regd_count > 0);
+		historicsnap->regd_count--;
 	}
+	else
+		elog(ERROR, "registered snapshot has unexpected type");
 }
 
 /*
@@ -896,8 +939,8 @@ UnregisterSnapshotNoOwner(Snapshot snapshot)
 static int
 xmin_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg)
 {
-	const SnapshotData *asnap = pairingheap_const_container(SnapshotData, ph_node, a);
-	const SnapshotData *bsnap = pairingheap_const_container(SnapshotData, ph_node, b);
+	const MVCCSnapshotData *asnap = pairingheap_const_container(MVCCSnapshotData, ph_node, a);
+	const MVCCSnapshotData *bsnap = pairingheap_const_container(MVCCSnapshotData, ph_node, b);
 
 	if (TransactionIdPrecedes(asnap->xmin, bsnap->xmin))
 		return 1;
@@ -923,7 +966,7 @@ xmin_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg)
 static void
 SnapshotResetXmin(void)
 {
-	Snapshot	minSnapshot;
+	MVCCSnapshot minSnapshot;
 
 	if (ActiveSnapshot != NULL)
 		return;
@@ -934,7 +977,7 @@ SnapshotResetXmin(void)
 		return;
 	}
 
-	minSnapshot = pairingheap_container(SnapshotData, ph_node,
+	minSnapshot = pairingheap_container(MVCCSnapshotData, ph_node,
 										pairingheap_first(&RegisteredSnapshots));
 
 	if (TransactionIdPrecedes(MyProc->xmin, minSnapshot->xmin))
@@ -984,7 +1027,7 @@ AtSubAbort_Snapshot(int level)
 
 		if (ActiveSnapshot->as_snap->active_count == 0 &&
 			ActiveSnapshot->as_snap->regd_count == 0)
-			FreeSnapshot(ActiveSnapshot->as_snap);
+			FreeMVCCSnapshot(ActiveSnapshot->as_snap);
 
 		/* and free the stack element */
 		pfree(ActiveSnapshot);
@@ -1006,7 +1049,7 @@ AtEOXact_Snapshot(bool isCommit, bool resetXmin)
 	 * In transaction-snapshot mode we must release our privately-managed
 	 * reference to the transaction snapshot.  We must remove it from
 	 * RegisteredSnapshots to keep the check below happy.  But we don't bother
-	 * to do FreeSnapshot, for two reasons: the memory will go away with
+	 * to do FreeMVCCSnapshot, for two reasons: the memory will go away with
 	 * TopTransactionContext anyway, and if someone has left the snapshot
 	 * stacked as active, we don't want the code below to be chasing through a
 	 * dangling pointer.
@@ -1099,7 +1142,7 @@ AtEOXact_Snapshot(bool isCommit, bool resetXmin)
  *		snapshot.
  */
 char *
-ExportSnapshot(Snapshot snapshot)
+ExportSnapshot(MVCCSnapshot snapshot)
 {
 	TransactionId topXid;
 	TransactionId *children;
@@ -1163,7 +1206,7 @@ ExportSnapshot(Snapshot snapshot)
 	 * ensure that the snapshot's xmin is honored for the rest of the
 	 * transaction.
 	 */
-	snapshot = CopySnapshot(snapshot);
+	snapshot = CopyMVCCSnapshot(snapshot);
 
 	oldcxt = MemoryContextSwitchTo(TopTransactionContext);
 	esnap = (ExportedSnapshot *) palloc(sizeof(ExportedSnapshot));
@@ -1280,7 +1323,7 @@ pg_export_snapshot(PG_FUNCTION_ARGS)
 {
 	char	   *snapshotName;
 
-	snapshotName = ExportSnapshot(GetActiveSnapshot());
+	snapshotName = ExportSnapshot((MVCCSnapshot) GetActiveSnapshot());
 	PG_RETURN_TEXT_P(cstring_to_text(snapshotName));
 }
 
@@ -1384,7 +1427,7 @@ ImportSnapshot(const char *idstr)
 	Oid			src_dbid;
 	int			src_isolevel;
 	bool		src_readonly;
-	SnapshotData snapshot;
+	MVCCSnapshotData snapshot;
 
 	/*
 	 * Must be at top level of a fresh transaction.  Note in particular that
@@ -1653,7 +1696,7 @@ HaveRegisteredOrActiveSnapshot(void)
  * Needed for logical decoding.
  */
 void
-SetupHistoricSnapshot(Snapshot historic_snapshot, HTAB *tuplecids)
+SetupHistoricSnapshot(HistoricMVCCSnapshot historic_snapshot, HTAB *tuplecids)
 {
 	Assert(historic_snapshot != NULL);
 
@@ -1696,11 +1739,10 @@ HistoricSnapshotGetTupleCids(void)
  * SerializedSnapshotData.
  */
 Size
-EstimateSnapshotSpace(Snapshot snapshot)
+EstimateSnapshotSpace(MVCCSnapshot snapshot)
 {
 	Size		size;
 
-	Assert(snapshot != InvalidSnapshot);
 	Assert(snapshot->snapshot_type == SNAPSHOT_MVCC);
 
 	/* We allocate any XID arrays needed in the same palloc block. */
@@ -1720,7 +1762,7 @@ EstimateSnapshotSpace(Snapshot snapshot)
  *		memory location at start_address.
  */
 void
-SerializeSnapshot(Snapshot snapshot, char *start_address)
+SerializeSnapshot(MVCCSnapshot snapshot, char *start_address)
 {
 	SerializedSnapshotData serialized_snapshot;
 
@@ -1776,12 +1818,12 @@ SerializeSnapshot(Snapshot snapshot, char *start_address)
  * The copy is palloc'd in TopTransactionContext and has initial refcounts set
  * to 0.  The returned snapshot has the copied flag set.
  */
-Snapshot
+MVCCSnapshot
 RestoreSnapshot(char *start_address)
 {
 	SerializedSnapshotData serialized_snapshot;
 	Size		size;
-	Snapshot	snapshot;
+	MVCCSnapshot snapshot;
 	TransactionId *serialized_xids;
 
 	memcpy(&serialized_snapshot, start_address,
@@ -1790,12 +1832,12 @@ RestoreSnapshot(char *start_address)
 		(start_address + sizeof(SerializedSnapshotData));
 
 	/* We allocate any XID arrays needed in the same palloc block. */
-	size = sizeof(SnapshotData)
+	size = sizeof(MVCCSnapshotData)
 		+ serialized_snapshot.xcnt * sizeof(TransactionId)
 		+ serialized_snapshot.subxcnt * sizeof(TransactionId);
 
 	/* Copy all required fields */
-	snapshot = (Snapshot) MemoryContextAlloc(TopTransactionContext, size);
+	snapshot = (MVCCSnapshot) MemoryContextAlloc(TopTransactionContext, size);
 	snapshot->snapshot_type = SNAPSHOT_MVCC;
 	snapshot->xmin = serialized_snapshot.xmin;
 	snapshot->xmax = serialized_snapshot.xmax;
@@ -1840,7 +1882,7 @@ RestoreSnapshot(char *start_address)
  * the declaration for PGPROC.
  */
 void
-RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc)
+RestoreTransactionSnapshot(MVCCSnapshot snapshot, void *source_pgproc)
 {
 	SetTransactionSnapshot(snapshot, NULL, InvalidPid, source_pgproc);
 }
@@ -1856,7 +1898,7 @@ RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc)
  * XID could not be ours anyway.
  */
 bool
-XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
+XidInMVCCSnapshot(TransactionId xid, MVCCSnapshot snapshot)
 {
 	/*
 	 * Make a quick range check to eliminate most XIDs without looking at the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 1640d9c32f7..3d3ea109a4c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -431,7 +431,7 @@ extern bool HeapTupleIsSurelyDead(HeapTuple htup,
  */
 struct HTAB;
 extern bool ResolveCminCmaxDuringDecoding(struct HTAB *tuplecid_data,
-										  Snapshot snapshot,
+										  HistoricMVCCSnapshot snapshot,
 										  HeapTuple htup,
 										  Buffer buffer,
 										  CommandId *cmin, CommandId *cmax);
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..2626f2996d8 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -34,7 +34,7 @@ typedef struct TableScanDescData
 {
 	/* scan parameters */
 	Relation	rs_rd;			/* heap relation descriptor */
-	struct SnapshotData *rs_snapshot;	/* snapshot to see */
+	union SnapshotData *rs_snapshot;	/* snapshot to see */
 	int			rs_nkeys;		/* number of scan keys */
 	struct ScanKeyData *rs_key; /* array of scan key descriptors */
 
@@ -135,7 +135,7 @@ typedef struct IndexScanDescData
 	/* scan parameters */
 	Relation	heapRelation;	/* heap relation descriptor, or NULL */
 	Relation	indexRelation;	/* index relation descriptor */
-	struct SnapshotData *xs_snapshot;	/* snapshot to see */
+	union SnapshotData *xs_snapshot;	/* snapshot to see */
 	int			numberOfKeys;	/* number of index qualifier conditions */
 	int			numberOfOrderBys;	/* number of ordering operators */
 	struct ScanKeyData *keyData;	/* array of index qualifier descriptors */
@@ -210,7 +210,7 @@ typedef struct SysScanDescData
 	Relation	irel;			/* NULL if doing heap scan */
 	struct TableScanDescData *scan; /* only valid in storage-scan case */
 	struct IndexScanDescData *iscan;	/* only valid in index-scan case */
-	struct SnapshotData *snapshot;	/* snapshot to unregister at end of scan */
+	union SnapshotData *snapshot;	/* snapshot to unregister at end of scan */
 	struct TupleTableSlot *slot;
 }			SysScanDescData;
 
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 3be0cbd7ebe..8bf72c64c94 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -127,7 +127,7 @@ typedef struct ReorderBufferChange
 		}			msg;
 
 		/* New snapshot, set when action == *_INTERNAL_SNAPSHOT */
-		Snapshot	snapshot;
+		HistoricMVCCSnapshot snapshot;
 
 		/*
 		 * New command id for existing snapshot in a catalog changing tx. Set
@@ -359,7 +359,7 @@ typedef struct ReorderBufferTXN
 	 * transaction modifies the catalog, or another catalog-modifying
 	 * transaction commits.
 	 */
-	Snapshot	base_snapshot;
+	HistoricMVCCSnapshot base_snapshot;
 	XLogRecPtr	base_snapshot_lsn;
 	dlist_node	base_snapshot_node; /* link in txns_by_base_snapshot_lsn */
 
@@ -367,7 +367,7 @@ typedef struct ReorderBufferTXN
 	 * Snapshot/CID from the previous streaming run. Only valid for already
 	 * streamed transactions (NULL/InvalidCommandId otherwise).
 	 */
-	Snapshot	snapshot_now;
+	HistoricMVCCSnapshot snapshot_now;
 	CommandId	command_id;
 
 	/*
@@ -703,7 +703,7 @@ extern void ReorderBufferQueueChange(ReorderBuffer *rb, TransactionId xid,
 									 XLogRecPtr lsn, ReorderBufferChange *change,
 									 bool toast_insert);
 extern void ReorderBufferQueueMessage(ReorderBuffer *rb, TransactionId xid,
-									  Snapshot snap, XLogRecPtr lsn,
+									  HistoricMVCCSnapshot snap, XLogRecPtr lsn,
 									  bool transactional, const char *prefix,
 									  Size message_size, const char *message);
 extern void ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
@@ -727,9 +727,9 @@ extern void ReorderBufferForget(ReorderBuffer *rb, TransactionId xid, XLogRecPtr
 extern void ReorderBufferInvalidate(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn);
 
 extern void ReorderBufferSetBaseSnapshot(ReorderBuffer *rb, TransactionId xid,
-										 XLogRecPtr lsn, Snapshot snap);
+										 XLogRecPtr lsn, HistoricMVCCSnapshot snap);
 extern void ReorderBufferAddSnapshot(ReorderBuffer *rb, TransactionId xid,
-									 XLogRecPtr lsn, Snapshot snap);
+									 XLogRecPtr lsn, HistoricMVCCSnapshot snap);
 extern void ReorderBufferAddNewCommandId(ReorderBuffer *rb, TransactionId xid,
 										 XLogRecPtr lsn, CommandId cid);
 extern void ReorderBufferAddNewTupleCids(ReorderBuffer *rb, TransactionId xid,
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 44031dcf6e3..5930ffb55a8 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -70,15 +70,15 @@ extern SnapBuild *AllocateSnapshotBuilder(struct ReorderBuffer *reorder,
 										  XLogRecPtr two_phase_at);
 extern void FreeSnapshotBuilder(SnapBuild *builder);
 
-extern void SnapBuildSnapDecRefcount(Snapshot snap);
+extern void SnapBuildSnapDecRefcount(HistoricMVCCSnapshot snap);
 
-extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern MVCCSnapshot SnapBuildInitialSnapshot(SnapBuild *builder);
 extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
 extern void SnapBuildClearExportedSnapshot(void);
 extern void SnapBuildResetExportedSnapshotState(void);
 
 extern SnapBuildState SnapBuildCurrentState(SnapBuild *builder);
-extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder);
+extern HistoricMVCCSnapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder);
 
 extern bool SnapBuildXactNeedsSkip(SnapBuild *builder, XLogRecPtr ptr);
 extern XLogRecPtr SnapBuildGetTwoPhaseAt(SnapBuild *builder);
diff --git a/src/include/replication/snapbuild_internal.h b/src/include/replication/snapbuild_internal.h
index 3b915dc8793..9bed20efa31 100644
--- a/src/include/replication/snapbuild_internal.h
+++ b/src/include/replication/snapbuild_internal.h
@@ -74,7 +74,7 @@ struct SnapBuild
 	/*
 	 * Snapshot that's valid to see the catalog state seen at this moment.
 	 */
-	Snapshot	snapshot;
+	HistoricMVCCSnapshot snapshot;
 
 	/*
 	 * LSN of the last location we are sure a snapshot has been serialized to.
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 267d5d90e94..6a78dfeac96 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -47,8 +47,8 @@ extern void CheckPointPredicate(void);
 extern bool PageIsPredicateLocked(Relation relation, BlockNumber blkno);
 
 /* predicate lock maintenance */
-extern Snapshot GetSerializableTransactionSnapshot(Snapshot snapshot);
-extern void SetSerializableTransactionSnapshot(Snapshot snapshot,
+extern MVCCSnapshot GetSerializableTransactionSnapshot(MVCCSnapshot snapshot);
+extern void SetSerializableTransactionSnapshot(MVCCSnapshot snapshot,
 											   VirtualTransactionId *sourcevxid,
 											   int sourcepid);
 extern void RegisterPredicateLockingXid(TransactionId xid);
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index ef0b733ebe8..7f5727c2586 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -44,7 +44,7 @@ extern void KnownAssignedTransactionIdsIdleMaintenance(void);
 extern int	GetMaxSnapshotXidCount(void);
 extern int	GetMaxSnapshotSubxidCount(void);
 
-extern Snapshot GetSnapshotData(Snapshot snapshot);
+extern MVCCSnapshot GetSnapshotData(MVCCSnapshot snapshot);
 
 extern bool ProcArrayInstallImportedXmin(TransactionId xmin,
 										 VirtualTransactionId *sourcevxid);
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index d346be71642..1f627ff966d 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -49,7 +49,7 @@ extern PGDLLIMPORT SnapshotData SnapshotToastData;
  */
 #define InitNonVacuumableSnapshot(snapshotdata, vistestp)  \
 	((snapshotdata).snapshot_type = SNAPSHOT_NON_VACUUMABLE, \
-	 (snapshotdata).vistest = (vistestp))
+	 (snapshotdata).nonvacuumable.vistest = (vistestp))
 
 /* This macro encodes the knowledge of which snapshots are MVCC-safe */
 #define IsMVCCSnapshot(snapshot)  \
@@ -89,7 +89,7 @@ extern void WaitForOlderSnapshots(TransactionId limitXmin, bool progress);
 extern bool ThereAreNoPriorRegisteredSnapshots(void);
 extern bool HaveRegisteredOrActiveSnapshot(void);
 
-extern char *ExportSnapshot(Snapshot snapshot);
+extern char *ExportSnapshot(MVCCSnapshot snapshot);
 
 /*
  * These live in procarray.c because they're intimately linked to the
@@ -105,18 +105,18 @@ extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 /*
  * Utility functions for implementing visibility routines in table AMs.
  */
-extern bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
+extern bool XidInMVCCSnapshot(TransactionId xid, MVCCSnapshot snapshot);
 
 /* Support for catalog timetravel for logical decoding */
 struct HTAB;
 extern struct HTAB *HistoricSnapshotGetTupleCids(void);
-extern void SetupHistoricSnapshot(Snapshot historic_snapshot, struct HTAB *tuplecids);
+extern void SetupHistoricSnapshot(HistoricMVCCSnapshot historic_snapshot, struct HTAB *tuplecids);
 extern void TeardownHistoricSnapshot(bool is_error);
 extern bool HistoricSnapshotActive(void);
 
-extern Size EstimateSnapshotSpace(Snapshot snapshot);
-extern void SerializeSnapshot(Snapshot snapshot, char *start_address);
-extern Snapshot RestoreSnapshot(char *start_address);
-extern void RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc);
+extern Size EstimateSnapshotSpace(MVCCSnapshot snapshot);
+extern void SerializeSnapshot(MVCCSnapshot snapshot, char *start_address);
+extern MVCCSnapshot RestoreSnapshot(char *start_address);
+extern void RestoreTransactionSnapshot(MVCCSnapshot snapshot, void *source_pgproc);
 
 #endif							/* SNAPMGR_H */
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index 0e546ec1497..93c1f51784f 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -17,7 +17,7 @@
 
 
 /*
- * The different snapshot types.  We use SnapshotData structures to represent
+ * The different snapshot types.  We use the SnapshotData union to represent
  * both "regular" (MVCC) snapshots and "special" snapshots that have non-MVCC
  * semantics.  The specific semantics of a snapshot are encoded by its type.
  *
@@ -27,6 +27,9 @@
  * The reason the snapshot type rather than a callback as it used to be is
  * that that allows to use the same snapshot for different table AMs without
  * having one callback per AM.
+ *
+ * The executor deals with MVCC snapshots, but the table AM and some other
+ * parts of the system also support the special snapshots.
  */
 typedef enum SnapshotType
 {
@@ -100,7 +103,9 @@ typedef enum SnapshotType
 	/*
 	 * A tuple is visible iff it follows the rules of SNAPSHOT_MVCC, but
 	 * supports being called in timetravel context (for decoding catalog
-	 * contents in the context of logical decoding).
+	 * contents in the context of logical decoding).  A historic MVCC snapshot
+	 * should only be used on catalog tables, as we only track XIDs that
+	 * modify catalogs during logical decoding.
 	 */
 	SNAPSHOT_HISTORIC_MVCC,
 
@@ -114,37 +119,18 @@ typedef enum SnapshotType
 	SNAPSHOT_NON_VACUUMABLE,
 } SnapshotType;
 
-typedef struct SnapshotData *Snapshot;
-
-#define InvalidSnapshot		((Snapshot) NULL)
-
 /*
- * Struct representing all kind of possible snapshots.
+ * Struct representing a normal MVCC snapshot.
  *
- * There are several different kinds of snapshots:
- * * Normal MVCC snapshots
- * * MVCC snapshots taken during recovery (in Hot-Standby mode)
- * * Historic MVCC snapshots used during logical decoding
- * * snapshots passed to HeapTupleSatisfiesDirty()
- * * snapshots passed to HeapTupleSatisfiesNonVacuumable()
- * * snapshots used for SatisfiesAny, Toast, Self where no members are
- *	 accessed.
- *
- * TODO: It's probably a good idea to split this struct using a NodeTag
- * similar to how parser and executor nodes are handled, with one type for
- * each different kind of snapshot to avoid overloading the meaning of
- * individual fields.
+ * MVCC snapshots come in two variants: those taken during recovery in hot
+ * standby mode, and "normal" MVCC snapshots.  They are distinguished by
+ * takenDuringRecovery.
  */
-typedef struct SnapshotData
+typedef struct MVCCSnapshotData
 {
-	SnapshotType snapshot_type; /* type of snapshot */
+	SnapshotType snapshot_type; /* type of snapshot, must be first */
 
 	/*
-	 * The remaining fields are used only for MVCC snapshots, and are normally
-	 * just zeroes in special snapshots.  (But xmin and xmax are used
-	 * specially by HeapTupleSatisfiesDirty, and xmin is used specially by
-	 * HeapTupleSatisfiesNonVacuumable.)
-	 *
 	 * An MVCC snapshot can never see the effects of XIDs >= xmax. It can see
 	 * the effects of all older XIDs except those listed in the snapshot. xmin
 	 * is stored as an optimization to avoid needing to search the XID arrays
@@ -154,10 +140,8 @@ typedef struct SnapshotData
 	TransactionId xmax;			/* all XID >= xmax are invisible to me */
 
 	/*
-	 * For normal MVCC snapshot this contains the all xact IDs that are in
-	 * progress, unless the snapshot was taken during recovery in which case
-	 * it's empty. For historic MVCC snapshots, the meaning is inverted, i.e.
-	 * it contains *committed* transactions between xmin and xmax.
+	 * xip contains all the xact IDs that are in progress, unless the snapshot
+	 * was taken during recovery in which case it's empty.
 	 *
 	 * note: all ids in xip[] satisfy xmin <= xip[i] < xmax
 	 */
@@ -165,10 +149,8 @@ typedef struct SnapshotData
 	uint32		xcnt;			/* # of xact ids in xip[] */
 
 	/*
-	 * For non-historic MVCC snapshots, this contains subxact IDs that are in
-	 * progress (and other transactions that are in progress if taken during
-	 * recovery). For historic snapshot it contains *all* xids assigned to the
-	 * replayed transaction, including the toplevel xid.
+	 * subxip contains subxact IDs that are in progress (and other
+	 * transactions that are in progress if taken during recovery).
 	 *
 	 * note: all ids in subxip[] are >= xmin, but we don't bother filtering
 	 * out any that are >= xmax
@@ -182,18 +164,6 @@ typedef struct SnapshotData
 
 	CommandId	curcid;			/* in my xact, CID < curcid are visible */
 
-	/*
-	 * An extra return value for HeapTupleSatisfiesDirty, not used in MVCC
-	 * snapshots.
-	 */
-	uint32		speculativeToken;
-
-	/*
-	 * For SNAPSHOT_NON_VACUUMABLE (and hopefully more in the future) this is
-	 * used to determine whether row could be vacuumed.
-	 */
-	struct GlobalVisState *vistest;
-
 	/*
 	 * Book-keeping information, used by the snapshot manager
 	 */
@@ -207,6 +177,97 @@ typedef struct SnapshotData
 	 * transactions completed since the last GetSnapshotData().
 	 */
 	uint64		snapXactCompletionCount;
+} MVCCSnapshotData;
+
+typedef struct MVCCSnapshotData *MVCCSnapshot;
+
+#define InvalidMVCCSnapshot ((MVCCSnapshot) NULL)
+
+/*
+ * Struct representing a "historic" MVCC snapshot during logical decoding.
+ * These are constructed by src/backend/replication/logical/snapbuild.c.
+ */
+typedef struct HistoricMVCCSnapshotData
+{
+	SnapshotType snapshot_type; /* type of snapshot, must be first */
+
+	/*
+	 * xmin and xmax like in a normal MVCC snapshot.
+	 */
+	TransactionId xmin;			/* all XID < xmin are visible to me */
+	TransactionId xmax;			/* all XID >= xmax are invisible to me */
+
+	/*
+	 * committed_xids contains *committed* transactions between xmin and xmax.
+	 * (This is the inverse of 'xip' in normal MVCC snapshots, which contains
+	 * all non-committed transactions.)  The array is sorted by XID to allow
+	 * binary search.
+	 *
+	 * note: all ids in committed_xids[] satisfy xmin <= committed_xids[i] <
+	 * xmax
+	 */
+	TransactionId *committed_xids;
+	uint32		xcnt;			/* # of xact ids in committed_xids[] */
+
+	/*
+	 * curxip contains *all* xids assigned to the replayed transaction,
+	 * including the toplevel xid.
+	 */
+	TransactionId *curxip;
+	int32		curxcnt;		/* # of xact ids in curxip[] */
+
+	CommandId	curcid;			/* in my xact, CID < curcid are visible */
+
+	bool		copied;			/* false if it's a "base" snapshot */
+
+	uint32		refcount;		/* refcount managed by snapbuild.c  */
+	uint32		regd_count;		/* refcount registered with resource owners */
+
+} HistoricMVCCSnapshotData;
+
+typedef struct HistoricMVCCSnapshotData *HistoricMVCCSnapshot;
+
+/*
+ * Struct representing a special "snapshot" which sees all tuples as visible
+ * if they are visible to anyone, i.e. if they are not vacuumable.  This is
+ * used for SNAPSHOT_NON_VACUUMABLE.
+ */
+typedef struct NonVacuumableSnapshotData
+{
+	SnapshotType snapshot_type; /* type of snapshot, must be first */
+
+	/* This is used to determine whether row could be vacuumed. */
+	struct GlobalVisState *vistest;
+} NonVacuumableSnapshotData;
+
+/*
+ * Return values to the caller of HeapTupleSatisfiesDirty.
+ */
+typedef struct DirtySnapshotData
+{
+	SnapshotType snapshot_type; /* type of snapshot, must be first */
+
+	TransactionId xmin;
+	TransactionId xmax;
+	uint32		speculativeToken;
+} DirtySnapshotData;
+
+/*
+ * Generic union representing all kinds of possible snapshots.  Some have
+ * type-specific structs.
+ */
+typedef union SnapshotData
+{
+	SnapshotType snapshot_type; /* type of snapshot */
+
+	MVCCSnapshotData mvcc;
+	DirtySnapshotData dirty;
+	HistoricMVCCSnapshotData historic_mvcc;
+	NonVacuumableSnapshotData nonvacuumable;
 } SnapshotData;
 
+typedef union SnapshotData *Snapshot;
+
+#define InvalidSnapshot		((Snapshot) NULL)
+
 #endif							/* SNAPSHOT_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 93339ef3c58..b1a144917c8 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -628,6 +628,7 @@ DictThesaurus
 DimensionInfo
 DirectoryMethodData
 DirectoryMethodFile
+DirtySnapshotData
 DisableTimeoutParams
 DiscardMode
 DiscardStmt
@@ -1175,6 +1176,7 @@ HeapTupleFreeze
 HeapTupleHeader
 HeapTupleHeaderData
 HeapTupleTableSlot
+HistoricMVCCSnapshotData
 HistControl
 HotStandbyState
 I32
@@ -1623,6 +1625,7 @@ MINIDUMPWRITEDUMP
 MINIDUMP_TYPE
 MJEvalResult
 MTTargetRelLookup
+MVCCSnapshotData
 MVDependencies
 MVDependency
 MVNDistinct
@@ -1722,6 +1725,7 @@ NextValueExpr
 Node
 NodeTag
 NonEmptyRange
+NonVacuumableSnapshotData
 Notification
 NotificationList
 NotifyStmt
-- 
2.39.5
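
The snapshot.h changes above turn SnapshotData into a union of per-type structs that all begin with snapshot_type. A minimal standalone sketch of that layout follows. All Demo* names are hypothetical and simplified for illustration; none of this is code from the patch, and TransactionId is replaced by uint32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the types in snapshot.h. */
typedef enum DemoSnapshotType
{
	DEMO_SNAPSHOT_MVCC,
	DEMO_SNAPSHOT_DIRTY
} DemoSnapshotType;

typedef struct DemoMVCCSnapshotData
{
	DemoSnapshotType snapshot_type;	/* type of snapshot, must be first */
	uint32_t	xmin;
	uint32_t	xmax;
} DemoMVCCSnapshotData;

typedef struct DemoDirtySnapshotData
{
	DemoSnapshotType snapshot_type;	/* type of snapshot, must be first */
	uint32_t	xmin;
	uint32_t	xmax;
	uint32_t	speculativeToken;
} DemoDirtySnapshotData;

/* Every member starts with the type tag, so the tag is readable via any arm. */
typedef union DemoSnapshotData
{
	DemoSnapshotType snapshot_type;
	DemoMVCCSnapshotData mvcc;
	DemoDirtySnapshotData dirty;
} DemoSnapshotData;

/* Build a generic snapshot holding an MVCC snapshot. */
DemoSnapshotData
demo_make_mvcc(uint32_t xmin, uint32_t xmax)
{
	DemoSnapshotData snap;

	snap.mvcc.snapshot_type = DEMO_SNAPSHOT_MVCC;
	snap.mvcc.xmin = xmin;
	snap.mvcc.xmax = xmax;
	return snap;
}

/* Checked downcast from the generic union to the MVCC-specific struct. */
DemoMVCCSnapshotData *
demo_as_mvcc(DemoSnapshotData *snap)
{
	assert(snap->snapshot_type == DEMO_SNAPSHOT_MVCC);
	return &snap->mvcc;
}
```

Code that only needs the type tag can take the union pointer, while code that touches xmin or xmax takes the narrower struct pointer, which is roughly what the signature changes in this patch express.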

v2-0002-Simplify-historic-snapshot-refcounting.patch (text/x-patch; charset=UTF-8)

From 3a0c4d145d95f9b39603980cace6114de338acfd Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 13 Mar 2025 16:45:12 +0200
Subject: [PATCH v2 2/2] Simplify historic snapshot refcounting

ReorderBufferProcessTXN() handled "copied" snapshots created with
ReorderBufferCopySnap() differently from "base" historic snapshots
created by snapbuild.c. The base snapshots used a reference count,
while copied snapshots did not. Simplify by using the reference count
for both.
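
As a rough illustration of the uniform scheme (a sketch only, with hypothetical Demo*/demo_* names and malloc/free standing in for palloc/pfree), every holder of a snapshot takes a reference and releases it when done, and the last release frees the memory:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a historic snapshot; only the fields needed here. */
typedef struct DemoHistoricSnapshot
{
	unsigned	refcount;		/* number of holders */
	unsigned	curcid;			/* command ID, as in the real struct */
} DemoHistoricSnapshot;

/* Create a snapshot that starts with one reference, owned by the caller. */
DemoHistoricSnapshot *
demo_snap_new(unsigned curcid)
{
	DemoHistoricSnapshot *snap = malloc(sizeof(DemoHistoricSnapshot));

	if (snap == NULL)
		abort();
	snap->refcount = 1;
	snap->curcid = curcid;
	return snap;
}

/* Take an additional reference. */
void
demo_snap_incref(DemoHistoricSnapshot *snap)
{
	snap->refcount++;
}

/*
 * Drop one reference and free the snapshot when the count reaches zero.
 * Returns 1 if the snapshot was freed, 0 otherwise.
 */
int
demo_snap_decref(DemoHistoricSnapshot *snap)
{
	assert(snap->refcount > 0);
	snap->refcount--;
	if (snap->refcount == 0)
	{
		free(snap);
		return 1;
	}
	return 0;
}
```

With a single rule like this, callers no longer need to know how a snapshot was created in order to release it correctly.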
---
 .../replication/logical/reorderbuffer.c       | 97 ++++++++-----------
 src/backend/replication/logical/snapbuild.c   | 48 +--------
 src/include/replication/snapbuild.h           |  1 +
 src/include/utils/snapshot.h                  |  2 -
 4 files changed, 46 insertions(+), 102 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index e8196a8d5d5..e47970f1c82 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -103,7 +103,7 @@
 #include "replication/logical.h"
 #include "replication/reorderbuffer.h"
 #include "replication/slot.h"
-#include "replication/snapbuild.h"	/* just for SnapBuildSnapDecRefcount */
+#include "replication/snapbuild.h"
 #include "storage/bufmgr.h"
 #include "storage/fd.h"
 #include "storage/procarray.h"
@@ -268,7 +268,6 @@ static void ReorderBufferSerializedPath(char *path, ReplicationSlot *slot,
 										TransactionId xid, XLogSegNo segno);
 static int	ReorderBufferTXNSizeCompare(const pairingheap_node *a, const pairingheap_node *b, void *arg);
 
-static void ReorderBufferFreeSnap(ReorderBuffer *rb, HistoricMVCCSnapshot snap);
 static HistoricMVCCSnapshot ReorderBufferCopySnap(ReorderBuffer *rb, HistoricMVCCSnapshot orig_snap,
 												  ReorderBufferTXN *txn, CommandId cid);
 
@@ -543,7 +542,7 @@ ReorderBufferFreeChange(ReorderBuffer *rb, ReorderBufferChange *change,
 		case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
 			if (change->data.snapshot)
 			{
-				ReorderBufferFreeSnap(rb, change->data.snapshot);
+				SnapBuildSnapDecRefcount(change->data.snapshot);
 				change->data.snapshot = NULL;
 			}
 			break;
@@ -1593,7 +1592,8 @@ ReorderBufferCleanupTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	if (txn->snapshot_now != NULL)
 	{
 		Assert(rbtxn_is_streamed(txn));
-		ReorderBufferFreeSnap(rb, txn->snapshot_now);
+		SnapBuildSnapDecRefcount(txn->snapshot_now);
+		txn->snapshot_now = NULL;
 	}
 
 	/*
@@ -1902,7 +1902,6 @@ ReorderBufferCopySnap(ReorderBuffer *rb, HistoricMVCCSnapshot orig_snap,
 	snap = MemoryContextAllocZero(rb->context, size);
 	memcpy(snap, orig_snap, sizeof(HistoricMVCCSnapshotData));
 
-	snap->copied = true;
 	snap->refcount = 1;			/* mark as active so nobody frees it */
 	snap->regd_count = 0;
 	snap->committed_xids = (TransactionId *) (snap + 1);
@@ -1942,18 +1941,6 @@ ReorderBufferCopySnap(ReorderBuffer *rb, HistoricMVCCSnapshot orig_snap,
 	return snap;
 }
 
-/*
- * Free a previously ReorderBufferCopySnap'ed snapshot
- */
-static void
-ReorderBufferFreeSnap(ReorderBuffer *rb, HistoricMVCCSnapshot snap)
-{
-	if (snap->copied)
-		pfree(snap);
-	else
-		SnapBuildSnapDecRefcount(snap);
-}
-
 /*
  * If the transaction was (partially) streamed, we need to prepare or commit
  * it in a 'streamed' way.  That is, we first stream the remaining part of the
@@ -2104,11 +2091,8 @@ ReorderBufferSaveTXNSnapshot(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	txn->command_id = command_id;
 
 	/* Avoid copying if it's already copied. */
-	if (snapshot_now->copied)
-		txn->snapshot_now = snapshot_now;
-	else
-		txn->snapshot_now = ReorderBufferCopySnap(rb, snapshot_now,
-												  txn, command_id);
+	txn->snapshot_now = snapshot_now;
+	SnapBuildSnapIncRefcount(txn->snapshot_now);
 }
 
 /*
@@ -2208,6 +2192,8 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 	/* setup the initial snapshot */
 	SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
+	/* increase refcount for the installed historic snapshot */
+	SnapBuildSnapIncRefcount(snapshot_now);
 
 	/*
 	 * Decoding needs access to syscaches et al., which in turn use
@@ -2511,33 +2497,12 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
 					/* get rid of the old */
 					TeardownHistoricSnapshot(false);
-
-					if (snapshot_now->copied)
-					{
-						ReorderBufferFreeSnap(rb, snapshot_now);
-						snapshot_now =
-							ReorderBufferCopySnap(rb, change->data.snapshot,
-												  txn, command_id);
-					}
-
-					/*
-					 * Restored from disk, need to be careful not to double
-					 * free. We could introduce refcounting for that, but for
-					 * now this seems infrequent enough not to care.
-					 */
-					else if (change->data.snapshot->copied)
-					{
-						snapshot_now =
-							ReorderBufferCopySnap(rb, change->data.snapshot,
-												  txn, command_id);
-					}
-					else
-					{
-						snapshot_now = change->data.snapshot;
-					}
+					SnapBuildSnapDecRefcount(snapshot_now);
 
 					/* and continue with the new one */
+					snapshot_now = change->data.snapshot;
 					SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
+					SnapBuildSnapIncRefcount(snapshot_now);
 					break;
 
 				case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
@@ -2547,16 +2512,26 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					{
 						command_id = change->data.command_id;
 
-						if (!snapshot_now->copied)
+						TeardownHistoricSnapshot(false);
+
+						/*
+						 * Construct a new snapshot with the new command ID.
+						 *
+						 * If this is the only reference to the snapshot, and
+						 * it's a "copied" snapshot that already contains all
+						 * the replayed transaction's XIDs (curxcnt > 0), we
+						 * can take a shortcut and update the snapshot's
+						 * command ID in place.
+						 */
+						if (snapshot_now->refcount == 1 && snapshot_now->curxcnt > 0)
+							snapshot_now->curcid = command_id;
+						else
 						{
-							/* we don't use the global one anymore */
+							SnapBuildSnapDecRefcount(snapshot_now);
 							snapshot_now = ReorderBufferCopySnap(rb, snapshot_now,
 																 txn, command_id);
 						}
 
-						snapshot_now->curcid = command_id;
-
-						TeardownHistoricSnapshot(false);
 						SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
 					}
 
@@ -2646,11 +2621,11 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		 */
 		if (streaming)
 			ReorderBufferSaveTXNSnapshot(rb, txn, snapshot_now, command_id);
-		else if (snapshot_now->copied)
-			ReorderBufferFreeSnap(rb, snapshot_now);
 
 		/* cleanup */
 		TeardownHistoricSnapshot(false);
+		SnapBuildSnapDecRefcount(snapshot_now);
+		snapshot_now = NULL;
 
 		/*
 		 * Aborting the current (sub-)transaction as a whole has the right
@@ -2703,6 +2678,11 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 		TeardownHistoricSnapshot(true);
 
+		/*
+		 * Don't decrement the refcount on snapshot_now yet; we still use it
+		 * in the ReorderBufferResetTXN() call below.
+		 */
+
 		/*
 		 * Force cache invalidation to happen outside of a valid transaction
 		 * to prevent catalog access as we just caught an error.
@@ -2751,9 +2731,15 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			ReorderBufferResetTXN(rb, txn, snapshot_now,
 								  command_id, prev_lsn,
 								  specinsert);
+
+			SnapBuildSnapDecRefcount(snapshot_now);
+			snapshot_now = NULL;
 		}
 		else
 		{
+			SnapBuildSnapDecRefcount(snapshot_now);
+			snapshot_now = NULL;
+
 			ReorderBufferCleanupTXN(rb, txn);
 			MemoryContextSwitchTo(ecxt);
 			PG_RE_THROW();
@@ -4256,8 +4242,7 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 											 txn, command_id);
 
 		/* Free the previously copied snapshot. */
-		Assert(txn->snapshot_now->copied);
-		ReorderBufferFreeSnap(rb, txn->snapshot_now);
+		SnapBuildSnapDecRefcount(txn->snapshot_now);
 		txn->snapshot_now = NULL;
 	}
 
@@ -4647,7 +4632,7 @@ ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				newsnap->committed_xids = (TransactionId *)
 					(((char *) newsnap) + sizeof(HistoricMVCCSnapshotData));
 				newsnap->curxip = newsnap->committed_xids + newsnap->xcnt;
-				newsnap->copied = true;
+				newsnap->refcount = 1;
 				break;
 			}
 			/* the base struct contains all the data, easy peasy */
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 7a341418a74..50dca7cb758 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -157,10 +157,6 @@ static void SnapBuildPurgeOlderTxn(SnapBuild *builder);
 /* snapshot building/manipulation/distribution functions */
 static HistoricMVCCSnapshot SnapBuildBuildSnapshot(SnapBuild *builder);
 
-static void SnapBuildFreeSnapshot(HistoricMVCCSnapshot snap);
-
-static void SnapBuildSnapIncRefcount(HistoricMVCCSnapshot snap);
-
 static void SnapBuildDistributeNewCatalogSnapshot(SnapBuild *builder, XLogRecPtr lsn);
 
 static inline bool SnapBuildXidHasCatalogChanges(SnapBuild *builder, TransactionId xid,
@@ -245,29 +241,6 @@ FreeSnapshotBuilder(SnapBuild *builder)
 	MemoryContextDelete(context);
 }
 
-/*
- * Free an unreferenced snapshot that has previously been built by us.
- */
-static void
-SnapBuildFreeSnapshot(HistoricMVCCSnapshot snap)
-{
-	/* make sure we don't get passed an external snapshot */
-	Assert(snap->snapshot_type == SNAPSHOT_HISTORIC_MVCC);
-
-	/* make sure nobody modified our snapshot */
-	Assert(snap->curcid == FirstCommandId);
-	Assert(snap->regd_count == 0);
-
-	/* slightly more likely, so it's checked even without c-asserts */
-	if (snap->copied)
-		elog(ERROR, "cannot free a copied snapshot");
-
-	if (snap->refcount)
-		elog(ERROR, "cannot free a snapshot that's in use");
-
-	pfree(snap);
-}
-
 /*
  * In which state of snapshot building are we?
  */
@@ -310,7 +283,7 @@ SnapBuildXactNeedsSkip(SnapBuild *builder, XLogRecPtr ptr)
  * This is used when handing out a snapshot to some external resource or when
  * adding a Snapshot as builder->snapshot.
  */
-static void
+void
 SnapBuildSnapIncRefcount(HistoricMVCCSnapshot snap)
 {
 	snap->refcount++;
@@ -318,9 +291,6 @@ SnapBuildSnapIncRefcount(HistoricMVCCSnapshot snap)
 
 /*
  * Decrease refcount of a snapshot and free if the refcount reaches zero.
- *
- * Externally visible, so that external resources that have been handed an
- * IncRef'ed Snapshot can adjust its refcount easily.
  */
 void
 SnapBuildSnapDecRefcount(HistoricMVCCSnapshot snap)
@@ -328,19 +298,12 @@ SnapBuildSnapDecRefcount(HistoricMVCCSnapshot snap)
 	/* make sure we don't get passed an external snapshot */
 	Assert(snap->snapshot_type == SNAPSHOT_HISTORIC_MVCC);
 
-	/* make sure nobody modified our snapshot */
-	Assert(snap->curcid == FirstCommandId);
-
 	Assert(snap->refcount > 0);
 	Assert(snap->regd_count == 0);
 
-	/* slightly more likely, so it's checked even without casserts */
-	if (snap->copied)
-		elog(ERROR, "cannot free a copied snapshot");
-
 	snap->refcount--;
 	if (snap->refcount == 0)
-		SnapBuildFreeSnapshot(snap);
+		pfree(snap);
 }
 
 /*
@@ -413,7 +376,6 @@ SnapBuildBuildSnapshot(SnapBuild *builder)
 	snapshot->curxcnt = 0;
 	snapshot->curxip = NULL;
 
-	snapshot->copied = false;
 	snapshot->curcid = FirstCommandId;
 	snapshot->refcount = 0;
 	snapshot->regd_count = 0;
@@ -1037,18 +999,16 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 			SnapBuildSnapDecRefcount(builder->snapshot);
 
 		builder->snapshot = SnapBuildBuildSnapshot(builder);
+		SnapBuildSnapIncRefcount(builder->snapshot);
 
 		/* we might need to execute invalidations, add snapshot */
 		if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, xid))
 		{
-			SnapBuildSnapIncRefcount(builder->snapshot);
 			ReorderBufferSetBaseSnapshot(builder->reorder, xid, lsn,
 										 builder->snapshot);
+			SnapBuildSnapIncRefcount(builder->snapshot);
 		}
 
-		/* refcount of the snapshot builder for the new snapshot */
-		SnapBuildSnapIncRefcount(builder->snapshot);
-
 		/* add a new catalog snapshot to all currently running transactions */
 		SnapBuildDistributeNewCatalogSnapshot(builder, lsn);
 	}
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 5930ffb55a8..6095013a299 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -70,6 +70,7 @@ extern SnapBuild *AllocateSnapshotBuilder(struct ReorderBuffer *reorder,
 										  XLogRecPtr two_phase_at);
 extern void FreeSnapshotBuilder(SnapBuild *builder);
 
+extern void SnapBuildSnapIncRefcount(HistoricMVCCSnapshot snap);
 extern void SnapBuildSnapDecRefcount(HistoricMVCCSnapshot snap);
 
 extern MVCCSnapshot SnapBuildInitialSnapshot(SnapBuild *builder);
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index 93c1f51784f..bca0ad16e68 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -218,8 +218,6 @@ typedef struct HistoricMVCCSnapshotData
 
 	CommandId	curcid;			/* in my xact, CID < curcid are visible */
 
-	bool		copied;			/* false if it's a "base" snapshot */
-
 	uint32		refcount;		/* refcount managed by snapbuild.c  */
 	uint32		regd_count;		/* refcount registered with resource owners */
 
-- 
2.39.5
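
The ownership rule that the patch above moves to can be sketched in isolation: every holder takes a reference, and the snapshot is freed by whichever holder drops the last one. This is a minimal stand-alone model, not PostgreSQL code. ToySnapshot and the toy_* functions are hypothetical names, malloc/free stand in for palloc/pfree, and the freed_flag field exists only so the release can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for HistoricMVCCSnapshotData; only refcount matters. */
typedef struct ToySnapshot
{
	uint32_t	refcount;		/* number of current holders */
	bool	   *freed_flag;		/* demo only: set when the snapshot is freed */
} ToySnapshot;

/* Build a snapshot with no holders yet, like SnapBuildBuildSnapshot(). */
ToySnapshot *
toy_snapshot_build(bool *freed_flag)
{
	ToySnapshot *snap = malloc(sizeof(ToySnapshot));

	if (snap == NULL)
		abort();
	snap->refcount = 0;
	snap->freed_flag = freed_flag;
	*freed_flag = false;
	return snap;
}

/* Each holder (builder, reorder buffer, ...) takes its own reference. */
void
toy_snapshot_incref(ToySnapshot *snap)
{
	snap->refcount++;
}

/* Drop one reference; free the snapshot when the last holder lets go. */
void
toy_snapshot_decref(ToySnapshot *snap)
{
	assert(snap->refcount > 0);

	snap->refcount--;
	if (snap->refcount == 0)
	{
		*snap->freed_flag = true;
		free(snap);
	}
}
```

In this model a snapshot restored from disk would simply start with refcount = 1, which is what the reorderbuffer.c hunk above does in place of the old "copied" flag.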

Noah Misch

noah@leadboat.com

5 months ago

In reply to: Heikki Linnakangas (#6)

Re: A few patches to clarify snapshot management

On Tue, Jan 07, 2025 at 11:55:00AM +0200, Heikki Linnakangas wrote:

On 07/01/2025 00:00, Andres Freund wrote:

On 2024-12-20 19:31:01 +0200, Heikki Linnakangas wrote:

While playing around some more with this, I noticed that this code in
GetTransactionSnapshot() is never reached, and AFAICS has always been dead
code:

Snapshot
GetTransactionSnapshot(void)
{
/*
* Return historic snapshot if doing logical decoding. We'll never need a
* non-historic transaction snapshot in this (sub-)transaction, so there's
* no need to be careful to set one up for later calls to
* GetTransactionSnapshot().
*/
if (HistoricSnapshotActive())
{
Assert(!FirstSnapshotSet);
return HistoricSnapshot;
}

when you think about it, that's good, because it doesn't really make sense
to call GetTransactionSnapshot() during logical decoding. We jump through
hoops to make the historic catalog decoding possible with historic
snapshots, tracking subtransactions that modify catalogs and WAL-logging
command ids, but they're not suitable for general purpose queries. So I
think we should turn that into an error, per attached patch.

Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C
code in an output function calling GetTransactionSnapshot() or such to do
some internal lookups?

I haven't seen any. And I don't think that would work correctly while doing
logical decoding anyway, because historical snapshots only track XIDs that
modify catalogs. regclassout and enumout do work because they use the
catalog snapshot rather than GetTransactionSnapshot().

(I committed that change in commit 1585ff7387 already, but discussion is
still welcome of course)

https://github.com/2ndQuadrant/pglogical does rely on the pre-1585ff7387 code
for its row_filter feature. row_filter calls ExecEvalExpr() from the output
plugin, e.g. to evaluate expression "id between 2 AND 4" arising from this
configuration in the pglogical test suite:

SELECT * FROM pglogical.replication_set_add_table('default', 'basic_dml', false, row_filter := $rf$id between 2 AND 4$rf$);

One of the GetTransactionSnapshot() calls is in pglogical_output_plugin.c
itself. For that, I could work around the change by forcing the old
HistoricSnapshot use:

-		PushActiveSnapshot(GetTransactionSnapshot());
+		Assert(HistoricSnapshotActive());
+		PushActiveSnapshot(GetCatalogSnapshot(InvalidOid));

That doesn't get far. Calling a plpgsql function in the expression reaches
the "cannot take query snapshot during logical decoding" error via this stack
trace:

GetTransactionSnapshot at snapmgr.c:279:3
exec_eval_simple_expr at pl_exec.c:6214:3
(inlined by) exec_eval_expr at pl_exec.c:5699:6
exec_stmt_raise at pl_exec.c:3820:8
(inlined by) exec_stmts at pl_exec.c:2096:10
exec_stmt_block at pl_exec.c:1955:6
exec_toplevel_block at pl_exec.c:1646:7
plpgsql_exec_function at pl_exec.c:636:5
plpgsql_call_handler at pl_handler.c:278:11
fmgr_security_definer at fmgr.c:755:52
ExecInterpExpr at execExprInterp.c:927:7
pglogical_change_filter at pglogical_output_plugin.c:663:7
(inlined by) pg_decode_change at pglogical_output_plugin.c:691:7
change_cb_wrapper at logical.c:1121:22
ReorderBufferApplyChange at reorderbuffer.c:2078:3
(inlined by) ReorderBufferProcessTXN at reorderbuffer.c:2383:7
DecodeCommit at decode.c:743:3
(inlined by) xact_decode at decode.c:242:5
LogicalDecodingProcessRecord at decode.c:123:1
XLogSendLogical at walsender.c:3442:33
WalSndLoop at walsender.c:2837:7
StartLogicalReplication at walsender.c:1504:2
(inlined by) exec_replication_command at walsender.c:2158:6
PostgresMain at postgres.c:4762:10
BackendMain at backend_startup.c:80:2
postmaster_child_launch at launch_backend.c:291:3
BackendStartup at postmaster.c:3587:8
(inlined by) ServerLoop at postmaster.c:1702:6
PostmasterMain at postmaster.c:1252:6
main at main.c:165:4

Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C
code in an output function calling GetTransactionSnapshot() or such to do
some internal lookups?

I think pglogical_output_plugin.c w/ plpgsql is largely sane when used with a
plpgsql function that consults only catalogs and the output tuple. If the
pglogical test suite is representative, that's the usual case for a
row_filter. A plpgsql function that reads user tables will be fragile with
concurrent pruning, but a user might sanely accept that fragility. A plpgsql
function that writes tuples is not sane in a row_filter. How do you see it?

So far, I know of these options:

1. Make pglogical block the row_filter feature for any v18+ origin.
2. Revert postgresql.git commit 1585ff7387.
3. Make pglogical use HistoricSnapshot where pglogical_output_plugin.c handles
snapshots directly. That should keep simple row_filter expressions like
"col > 0" functioning. Entering plpgsql or similarly-complex logic will
fail with "cannot take query snapshot during logical decoding", and we'll
consider that to be working as intended.
4. Fail later and lazily, for just the most-unreasonable cases. For example,
fail when HistoricSnapshot applies to a write operation. (Maybe this
already fails. I didn't check.)

Which of those or other options should we consider?

For reference, https://github.com/2ndQuadrant/pglogical/pull/503 is an
otherwise-working port of pglogical to v18. Its Makefile currently disables
the tests that reach "cannot take query snapshot during logical decoding".

Heikki Linnakangas

hlinnaka@iki.fi

5 months ago

In reply to: Noah Misch (#8)

3 attachment(s)

Re: A few patches to clarify snapshot management

On 10/08/2025 01:23, Noah Misch wrote:

On Tue, Jan 07, 2025 at 11:55:00AM +0200, Heikki Linnakangas wrote:

On 07/01/2025 00:00, Andres Freund wrote:

On 2024-12-20 19:31:01 +0200, Heikki Linnakangas wrote:

While playing around some more with this, I noticed that this code in
GetTransactionSnapshot() is never reached, and AFAICS has always been dead
code:

Snapshot
GetTransactionSnapshot(void)
{
/*
* Return historic snapshot if doing logical decoding. We'll never need a
* non-historic transaction snapshot in this (sub-)transaction, so there's
* no need to be careful to set one up for later calls to
* GetTransactionSnapshot().
*/
if (HistoricSnapshotActive())
{
Assert(!FirstSnapshotSet);
return HistoricSnapshot;
}

when you think about it, that's good, because it doesn't really make sense
to call GetTransactionSnapshot() during logical decoding. We jump through
hoops to make the historic catalog decoding possible with historic
snapshots, tracking subtransactions that modify catalogs and WAL-logging
command ids, but they're not suitable for general purpose queries. So I
think we should turn that into an error, per attached patch.

Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C
code in an output function calling GetTransactionSnapshot() or such to do
some internal lookups?

I haven't seen any. And I don't think that would work correctly while doing
logical decoding anyway, because historical snapshots only track XIDs that
modify catalogs. regclassout and enumout do work because they use the
catalog snapshot rather than GetTransactionSnapshot().

(I committed that change in commit 1585ff7387 already, but discussion is
still welcome of course)

https://github.com/2ndQuadrant/pglogical does rely on the pre-1585ff7387 code
for its row_filter feature. row_filter calls ExecEvalExpr() from the output
plugin, e.g. to evaluate expression "id between 2 AND 4" arising from this
configuration in the pglogical test suite:

SELECT * FROM pglogical.replication_set_add_table('default', 'basic_dml', false, row_filter := $rf$id between 2 AND 4$rf$);

One of the GetTransactionSnapshot() calls is in pglogical_output_plugin.c
itself. For that, I could work around the change by forcing the old
HistoricSnapshot use:

-		PushActiveSnapshot(GetTransactionSnapshot());
+		Assert(HistoricSnapshotActive());
+		PushActiveSnapshot(GetCatalogSnapshot(InvalidOid));

That doesn't get far. Calling a plpgsql function in the expression reaches
the "cannot take query snapshot during logical decoding" error via this stack
trace:

GetTransactionSnapshot at snapmgr.c:279:3
exec_eval_simple_expr at pl_exec.c:6214:3
(inlined by) exec_eval_expr at pl_exec.c:5699:6
exec_stmt_raise at pl_exec.c:3820:8
(inlined by) exec_stmts at pl_exec.c:2096:10
exec_stmt_block at pl_exec.c:1955:6
exec_toplevel_block at pl_exec.c:1646:7
plpgsql_exec_function at pl_exec.c:636:5
plpgsql_call_handler at pl_handler.c:278:11
fmgr_security_definer at fmgr.c:755:52
ExecInterpExpr at execExprInterp.c:927:7
pglogical_change_filter at pglogical_output_plugin.c:663:7
(inlined by) pg_decode_change at pglogical_output_plugin.c:691:7
change_cb_wrapper at logical.c:1121:22
ReorderBufferApplyChange at reorderbuffer.c:2078:3
(inlined by) ReorderBufferProcessTXN at reorderbuffer.c:2383:7
DecodeCommit at decode.c:743:3
(inlined by) xact_decode at decode.c:242:5
LogicalDecodingProcessRecord at decode.c:123:1
XLogSendLogical at walsender.c:3442:33
WalSndLoop at walsender.c:2837:7
StartLogicalReplication at walsender.c:1504:2
(inlined by) exec_replication_command at walsender.c:2158:6
PostgresMain at postgres.c:4762:10
BackendMain at backend_startup.c:80:2
postmaster_child_launch at launch_backend.c:291:3
BackendStartup at postmaster.c:3587:8
(inlined by) ServerLoop at postmaster.c:1702:6
PostmasterMain at postmaster.c:1252:6
main at main.c:165:4

Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C
code in an output function calling GetTransactionSnapshot() or such to do
some internal lookups?

I think pglogical_output_plugin.c w/ plpgsql is largely sane when used with a
plpgsql function that consults only catalogs and the output tuple. If the
pglogical test suite is representative, that's the usual case for a
row_filter. A plpgsql function that reads user tables will be fragile with
concurrent pruning, but a user might sanely accept that fragility. A plpgsql
function that writes tuples is not sane in a row_filter. How do you see it?

So far, I know of these options:

1. Make pglogical block the row_filter feature for any v18+ origin.
2. Revert postgresql.git commit 1585ff7387.
3. Make pglogical use HistoricSnapshot where pglogical_output_plugin.c handles
snapshots directly. That should keep simple row_filter expressions like
"col > 0" functioning. Entering plpgsql or similarly-complex logic will
fail with "cannot take query snapshot during logical decoding", and we'll
consider that to be working as intended.
4. Fail later and lazily, for just the most-unreasonable cases. For example,
fail when HistoricSnapshot applies to a write operation. (Maybe this
already fails. I didn't check.)

Which of those or other options should we consider?

Hmm, so what snapshot should you use for these row filter expressions
anyway?

As long as you don't try to access any tables, it doesn't matter
much. Although, the effective catalog snapshot also affects how any
functions used in the expression are resolved. If you CREATE OR REPLACE
the row filter function while logical decoding is active, what version
of the function do you expect to be used? I think that's a little fuzzy,
and you might get different answers for the initial sync step and the
on-going decoding. We don't necessarily need to solve that here.
Nevertheless, what would be the least surprising answer to that?

Currently, in v17 and below, we will use the historic snapshot, which
represents the point in time that we are decoding. Is that the right
choice? I'm not sure. A historic snapshot is only supposed to be used
for catalogs, it's not clear if it works correctly for arbitrary
queries. And it's not clear it's the right choice for resolving the row
filter functions either.

How about always using a fresh snapshot instead? Instead of pushing the
historic snapshot as the active snapshot, _disable_ the historic
snapshot and use GetTransactionSnapshot() to acquire a regular snapshot?

We could implement that in GetTransactionSnapshot() itself by just
removing the check for HistoricSnapshotActive(), and let it call
GetSnapshotData() as usual. But I still think it's a useful sanity check
that you don't call GetTransactionSnapshot() while a historic snapshot
is active, so I'd prefer for the caller to explicitly disable the
historic snapshot first.

Attached is a patch to pglogical to demonstrate that.
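
The save / teardown / restore pattern described above can be sketched as a stand-alone model. This is an illustration under stated assumptions rather than the pglogical change itself: the toy_* names and ToySnapshot are hypothetical, and a NULL return stands in for the "cannot take query snapshot during logical decoding" error. The real calls involved are HistoricSnapshotActive(), TeardownHistoricSnapshot() and SetupHistoricSnapshot().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a snapshot; the id just tells snapshots apart. */
typedef struct ToySnapshot
{
	int			id;
} ToySnapshot;

static ToySnapshot *toy_historic = NULL;	/* models the historic snapshot */
static ToySnapshot toy_fresh = {0};		/* static storage, reused per call */
static int	toy_next_id = 1;

bool
toy_historic_active(void)
{
	return toy_historic != NULL;
}

void
toy_setup_historic(ToySnapshot *snap)
{
	assert(snap != NULL);
	toy_historic = snap;
}

void
toy_teardown_historic(void)
{
	toy_historic = NULL;
}

/* Sanity check kept: refuse to hand out a snapshot while decoding. */
ToySnapshot *
toy_get_transaction_snapshot(void)
{
	if (toy_historic_active())
		return NULL;			/* real code would raise an ERROR here */
	toy_fresh.id = toy_next_id++;
	return &toy_fresh;
}

/* The caller disables the historic snapshot explicitly, then restores it. */
ToySnapshot *
toy_with_fresh_snapshot(void)
{
	ToySnapshot *saved = toy_historic;
	ToySnapshot *fresh;

	assert(toy_historic_active());
	toy_teardown_historic();
	fresh = toy_get_transaction_snapshot();
	/* ... the row filter expression would be evaluated here ... */
	toy_setup_historic(saved);
	return fresh;
}
```

The point of keeping the check inside toy_get_transaction_snapshot() is that an accidental call during decoding still fails loudly; only a caller that deliberately tears down the historic snapshot gets a fresh one.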

Another approach is to continue down the path you attempted. There are
many places in plpgsql and elsewhere where we call
GetTransactionSnapshot(), but are they really necessary when you're
executing something like the row filter expression? I think the row
filter expression is supposed to be read-only. There are optimizations
already to avoid GetTransactionSnapshot() calls in read-only functions
(i.e. immutable), but we could expand those to any function in a
read-only transaction, and set XactReadOnly while evaluating the row
filter expression.

The second attached patch makes that change in PostgreSQL code. With
those changes, the pglogical change you attempted to do
"PushActiveSnapshot(GetCatalogSnapshot(InvalidOid))" seems to work. I'm
not sure it covers all the cases though, there might be more
GetTransactionSnapshot() calls lurking.
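
The rule that the second patch applies in fmgr_sql() and plpgsql can be modelled in a few lines: a new snapshot is taken only when neither the function nor the enclosing transaction is read-only. This is a simplified sketch; the toy_* names are hypothetical, and the counter stands in for the real CommandCounterIncrement() / PushActiveSnapshot(GetTransactionSnapshot()) sequence.

```c
#include <assert.h>
#include <stdbool.h>

static int	toy_snapshots_taken = 0;

/* Mirrors the patch's "need_snapshot" expression. */
bool
toy_need_snapshot(bool readonly_func, bool xact_read_only)
{
	return !readonly_func && !xact_read_only;
}

/* Model of starting one statement inside a function. */
void
toy_exec_statement(bool readonly_func, bool xact_read_only)
{
	if (toy_need_snapshot(readonly_func, xact_read_only))
		toy_snapshots_taken++;	/* real code would take a new snapshot here */
	/* otherwise the caller's active snapshot is reused as-is */
}

int
toy_snapshot_count(void)
{
	return toy_snapshots_taken;
}
```

With XactReadOnly set for the whole decoded transaction, the second argument is always true, so no path through this model asks for a new snapshot.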

I think I prefer the change to pglogical to disable the historic
snapshot. It feels more robust. I'm not sure if there's a performance
penalty though, as you now need to call GetSnapshotData() for every
decoded transaction.

Finally, attached is a pglogical test case to test what happens if you
change the datatype of the table, while there's decoding active with a
complex row filter function that also accesses the table. I'm not sure
how that should behave and I think that falls into the category of
"don't do that". But FWIW, on v17 it tries to read the table and fails with this:

ERROR: could not read blocks 0..0 in file "base/16384/17512": read only
0 of 8192 bytes

while with the attached pglogical-disable-historic-snapshot.patch it
fails more nicely:

[2025-08-15 14:05:41.514 EEST] [3824375] [regression] ERROR: attribute
1 of type rowfilter_ddl_tbl has wrong type
[2025-08-15 14:05:41.514 EEST] [3824375] [regression] DETAIL: Table has
type text, but query expects integer.

- Heikki

Attachments:

pglogical-disable-historic-snapshot.patch (text/x-patch; charset=UTF-8)

diff --git a/pglogical_output_plugin.c b/pglogical_output_plugin.c
index 9f1bf9c..98aae62 100644
--- a/pglogical_output_plugin.c
+++ b/pglogical_output_plugin.c
@@ -630,6 +630,10 @@ pglogical_change_filter(PGLogicalOutputData *data, Relation relation,
 		HeapTuple		newtup = change->data.tp.newtuple ?
 			&change->data.tp.newtuple->tuple : NULL;
 #endif
+#if PG_VERSION_NUM >= 180000
+		Snapshot		save_historic_snapshot;
+		HTAB		   *save_tuplecids;
+#endif
 
 		/* Skip empty changes. */
 		if (!newtup && !oldtup)
@@ -638,6 +642,12 @@ pglogical_change_filter(PGLogicalOutputData *data, Relation relation,
 			return false;
 		}
 
+#if PG_VERSION_NUM >= 180000
+		Assert(HistoricSnapshotActive());
+		save_historic_snapshot = GetCatalogSnapshot(InvalidOid);
+		save_tuplecids = HistoricSnapshotGetTupleCids();
+		TeardownHistoricSnapshot(false);
+#endif
 		PushActiveSnapshot(GetTransactionSnapshot());
 
 		estate = create_estate_for_relation(relation, false);
@@ -667,6 +677,9 @@ pglogical_change_filter(PGLogicalOutputData *data, Relation relation,
 		FreeExecutorState(estate);
 
 		PopActiveSnapshot();
+#if PG_VERSION_NUM >= 180000
+		SetupHistoricSnapshot(save_historic_snapshot, save_tuplecids);
+#endif
 	}
 
 	/* Make sure caller is aware of any attribute filter. */

use-readonly-mode-for-decode.patch (text/x-patch; charset=UTF-8)

diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 359aafea681..4c9a17a2fc9 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1644,6 +1644,7 @@ fmgr_sql(PG_FUNCTION_ARGS)
 	while (es)
 	{
 		bool		completed;
+		bool		need_snapshot = !fcache->func->readonly_func && !XactReadOnly;
 
 		if (es->status == F_EXEC_START)
 		{
@@ -1653,7 +1654,7 @@ fmgr_sql(PG_FUNCTION_ARGS)
 			 * visible.  Take a new snapshot if we don't have one yet,
 			 * otherwise just bump the command ID in the existing snapshot.
 			 */
-			if (!fcache->func->readonly_func)
+			if (need_snapshot)
 			{
 				CommandCounterIncrement();
 				if (!pushed_snapshot)
@@ -1667,7 +1668,7 @@ fmgr_sql(PG_FUNCTION_ARGS)
 
 			postquel_start(es, fcache);
 		}
-		else if (!fcache->func->readonly_func && !pushed_snapshot)
+		else if (need_snapshot && !pushed_snapshot)
 		{
 			/* Re-establish active snapshot when re-entering function */
 			PushActiveSnapshot(es->qd->snapshot);
@@ -1946,13 +1947,15 @@ ShutdownSQLFunction(Datum arg)
 		/* Shut down anything still running */
 		if (es->status == F_EXEC_RUN)
 		{
+			bool		need_snapshot = !fcache->func->readonly_func && !XactReadOnly;
+
 			/* Re-establish active snapshot for any called functions */
-			if (!fcache->func->readonly_func)
+			if (need_snapshot)
 				PushActiveSnapshot(es->qd->snapshot);
 
 			postquel_end(es, fcache);
 
-			if (!fcache->func->readonly_func)
+			if (need_snapshot)
 				PopActiveSnapshot();
 		}
 		es = es->next;
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 11139a910b8..6985799d90f 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2249,6 +2249,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			BeginInternalSubTransaction(streaming ? "stream" : "replay");
 		else
 			StartTransactionCommand();
+		XactReadOnly = true;
 
 		/*
 		 * We only need to send begin/begin-prepare for non-streamed
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index b9acc790dc6..b27867d0613 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -4013,7 +4013,7 @@ plpgsql_estate_setup(PLpgSQL_execstate *estate,
 	estate->retistuple = func->fn_retistuple;
 	estate->retisset = func->fn_retset;
 
-	estate->readonly_func = func->fn_readonly;
+	estate->readonly_func = func->fn_readonly || XactReadOnly;
 	estate->atomic = true;
 
 	estate->exitlabel = NULL;

row_filter_ddl.sql (application/sql)

#10

Noah Misch

noah@leadboat.com

5 months ago

In reply to: Heikki Linnakangas (#9)

Re: A few patches to clarify snapshot management

On Fri, Aug 15, 2025 at 02:12:03PM +0300, Heikki Linnakangas wrote:

On 10/08/2025 01:23, Noah Misch wrote:

On Tue, Jan 07, 2025 at 11:55:00AM +0200, Heikki Linnakangas wrote:

On 07/01/2025 00:00, Andres Freund wrote:

Hm. I'm not sure it's a good idea to forbid this. Couldn't there be sane C
code in an output function calling GetTransactionSnapshot() or such to do
some internal lookups?

I haven't seen any. And I don't think that would work correctly while doing
logical decoding anyway, because historical snapshots only track XIDs that
modify catalogs. regclassout and enumout do work because they use the
catalog snapshot rather than GetTransactionSnapshot().

(I committed that change in commit 1585ff7387 already, but discussion is
still welcome of course)

https://github.com/2ndQuadrant/pglogical does rely on the pre-1585ff7387 code
for its row_filter feature.

So far, I know of these options:

1. Make pglogical block the row_filter feature for any v18+ origin.
2. Revert postgresql.git commit 1585ff7387.
3. Make pglogical use HistoricSnapshot where pglogical_output_plugin.c handles
snapshots directly. That should keep simple row_filter expressions like
"col > 0" functioning. Entering plpgsql or similarly-complex logic will
fail with "cannot take query snapshot during logical decoding", and we'll
consider that to be working as intended.
4. Fail later and lazily, for just the most-unreasonable cases. For example,
fail when HistoricSnapshot applies to a write operation. (Maybe this
already fails. I didn't check.)

Which of those or other options should we consider?

Hmm, so what snapshot should you use for these row filter expressions
anyway?

As long as you don't try to access any tables, it doesn't matter
much. Although, the effective catalog snapshot also affects how any
functions used in the expression are resolved. If you CREATE OR REPLACE the
row filter function while logical decoding is active, what version of the
function do you expect to be used? I think that's a little fuzzy, and you
might get different answers for the initial sync step and the on-going
decoding. We don't necessarily need to solve that here. Nevertheless, what
would be the least surprising answer to that?

Currently, in v17 and below, we will use the historic snapshot, which
represents the point in time that we are decoding. Is that the right
choice? I'm not sure. A historic snapshot is only supposed to be used for
catalogs, it's not clear if it works correctly for arbitrary queries. And
it's not clear it's the right choice for resolving the row filter functions
either.

How about always using a fresh snapshot instead? Instead of pushing the
historic snapshot as the active snapshot, _disable_ the historic snapshot
and use GetTransactionSnapshot() to acquire a regular snapshot?

I see advantages in using the historic snapshot:

1. It's the longstanding behavior, and applications aren't complaining.

2. If someone wants "fresh snapshot", they can do that today with a C
extension that provides an execute_at_fresh_snapshot(sql text) SQL
function. If we adopt the fresh snapshot in pglogical or in core, I don't
see a comparably-clean way for the application code to get back to the
historic snapshot. (That's because the historic snapshot lives only in
stack variables at the moment in question.)

3. If an application is relying on the longstanding behavior and needs to
adapt to the proposed "fresh snapshot" behavior, that may be invasive to
implement and harmful to performance. For example, instead of reading from
a user_catalog_table inside the filter, the application may need to
duplicate that table's data into the rows being filtered.

Does the "fresh snapshot" alternative bring strengths to outweigh those?

Another approach is to continue down the path you attempted. There are many
places in plpgsql and elsewhere where we call GetTransactionSnapshot(), but
are they really necessary when you're executing something like the row
filter expression? I think the row filter expression is supposed to be
read-only. There are optimizations already to avoid GetTransactionSnapshot()
calls in read-only functions (i.e. immutable), but we could expand those to
any function in a read-only transaction, and set XactReadOnly while
evaluating the row filter expression.

The second attached patch makes that change in PostgreSQL code. With those
changes, the pglogical change you attempted to do
"PushActiveSnapshot(GetCatalogSnapshot(InvalidOid))" seems to work. I'm not
sure it covers all the cases though, there might be more
GetTransactionSnapshot() calls lurking.

Yep. As you say, the GetTransactionSnapshot() calls probably aren't
necessary, so this could work long-term. That patch's edit in src/pl may
imply similar needs lurking in non-core PLs.

Finally, attached is a pglogical test case to test what happens if you
change the datatype of the table, while there's decoding active with a
complex row filter function that also accesses the table. I'm not sure how
that should behave and I think that falls into the category of "don't do
that". But FWIW, on v17 it tries to read the table and fails with this:

ERROR: could not read blocks 0..0 in file "base/16384/17512": read only 0
of 8192 bytes

Reading reliably with a historic snapshot would require adding
user_catalog_table. Ideally, the error message would lead the user to a
conclusion like "you're reading a non-catalog with a historic snapshot; this
is expected after a rewrite of that non-catalog". With user_catalog_table:

-   CREATE TABLE public.rowfilter_ddl_tbl (id int primary key);
+   CREATE TABLE public.rowfilter_ddl_tbl (id int primary key) WITH (user_catalog_table = true);

... the v17 ALTER fails with 'ERROR: cannot rewrite table "rowfilter_ddl_tbl"
used as a catalog table'. Not bad. Incidentally, while that's the result
with a production build, a v17 --enable-cassert build crashes earlier (at
today's REL_17_STABLE and today's pglogical head):

#4 0x000055ed959bd415 in ExceptionalCondition (conditionName=conditionName@entry=0x55ed95ae46f8 "ActiveSnapshot->as_snap->active_count == 1",
fileName=fileName@entry=0x55ed95a49ce1 "snapmgr.c", lineNumber=lineNumber@entry=718) at assert.c:66
#5 0x000055ed959ff409 in UpdateActiveSnapshotCommandId () at snapmgr.c:718
#6 0x000055ed956f8342 in _SPI_execute_plan (plan=plan@entry=0x55edab044410, options=options@entry=0x7ffd8b03f4b0, snapshot=snapshot@entry=0x0,
crosscheck_snapshot=crosscheck_snapshot@entry=0x0, fire_triggers=fire_triggers@entry=true) at spi.c:2668
#7 0x000055ed956f8bc2 in SPI_execute_plan_with_paramlist (plan=0x55edab044410, params=0x55edab021750, read_only=false, tcount=tcount@entry=0) at spi.c:751
#8 0x00007f18a25e78a7 in exec_run_select (estate=estate@entry=0x7ffd8b03fbd0, expr=expr@entry=0x55edab030b58, portalP=portalP@entry=0x0, maxtuples=0) at pl_exec.c:5824
#9 0x00007f18a25e7b8a in exec_eval_expr (estate=0x7ffd8b03fbd0, expr=0x55edab030b58, isNull=0x7ffd8b03f597, rettype=0x7ffd8b03f598, rettypmod=0x7ffd8b03f59c)
at pl_exec.c:5714
#10 0x00007f18a25ea7f2 in exec_assign_expr (estate=estate@entry=0x7ffd8b03fbd0, target=0x55edab020b50, expr=0x55edab030b58) at pl_exec.c:5039
#11 0x00007f18a25ebe45 in exec_stmt_assign (estate=0x7ffd8b03fbd0, stmt=0x55edab030ac8) at pl_exec.c:2156
#12 exec_stmts (estate=estate@entry=0x7ffd8b03fbd0, stmts=0x55edab030c38) at pl_exec.c:2020
#13 0x00007f18a25ebdd3 in exec_stmt_if (estate=0x55edaaf3af40, stmt=<optimized out>) at pl_exec.c:2535
#14 exec_stmts (estate=estate@entry=0x7ffd8b03fbd0, stmts=0x55edab030cd8) at pl_exec.c:2036
#15 0x00007f18a25ede6b in exec_stmt_block (estate=estate@entry=0x7ffd8b03fbd0, block=block@entry=0x55edab0310d0) at pl_exec.c:1943
#16 0x00007f18a25edf6d in exec_toplevel_block (estate=estate@entry=0x7ffd8b03fbd0, block=0x55edab0310d0) at pl_exec.c:1634
#17 0x00007f18a25ee7e1 in plpgsql_exec_function (func=func@entry=0x55edab024a18, fcinfo=fcinfo@entry=0x55edaafd8a18, simple_eval_estate=simple_eval_estate@entry=0x0,
simple_eval_resowner=simple_eval_resowner@entry=0x0, procedure_resowner=procedure_resowner@entry=0x0, atomic=<optimized out>) at pl_exec.c:623
#18 0x00007f18a25f8e43 in plpgsql_call_handler (fcinfo=0x55edaafd8a18) at pl_handler.c:277
#19 0x000055ed956b239f in ExecInterpExpr (state=0x55edaafd83a0, econtext=0x55edab012180, isnull=<optimized out>) at execExprInterp.c:740
#20 0x00007f18a3003bc0 in pglogical_change_filter (data=0x55edaafb7268, relation=0x7f18a2a93ab8, change=0x55edab006dc0, att_list=<synthetic pointer>)
at pglogical_output_plugin.c:656
#21 pg_decode_change (ctx=0x55edaafb5460, txn=<optimized out>, relation=0x7f18a2a93ab8, change=0x55edab006dc0) at pglogical_output_plugin.c:690
#22 0x000055ed957fe6f9 in change_cb_wrapper (cache=<optimized out>, txn=<optimized out>, relation=<optimized out>, change=<optimized out>) at logical.c:1137
#23 0x000055ed9580a1f8 in ReorderBufferApplyChange (rb=<optimized out>, txn=<optimized out>, relation=0x7f18a2a93ab8, change=0x55edab006dc0, streaming=false)
at reorderbuffer.c:2019
#24 ReorderBufferProcessTXN (rb=0x55edaafb9ce0, txn=0x55edaafefdb0, commit_lsn=37174848, snapshot_now=<optimized out>, command_id=command_id@entry=0,
streaming=streaming@entry=false) at reorderbuffer.c:2300
#25 0x000055ed9580a4dc in ReorderBufferReplay (txn=<optimized out>, rb=<optimized out>, commit_lsn=<optimized out>, end_lsn=<optimized out>, commit_time=<optimized out>,
origin_id=<optimized out>, origin_lsn=0, xid=<optimized out>) at reorderbuffer.c:2767
#26 0x000055ed9580b041 in ReorderBufferCommit (rb=<optimized out>, xid=<optimized out>, commit_lsn=<optimized out>, end_lsn=<optimized out>, commit_time=<optimized out>,
origin_id=<optimized out>, origin_lsn=<optimized out>) at reorderbuffer.c:2791
#27 0x000055ed957fad92 in DecodeCommit (ctx=0x55edaafb5460, buf=0x7ffd8b040500, parsed=0x7ffd8b040370, xid=794, two_phase=false) at decode.c:746
#28 xact_decode (ctx=0x55edaafb5460, buf=0x7ffd8b040500) at decode.c:242
#29 0x000055ed957fa721 in LogicalDecodingProcessRecord (ctx=0x55edaafb5460, record=0x55edaafb5838) at decode.c:116
#30 0x000055ed95825c12 in XLogSendLogical () at walsender.c:3445
#31 0x000055ed95828526 in WalSndLoop (send_data=send_data@entry=0x55ed95825bd0 <XLogSendLogical>) at walsender.c:2835
#32 0x000055ed958295bc in StartLogicalReplication (cmd=<optimized out>) at walsender.c:1525
#33 exec_replication_command (
cmd_string=cmd_string@entry=0x55edaaed62d0 "START_REPLICATION SLOT \"pgl_postgres_test_provider_test_sube55bf37\" LOGICAL 0/2341A18 (expected_encoding 'UTF8', min_proto_version '1', max_proto_version '1', startup_params_format '1', \"binary.want_i"...) at walsender.c:2160
#34 0x000055ed958811f4 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4763
...
(gdb) p ActiveSnapshot->as_snap->active_count
$1 = 3

#11

Heikki Linnakangas

hlinnaka@iki.fi

5 months ago

In reply to: Noah Misch (#10)

1 attachment(s)

Re: A few patches to clarify snapshot management

On 19/08/2025 03:14, Noah Misch wrote:

On Fri, Aug 15, 2025 at 02:12:03PM +0300, Heikki Linnakangas wrote:

How about always using a fresh snapshot instead? Instead of pushing the
historic snapshot as the active snapshot, _disable_ the historic snapshot
and use GetTransactionSnapshot() to acquire a regular snapshot?

I see advantages in using the historic snapshot:

1. It's the longstanding behavior, and applications aren't complaining.

2. If someone wants "fresh snapshot", they can do that today with a C
extension that provides an execute_at_fresh_snapshot(sql text) SQL
function. If we adopt the fresh snapshot in pglogical or in core, I don't
see a comparably-clean way for the application code to get back to the
historic snapshot. (That's because the historic snapshot lives only in
stack variables at the moment in question.)

3. If an application is relying on the longstanding behavior and needs to
adapt to the proposed "fresh snapshot" behavior, that may be invasive to
implement and harmful to performance. For example, instead of reading from
a user_catalog_table inside the filter, the application may need to
duplicate that table's data into the rows being filtered.

Oh, I had not considered user_catalog_tables. I didn't remember that's a
thing.

The docs on user_catalog_table say:

Note that access to user catalog tables or regular system catalog
tables in the output plugins has to be done via the systable_* scan APIs
only. Access via the heap_* scan APIs will error out.

That doesn't quite say that you should be able to run arbitrary queries
on a user_catalog_table. In fact it suggests that you can't, because
surely you're not using the systable_* scan APIs when running arbitrary
queries.

That said, I agree it would be nice if we can keep it working.

Does the "fresh snapshot" alternative bring strengths to outweigh those?

The argument for the fresh snapshot is that using a historic snapshot
only makes sense for catalog tables, and by taking a fresh snapshot, we
avoid the mistake of using the historic snapshot for anything else. I
thought that there's practically no valid use case for using a historic
snapshot in anything that calls GetTransactionSnapshot(), and if it
happens to work, it's only because the snapshot isn't actually used for
anything or is only used to read data that hasn't changed so that you
get away with it.

I agree that reading a table marked as user_catalog_table is a valid case,
however, so I take back that argument.

How about the attached, then? It reverts the GetTransactionSnapshot()
change. But to still catch at least some of the invalid uses of the
historic snapshot, it adds checks to heap_beginscan() and
index_beginscan(), to complain if they are called on a non-catalog
relation with a historic snapshot.

- Heikki

Attachments:

0001-Revert-GetTransactionSnapshot-to-return-historic-sna.patch (text/x-patch; charset=UTF-8)

From 610c44f27850cc227c8907e548baee3e709b9fee Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 19 Aug 2025 23:21:58 +0300
Subject: [PATCH 1/1] Revert GetTransactionSnapshot() to return historic
 snapshot during LR

Commit 1585ff7387 changed GetTransactionSnapshot() to throw an error
if it's called during logical decoding, instead of returning the
historic snapshot. I made that change for extra protection, because a
historic snapshot can only be used to access catalog tables while
GetTransactionSnapshot() is usually called when you're executing
arbitrary queries. You might get very subtle visibility problems if
you tried to use the historic snapshot for arbitrary queries.

There's no built-in code in PostgreSQL that calls
GetTransactionSnapshot() during logical decoding, but it turns out
that the pglogical extension does just that, to evaluate row filter
expressions. You would get weird results if the row filter runs
arbitrary queries, but it is sane as long as you don't access any
non-catalog tables. Even though there are no checks to enforce that in
pglogical, a typical row filter expression does not access any tables
and works fine. Accessing tables marked with the user_catalog_table =
true option is also OK.

To fix pglogical with row filters, and any other extensions that might
do similar things, revert GetTransactionSnapshot() to return a
historic snapshot during logical decoding.

To try to still catch the unsafe usage of historic snapshots, add
checks in heap_beginscan() and index_beginscan() to complain if you
try to use a historic snapshot to scan a non-catalog table.

Backpatch-through: 18
Reported-by: Noah Misch
Discussion: https://www.postgresql.org/message-id/20250809222338.cc.nmisch%40google.com
---
 src/backend/access/heap/heapam.c   |  9 +++++++++
 src/backend/access/index/indexam.c |  8 ++++++++
 src/backend/utils/time/snapmgr.c   | 19 +++++++++++++++----
 src/include/utils/snapmgr.h        |  3 +++
 4 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..ee692c03c3c 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1143,6 +1143,15 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	if (!(snapshot && IsMVCCSnapshot(snapshot)))
 		scan->rs_base.rs_flags &= ~SO_ALLOW_PAGEMODE;
 
+	/* Check that a historic snapshot is not used for non-catalog tables */
+	if (snapshot &&
+		IsHistoricMVCCSnapshot(snapshot) &&
+		!RelationIsAccessibleInLogicalDecoding(relation))
+	{
+		elog(ERROR, "cannot query non-catalog table \"%s\" during logical decoding",
+			 RelationGetRelationName(relation));
+	}
+
 	/*
 	 * For seqscan and sample scans in a serializable transaction, acquire a
 	 * predicate lock on the entire relation. This is required not only to
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 219df1971da..a1ab4156d27 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -263,6 +263,14 @@ index_beginscan(Relation heapRelation,
 
 	Assert(snapshot != InvalidSnapshot);
 
+	/* Check that a historic snapshot is not used for non-catalog tables */
+	if (IsHistoricMVCCSnapshot(snapshot) &&
+		!RelationIsAccessibleInLogicalDecoding(heapRelation))
+	{
+		elog(ERROR, "cannot query non-catalog table \"%s\" during logical decoding",
+			 RelationGetRelationName(heapRelation));
+	}
+
 	scan = index_beginscan_internal(indexRelation, nkeys, norderbys, snapshot, NULL, false);
 
 	/*
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index ea35f30f494..65561cc6bc3 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -271,12 +271,23 @@ Snapshot
 GetTransactionSnapshot(void)
 {
 	/*
-	 * This should not be called while doing logical decoding.  Historic
-	 * snapshots are only usable for catalog access, not for general-purpose
-	 * queries.
+	 * Return historic snapshot if doing logical decoding.
+	 *
+	 * Historic snapshots are only usable for catalog access, not for
+	 * general-purpose queries.  The caller is responsible for ensuring that
+	 * the snapshot is used correctly! (PostgreSQL code never calls this
+	 * during logical decoding, but extensions can do it.)
 	 */
 	if (HistoricSnapshotActive())
-		elog(ERROR, "cannot take query snapshot during logical decoding");
+	{
+		/*
+		 * We'll never need a non-historic transaction snapshot in this
+		 * (sub-)transaction, so there's no need to be careful to set one up
+		 * for later calls to GetTransactionSnapshot().
+		 */
+		Assert(!FirstSnapshotSet);
+		return HistoricSnapshot;
+	}
 
 	/* First call in transaction? */
 	if (!FirstSnapshotSet)
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index d346be71642..604c1f90216 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -56,6 +56,9 @@ extern PGDLLIMPORT SnapshotData SnapshotToastData;
 	((snapshot)->snapshot_type == SNAPSHOT_MVCC || \
 	 (snapshot)->snapshot_type == SNAPSHOT_HISTORIC_MVCC)
 
+#define IsHistoricMVCCSnapshot(snapshot)  \
+	((snapshot)->snapshot_type == SNAPSHOT_HISTORIC_MVCC)
+
 extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
-- 
2.39.5

#12

Noah Misch

noah@leadboat.com

5 months ago

In reply to: Heikki Linnakangas (#11)

Re: A few patches to clarify snapshot management

On Tue, Aug 19, 2025 at 11:45:01PM +0300, Heikki Linnakangas wrote:

How about the attached, then? It reverts the GetTransactionSnapshot()
change. But to still catch at least some of the invalid uses of the historic
snapshot, it adds checks to heap_beginscan() and index_beginscan(), to
complain if they are called on a non-catalog relation with a historic
snapshot.

@@ -1143,6 +1143,15 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	if (!(snapshot && IsMVCCSnapshot(snapshot)))
 		scan->rs_base.rs_flags &= ~SO_ALLOW_PAGEMODE;
 
+	/* Check that a historic snapshot is not used for non-catalog tables */
+	if (snapshot &&
+		IsHistoricMVCCSnapshot(snapshot) &&
+		!RelationIsAccessibleInLogicalDecoding(relation))
+	{
+		elog(ERROR, "cannot query non-catalog table \"%s\" during logical decoding",
+			 RelationGetRelationName(relation));
+	}
+

I feel post-beta3 is late for debut of restrictions like this. How about a
pure revert, then add those restrictions in v19? Should be s/elog/ereport/,
also.
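
For readers following along, the decision made by the proposed scan-time check can be sketched as a small standalone C model. This is only an illustration under stated assumptions, not PostgreSQL code: every `Model*` / `model_*` name below is hypothetical, the relation test is reduced to two flags (the real RelationIsAccessibleInLogicalDecoding() may consider more than that), and the model returns a result code where the backend check raises an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the backend's snapshot and relation types. */
typedef enum
{
	MODEL_SNAPSHOT_MVCC,
	MODEL_SNAPSHOT_HISTORIC_MVCC
} ModelSnapshotType;

typedef struct
{
	ModelSnapshotType snapshot_type;
} ModelSnapshot;

typedef struct
{
	const char *name;
	bool		is_system_catalog;	/* e.g. pg_class */
	bool		user_catalog_table; /* reloption user_catalog_table = true */
} ModelRelation;

typedef enum
{
	SCAN_OK,
	SCAN_REJECTED
} ScanCheckResult;

/* Mirrors the idea of IsHistoricMVCCSnapshot() from the patch. */
static bool
model_is_historic(const ModelSnapshot *snapshot)
{
	return snapshot->snapshot_type == MODEL_SNAPSHOT_HISTORIC_MVCC;
}

/* Simplified assumption: catalogs and user catalog tables are readable. */
static bool
model_accessible_in_decoding(const ModelRelation *rel)
{
	return rel->is_system_catalog || rel->user_catalog_table;
}

/*
 * Reject a scan only when a historic snapshot is paired with a relation
 * that is not catalog-like.  A NULL snapshot is allowed through, as in the
 * heap_beginscan() hunk, which tests "snapshot &&" first.
 */
ScanCheckResult
model_check_scan(const ModelRelation *rel, const ModelSnapshot *snapshot)
{
	assert(rel != NULL);

	if (snapshot != NULL &&
		model_is_historic(snapshot) &&
		!model_accessible_in_decoding(rel))
		return SCAN_REJECTED;

	return SCAN_OK;
}
```

In this model, a row filter that reads an ordinary table under a historic snapshot is rejected, while one that reads a table flagged as user_catalog_table, or that uses a regular MVCC snapshot, passes.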

#13

Heikki Linnakangas

hlinnaka@iki.fi

5 months ago

In reply to: Noah Misch (#12)

Re: A few patches to clarify snapshot management

On 20/08/2025 03:37, Noah Misch wrote:

On Tue, Aug 19, 2025 at 11:45:01PM +0300, Heikki Linnakangas wrote:

How about the attached, then? It reverts the GetTransactionSnapshot()
change. But to still catch at least some of the invalid uses of the historic
snapshot, it adds checks to heap_beginscan() and index_beginscan(), to
complain if they are called on a non-catalog relation with a historic
snapshot.

@@ -1143,6 +1143,15 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	if (!(snapshot && IsMVCCSnapshot(snapshot)))
 		scan->rs_base.rs_flags &= ~SO_ALLOW_PAGEMODE;
 
+	/* Check that a historic snapshot is not used for non-catalog tables */
+	if (snapshot &&
+		IsHistoricMVCCSnapshot(snapshot) &&
+		!RelationIsAccessibleInLogicalDecoding(relation))
+	{
+		elog(ERROR, "cannot query non-catalog table \"%s\" during logical decoding",
+			 RelationGetRelationName(relation));
+	}
+

I feel post-beta3 is late for debut of restrictions like this. How about a
pure revert, then add those restrictions in v19? Should be s/elog/ereport/,
also.

Ok, fair. I committed the revert to v18, and the revert + additional
checks to master.

- Heikki