snapshot too old, configured by time

Started by Kevin Grittner over 10 years ago (89 messages)

kgrittn@ymail.com

over 10 years ago

1 attachment(s)

As discussed when the "proof of concept" patch was submitted during
9.5 development, here is a version intended to be considered for
commit to 9.6, with the following changes:

1. It is configured using time rather than number of transactions.
Not only was there unanimous agreement here that this was better,
but the EDB customer who had requested this feature and who had
been testing it independently made the same request.

2. The "proof of concept" patch only supported heap and btree
checking; this supports all index types.

3. Documentation has been added.

4. Tests have been added. They are currently somewhat minimal,
since this is using a whole new technique for testing from any
existing committed tests -- I wanted to make sure that this
approach to testing was OK with everyone before expanding it. If
it is, I assume we will want to move some of the more generic
portions to a .pm file to make it available for other tests.

Basically, this patch aims to limit bloat when there are snapshots
that are kept registered for prolonged periods. The immediate
reason for this is a customer application that keeps read-only
cursors against fairly static data open for prolonged periods, and
automatically fields SQLSTATE 72000 to re-open them if necessary.
When used, it should also provide some protections against extreme
bloat from forgotten "idle in transaction" connections which are
left holding a snapshot.

Once a snapshot reaches the age threshold, it can be terminated if
it reads data modified after the snapshot was built. It is expected
that useful ranges will normally be somewhere from a few hours to a
few days.
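
As a rough illustration of the rule just described (a sketch, not code from the patch), the check can be modeled as two conditions: the snapshot has reached the threshold age, and the page being read was modified after the snapshot was built. The type and function names below are hypothetical, and using a WAL position (LSN) as the "modified since" marker is an assumption made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the patch's data structures. */
typedef struct StoSnapshot
{
    int64_t  taken_at_min;  /* minute at which the snapshot was built */
    uint64_t lsn;           /* assumed: WAL position when it was built */
} StoSnapshot;

typedef struct StoPage
{
    uint64_t lsn;           /* assumed: WAL position of last modification */
} StoPage;

/*
 * Returns true when reading "page" under "snap" should fail with
 * "snapshot too old" (SQLSTATE 72000) under the rule described above.
 * A negative threshold disables the check, mirroring the -1 default.
 */
static bool
sto_snapshot_too_old(const StoSnapshot *snap, const StoPage *page,
                     int64_t now_min, int threshold_min)
{
    if (threshold_min < 0 || snap == NULL)
        return false;               /* feature off, or nothing to check */
    if (now_min - snap->taken_at_min < threshold_min)
        return false;               /* snapshot has not reached the threshold */
    return page->lsn > snap->lsn;   /* page changed after the snapshot */
}
```

In this model an old snapshot reading an unmodified page is never cancelled, which is why long-lived cursors over fairly static data can survive past the threshold.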

By default old_snapshot_threshold is set to -1, which disables the
new behavior.

The customer has been testing a preliminary version of this
time-based patch for several weeks, and is happy with the results
-- it is preventing bloat for them and not generating "snapshot too
old" errors at unexpected times.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Steve Singer

steve@ssinger.info

over 10 years ago

In reply to: Kevin Grittner (#1)

1 attachment(s)

Re: snapshot too old, configured by time

On 08/31/2015 10:07 AM, Kevin Grittner wrote:

Kevin,

I've started to do a review on this patch but I am a bit confused with
some of what I am seeing.

The attached testcase fails if I replace the cursor in your test case with
direct selects from the table.
I would have expected this to generate the snapshot too old error as
well but it doesn't.

# Failed test 'expect "snapshot too old" error'
# at t/002_snapshot_too_old_select.pl line 64.
# got: ''
# expected: '72000'
# Looks like you failed 1 test of 9.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/9 subtests

Am I misunderstanding something or is the patch not working as expected?


Alvaro Herrera

alvherre@2ndquadrant.com

over 10 years ago

In reply to: Kevin Grittner (#1)

Re: snapshot too old, configured by time

I'm starting to read through this and have a few questions.

Kevin Grittner wrote:

4. Tests have been added. They are currently somewhat minimal,
since this is using a whole new technique for testing from any
existing committed tests -- I wanted to make sure that this
approach to testing was OK with everyone before expanding it. If
it is, I assume we will want to move some of the more generic
portions to a .pm file to make it available for other tests.

I think your test module is a bit unsure about whether it's called tso
or sto. I would place it inside src/test/modules, so that buildfarm
runs it automatically; I'm not sure it will pick up things in src/test.
(This is the whole point of src/test/modules, after all). I haven't
written or reviewed our TestLib stuff so I can't comment on whether the
test code is good or not. How long is the test supposed to last?

It bothers me a bit to add #include rel.h to snapmgr.h because of the
new macro definition. It seems out of place. I would instead move the
macro closer to bufmgr headers or rel.h itself perhaps. Also, the
definition isn't accompanied by an explanatory comment (other than why
it is a macro instead of a function).

So if I understand correctly, every call to BufferGetPage needs to have
a TestForOldSnapshot immediately afterwards? It seems easy to later
introduce places that fail to test for old snapshots. What happens if
they do? Does vacuum remove tuples anyway and then the query returns
wrong results? That seems pretty dangerous. Maybe the snapshot could
be an argument of BufferGetPage?

Please have the comments be clearer on what "align to minute boundaries"
means. Is it floor or ceil? Also, in the OldSnapshotControlData big
comment you talk about "length" and then the variable is called "used".
Maybe that should be more consistent, for clarity.

How large is the struct kept in shmem? I don't think the size is
configurable, is it? I would have guessed that the size would depend on
the GUC param ...

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Kevin Grittner

kgrittn@ymail.com

over 10 years ago

In reply to: Steve Singer (#2)

Re: snapshot too old, configured by time

Thanks to both Steve and Álvaro for review and comments. I won't
be able to address all of those within the time frame of the
current CF, so I have moved it to the next CF and flagged it as
"Waiting on Author". I will post a patch to address all comments
before that CF starts. I will discuss now, though.

Steve Singer <steve@ssinger.info> wrote:

The attached testcase fails if I replace the cursor in your test
case with direct selects from the table.
I would have expected this to generate the snapshot too old error
as well but it doesn't.

Good idea for an additional test; it does indeed show a case where
the patch could do better. I've reviewed the code and see how I
can adjust the calculations to fix this.

Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Kevin Grittner wrote:

4. Tests have been added. They are currently somewhat minimal,
since this is using a whole new technique for testing from any
existing committed tests -- I wanted to make sure that this
approach to testing was OK with everyone before expanding it.
If it is, I assume we will want to move some of the more generic
portions to a .pm file to make it available for other tests.

I think your test module is a bit unsure about whether it's
called tso or sto.

Oops. Yeah, will fix. They all should be sto for "snapshot too
old". There is heavy enough use of timestamp fields named ts in the
code that my fingers got confused.

I would place it inside src/test/modules, so that buildfarm
runs it automatically; I'm not sure it will pick up things in
src/test.

As it stands now, the test is getting run as part of `make
check-world`, and it seems like src/test/modules is about testing
separate executables, so I don't think it makes sense to move the
tests -- but I could be convinced that I'm missing something.

I haven't written or reviewed our TestLib stuff so I can't
comment on whether the test code is good or not. How long is the
test supposed to last?

Well, I can't see how to get a good test of some code with a
setting of 0 (snapshot can become "too old" immediately). That
setting may keep parts of the test fast, but to exercise much of
the important code you need to configure to '1min' and have the
time pass the 0 second mark for a minute at various points in the
test; so it will be hard to avoid having this test take a few
minutes.

It bothers me a bit to add #include rel.h to snapmgr.h because of
the new macro definition. It seems out of place.

Yeah, I couldn't find anywhere to put the macro that I entirely
liked.

I would instead move the macro closer to bufmgr headers or rel.h
itself perhaps.

I'll try that, or something similar.

Also, the definition isn't accompanied by an explanatory comment
(other than why it is a macro instead of a function).

Will fix.

So if I understand correctly, every call to BufferGetPage needs
to have a TestForOldSnapshot immediately afterwards?

No, every call that is part of a scan. Access for internal
purposes (such as adding an index entry or vacuuming) does not need
this treatment.

It seems easy to later introduce places that fail to test for old
snapshots. What happens if they do?

That could cause incorrect results for those using the "snapshot
too old" feature, because a query might fail to recognize that one
or more rows it should see are missing.

Does vacuum remove tuples anyway and then the query returns
wrong results?

Right. That.

That seems pretty dangerous. Maybe the snapshot could be an
argument of BufferGetPage?

It would need that, the Relation (or go find it from the Buffer),
and an indication of whether this access is due to a scan.
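
To make the alternative under discussion concrete, here is a minimal sketch of a page fetch that takes the snapshot, the relation, and an access-kind flag, so that the old-snapshot test cannot be forgotten on scan paths. Everything in it is hypothetical (the types, the enum, the function name, and the use of a NULL return in place of raising an error); it is not the signature proposed or adopted in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for Page, Buffer, Snapshot and Relation. */
typedef struct ScanPage     { unsigned long lsn; } ScanPage;
typedef struct ScanBuffer   { ScanPage page; } ScanBuffer;
typedef struct ScanSnapshot { unsigned long lsn; bool past_threshold; } ScanSnapshot;
typedef struct ScanRelation { const char *name; } ScanRelation;

typedef enum
{
    ACCESS_INTERNAL,    /* index insertion, vacuum, and similar */
    ACCESS_SCAN         /* produces user-visible results */
} ScanAccessKind;

/*
 * Fetch the page held in a buffer.  For scan access the old-snapshot test
 * runs here; internal access skips it.  Returning NULL stands in for
 * raising "snapshot too old" in this model.
 */
static ScanPage *
scan_get_page(ScanBuffer *buf, const ScanSnapshot *snap,
              const ScanRelation *rel, ScanAccessKind kind)
{
    (void) rel;     /* a real check would need the relation; unused here */

    if (kind == ACCESS_SCAN && snap != NULL &&
        snap->past_threshold && buf->page.lsn > snap->lsn)
        return NULL;

    return &buf->page;
}
```

The cost of this shape is that every existing caller has to pass the extra arguments, including the many internal call sites that never need the test.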

Please have the comments be clearer on what "align to minute
boundaries" means. Is it floor or ceil?

Also, in the OldSnapshotControlData big comment you talk about
"length" and then the variable is called "used". Maybe that
should be more consistent, for clarity.

Will work on that comment.
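
For readers following the floor-versus-ceil question, the two candidate meanings of "align to minute boundaries" differ as in the sketch below. This thread does not say which one the patch uses, so neither function should be read as the patch's behavior. The function names are hypothetical, and the sketch assumes non-negative timestamps expressed in microseconds.

```c
#include <stdint.h>

#define SKETCH_USECS_PER_MINUTE INT64_C(60000000)

/* Round a non-negative microsecond timestamp down to its minute boundary. */
static int64_t
sketch_minute_floor(int64_t ts_usec)
{
    return ts_usec - (ts_usec % SKETCH_USECS_PER_MINUTE);
}

/* Round up to the next minute boundary; exact boundaries are unchanged. */
static int64_t
sketch_minute_ceil(int64_t ts_usec)
{
    int64_t rem = ts_usec % SKETCH_USECS_PER_MINUTE;

    return rem == 0 ? ts_usec : ts_usec + (SKETCH_USECS_PER_MINUTE - rem);
}
```

A timestamp 90 seconds past some origin floors to the 1-minute mark and ceils to the 2-minute mark, so the two readings can disagree by up to a full minute.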

How large is the struct kept in shmem?

To allow a setting up to 60 days, 338kB.

I don't think the size is configurable, is it?

Not in the posted patch.

I would have guessed that the size would depend on the GUC param ...

I had been trying to make this GUC PGC_SIGHUP, but found a lot of
devils hiding in the details of that. When I gave up on that and
went back to PGC_POSTMASTER I neglected to change this allocation
back. I agree that it should be changed, and that will make the
size depend on the configuration. If the feature is not used, no
significant RAM will be allocated. If, for example, someone wants
to configure to '5h' they would need only about 1.2kB for this
structure.
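
The two sizes quoted above are consistent with one 4-byte entry per minute of configured threshold. That per-minute layout is an inference from the numbers, not something stated in this message, and the helper below is a hypothetical back-of-the-envelope calculation that ignores any fixed header.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Estimated bytes needed for a per-minute map, assuming one 4-byte slot
 * per minute of threshold.  Non-positive thresholds need no map here.
 */
static size_t
sketch_map_bytes(int threshold_minutes)
{
    if (threshold_minutes <= 0)
        return 0;
    return (size_t) threshold_minutes * sizeof(uint32_t);
}
```

With 60 days (86400 minutes) this gives 345600 bytes, or 337.5 KiB, matching the roughly 338kB figure; with 5 hours (300 minutes) it gives 1200 bytes, matching the 1.2kB figure.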

Again, thanks to both of you for the review!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Alvaro Herrera

alvherre@2ndquadrant.com

over 10 years ago

In reply to: Kevin Grittner (#4)

Re: snapshot too old, configured by time

Kevin Grittner wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I would place it inside src/test/modules, so that buildfarm
runs it automatically; I'm not sure it will pick up things in
src/test.

As it stands now, the test is getting run as part of `make
check-world`, and it seems like src/test/modules is about testing
separate executables, so I don't think it makes sense to move the
tests -- but I could be convinced that I'm missing something.

It's not conceived just as a way to test separate executables; maybe it
is at the moment (though I don't think it is?) but if so that's just an
accident. The intention is to have modules that get tested without them
being installed, which wasn't the case when they were in contrib.

The problem with check-world is that buildfarm doesn't run it. We don't
want to set up separate buildfarm modules for each subdir in src/test;
that would be pretty tedious.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Kevin Grittner

kgrittn@ymail.com

about 10 years ago

In reply to: Alvaro Herrera (#5)

1 attachment(s)

Re: snapshot too old, configured by time

On Tuesday, September 15, 2015 12:07 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Kevin Grittner wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I would place it inside src/test/modules, so that buildfarm
runs it automatically; I'm not sure it will pick up things in
src/test.

As it stands now, the test is getting run as part of `make
check-world`, and it seems like src/test/modules is about testing
separate executables, so I don't think it makes sense to move the
tests -- but I could be convinced that I'm missing something.

It's not conceived just as a way to test separate executables; maybe it
is at the moment (though I don't think it is?) but if so that's just an
accident. The intention is to have modules that get tested without them
being installed, which wasn't the case when they were in contrib.

The problem with check-world is that buildfarm doesn't run it. We don't
want to set up separate buildfarm modules for each subdir in src/test;
that would be pretty tedious.

OK, moved.

All other issues raised by Álvaro and Steve have been addressed,
except for this one, which I will argue against:

So if I understand correctly, every call to BufferGetPage needs to have
a TestForOldSnapshot immediately afterwards? It seems easy to later
introduce places that fail to test for old snapshots. What happens if
they do? Does vacuum remove tuples anyway and then the query returns
wrong results? That seems pretty dangerous. Maybe the snapshot could
be an argument of BufferGetPage?

There are 486 occurrences of BufferGetPage in the source code, and
this patch follows 36 of them with TestForOldSnapshot. This only
needs to be done when a page is going to be used for a scan which
produces user-visible results. That is, it is *not* needed for
positioning within indexes to add or vacuum away entries, for heap
inserts or deletions (assuming the row to be deleted has already
been found). It seems wrong to modify about 450 BufferGetPage
references to add a NULL parameter; and if we do want to make that
noop change as insurance, it seems like it should be a separate
patch, since the substance of this patch would be buried under the
volume of that.

I will add this to the November CF.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

snapshot-too-old-v3.diff (text/x-diff)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5081da0..33b225a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1944,6 +1944,42 @@ include_dir 'conf.d'
         </para>
        </listitem>
       </varlistentry>
+
+      <varlistentry id="guc-old-snapshot-threshold" xreflabel="old_snapshot_threshold">
+       <term><varname>old_snapshot_threshold</varname> (<type>integer</type>)
+       <indexterm>
+        <primary><varname>old_snapshot_threshold</> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the minimum time that a snapshot can be used without risk of a
+         <literal>snapshot too old</> error occurring when using the snapshot.
+         This parameter can only be set at server start.
+        </para>
+
+        <para>
+         Beyond the threshold, old data may be vacuumed away.  This can help
+         prevent bloat in the face of snapshots which remain in use for a
+         long time.  To prevent incorrect results due to cleanup of data which
+         would otherwise be visible to the snapshot, an error is generated
+         when the snapshot is older than this threshold and the snapshot is
+         used to read a page which has been modified since the snapshot was
+         built.
+        </para>
+
+        <para>
+         A value of <literal>-1</> disables this feature, and is the default.
+         Useful values for production work probably range from a small number
+         of hours to a few days.  The setting will be coerced to a granularity
+         of minutes, and small numbers (such as <literal>0</> or
+         <literal>1min</>) are only allowed because they may sometimes be
+         useful for testing.  While a setting as high as <literal>60d</> is
+         allowed, please note that in many workloads extreme bloat or
+         transaction ID wraparound may occur in much shorter time frames.
+        </para>
+       </listitem>
+      </varlistentry>
      </variablelist>
     </sect2>
    </sect1>
@@ -2869,6 +2905,10 @@ include_dir 'conf.d'
         You should also consider setting <varname>hot_standby_feedback</>
         on standby server(s) as an alternative to using this parameter.
        </para>
+       <para>
+        This does not prevent cleanup of dead rows which have reached the age
+        specified by <varname>old_snapshot_threshold</>.
+       </para>
       </listitem>
      </varlistentry>
 
@@ -3016,6 +3056,16 @@ include_dir 'conf.d'
         until it eventually reaches the primary.  Standbys make no other use
         of feedback they receive other than to pass upstream.
        </para>
+       <para>
+        This setting does not override the behavior of
+        <varname>old_snapshot_threshold</> on the primary; a snapshot on the
+        standby which exceeds the primary's age threshold can become invalid,
+        resulting in cancellation of transactions on the standby.  This is
+        because <varname>old_snapshot_threshold</> is intended to provide an
+        absolute limit on the time which dead rows can contribute to bloat,
+        which would otherwise be violated because of the configuration of a
+        standby.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 99337b0..6e65a8d 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -96,7 +96,7 @@ brininsert(PG_FUNCTION_ARGS)
 	MemoryContext tupcxt = NULL;
 	MemoryContext oldcxt = NULL;
 
-	revmap = brinRevmapInitialize(idxRel, &pagesPerRange);
+	revmap = brinRevmapInitialize(idxRel, &pagesPerRange, NULL);
 
 	for (;;)
 	{
@@ -113,7 +113,7 @@ brininsert(PG_FUNCTION_ARGS)
 		/* normalize the block number to be the first block in the range */
 		heapBlk = (heapBlk / pagesPerRange) * pagesPerRange;
 		brtup = brinGetTupleForHeapBlock(revmap, heapBlk, &buf, &off, NULL,
-										 BUFFER_LOCK_SHARE);
+										 BUFFER_LOCK_SHARE, NULL);
 
 		/* if range is unsummarized, there's nothing to do */
 		if (!brtup)
@@ -248,7 +248,8 @@ brinbeginscan(PG_FUNCTION_ARGS)
 	scan = RelationGetIndexScan(r, nkeys, norderbys);
 
 	opaque = (BrinOpaque *) palloc(sizeof(BrinOpaque));
-	opaque->bo_rmAccess = brinRevmapInitialize(r, &opaque->bo_pagesPerRange);
+	opaque->bo_rmAccess = brinRevmapInitialize(r, &opaque->bo_pagesPerRange,
+											   scan->xs_snapshot);
 	opaque->bo_bdesc = brin_build_desc(r);
 	scan->opaque = opaque;
 
@@ -333,7 +334,8 @@ bringetbitmap(PG_FUNCTION_ARGS)
 		MemoryContextResetAndDeleteChildren(perRangeCxt);
 
 		tup = brinGetTupleForHeapBlock(opaque->bo_rmAccess, heapBlk, &buf,
-									   &off, &size, BUFFER_LOCK_SHARE);
+									   &off, &size, BUFFER_LOCK_SHARE,
+									   scan->xs_snapshot);
 		if (tup)
 		{
 			tup = brin_copy_tuple(tup, size);
@@ -637,7 +639,7 @@ brinbuild(PG_FUNCTION_ARGS)
 	/*
 	 * Initialize our state, including the deformed tuple state.
 	 */
-	revmap = brinRevmapInitialize(index, &pagesPerRange);
+	revmap = brinRevmapInitialize(index, &pagesPerRange, NULL);
 	state = initialize_brin_buildstate(index, revmap, pagesPerRange);
 
 	/*
@@ -1007,7 +1009,8 @@ summarize_range(IndexInfo *indexInfo, BrinBuildState *state, Relation heapRel,
 		 * the same.)
 		 */
 		phtup = brinGetTupleForHeapBlock(state->bs_rmAccess, heapBlk, &phbuf,
-										 &offset, &phsz, BUFFER_LOCK_SHARE);
+										 &offset, &phsz, BUFFER_LOCK_SHARE,
+										 NULL);
 		/* the placeholder tuple must exist */
 		if (phtup == NULL)
 			elog(ERROR, "missing placeholder tuple");
@@ -1042,7 +1045,7 @@ brinsummarize(Relation index, Relation heapRel, double *numSummarized,
 	BlockNumber pagesPerRange;
 	Buffer		buf;
 
-	revmap = brinRevmapInitialize(index, &pagesPerRange);
+	revmap = brinRevmapInitialize(index, &pagesPerRange, NULL);
 
 	/*
 	 * Scan the revmap to find unsummarized items.
@@ -1057,7 +1060,7 @@ brinsummarize(Relation index, Relation heapRel, double *numSummarized,
 		CHECK_FOR_INTERRUPTS();
 
 		tup = brinGetTupleForHeapBlock(revmap, heapBlk, &buf, &off, NULL,
-									   BUFFER_LOCK_SHARE);
+									   BUFFER_LOCK_SHARE, NULL);
 		if (tup == NULL)
 		{
 			/* no revmap entry for this heap range. Summarize it. */
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 6ddcfda..6e42881 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -68,15 +68,19 @@ static void revmap_physical_extend(BrinRevmap *revmap);
  * brinRevmapTerminate when caller is done with it.
  */
 BrinRevmap *
-brinRevmapInitialize(Relation idxrel, BlockNumber *pagesPerRange)
+brinRevmapInitialize(Relation idxrel, BlockNumber *pagesPerRange,
+					 Snapshot snapshot)
 {
 	BrinRevmap *revmap;
 	Buffer		meta;
 	BrinMetaPageData *metadata;
+	Page		page;
 
 	meta = ReadBuffer(idxrel, BRIN_METAPAGE_BLKNO);
 	LockBuffer(meta, BUFFER_LOCK_SHARE);
-	metadata = (BrinMetaPageData *) PageGetContents(BufferGetPage(meta));
+	page = BufferGetPage(meta);
+	TestForOldSnapshot(snapshot, idxrel, page);
+	metadata = (BrinMetaPageData *) PageGetContents(page);
 
 	revmap = palloc(sizeof(BrinRevmap));
 	revmap->rm_irel = idxrel;
@@ -185,7 +189,8 @@ brinSetHeapBlockItemptr(Buffer buf, BlockNumber pagesPerRange,
  */
 BrinTuple *
 brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
-						 Buffer *buf, OffsetNumber *off, Size *size, int mode)
+						 Buffer *buf, OffsetNumber *off, Size *size, int mode,
+						 Snapshot snapshot)
 {
 	Relation	idxRel = revmap->rm_irel;
 	BlockNumber mapBlk;
@@ -262,6 +267,7 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
 		}
 		LockBuffer(*buf, mode);
 		page = BufferGetPage(*buf);
+		TestForOldSnapshot(snapshot, idxRel, page);
 
 		/* If we land on a revmap page, start over */
 		if (BRIN_IS_REGULAR_PAGE(page))
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index f0ff91a..f68629e 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -71,7 +71,7 @@ ginTraverseLock(Buffer buffer, bool searchMode)
  * is share-locked, and stack->parent is NULL.
  */
 GinBtreeStack *
-ginFindLeafPage(GinBtree btree, bool searchMode)
+ginFindLeafPage(GinBtree btree, bool searchMode, Snapshot snapshot)
 {
 	GinBtreeStack *stack;
 
@@ -90,6 +90,7 @@ ginFindLeafPage(GinBtree btree, bool searchMode)
 		stack->off = InvalidOffsetNumber;
 
 		page = BufferGetPage(stack->buffer);
+		TestForOldSnapshot(snapshot, btree->index, page);
 
 		access = ginTraverseLock(stack->buffer, searchMode);
 
@@ -116,6 +117,7 @@ ginFindLeafPage(GinBtree btree, bool searchMode)
 			stack->buffer = ginStepRight(stack->buffer, btree->index, access);
 			stack->blkno = rightlink;
 			page = BufferGetPage(stack->buffer);
+			TestForOldSnapshot(snapshot, btree->index, page);
 
 			if (!searchMode && GinPageIsIncompleteSplit(page))
 				ginFinishSplit(btree, stack, false, NULL);
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index ec8c94b..597f335 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -1820,7 +1820,7 @@ ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
 	{
 		/* search for the leaf page where the first item should go to */
 		btree.itemptr = insertdata.items[insertdata.curitem];
-		stack = ginFindLeafPage(&btree, false);
+		stack = ginFindLeafPage(&btree, false, NULL);
 
 		ginInsertValue(&btree, stack, &insertdata, buildStats);
 	}
@@ -1830,7 +1830,8 @@ ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
  * Starts a new scan on a posting tree.
  */
 GinBtreeStack *
-ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno)
+ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno,
+						Snapshot snapshot)
 {
 	GinBtreeStack *stack;
 
@@ -1838,7 +1839,7 @@ ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno)
 
 	btree->fullScan = TRUE;
 
-	stack = ginFindLeafPage(btree, TRUE);
+	stack = ginFindLeafPage(btree, TRUE, snapshot);
 
 	return stack;
 }
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 54b2db8..7e3be27 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -19,6 +19,7 @@
 #include "miscadmin.h"
 #include "utils/datum.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 
 /* GUC parameter */
 int			GinFuzzySearchLimit = 0;
@@ -63,7 +64,7 @@ moveRightIfItNeeded(GinBtreeData *btree, GinBtreeStack *stack)
  */
 static void
 scanPostingTree(Relation index, GinScanEntry scanEntry,
-				BlockNumber rootPostingTree)
+				BlockNumber rootPostingTree, Snapshot snapshot)
 {
 	GinBtreeData btree;
 	GinBtreeStack *stack;
@@ -71,7 +72,7 @@ scanPostingTree(Relation index, GinScanEntry scanEntry,
 	Page		page;
 
 	/* Descend to the leftmost leaf page */
-	stack = ginScanBeginPostingTree(&btree, index, rootPostingTree);
+	stack = ginScanBeginPostingTree(&btree, index, rootPostingTree, snapshot);
 	buffer = stack->buffer;
 	IncrBufferRefCount(buffer); /* prevent unpin in freeGinBtreeStack */
 
@@ -114,7 +115,7 @@ scanPostingTree(Relation index, GinScanEntry scanEntry,
  */
 static bool
 collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
-				   GinScanEntry scanEntry)
+				   GinScanEntry scanEntry, Snapshot snapshot)
 {
 	OffsetNumber attnum;
 	Form_pg_attribute attr;
@@ -145,6 +146,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
 			return true;
 
 		page = BufferGetPage(stack->buffer);
+		TestForOldSnapshot(snapshot, btree->index, page);
 		itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, stack->off));
 
 		/*
@@ -224,7 +226,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
 			LockBuffer(stack->buffer, GIN_UNLOCK);
 
 			/* Collect all the TIDs in this entry's posting tree */
-			scanPostingTree(btree->index, scanEntry, rootPostingTree);
+			scanPostingTree(btree->index, scanEntry, rootPostingTree, snapshot);
 
 			/*
 			 * We lock again the entry page and while it was unlocked insert
@@ -291,7 +293,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
  * Start* functions setup beginning state of searches: finds correct buffer and pins it.
  */
 static void
-startScanEntry(GinState *ginstate, GinScanEntry entry)
+startScanEntry(GinState *ginstate, GinScanEntry entry, Snapshot snapshot)
 {
 	GinBtreeData btreeEntry;
 	GinBtreeStack *stackEntry;
@@ -316,7 +318,7 @@ restartScanEntry:
 	ginPrepareEntryScan(&btreeEntry, entry->attnum,
 						entry->queryKey, entry->queryCategory,
 						ginstate);
-	stackEntry = ginFindLeafPage(&btreeEntry, true);
+	stackEntry = ginFindLeafPage(&btreeEntry, true, snapshot);
 	page = BufferGetPage(stackEntry->buffer);
 	needUnlock = TRUE;
 
@@ -333,7 +335,7 @@ restartScanEntry:
 		 * for the entry type.
 		 */
 		btreeEntry.findItem(&btreeEntry, stackEntry);
-		if (collectMatchBitmap(&btreeEntry, stackEntry, entry) == false)
+		if (!collectMatchBitmap(&btreeEntry, stackEntry, entry, snapshot))
 		{
 			/*
 			 * GIN tree was seriously restructured, so we will cleanup all
@@ -381,7 +383,7 @@ restartScanEntry:
 			needUnlock = FALSE;
 
 			stack = ginScanBeginPostingTree(&entry->btree, ginstate->index,
-											rootPostingTree);
+											rootPostingTree, snapshot);
 			entry->buffer = stack->buffer;
 
 			/*
@@ -533,7 +535,7 @@ startScan(IndexScanDesc scan)
 	uint32		i;
 
 	for (i = 0; i < so->totalentries; i++)
-		startScanEntry(ginstate, so->entries[i]);
+		startScanEntry(ginstate, so->entries[i], scan->xs_snapshot);
 
 	if (GinFuzzySearchLimit > 0)
 	{
@@ -578,7 +580,8 @@ startScan(IndexScanDesc scan)
  * keep it pinned to prevent interference with vacuum.
  */
 static void
-entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advancePast)
+entryLoadMoreItems(GinState *ginstate, GinScanEntry entry,
+				   ItemPointerData advancePast, Snapshot snapshot)
 {
 	Page		page;
 	int			i;
@@ -622,7 +625,7 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advan
 			entry->btree.itemptr.ip_posid++;
 		}
 		entry->btree.fullScan = false;
-		stack = ginFindLeafPage(&entry->btree, true);
+		stack = ginFindLeafPage(&entry->btree, true, snapshot);
 
 		/* we don't need the stack, just the buffer. */
 		entry->buffer = stack->buffer;
@@ -732,7 +735,7 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advan
  */
 static void
 entryGetItem(GinState *ginstate, GinScanEntry entry,
-			 ItemPointerData advancePast)
+			 ItemPointerData advancePast, Snapshot snapshot)
 {
 	Assert(!entry->isFinished);
 
@@ -855,7 +858,7 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 			/* If we've processed the current batch, load more items */
 			while (entry->offset >= entry->nlist)
 			{
-				entryLoadMoreItems(ginstate, entry, advancePast);
+				entryLoadMoreItems(ginstate, entry, advancePast, snapshot);
 
 				if (entry->isFinished)
 				{
@@ -894,7 +897,7 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
  */
 static void
 keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
-		   ItemPointerData advancePast)
+		   ItemPointerData advancePast, Snapshot snapshot)
 {
 	ItemPointerData minItem;
 	ItemPointerData curPageLossy;
@@ -941,7 +944,7 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
 		 */
 		if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
 		{
-			entryGetItem(ginstate, entry, advancePast);
+			entryGetItem(ginstate, entry, advancePast, snapshot);
 			if (entry->isFinished)
 				continue;
 		}
@@ -999,7 +1002,7 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
 
 		if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
 		{
-			entryGetItem(ginstate, entry, advancePast);
+			entryGetItem(ginstate, entry, advancePast, snapshot);
 			if (entry->isFinished)
 				continue;
 		}
@@ -1208,7 +1211,8 @@ scanGetItem(IndexScanDesc scan, ItemPointerData advancePast,
 			GinScanKey	key = so->keys + i;
 
 			/* Fetch the next item for this key that is > advancePast. */
-			keyGetItem(&so->ginstate, so->tempCtx, key, advancePast);
+			keyGetItem(&so->ginstate, so->tempCtx, key, advancePast,
+					   scan->xs_snapshot);
 
 			if (key->isFinished)
 				return false;
@@ -1330,6 +1334,7 @@ scanGetCandidate(IndexScanDesc scan, pendingPosition *pos)
 	for (;;)
 	{
 		page = BufferGetPage(pos->pendingBuffer);
+		TestForOldSnapshot(scan->xs_snapshot, scan->indexRelation, page);
 
 		maxoff = PageGetMaxOffsetNumber(page);
 		if (pos->firstOffset > maxoff)
@@ -1510,6 +1515,7 @@ collectMatchesForHeapRow(IndexScanDesc scan, pendingPosition *pos)
 			   sizeof(bool) * (pos->lastOffset - pos->firstOffset));
 
 		page = BufferGetPage(pos->pendingBuffer);
+		TestForOldSnapshot(scan->xs_snapshot, scan->indexRelation, page);
 
 		for (i = 0; i < so->nkeys; i++)
 		{
@@ -1696,12 +1702,15 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
 	int			i;
 	pendingPosition pos;
 	Buffer		metabuffer = ReadBuffer(scan->indexRelation, GIN_METAPAGE_BLKNO);
+	Page		page;
 	BlockNumber blkno;
 
 	*ntids = 0;
 
 	LockBuffer(metabuffer, GIN_SHARE);
-	blkno = GinPageGetMeta(BufferGetPage(metabuffer))->head;
+	page = BufferGetPage(metabuffer);
+	TestForOldSnapshot(scan->xs_snapshot, scan->indexRelation, page);
+	blkno = GinPageGetMeta(page)->head;
 
 	/*
 	 * fetch head of list before unlocking metapage. head page must be pinned
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 49e9185..9c999c1 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -192,7 +192,7 @@ ginEntryInsert(GinState *ginstate,
 
 	ginPrepareEntryScan(&btree, attnum, key, category, ginstate);
 
-	stack = ginFindLeafPage(&btree, false);
+	stack = ginFindLeafPage(&btree, false, NULL);
 	page = BufferGetPage(stack->buffer);
 
 	if (btree.findItem(&btree, stack))
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index ce8e582..e41756a 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -337,6 +337,7 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 	LockBuffer(buffer, GIST_SHARE);
 	gistcheckpage(scan->indexRelation, buffer);
 	page = BufferGetPage(buffer);
+	TestForOldSnapshot(scan->xs_snapshot, r, page);
 	opaque = GistPageGetOpaque(page);
 
 	/*
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 24b06a5..356dc38 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -252,6 +252,7 @@ hashgettuple(PG_FUNCTION_ARGS)
 		buf = so->hashso_curbuf;
 		Assert(BufferIsValid(buf));
 		page = BufferGetPage(buf);
+		TestForOldSnapshot(scan->xs_snapshot, rel, page);
 		maxoffnum = PageGetMaxOffsetNumber(page);
 		for (offnum = ItemPointerGetOffsetNumber(current);
 			 offnum <= maxoffnum;
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index e9d7b7f..3c57170 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -188,7 +188,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	/* Read the metapage */
 	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	metap = HashPageGetMeta(BufferGetPage(metabuf));
+	page = BufferGetPage(metabuf);
+	TestForOldSnapshot(scan->xs_snapshot, rel, page);
+	metap = HashPageGetMeta(page);
 
 	/*
 	 * Loop until we get a lock on the correct target bucket.
@@ -241,6 +243,7 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	/* Fetch the primary bucket page for the bucket */
 	buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
 	page = BufferGetPage(buf);
+	TestForOldSnapshot(scan->xs_snapshot, rel, page);
 	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
 	Assert(opaque->hasho_bucket == bucket);
 
@@ -347,6 +350,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 					_hash_readnext(rel, &buf, &page, &opaque);
 					if (BufferIsValid(buf))
 					{
+						TestForOldSnapshot(scan->xs_snapshot, rel, page);
 						maxoff = PageGetMaxOffsetNumber(page);
 						offnum = _hash_binsearch(page, so->hashso_sk_hash);
 					}
@@ -388,6 +392,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 					_hash_readprev(rel, &buf, &page, &opaque);
 					if (BufferIsValid(buf))
 					{
+						TestForOldSnapshot(scan->xs_snapshot, rel, page);
 						maxoff = PageGetMaxOffsetNumber(page);
 						offnum = _hash_binsearch_last(page, so->hashso_sk_hash);
 					}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index bcf9871..bd9cc4b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -382,6 +382,7 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
 	LockBuffer(buffer, BUFFER_LOCK_SHARE);
 
 	dp = (Page) BufferGetPage(buffer);
+	TestForOldSnapshot(snapshot, scan->rs_rd, dp);
 	lines = PageGetMaxOffsetNumber(dp);
 	ntup = 0;
 
@@ -512,6 +513,7 @@ heapgettup(HeapScanDesc scan,
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(snapshot, scan->rs_rd, dp);
 		lines = PageGetMaxOffsetNumber(dp);
 		/* page and lineoff now reference the physically next tid */
 
@@ -554,6 +556,7 @@ heapgettup(HeapScanDesc scan,
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(snapshot, scan->rs_rd, dp);
 		lines = PageGetMaxOffsetNumber(dp);
 
 		if (!scan->rs_inited)
@@ -588,6 +591,7 @@ heapgettup(HeapScanDesc scan,
 
 		/* Since the tuple was previously fetched, needn't lock page here */
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(snapshot, scan->rs_rd, dp);
 		lineoff = ItemPointerGetOffsetNumber(&(tuple->t_self));
 		lpp = PageGetItemId(dp, lineoff);
 		Assert(ItemIdIsNormal(lpp));
@@ -712,6 +716,7 @@ heapgettup(HeapScanDesc scan,
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(snapshot, scan->rs_rd, dp);
 		lines = PageGetMaxOffsetNumber((Page) dp);
 		linesleft = lines;
 		if (backward)
@@ -786,6 +791,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 		}
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp);
 		lines = scan->rs_ntuples;
 		/* page and lineindex now reference the next visible tid */
 
@@ -826,6 +832,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 		}
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp);
 		lines = scan->rs_ntuples;
 
 		if (!scan->rs_inited)
@@ -859,6 +866,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 
 		/* Since the tuple was previously fetched, needn't lock page here */
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp);
 		lineoff = ItemPointerGetOffsetNumber(&(tuple->t_self));
 		lpp = PageGetItemId(dp, lineoff);
 		Assert(ItemIdIsNormal(lpp));
@@ -973,6 +981,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 		heapgetpage(scan, page);
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp);
 		lines = scan->rs_ntuples;
 		linesleft = lines;
 		if (backward)
@@ -1648,6 +1657,7 @@ heap_fetch(Relation relation,
 	 */
 	LockBuffer(buffer, BUFFER_LOCK_SHARE);
 	page = BufferGetPage(buffer);
+	TestForOldSnapshot(snapshot, relation, page);
 
 	/*
 	 * We'd better check for out-of-range offnum in case of VACUUM since the
@@ -1977,6 +1987,7 @@ heap_get_latest_tid(Relation relation,
 		buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&ctid));
 		LockBuffer(buffer, BUFFER_LOCK_SHARE);
 		page = BufferGetPage(buffer);
+		TestForOldSnapshot(snapshot, relation, page);
 
 		/*
 		 * Check for bogus item number.  This is not treated as an error
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 563e5c3..3218fd7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -92,12 +92,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 * need to use the horizon that includes slots, otherwise the data-only
 	 * horizon can be used. Note that the toast relation of user defined
 	 * relations are *not* considered catalog relations.
+	 *
+	 * It is OK to apply the old snapshot limit before acquiring the cleanup
+	 * lock because the worst that can happen is that we are not quite as
+	 * aggressive about the cleanup (by however many transaction IDs are
+	 * consumed between this point and acquiring the lock).  This allows us to
+	 * save significant overhead in the case where the page is found not to be
+	 * prunable.
 	 */
 	if (IsCatalogRelation(relation) ||
 		RelationIsAccessibleInLogicalDecoding(relation))
 		OldestXmin = RecentGlobalXmin;
 	else
-		OldestXmin = RecentGlobalDataXmin;
+		OldestXmin =
+				TransactionIdLimitedForOldSnapshots(RecentGlobalDataXmin,
+													relation);
 
 	Assert(TransactionIdIsValid(OldestXmin));
 
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 77c2fdf..8b9b1b2 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -119,7 +119,7 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 
 top:
 	/* find the first page containing this key */
-	stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE);
+	stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE, NULL);
 
 	offset = InvalidOffsetNumber;
 
@@ -135,7 +135,7 @@ top:
 	 * precise description.
 	 */
 	buf = _bt_moveright(rel, buf, natts, itup_scankey, false,
-						true, stack, BT_WRITE);
+						true, stack, BT_WRITE, NULL);
 
 	/*
 	 * If we're not allowing duplicates, make sure the key isn't already in
@@ -1671,7 +1671,8 @@ _bt_insert_parent(Relation rel,
 			elog(DEBUG2, "concurrent ROOT page split");
 			lpageop = (BTPageOpaque) PageGetSpecialPointer(page);
 			/* Find the leftmost page at the next level up */
-			pbuf = _bt_get_endpoint(rel, lpageop->btpo.level + 1, false);
+			pbuf = _bt_get_endpoint(rel, lpageop->btpo.level + 1, false,
+									NULL);
 			/* Set up a phony stack entry pointing there */
 			stack = &fakestack;
 			stack->bts_blkno = BufferGetBlockNumber(pbuf);
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 6e65db9..0ef27ef 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -1255,7 +1255,7 @@ _bt_pagedel(Relation rel, Buffer buf)
 				itup_scankey = _bt_mkscankey(rel, targetkey);
 				/* find the leftmost leaf page containing this key */
 				stack = _bt_search(rel, rel->rd_rel->relnatts, itup_scankey,
-								   false, &lbuf, BT_READ);
+								   false, &lbuf, BT_READ, NULL);
 				/* don't need a pin on the page */
 				_bt_relbuf(rel, lbuf);
 
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index d69a057..bb0bf72 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -30,7 +30,7 @@ static bool _bt_readpage(IndexScanDesc scan, ScanDirection dir,
 static void _bt_saveitem(BTScanOpaque so, int itemIndex,
 			 OffsetNumber offnum, IndexTuple itup);
 static bool _bt_steppage(IndexScanDesc scan, ScanDirection dir);
-static Buffer _bt_walk_left(Relation rel, Buffer buf);
+static Buffer _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot);
 static bool _bt_endpoint(IndexScanDesc scan, ScanDirection dir);
 static void _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp);
 
@@ -79,6 +79,10 @@ _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp)
  * address of the leaf-page buffer, which is read-locked and pinned.
  * No locks are held on the parent pages, however!
  *
+ * If the snapshot parameter is not NULL, "old snapshot" checking will take
+ * place during the descent through the tree.  This is not needed when
+ * positioning for an insert or delete, so NULL is used for those cases.
+ *
  * NOTE that the returned buffer is read-locked regardless of the access
  * parameter.  However, access = BT_WRITE will allow an empty root page
  * to be created and returned.  When access = BT_READ, an empty index
@@ -87,7 +91,7 @@ _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp)
  */
 BTStack
 _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
-		   Buffer *bufP, int access)
+		   Buffer *bufP, int access, Snapshot snapshot)
 {
 	BTStack		stack_in = NULL;
 
@@ -96,7 +100,9 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
 
 	/* If index is empty and access = BT_READ, no root page is created. */
 	if (!BufferIsValid(*bufP))
+	{
 		return (BTStack) NULL;
+	}
 
 	/* Loop iterates once per level descended in the tree */
 	for (;;)
@@ -124,7 +130,7 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
 		 */
 		*bufP = _bt_moveright(rel, *bufP, keysz, scankey, nextkey,
 							  (access == BT_WRITE), stack_in,
-							  BT_READ);
+							  BT_READ, snapshot);
 
 		/* if this is a leaf page, we're done */
 		page = BufferGetPage(*bufP);
@@ -197,6 +203,10 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
  * On entry, we have the buffer pinned and a lock of the type specified by
  * 'access'.  If we move right, we release the buffer and lock and acquire
  * the same on the right sibling.  Return value is the buffer we stop at.
+ *
+ * If the snapshot parameter is not NULL, "old snapshot" checking will take
+ * place on each page examined while moving right.  This is not needed when
+ * positioning for an insert or delete, so NULL is used for those cases.
  */
 Buffer
 _bt_moveright(Relation rel,
@@ -206,7 +216,8 @@ _bt_moveright(Relation rel,
 			  bool nextkey,
 			  bool forupdate,
 			  BTStack stack,
-			  int access)
+			  int access,
+			  Snapshot snapshot)
 {
 	Page		page;
 	BTPageOpaque opaque;
@@ -232,6 +243,7 @@ _bt_moveright(Relation rel,
 	for (;;)
 	{
 		page = BufferGetPage(buf);
+		TestForOldSnapshot(snapshot, rel, page);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 		if (P_RIGHTMOST(opaque))
@@ -970,7 +982,8 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
 	 * Use the manufactured insertion scan key to descend the tree and
 	 * position ourselves on the target leaf page.
 	 */
-	stack = _bt_search(rel, keysCount, scankeys, nextkey, &buf, BT_READ);
+	stack = _bt_search(rel, keysCount, scankeys, nextkey, &buf, BT_READ,
+					   scan->xs_snapshot);
 
 	/* don't need to keep the stack around... */
 	_bt_freestack(stack);
@@ -1363,6 +1376,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 			so->currPos.buf = _bt_getbuf(rel, blkno, BT_READ);
 			/* check for deleted page */
 			page = BufferGetPage(so->currPos.buf);
+			TestForOldSnapshot(scan->xs_snapshot, rel, page);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 			if (!P_IGNORE(opaque))
 			{
@@ -1421,7 +1435,8 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 			}
 
 			/* Step to next physical page */
-			so->currPos.buf = _bt_walk_left(rel, so->currPos.buf);
+			so->currPos.buf = _bt_walk_left(rel, so->currPos.buf,
+											scan->xs_snapshot);
 
 			/* if we're physically at end of index, return failure */
 			if (so->currPos.buf == InvalidBuffer)
@@ -1436,6 +1451,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 			 * and do it all again.
 			 */
 			page = BufferGetPage(so->currPos.buf);
+			TestForOldSnapshot(scan->xs_snapshot, rel, page);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 			if (!P_IGNORE(opaque))
 			{
@@ -1469,7 +1485,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
  * again if it's important.
  */
 static Buffer
-_bt_walk_left(Relation rel, Buffer buf)
+_bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot)
 {
 	Page		page;
 	BTPageOpaque opaque;
@@ -1499,6 +1515,7 @@ _bt_walk_left(Relation rel, Buffer buf)
 		CHECK_FOR_INTERRUPTS();
 		buf = _bt_getbuf(rel, blkno, BT_READ);
 		page = BufferGetPage(buf);
+		TestForOldSnapshot(snapshot, rel, page);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 		/*
@@ -1525,12 +1542,14 @@ _bt_walk_left(Relation rel, Buffer buf)
 			blkno = opaque->btpo_next;
 			buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
 			page = BufferGetPage(buf);
+			TestForOldSnapshot(snapshot, rel, page);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		}
 
 		/* Return to the original page to see what's up */
 		buf = _bt_relandgetbuf(rel, buf, obknum, BT_READ);
 		page = BufferGetPage(buf);
+		TestForOldSnapshot(snapshot, rel, page);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		if (P_ISDELETED(opaque))
 		{
@@ -1548,6 +1567,7 @@ _bt_walk_left(Relation rel, Buffer buf)
 				blkno = opaque->btpo_next;
 				buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
 				page = BufferGetPage(buf);
+				TestForOldSnapshot(snapshot, rel, page);
 				opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 				if (!P_ISDELETED(opaque))
 					break;
@@ -1584,7 +1604,8 @@ _bt_walk_left(Relation rel, Buffer buf)
  * The returned buffer is pinned and read-locked.
  */
 Buffer
-_bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
+_bt_get_endpoint(Relation rel, uint32 level, bool rightmost,
+				 Snapshot snapshot)
 {
 	Buffer		buf;
 	Page		page;
@@ -1607,6 +1628,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
 		return InvalidBuffer;
 
 	page = BufferGetPage(buf);
+	TestForOldSnapshot(snapshot, rel, page);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 	for (;;)
@@ -1626,6 +1648,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
 					 RelationGetRelationName(rel));
 			buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
 			page = BufferGetPage(buf);
+			TestForOldSnapshot(snapshot, rel, page);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		}
 
@@ -1678,7 +1701,8 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir)
 	 * version of _bt_search().  We don't maintain a stack since we know we
 	 * won't need it.
 	 */
-	buf = _bt_get_endpoint(rel, 0, ScanDirectionIsBackward(dir));
+	buf = _bt_get_endpoint(rel, 0, ScanDirectionIsBackward(dir),
+						   scan->xs_snapshot);
 
 	if (!BufferIsValid(buf))
 	{
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 8a0d909..5aa6740 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -319,7 +319,7 @@ spgLeafTest(Relation index, SpGistScanOpaque so,
  */
 static void
 spgWalk(Relation index, SpGistScanOpaque so, bool scanWholeIndex,
-		storeRes_func storeRes)
+		storeRes_func storeRes, Snapshot snapshot)
 {
 	Buffer		buffer = InvalidBuffer;
 	bool		reportedSome = false;
@@ -360,6 +360,7 @@ redirect:
 		/* else new pointer points to the same page, no work needed */
 
 		page = BufferGetPage(buffer);
+		TestForOldSnapshot(snapshot, index, page);
 
 		isnull = SpGistPageStoresNulls(page) ? true : false;
 
@@ -584,7 +585,7 @@ spggetbitmap(PG_FUNCTION_ARGS)
 	so->tbm = tbm;
 	so->ntids = 0;
 
-	spgWalk(scan->indexRelation, so, true, storeBitmap);
+	spgWalk(scan->indexRelation, so, true, storeBitmap, scan->xs_snapshot);
 
 	PG_RETURN_INT64(so->ntids);
 }
@@ -645,7 +646,8 @@ spggettuple(PG_FUNCTION_ARGS)
 		}
 		so->iPtr = so->nPtrs = 0;
 
-		spgWalk(scan->indexRelation, so, false, storeGettuple);
+		spgWalk(scan->indexRelation, so, false, storeGettuple,
+				scan->xs_snapshot);
 
 		if (so->nPtrs == 0)
 			break;				/* must have completed scan */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 6d55148..ca41ecf 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -489,7 +489,8 @@ vacuum_set_xid_limits(Relation rel,
 	 * working on a particular table at any time, and that each vacuum is
 	 * always an independent transaction.
 	 */
-	*oldestXmin = GetOldestXmin(rel, true);
+	*oldestXmin =
+		TransactionIdLimitedForOldSnapshots(GetOldestXmin(rel, true), rel);
 
 	Assert(TransactionIdIsNormal(*oldestXmin));
 
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index a01cfb4..c091d14 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -270,7 +270,8 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
 	if (possibly_freeable > 0 &&
 		(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
-		 possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION))
+		 possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION) &&
+		old_snapshot_threshold < 0)
 		lazy_truncate_heap(onerel, vacrelstats);
 
 	/* Vacuum the Free Space Map */
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 32ac58f..d73ffd3 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -43,6 +43,7 @@
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "storage/spin.h"
+#include "utils/snapmgr.h"
 
 
 shmem_startup_hook_type shmem_startup_hook = NULL;
@@ -136,6 +137,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, ReplicationOriginShmemSize());
 		size = add_size(size, WalSndShmemSize());
 		size = add_size(size, WalRcvShmemSize());
+		size = add_size(size, SnapMgrShmemSize());
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
@@ -247,6 +249,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	/*
 	 * Set up other modules that need some shared memory space
 	 */
+	SnapMgrInit();
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index b13fe03..ffe01c5 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1749,6 +1749,15 @@ GetSnapshotData(Snapshot snapshot)
 	snapshot->regd_count = 0;
 	snapshot->copied = false;
 
+	/*
+	 * Capture the current time and WAL stream location in case this snapshot
+	 * becomes old enough to need to fall back on the special "old snapshot"
+	 * logic.
+	 */
+	snapshot->lsn = GetXLogInsertRecPtr();
+	snapshot->whenTaken = GetSnapshotCurrentTimestamp();
+	MaintainOldSnapshotTimeMapping(snapshot->whenTaken, xmin);
+
 	return snapshot;
 }
 
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index c557cb6..f8996cd 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -46,3 +46,4 @@ CommitTsControlLock					38
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
+OldSnapshotTimeMapLock				42
diff --git a/src/backend/utils/errcodes.txt b/src/backend/utils/errcodes.txt
index 7b97d45..241998d 100644
--- a/src/backend/utils/errcodes.txt
+++ b/src/backend/utils/errcodes.txt
@@ -413,6 +413,10 @@ Section: Class 58 - System Error (errors external to PostgreSQL itself)
 58P01    E    ERRCODE_UNDEFINED_FILE                                         undefined_file
 58P02    E    ERRCODE_DUPLICATE_FILE                                         duplicate_file
 
+Section: Class 72 - Snapshot Failure
+# (class borrowed from Oracle)
+72000    E    ERRCODE_SNAPSHOT_TOO_OLD                                       snapshot_too_old
+
 Section: Class F0 - Configuration File Error
 
 # (PostgreSQL-specific error class)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 71090f2..5fd6fe7 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2558,6 +2558,17 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
+		{"old_snapshot_threshold", PGC_POSTMASTER, RESOURCES_ASYNCHRONOUS,
+			gettext_noop("Time before a snapshot is too old to read pages changed after the snapshot was taken."),
+			gettext_noop("A value of -1 disables this feature."),
+			GUC_UNIT_MIN
+		},
+		&old_snapshot_threshold,
+		-1, -1, MINS_PER_HOUR * HOURS_PER_DAY * 60,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"tcp_keepalives_idle", PGC_USERSET, CLIENT_CONN_OTHER,
 			gettext_noop("Time between issuing TCP keepalives."),
 			gettext_noop("A value of 0 uses the system default."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index dcf929f..ba27169 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -165,6 +165,8 @@
 #effective_io_concurrency = 1		# 1-1000; 0 disables prefetching
 #max_worker_processes = 8
 #max_parallel_degree = 0		# max number of worker processes per node
+#old_snapshot_threshold = -1		# 1min-60d; -1 disables; 0 is immediate
+									# (change requires restart)
 
 
 #------------------------------------------------------------------------------
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 074935c..13c58d2 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -46,14 +46,18 @@
 
 #include "access/transam.h"
 #include "access/xact.h"
+#include "access/xlog.h"
+#include "catalog/catalog.h"
 #include "lib/pairingheap.h"
 #include "miscadmin.h"
 #include "storage/predicate.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/sinval.h"
+#include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
@@ -61,6 +65,62 @@
 
 
 /*
+ * GUC parameters
+ */
+int			old_snapshot_threshold;		/* number of minutes, -1 disables */
+
+/*
+ * Structure for dealing with old_snapshot_threshold implementation.
+ */
+typedef struct OldSnapshotControlData
+{
+	/*
+	 * Variables for old snapshot handling are shared among processes and are
+	 * only allowed to move forward.
+	 */
+	slock_t		mutex_current;			/* protect current timestamp */
+	int64		current_timestamp;		/* latest snapshot timestamp */
+	slock_t		mutex_threshold;		/* protect threshold fields */
+	int64		threshold_timestamp;	/* earlier snapshot is old */
+	TransactionId threshold_xid;		/* earlier xid may be gone */
+
+	/*
+	 * Keep one xid per minute for old snapshot error handling.
+	 *
+	 * Use a circular buffer with a head offset, a count of entries currently
+	 * used, and a timestamp corresponding to the xid at the head offset.  A
+	 * count_used value of zero means that there are no times stored; a
+	 * count_used value of old_snapshot_threshold means that the buffer is
+	 * full and the head must be advanced to add new entries.  Use timestamps
+	 * aligned to minute boundaries, since that seems less surprising than
+	 * aligning based on the first usage timestamp.
+	 *
+	 * It is OK if the xid stored for a given time slot dates from before
+	 * that slot's nominal time (the head timestamp plus one minute per
+	 * entry of possibly-wrapped distance from the head offset), since that
+	 * just results in the vacuuming of old tuples being slightly less
+	 * aggressive.  It would not be OK for it to be off in the other
+	 * direction, since it might result in vacuuming tuples that are still
+	 * expected to be there.
+	 *
+	 * Use of an SLRU was considered but not chosen because it is more
+	 * heavyweight than is needed for this, and would probably not be any less
+	 * code to implement.
+	 *
+	 * Persistence is not needed.
+	 */
+	int			head_offset;		/* subscript of oldest tracked time */
+	int64		head_timestamp;		/* time corresponding to head xid */
+	int			count_used;			/* how many slots are in use */
+	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
+}	OldSnapshotControlData;
+
+typedef struct OldSnapshotControlData *OldSnapshotControl;
+
+static volatile OldSnapshotControl oldSnapshotControl;
+
+
+/*
  * CurrentSnapshot points to the only snapshot taken in transaction-snapshot
  * mode, and to the latest one taken in a read-committed transaction.
  * SecondarySnapshot is a snapshot that's always up-to-date as of the current
@@ -153,6 +213,7 @@ static Snapshot FirstXactSnapshot = NULL;
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
+static int64 AlignTimestampToMinuteBoundary(int64 ts);
 static Snapshot CopySnapshot(Snapshot snapshot);
 static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
@@ -174,6 +235,47 @@ typedef struct SerializedSnapshotData
 	CommandId	curcid;
 } SerializedSnapshotData;
 
+Size
+SnapMgrShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(OldSnapshotControlData, xid_by_minute);
+	if (old_snapshot_threshold > 0)
+		size = add_size(size, mul_size(sizeof(TransactionId),
+									   old_snapshot_threshold));
+
+	return size;
+}
+
+/*
+ * Initialize for managing old snapshot detection.
+ */
+void
+SnapMgrInit(void)
+{
+	bool		found;
+
+	/*
+	 * Create or attach to the OldSnapshotControl structure.
+	 */
+	oldSnapshotControl = (OldSnapshotControl)
+		ShmemInitStruct("OldSnapshotControlData",
+						SnapMgrShmemSize(), &found);
+
+	if (!found)
+	{
+		SpinLockInit(&oldSnapshotControl->mutex_current);
+		oldSnapshotControl->current_timestamp = 0;
+		SpinLockInit(&oldSnapshotControl->mutex_threshold);
+		oldSnapshotControl->threshold_timestamp = 0;
+		oldSnapshotControl->threshold_xid = InvalidTransactionId;
+		oldSnapshotControl->head_offset = 0;
+		oldSnapshotControl->head_timestamp = 0;
+		oldSnapshotControl->count_used = 0;
+	}
+}
+
 /*
  * GetTransactionSnapshot
  *		Get the appropriate snapshot for a new query in a transaction.
@@ -1405,6 +1507,259 @@ ThereAreNoPriorRegisteredSnapshots(void)
 	return false;
 }
 
+
+/*
+ * Return an int64 timestamp which is exactly on a minute boundary.
+ *
+ * If the argument is already aligned, return that value, otherwise move to
+ * the next minute boundary following the given time.
+ */
+static int64
+AlignTimestampToMinuteBoundary(int64 ts)
+{
+	int64		retval = ts + (USECS_PER_MINUTE - 1);
+
+	return retval - (retval % USECS_PER_MINUTE);
+}
+
+/*
+ * Get current timestamp for snapshots as int64 that never moves backward.
+ */
+int64
+GetSnapshotCurrentTimestamp(void)
+{
+	int64		now = GetCurrentIntegerTimestamp();
+
+	/*
+	 * Don't let time move backward; if it hasn't advanced, use the old value.
+	 */
+	SpinLockAcquire(&oldSnapshotControl->mutex_current);
+	if (now <= oldSnapshotControl->current_timestamp)
+		now = oldSnapshotControl->current_timestamp;
+	else
+		oldSnapshotControl->current_timestamp = now;
+	SpinLockRelease(&oldSnapshotControl->mutex_current);
+
+	return now;
+}
+
+/*
+ * Get timestamp through which vacuum may have processed based on last stored
+ * value for threshold_timestamp.
+ *
+ * XXX: If we can trust a read of an int64 value to be atomic, we can skip the
+ * spinlock here.
+ */
+int64
+GetOldSnapshotThresholdTimestamp(void)
+{
+	int64		threshold_timestamp;
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
+	threshold_timestamp = oldSnapshotControl->threshold_timestamp;
+	SpinLockRelease(&oldSnapshotControl->mutex_threshold);
+
+	return threshold_timestamp;
+}
+
+/*
+ * TransactionIdLimitedForOldSnapshots
+ *
+ * Apply old snapshot limit, if any.  This is intended to be called for page
+ * pruning and table vacuuming, to allow old_snapshot_threshold to override
+ * the normal global xmin value.  Actual testing for snapshot too old will be
+ * based on whether a snapshot timestamp is prior to the threshold timestamp
+ * set in this function.
+ */
+TransactionId
+TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
+									Relation relation)
+{
+	if (TransactionIdIsNormal(recentXmin)
+		&& old_snapshot_threshold >= 0
+		&& RelationNeedsWAL(relation)
+		&& !IsCatalogRelation(relation)
+		&& !RelationIsAccessibleInLogicalDecoding(relation))
+	{
+		int64		ts;
+		TransactionId xlimit = recentXmin;
+		bool		same_ts_as_threshold = false;
+
+		ts = AlignTimestampToMinuteBoundary(GetSnapshotCurrentTimestamp())
+			 - (old_snapshot_threshold * USECS_PER_MINUTE);
+
+		/* Check for fast exit without LW locking. */
+		SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
+		if (ts == oldSnapshotControl->threshold_timestamp)
+		{
+			xlimit = oldSnapshotControl->threshold_xid;
+			same_ts_as_threshold = true;
+		}
+		SpinLockRelease(&oldSnapshotControl->mutex_threshold);
+
+		if (!same_ts_as_threshold)
+		{
+			LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+
+			if (oldSnapshotControl->count_used > 0
+				&& ts >= oldSnapshotControl->head_timestamp)
+			{
+				int		offset;
+
+				offset = ((ts - oldSnapshotControl->head_timestamp)
+						  / USECS_PER_MINUTE);
+				if (offset > oldSnapshotControl->count_used - 1)
+					offset = oldSnapshotControl->count_used - 1;
+				offset = (oldSnapshotControl->head_offset + offset)
+						% old_snapshot_threshold;
+				xlimit = oldSnapshotControl->xid_by_minute[offset];
+
+				if (NormalTransactionIdFollows(xlimit, recentXmin))
+				{
+					SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
+					oldSnapshotControl->threshold_timestamp = ts;
+					oldSnapshotControl->threshold_xid = xlimit;
+					SpinLockRelease(&oldSnapshotControl->mutex_threshold);
+				}
+			}
+
+			LWLockRelease(OldSnapshotTimeMapLock);
+		}
+
+		if (NormalTransactionIdFollows(xlimit, recentXmin))
+			return xlimit;
+	}
+
+	return recentXmin;
+}
+
+/*
+ * Take care of the circular buffer that maps time to xid.
+ */
+void
+MaintainOldSnapshotTimeMapping(int64 whenTaken, TransactionId xmin)
+{
+	int64		ts;
+
+	/*
+	 * Fast exit when old_snapshot_threshold is not used or when snapshots
+	 * should immediately be considered "old" (for testing purposes).
+	 */
+	if (old_snapshot_threshold <= 0)
+		return;
+
+	/*
+	 * We don't want to do something stupid with unusual values, but we don't
+	 * want to litter the log with warnings or break otherwise normal
+	 * processing for this feature; so if something seems unreasonable, just
+	 * log at DEBUG level and return without doing anything.
+	 */
+	if (whenTaken < 0)
+	{
+		elog(DEBUG1,
+			 "MaintainOldSnapshotTimeMapping called with negative whenTaken = %ld",
+			 (long) whenTaken);
+		return;
+	}
+	if (!TransactionIdIsNormal(xmin))
+	{
+		elog(DEBUG1,
+			 "MaintainOldSnapshotTimeMapping called with xmin = %lu",
+			 (unsigned long) xmin);
+		return;
+	}
+
+	ts = AlignTimestampToMinuteBoundary(whenTaken);
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+
+	Assert(oldSnapshotControl->head_offset >= 0);
+	Assert(oldSnapshotControl->head_offset < old_snapshot_threshold);
+	Assert((oldSnapshotControl->head_timestamp % USECS_PER_MINUTE) == 0);
+	Assert(oldSnapshotControl->count_used >= 0);
+	Assert(oldSnapshotControl->count_used <= old_snapshot_threshold);
+
+	if (oldSnapshotControl->count_used == 0)
+	{
+		/* set up first entry for empty mapping */
+		oldSnapshotControl->head_offset = 0;
+		oldSnapshotControl->head_timestamp = ts;
+		oldSnapshotControl->count_used = 1;
+		oldSnapshotControl->xid_by_minute[0] = xmin;
+	}
+	else if (ts < oldSnapshotControl->head_timestamp)
+	{
+		/* old ts; log it at DEBUG */
+		LWLockRelease(OldSnapshotTimeMapLock);
+		elog(DEBUG1,
+			 "MaintainOldSnapshotTimeMapping called with old whenTaken = %ld",
+			 (long) whenTaken);
+		return;
+	}
+	else if (ts <= (oldSnapshotControl->head_timestamp +
+					((oldSnapshotControl->count_used - 1)
+					 * USECS_PER_MINUTE)))
+	{
+		/* existing mapping; advance xid if possible */
+		int		bucket = (oldSnapshotControl->head_offset
+						  + ((ts - oldSnapshotControl->head_timestamp)
+							 / USECS_PER_MINUTE))
+						 % old_snapshot_threshold;
+
+		if (TransactionIdPrecedes(oldSnapshotControl->xid_by_minute[bucket], xmin))
+			oldSnapshotControl->xid_by_minute[bucket] = xmin;
+	}
+	else
+	{
+		/* We need a new bucket, but it might not be the very next one. */
+		int		advance = ((ts - oldSnapshotControl->head_timestamp)
+						   / USECS_PER_MINUTE);
+
+		oldSnapshotControl->head_timestamp = ts;
+
+		if (advance >= old_snapshot_threshold)
+		{
+			/* Advance is so far that all old data is junk; start over. */
+			oldSnapshotControl->head_offset = 0;
+			oldSnapshotControl->count_used = 1;
+			oldSnapshotControl->xid_by_minute[0] = xmin;
+		}
+		else
+		{
+			/* Store the new value in one or more buckets. */
+			int i;
+
+			for (i = 0; i < advance; i++)
+			{
+				if (oldSnapshotControl->count_used == old_snapshot_threshold)
+				{
+					/* Map full and new value replaces old head. */
+					int		old_head = oldSnapshotControl->head_offset;
+
+					if (old_head == (old_snapshot_threshold - 1))
+						oldSnapshotControl->head_offset = 0;
+					else
+						oldSnapshotControl->head_offset = old_head + 1;
+					oldSnapshotControl->xid_by_minute[old_head] = xmin;
+				}
+				else
+				{
+					/* Extend map to unused entry. */
+					int		new_tail = (oldSnapshotControl->head_offset
+										+ oldSnapshotControl->count_used)
+									   % old_snapshot_threshold;
+
+					oldSnapshotControl->count_used++;
+					oldSnapshotControl->xid_by_minute[new_tail] = xmin;
+				}
+			}
+		}
+	}
+
+	LWLockRelease(OldSnapshotTimeMapLock);
+}
+
+
 /*
  * Setup a snapshot that replaces normal catalog snapshots that allows catalog
  * access to behave just like it did at a certain point in the past.
diff --git a/src/include/access/brin_revmap.h b/src/include/access/brin_revmap.h
index cca6ec5..300838c 100644
--- a/src/include/access/brin_revmap.h
+++ b/src/include/access/brin_revmap.h
@@ -18,12 +18,13 @@
 #include "storage/itemptr.h"
 #include "storage/off.h"
 #include "utils/relcache.h"
+#include "utils/snapshot.h"
 
 /* struct definition lives in brin_revmap.c */
 typedef struct BrinRevmap BrinRevmap;
 
 extern BrinRevmap *brinRevmapInitialize(Relation idxrel,
-					 BlockNumber *pagesPerRange);
+					 BlockNumber *pagesPerRange, Snapshot snapshot);
 extern void brinRevmapTerminate(BrinRevmap *revmap);
 
 extern void brinRevmapExtend(BrinRevmap *revmap,
@@ -34,6 +35,6 @@ extern void brinSetHeapBlockItemptr(Buffer rmbuf, BlockNumber pagesPerRange,
 						BlockNumber heapBlk, ItemPointerData tid);
 extern BrinTuple *brinGetTupleForHeapBlock(BrinRevmap *revmap,
 						 BlockNumber heapBlk, Buffer *buf, OffsetNumber *off,
-						 Size *size, int mode);
+						 Size *size, int mode, Snapshot snapshot);
 
 #endif   /* BRIN_REVMAP_H */
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 5021887..173f8b4 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -699,7 +699,7 @@ typedef struct
  * PostingItem
  */
 
-extern GinBtreeStack *ginFindLeafPage(GinBtree btree, bool searchMode);
+extern GinBtreeStack *ginFindLeafPage(GinBtree btree, bool searchMode, Snapshot snapshot);
 extern Buffer ginStepRight(Buffer buffer, Relation index, int lockmode);
 extern void freeGinBtreeStack(GinBtreeStack *stack);
 extern void ginInsertValue(GinBtree btree, GinBtreeStack *stack,
@@ -727,7 +727,7 @@ extern void GinPageDeletePostingItem(Page page, OffsetNumber offset);
 extern void ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
 					  ItemPointerData *items, uint32 nitem,
 					  GinStatsData *buildStats);
-extern GinBtreeStack *ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno);
+extern GinBtreeStack *ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno, Snapshot snapshot);
 extern void ginDataFillRoot(GinBtree btree, Page root, BlockNumber lblkno, Page lpage, BlockNumber rblkno, Page rpage);
 extern void ginPrepareDataScan(GinBtree btree, Relation index, BlockNumber rootBlkno);
 
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 9e48efd..00bbaf1 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -701,17 +701,18 @@ extern int	_bt_pagedel(Relation rel, Buffer buf);
  */
 extern BTStack _bt_search(Relation rel,
 		   int keysz, ScanKey scankey, bool nextkey,
-		   Buffer *bufP, int access);
+		   Buffer *bufP, int access, Snapshot snapshot);
 extern Buffer _bt_moveright(Relation rel, Buffer buf, int keysz,
 			  ScanKey scankey, bool nextkey, bool forupdate, BTStack stack,
-			  int access);
+			  int access, Snapshot snapshot);
 extern OffsetNumber _bt_binsrch(Relation rel, Buffer buf, int keysz,
 			ScanKey scankey, bool nextkey);
 extern int32 _bt_compare(Relation rel, int keysz, ScanKey scankey,
 			Page page, OffsetNumber offnum);
 extern bool _bt_first(IndexScanDesc scan, ScanDirection dir);
 extern bool _bt_next(IndexScanDesc scan, ScanDirection dir);
-extern Buffer _bt_get_endpoint(Relation rel, uint32 level, bool rightmost);
+extern Buffer _bt_get_endpoint(Relation rel, uint32 level, bool rightmost,
+							   Snapshot snapshot);
 
 /*
  * prototypes for functions in nbtutils.c
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 0f59201..fe80f3b 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -14,11 +14,14 @@
 #ifndef BUFMGR_H
 #define BUFMGR_H
 
+#include "catalog/catalog.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/bufpage.h"
 #include "storage/relfilenode.h"
 #include "utils/relcache.h"
+#include "utils/snapmgr.h"
+#include "utils/tqual.h"
 
 typedef void *Block;
 
@@ -148,6 +151,37 @@ extern PGDLLIMPORT int32 *LocalRefCount;
 #define BufferGetPage(buffer) ((Page)BufferGetBlock(buffer))
 
 /*
+ * Check whether the given snapshot is too old to have safely read the given
+ * page from the given table.  If so, throw a "snapshot too old" error.
+ *
+ * This test generally needs to be performed after every BufferGetPage() call
+ * that is executed as part of a scan.  It is not needed for calls made for
+ * modifying the page (for example, to position to the right place to insert a
+ * new index tuple or for vacuuming).
+ *
+ * Note that a NULL snapshot argument is allowed and causes a fast return
+ * without error; this is to support call sites which can be called from
+ * either scans or index modification areas.
+ *
+ * This is a macro for speed; keep the tests that are fastest and/or most
+ * likely to exclude a page from old snapshot testing near the front.
+ */
+#define TestForOldSnapshot(snapshot, relation, page) \
+	do { \
+		if (old_snapshot_threshold >= 0 \
+		 && (snapshot) != NULL \
+		 && (snapshot)->satisfies == HeapTupleSatisfiesMVCC \
+		 && !XLogRecPtrIsInvalid((snapshot)->lsn) \
+		 && PageGetLSN(page) > (snapshot)->lsn \
+		 && !IsCatalogRelation(relation) \
+		 && !RelationIsAccessibleInLogicalDecoding(relation) \
+		 && (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp()) \
+			ereport(ERROR, \
+					(errcode(ERRCODE_SNAPSHOT_TOO_OLD), \
+					 errmsg("snapshot too old"))); \
+	} while (0)
+
+/*
  * prototypes for functions in bufmgr.c
  */
 extern bool ComputeIoConcurrency(int io_concurrency, double *target);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 8a55a09..77d43bd 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -15,6 +15,7 @@
 #define REL_H
 
 #include "access/tupdesc.h"
+#include "access/xlog.h"
 #include "catalog/pg_am.h"
 #include "catalog/pg_class.h"
 #include "catalog/pg_index.h"
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index f8524eb..09a33c1 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -14,10 +14,20 @@
 #define SNAPMGR_H
 
 #include "fmgr.h"
+#include "utils/relcache.h"
 #include "utils/resowner.h"
 #include "utils/snapshot.h"
 
 
+/* GUC variables */
+extern int	old_snapshot_threshold;
+
+
+extern Size SnapMgrShmemSize(void);
+extern void SnapMgrInit(void);
+extern int64 GetSnapshotCurrentTimestamp(void);
+extern int64 GetOldSnapshotThresholdTimestamp(void);
+
 extern bool FirstSnapshotSet;
 
 extern TransactionId TransactionXmin;
@@ -54,6 +64,9 @@ extern void ImportSnapshot(const char *idstr);
 extern bool XactHasExportedSnapshots(void);
 extern void DeleteAllExportedSnapshotFiles(void);
 extern bool ThereAreNoPriorRegisteredSnapshots(void);
+extern TransactionId TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
+														 Relation relation);
+extern void MaintainOldSnapshotTimeMapping(int64 whenTaken, TransactionId xmin);
 
 extern char *ExportSnapshot(Snapshot snapshot);
 
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index cbf1bbd..e4e091d 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -14,6 +14,7 @@
 #define SNAPSHOT_H
 
 #include "access/htup.h"
+#include "access/xlogdefs.h"
 #include "lib/pairingheap.h"
 #include "storage/buf.h"
 
@@ -105,6 +106,9 @@ typedef struct SnapshotData
 	uint32		active_count;	/* refcount on ActiveSnapshot stack */
 	uint32		regd_count;		/* refcount on RegisteredSnapshots */
 	pairingheap_node ph_node;	/* link in the RegisteredSnapshots heap */
+
+	int64		whenTaken;		/* timestamp when snapshot was taken */
+	XLogRecPtr	lsn;			/* position in the WAL stream when taken */
 } SnapshotData;
 
 /*
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 6167ec1..299dc4f 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -8,6 +8,7 @@ SUBDIRS = \
 		  brin \
 		  commit_ts \
 		  dummy_seclabel \
+		  sto \
 		  test_ddl_deparse \
 		  test_extensions \
 		  test_parser \
diff --git a/src/test/modules/sto/.gitignore b/src/test/modules/sto/.gitignore
new file mode 100644
index 0000000..e07b677
--- /dev/null
+++ b/src/test/modules/sto/.gitignore
@@ -0,0 +1,2 @@
+# Generated by regression tests
+/tmp_check/
diff --git a/src/test/modules/sto/Makefile b/src/test/modules/sto/Makefile
new file mode 100644
index 0000000..55d3822
--- /dev/null
+++ b/src/test/modules/sto/Makefile
@@ -0,0 +1,20 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/modules/sto
+#
+# Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/modules/sto/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/modules/sto
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
+check:
+	$(prove_check)
diff --git a/src/test/modules/sto/t/001_snapshot_too_old.pl b/src/test/modules/sto/t/001_snapshot_too_old.pl
new file mode 100644
index 0000000..6b1a431
--- /dev/null
+++ b/src/test/modules/sto/t/001_snapshot_too_old.pl
@@ -0,0 +1,124 @@
+use strict;
+use warnings;
+
+use Config;
+use TestLib;
+use DBI;
+use DBD::Pg ':async';
+
+use Test::More tests => 9;
+
+my $tempdir       = TestLib::tempdir;
+
+my $driver   = "Pg";
+my $database = "postgres";
+
+sub log_sleep
+{
+	my ($t) = @_;
+	print "sleeping $t seconds...";
+	sleep $t;
+	print "done!\n";
+}
+
+sub log_sleep_to_new_minute
+{
+	my ($sec) = localtime;
+	log_sleep (61 - $sec);
+}
+
+sub connect_ok
+{
+	my ($db) = @_;
+	my $dbh = DBI->connect("dbi:$driver:database=$db", '', '',
+						   {AutoCommit => 1, RaiseError => 0, PrintError => 1})
+		or print $DBI::errstr;
+	ok((defined $dbh), "get DBI connection to $database");
+	return $dbh;
+}
+
+sub setting_ok
+{
+	my ($dbh, $setting, $expected_value) = @_;
+	my $actual_value = $dbh->selectrow_array("SHOW $setting");
+	is($actual_value, $expected_value, "check $setting");
+}
+
+sub select_scalar_ok
+{
+	my ($dbh, $statement, $expected_value) = @_;
+	my $actual_value = $dbh->selectrow_array($statement);
+	is($actual_value, $expected_value, "SELECT scalar value");
+}
+
+sub fetch_first_ok
+{
+	my ($dbh) = @_;
+	select_scalar_ok($dbh, 'FETCH FIRST FROM cursor1', '1');
+}
+
+sub snapshot_too_old_ok
+{
+	my ($dbh) = @_;
+	my $actual_value = $dbh->selectrow_array('FETCH FIRST FROM cursor1');
+	my $errstr = $dbh->errstr;
+	is($errstr, 'ERROR:  snapshot too old', 'expect "snapshot too old" error');
+}
+
+# Initialize cluster
+start_test_server($tempdir, qq(
+old_snapshot_threshold = '1min'
+autovacuum = off
+enable_indexonlyscan = off
+max_connections = 10
+log_statement = 'all'
+));
+
+# Confirm that we can connect to the database.
+my $conn1 = connect_ok($database);
+my $conn2 = connect_ok($database);
+
+# Confirm that the settings "took".
+setting_ok($conn1, 'old_snapshot_threshold', '1min');
+
+# Do some setup.
+$conn1->do(qq(CREATE TABLE t (c int NOT NULL)));
+$conn1->do(qq(INSERT INTO t SELECT generate_series(1, 1000)));
+$conn1->do(qq(VACUUM ANALYZE t));
+$conn1->do(qq(CREATE TABLE u (c int NOT NULL)));
+
+# Open long-lived cursor
+$conn1->begin_work;
+$conn1->do(qq(DECLARE cursor1 CURSOR FOR SELECT c FROM t));
+
+# Bump the last-used transaction number.
+$conn2->do(qq(INSERT INTO u VALUES (1)));
+
+# Confirm immediate fetch works.
+fetch_first_ok($conn1);
+
+# Confirm delayed fetch works with no modifications.
+log_sleep_to_new_minute;
+fetch_first_ok($conn1);
+
+# Try again to confirm no pruning effect.
+fetch_first_ok($conn1);
+
+# Confirm that fetch immediately after update is OK.
+# This assumes that they happen within one minute of each other.
+# Still using same snapshot, so value shouldn't change.
+$conn2->do(qq(UPDATE t SET c = 1001 WHERE c = 1));
+fetch_first_ok($conn1);
+
+# Try again to confirm no pruning effect.
+$conn2->do(qq(VACUUM ANALYZE t));
+fetch_first_ok($conn1);
+
+# Passage of time should now make this get the "snapshot too old" error.
+log_sleep_to_new_minute;
+snapshot_too_old_ok($conn1);
+$conn1->rollback;
+
+# Clean up a bit, just to minimize noise in log file.
+$conn2->disconnect;
+$conn1->disconnect;
diff --git a/src/test/modules/sto/t/002_snapshot_too_old_select.pl b/src/test/modules/sto/t/002_snapshot_too_old_select.pl
new file mode 100644
index 0000000..d0985c3
--- /dev/null
+++ b/src/test/modules/sto/t/002_snapshot_too_old_select.pl
@@ -0,0 +1,126 @@
+use strict;
+use warnings;
+
+use Config;
+use TestLib;
+use DBI;
+use DBD::Pg ':async';
+
+use Test::More tests => 9;
+
+my $tempdir       = TestLib::tempdir;
+
+my $driver   = "Pg";
+my $database = "postgres";
+
+sub log_sleep
+{
+	my ($t) = @_;
+	print "sleeping $t seconds...";
+	sleep $t;
+	print "done!\n";
+}
+
+sub log_sleep_to_new_minute
+{
+	my ($sec) = localtime;
+	log_sleep (61 - $sec);
+}
+
+sub connect_ok
+{
+	my ($db) = @_;
+	my $dbh = DBI->connect("dbi:$driver:database=$db", '', '',
+						   {AutoCommit => 1, RaiseError => 0, PrintError => 1})
+		or print $DBI::errstr;
+	ok((defined $dbh), "get DBI connection to $database");
+	return $dbh;
+}
+
+sub setting_ok
+{
+	my ($dbh, $setting, $expected_value) = @_;
+	my $actual_value = $dbh->selectrow_array("SHOW $setting");
+	is($actual_value, $expected_value, "check $setting");
+}
+
+sub select_scalar_ok
+{
+	my ($dbh, $statement, $expected_value) = @_;
+	my $actual_value = $dbh->selectrow_array($statement);
+	is($actual_value, $expected_value, "SELECT scalar value");
+}
+
+sub select_ok
+{
+	my ($dbh) = @_;
+	select_scalar_ok($dbh, 'select c from t order by c limit 1', '1');
+}
+
+sub snapshot_too_old_ok
+{
+	my ($dbh) = @_;
+	my $actual_value = $dbh->selectrow_array('select c from t order by c limit 1');
+	my $errstr = $dbh->errstr;
+	is($errstr, 'ERROR:  snapshot too old', 'expect "snapshot too old" error');
+}
+
+# Initialize cluster
+start_test_server($tempdir, qq(
+old_snapshot_threshold = '1min'
+autovacuum = off
+enable_indexonlyscan = off
+max_connections = 10
+log_statement = 'all'
+));
+
+# Confirm that we can connect to the database.
+my $conn1 = connect_ok($database);
+my $conn2 = connect_ok($database);
+
+# Confirm that the settings "took".
+setting_ok($conn1, 'old_snapshot_threshold', '1min');
+
+# Do some setup.
+$conn1->do(qq(CREATE TABLE t (c int NOT NULL)));
+$conn1->do(qq(INSERT INTO t SELECT generate_series(1, 1000)));
+$conn1->do(qq(VACUUM ANALYZE t));
+$conn1->do(qq(CREATE TABLE u (c int NOT NULL)));
+
+# Open long-lived snapshot
+$conn1->begin_work;
+$conn1->do(qq(set transaction isolation level REPEATABLE READ));
+$conn1->do(qq(SELECT c FROM t));
+
+# Bump the last-used transaction number.
+$conn2->do(qq(INSERT INTO u VALUES (1)));
+
+# Confirm immediate fetch works.
+select_ok($conn1);
+
+# Confirm delayed fetch works with no modifications.
+log_sleep_to_new_minute;
+select_ok($conn1);
+
+# Try again to confirm no pruning effect.
+select_ok($conn1);
+
+# Confirm that fetch immediately after update is OK.
+# This assumes that they happen within one minute of each other.
+# Still using same snapshot, so value shouldn't change.
+$conn2->do(qq(UPDATE t SET c = 1001 WHERE c = 1));
+select_ok($conn1);
+
+# Try again to confirm no pruning effect.
+log_sleep_to_new_minute;
+$conn2->do(qq(VACUUM ANALYZE t));
+select_ok($conn1);
+
+# Passage of time should now make this get the "snapshot too old" error.
+log_sleep_to_new_minute;
+snapshot_too_old_ok($conn1);
+$conn1->rollback;
+
+# Clean up a bit, just to minimize noise in log file.
+$conn2->disconnect;
+$conn1->disconnect;
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index 02533eb..5fffc78 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -135,7 +135,7 @@ sub tempdir_short
 # --config-auth).
 sub standard_initdb
 {
-	my $pgdata = shift;
+	my ($pgdata, $confappend) = @_;
 	system_or_bail('initdb', '-D', "$pgdata", '-A' , 'trust', '-N');
 	system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata);
 
@@ -153,6 +153,11 @@ sub standard_initdb
 		print CONF "unix_socket_directories = '$tempdir_short'\n";
 		print CONF "listen_addresses = ''\n";
 	}
+	if (defined $confappend and length $confappend)
+	{
+		print CONF "\n# Added by test\n";
+		print CONF "$confappend\n";
+	}
 	close CONF;
 
 	$ENV{PGHOST}         = $windows_os ? "127.0.0.1" : $tempdir_short;
@@ -183,11 +188,11 @@ my ($test_server_datadir, $test_server_logfile);
 # Initialize a new cluster for testing in given directory, and start it.
 sub start_test_server
 {
-	my ($tempdir) = @_;
+	my ($tempdir, $confappend) = @_;
 	my $ret;
 
 	print("### Starting test server in $tempdir\n");
-	standard_initdb "$tempdir/pgdata";
+	standard_initdb "$tempdir/pgdata", $confappend;
 
 	$ret = system_log('pg_ctl', '-D', "$tempdir/pgdata", '-w', '-l',
 	  "$log_path/postmaster.log", '-o', "--log-statement=all",

Steve Singer

steve@ssinger.info

about 10 years ago

In reply to: Kevin Grittner (#6)

Re: snapshot too old, configured by time

On 10/15/2015 05:47 PM, Kevin Grittner wrote:

All other issues raised by Álvaro and Steve have been addressed,
except for this one, which I will argue against:

I've been looking through the updated patch

In snapmgr.c

+ * XXX: If we can trust a read of an int64 value to be atomic, we can 
skip the
+ * spinlock here.
+ */
+int64
+GetOldSnapshotThresholdTimestamp(void)

Was your intent with the XXX for it to be a TODO to only acquire the lock
on platforms without the atomic 64bit operations?

On a more general note:

I've tried various manual tests of this feature and it sometimes works
as expected and sometimes doesn't.
I'm getting the feeling that how I expect it to work isn't quite in sync
with how it does work.

I'd expect the following to be sufficient to generate the error

T1: Obtains a snapshot that can see some rows
T2: Waits 60 seconds and performs an update on those rows
T2: Performs a vacuum
T1: Waits 60 seconds, tries to select from the table. The snapshot
should be too old

For example it seems that in test 002 the select_ok on conn1 following
the vacuum but right before the final sleep is critical to the snapshot
too old error showing up (ie if I remove that select_ok but leave in the
sleep I don't get the error)

Is this intended and if so is there a better way we can explain how
things work?

Also is 0 intended to be an acceptable value for old_snapshot_threshold
and if so what should we expect to see then?

So if I understand correctly, every call to BufferGetPage needs to have
a TestForOldSnapshot immediately afterwards? It seems easy to later
introduce places that fail to test for old snapshots. What happens if
they do? Does vacuum remove tuples anyway and then the query returns
wrong results? That seems pretty dangerous. Maybe the snapshot could
be an argument of BufferGetPage?

There are 486 occurrences of BufferGetPage in the source code, and
this patch follows 36 of them with TestForOldSnapshot. This only
needs to be done when a page is going to be used for a scan which
produces user-visible results. That is, it is *not* needed for
positioning within indexes to add or vacuum away entries, for heap
inserts or deletions (assuming the row to be deleted has already
been found). It seems wrong to modify about 450 BufferGetPage
references to add a NULL parameter; and if we do want to make that
noop change as insurance, it seems like it should be a separate
patch, since the substance of this patch would be buried under the
volume of that.
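
As a rough illustration of what the TestForOldSnapshot() macro in the patch decides, here is a minimal standalone model of its predicate. The struct, the function name and the plain integer types are hypothetical stand-ins, not PostgreSQL definitions; the catalog and logical-decoding exemptions are folded into a single "exempt" flag, and an LSN of 0 is assumed to mean "invalid".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the snapshot fields the macro reads. */
typedef struct
{
	int64_t		when_taken;		/* timestamp when the snapshot was taken */
	uint64_t	lsn;			/* WAL position when taken; 0 = invalid */
	bool		is_mvcc;		/* true for an ordinary MVCC snapshot */
} ModelSnapshot;

/*
 * Return true when a scan using "snap" should raise "snapshot too old" for
 * a page with the given LSN.  The tests run in the same order as in the
 * macro, cheapest first.
 */
bool
snapshot_too_old(const ModelSnapshot *snap, uint64_t page_lsn,
				 bool relation_exempt, int threshold_setting,
				 int64_t threshold_timestamp)
{
	if (threshold_setting < 0)
		return false;			/* feature disabled */
	if (snap == NULL)
		return false;			/* call site is not a scan */
	if (!snap->is_mvcc)
		return false;			/* only MVCC snapshots are checked */
	if (snap->lsn == 0)
		return false;			/* no valid LSN recorded */
	if (page_lsn <= snap->lsn)
		return false;			/* page not modified since the snapshot */
	if (relation_exempt)
		return false;			/* catalog or logical-decoding relation */
	return snap->when_taken < threshold_timestamp;
}
```

In this model a page that has not been modified since the snapshot was taken never produces the error, no matter how old the snapshot is.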

I will add this to the November CF.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Michael Paquier

michael.paquier@gmail.com

about 10 years ago

In reply to: Steve Singer (#7)

Re: snapshot too old, configured by time

On Mon, Nov 9, 2015 at 8:07 AM, Steve Singer <steve@ssinger.info> wrote:

On 10/15/2015 05:47 PM, Kevin Grittner wrote:

All other issues raised by Álvaro and Steve have been addressed, except
for this one, which I will argue against:

I've been looking through the updated patch

In snapmgr.c
+ * XXX: If we can trust a read of an int64 value to be atomic, we can skip
the
+ * spinlock here.
+ */
+int64
+GetOldSnapshotThresholdTimestamp(void)
Was your intent with the XXX for it to be a TODO to only acquire the lock on
platforms without the atomic 64bit operations?

On a more general note:

I've tried various manual tests of this feature and it sometimes works as
expected and sometimes doesn't.
I'm getting the feeling that how I expect it to work isn't quite in sync
with how it does work.

I'd expect the following to be sufficient to generate the error

T1: Obtains a snapshot that can see some rows
T2: Waits 60 seconds and performs an update on those rows
T2: Performs a vacuum
T1: Waits 60 seconds, tries to select from the table. The snapshot should
be too old

For example it seems that in test 002 the select_ok on conn1 following the
vacuum but right before the final sleep is critical to the snapshot too old
error showing up (ie if I remove that select_ok but leave in the sleep I
don't get the error)

Is this intended and if so is there a better way we can explain how things
work?

Also is 0 intended to be an acceptable value for old_snapshot_threshold and
if so what should we expect to see then?

There has been a review but no replies for more than 1 month. Returned
with feedback?
--
Michael


Kevin Grittner

kgrittn@gmail.com

about 10 years ago

In reply to: Michael Paquier (#8)

Re: snapshot too old, configured by time

On Wed, Dec 2, 2015 at 12:39 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Mon, Nov 9, 2015 at 8:07 AM, Steve Singer <steve@ssinger.info> wrote:

In snapmgr.c
+ * XXX: If we can trust a read of an int64 value to be atomic, we can skip the
+ * spinlock here.
+ */
+int64
+GetOldSnapshotThresholdTimestamp(void)
Was your intent with the XXX for it to be a TODO to only acquire the lock on
platforms without the atomic 64bit operations?

I'm not sure whether we can safely assume a read of an int64 to be
atomic on any platform; if we actually can on some platforms, and
we have a #define to tell us whether we are in such an environment,
we could condition the spinlock calls on that. Are we there yet?
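
One way to picture "conditioning the lock on a #define" is the sketch below. It uses the C11 <stdatomic.h> macro ATOMIC_LLONG_LOCK_FREE rather than anything from the PostgreSQL tree, and the ThresholdState type and threshold_* functions are hypothetical names for illustration only. When the macro reports that long long atomics are always lock-free, a plain atomic load replaces the lock; otherwise a simple test-and-set flag guards the value.

```c
#include <stdatomic.h>

/* Hypothetical holder for the threshold timestamp. */
typedef struct
{
#if ATOMIC_LLONG_LOCK_FREE == 2
	atomic_llong threshold_timestamp;	/* always lock-free here */
#else
	atomic_flag	mutex;					/* fallback: tiny spinlock */
	long long	threshold_timestamp;
#endif
} ThresholdState;

void
threshold_init(ThresholdState *s)
{
#if ATOMIC_LLONG_LOCK_FREE == 2
	atomic_init(&s->threshold_timestamp, 0);
#else
	atomic_flag_clear(&s->mutex);
	s->threshold_timestamp = 0;
#endif
}

void
threshold_set(ThresholdState *s, long long ts)
{
#if ATOMIC_LLONG_LOCK_FREE == 2
	atomic_store_explicit(&s->threshold_timestamp, ts, memory_order_release);
#else
	/* spin until the flag was previously clear, then write under the lock */
	while (atomic_flag_test_and_set_explicit(&s->mutex, memory_order_acquire))
		;
	s->threshold_timestamp = ts;
	atomic_flag_clear_explicit(&s->mutex, memory_order_release);
#endif
}

long long
threshold_get(ThresholdState *s)
{
#if ATOMIC_LLONG_LOCK_FREE == 2
	return atomic_load_explicit(&s->threshold_timestamp, memory_order_acquire);
#else
	long long	ts;

	while (atomic_flag_test_and_set_explicit(&s->mutex, memory_order_acquire))
		;
	ts = s->threshold_timestamp;
	atomic_flag_clear_explicit(&s->mutex, memory_order_release);
	return ts;
#endif
}
```

Whether the tree's own portability layer offers an equivalent compile-time test is exactly the open question here; the sketch only shows the shape such a conditional could take.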

On a more general note:

I've tried various manual tests of this feature and it sometimes works as
expected and sometimes doesn't.
I'm getting the feeling that how I expect it to work isn't quite in sync
with how it does work.

I'd expect the following to be sufficient to generate the error

T1: Obtains a snapshot that can see some rows
T2: Waits 60 seconds and performs an update on those rows
T2: Performs a vacuum
T1: Waits 60 seconds, tries to select from the table. The snapshot should
be too old

For example it seems that in test 002 the select_ok on conn1 following the
vacuum but right before the final sleep is critical to the snapshot too old
error showing up (ie if I remove that select_ok but leave in the sleep I
don't get the error)

Is this intended and if so is there a better way we can explain how things
work?

At every phase I took a conservative approach toward deferring
pruning of tuples still visible to any snapshot -- often reducing
the overhead of tracking by letting things go to the next minute
boundary. The idea is that an extra minute or two probably isn't
going to be a big deal in terms of bloat, so if we can save any
effort on the bookkeeping by letting things go a little longer, it
is a worthwhile trade-off. That does make it hard to give a
precise statement of exactly when a transaction *will* be subject
to cancellation based on this feature, so I have emphasized the
minimum guaranteed time that a transaction will be *safe*. In
reviewing what you describe, I think I still don't have it as
aggressive as I can (and probably should). My biggest concern is
that a long-running transaction which gets a transaction ID
matching the xmin on a snapshot it will hold for a long time may
not be subject to cancellation. That's probably not too hard to
fix, but the bigger problem is the testing.
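
To make the "next minute boundary" behavior concrete, the arithmetic can be pulled out of the patch into a standalone sketch. The first function mirrors AlignTimestampToMinuteBoundary() and the second mirrors the subtraction done in TransactionIdLimitedForOldSnapshots(); the lower-case names are hypothetical, timestamps are plain microsecond counts, and non-negative input is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define USECS_PER_MINUTE INT64_C(60000000)

/*
 * Round a microsecond timestamp up to the next minute boundary; a value
 * that is already aligned is returned unchanged.
 */
int64_t
align_to_minute(int64_t ts)
{
	int64_t		retval;

	assert(ts >= 0);			/* sketch assumes non-negative timestamps */
	retval = ts + (USECS_PER_MINUTE - 1);
	return retval - (retval % USECS_PER_MINUTE);
}

/*
 * Threshold timestamp: the aligned current time minus the configured
 * number of minutes.
 */
int64_t
threshold_timestamp(int64_t now, int threshold_minutes)
{
	return align_to_minute(now)
		- (int64_t) threshold_minutes * USECS_PER_MINUTE;
}
```

For example, with now at 90 seconds (90000000 microseconds) and a setting of 1 minute, the aligned time is 120 seconds and the computed threshold is 60 seconds. The result moves only in whole-minute steps, which is the granularity described above.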

People have said that issuing SQL commands directly from a TAP test
via DBD::Pg is not acceptable for a core feature, and (despite
assertions to the contrary) I see no way to test this feature with
existing testing mechanisms. The bigger set of work here, if we
don't want this feature to go in without any testing scripts (which
is not acceptable IMO), is to enhance the isolation tester or
hybridize TAP testing with the isolation tester.

Also is 0 intended to be an acceptable value for old_snapshot_threshold and
if so what should we expect to see then?

The docs in the patch say this:

+        <para>
+         A value of <literal>-1</> disables this feature, and is the default.
+         Useful values for production work probably range from a small number
+         of hours to a few days.  The setting will be coerced to a granularity
+         of minutes, and small numbers (such as <literal>0</> or
+         <literal>1min</>) are only allowed because they may sometimes be
+         useful for testing.  While a setting as high as <literal>60d</> is
+         allowed, please note that in many workloads extreme bloat or
+         transaction ID wraparound may occur in much shorter time frames.
+        </para>

Once we can agree on a testing methodology I expect that I will be
adding a number of tests based on a cluster started with
old_snapshot_threshold = 0, but as I said in my initial post of the
patch I was keeping the tests in the patch thin until it was
confirmed whether this testing methodology was acceptable. Since
it isn't, that was just as well. The time put into learning enough
about perl and TAP tests to create those tests already exceeds the
time to develop the actual patch, and it looks like even more will
be needed for a test methodology that doesn't require adding a
package or downloading a CPAN module. C'est la vie. I did expand
my perl and TAP knowledge considerably, for what that's worth.

There has been a review but no replies for more than 1 month. Returned
with feedback?

I do intend to post another version of the patch to tweak the
calculations again, after I can get a patch in to expand the
testing capabilities to allow an acceptable way to test the patch
-- so I put it into the next CF instead.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Michael Paquier

michael.paquier@gmail.com

about 10 years ago

In reply to: Kevin Grittner (#9)

Re: snapshot too old, configured by time

On Thu, Dec 3, 2015 at 5:48 AM, Kevin Grittner wrote:

There has been a review but no replies for more than 1 month. Returned
with feedback?

I do intend to post another version of the patch to tweak the
calculations again, after I can get a patch in to expand the
testing capabilities to allow an acceptable way to test the patch
-- so I put it into the next CF instead.

OK, thanks.
--
Michael


#11

Andres Freund

andres@anarazel.de

about 10 years ago

In reply to: Kevin Grittner (#9)

Re: snapshot too old, configured by time

On 2015-12-02 14:48:24 -0600, Kevin Grittner wrote:

On Wed, Dec 2, 2015 at 12:39 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Mon, Nov 9, 2015 at 8:07 AM, Steve Singer <steve@ssinger.info> wrote:

In snapmgr.c

+ * XXX: If we can trust a read of an int64 value to be atomic, we can skip the
+ * spinlock here.
+ */
+int64
+GetOldSnapshotThresholdTimestamp(void)

Was your intent with the XXX for it to be a TODO to only acquire the lock on
platforms without the atomic 64-bit operations?

I'm not sure whether we can safely assume a read of an int64 to be
atomic on any platform; if we actually can on some platforms, and
we have a #define to tell us whether we are in such an environment,
we could condition the spinlock calls on that. Are we there yet?

We currently don't assume it's atomic. And there are platforms, e.g. 32-bit
ARM, where that's not the case
(cf. https://wiki.postgresql.org/wiki/Atomics). It'd be rather useful
to abstract that knowledge into a macro...
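
As a rough illustration of what such a macro could look like, here is a sketch in standard C11, not PostgreSQL's atomics API. The macro SKETCH_INT64_READ_IS_ATOMIC, the struct, and the function names are all hypothetical, and an atomic_flag stands in for the spinlock. Where the compiler reports lock-free 64-bit atomics, the reader skips the lock; elsewhere it takes it.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical macro: 1 when the C11 implementation reports that 64-bit
 * atomics are always lock-free.  Assumes int64_t is no wider than long long.
 */
#if defined(ATOMIC_LLONG_LOCK_FREE) && ATOMIC_LLONG_LOCK_FREE == 2
#define SKETCH_INT64_READ_IS_ATOMIC 1
#else
#define SKETCH_INT64_READ_IS_ATOMIC 0
#endif

typedef struct
{
	atomic_flag lock;			/* stand-in for a spinlock */
#if SKETCH_INT64_READ_IS_ATOMIC
	_Atomic int64_t threshold_timestamp;
#else
	int64_t		threshold_timestamp;
#endif
} SketchSnapshotState;

static void
sketch_state_init(SketchSnapshotState *s)
{
	atomic_flag_clear(&s->lock);
#if SKETCH_INT64_READ_IS_ATOMIC
	atomic_init(&s->threshold_timestamp, 0);
#else
	s->threshold_timestamp = 0;
#endif
}

/* Writers always take the lock so that updates stay serialized. */
static void
sketch_set_threshold_timestamp(SketchSnapshotState *s, int64_t ts)
{
	while (atomic_flag_test_and_set(&s->lock))
		;						/* spin until the lock is free */
#if SKETCH_INT64_READ_IS_ATOMIC
	atomic_store(&s->threshold_timestamp, ts);
#else
	s->threshold_timestamp = ts;
#endif
	atomic_flag_clear(&s->lock);
}

/* Readers take the lock only where a plain 64-bit read could be torn. */
static int64_t
sketch_get_threshold_timestamp(SketchSnapshotState *s)
{
#if SKETCH_INT64_READ_IS_ATOMIC
	return atomic_load(&s->threshold_timestamp);
#else
	int64_t		ts;

	while (atomic_flag_test_and_set(&s->lock))
		;
	ts = s->threshold_timestamp;
	atomic_flag_clear(&s->lock);
	return ts;
#endif
}
```

The point of the design is that callers never see the platform difference: only the accessor functions depend on the macro.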

Andres


#12

Alvaro Herrera

alvherre@2ndquadrant.com

about 10 years ago

In reply to: Kevin Grittner (#9)

Re: snapshot too old, configured by time

Kevin Grittner wrote:

People have said that issuing SQL commands directly from a TAP test
via DBD::Pg is not acceptable for a core feature, and (despite
assertions to the contrary) I see no way to test this feature with
existing testing mechanisms. The bigger set of work here, if we
don't want this feature to go in without any testing scripts (which
is not acceptable IMO), is to enhance the isolation tester or
hybridize TAP testing with the isolation tester.

Is it possible to use the PostgresNode stuff to test this? If not,
perhaps if you restate what additional capabilities you need we could
look into adding them there. I suspect that what you need is the
ability to keep more than one session open and feed them commands;
perhaps we could have the framework have a function that opens a psql
process and returns a FD to which the test program can write, using the
IPC::Run stuff (start / pump / finish).
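
To make the shape of that idea concrete, here is a minimal sketch in C with plain POSIX calls rather than Perl and IPC::Run. It is only an illustration: the SketchSession type and the sketch_session_* functions are hypothetical names, and the caller supplies the command to run, which in a real test would be psql. Each session is a child process reading commands from a pipe whose write end is returned to the test program.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handle for one long-lived child session. */
typedef struct
{
	pid_t		pid;
	int			write_fd;		/* the test program writes commands here */
} SketchSession;

/* Start the command with its stdin attached to a pipe; 0 on success. */
static int
sketch_session_start(SketchSession *s, char *const cmd[])
{
	int			fds[2];
	pid_t		pid;

	if (pipe(fds) != 0)
		return -1;

	/* Keep later children from inheriting this session's write end. */
	if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) != 0 || (pid = fork()) < 0)
	{
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0)
	{
		/* Child: read commands from the pipe, then become the command. */
		close(fds[1]);
		if (fds[0] != STDIN_FILENO)
		{
			if (dup2(fds[0], STDIN_FILENO) < 0)
				_exit(127);
			close(fds[0]);
		}
		execvp(cmd[0], cmd);
		_exit(127);				/* exec failed */
	}

	close(fds[0]);
	s->pid = pid;
	s->write_fd = fds[1];
	return 0;
}

/* Feed one chunk of text to the session; 0 on success. */
static int
sketch_session_send(SketchSession *s, const char *text)
{
	size_t		len = strlen(text);

	while (len > 0)
	{
		ssize_t		n = write(s->write_fd, text, len);

		if (n < 0)
			return -1;
		text += n;
		len -= (size_t) n;
	}
	return 0;
}

/* Close the pipe, wait for the child, and return its exit status. */
static int
sketch_session_finish(SketchSession *s)
{
	int			status;

	close(s->write_fd);
	if (waitpid(s->pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because each session has its own pipe, a test can alternate sketch_session_send calls between two sessions before finishing either. The write end is marked close-on-exec so that finishing one session is not blocked by another child still holding a copy of its pipe.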

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#13

Alvaro Herrera

alvherre@2ndquadrant.com

almost 10 years ago

In reply to: Kevin Grittner (#9)

Re: snapshot too old, configured by time

Kevin Grittner wrote:

There has been a review but no replies for more than 1 month. Returned
with feedback?

I do intend to post another version of the patch to tweak the
calculations again, after I can get a patch in to expand the
testing capabilities to allow an acceptable way to test the patch
-- so I put it into the next CF instead.

Two months have passed since this with no activity. I'm closing this as
returned-with-feedback now; you're of course free to resubmit to
2016-03.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#14

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Alvaro Herrera (#12)

1 attachment(s)

Re: snapshot too old, configured by time

On Fri, Jan 8, 2016 at 5:22 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

People have said that issuing SQL commands directly from a TAP test
via DBD::Pg is not acceptable for a core feature, and (despite
assertions to the contrary) I see no way to test this feature with
existing testing mechanisms. The bigger set of work here, if we
don't want this feature to go in without any testing scripts (which
is not acceptable IMO), is to enhance the isolation tester or
hybridize TAP testing with the isolation tester.

Is it possible to use the PostgresNode stuff to test this? If not,
perhaps if you restate what additional capabilities you need we could
look into adding them there. I suspect that what you need is the
ability to keep more than one session open and feed them commands;
perhaps we could have the framework have a function that opens a psql
process and returns a FD to which the test program can write, using the
IPC::Run stuff (start / pump / finish).

Resubmitting for the March CF.

The main thing that changed is that I can now run all the
regression and isolation tests using installcheck with
old_snapshot_threshold = 0 and get a clean run. That probably gets
better overall coverage than specific tests to demonstrate the
"snapshot too old" error, but of course we need those, too. While
I can do that with hand-run psql sessions or through connectors
from different languages, I have not been able to wrangle the
testing tools we support through the build system into working for
this purpose. (I had been hoping that the recent improvements to
the TAP testing libraries would give me the traction to get there,
but either it's still not there or my perl-fu is just too weak to
figure out how to use those features -- suggestions welcome.)

Basically, a connection needs to remain open and interleave
commands with other connections, which the isolation tester does
just fine; but it needs to do that using a custom postgresql.conf
file, which TAP does just fine. I haven't been able to see the
right way to get a TAP test to set up a customized installation to
run isolation tests against. If I can get that working, I have
additional tests I can drop into that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

snapshot-too-old-v4.patch (invalid/octet-stream; name=snapshot-too-old-v4.patch)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a09ceb2..6ec434c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1944,6 +1944,42 @@ include_dir 'conf.d'
         </para>
        </listitem>
       </varlistentry>
+
+      <varlistentry id="guc-old-snapshot-threshold" xreflabel="old_snapshot_threshold">
+       <term><varname>old_snapshot_threshold</varname> (<type>integer</type>)
+       <indexterm>
+        <primary><varname>old_snapshot_threshold</> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the minimum time that a snapshot can be used without risk of a
+         <literal>snapshot too old</> error occurring when using the snapshot.
+         This parameter can only be set at server start.
+        </para>
+
+        <para>
+         Beyond the threshold, old data may be vacuumed away.  This can help
+         prevent bloat in the face of snapshots which remain in use for a
+         long time.  To prevent incorrect results due to cleanup of data which
+         would otherwise be visible to the snapshot, an error is generated
+         when the snapshot is older than this threshold and the snapshot is
+         used to read a page which has been modified since the snapshot was
+         built.
+        </para>
+
+        <para>
+         A value of <literal>-1</> disables this feature, and is the default.
+         Useful values for production work probably range from a small number
+         of hours to a few days.  The setting will be coerced to a granularity
+         of minutes, and small numbers (such as <literal>0</> or
+         <literal>1min</>) are only allowed because they may sometimes be
+         useful for testing.  While a setting as high as <literal>60d</> is
+         allowed, please note that in many workloads extreme bloat or
+         transaction ID wraparound may occur in much shorter time frames.
+        </para>
+       </listitem>
+      </varlistentry>
      </variablelist>
     </sect2>
    </sect1>
@@ -2892,6 +2928,10 @@ include_dir 'conf.d'
         You should also consider setting <varname>hot_standby_feedback</>
         on standby server(s) as an alternative to using this parameter.
        </para>
+       <para>
+        This does not prevent cleanup of dead rows which have reached the age
+        specified by <varname>old_snapshot_threshold</>.
+       </para>
       </listitem>
      </varlistentry>
 
@@ -3039,6 +3079,16 @@ include_dir 'conf.d'
         until it eventually reaches the primary.  Standbys make no other use
         of feedback they receive other than to pass upstream.
        </para>
+       <para>
+        This setting does not override the behavior of
+        <varname>old_snapshot_threshold</> on the primary; a snapshot on the
+        standby which exceeds the primary's age threshold can become invalid,
+        resulting in cancellation of transactions on the standby.  This is
+        because <varname>old_snapshot_threshold</> is intended to provide an
+        absolute limit on the time which dead rows can contribute to bloat,
+        which would otherwise be violated because of the configuration of a
+        standby.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index c740952..89bad05 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -135,7 +135,7 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 	MemoryContext tupcxt = NULL;
 	MemoryContext oldcxt = NULL;
 
-	revmap = brinRevmapInitialize(idxRel, &pagesPerRange);
+	revmap = brinRevmapInitialize(idxRel, &pagesPerRange, NULL);
 
 	for (;;)
 	{
@@ -152,7 +152,7 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 		/* normalize the block number to be the first block in the range */
 		heapBlk = (heapBlk / pagesPerRange) * pagesPerRange;
 		brtup = brinGetTupleForHeapBlock(revmap, heapBlk, &buf, &off, NULL,
-										 BUFFER_LOCK_SHARE);
+										 BUFFER_LOCK_SHARE, NULL);
 
 		/* if range is unsummarized, there's nothing to do */
 		if (!brtup)
@@ -284,7 +284,8 @@ brinbeginscan(Relation r, int nkeys, int norderbys)
 	scan = RelationGetIndexScan(r, nkeys, norderbys);
 
 	opaque = (BrinOpaque *) palloc(sizeof(BrinOpaque));
-	opaque->bo_rmAccess = brinRevmapInitialize(r, &opaque->bo_pagesPerRange);
+	opaque->bo_rmAccess = brinRevmapInitialize(r, &opaque->bo_pagesPerRange,
+											   scan->xs_snapshot);
 	opaque->bo_bdesc = brin_build_desc(r);
 	scan->opaque = opaque;
 
@@ -367,7 +368,8 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 		MemoryContextResetAndDeleteChildren(perRangeCxt);
 
 		tup = brinGetTupleForHeapBlock(opaque->bo_rmAccess, heapBlk, &buf,
-									   &off, &size, BUFFER_LOCK_SHARE);
+									   &off, &size, BUFFER_LOCK_SHARE,
+									   scan->xs_snapshot);
 		if (tup)
 		{
 			tup = brin_copy_tuple(tup, size);
@@ -645,7 +647,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	/*
 	 * Initialize our state, including the deformed tuple state.
 	 */
-	revmap = brinRevmapInitialize(index, &pagesPerRange);
+	revmap = brinRevmapInitialize(index, &pagesPerRange, NULL);
 	state = initialize_brin_buildstate(index, revmap, pagesPerRange);
 
 	/*
@@ -1040,7 +1042,8 @@ summarize_range(IndexInfo *indexInfo, BrinBuildState *state, Relation heapRel,
 		 * the same.)
 		 */
 		phtup = brinGetTupleForHeapBlock(state->bs_rmAccess, heapBlk, &phbuf,
-										 &offset, &phsz, BUFFER_LOCK_SHARE);
+										 &offset, &phsz, BUFFER_LOCK_SHARE,
+										 NULL);
 		/* the placeholder tuple must exist */
 		if (phtup == NULL)
 			elog(ERROR, "missing placeholder tuple");
@@ -1075,7 +1078,7 @@ brinsummarize(Relation index, Relation heapRel, double *numSummarized,
 	BlockNumber pagesPerRange;
 	Buffer		buf;
 
-	revmap = brinRevmapInitialize(index, &pagesPerRange);
+	revmap = brinRevmapInitialize(index, &pagesPerRange, NULL);
 
 	/*
 	 * Scan the revmap to find unsummarized items.
@@ -1090,7 +1093,7 @@ brinsummarize(Relation index, Relation heapRel, double *numSummarized,
 		CHECK_FOR_INTERRUPTS();
 
 		tup = brinGetTupleForHeapBlock(revmap, heapBlk, &buf, &off, NULL,
-									   BUFFER_LOCK_SHARE);
+									   BUFFER_LOCK_SHARE, NULL);
 		if (tup == NULL)
 		{
 			/* no revmap entry for this heap range. Summarize it. */
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index b2c273d..812f76c 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -68,15 +68,19 @@ static void revmap_physical_extend(BrinRevmap *revmap);
  * brinRevmapTerminate when caller is done with it.
  */
 BrinRevmap *
-brinRevmapInitialize(Relation idxrel, BlockNumber *pagesPerRange)
+brinRevmapInitialize(Relation idxrel, BlockNumber *pagesPerRange,
+					 Snapshot snapshot)
 {
 	BrinRevmap *revmap;
 	Buffer		meta;
 	BrinMetaPageData *metadata;
+	Page		page;
 
 	meta = ReadBuffer(idxrel, BRIN_METAPAGE_BLKNO);
 	LockBuffer(meta, BUFFER_LOCK_SHARE);
-	metadata = (BrinMetaPageData *) PageGetContents(BufferGetPage(meta));
+	page = BufferGetPage(meta);
+	TestForOldSnapshot(snapshot, idxrel, page);
+	metadata = (BrinMetaPageData *) PageGetContents(page);
 
 	revmap = palloc(sizeof(BrinRevmap));
 	revmap->rm_irel = idxrel;
@@ -185,7 +189,8 @@ brinSetHeapBlockItemptr(Buffer buf, BlockNumber pagesPerRange,
  */
 BrinTuple *
 brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
-						 Buffer *buf, OffsetNumber *off, Size *size, int mode)
+						 Buffer *buf, OffsetNumber *off, Size *size, int mode,
+						 Snapshot snapshot)
 {
 	Relation	idxRel = revmap->rm_irel;
 	BlockNumber mapBlk;
@@ -262,6 +267,7 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
 		}
 		LockBuffer(*buf, mode);
 		page = BufferGetPage(*buf);
+		TestForOldSnapshot(snapshot, idxRel, page);
 
 		/* If we land on a revmap page, start over */
 		if (BRIN_IS_REGULAR_PAGE(page))
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 06ba9cb..dc593c2 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -71,7 +71,7 @@ ginTraverseLock(Buffer buffer, bool searchMode)
  * is share-locked, and stack->parent is NULL.
  */
 GinBtreeStack *
-ginFindLeafPage(GinBtree btree, bool searchMode)
+ginFindLeafPage(GinBtree btree, bool searchMode, Snapshot snapshot)
 {
 	GinBtreeStack *stack;
 
@@ -90,6 +90,7 @@ ginFindLeafPage(GinBtree btree, bool searchMode)
 		stack->off = InvalidOffsetNumber;
 
 		page = BufferGetPage(stack->buffer);
+		TestForOldSnapshot(snapshot, btree->index, page);
 
 		access = ginTraverseLock(stack->buffer, searchMode);
 
@@ -116,6 +117,7 @@ ginFindLeafPage(GinBtree btree, bool searchMode)
 			stack->buffer = ginStepRight(stack->buffer, btree->index, access);
 			stack->blkno = rightlink;
 			page = BufferGetPage(stack->buffer);
+			TestForOldSnapshot(snapshot, btree->index, page);
 
 			if (!searchMode && GinPageIsIncompleteSplit(page))
 				ginFinishSplit(btree, stack, false, NULL);
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index a55bb4a..ab14b35 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -1820,7 +1820,7 @@ ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
 	{
 		/* search for the leaf page where the first item should go to */
 		btree.itemptr = insertdata.items[insertdata.curitem];
-		stack = ginFindLeafPage(&btree, false);
+		stack = ginFindLeafPage(&btree, false, NULL);
 
 		ginInsertValue(&btree, stack, &insertdata, buildStats);
 	}
@@ -1830,7 +1830,8 @@ ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
  * Starts a new scan on a posting tree.
  */
 GinBtreeStack *
-ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno)
+ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno,
+						Snapshot snapshot)
 {
 	GinBtreeStack *stack;
 
@@ -1838,7 +1839,7 @@ ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno)
 
 	btree->fullScan = TRUE;
 
-	stack = ginFindLeafPage(btree, TRUE);
+	stack = ginFindLeafPage(btree, TRUE, snapshot);
 
 	return stack;
 }
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index a6756d5..7d816d0 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -19,6 +19,7 @@
 #include "miscadmin.h"
 #include "utils/datum.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 
 /* GUC parameter */
 int			GinFuzzySearchLimit = 0;
@@ -63,7 +64,7 @@ moveRightIfItNeeded(GinBtreeData *btree, GinBtreeStack *stack)
  */
 static void
 scanPostingTree(Relation index, GinScanEntry scanEntry,
-				BlockNumber rootPostingTree)
+				BlockNumber rootPostingTree, Snapshot snapshot)
 {
 	GinBtreeData btree;
 	GinBtreeStack *stack;
@@ -71,7 +72,7 @@ scanPostingTree(Relation index, GinScanEntry scanEntry,
 	Page		page;
 
 	/* Descend to the leftmost leaf page */
-	stack = ginScanBeginPostingTree(&btree, index, rootPostingTree);
+	stack = ginScanBeginPostingTree(&btree, index, rootPostingTree, snapshot);
 	buffer = stack->buffer;
 	IncrBufferRefCount(buffer); /* prevent unpin in freeGinBtreeStack */
 
@@ -114,7 +115,7 @@ scanPostingTree(Relation index, GinScanEntry scanEntry,
  */
 static bool
 collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
-				   GinScanEntry scanEntry)
+				   GinScanEntry scanEntry, Snapshot snapshot)
 {
 	OffsetNumber attnum;
 	Form_pg_attribute attr;
@@ -145,6 +146,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
 			return true;
 
 		page = BufferGetPage(stack->buffer);
+		TestForOldSnapshot(snapshot, btree->index, page);
 		itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, stack->off));
 
 		/*
@@ -224,7 +226,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
 			LockBuffer(stack->buffer, GIN_UNLOCK);
 
 			/* Collect all the TIDs in this entry's posting tree */
-			scanPostingTree(btree->index, scanEntry, rootPostingTree);
+			scanPostingTree(btree->index, scanEntry, rootPostingTree, snapshot);
 
 			/*
 			 * We lock again the entry page and while it was unlocked insert
@@ -291,7 +293,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
  * Start* functions setup beginning state of searches: finds correct buffer and pins it.
  */
 static void
-startScanEntry(GinState *ginstate, GinScanEntry entry)
+startScanEntry(GinState *ginstate, GinScanEntry entry, Snapshot snapshot)
 {
 	GinBtreeData btreeEntry;
 	GinBtreeStack *stackEntry;
@@ -316,7 +318,7 @@ restartScanEntry:
 	ginPrepareEntryScan(&btreeEntry, entry->attnum,
 						entry->queryKey, entry->queryCategory,
 						ginstate);
-	stackEntry = ginFindLeafPage(&btreeEntry, true);
+	stackEntry = ginFindLeafPage(&btreeEntry, true, snapshot);
 	page = BufferGetPage(stackEntry->buffer);
 	needUnlock = TRUE;
 
@@ -333,7 +335,7 @@ restartScanEntry:
 		 * for the entry type.
 		 */
 		btreeEntry.findItem(&btreeEntry, stackEntry);
-		if (collectMatchBitmap(&btreeEntry, stackEntry, entry) == false)
+		if (!collectMatchBitmap(&btreeEntry, stackEntry, entry, snapshot))
 		{
 			/*
 			 * GIN tree was seriously restructured, so we will cleanup all
@@ -381,7 +383,7 @@ restartScanEntry:
 			needUnlock = FALSE;
 
 			stack = ginScanBeginPostingTree(&entry->btree, ginstate->index,
-											rootPostingTree);
+											rootPostingTree, snapshot);
 			entry->buffer = stack->buffer;
 
 			/*
@@ -533,7 +535,7 @@ startScan(IndexScanDesc scan)
 	uint32		i;
 
 	for (i = 0; i < so->totalentries; i++)
-		startScanEntry(ginstate, so->entries[i]);
+		startScanEntry(ginstate, so->entries[i], scan->xs_snapshot);
 
 	if (GinFuzzySearchLimit > 0)
 	{
@@ -578,7 +580,8 @@ startScan(IndexScanDesc scan)
  * keep it pinned to prevent interference with vacuum.
  */
 static void
-entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advancePast)
+entryLoadMoreItems(GinState *ginstate, GinScanEntry entry,
+				   ItemPointerData advancePast, Snapshot snapshot)
 {
 	Page		page;
 	int			i;
@@ -622,7 +625,7 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advan
 			entry->btree.itemptr.ip_posid++;
 		}
 		entry->btree.fullScan = false;
-		stack = ginFindLeafPage(&entry->btree, true);
+		stack = ginFindLeafPage(&entry->btree, true, snapshot);
 
 		/* we don't need the stack, just the buffer. */
 		entry->buffer = stack->buffer;
@@ -732,7 +735,7 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advan
  */
 static void
 entryGetItem(GinState *ginstate, GinScanEntry entry,
-			 ItemPointerData advancePast)
+			 ItemPointerData advancePast, Snapshot snapshot)
 {
 	Assert(!entry->isFinished);
 
@@ -855,7 +858,7 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 			/* If we've processed the current batch, load more items */
 			while (entry->offset >= entry->nlist)
 			{
-				entryLoadMoreItems(ginstate, entry, advancePast);
+				entryLoadMoreItems(ginstate, entry, advancePast, snapshot);
 
 				if (entry->isFinished)
 				{
@@ -894,7 +897,7 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
  */
 static void
 keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
-		   ItemPointerData advancePast)
+		   ItemPointerData advancePast, Snapshot snapshot)
 {
 	ItemPointerData minItem;
 	ItemPointerData curPageLossy;
@@ -941,7 +944,7 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
 		 */
 		if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
 		{
-			entryGetItem(ginstate, entry, advancePast);
+			entryGetItem(ginstate, entry, advancePast, snapshot);
 			if (entry->isFinished)
 				continue;
 		}
@@ -999,7 +1002,7 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
 
 		if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
 		{
-			entryGetItem(ginstate, entry, advancePast);
+			entryGetItem(ginstate, entry, advancePast, snapshot);
 			if (entry->isFinished)
 				continue;
 		}
@@ -1208,7 +1211,8 @@ scanGetItem(IndexScanDesc scan, ItemPointerData advancePast,
 			GinScanKey	key = so->keys + i;
 
 			/* Fetch the next item for this key that is > advancePast. */
-			keyGetItem(&so->ginstate, so->tempCtx, key, advancePast);
+			keyGetItem(&so->ginstate, so->tempCtx, key, advancePast,
+					   scan->xs_snapshot);
 
 			if (key->isFinished)
 				return false;
@@ -1330,6 +1334,7 @@ scanGetCandidate(IndexScanDesc scan, pendingPosition *pos)
 	for (;;)
 	{
 		page = BufferGetPage(pos->pendingBuffer);
+		TestForOldSnapshot(scan->xs_snapshot, scan->indexRelation, page);
 
 		maxoff = PageGetMaxOffsetNumber(page);
 		if (pos->firstOffset > maxoff)
@@ -1510,6 +1515,7 @@ collectMatchesForHeapRow(IndexScanDesc scan, pendingPosition *pos)
 			   sizeof(bool) * (pos->lastOffset - pos->firstOffset));
 
 		page = BufferGetPage(pos->pendingBuffer);
+		TestForOldSnapshot(scan->xs_snapshot, scan->indexRelation, page);
 
 		for (i = 0; i < so->nkeys; i++)
 		{
@@ -1696,12 +1702,15 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
 	int			i;
 	pendingPosition pos;
 	Buffer		metabuffer = ReadBuffer(scan->indexRelation, GIN_METAPAGE_BLKNO);
+	Page		page;
 	BlockNumber blkno;
 
 	*ntids = 0;
 
 	LockBuffer(metabuffer, GIN_SHARE);
-	blkno = GinPageGetMeta(BufferGetPage(metabuffer))->head;
+	page = BufferGetPage(metabuffer);
+	TestForOldSnapshot(scan->xs_snapshot, scan->indexRelation, page);
+	blkno = GinPageGetMeta(page)->head;
 
 	/*
 	 * fetch head of list before unlocking metapage. head page must be pinned
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index cd21e0e..7a9c67a 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -192,7 +192,7 @@ ginEntryInsert(GinState *ginstate,
 
 	ginPrepareEntryScan(&btree, attnum, key, category, ginstate);
 
-	stack = ginFindLeafPage(&btree, false);
+	stack = ginFindLeafPage(&btree, false, NULL);
 	page = BufferGetPage(stack->buffer);
 
 	if (btree.findItem(&btree, stack))
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 8138383..affd635 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -337,6 +337,7 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 	LockBuffer(buffer, GIST_SHARE);
 	gistcheckpage(scan->indexRelation, buffer);
 	page = BufferGetPage(buffer);
+	TestForOldSnapshot(scan->xs_snapshot, r, page);
 	opaque = GistPageGetOpaque(page);
 
 	/*
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 3d48c4f..8c89ee7 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -279,6 +279,7 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir)
 		buf = so->hashso_curbuf;
 		Assert(BufferIsValid(buf));
 		page = BufferGetPage(buf);
+		TestForOldSnapshot(scan->xs_snapshot, rel, page);
 		maxoffnum = PageGetMaxOffsetNumber(page);
 		for (offnum = ItemPointerGetOffsetNumber(current);
 			 offnum <= maxoffnum;
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 6025a3f..eb8c9cd 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -188,7 +188,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	/* Read the metapage */
 	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	metap = HashPageGetMeta(BufferGetPage(metabuf));
+	page = BufferGetPage(metabuf);
+	TestForOldSnapshot(scan->xs_snapshot, rel, page);
+	metap = HashPageGetMeta(page);
 
 	/*
 	 * Loop until we get a lock on the correct target bucket.
@@ -241,6 +243,7 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	/* Fetch the primary bucket page for the bucket */
 	buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
 	page = BufferGetPage(buf);
+	TestForOldSnapshot(scan->xs_snapshot, rel, page);
 	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
 	Assert(opaque->hasho_bucket == bucket);
 
@@ -347,6 +350,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 					_hash_readnext(rel, &buf, &page, &opaque);
 					if (BufferIsValid(buf))
 					{
+						TestForOldSnapshot(scan->xs_snapshot, rel, page);
 						maxoff = PageGetMaxOffsetNumber(page);
 						offnum = _hash_binsearch(page, so->hashso_sk_hash);
 					}
@@ -388,6 +392,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 					_hash_readprev(rel, &buf, &page, &opaque);
 					if (BufferIsValid(buf))
 					{
+						TestForOldSnapshot(scan->xs_snapshot, rel, page);
 						maxoff = PageGetMaxOffsetNumber(page);
 						offnum = _hash_binsearch_last(page, so->hashso_sk_hash);
 					}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f443742..13b6549 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -395,6 +395,7 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
 	LockBuffer(buffer, BUFFER_LOCK_SHARE);
 
 	dp = (Page) BufferGetPage(buffer);
+	TestForOldSnapshot(snapshot, scan->rs_rd, dp);
 	lines = PageGetMaxOffsetNumber(dp);
 	ntup = 0;
 
@@ -538,6 +539,7 @@ heapgettup(HeapScanDesc scan,
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(snapshot, scan->rs_rd, dp);
 		lines = PageGetMaxOffsetNumber(dp);
 		/* page and lineoff now reference the physically next tid */
 
@@ -583,6 +585,7 @@ heapgettup(HeapScanDesc scan,
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(snapshot, scan->rs_rd, dp);
 		lines = PageGetMaxOffsetNumber(dp);
 
 		if (!scan->rs_inited)
@@ -617,6 +620,7 @@ heapgettup(HeapScanDesc scan,
 
 		/* Since the tuple was previously fetched, needn't lock page here */
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(snapshot, scan->rs_rd, dp);
 		lineoff = ItemPointerGetOffsetNumber(&(tuple->t_self));
 		lpp = PageGetItemId(dp, lineoff);
 		Assert(ItemIdIsNormal(lpp));
@@ -746,6 +750,7 @@ heapgettup(HeapScanDesc scan,
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(snapshot, scan->rs_rd, dp);
 		lines = PageGetMaxOffsetNumber((Page) dp);
 		linesleft = lines;
 		if (backward)
@@ -833,6 +838,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 		}
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp);
 		lines = scan->rs_ntuples;
 		/* page and lineindex now reference the next visible tid */
 
@@ -876,6 +882,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 		}
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp);
 		lines = scan->rs_ntuples;
 
 		if (!scan->rs_inited)
@@ -909,6 +916,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 
 		/* Since the tuple was previously fetched, needn't lock page here */
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp);
 		lineoff = ItemPointerGetOffsetNumber(&(tuple->t_self));
 		lpp = PageGetItemId(dp, lineoff);
 		Assert(ItemIdIsNormal(lpp));
@@ -1028,6 +1036,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 		heapgetpage(scan, page);
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp);
 		lines = scan->rs_ntuples;
 		linesleft = lines;
 		if (backward)
@@ -1872,6 +1881,7 @@ heap_fetch(Relation relation,
 	 */
 	LockBuffer(buffer, BUFFER_LOCK_SHARE);
 	page = BufferGetPage(buffer);
+	TestForOldSnapshot(snapshot, relation, page);
 
 	/*
 	 * We'd better check for out-of-range offnum in case of VACUUM since the
@@ -2201,6 +2211,7 @@ heap_get_latest_tid(Relation relation,
 		buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&ctid));
 		LockBuffer(buffer, BUFFER_LOCK_SHARE);
 		page = BufferGetPage(buffer);
+		TestForOldSnapshot(snapshot, relation, page);
 
 		/*
 		 * Check for bogus item number.  This is not treated as an error
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 59beadd..eb7ae8f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -92,12 +92,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 * need to use the horizon that includes slots, otherwise the data-only
 	 * horizon can be used. Note that the toast relation of user defined
 	 * relations are *not* considered catalog relations.
+	 *
+	 * It is OK to apply the old snapshot limit before acquiring the cleanup
+	 * lock because the worst that can happen is that we are not quite as
+	 * aggressive about the cleanup (by however many transaction IDs are
+	 * consumed between this point and acquiring the lock).  This allows us to
+	 * save significant overhead in the case where the page is found not to be
+	 * prunable.
 	 */
 	if (IsCatalogRelation(relation) ||
 		RelationIsAccessibleInLogicalDecoding(relation))
 		OldestXmin = RecentGlobalXmin;
 	else
-		OldestXmin = RecentGlobalDataXmin;
+		OldestXmin =
+				TransactionIdLimitedForOldSnapshots(RecentGlobalDataXmin,
+													relation);
 
 	Assert(TransactionIdIsValid(OldestXmin));
 
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index e3c55eb..66966e0 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -119,7 +119,7 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 
 top:
 	/* find the first page containing this key */
-	stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE);
+	stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE, NULL);
 
 	offset = InvalidOffsetNumber;
 
@@ -135,7 +135,7 @@ top:
 	 * precise description.
 	 */
 	buf = _bt_moveright(rel, buf, natts, itup_scankey, false,
-						true, stack, BT_WRITE);
+						true, stack, BT_WRITE, NULL);
 
 	/*
 	 * If we're not allowing duplicates, make sure the key isn't already in
@@ -1671,7 +1671,8 @@ _bt_insert_parent(Relation rel,
 			elog(DEBUG2, "concurrent ROOT page split");
 			lpageop = (BTPageOpaque) PageGetSpecialPointer(page);
 			/* Find the leftmost page at the next level up */
-			pbuf = _bt_get_endpoint(rel, lpageop->btpo.level + 1, false);
+			pbuf = _bt_get_endpoint(rel, lpageop->btpo.level + 1, false,
+									NULL);
 			/* Set up a phony stack entry pointing there */
 			stack = &fakestack;
 			stack->bts_blkno = BufferGetBlockNumber(pbuf);
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 67755d7..390bd1a 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -1255,7 +1255,7 @@ _bt_pagedel(Relation rel, Buffer buf)
 				itup_scankey = _bt_mkscankey(rel, targetkey);
 				/* find the leftmost leaf page containing this key */
 				stack = _bt_search(rel, rel->rd_rel->relnatts, itup_scankey,
-								   false, &lbuf, BT_READ);
+								   false, &lbuf, BT_READ, NULL);
 				/* don't need a pin on the page */
 				_bt_relbuf(rel, lbuf);
 
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 3db32e8..b316c09 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -30,7 +30,7 @@ static bool _bt_readpage(IndexScanDesc scan, ScanDirection dir,
 static void _bt_saveitem(BTScanOpaque so, int itemIndex,
 			 OffsetNumber offnum, IndexTuple itup);
 static bool _bt_steppage(IndexScanDesc scan, ScanDirection dir);
-static Buffer _bt_walk_left(Relation rel, Buffer buf);
+static Buffer _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot);
 static bool _bt_endpoint(IndexScanDesc scan, ScanDirection dir);
 static void _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp);
 
@@ -79,6 +79,10 @@ _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp)
  * address of the leaf-page buffer, which is read-locked and pinned.
  * No locks are held on the parent pages, however!
  *
+ * If the snapshot parameter is not NULL, "old snapshot" checking will take
+ * place during the descent through the tree.  This is not needed when
+ * positioning for an insert or delete, so NULL is used for those cases.
+ *
  * NOTE that the returned buffer is read-locked regardless of the access
  * parameter.  However, access = BT_WRITE will allow an empty root page
  * to be created and returned.  When access = BT_READ, an empty index
@@ -87,7 +91,7 @@ _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp)
  */
 BTStack
 _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
-		   Buffer *bufP, int access)
+		   Buffer *bufP, int access, Snapshot snapshot)
 {
 	BTStack		stack_in = NULL;
 
@@ -96,7 +100,9 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
 
 	/* If index is empty and access = BT_READ, no root page is created. */
 	if (!BufferIsValid(*bufP))
+	{
 		return (BTStack) NULL;
+	}
 
 	/* Loop iterates once per level descended in the tree */
 	for (;;)
@@ -124,7 +130,7 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
 		 */
 		*bufP = _bt_moveright(rel, *bufP, keysz, scankey, nextkey,
 							  (access == BT_WRITE), stack_in,
-							  BT_READ);
+							  BT_READ, snapshot);
 
 		/* if this is a leaf page, we're done */
 		page = BufferGetPage(*bufP);
@@ -197,6 +203,10 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
  * On entry, we have the buffer pinned and a lock of the type specified by
  * 'access'.  If we move right, we release the buffer and lock and acquire
  * the same on the right sibling.  Return value is the buffer we stop at.
+ *
+ * If the snapshot parameter is not NULL, "old snapshot" checking will take
+ * place during the descent through the tree.  This is not needed when
+ * positioning for an insert or delete, so NULL is used for those cases.
  */
 Buffer
 _bt_moveright(Relation rel,
@@ -206,7 +216,8 @@ _bt_moveright(Relation rel,
 			  bool nextkey,
 			  bool forupdate,
 			  BTStack stack,
-			  int access)
+			  int access,
+			  Snapshot snapshot)
 {
 	Page		page;
 	BTPageOpaque opaque;
@@ -232,6 +243,7 @@ _bt_moveright(Relation rel,
 	for (;;)
 	{
 		page = BufferGetPage(buf);
+		TestForOldSnapshot(snapshot, rel, page);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 		if (P_RIGHTMOST(opaque))
@@ -970,7 +982,8 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
 	 * Use the manufactured insertion scan key to descend the tree and
 	 * position ourselves on the target leaf page.
 	 */
-	stack = _bt_search(rel, keysCount, scankeys, nextkey, &buf, BT_READ);
+	stack = _bt_search(rel, keysCount, scankeys, nextkey, &buf, BT_READ,
+					   scan->xs_snapshot);
 
 	/* don't need to keep the stack around... */
 	_bt_freestack(stack);
@@ -1363,6 +1376,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 			so->currPos.buf = _bt_getbuf(rel, blkno, BT_READ);
 			/* check for deleted page */
 			page = BufferGetPage(so->currPos.buf);
+			TestForOldSnapshot(scan->xs_snapshot, rel, page);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 			if (!P_IGNORE(opaque))
 			{
@@ -1421,7 +1435,8 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 			}
 
 			/* Step to next physical page */
-			so->currPos.buf = _bt_walk_left(rel, so->currPos.buf);
+			so->currPos.buf = _bt_walk_left(rel, so->currPos.buf,
+											scan->xs_snapshot);
 
 			/* if we're physically at end of index, return failure */
 			if (so->currPos.buf == InvalidBuffer)
@@ -1436,6 +1451,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 			 * and do it all again.
 			 */
 			page = BufferGetPage(so->currPos.buf);
+			TestForOldSnapshot(scan->xs_snapshot, rel, page);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 			if (!P_IGNORE(opaque))
 			{
@@ -1469,7 +1485,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
  * again if it's important.
  */
 static Buffer
-_bt_walk_left(Relation rel, Buffer buf)
+_bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot)
 {
 	Page		page;
 	BTPageOpaque opaque;
@@ -1499,6 +1515,7 @@ _bt_walk_left(Relation rel, Buffer buf)
 		CHECK_FOR_INTERRUPTS();
 		buf = _bt_getbuf(rel, blkno, BT_READ);
 		page = BufferGetPage(buf);
+		TestForOldSnapshot(snapshot, rel, page);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 		/*
@@ -1525,12 +1542,14 @@ _bt_walk_left(Relation rel, Buffer buf)
 			blkno = opaque->btpo_next;
 			buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
 			page = BufferGetPage(buf);
+			TestForOldSnapshot(snapshot, rel, page);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		}
 
 		/* Return to the original page to see what's up */
 		buf = _bt_relandgetbuf(rel, buf, obknum, BT_READ);
 		page = BufferGetPage(buf);
+		TestForOldSnapshot(snapshot, rel, page);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		if (P_ISDELETED(opaque))
 		{
@@ -1548,6 +1567,7 @@ _bt_walk_left(Relation rel, Buffer buf)
 				blkno = opaque->btpo_next;
 				buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
 				page = BufferGetPage(buf);
+				TestForOldSnapshot(snapshot, rel, page);
 				opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 				if (!P_ISDELETED(opaque))
 					break;
@@ -1584,7 +1604,8 @@ _bt_walk_left(Relation rel, Buffer buf)
  * The returned buffer is pinned and read-locked.
  */
 Buffer
-_bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
+_bt_get_endpoint(Relation rel, uint32 level, bool rightmost,
+				 Snapshot snapshot)
 {
 	Buffer		buf;
 	Page		page;
@@ -1607,6 +1628,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
 		return InvalidBuffer;
 
 	page = BufferGetPage(buf);
+	TestForOldSnapshot(snapshot, rel, page);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 	for (;;)
@@ -1626,6 +1648,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
 					 RelationGetRelationName(rel));
 			buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
 			page = BufferGetPage(buf);
+			TestForOldSnapshot(snapshot, rel, page);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		}
 
@@ -1678,7 +1701,8 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir)
 	 * version of _bt_search().  We don't maintain a stack since we know we
 	 * won't need it.
 	 */
-	buf = _bt_get_endpoint(rel, 0, ScanDirectionIsBackward(dir));
+	buf = _bt_get_endpoint(rel, 0, ScanDirectionIsBackward(dir),
+						   scan->xs_snapshot);
 
 	if (!BufferIsValid(buf))
 	{
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 620e746..a9f837f 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -295,7 +295,7 @@ spgLeafTest(Relation index, SpGistScanOpaque so,
  */
 static void
 spgWalk(Relation index, SpGistScanOpaque so, bool scanWholeIndex,
-		storeRes_func storeRes)
+		storeRes_func storeRes, Snapshot snapshot)
 {
 	Buffer		buffer = InvalidBuffer;
 	bool		reportedSome = false;
@@ -336,6 +336,7 @@ redirect:
 		/* else new pointer points to the same page, no work needed */
 
 		page = BufferGetPage(buffer);
+		TestForOldSnapshot(snapshot, index, page);
 
 		isnull = SpGistPageStoresNulls(page) ? true : false;
 
@@ -558,7 +559,7 @@ spggetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 	so->tbm = tbm;
 	so->ntids = 0;
 
-	spgWalk(scan->indexRelation, so, true, storeBitmap);
+	spgWalk(scan->indexRelation, so, true, storeBitmap, scan->xs_snapshot);
 
 	return so->ntids;
 }
@@ -617,7 +618,8 @@ spggettuple(IndexScanDesc scan, ScanDirection dir)
 		}
 		so->iPtr = so->nPtrs = 0;
 
-		spgWalk(scan->indexRelation, so, false, storeGettuple);
+		spgWalk(scan->indexRelation, so, false, storeGettuple,
+				scan->xs_snapshot);
 
 		if (so->nPtrs == 0)
 			break;				/* must have completed scan */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 4cb4acf..93361a0 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -489,7 +489,8 @@ vacuum_set_xid_limits(Relation rel,
 	 * working on a particular table at any time, and that each vacuum is
 	 * always an independent transaction.
 	 */
-	*oldestXmin = GetOldestXmin(rel, true);
+	*oldestXmin =
+		TransactionIdLimitedForOldSnapshots(GetOldestXmin(rel, true), rel);
 
 	Assert(TransactionIdIsNormal(*oldestXmin));
 
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 4f6f6e7..3291cd9 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -1485,7 +1485,8 @@ should_attempt_truncation(LVRelStats *vacrelstats)
 	possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
 	if (possibly_freeable > 0 &&
 		(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
-		 possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION))
+		 possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION) &&
+		old_snapshot_threshold < 0)
 		return true;
 	else
 		return false;
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 36a04fc..c04b17f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -43,6 +43,7 @@
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "storage/spin.h"
+#include "utils/snapmgr.h"
 
 
 shmem_startup_hook_type shmem_startup_hook = NULL;
@@ -136,6 +137,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, ReplicationOriginShmemSize());
 		size = add_size(size, WalSndShmemSize());
 		size = add_size(size, WalRcvShmemSize());
+		size = add_size(size, SnapMgrShmemSize());
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
@@ -247,6 +249,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	/*
 	 * Set up other modules that need some shared memory space
 	 */
+	SnapMgrInit();
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 97e8962..19c5fb7 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1759,6 +1759,15 @@ GetSnapshotData(Snapshot snapshot)
 	snapshot->regd_count = 0;
 	snapshot->copied = false;
 
+	/*
+	 * Capture the current time and WAL stream location in case this snapshot
+	 * becomes old enough to need to fall back on the special "old snapshot"
+	 * logic.
+	 */
+	snapshot->lsn = GetXLogInsertRecPtr();
+	snapshot->whenTaken = GetSnapshotCurrentTimestamp();
+	MaintainOldSnapshotTimeMapping(snapshot->whenTaken, xmin);
+
 	return snapshot;
 }
 
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index c557cb6..f8996cd 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -46,3 +46,4 @@ CommitTsControlLock					38
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
+OldSnapshotTimeMapLock				42
diff --git a/src/backend/utils/errcodes.txt b/src/backend/utils/errcodes.txt
index 04c9c00..48f84d9 100644
--- a/src/backend/utils/errcodes.txt
+++ b/src/backend/utils/errcodes.txt
@@ -413,6 +413,10 @@ Section: Class 58 - System Error (errors external to PostgreSQL itself)
 58P01    E    ERRCODE_UNDEFINED_FILE                                         undefined_file
 58P02    E    ERRCODE_DUPLICATE_FILE                                         duplicate_file
 
+Section: Class 72 - Snapshot Failure
+# (class borrowed from Oracle)
+72000    E    ERRCODE_SNAPSHOT_TOO_OLD                                       snapshot_too_old
+
 Section: Class F0 - Configuration File Error
 
 # (PostgreSQL-specific error class)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ea5a09a..824e060 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2591,6 +2591,17 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
+		{"old_snapshot_threshold", PGC_POSTMASTER, RESOURCES_ASYNCHRONOUS,
+			gettext_noop("Time before a snapshot is too old to read pages changed after the snapshot was taken."),
+			gettext_noop("A value of -1 disables this feature."),
+			GUC_UNIT_MIN
+		},
+		&old_snapshot_threshold,
+		-1, -1, MINS_PER_HOUR * HOURS_PER_DAY * 60,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"tcp_keepalives_idle", PGC_USERSET, CLIENT_CONN_OTHER,
 			gettext_noop("Time between issuing TCP keepalives."),
 			gettext_noop("A value of 0 uses the system default."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index ee3d378..64c0ca7 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -165,6 +165,8 @@
 #effective_io_concurrency = 1		# 1-1000; 0 disables prefetching
 #max_worker_processes = 8
 #max_parallel_degree = 0		# max number of worker processes per node
+#old_snapshot_threshold = -1		# 1min-60d; -1 disables; 0 is immediate
+									# (change requires restart)
 
 
 #------------------------------------------------------------------------------
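The range quoted in the sample file ("1min-60d") follows from the guc.c upper bound of MINS_PER_HOUR * HOURS_PER_DAY * 60 minutes. Because SnapMgrShmemSize() in this patch reserves one TransactionId per configured minute, the setting also determines a small amount of shared memory. The standalone sketch below only works through that arithmetic. It assumes a 4-byte TransactionId (uint32_t stands in for it here), and the helper names are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in PostgreSQL's datatype/timestamp.h. */
#define MINS_PER_HOUR 60
#define HOURS_PER_DAY 24

/* Upper bound of old_snapshot_threshold in minutes, per the guc.c entry. */
static int
max_threshold_minutes(void)
{
	return MINS_PER_HOUR * HOURS_PER_DAY * 60;
}

/*
 * Bytes needed for the per-minute xid array alone (hypothetical helper).
 * Mirrors the patch's rule that no array space is added unless the
 * threshold is greater than zero; the fixed-size header is not counted.
 */
static size_t
xid_array_bytes(int threshold_minutes)
{
	assert(threshold_minutes <= max_threshold_minutes());
	if (threshold_minutes <= 0)
		return 0;
	return sizeof(uint32_t) * (size_t) threshold_minutes;
}
```

Under these assumptions the maximum setting of 60 days corresponds to 86400 slots, or roughly 338 kB for the array.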
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 63e908d..cd1e820a 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -46,14 +46,18 @@
 
 #include "access/transam.h"
 #include "access/xact.h"
+#include "access/xlog.h"
+#include "catalog/catalog.h"
 #include "lib/pairingheap.h"
 #include "miscadmin.h"
 #include "storage/predicate.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/sinval.h"
+#include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
@@ -61,6 +65,64 @@
 
 
 /*
+ * GUC parameters
+ */
+int			old_snapshot_threshold;		/* number of minutes, -1 disables */
+
+/*
+ * Structure for dealing with old_snapshot_threshold implementation.
+ */
+typedef struct OldSnapshotControlData
+{
+	/*
+	 * Variables for old snapshot handling are shared among processes and are
+	 * only allowed to move forward.
+	 */
+	slock_t		mutex_current;			/* protect current timestamp */
+	int64		current_timestamp;		/* latest snapshot timestamp */
+	slock_t		mutex_latest_xmin;		/* protect latest snapshot xmin */
+	TransactionId latest_xmin;			/* latest snapshot xmin */
+	slock_t		mutex_threshold;		/* protect threshold fields */
+	int64		threshold_timestamp;	/* earlier snapshot is old */
+	TransactionId threshold_xid;		/* earlier xid may be gone */
+
+	/*
+	 * Keep one xid per minute for old snapshot error handling.
+	 *
+	 * Use a circular buffer with a head offset, a count of entries currently
+	 * used, and a timestamp corresponding to the xid at the head offset.  A
+	 * count_used value of zero means that there are no times stored; a
+	 * count_used value of old_snapshot_threshold means that the buffer is
+	 * full and the head must be advanced to add new entries.  Use timestamps
+	 * aligned to minute boundaries, since that seems less surprising than
+	 * aligning based on the first usage timestamp.
+	 *
+	 * It is OK if the xid for a given time slot is from earlier than the
+	 * slot's nominal time (the time of the head entry plus the number of
+	 * minutes corresponding to the possibly wrapped distance from the head
+	 * offset), since that just results in the vacuuming of old tuples being
+	 * slightly less aggressive.  It would not be OK for it to be off in the
+	 * other direction, since it might result in vacuuming tuples that are
+	 * still expected to be there.
+	 *
+	 * Use of an SLRU was considered but not chosen because it is more
+	 * heavyweight than is needed for this, and would probably not be any less
+	 * code to implement.
+	 *
+	 * Persistence is not needed.
+	 */
+	int			head_offset;		/* subscript of oldest tracked time */
+	int64		head_timestamp;		/* time corresponding to head xid */
+	int			count_used;			/* how many slots are in use */
+	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
+}	OldSnapshotControlData;
+
+typedef struct OldSnapshotControlData *OldSnapshotControl;
+
+static volatile OldSnapshotControl oldSnapshotControl;
+
+
+/*
  * CurrentSnapshot points to the only snapshot taken in transaction-snapshot
  * mode, and to the latest one taken in a read-committed transaction.
  * SecondarySnapshot is a snapshot that's always up-to-date as of the current
@@ -153,6 +215,7 @@ static Snapshot FirstXactSnapshot = NULL;
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
+static int64 AlignTimestampToMinuteBoundary(int64 ts);
 static Snapshot CopySnapshot(Snapshot snapshot);
 static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
@@ -174,6 +237,49 @@ typedef struct SerializedSnapshotData
 	CommandId	curcid;
 } SerializedSnapshotData;
 
+Size
+SnapMgrShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(OldSnapshotControlData, xid_by_minute);
+	if (old_snapshot_threshold > 0)
+		size = add_size(size, mul_size(sizeof(TransactionId),
+									   old_snapshot_threshold));
+
+	return size;
+}
+
+/*
+ * Initialize for managing old snapshot detection.
+ */
+void
+SnapMgrInit(void)
+{
+	bool		found;
+
+	/*
+	 * Create or attach to the OldSnapshotControl structure.
+	 */
+	oldSnapshotControl = (OldSnapshotControl)
+		ShmemInitStruct("OldSnapshotControlData",
+						SnapMgrShmemSize(), &found);
+
+	if (!found)
+	{
+		SpinLockInit(&oldSnapshotControl->mutex_current);
+		oldSnapshotControl->current_timestamp = 0;
+		SpinLockInit(&oldSnapshotControl->mutex_latest_xmin);
+		oldSnapshotControl->latest_xmin = InvalidTransactionId;
+		SpinLockInit(&oldSnapshotControl->mutex_threshold);
+		oldSnapshotControl->threshold_timestamp = 0;
+		oldSnapshotControl->threshold_xid = InvalidTransactionId;
+		oldSnapshotControl->head_offset = 0;
+		oldSnapshotControl->head_timestamp = 0;
+		oldSnapshotControl->count_used = 0;
+	}
+}
+
 /*
  * GetTransactionSnapshot
  *		Get the appropriate snapshot for a new query in a transaction.
@@ -1405,6 +1511,304 @@ ThereAreNoPriorRegisteredSnapshots(void)
 	return false;
 }
 
+
+/*
+ * Return an int64 timestamp which is exactly on a minute boundary.
+ *
+ * If the argument is already aligned, return that value, otherwise move to
+ * the next minute boundary following the given time.
+ */
+static int64
+AlignTimestampToMinuteBoundary(int64 ts)
+{
+	int64		retval = ts + (USECS_PER_MINUTE - 1);
+
+	return retval - (retval % USECS_PER_MINUTE);
+}
+
+/*
+ * Get current timestamp for snapshots as int64 that never moves backward.
+ */
+int64
+GetSnapshotCurrentTimestamp(void)
+{
+	int64		now = GetCurrentIntegerTimestamp();
+
+	/*
+	 * Don't let time move backward; if it hasn't advanced, use the old value.
+	 */
+	SpinLockAcquire(&oldSnapshotControl->mutex_current);
+	if (now <= oldSnapshotControl->current_timestamp)
+		now = oldSnapshotControl->current_timestamp;
+	else
+		oldSnapshotControl->current_timestamp = now;
+	SpinLockRelease(&oldSnapshotControl->mutex_current);
+
+	return now;
+}
+
+/*
+ * Get timestamp through which vacuum may have processed based on last stored
+ * value for threshold_timestamp.
+ *
+ * XXX: So far, we never trust that a 64-bit value can be read atomically; if
+ * that ever changes, we could get rid of the spinlock here.
+ */
+int64
+GetOldSnapshotThresholdTimestamp(void)
+{
+	int64		threshold_timestamp;
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
+	threshold_timestamp = oldSnapshotControl->threshold_timestamp;
+	SpinLockRelease(&oldSnapshotControl->mutex_threshold);
+
+	return threshold_timestamp;
+}
+
+static void
+SetOldSnapshotThresholdTimestamp(int64 ts, TransactionId xlimit)
+{
+	SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
+	oldSnapshotControl->threshold_timestamp = ts;
+	oldSnapshotControl->threshold_xid = xlimit;
+	SpinLockRelease(&oldSnapshotControl->mutex_threshold);
+}
+
+/*
+ * TransactionIdLimitedForOldSnapshots
+ *
+ * Apply old snapshot limit, if any.  This is intended to be called for page
+ * pruning and table vacuuming, to allow old_snapshot_threshold to override
+ * the normal global xmin value.  Actual testing for snapshot too old will be
+ * based on whether a snapshot timestamp is prior to the threshold timestamp
+ * set in this function.
+ */
+TransactionId
+TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
+									Relation relation)
+{
+	if (TransactionIdIsNormal(recentXmin)
+		&& old_snapshot_threshold >= 0
+		&& RelationNeedsWAL(relation)
+		&& !IsCatalogRelation(relation)
+		&& !RelationIsAccessibleInLogicalDecoding(relation))
+	{
+		int64		ts = GetSnapshotCurrentTimestamp();
+		TransactionId xlimit = recentXmin;
+		TransactionId latest_xmin = oldSnapshotControl->latest_xmin;
+		bool		same_ts_as_threshold = false;
+
+		/*
+		 * Zero threshold always overrides to latest xmin, if valid.  Without
+		 * some heuristic it will find its own snapshot too old on, for
+		 * example, a simple UPDATE -- which would make it useless for most
+		 * testing, but there is no principled way to ensure that it doesn't
+		 * fail in this way.  Use a five-second delay to try to get useful
+		 * testing behavior, but this may need adjustment.
+		 */
+		if (old_snapshot_threshold == 0)
+		{
+			if (TransactionIdPrecedes(latest_xmin, MyPgXact->xmin)
+				&& TransactionIdFollows(latest_xmin, xlimit))
+				xlimit = latest_xmin;
+
+			ts -= 5 * USECS_PER_SEC;
+			SetOldSnapshotThresholdTimestamp(ts, xlimit);
+
+			return xlimit;
+		}
+
+		ts = AlignTimestampToMinuteBoundary(ts)
+			 - (old_snapshot_threshold * USECS_PER_MINUTE);
+
+		/* Check for fast exit without LW locking. */
+		SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
+		if (ts == oldSnapshotControl->threshold_timestamp)
+		{
+			xlimit = oldSnapshotControl->threshold_xid;
+			same_ts_as_threshold = true;
+		}
+		SpinLockRelease(&oldSnapshotControl->mutex_threshold);
+
+		if (!same_ts_as_threshold)
+		{
+			LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+
+			if (oldSnapshotControl->count_used > 0
+				&& ts >= oldSnapshotControl->head_timestamp)
+			{
+				int		offset;
+
+				offset = ((ts - oldSnapshotControl->head_timestamp)
+						  / USECS_PER_MINUTE);
+				if (offset > oldSnapshotControl->count_used - 1)
+					offset = oldSnapshotControl->count_used - 1;
+				offset = (oldSnapshotControl->head_offset + offset)
+						% old_snapshot_threshold;
+				xlimit = oldSnapshotControl->xid_by_minute[offset];
+
+				if (NormalTransactionIdFollows(xlimit, recentXmin))
+					SetOldSnapshotThresholdTimestamp(ts, xlimit);
+			}
+
+			LWLockRelease(OldSnapshotTimeMapLock);
+		}
+
+		/*
+		 * Failsafe protection against vacuuming work of active transaction.
+		 *
+		 * This is not an assertion because we avoid the spinlock for
+		 * performance, leaving open the possibility that xlimit could advance
+		 * and be more current; but it seems prudent to apply this limit.  It
+		 * might make pruning a tiny bit less aggressive than it could be, but
+		 * protects against data loss bugs.
+		 */
+		if (TransactionIdIsNormal(latest_xmin)
+			&& TransactionIdPrecedes(latest_xmin, xlimit))
+			xlimit = latest_xmin;
+
+		if (NormalTransactionIdFollows(xlimit, recentXmin))
+			return xlimit;
+	}
+
+	return recentXmin;
+}
+
+/*
+ * Take care of the circular buffer that maps time to xid.
+ */
+void
+MaintainOldSnapshotTimeMapping(int64 whenTaken, TransactionId xmin)
+{
+	int64		ts;
+
+	/* Fast exit when old_snapshot_threshold is not used. */
+	if (old_snapshot_threshold < 0)
+		return;
+
+	/* Keep track of the latest xmin seen by any process. */
+	SpinLockAcquire(&oldSnapshotControl->mutex_latest_xmin);
+	if (TransactionIdFollows(xmin, oldSnapshotControl->latest_xmin))
+		oldSnapshotControl->latest_xmin = xmin;
+	SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
+
+	/* No further tracking needed for 0 (used for testing). */
+	if (old_snapshot_threshold == 0)
+		return;
+
+	/*
+	 * We don't want to do something stupid with unusual values, but we don't
+	 * want to litter the log with warnings or break otherwise normal
+	 * processing for this feature; so if something seems unreasonable, just
+	 * log at DEBUG level and return without doing anything.
+	 */
+	if (whenTaken < 0)
+	{
+		elog(DEBUG1,
+			 "MaintainOldSnapshotTimeMapping called with negative whenTaken = %ld",
+			 (long) whenTaken);
+		return;
+	}
+	if (!TransactionIdIsNormal(xmin))
+	{
+		elog(DEBUG1,
+			 "MaintainOldSnapshotTimeMapping called with xmin = %lu",
+			 (unsigned long) xmin);
+		return;
+	}
+
+	ts = AlignTimestampToMinuteBoundary(whenTaken);
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+
+	Assert(oldSnapshotControl->head_offset >= 0);
+	Assert(oldSnapshotControl->head_offset < old_snapshot_threshold);
+	Assert((oldSnapshotControl->head_timestamp % USECS_PER_MINUTE) == 0);
+	Assert(oldSnapshotControl->count_used >= 0);
+	Assert(oldSnapshotControl->count_used <= old_snapshot_threshold);
+
+	if (oldSnapshotControl->count_used == 0)
+	{
+		/* set up first entry for empty mapping */
+		oldSnapshotControl->head_offset = 0;
+		oldSnapshotControl->head_timestamp = ts;
+		oldSnapshotControl->count_used = 1;
+		oldSnapshotControl->xid_by_minute[0] = xmin;
+	}
+	else if (ts < oldSnapshotControl->head_timestamp)
+	{
+		/* old ts; log it at DEBUG */
+		LWLockRelease(OldSnapshotTimeMapLock);
+		elog(DEBUG1,
+			 "MaintainOldSnapshotTimeMapping called with old whenTaken = %ld",
+			 (long) whenTaken);
+		return;
+	}
+	else if (ts <= (oldSnapshotControl->head_timestamp +
+					((oldSnapshotControl->count_used - 1)
+					 * USECS_PER_MINUTE)))
+	{
+		/* existing mapping; advance xid if possible */
+		int		bucket = (oldSnapshotControl->head_offset
+						  + ((ts - oldSnapshotControl->head_timestamp)
+							 / USECS_PER_MINUTE))
+						 % old_snapshot_threshold;
+
+		if (TransactionIdPrecedes(oldSnapshotControl->xid_by_minute[bucket], xmin))
+			oldSnapshotControl->xid_by_minute[bucket] = xmin;
+	}
+	else
+	{
+		/* We need a new bucket, but it might not be the very next one. */
+		int		advance = ((ts - oldSnapshotControl->head_timestamp)
+						   / USECS_PER_MINUTE);
+
+		oldSnapshotControl->head_timestamp = ts;
+
+		if (advance >= old_snapshot_threshold)
+		{
+			/* Advance is so far that all old data is junk; start over. */
+			oldSnapshotControl->head_offset = 0;
+			oldSnapshotControl->count_used = 1;
+			oldSnapshotControl->xid_by_minute[0] = xmin;
+		}
+		else
+		{
+			/* Store the new value in one or more buckets. */
+			int i;
+
+			for (i = 0; i < advance; i++)
+			{
+				if (oldSnapshotControl->count_used == old_snapshot_threshold)
+				{
+					/* Map full and new value replaces old head. */
+					int		old_head = oldSnapshotControl->head_offset;
+
+					if (old_head == (old_snapshot_threshold - 1))
+						oldSnapshotControl->head_offset = 0;
+					else
+						oldSnapshotControl->head_offset = old_head + 1;
+					oldSnapshotControl->xid_by_minute[old_head] = xmin;
+				}
+				else
+				{
+					/* Extend map to unused entry. */
+					int		new_tail = (oldSnapshotControl->head_offset
+										+ oldSnapshotControl->count_used)
+									   % old_snapshot_threshold;
+
+					oldSnapshotControl->count_used++;
+					oldSnapshotControl->xid_by_minute[new_tail] = xmin;
+				}
+			}
+		}
+	}
+
+	LWLockRelease(OldSnapshotTimeMapLock);
+}
+
+
 /*
  * Setup a snapshot that replaces normal catalog snapshots that allows catalog
  * access to behave just like it did at a certain point in the past.
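For readers following the snapmgr.c changes above, here is a minimal standalone sketch of two pieces: rounding a microsecond timestamp up to a minute boundary, and the read side of the circular time-to-xid map as used in TransactionIdLimitedForOldSnapshots(). It is an illustration under stated assumptions, not the patch's code. The TimeXidMap struct, the fixed array size, and the function names are hypothetical; locking is omitted; uint32_t stands in for TransactionId; and head_timestamp is taken to be the time of the oldest slot, as the struct comment describes.

```c
#include <assert.h>
#include <stdint.h>

#define USECS_PER_MINUTE INT64_C(60000000)

/* Round ts up to the next minute boundary; aligned values are unchanged. */
static int64_t
align_to_minute(int64_t ts)
{
	int64_t		retval = ts + (USECS_PER_MINUTE - 1);

	return retval - (retval % USECS_PER_MINUTE);
}

/* Hypothetical, simplified stand-in for the shared map (no locking). */
typedef struct TimeXidMap
{
	int			size;			/* number of slots in the ring */
	int			head_offset;	/* subscript of oldest tracked time */
	int64_t		head_timestamp; /* minute-aligned time of the head slot */
	int			count_used;		/* how many slots are in use */
	uint32_t	xid_by_minute[8];
} TimeXidMap;

/*
 * Return the xid recorded for the minute-aligned time ts, clamping to the
 * newest slot when ts lies beyond the tracked range.  Returns 0 when the
 * map is empty or ts precedes the head, meaning "no limit available".
 */
static uint32_t
lookup_xid(const TimeXidMap *map, int64_t ts)
{
	int64_t		offset;

	assert(ts % USECS_PER_MINUTE == 0);

	if (map->count_used == 0 || ts < map->head_timestamp)
		return 0;

	offset = (ts - map->head_timestamp) / USECS_PER_MINUTE;
	if (offset > map->count_used - 1)
		offset = map->count_used - 1;

	return map->xid_by_minute[(map->head_offset + offset) % map->size];
}
```

With a four-slot ring whose head sits at subscript 3, three used entries occupy subscripts 3, 0 and 1 in that order, which is the wrap-around case the modulo arithmetic handles.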
diff --git a/src/include/access/brin_revmap.h b/src/include/access/brin_revmap.h
index 19528bf..89054e0 100644
--- a/src/include/access/brin_revmap.h
+++ b/src/include/access/brin_revmap.h
@@ -18,12 +18,13 @@
 #include "storage/itemptr.h"
 #include "storage/off.h"
 #include "utils/relcache.h"
+#include "utils/snapshot.h"
 
 /* struct definition lives in brin_revmap.c */
 typedef struct BrinRevmap BrinRevmap;
 
 extern BrinRevmap *brinRevmapInitialize(Relation idxrel,
-					 BlockNumber *pagesPerRange);
+					 BlockNumber *pagesPerRange, Snapshot snapshot);
 extern void brinRevmapTerminate(BrinRevmap *revmap);
 
 extern void brinRevmapExtend(BrinRevmap *revmap,
@@ -34,6 +35,6 @@ extern void brinSetHeapBlockItemptr(Buffer rmbuf, BlockNumber pagesPerRange,
 						BlockNumber heapBlk, ItemPointerData tid);
 extern BrinTuple *brinGetTupleForHeapBlock(BrinRevmap *revmap,
 						 BlockNumber heapBlk, Buffer *buf, OffsetNumber *off,
-						 Size *size, int mode);
+						 Size *size, int mode, Snapshot snapshot);
 
 #endif   /* BRIN_REVMAP_H */
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index d2ea588..66ce9ac 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -703,7 +703,7 @@ typedef struct
  * PostingItem
  */
 
-extern GinBtreeStack *ginFindLeafPage(GinBtree btree, bool searchMode);
+extern GinBtreeStack *ginFindLeafPage(GinBtree btree, bool searchMode, Snapshot snapshot);
 extern Buffer ginStepRight(Buffer buffer, Relation index, int lockmode);
 extern void freeGinBtreeStack(GinBtreeStack *stack);
 extern void ginInsertValue(GinBtree btree, GinBtreeStack *stack,
@@ -731,7 +731,7 @@ extern void GinPageDeletePostingItem(Page page, OffsetNumber offset);
 extern void ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
 					  ItemPointerData *items, uint32 nitem,
 					  GinStatsData *buildStats);
-extern GinBtreeStack *ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno);
+extern GinBtreeStack *ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno, Snapshot snapshot);
 extern void ginDataFillRoot(GinBtree btree, Page root, BlockNumber lblkno, Page lpage, BlockNumber rblkno, Page rpage);
 extern void ginPrepareDataScan(GinBtree btree, Relation index, BlockNumber rootBlkno);
 
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 06822fa..660eb20 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -711,17 +711,18 @@ extern int	_bt_pagedel(Relation rel, Buffer buf);
  */
 extern BTStack _bt_search(Relation rel,
 		   int keysz, ScanKey scankey, bool nextkey,
-		   Buffer *bufP, int access);
+		   Buffer *bufP, int access, Snapshot snapshot);
 extern Buffer _bt_moveright(Relation rel, Buffer buf, int keysz,
 			  ScanKey scankey, bool nextkey, bool forupdate, BTStack stack,
-			  int access);
+			  int access, Snapshot snapshot);
 extern OffsetNumber _bt_binsrch(Relation rel, Buffer buf, int keysz,
 			ScanKey scankey, bool nextkey);
 extern int32 _bt_compare(Relation rel, int keysz, ScanKey scankey,
 			Page page, OffsetNumber offnum);
 extern bool _bt_first(IndexScanDesc scan, ScanDirection dir);
 extern bool _bt_next(IndexScanDesc scan, ScanDirection dir);
-extern Buffer _bt_get_endpoint(Relation rel, uint32 level, bool rightmost);
+extern Buffer _bt_get_endpoint(Relation rel, uint32 level, bool rightmost,
+							   Snapshot snapshot);
 
 /*
  * prototypes for functions in nbtutils.c
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 92c4bc5..7a33a00 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -14,11 +14,14 @@
 #ifndef BUFMGR_H
 #define BUFMGR_H
 
+#include "catalog/catalog.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/bufpage.h"
 #include "storage/relfilenode.h"
 #include "utils/relcache.h"
+#include "utils/snapmgr.h"
+#include "utils/tqual.h"
 
 typedef void *Block;
 
@@ -148,6 +151,37 @@ extern PGDLLIMPORT int32 *LocalRefCount;
 #define BufferGetPage(buffer) ((Page)BufferGetBlock(buffer))
 
 /*
+ * Check whether the given snapshot is too old to have safely read the given
+ * page from the given table.  If so, throw a "snapshot too old" error.
+ *
+ * This test generally needs to be performed after every BufferGetPage() call
+ * that is executed as part of a scan.  It is not needed for calls made for
+ * modifying the page (for example, to position to the right place to insert a
+ * new index tuple or for vacuuming).
+ *
+ * Note that a NULL snapshot argument is allowed and causes a fast return
+ * without error; this is to support call sites which can be called from
+ * either scans or index modification areas.
+ *
+ * This is a macro for speed; keep the tests that are fastest and/or most
+ * likely to exclude a page from old snapshot testing near the front.
+ */
+#define TestForOldSnapshot(snapshot, relation, page) \
+	do { \
+		if (old_snapshot_threshold >= 0 \
+		 && (snapshot) != NULL \
+		 && (snapshot)->satisfies == HeapTupleSatisfiesMVCC \
+		 && !XLogRecPtrIsInvalid((snapshot)->lsn) \
+		 && PageGetLSN(page) > (snapshot)->lsn \
+		 && !IsCatalogRelation(relation) \
+		 && !RelationIsAccessibleInLogicalDecoding(relation) \
+		 && (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp()) \
+			ereport(ERROR, \
+					(errcode(ERRCODE_SNAPSHOT_TOO_OLD), \
+					 errmsg("snapshot too old"))); \
+	} while (0)
+
+/*
  * prototypes for functions in bufmgr.c
  */
 extern bool ComputeIoConcurrency(int io_concurrency, double *target);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index f2bebf2..d417031 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -15,6 +15,7 @@
 #define REL_H
 
 #include "access/tupdesc.h"
+#include "access/xlog.h"
 #include "catalog/pg_class.h"
 #include "catalog/pg_index.h"
 #include "fmgr.h"
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index a9e9066..371042a 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -14,10 +14,20 @@
 #define SNAPMGR_H
 
 #include "fmgr.h"
+#include "utils/relcache.h"
 #include "utils/resowner.h"
 #include "utils/snapshot.h"
 
 
+/* GUC variables */
+extern int	old_snapshot_threshold;
+
+
+extern Size SnapMgrShmemSize(void);
+extern void SnapMgrInit(void);
+extern int64 GetSnapshotCurrentTimestamp(void);
+extern int64 GetOldSnapshotThresholdTimestamp(void);
+
 extern bool FirstSnapshotSet;
 
 extern TransactionId TransactionXmin;
@@ -54,6 +64,9 @@ extern void ImportSnapshot(const char *idstr);
 extern bool XactHasExportedSnapshots(void);
 extern void DeleteAllExportedSnapshotFiles(void);
 extern bool ThereAreNoPriorRegisteredSnapshots(void);
+extern TransactionId TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
+														 Relation relation);
+extern void MaintainOldSnapshotTimeMapping(int64 whenTaken, TransactionId xmin);
 
 extern char *ExportSnapshot(Snapshot snapshot);
 
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index 2a56363..998e2e5 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -14,6 +14,7 @@
 #define SNAPSHOT_H
 
 #include "access/htup.h"
+#include "access/xlogdefs.h"
 #include "lib/pairingheap.h"
 #include "storage/buf.h"
 
@@ -105,6 +106,9 @@ typedef struct SnapshotData
 	uint32		active_count;	/* refcount on ActiveSnapshot stack */
 	uint32		regd_count;		/* refcount on RegisteredSnapshots */
 	pairingheap_node ph_node;	/* link in the RegisteredSnapshots heap */
+
+	int64		whenTaken;		/* timestamp when snapshot was taken */
+	XLogRecPtr	lsn;			/* position in the WAL stream when taken */
 } SnapshotData;
 
 /*
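
The bucket handling in the time-map hunk above (the code ending with LWLockRelease(OldSnapshotTimeMapLock)) can be sketched in isolation. This is a simplified, self-contained illustration, not the patch's code: `MinuteMap`, `minute_map_reset`, `minute_map_push`, `minute_map_get` and the fixed `MAP_CAPACITY` are hypothetical names. The patch instead sizes the map from old_snapshot_threshold and updates it while holding a lock; timestamps and locking are left out here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed size; plays the role of old_snapshot_threshold. */
#define MAP_CAPACITY 3

typedef struct
{
	int			head_offset;	/* index of the oldest bucket */
	int			count_used;		/* number of buckets holding data */
	uint32_t	xid_by_minute[MAP_CAPACITY];	/* one xmin per minute */
} MinuteMap;

/* Discard all old data and start over with a single bucket. */
static void
minute_map_reset(MinuteMap *map, uint32_t xmin)
{
	map->head_offset = 0;
	map->count_used = 1;
	map->xid_by_minute[0] = xmin;
}

/* Append one bucket; when the map is full, overwrite the oldest one. */
static void
minute_map_push(MinuteMap *map, uint32_t xmin)
{
	assert(map->count_used >= 0 && map->count_used <= MAP_CAPACITY);

	if (map->count_used == MAP_CAPACITY)
	{
		/* Map full: the new value replaces the old head. */
		int			old_head = map->head_offset;

		map->head_offset = (old_head + 1) % MAP_CAPACITY;
		map->xid_by_minute[old_head] = xmin;
	}
	else
	{
		/* Extend the map to the next unused entry. */
		int			new_tail = (map->head_offset + map->count_used)
								% MAP_CAPACITY;

		map->count_used++;
		map->xid_by_minute[new_tail] = xmin;
	}
}

/* Return the bucket that sits "age" positions after the oldest one. */
static uint32_t
minute_map_get(const MinuteMap *map, int age)
{
	assert(age >= 0 && age < map->count_used);
	return map->xid_by_minute[(map->head_offset + age) % MAP_CAPACITY];
}
```

Calling minute_map_push() once per elapsed minute corresponds to the `for (i = 0; i < advance; i++)` loop in the hunk, and minute_map_reset() corresponds to the "start over" branch taken when the advance is at least the threshold.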

#15

Andres Freund

andres@anarazel.de

almost 10 years ago

In reply to: Kevin Grittner (#14)

Re: snapshot too old, configured by time

On 2016-02-29 18:30:27 -0600, Kevin Grittner wrote:

> Basically, a connection needs to remain open and interleave
> commands with other connections, which the isolation tester does
> just fine; but it needs to do that using a custom postgresql.conf
> file, which TAP does just fine. I haven't been able to see the
> right way to get a TAP test to set up a customized installation to
> run isolation tests against. If I can get that working, I have
> additional tests I can drop into that.

Check contrib/test_decoding's makefile. It does just that with
isolationtester.

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Michael Paquier

michael.paquier@gmail.com

almost 10 years ago

In reply to: Andres Freund (#15)

Re: snapshot too old, configured by time

On Tue, Mar 1, 2016 at 9:35 AM, Andres Freund <andres@anarazel.de> wrote:

> On 2016-02-29 18:30:27 -0600, Kevin Grittner wrote:
>
>> Basically, a connection needs to remain open and interleave
>> commands with other connections, which the isolation tester does
>> just fine; but it needs to do that using a custom postgresql.conf
>> file, which TAP does just fine. I haven't been able to see the
>> right way to get a TAP test to set up a customized installation to
>> run isolation tests against. If I can get that working, I have
>> additional tests I can drop into that.

Launching psql from PostgresNode does not hold the connection, we
would need to reinvent/integrate the logic of existing drivers to hold
the connection context properly, but that's utterly complicated with
not that much gain.

> Check contrib/test_decoding's makefile. It does just that with
> isolationtester.

pg_isolation_regress --temp-config is the key item here, you can
enforce a test to run on a server with a wanted configuration set.
--
Michael


#17

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Michael Paquier (#16)

1 attachment(s)

Re: snapshot too old, configured by time

On Tue, Mar 1, 2016 at 12:58 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

> On Tue, Mar 1, 2016 at 9:35 AM, Andres Freund <andres@anarazel.de> wrote:
>
>> On 2016-02-29 18:30:27 -0600, Kevin Grittner wrote:
>>
>>> Basically, a connection needs to remain open and interleave
>>> commands with other connections, which the isolation tester does
>>> just fine; but it needs to do that using a custom postgresql.conf
>>> file, which TAP does just fine. I haven't been able to see the
>>> right way to get a TAP test to set up a customized installation to
>>> run isolation tests against. If I can get that working, I have
>>> additional tests I can drop into that.
>>
>> Check contrib/test_decoding's makefile. It does just that with
>> isolationtester.
>
> pg_isolation_regress --temp-config is the key item here, you can
> enforce a test to run on a server with a wanted configuration set.

Thanks for the tips. Attached is a minimal set of isolation tests.
I can expand on it if needed, but wanted:

(1) to confirm that this is the right way to do this, and

(2) how long people were willing to tolerate these tests running.

Since we're making this time-based (by popular demand), there must
be delays to see the new behavior. This very minimal pair of tests
runs in just under one minute on my i7. Decent coverage of all the
index AMs would probably require tests which run for at least 10
minutes, and probably double that. I don't recall any satisfactory
resolution to prior discussions about long-running tests.

This is a follow-on patch, just to add isolation testing; the prior
patch must be applied, too.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

snapshot-too-old-v4a.patch (text/x-diff; charset=US-ASCII)

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 6167ec1..9b93552 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -8,6 +8,7 @@ SUBDIRS = \
 		  brin \
 		  commit_ts \
 		  dummy_seclabel \
+		  snapshot_too_old \
 		  test_ddl_deparse \
 		  test_extensions \
 		  test_parser \
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
new file mode 100644
index 0000000..7b9feca
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -0,0 +1,47 @@
+# src/test/modules/snapshot_too_old/Makefile
+
+EXTRA_CLEAN = ./isolation_output
+
+ISOLATIONCHECKS=sto_using_cursor sto_using_select
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/snapshot_too_old
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+# Disabled because these tests require "old_snapshot_threshold" >= 0, which
+# typical installcheck users do not have (e.g. buildfarm clients).
+installcheck:;
+
+# But it can nonetheless be very helpful to run tests on a preexisting
+# installation; allow doing so, but only if requested explicitly.
+installcheck-force: isolationcheck-install-force
+
+check: isolationcheck
+
+submake-isolation:
+	$(MAKE) -C $(top_builddir)/src/test/isolation all
+
+submake-test_decoding:
+	$(MAKE) -C $(top_builddir)/src/test/modules/snapshot_too_old
+
+isolationcheck: | submake-isolation temp-install
+	$(MKDIR_P) isolation_output
+	$(pg_isolation_regress_check) \
+	    --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf \
+	    --outputdir=./isolation_output \
+	    $(ISOLATIONCHECKS)
+
+isolationcheck-install-force: all | submake-isolation temp-install
+	$(pg_isolation_regress_installcheck) \
+	    $(ISOLATIONCHECKS)
+
+.PHONY: check isolationcheck isolationcheck-install-force
+
+temp-install: EXTRA_INSTALL=src/test/modules/snapshot_too_old
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out b/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
new file mode 100644
index 0000000..8cc29ec
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
@@ -0,0 +1,73 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1decl s1f1 s1sleep s1f2 s2u
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+
+starting permutation: s1decl s1f1 s1sleep s2u s1f2
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1f2: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+
+starting permutation: s1decl s1f1 s2u s1sleep s1f2
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+
+starting permutation: s1decl s2u s1f1 s1sleep s1f2
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+
+starting permutation: s2u s1decl s1f1 s1sleep s1f2
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+2              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_select.out b/src/test/modules/snapshot_too_old/expected/sto_using_select.out
new file mode 100644
index 0000000..eb15bc2
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_select.out
@@ -0,0 +1,55 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1f1 s1sleep s1f2 s2u
+step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+
+starting permutation: s1f1 s1sleep s2u s1f2
+step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+ERROR:  snapshot too old
+
+starting permutation: s1f1 s2u s1sleep s1f2
+step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+ERROR:  snapshot too old
+
+starting permutation: s2u s1f1 s1sleep s1f2
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+2              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+ERROR:  snapshot too old
diff --git a/src/test/modules/snapshot_too_old/results/sto_using_cursor.out b/src/test/modules/snapshot_too_old/results/sto_using_cursor.out
new file mode 100644
index 0000000..8cc29ec
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/results/sto_using_cursor.out
@@ -0,0 +1,73 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1decl s1f1 s1sleep s1f2 s2u
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+
+starting permutation: s1decl s1f1 s1sleep s2u s1f2
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1f2: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+
+starting permutation: s1decl s1f1 s2u s1sleep s1f2
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+
+starting permutation: s1decl s2u s1f1 s1sleep s1f2
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+
+starting permutation: s2u s1decl s1f1 s1sleep s1f2
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+2              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
diff --git a/src/test/modules/snapshot_too_old/results/sto_using_select.out b/src/test/modules/snapshot_too_old/results/sto_using_select.out
new file mode 100644
index 0000000..eb15bc2
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/results/sto_using_select.out
@@ -0,0 +1,55 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1f1 s1sleep s1f2 s2u
+step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+
+starting permutation: s1f1 s1sleep s2u s1f2
+step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+ERROR:  snapshot too old
+
+starting permutation: s1f1 s2u s1sleep s1f2
+step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+ERROR:  snapshot too old
+
+starting permutation: s2u s1f1 s1sleep s1f2
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+2              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+ERROR:  snapshot too old
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec b/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
new file mode 100644
index 0000000..eac18ca
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
@@ -0,0 +1,37 @@
+# This test provokes a "snapshot too old" error using a cursor.
+#
+# The sleep is needed because with a threshold of zero a statement could error
+# on changes it made.  With more normal settings no external delay is needed,
+# but we don't want these tests to run long enough to see that, since
+# granularity is in minutes.
+#
+# Since results depend on the value of old_snapshot_threshold, sneak that into
+# the line generated by the sleep, so that a surprising value isn't so hard
+# to identify.
+
+setup
+{
+    CREATE TABLE sto1 (c int NOT NULL);
+    INSERT INTO sto1 SELECT generate_series(1, 1000);
+    CREATE TABLE sto2 (c int NOT NULL);
+}
+setup
+{
+    VACUUM ANALYZE sto1;
+}
+
+teardown
+{
+    DROP TABLE sto1, sto2;
+}
+
+session "s1"
+setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step "s1decl"	{ DECLARE cursor1 CURSOR FOR SELECT c FROM sto1; }
+step "s1f1"		{ FETCH FIRST FROM cursor1; }
+step "s1sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
+step "s1f2"		{ FETCH FIRST FROM cursor1; }
+teardown		{ COMMIT; }
+
+session "s2"
+step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1; }
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_select.spec b/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
new file mode 100644
index 0000000..d7c34f3
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
@@ -0,0 +1,36 @@
+# This test provokes a "snapshot too old" error using SELECT statements.
+#
+# The sleep is needed because with a threshold of zero a statement could error
+# on changes it made.  With more normal settings no external delay is needed,
+# but we don't want these tests to run long enough to see that, since
+# granularity is in minutes.
+#
+# Since results depend on the value of old_snapshot_threshold, sneak that into
+# the line generated by the sleep, so that a surprising value isn't so hard
+# to identify.
+
+setup
+{
+    CREATE TABLE sto1 (c int NOT NULL);
+    INSERT INTO sto1 SELECT generate_series(1, 1000);
+    CREATE TABLE sto2 (c int NOT NULL);
+}
+setup
+{
+    VACUUM ANALYZE sto1;
+}
+
+teardown
+{
+    DROP TABLE sto1, sto2;
+}
+
+session "s1"
+setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step "s1f1"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
+step "s1sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
+step "s1f2"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
+teardown		{ COMMIT; }
+
+session "s2"
+step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1; }
diff --git a/src/test/modules/snapshot_too_old/sto.conf b/src/test/modules/snapshot_too_old/sto.conf
new file mode 100644
index 0000000..ce8048f
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/sto.conf
@@ -0,0 +1,3 @@
+autovacuum = off
+old_snapshot_threshold = 0
+
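
To help read the expected output above against the TestForOldSnapshot() macro from the base patch, the macro's conditions can be restated as a plain predicate. This is only a sketch under stated assumptions: `SketchSnapshot`, `snapshot_too_old` and the boolean `exempt_relation` (standing in for the catalog and logical-decoding relation checks) are hypothetical names, and `threshold_timestamp` stands in for the value returned by GetOldSnapshotThresholdTimestamp(). It does not model which page changes (for example pruning by the reading session itself) advance the page LSN in each permutation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the fields the macro looks at. */
typedef struct
{
	bool		is_mvcc;		/* satisfies == HeapTupleSatisfiesMVCC */
	uint64_t	lsn;			/* WAL position when taken; 0 means invalid */
	int64_t		when_taken;		/* timestamp when the snapshot was taken */
} SketchSnapshot;

/*
 * Return true when a read of a page with the given LSN should fail with
 * "snapshot too old".  Cheap tests come first, as in the macro.
 */
static bool
snapshot_too_old(int threshold_minutes,
				 const SketchSnapshot *snap,
				 uint64_t page_lsn,
				 bool exempt_relation,
				 int64_t threshold_timestamp)
{
	/* -1 disables the feature; smaller values are not valid settings. */
	assert(threshold_minutes >= -1);

	return threshold_minutes >= 0
		&& snap != NULL				/* NULL snapshot: never an error */
		&& snap->is_mvcc
		&& snap->lsn != 0
		&& page_lsn > snap->lsn		/* page changed after the snapshot */
		&& !exempt_relation
		&& snap->when_taken < threshold_timestamp;
}
```

Both conditions must hold for an error: the snapshot is older than the threshold, and the page being read carries an LSN newer than the snapshot's. An old snapshot reading a page that has not changed since it was taken passes the check.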

#18

Robert Haas

robertmhaas@gmail.com

almost 10 years ago

In reply to: Kevin Grittner (#17)

Re: snapshot too old, configured by time

On Thu, Mar 3, 2016 at 2:40 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

> Thanks for the tips. Attached is a minimal set of isolation tests.
> I can expand on it if needed, but wanted:
>
> (1) to confirm that this is the right way to do this, and
>
> (2) how long people were willing to tolerate these tests running.
>
> Since we're making this time-based (by popular demand), there must
> be delays to see the new behavior. This very minimal pair of tests
> runs in just under one minute on my i7. Decent coverage of all the
> index AMs would probably require tests which run for at least 10
> minutes, and probably double that. I don't recall any satisfactory
> resolution to prior discussions about long-running tests.
>
> This is a follow-on patch, just to add isolation testing; the prior
> patch must be applied, too.

Michael, any chance that you could take a look at what Kevin did here
and see if it looks good?

I'm sure the base patch could use more review too, if anyone can find the time.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#19

Michael Paquier

michael.paquier@gmail.com

almost 10 years ago

In reply to: Robert Haas (#18)

Re: snapshot too old, configured by time

On Fri, Mar 11, 2016 at 2:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> On Thu, Mar 3, 2016 at 2:40 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
>
>> Thanks for the tips. Attached is a minimal set of isolation tests.
>> I can expand on it if needed, but wanted:
>>
>> (1) to confirm that this is the right way to do this, and
>>
>> (2) how long people were willing to tolerate these tests running.
>>
>> Since we're making this time-based (by popular demand), there must
>> be delays to see the new behavior. This very minimal pair of tests
>> runs in just under one minute on my i7. Decent coverage of all the
>> index AMs would probably require tests which run for at least 10
>> minutes, and probably double that. I don't recall any satisfactory
>> resolution to prior discussions about long-running tests.
>>
>> This is a follow-on patch, just to add isolation testing; the prior
>> patch must be applied, too.
>
> Michael, any chance that you could take a look at what Kevin did here
> and see if it looks good?

OK, I am marking this email. Just don't expect any updates from my
side until mid/end of next week.

> I'm sure the base patch could use more review too, if anyone can find the time.

I guess I am going to need to look at the patch if feedback on the
tests is needed. There is no point in looking at the tests without
poking at the patch.
--
Michael


#20

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Michael Paquier (#19)

1 attachment(s)

Re: snapshot too old, configured by time

New patch just to merge in recent commits -- it was starting to
show some bit-rot. Tests folded in with main patch.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

snapshot-too-old-v5.patch (invalid/octet-stream)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7695ec1..49b9892 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2002,6 +2002,42 @@ include_dir 'conf.d'
         </para>
        </listitem>
       </varlistentry>
+
+      <varlistentry id="guc-old-snapshot-threshold" xreflabel="old_snapshot_threshold">
+       <term><varname>old_snapshot_threshold</varname> (<type>integer</type>)
+       <indexterm>
+        <primary><varname>old_snapshot_threshold</> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the minimum time that a snapshot can be used without risk of a
+         <literal>snapshot too old</> error occurring when using the snapshot.
+         This parameter can only be set at server start.
+        </para>
+
+        <para>
+         Beyond the threshold, old data may be vacuumed away.  This can help
+         prevent bloat in the face of snapshots which remain in use for a
+         long time.  To prevent incorrect results due to cleanup of data which
+         would otherwise be visible to the snapshot, an error is generated
+         when the snapshot is older than this threshold and the snapshot is
+         used to read a page which has been modified since the snapshot was
+         built.
+        </para>
+
+        <para>
+         A value of <literal>-1</> disables this feature, and is the default.
+         Useful values for production work probably range from a small number
+         of hours to a few days.  The setting will be coerced to a granularity
+         of minutes, and small numbers (such as <literal>0</> or
+         <literal>1min</>) are only allowed because they may sometimes be
+         useful for testing.  While a setting as high as <literal>60d</> is
+         allowed, please note that in many workloads extreme bloat or
+         transaction ID wraparound may occur in much shorter time frames.
+        </para>
+       </listitem>
+      </varlistentry>
      </variablelist>
     </sect2>
    </sect1>
@@ -2979,6 +3015,10 @@ include_dir 'conf.d'
         You should also consider setting <varname>hot_standby_feedback</>
         on standby server(s) as an alternative to using this parameter.
        </para>
+       <para>
+        This does not prevent cleanup of dead rows which have reached the age
+        specified by <varname>old_snapshot_threshold</>.
+       </para>
       </listitem>
      </varlistentry>
 
@@ -3126,6 +3166,16 @@ include_dir 'conf.d'
         until it eventually reaches the primary.  Standbys make no other use
         of feedback they receive other than to pass upstream.
        </para>
+       <para>
+        This setting does not override the behavior of
+        <varname>old_snapshot_threshold</> on the primary; a snapshot on the
+        standby which exceeds the primary's age threshold can become invalid,
+        resulting in cancellation of transactions on the standby.  This is
+        because <varname>old_snapshot_threshold</> is intended to provide an
+        absolute limit on the time for which dead rows can contribute to
+        bloat, which would otherwise be violated because of the configuration
+        of a standby.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index c740952..89bad05 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -135,7 +135,7 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 	MemoryContext tupcxt = NULL;
 	MemoryContext oldcxt = NULL;
 
-	revmap = brinRevmapInitialize(idxRel, &pagesPerRange);
+	revmap = brinRevmapInitialize(idxRel, &pagesPerRange, NULL);
 
 	for (;;)
 	{
@@ -152,7 +152,7 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 		/* normalize the block number to be the first block in the range */
 		heapBlk = (heapBlk / pagesPerRange) * pagesPerRange;
 		brtup = brinGetTupleForHeapBlock(revmap, heapBlk, &buf, &off, NULL,
-										 BUFFER_LOCK_SHARE);
+										 BUFFER_LOCK_SHARE, NULL);
 
 		/* if range is unsummarized, there's nothing to do */
 		if (!brtup)
@@ -284,7 +284,8 @@ brinbeginscan(Relation r, int nkeys, int norderbys)
 	scan = RelationGetIndexScan(r, nkeys, norderbys);
 
 	opaque = (BrinOpaque *) palloc(sizeof(BrinOpaque));
-	opaque->bo_rmAccess = brinRevmapInitialize(r, &opaque->bo_pagesPerRange);
+	opaque->bo_rmAccess = brinRevmapInitialize(r, &opaque->bo_pagesPerRange,
+											   scan->xs_snapshot);
 	opaque->bo_bdesc = brin_build_desc(r);
 	scan->opaque = opaque;
 
@@ -367,7 +368,8 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 		MemoryContextResetAndDeleteChildren(perRangeCxt);
 
 		tup = brinGetTupleForHeapBlock(opaque->bo_rmAccess, heapBlk, &buf,
-									   &off, &size, BUFFER_LOCK_SHARE);
+									   &off, &size, BUFFER_LOCK_SHARE,
+									   scan->xs_snapshot);
 		if (tup)
 		{
 			tup = brin_copy_tuple(tup, size);
@@ -645,7 +647,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	/*
 	 * Initialize our state, including the deformed tuple state.
 	 */
-	revmap = brinRevmapInitialize(index, &pagesPerRange);
+	revmap = brinRevmapInitialize(index, &pagesPerRange, NULL);
 	state = initialize_brin_buildstate(index, revmap, pagesPerRange);
 
 	/*
@@ -1040,7 +1042,8 @@ summarize_range(IndexInfo *indexInfo, BrinBuildState *state, Relation heapRel,
 		 * the same.)
 		 */
 		phtup = brinGetTupleForHeapBlock(state->bs_rmAccess, heapBlk, &phbuf,
-										 &offset, &phsz, BUFFER_LOCK_SHARE);
+										 &offset, &phsz, BUFFER_LOCK_SHARE,
+										 NULL);
 		/* the placeholder tuple must exist */
 		if (phtup == NULL)
 			elog(ERROR, "missing placeholder tuple");
@@ -1075,7 +1078,7 @@ brinsummarize(Relation index, Relation heapRel, double *numSummarized,
 	BlockNumber pagesPerRange;
 	Buffer		buf;
 
-	revmap = brinRevmapInitialize(index, &pagesPerRange);
+	revmap = brinRevmapInitialize(index, &pagesPerRange, NULL);
 
 	/*
 	 * Scan the revmap to find unsummarized items.
@@ -1090,7 +1093,7 @@ brinsummarize(Relation index, Relation heapRel, double *numSummarized,
 		CHECK_FOR_INTERRUPTS();
 
 		tup = brinGetTupleForHeapBlock(revmap, heapBlk, &buf, &off, NULL,
-									   BUFFER_LOCK_SHARE);
+									   BUFFER_LOCK_SHARE, NULL);
 		if (tup == NULL)
 		{
 			/* no revmap entry for this heap range. Summarize it. */
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index b2c273d..812f76c 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -68,15 +68,19 @@ static void revmap_physical_extend(BrinRevmap *revmap);
  * brinRevmapTerminate when caller is done with it.
  */
 BrinRevmap *
-brinRevmapInitialize(Relation idxrel, BlockNumber *pagesPerRange)
+brinRevmapInitialize(Relation idxrel, BlockNumber *pagesPerRange,
+					 Snapshot snapshot)
 {
 	BrinRevmap *revmap;
 	Buffer		meta;
 	BrinMetaPageData *metadata;
+	Page		page;
 
 	meta = ReadBuffer(idxrel, BRIN_METAPAGE_BLKNO);
 	LockBuffer(meta, BUFFER_LOCK_SHARE);
-	metadata = (BrinMetaPageData *) PageGetContents(BufferGetPage(meta));
+	page = BufferGetPage(meta);
+	TestForOldSnapshot(snapshot, idxrel, page);
+	metadata = (BrinMetaPageData *) PageGetContents(page);
 
 	revmap = palloc(sizeof(BrinRevmap));
 	revmap->rm_irel = idxrel;
@@ -185,7 +189,8 @@ brinSetHeapBlockItemptr(Buffer buf, BlockNumber pagesPerRange,
  */
 BrinTuple *
 brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
-						 Buffer *buf, OffsetNumber *off, Size *size, int mode)
+						 Buffer *buf, OffsetNumber *off, Size *size, int mode,
+						 Snapshot snapshot)
 {
 	Relation	idxRel = revmap->rm_irel;
 	BlockNumber mapBlk;
@@ -262,6 +267,7 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
 		}
 		LockBuffer(*buf, mode);
 		page = BufferGetPage(*buf);
+		TestForOldSnapshot(snapshot, idxRel, page);
 
 		/* If we land on a revmap page, start over */
 		if (BRIN_IS_REGULAR_PAGE(page))
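
The hunks above and below all follow one pattern: fetch a page, then call TestForOldSnapshot(snapshot, rel, page), passing NULL for the snapshot on insert and build paths. The body of that function is not shown in this part of the patch, so the following is only a standalone model of the rule stated in the documentation hunk (an error when the snapshot is older than the threshold and the page was modified after the snapshot was built). All names here (ToySnapshot, ToyPage, toy_snapshot_too_old) are hypothetical, and tracking "modified since" with a single increasing position number is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are PostgreSQL definitions. */
typedef struct ToySnapshot
{
	uint64_t	built_at_pos;	/* change position when the snapshot was built */
	int64_t		built_at_min;	/* build time, in minutes */
} ToySnapshot;

typedef struct ToyPage
{
	uint64_t	modified_at_pos;	/* change position of the last modification */
} ToyPage;

/*
 * Returns 1 where the modeled check would report "snapshot too old", else 0.
 * threshold_min < 0 models old_snapshot_threshold = -1 (feature disabled).
 */
static int
toy_snapshot_too_old(const ToySnapshot *snapshot, const ToyPage *page,
					 int64_t now_min, int64_t threshold_min)
{
	assert(page != NULL);

	/* Feature disabled: never report an error. */
	if (threshold_min < 0)
		return 0;

	/* NULL snapshot: caller is positioning for an insert or build. */
	if (snapshot == NULL)
		return 0;

	/* Page untouched since the snapshot was built: still safe to read. */
	if (page->modified_at_pos <= snapshot->built_at_pos)
		return 0;

	/* Page changed; it is an error only once the snapshot is old enough. */
	return (now_min - snapshot->built_at_min) > threshold_min;
}
```

In this model a reader holding a two-hour-old snapshot with a 60-minute threshold gets the error only on pages changed after the snapshot was built; unchanged pages remain readable, which matches the intent of keeping long-lived cursors over static data working.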
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 06ba9cb..dc593c2 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -71,7 +71,7 @@ ginTraverseLock(Buffer buffer, bool searchMode)
  * is share-locked, and stack->parent is NULL.
  */
 GinBtreeStack *
-ginFindLeafPage(GinBtree btree, bool searchMode)
+ginFindLeafPage(GinBtree btree, bool searchMode, Snapshot snapshot)
 {
 	GinBtreeStack *stack;
 
@@ -90,6 +90,7 @@ ginFindLeafPage(GinBtree btree, bool searchMode)
 		stack->off = InvalidOffsetNumber;
 
 		page = BufferGetPage(stack->buffer);
+		TestForOldSnapshot(snapshot, btree->index, page);
 
 		access = ginTraverseLock(stack->buffer, searchMode);
 
@@ -116,6 +117,7 @@ ginFindLeafPage(GinBtree btree, bool searchMode)
 			stack->buffer = ginStepRight(stack->buffer, btree->index, access);
 			stack->blkno = rightlink;
 			page = BufferGetPage(stack->buffer);
+			TestForOldSnapshot(snapshot, btree->index, page);
 
 			if (!searchMode && GinPageIsIncompleteSplit(page))
 				ginFinishSplit(btree, stack, false, NULL);
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index a55bb4a..ab14b35 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -1820,7 +1820,7 @@ ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
 	{
 		/* search for the leaf page where the first item should go to */
 		btree.itemptr = insertdata.items[insertdata.curitem];
-		stack = ginFindLeafPage(&btree, false);
+		stack = ginFindLeafPage(&btree, false, NULL);
 
 		ginInsertValue(&btree, stack, &insertdata, buildStats);
 	}
@@ -1830,7 +1830,8 @@ ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
  * Starts a new scan on a posting tree.
  */
 GinBtreeStack *
-ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno)
+ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno,
+						Snapshot snapshot)
 {
 	GinBtreeStack *stack;
 
@@ -1838,7 +1839,7 @@ ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno)
 
 	btree->fullScan = TRUE;
 
-	stack = ginFindLeafPage(btree, TRUE);
+	stack = ginFindLeafPage(btree, TRUE, snapshot);
 
 	return stack;
 }
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 53290a4..f07e05a 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -19,6 +19,7 @@
 #include "miscadmin.h"
 #include "utils/datum.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 
 /* GUC parameter */
 int			GinFuzzySearchLimit = 0;
@@ -63,7 +64,7 @@ moveRightIfItNeeded(GinBtreeData *btree, GinBtreeStack *stack)
  */
 static void
 scanPostingTree(Relation index, GinScanEntry scanEntry,
-				BlockNumber rootPostingTree)
+				BlockNumber rootPostingTree, Snapshot snapshot)
 {
 	GinBtreeData btree;
 	GinBtreeStack *stack;
@@ -71,7 +72,7 @@ scanPostingTree(Relation index, GinScanEntry scanEntry,
 	Page		page;
 
 	/* Descend to the leftmost leaf page */
-	stack = ginScanBeginPostingTree(&btree, index, rootPostingTree);
+	stack = ginScanBeginPostingTree(&btree, index, rootPostingTree, snapshot);
 	buffer = stack->buffer;
 	IncrBufferRefCount(buffer); /* prevent unpin in freeGinBtreeStack */
 
@@ -114,7 +115,7 @@ scanPostingTree(Relation index, GinScanEntry scanEntry,
  */
 static bool
 collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
-				   GinScanEntry scanEntry)
+				   GinScanEntry scanEntry, Snapshot snapshot)
 {
 	OffsetNumber attnum;
 	Form_pg_attribute attr;
@@ -145,6 +146,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
 			return true;
 
 		page = BufferGetPage(stack->buffer);
+		TestForOldSnapshot(snapshot, btree->index, page);
 		itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, stack->off));
 
 		/*
@@ -224,7 +226,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
 			LockBuffer(stack->buffer, GIN_UNLOCK);
 
 			/* Collect all the TIDs in this entry's posting tree */
-			scanPostingTree(btree->index, scanEntry, rootPostingTree);
+			scanPostingTree(btree->index, scanEntry, rootPostingTree, snapshot);
 
 			/*
 			 * We lock again the entry page and while it was unlocked insert
@@ -291,7 +293,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
  * Start* functions setup beginning state of searches: finds correct buffer and pins it.
  */
 static void
-startScanEntry(GinState *ginstate, GinScanEntry entry)
+startScanEntry(GinState *ginstate, GinScanEntry entry, Snapshot snapshot)
 {
 	GinBtreeData btreeEntry;
 	GinBtreeStack *stackEntry;
@@ -318,7 +320,7 @@ restartScanEntry:
 	ginPrepareEntryScan(&btreeEntry, entry->attnum,
 						entry->queryKey, entry->queryCategory,
 						ginstate);
-	stackEntry = ginFindLeafPage(&btreeEntry, true);
+	stackEntry = ginFindLeafPage(&btreeEntry, true, snapshot);
 	page = BufferGetPage(stackEntry->buffer);
 	needUnlock = TRUE;
 
@@ -335,7 +337,7 @@ restartScanEntry:
 		 * for the entry type.
 		 */
 		btreeEntry.findItem(&btreeEntry, stackEntry);
-		if (collectMatchBitmap(&btreeEntry, stackEntry, entry) == false)
+		if (!collectMatchBitmap(&btreeEntry, stackEntry, entry, snapshot))
 		{
 			/*
 			 * GIN tree was seriously restructured, so we will cleanup all
@@ -383,7 +385,7 @@ restartScanEntry:
 			needUnlock = FALSE;
 
 			stack = ginScanBeginPostingTree(&entry->btree, ginstate->index,
-											rootPostingTree);
+											rootPostingTree, snapshot);
 			entry->buffer = stack->buffer;
 
 			/*
@@ -535,7 +537,7 @@ startScan(IndexScanDesc scan)
 	uint32		i;
 
 	for (i = 0; i < so->totalentries; i++)
-		startScanEntry(ginstate, so->entries[i]);
+		startScanEntry(ginstate, so->entries[i], scan->xs_snapshot);
 
 	if (GinFuzzySearchLimit > 0)
 	{
@@ -580,7 +582,8 @@ startScan(IndexScanDesc scan)
  * keep it pinned to prevent interference with vacuum.
  */
 static void
-entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advancePast)
+entryLoadMoreItems(GinState *ginstate, GinScanEntry entry,
+				   ItemPointerData advancePast, Snapshot snapshot)
 {
 	Page		page;
 	int			i;
@@ -624,7 +627,7 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advan
 			entry->btree.itemptr.ip_posid++;
 		}
 		entry->btree.fullScan = false;
-		stack = ginFindLeafPage(&entry->btree, true);
+		stack = ginFindLeafPage(&entry->btree, true, snapshot);
 
 		/* we don't need the stack, just the buffer. */
 		entry->buffer = stack->buffer;
@@ -734,7 +737,7 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advan
  */
 static void
 entryGetItem(GinState *ginstate, GinScanEntry entry,
-			 ItemPointerData advancePast)
+			 ItemPointerData advancePast, Snapshot snapshot)
 {
 	Assert(!entry->isFinished);
 
@@ -857,7 +860,7 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 			/* If we've processed the current batch, load more items */
 			while (entry->offset >= entry->nlist)
 			{
-				entryLoadMoreItems(ginstate, entry, advancePast);
+				entryLoadMoreItems(ginstate, entry, advancePast, snapshot);
 
 				if (entry->isFinished)
 				{
@@ -896,7 +899,7 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
  */
 static void
 keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
-		   ItemPointerData advancePast)
+		   ItemPointerData advancePast, Snapshot snapshot)
 {
 	ItemPointerData minItem;
 	ItemPointerData curPageLossy;
@@ -943,7 +946,7 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
 		 */
 		if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
 		{
-			entryGetItem(ginstate, entry, advancePast);
+			entryGetItem(ginstate, entry, advancePast, snapshot);
 			if (entry->isFinished)
 				continue;
 		}
@@ -1001,7 +1004,7 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
 
 		if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
 		{
-			entryGetItem(ginstate, entry, advancePast);
+			entryGetItem(ginstate, entry, advancePast, snapshot);
 			if (entry->isFinished)
 				continue;
 		}
@@ -1210,7 +1213,8 @@ scanGetItem(IndexScanDesc scan, ItemPointerData advancePast,
 			GinScanKey	key = so->keys + i;
 
 			/* Fetch the next item for this key that is > advancePast. */
-			keyGetItem(&so->ginstate, so->tempCtx, key, advancePast);
+			keyGetItem(&so->ginstate, so->tempCtx, key, advancePast,
+					   scan->xs_snapshot);
 
 			if (key->isFinished)
 				return false;
@@ -1332,6 +1336,7 @@ scanGetCandidate(IndexScanDesc scan, pendingPosition *pos)
 	for (;;)
 	{
 		page = BufferGetPage(pos->pendingBuffer);
+		TestForOldSnapshot(scan->xs_snapshot, scan->indexRelation, page);
 
 		maxoff = PageGetMaxOffsetNumber(page);
 		if (pos->firstOffset > maxoff)
@@ -1512,6 +1517,7 @@ collectMatchesForHeapRow(IndexScanDesc scan, pendingPosition *pos)
 			   sizeof(bool) * (pos->lastOffset - pos->firstOffset));
 
 		page = BufferGetPage(pos->pendingBuffer);
+		TestForOldSnapshot(scan->xs_snapshot, scan->indexRelation, page);
 
 		for (i = 0; i < so->nkeys; i++)
 		{
@@ -1698,12 +1704,15 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
 	int			i;
 	pendingPosition pos;
 	Buffer		metabuffer = ReadBuffer(scan->indexRelation, GIN_METAPAGE_BLKNO);
+	Page		page;
 	BlockNumber blkno;
 
 	*ntids = 0;
 
 	LockBuffer(metabuffer, GIN_SHARE);
-	blkno = GinPageGetMeta(BufferGetPage(metabuffer))->head;
+	page = BufferGetPage(metabuffer);
+	TestForOldSnapshot(scan->xs_snapshot, scan->indexRelation, page);
+	blkno = GinPageGetMeta(page)->head;
 
 	/*
 	 * fetch head of list before unlocking metapage. head page must be pinned
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index cd21e0e..7a9c67a 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -192,7 +192,7 @@ ginEntryInsert(GinState *ginstate,
 
 	ginPrepareEntryScan(&btree, attnum, key, category, ginstate);
 
-	stack = ginFindLeafPage(&btree, false);
+	stack = ginFindLeafPage(&btree, false, NULL);
 	page = BufferGetPage(stack->buffer);
 
 	if (btree.findItem(&btree, stack))
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 8138383..affd635 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -337,6 +337,7 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 	LockBuffer(buffer, GIST_SHARE);
 	gistcheckpage(scan->indexRelation, buffer);
 	page = BufferGetPage(buffer);
+	TestForOldSnapshot(scan->xs_snapshot, r, page);
 	opaque = GistPageGetOpaque(page);
 
 	/*
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 3d48c4f..8c89ee7 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -279,6 +279,7 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir)
 		buf = so->hashso_curbuf;
 		Assert(BufferIsValid(buf));
 		page = BufferGetPage(buf);
+		TestForOldSnapshot(scan->xs_snapshot, rel, page);
 		maxoffnum = PageGetMaxOffsetNumber(page);
 		for (offnum = ItemPointerGetOffsetNumber(current);
 			 offnum <= maxoffnum;
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 6025a3f..eb8c9cd 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -188,7 +188,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	/* Read the metapage */
 	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	metap = HashPageGetMeta(BufferGetPage(metabuf));
+	page = BufferGetPage(metabuf);
+	TestForOldSnapshot(scan->xs_snapshot, rel, page);
+	metap = HashPageGetMeta(page);
 
 	/*
 	 * Loop until we get a lock on the correct target bucket.
@@ -241,6 +243,7 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	/* Fetch the primary bucket page for the bucket */
 	buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
 	page = BufferGetPage(buf);
+	TestForOldSnapshot(scan->xs_snapshot, rel, page);
 	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
 	Assert(opaque->hasho_bucket == bucket);
 
@@ -347,6 +350,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 					_hash_readnext(rel, &buf, &page, &opaque);
 					if (BufferIsValid(buf))
 					{
+						TestForOldSnapshot(scan->xs_snapshot, rel, page);
 						maxoff = PageGetMaxOffsetNumber(page);
 						offnum = _hash_binsearch(page, so->hashso_sk_hash);
 					}
@@ -388,6 +392,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 					_hash_readprev(rel, &buf, &page, &opaque);
 					if (BufferIsValid(buf))
 					{
+						TestForOldSnapshot(scan->xs_snapshot, rel, page);
 						maxoff = PageGetMaxOffsetNumber(page);
 						offnum = _hash_binsearch_last(page, so->hashso_sk_hash);
 					}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34ba385..7007acf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -395,6 +395,7 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
 	LockBuffer(buffer, BUFFER_LOCK_SHARE);
 
 	dp = (Page) BufferGetPage(buffer);
+	TestForOldSnapshot(snapshot, scan->rs_rd, dp);
 	lines = PageGetMaxOffsetNumber(dp);
 	ntup = 0;
 
@@ -538,6 +539,7 @@ heapgettup(HeapScanDesc scan,
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(snapshot, scan->rs_rd, dp);
 		lines = PageGetMaxOffsetNumber(dp);
 		/* page and lineoff now reference the physically next tid */
 
@@ -583,6 +585,7 @@ heapgettup(HeapScanDesc scan,
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(snapshot, scan->rs_rd, dp);
 		lines = PageGetMaxOffsetNumber(dp);
 
 		if (!scan->rs_inited)
@@ -617,6 +620,7 @@ heapgettup(HeapScanDesc scan,
 
 		/* Since the tuple was previously fetched, needn't lock page here */
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(snapshot, scan->rs_rd, dp);
 		lineoff = ItemPointerGetOffsetNumber(&(tuple->t_self));
 		lpp = PageGetItemId(dp, lineoff);
 		Assert(ItemIdIsNormal(lpp));
@@ -746,6 +750,7 @@ heapgettup(HeapScanDesc scan,
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(snapshot, scan->rs_rd, dp);
 		lines = PageGetMaxOffsetNumber((Page) dp);
 		linesleft = lines;
 		if (backward)
@@ -833,6 +838,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 		}
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp);
 		lines = scan->rs_ntuples;
 		/* page and lineindex now reference the next visible tid */
 
@@ -876,6 +882,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 		}
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp);
 		lines = scan->rs_ntuples;
 
 		if (!scan->rs_inited)
@@ -909,6 +916,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 
 		/* Since the tuple was previously fetched, needn't lock page here */
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp);
 		lineoff = ItemPointerGetOffsetNumber(&(tuple->t_self));
 		lpp = PageGetItemId(dp, lineoff);
 		Assert(ItemIdIsNormal(lpp));
@@ -1028,6 +1036,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 		heapgetpage(scan, page);
 
 		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		TestForOldSnapshot(scan->rs_snapshot, scan->rs_rd, dp);
 		lines = scan->rs_ntuples;
 		linesleft = lines;
 		if (backward)
@@ -1872,6 +1881,7 @@ heap_fetch(Relation relation,
 	 */
 	LockBuffer(buffer, BUFFER_LOCK_SHARE);
 	page = BufferGetPage(buffer);
+	TestForOldSnapshot(snapshot, relation, page);
 
 	/*
 	 * We'd better check for out-of-range offnum in case of VACUUM since the
@@ -2201,6 +2211,7 @@ heap_get_latest_tid(Relation relation,
 		buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&ctid));
 		LockBuffer(buffer, BUFFER_LOCK_SHARE);
 		page = BufferGetPage(buffer);
+		TestForOldSnapshot(snapshot, relation, page);
 
 		/*
 		 * Check for bogus item number.  This is not treated as an error
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 59beadd..eb7ae8f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -92,12 +92,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 * need to use the horizon that includes slots, otherwise the data-only
 	 * horizon can be used. Note that the toast relation of user defined
 	 * relations are *not* considered catalog relations.
+	 *
+	 * It is OK to apply the old snapshot limit before acquiring the cleanup
+	 * lock because the worst that can happen is that we are not quite as
+	 * aggressive about the cleanup (by however many transaction IDs are
+	 * consumed between this point and acquiring the lock).  This allows us to
+	 * save significant overhead in the case where the page is found not to be
+	 * prunable.
 	 */
 	if (IsCatalogRelation(relation) ||
 		RelationIsAccessibleInLogicalDecoding(relation))
 		OldestXmin = RecentGlobalXmin;
 	else
-		OldestXmin = RecentGlobalDataXmin;
+		OldestXmin =
+				TransactionIdLimitedForOldSnapshots(RecentGlobalDataXmin,
+													relation);
 
 	Assert(TransactionIdIsValid(OldestXmin));
 
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index e3c55eb..66966e0 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -119,7 +119,7 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 
 top:
 	/* find the first page containing this key */
-	stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE);
+	stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE, NULL);
 
 	offset = InvalidOffsetNumber;
 
@@ -135,7 +135,7 @@ top:
 	 * precise description.
 	 */
 	buf = _bt_moveright(rel, buf, natts, itup_scankey, false,
-						true, stack, BT_WRITE);
+						true, stack, BT_WRITE, NULL);
 
 	/*
 	 * If we're not allowing duplicates, make sure the key isn't already in
@@ -1671,7 +1671,8 @@ _bt_insert_parent(Relation rel,
 			elog(DEBUG2, "concurrent ROOT page split");
 			lpageop = (BTPageOpaque) PageGetSpecialPointer(page);
 			/* Find the leftmost page at the next level up */
-			pbuf = _bt_get_endpoint(rel, lpageop->btpo.level + 1, false);
+			pbuf = _bt_get_endpoint(rel, lpageop->btpo.level + 1, false,
+									NULL);
 			/* Set up a phony stack entry pointing there */
 			stack = &fakestack;
 			stack->bts_blkno = BufferGetBlockNumber(pbuf);
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 67755d7..390bd1a 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -1255,7 +1255,7 @@ _bt_pagedel(Relation rel, Buffer buf)
 				itup_scankey = _bt_mkscankey(rel, targetkey);
 				/* find the leftmost leaf page containing this key */
 				stack = _bt_search(rel, rel->rd_rel->relnatts, itup_scankey,
-								   false, &lbuf, BT_READ);
+								   false, &lbuf, BT_READ, NULL);
 				/* don't need a pin on the page */
 				_bt_relbuf(rel, lbuf);
 
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 14dffe0..32efb6d 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -30,7 +30,7 @@ static bool _bt_readpage(IndexScanDesc scan, ScanDirection dir,
 static void _bt_saveitem(BTScanOpaque so, int itemIndex,
 			 OffsetNumber offnum, IndexTuple itup);
 static bool _bt_steppage(IndexScanDesc scan, ScanDirection dir);
-static Buffer _bt_walk_left(Relation rel, Buffer buf);
+static Buffer _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot);
 static bool _bt_endpoint(IndexScanDesc scan, ScanDirection dir);
 static void _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp);
 
@@ -79,6 +79,10 @@ _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp)
  * address of the leaf-page buffer, which is read-locked and pinned.
  * No locks are held on the parent pages, however!
  *
+ * If the snapshot parameter is not NULL, "old snapshot" checking will take
+ * place during the descent through the tree.  This is not needed when
+ * positioning for an insert or delete, so NULL is used for those cases.
+ *
  * NOTE that the returned buffer is read-locked regardless of the access
  * parameter.  However, access = BT_WRITE will allow an empty root page
  * to be created and returned.  When access = BT_READ, an empty index
@@ -87,7 +91,7 @@ _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp)
  */
 BTStack
 _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
-		   Buffer *bufP, int access)
+		   Buffer *bufP, int access, Snapshot snapshot)
 {
 	BTStack		stack_in = NULL;
 
@@ -96,7 +100,9 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
 
 	/* If index is empty and access = BT_READ, no root page is created. */
 	if (!BufferIsValid(*bufP))
+	{
 		return (BTStack) NULL;
+	}
 
 	/* Loop iterates once per level descended in the tree */
 	for (;;)
@@ -124,7 +130,7 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
 		 */
 		*bufP = _bt_moveright(rel, *bufP, keysz, scankey, nextkey,
 							  (access == BT_WRITE), stack_in,
-							  BT_READ);
+							  BT_READ, snapshot);
 
 		/* if this is a leaf page, we're done */
 		page = BufferGetPage(*bufP);
@@ -197,6 +203,10 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
  * On entry, we have the buffer pinned and a lock of the type specified by
  * 'access'.  If we move right, we release the buffer and lock and acquire
  * the same on the right sibling.  Return value is the buffer we stop at.
+ *
+ * If the snapshot parameter is not NULL, "old snapshot" checking will take
+ * place during the descent through the tree.  This is not needed when
+ * positioning for an insert or delete, so NULL is used for those cases.
  */
 Buffer
 _bt_moveright(Relation rel,
@@ -206,7 +216,8 @@ _bt_moveright(Relation rel,
 			  bool nextkey,
 			  bool forupdate,
 			  BTStack stack,
-			  int access)
+			  int access,
+			  Snapshot snapshot)
 {
 	Page		page;
 	BTPageOpaque opaque;
@@ -232,6 +243,7 @@ _bt_moveright(Relation rel,
 	for (;;)
 	{
 		page = BufferGetPage(buf);
+		TestForOldSnapshot(snapshot, rel, page);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 		if (P_RIGHTMOST(opaque))
@@ -970,7 +982,8 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
 	 * Use the manufactured insertion scan key to descend the tree and
 	 * position ourselves on the target leaf page.
 	 */
-	stack = _bt_search(rel, keysCount, scankeys, nextkey, &buf, BT_READ);
+	stack = _bt_search(rel, keysCount, scankeys, nextkey, &buf, BT_READ,
+					   scan->xs_snapshot);
 
 	/* don't need to keep the stack around... */
 	_bt_freestack(stack);
@@ -1336,6 +1349,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 			so->currPos.buf = _bt_getbuf(rel, blkno, BT_READ);
 			/* check for deleted page */
 			page = BufferGetPage(so->currPos.buf);
+			TestForOldSnapshot(scan->xs_snapshot, rel, page);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 			if (!P_IGNORE(opaque))
 			{
@@ -1394,7 +1408,8 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 			}
 
 			/* Step to next physical page */
-			so->currPos.buf = _bt_walk_left(rel, so->currPos.buf);
+			so->currPos.buf = _bt_walk_left(rel, so->currPos.buf,
+											scan->xs_snapshot);
 
 			/* if we're physically at end of index, return failure */
 			if (so->currPos.buf == InvalidBuffer)
@@ -1409,6 +1424,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 			 * and do it all again.
 			 */
 			page = BufferGetPage(so->currPos.buf);
+			TestForOldSnapshot(scan->xs_snapshot, rel, page);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 			if (!P_IGNORE(opaque))
 			{
@@ -1442,7 +1458,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
  * again if it's important.
  */
 static Buffer
-_bt_walk_left(Relation rel, Buffer buf)
+_bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot)
 {
 	Page		page;
 	BTPageOpaque opaque;
@@ -1472,6 +1488,7 @@ _bt_walk_left(Relation rel, Buffer buf)
 		CHECK_FOR_INTERRUPTS();
 		buf = _bt_getbuf(rel, blkno, BT_READ);
 		page = BufferGetPage(buf);
+		TestForOldSnapshot(snapshot, rel, page);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 		/*
@@ -1498,12 +1515,14 @@ _bt_walk_left(Relation rel, Buffer buf)
 			blkno = opaque->btpo_next;
 			buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
 			page = BufferGetPage(buf);
+			TestForOldSnapshot(snapshot, rel, page);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		}
 
 		/* Return to the original page to see what's up */
 		buf = _bt_relandgetbuf(rel, buf, obknum, BT_READ);
 		page = BufferGetPage(buf);
+		TestForOldSnapshot(snapshot, rel, page);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		if (P_ISDELETED(opaque))
 		{
@@ -1521,6 +1540,7 @@ _bt_walk_left(Relation rel, Buffer buf)
 				blkno = opaque->btpo_next;
 				buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
 				page = BufferGetPage(buf);
+				TestForOldSnapshot(snapshot, rel, page);
 				opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 				if (!P_ISDELETED(opaque))
 					break;
@@ -1557,7 +1577,8 @@ _bt_walk_left(Relation rel, Buffer buf)
  * The returned buffer is pinned and read-locked.
  */
 Buffer
-_bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
+_bt_get_endpoint(Relation rel, uint32 level, bool rightmost,
+				 Snapshot snapshot)
 {
 	Buffer		buf;
 	Page		page;
@@ -1580,6 +1601,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
 		return InvalidBuffer;
 
 	page = BufferGetPage(buf);
+	TestForOldSnapshot(snapshot, rel, page);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 	for (;;)
@@ -1599,6 +1621,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
 					 RelationGetRelationName(rel));
 			buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
 			page = BufferGetPage(buf);
+			TestForOldSnapshot(snapshot, rel, page);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		}
 
@@ -1651,7 +1674,8 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir)
 	 * version of _bt_search().  We don't maintain a stack since we know we
 	 * won't need it.
 	 */
-	buf = _bt_get_endpoint(rel, 0, ScanDirectionIsBackward(dir));
+	buf = _bt_get_endpoint(rel, 0, ScanDirectionIsBackward(dir),
+						   scan->xs_snapshot);
 
 	if (!BufferIsValid(buf))
 	{
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 620e746..a9f837f 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -295,7 +295,7 @@ spgLeafTest(Relation index, SpGistScanOpaque so,
  */
 static void
 spgWalk(Relation index, SpGistScanOpaque so, bool scanWholeIndex,
-		storeRes_func storeRes)
+		storeRes_func storeRes, Snapshot snapshot)
 {
 	Buffer		buffer = InvalidBuffer;
 	bool		reportedSome = false;
@@ -336,6 +336,7 @@ redirect:
 		/* else new pointer points to the same page, no work needed */
 
 		page = BufferGetPage(buffer);
+		TestForOldSnapshot(snapshot, index, page);
 
 		isnull = SpGistPageStoresNulls(page) ? true : false;
 
@@ -558,7 +559,7 @@ spggetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 	so->tbm = tbm;
 	so->ntids = 0;
 
-	spgWalk(scan->indexRelation, so, true, storeBitmap);
+	spgWalk(scan->indexRelation, so, true, storeBitmap, scan->xs_snapshot);
 
 	return so->ntids;
 }
@@ -617,7 +618,8 @@ spggettuple(IndexScanDesc scan, ScanDirection dir)
 		}
 		so->iPtr = so->nPtrs = 0;
 
-		spgWalk(scan->indexRelation, so, false, storeGettuple);
+		spgWalk(scan->indexRelation, so, false, storeGettuple,
+				scan->xs_snapshot);
 
 		if (so->nPtrs == 0)
 			break;				/* must have completed scan */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 4cb4acf..93361a0 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -489,7 +489,8 @@ vacuum_set_xid_limits(Relation rel,
 	 * working on a particular table at any time, and that each vacuum is
 	 * always an independent transaction.
 	 */
-	*oldestXmin = GetOldestXmin(rel, true);
+	*oldestXmin =
+		TransactionIdLimitedForOldSnapshots(GetOldestXmin(rel, true), rel);
 
 	Assert(TransactionIdIsNormal(*oldestXmin));
 
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 52e19b3..426e756 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -1660,7 +1660,8 @@ should_attempt_truncation(LVRelStats *vacrelstats)
 	possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
 	if (possibly_freeable > 0 &&
 		(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
-		 possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION))
+		 possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION) &&
+		old_snapshot_threshold < 0)
 		return true;
 	else
 		return false;
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 36a04fc..c04b17f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -43,6 +43,7 @@
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "storage/spin.h"
+#include "utils/snapmgr.h"
 
 
 shmem_startup_hook_type shmem_startup_hook = NULL;
@@ -136,6 +137,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 		size = add_size(size, ReplicationOriginShmemSize());
 		size = add_size(size, WalSndShmemSize());
 		size = add_size(size, WalRcvShmemSize());
+		size = add_size(size, SnapMgrShmemSize());
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
@@ -247,6 +249,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	/*
 	 * Set up other modules that need some shared memory space
 	 */
+	SnapMgrInit();
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 740beb6..fd221ff 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1759,6 +1759,15 @@ GetSnapshotData(Snapshot snapshot)
 	snapshot->regd_count = 0;
 	snapshot->copied = false;
 
+	/*
+	 * Capture the current time and WAL stream location in case this snapshot
+	 * becomes old enough to need to fall back on the special "old snapshot"
+	 * logic.
+	 */
+	snapshot->lsn = GetXLogInsertRecPtr();
+	snapshot->whenTaken = GetSnapshotCurrentTimestamp();
+	MaintainOldSnapshotTimeMapping(snapshot->whenTaken, xmin);
+
 	return snapshot;
 }
 
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index c557cb6..f8996cd 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -46,3 +46,4 @@ CommitTsControlLock					38
 CommitTsLock						39
 ReplicationOriginLock				40
 MultiXactTruncationLock				41
+OldSnapshotTimeMapLock				42
diff --git a/src/backend/utils/errcodes.txt b/src/backend/utils/errcodes.txt
index 1a920e8..a962308 100644
--- a/src/backend/utils/errcodes.txt
+++ b/src/backend/utils/errcodes.txt
@@ -414,6 +414,10 @@ Section: Class 58 - System Error (errors external to PostgreSQL itself)
 58P01    E    ERRCODE_UNDEFINED_FILE                                         undefined_file
 58P02    E    ERRCODE_DUPLICATE_FILE                                         duplicate_file
 
+Section: Class 72 - Snapshot Failure
+# (class borrowed from Oracle)
+72000    E    ERRCODE_SNAPSHOT_TOO_OLD                                       snapshot_too_old
+
 Section: Class F0 - Configuration File Error
 
 # (PostgreSQL-specific error class)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a325943..bc541b0 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2658,6 +2658,17 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
+		{"old_snapshot_threshold", PGC_POSTMASTER, RESOURCES_ASYNCHRONOUS,
+			gettext_noop("Time before a snapshot is too old to read pages changed after the snapshot was taken."),
+			gettext_noop("A value of -1 disables this feature."),
+			GUC_UNIT_MIN
+		},
+		&old_snapshot_threshold,
+		-1, -1, MINS_PER_HOUR * HOURS_PER_DAY * 60,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"tcp_keepalives_idle", PGC_USERSET, CLIENT_CONN_OTHER,
 			gettext_noop("Time between issuing TCP keepalives."),
 			gettext_noop("A value of 0 uses the system default."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 773b4e8..2c387d4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -165,6 +165,8 @@
 #effective_io_concurrency = 1		# 1-1000; 0 disables prefetching
 #max_worker_processes = 8
 #max_parallel_degree = 0		# max number of worker processes per node
+#old_snapshot_threshold = -1		# 1min-60d; -1 disables; 0 is immediate
+									# (change requires restart)
 
 
 #------------------------------------------------------------------------------
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index b88e012..19504c3 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -46,14 +46,18 @@
 
 #include "access/transam.h"
 #include "access/xact.h"
+#include "access/xlog.h"
+#include "catalog/catalog.h"
 #include "lib/pairingheap.h"
 #include "miscadmin.h"
 #include "storage/predicate.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/sinval.h"
+#include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
@@ -61,6 +65,64 @@
 
 
 /*
+ * GUC parameters
+ */
+int			old_snapshot_threshold;		/* number of minutes, -1 disables */
+
+/*
+ * Structure for dealing with old_snapshot_threshold implementation.
+ */
+typedef struct OldSnapshotControlData
+{
+	/*
+	 * Variables for old snapshot handling are shared among processes and are
+	 * only allowed to move forward.
+	 */
+	slock_t		mutex_current;			/* protect current timestamp */
+	int64		current_timestamp;		/* latest snapshot timestamp */
+	slock_t		mutex_latest_xmin;		/* protect latest snapshot xmin */
+	TransactionId latest_xmin;			/* latest snapshot xmin */
+	slock_t		mutex_threshold;		/* protect threshold fields */
+	int64		threshold_timestamp;	/* earlier snapshot is old */
+	TransactionId threshold_xid;		/* earlier xid may be gone */
+
+	/*
+	 * Keep one xid per minute for old snapshot error handling.
+	 *
+	 * Use a circular buffer with a head offset, a count of entries currently
+	 * used, and a timestamp corresponding to the xid at the head offset.  A
+	 * count_used value of zero means that there are no times stored; a
+	 * count_used value of old_snapshot_threshold means that the buffer is
+	 * full and the head must be advanced to add new entries.  Use timestamps
+	 * aligned to minute boundaries, since that seems less surprising than
+	 * aligning based on the first usage timestamp.
+	 *
+	 * It is OK if the xid for a given time slot is from earlier than
+	 * calculated by adding the number of minutes corresponding to the
+	 * (possibly wrapped) distance from the head offset to the time of the
+	 * head entry, since that just results in the vacuuming of old tuples
+	 * being slightly less aggressive.  It would not be OK for it to be off in
+	 * the other direction, since it might result in vacuuming tuples that are
+	 * still expected to be there.
+	 *
+	 * Use of an SLRU was considered but not chosen because it is more
+	 * heavyweight than is needed for this, and would probably not be any less
+	 * code to implement.
+	 *
+	 * Persistence is not needed.
+	 */
+	int			head_offset;		/* subscript of oldest tracked time */
+	int64		head_timestamp;		/* time corresponding to head xid */
+	int			count_used;			/* how many slots are in use */
+	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
+}	OldSnapshotControlData;
+
+typedef struct OldSnapshotControlData *OldSnapshotControl;
+
+static volatile OldSnapshotControl oldSnapshotControl;
+
+
+/*
  * CurrentSnapshot points to the only snapshot taken in transaction-snapshot
  * mode, and to the latest one taken in a read-committed transaction.
  * SecondarySnapshot is a snapshot that's always up-to-date as of the current
@@ -153,6 +215,7 @@ static Snapshot FirstXactSnapshot = NULL;
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
+static int64 AlignTimestampToMinuteBoundary(int64 ts);
 static Snapshot CopySnapshot(Snapshot snapshot);
 static void FreeSnapshot(Snapshot snapshot);
 static void SnapshotResetXmin(void);
@@ -174,6 +237,49 @@ typedef struct SerializedSnapshotData
 	CommandId	curcid;
 } SerializedSnapshotData;
 
+Size
+SnapMgrShmemSize(void)
+{
+	Size		size;
+
+	size = offsetof(OldSnapshotControlData, xid_by_minute);
+	if (old_snapshot_threshold > 0)
+		size = add_size(size, mul_size(sizeof(TransactionId),
+									   old_snapshot_threshold));
+
+	return size;
+}
+
+/*
+ * Initialize for managing old snapshot detection.
+ */
+void
+SnapMgrInit(void)
+{
+	bool		found;
+
+	/*
+	 * Create or attach to the OldSnapshotControl structure.
+	 */
+	oldSnapshotControl = (OldSnapshotControl)
+		ShmemInitStruct("OldSnapshotControlData",
+						SnapMgrShmemSize(), &found);
+
+	if (!found)
+	{
+		SpinLockInit(&oldSnapshotControl->mutex_current);
+		oldSnapshotControl->current_timestamp = 0;
+		SpinLockInit(&oldSnapshotControl->mutex_latest_xmin);
+		oldSnapshotControl->latest_xmin = InvalidTransactionId;
+		SpinLockInit(&oldSnapshotControl->mutex_threshold);
+		oldSnapshotControl->threshold_timestamp = 0;
+		oldSnapshotControl->threshold_xid = InvalidTransactionId;
+		oldSnapshotControl->head_offset = 0;
+		oldSnapshotControl->head_timestamp = 0;
+		oldSnapshotControl->count_used = 0;
+	}
+}
+
 /*
  * GetTransactionSnapshot
  *		Get the appropriate snapshot for a new query in a transaction.
@@ -1405,6 +1511,304 @@ ThereAreNoPriorRegisteredSnapshots(void)
 	return false;
 }
 
+
+/*
+ * Return an int64 timestamp which is exactly on a minute boundary.
+ *
+ * If the argument is already aligned, return that value, otherwise move to
+ * the next minute boundary following the given time.
+ */
+static int64
+AlignTimestampToMinuteBoundary(int64 ts)
+{
+	int64		retval = ts + (USECS_PER_MINUTE - 1);
+
+	return retval - (retval % USECS_PER_MINUTE);
+}
+
+/*
+ * Get current timestamp for snapshots as int64 that never moves backward.
+ */
+int64
+GetSnapshotCurrentTimestamp(void)
+{
+	int64		now = GetCurrentIntegerTimestamp();
+
+	/*
+	 * Don't let time move backward; if it hasn't advanced, use the old value.
+	 */
+	SpinLockAcquire(&oldSnapshotControl->mutex_current);
+	if (now <= oldSnapshotControl->current_timestamp)
+		now = oldSnapshotControl->current_timestamp;
+	else
+		oldSnapshotControl->current_timestamp = now;
+	SpinLockRelease(&oldSnapshotControl->mutex_current);
+
+	return now;
+}
+
+/*
+ * Get timestamp through which vacuum may have processed based on last stored
+ * value for threshold_timestamp.
+ *
+ * XXX: So far, we never trust that a 64-bit value can be read atomically; if
+ * that ever changes, we could get rid of the spinlock here.
+ */
+int64
+GetOldSnapshotThresholdTimestamp(void)
+{
+	int64		threshold_timestamp;
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
+	threshold_timestamp = oldSnapshotControl->threshold_timestamp;
+	SpinLockRelease(&oldSnapshotControl->mutex_threshold);
+
+	return threshold_timestamp;
+}
+
+static void
+SetOldSnapshotThresholdTimestamp(int64 ts, TransactionId xlimit)
+{
+	SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
+	oldSnapshotControl->threshold_timestamp = ts;
+	oldSnapshotControl->threshold_xid = xlimit;
+	SpinLockRelease(&oldSnapshotControl->mutex_threshold);
+}
+
+/*
+ * TransactionIdLimitedForOldSnapshots
+ *
+ * Apply old snapshot limit, if any.  This is intended to be called for page
+ * pruning and table vacuuming, to allow old_snapshot_threshold to override
+ * the normal global xmin value.  Actual testing for snapshot too old will be
+ * based on whether a snapshot timestamp is prior to the threshold timestamp
+ * set in this function.
+ */
+TransactionId
+TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
+									Relation relation)
+{
+	if (TransactionIdIsNormal(recentXmin)
+		&& old_snapshot_threshold >= 0
+		&& RelationNeedsWAL(relation)
+		&& !IsCatalogRelation(relation)
+		&& !RelationIsAccessibleInLogicalDecoding(relation))
+	{
+		int64		ts = GetSnapshotCurrentTimestamp();
+		TransactionId xlimit = recentXmin;
+		TransactionId latest_xmin = oldSnapshotControl->latest_xmin;
+		bool		same_ts_as_threshold = false;
+
+		/*
+		 * Zero threshold always overrides to latest xmin, if valid.  Without
+		 * some heuristic it will find its own snapshot too old on, for
+		 * example, a simple UPDATE -- which would make it useless for most
+		 * testing, but there is no principled way to ensure that it doesn't
+		 * fail in this way.  Use a five-second delay to try to get useful
+		 * testing behavior, but this may need adjustment.
+		 */
+		if (old_snapshot_threshold == 0)
+		{
+			if (TransactionIdPrecedes(latest_xmin, MyPgXact->xmin)
+				&& TransactionIdFollows(latest_xmin, xlimit))
+				xlimit = latest_xmin;
+
+			ts -= 5 * USECS_PER_SEC;
+			SetOldSnapshotThresholdTimestamp(ts, xlimit);
+
+			return xlimit;
+		}
+
+		ts = AlignTimestampToMinuteBoundary(ts)
+			 - (old_snapshot_threshold * USECS_PER_MINUTE);
+
+		/* Check for fast exit without LW locking. */
+		SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
+		if (ts == oldSnapshotControl->threshold_timestamp)
+		{
+			xlimit = oldSnapshotControl->threshold_xid;
+			same_ts_as_threshold = true;
+		}
+		SpinLockRelease(&oldSnapshotControl->mutex_threshold);
+
+		if (!same_ts_as_threshold)
+		{
+			LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+
+			if (oldSnapshotControl->count_used > 0
+				&& ts >= oldSnapshotControl->head_timestamp)
+			{
+				int		offset;
+
+				offset = ((ts - oldSnapshotControl->head_timestamp)
+						  / USECS_PER_MINUTE);
+				if (offset > oldSnapshotControl->count_used - 1)
+					offset = oldSnapshotControl->count_used - 1;
+				offset = (oldSnapshotControl->head_offset + offset)
+						% old_snapshot_threshold;
+				xlimit = oldSnapshotControl->xid_by_minute[offset];
+
+				if (NormalTransactionIdFollows(xlimit, recentXmin))
+					SetOldSnapshotThresholdTimestamp(ts, xlimit);
+			}
+
+			LWLockRelease(OldSnapshotTimeMapLock);
+		}
+
+		/*
+		 * Failsafe protection against vacuuming work of active transaction.
+		 *
+		 * This is not an assertion because we avoid the spinlock for
+		 * performance, leaving open the possibility that xlimit could advance
+		 * and be more current; but it seems prudent to apply this limit.  It
+		 * might make pruning a tiny bit less aggressive than it could be, but
+		 * protects against data loss bugs.
+		 */
+		if (TransactionIdIsNormal(latest_xmin)
+			&& TransactionIdPrecedes(latest_xmin, xlimit))
+			xlimit = latest_xmin;
+
+		if (NormalTransactionIdFollows(xlimit, recentXmin))
+			return xlimit;
+	}
+
+	return recentXmin;
+}
+
+/*
+ * Take care of the circular buffer that maps time to xid.
+ */
+void
+MaintainOldSnapshotTimeMapping(int64 whenTaken, TransactionId xmin)
+{
+	int64		ts;
+
+	/* Fast exit when old_snapshot_threshold is not used. */
+	if (old_snapshot_threshold < 0)
+		return;
+
+	/* Keep track of the latest xmin seen by any process. */
+	SpinLockAcquire(&oldSnapshotControl->mutex_latest_xmin);
+	if (TransactionIdFollows(xmin, oldSnapshotControl->latest_xmin))
+		oldSnapshotControl->latest_xmin = xmin;
+	SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
+
+	/* No further tracking needed for 0 (used for testing). */
+	if (old_snapshot_threshold == 0)
+		return;
+
+	/*
+	 * We don't want to do something stupid with unusual values, but we don't
+	 * want to litter the log with warnings or break otherwise normal
+	 * processing for this feature; so if something seems unreasonable, just
+	 * log at DEBUG level and return without doing anything.
+	 */
+	if (whenTaken < 0)
+	{
+		elog(DEBUG1,
+			 "MaintainOldSnapshotTimeMapping called with negative whenTaken = %ld",
+			 (long) whenTaken);
+		return;
+	}
+	if (!TransactionIdIsNormal(xmin))
+	{
+		elog(DEBUG1,
+			 "MaintainOldSnapshotTimeMapping called with xmin = %lu",
+			 (unsigned long) xmin);
+		return;
+	}
+
+	ts = AlignTimestampToMinuteBoundary(whenTaken);
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+
+	Assert(oldSnapshotControl->head_offset >= 0);
+	Assert(oldSnapshotControl->head_offset < old_snapshot_threshold);
+	Assert((oldSnapshotControl->head_timestamp % USECS_PER_MINUTE) == 0);
+	Assert(oldSnapshotControl->count_used >= 0);
+	Assert(oldSnapshotControl->count_used <= old_snapshot_threshold);
+
+	if (oldSnapshotControl->count_used == 0)
+	{
+		/* set up first entry for empty mapping */
+		oldSnapshotControl->head_offset = 0;
+		oldSnapshotControl->head_timestamp = ts;
+		oldSnapshotControl->count_used = 1;
+		oldSnapshotControl->xid_by_minute[0] = xmin;
+	}
+	else if (ts < oldSnapshotControl->head_timestamp)
+	{
+		/* old ts; log it at DEBUG */
+		LWLockRelease(OldSnapshotTimeMapLock);
+		elog(DEBUG1,
+			 "MaintainOldSnapshotTimeMapping called with old whenTaken = %ld",
+			 (long) whenTaken);
+		return;
+	}
+	else if (ts <= (oldSnapshotControl->head_timestamp +
+					((oldSnapshotControl->count_used - 1)
+					 * USECS_PER_MINUTE)))
+	{
+		/* existing mapping; advance xid if possible */
+		int		bucket = (oldSnapshotControl->head_offset
+						  + ((ts - oldSnapshotControl->head_timestamp)
+							 / USECS_PER_MINUTE))
+						 % old_snapshot_threshold;
+
+		if (TransactionIdPrecedes(oldSnapshotControl->xid_by_minute[bucket], xmin))
+			oldSnapshotControl->xid_by_minute[bucket] = xmin;
+	}
+	else
+	{
+		/* We need a new bucket, but it might not be the very next one. */
+		int		advance = ((ts - oldSnapshotControl->head_timestamp)
+						   / USECS_PER_MINUTE);
+
+		oldSnapshotControl->head_timestamp = ts;
+
+		if (advance >= old_snapshot_threshold)
+		{
+			/* Advance is so far that all old data is junk; start over. */
+			oldSnapshotControl->head_offset = 0;
+			oldSnapshotControl->count_used = 1;
+			oldSnapshotControl->xid_by_minute[0] = xmin;
+		}
+		else
+		{
+			/* Store the new value in one or more buckets. */
+			int i;
+
+			for (i = 0; i < advance; i++)
+			{
+				if (oldSnapshotControl->count_used == old_snapshot_threshold)
+				{
+					/* Map full and new value replaces old head. */
+					int		old_head = oldSnapshotControl->head_offset;
+
+					if (old_head == (old_snapshot_threshold - 1))
+						oldSnapshotControl->head_offset = 0;
+					else
+						oldSnapshotControl->head_offset = old_head + 1;
+					oldSnapshotControl->xid_by_minute[old_head] = xmin;
+				}
+				else
+				{
+					/* Extend map to unused entry. */
+					int		new_tail = (oldSnapshotControl->head_offset
+										+ oldSnapshotControl->count_used)
+									   % old_snapshot_threshold;
+
+					oldSnapshotControl->count_used++;
+					oldSnapshotControl->xid_by_minute[new_tail] = xmin;
+				}
+			}
+		}
+	}
+
+	LWLockRelease(OldSnapshotTimeMapLock);
+}
+
+
 /*
  * Setup a snapshot that replaces normal catalog snapshots that allows catalog
  * access to behave just like it did at a certain point in the past.
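
[Reader's note, not part of the patch.] The minute alignment and circular-buffer indexing in the snapmgr.c changes above can be tried in isolation. The following is a minimal standalone sketch, assuming non-negative microsecond timestamps. The names align_to_minute and bucket_for are hypothetical and do not appear in the patch; bucket_for follows the range check of the "existing mapping" branch in MaintainOldSnapshotTimeMapping(), not the clamping done in TransactionIdLimitedForOldSnapshots().

```c
#include <assert.h>
#include <stdint.h>

#define USECS_PER_MINUTE INT64_C(60000000)

/*
 * Round ts up to the next minute boundary; an already aligned ts is
 * returned unchanged.  Assumes ts >= 0.
 */
static int64_t
align_to_minute(int64_t ts)
{
	int64_t		retval = ts + (USECS_PER_MINUTE - 1);

	return retval - (retval % USECS_PER_MINUTE);
}

/*
 * Hypothetical helper: map an aligned timestamp to a slot of a circular
 * buffer with nslots entries, whose oldest entry lives at head_offset and
 * corresponds to head_ts.  Returns -1 if ts falls before the head or past
 * the slots currently in use.
 */
static int
bucket_for(int64_t ts, int64_t head_ts, int head_offset,
		   int count_used, int nslots)
{
	int64_t		minutes;

	assert(nslots > 0);
	assert(head_offset >= 0 && head_offset < nslots);

	if (count_used <= 0 || ts < head_ts)
		return -1;

	minutes = (ts - head_ts) / USECS_PER_MINUTE;
	if (minutes >= count_used)
		return -1;

	/* Wrap around the end of the array, as the patch does with "%". */
	return (int) ((head_offset + minutes) % nslots);
}
```

With head_offset = 3 in a five-slot buffer, a timestamp two minutes after the head wraps around to slot 0. This is the same modulo arithmetic the patch applies with old_snapshot_threshold as the slot count.
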
diff --git a/src/include/access/brin_revmap.h b/src/include/access/brin_revmap.h
index 19528bf..89054e0 100644
--- a/src/include/access/brin_revmap.h
+++ b/src/include/access/brin_revmap.h
@@ -18,12 +18,13 @@
 #include "storage/itemptr.h"
 #include "storage/off.h"
 #include "utils/relcache.h"
+#include "utils/snapshot.h"
 
 /* struct definition lives in brin_revmap.c */
 typedef struct BrinRevmap BrinRevmap;
 
 extern BrinRevmap *brinRevmapInitialize(Relation idxrel,
-					 BlockNumber *pagesPerRange);
+					 BlockNumber *pagesPerRange, Snapshot snapshot);
 extern void brinRevmapTerminate(BrinRevmap *revmap);
 
 extern void brinRevmapExtend(BrinRevmap *revmap,
@@ -34,6 +35,6 @@ extern void brinSetHeapBlockItemptr(Buffer rmbuf, BlockNumber pagesPerRange,
 						BlockNumber heapBlk, ItemPointerData tid);
 extern BrinTuple *brinGetTupleForHeapBlock(BrinRevmap *revmap,
 						 BlockNumber heapBlk, Buffer *buf, OffsetNumber *off,
-						 Size *size, int mode);
+						 Size *size, int mode, Snapshot snapshot);
 
 #endif   /* BRIN_REVMAP_H */
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index d2ea588..66ce9ac 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -703,7 +703,7 @@ typedef struct
  * PostingItem
  */
 
-extern GinBtreeStack *ginFindLeafPage(GinBtree btree, bool searchMode);
+extern GinBtreeStack *ginFindLeafPage(GinBtree btree, bool searchMode, Snapshot snapshot);
 extern Buffer ginStepRight(Buffer buffer, Relation index, int lockmode);
 extern void freeGinBtreeStack(GinBtreeStack *stack);
 extern void ginInsertValue(GinBtree btree, GinBtreeStack *stack,
@@ -731,7 +731,7 @@ extern void GinPageDeletePostingItem(Page page, OffsetNumber offset);
 extern void ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
 					  ItemPointerData *items, uint32 nitem,
 					  GinStatsData *buildStats);
-extern GinBtreeStack *ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno);
+extern GinBtreeStack *ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno, Snapshot snapshot);
 extern void ginDataFillRoot(GinBtree btree, Page root, BlockNumber lblkno, Page lpage, BlockNumber rblkno, Page rpage);
 extern void ginPrepareDataScan(GinBtree btree, Relation index, BlockNumber rootBlkno);
 
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 9046b16..ca50349 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -710,17 +710,18 @@ extern int	_bt_pagedel(Relation rel, Buffer buf);
  */
 extern BTStack _bt_search(Relation rel,
 		   int keysz, ScanKey scankey, bool nextkey,
-		   Buffer *bufP, int access);
+		   Buffer *bufP, int access, Snapshot snapshot);
 extern Buffer _bt_moveright(Relation rel, Buffer buf, int keysz,
 			  ScanKey scankey, bool nextkey, bool forupdate, BTStack stack,
-			  int access);
+			  int access, Snapshot snapshot);
 extern OffsetNumber _bt_binsrch(Relation rel, Buffer buf, int keysz,
 			ScanKey scankey, bool nextkey);
 extern int32 _bt_compare(Relation rel, int keysz, ScanKey scankey,
 			Page page, OffsetNumber offnum);
 extern bool _bt_first(IndexScanDesc scan, ScanDirection dir);
 extern bool _bt_next(IndexScanDesc scan, ScanDirection dir);
-extern Buffer _bt_get_endpoint(Relation rel, uint32 level, bool rightmost);
+extern Buffer _bt_get_endpoint(Relation rel, uint32 level, bool rightmost,
+							   Snapshot snapshot);
 
 /*
  * prototypes for functions in nbtutils.c
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 7d57c04..a9a876a 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -14,11 +14,14 @@
 #ifndef BUFMGR_H
 #define BUFMGR_H
 
+#include "catalog/catalog.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/bufpage.h"
 #include "storage/relfilenode.h"
 #include "utils/relcache.h"
+#include "utils/snapmgr.h"
+#include "utils/tqual.h"
 
 typedef void *Block;
 
@@ -168,6 +171,37 @@ extern PGDLLIMPORT int32 *LocalRefCount;
 #define BufferGetPage(buffer) ((Page)BufferGetBlock(buffer))
 
 /*
+ * Check whether the given snapshot is too old to have safely read the given
+ * page from the given table.  If so, throw a "snapshot too old" error.
+ *
+ * This test generally needs to be performed after every BufferGetPage() call
+ * that is executed as part of a scan.  It is not needed for calls made for
+ * modifying the page (for example, to position to the right place to insert a
+ * new index tuple or for vacuuming).
+ *
+ * Note that a NULL snapshot argument is allowed and causes a fast return
+ * without error; this is to support call sites which can be called from
+ * either scans or index modification areas.
+ *
+ * This is a macro for speed; keep the tests that are fastest and/or most
+ * likely to exclude a page from old snapshot testing near the front.
+ */
+#define TestForOldSnapshot(snapshot, relation, page) \
+	do { \
+		if (old_snapshot_threshold >= 0 \
+		 && (snapshot) != NULL \
+		 && (snapshot)->satisfies == HeapTupleSatisfiesMVCC \
+		 && !XLogRecPtrIsInvalid((snapshot)->lsn) \
+		 && PageGetLSN(page) > (snapshot)->lsn \
+		 && !IsCatalogRelation(relation) \
+		 && !RelationIsAccessibleInLogicalDecoding(relation) \
+		 && (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp()) \
+			ereport(ERROR, \
+					(errcode(ERRCODE_SNAPSHOT_TOO_OLD), \
+					 errmsg("snapshot too old"))); \
+	} while (0)
+
+/*
  * prototypes for functions in bufmgr.c
  */
 extern bool ComputeIoConcurrency(int io_concurrency, double *target);
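
[Reader's note, not part of the patch.] The order of tests in the TestForOldSnapshot() macro above may be easier to follow as a plain function. This is a simplified sketch under stated assumptions: SketchSnapshot and snapshot_too_old are hypothetical names, an LSN of 0 stands in for an invalid XLogRecPtr, and the MVCC, catalog, and logical-decoding checks of the real macro are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields the patch adds to SnapshotData. */
typedef struct SketchSnapshot
{
	uint64_t	lsn;			/* WAL position when taken; 0 means invalid */
	int64_t		when_taken;		/* time the snapshot was taken, in usecs */
} SketchSnapshot;

/*
 * Return true if a scan using snap should fail with "snapshot too old"
 * when it reads a page whose LSN is page_lsn.  threshold_ts plays the role
 * of GetOldSnapshotThresholdTimestamp() in the patch.
 */
static bool
snapshot_too_old(int threshold_minutes, const SketchSnapshot *snap,
				 uint64_t page_lsn, int64_t threshold_ts)
{
	/* Feature disabled: never raise the error. */
	if (threshold_minutes < 0)
		return false;

	/* A NULL snapshot (index modification paths) or invalid LSN is exempt. */
	if (snap == NULL || snap->lsn == 0)
		return false;

	/* Page not modified since the snapshot was taken: always safe to read. */
	if (page_lsn <= snap->lsn)
		return false;

	/* Page changed after the snapshot; fail only if the snapshot is old. */
	return snap->when_taken < threshold_ts;
}
```

The cheap tests come first, mirroring the comment on the macro: most pages are excluded by the threshold or LSN comparison before the threshold timestamp needs to be fetched.
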
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index f2bebf2..d417031 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -15,6 +15,7 @@
 #define REL_H
 
 #include "access/tupdesc.h"
+#include "access/xlog.h"
 #include "catalog/pg_class.h"
 #include "catalog/pg_index.h"
 #include "fmgr.h"
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index a9e9066..371042a 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -14,10 +14,20 @@
 #define SNAPMGR_H
 
 #include "fmgr.h"
+#include "utils/relcache.h"
 #include "utils/resowner.h"
 #include "utils/snapshot.h"
 
 
+/* GUC variables */
+extern int	old_snapshot_threshold;
+
+
+extern Size SnapMgrShmemSize(void);
+extern void SnapMgrInit(void);
+extern int64 GetSnapshotCurrentTimestamp(void);
+extern int64 GetOldSnapshotThresholdTimestamp(void);
+
 extern bool FirstSnapshotSet;
 
 extern TransactionId TransactionXmin;
@@ -54,6 +64,9 @@ extern void ImportSnapshot(const char *idstr);
 extern bool XactHasExportedSnapshots(void);
 extern void DeleteAllExportedSnapshotFiles(void);
 extern bool ThereAreNoPriorRegisteredSnapshots(void);
+extern TransactionId TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
+														 Relation relation);
+extern void MaintainOldSnapshotTimeMapping(int64 whenTaken, TransactionId xmin);
 
 extern char *ExportSnapshot(Snapshot snapshot);
 
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index 2a56363..998e2e5 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -14,6 +14,7 @@
 #define SNAPSHOT_H
 
 #include "access/htup.h"
+#include "access/xlogdefs.h"
 #include "lib/pairingheap.h"
 #include "storage/buf.h"
 
@@ -105,6 +106,9 @@ typedef struct SnapshotData
 	uint32		active_count;	/* refcount on ActiveSnapshot stack */
 	uint32		regd_count;		/* refcount on RegisteredSnapshots */
 	pairingheap_node ph_node;	/* link in the RegisteredSnapshots heap */
+
+	int64		whenTaken;		/* timestamp when snapshot was taken */
+	XLogRecPtr	lsn;			/* position in the WAL stream when taken */
 } SnapshotData;
 
 /*
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 6167ec1..9b93552 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -8,6 +8,7 @@ SUBDIRS = \
 		  brin \
 		  commit_ts \
 		  dummy_seclabel \
+		  snapshot_too_old \
 		  test_ddl_deparse \
 		  test_extensions \
 		  test_parser \
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
new file mode 100644
index 0000000..7b9feca
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -0,0 +1,47 @@
+# src/test/modules/snapshot_too_old/Makefile
+
+EXTRA_CLEAN = ./isolation_output
+
+ISOLATIONCHECKS=sto_using_cursor sto_using_select
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/snapshot_too_old
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+# Disabled because these tests require "old_snapshot_threshold" >= 0, which
+# typical installcheck users do not have (e.g. buildfarm clients).
+installcheck:;
+
+# It can nonetheless be very helpful to run the tests against a preexisting
+# installation, so allow that, but only if requested explicitly.
+installcheck-force: isolationcheck-install-force
+
+check: isolationcheck
+
+submake-isolation:
+	$(MAKE) -C $(top_builddir)/src/test/isolation all
+
+submake-snapshot_too_old:
+	$(MAKE) -C $(top_builddir)/src/test/modules/snapshot_too_old
+
+isolationcheck: | submake-isolation temp-install
+	$(MKDIR_P) isolation_output
+	$(pg_isolation_regress_check) \
+	    --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf \
+	    --outputdir=./isolation_output \
+	    $(ISOLATIONCHECKS)
+
+isolationcheck-install-force: all | submake-isolation temp-install
+	$(pg_isolation_regress_installcheck) \
+	    $(ISOLATIONCHECKS)
+
+.PHONY: check isolationcheck isolationcheck-install-force
+
+temp-install: EXTRA_INSTALL=src/test/modules/snapshot_too_old
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out b/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
new file mode 100644
index 0000000..8cc29ec
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
@@ -0,0 +1,73 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1decl s1f1 s1sleep s1f2 s2u
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+
+starting permutation: s1decl s1f1 s1sleep s2u s1f2
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1f2: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+
+starting permutation: s1decl s1f1 s2u s1sleep s1f2
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+
+starting permutation: s1decl s2u s1f1 s1sleep s1f2
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+
+starting permutation: s2u s1decl s1f1 s1sleep s1f2
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+2              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_select.out b/src/test/modules/snapshot_too_old/expected/sto_using_select.out
new file mode 100644
index 0000000..eb15bc2
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_select.out
@@ -0,0 +1,55 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1f1 s1sleep s1f2 s2u
+step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+
+starting permutation: s1f1 s1sleep s2u s1f2
+step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+ERROR:  snapshot too old
+
+starting permutation: s1f1 s2u s1sleep s1f2
+step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+ERROR:  snapshot too old
+
+starting permutation: s2u s1f1 s1sleep s1f2
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+2              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+ERROR:  snapshot too old
diff --git a/src/test/modules/snapshot_too_old/results/sto_using_cursor.out b/src/test/modules/snapshot_too_old/results/sto_using_cursor.out
new file mode 100644
index 0000000..8cc29ec
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/results/sto_using_cursor.out
@@ -0,0 +1,73 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1decl s1f1 s1sleep s1f2 s2u
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+
+starting permutation: s1decl s1f1 s1sleep s2u s1f2
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1f2: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+
+starting permutation: s1decl s1f1 s2u s1sleep s1f2
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+
+starting permutation: s1decl s2u s1f1 s1sleep s1f2
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+
+starting permutation: s2u s1decl s1f1 s1sleep s1f2
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
+step s1f1: FETCH FIRST FROM cursor1;
+c              
+
+2              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
diff --git a/src/test/modules/snapshot_too_old/results/sto_using_select.out b/src/test/modules/snapshot_too_old/results/sto_using_select.out
new file mode 100644
index 0000000..eb15bc2
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/results/sto_using_select.out
@@ -0,0 +1,55 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1f1 s1sleep s1f2 s2u
+step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+
+starting permutation: s1f1 s1sleep s2u s1f2
+step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+ERROR:  snapshot too old
+
+starting permutation: s1f1 s2u s1sleep s1f2
+step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+ERROR:  snapshot too old
+
+starting permutation: s2u s1f1 s1sleep s1f2
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+2              
+step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+setting        pg_sleep       
+
+0                             
+step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+ERROR:  snapshot too old
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec b/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
new file mode 100644
index 0000000..eac18ca
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
@@ -0,0 +1,37 @@
+# This test provokes a "snapshot too old" error using a cursor.
+#
+# The sleep is needed because with a threshold of zero a statement could error
+# on changes it made.  With more normal settings no external delay is needed,
+# but we don't want these tests to run long enough to see that, since
+# granularity is in minutes.
+#
+# Since results depend on the value of old_snapshot_threshold, sneak that into
+# the line generated by the sleep, so that a surprising value isn't so hard
+# to identify.
+
+setup
+{
+    CREATE TABLE sto1 (c int NOT NULL);
+    INSERT INTO sto1 SELECT generate_series(1, 1000);
+    CREATE TABLE sto2 (c int NOT NULL);
+}
+setup
+{
+    VACUUM ANALYZE sto1;
+}
+
+teardown
+{
+    DROP TABLE sto1, sto2;
+}
+
+session "s1"
+setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step "s1decl"	{ DECLARE cursor1 CURSOR FOR SELECT c FROM sto1; }
+step "s1f1"		{ FETCH FIRST FROM cursor1; }
+step "s1sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
+step "s1f2"		{ FETCH FIRST FROM cursor1; }
+teardown		{ COMMIT; }
+
+session "s2"
+step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1; }
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_select.spec b/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
new file mode 100644
index 0000000..d7c34f3
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
@@ -0,0 +1,36 @@
+# This test provokes a "snapshot too old" error using SELECT statements.
+#
+# The sleep is needed because with a threshold of zero a statement could error
+# on changes it made.  With more normal settings no external delay is needed,
+# but we don't want these tests to run long enough to see that, since
+# granularity is in minutes.
+#
+# Since results depend on the value of old_snapshot_threshold, sneak that into
+# the line generated by the sleep, so that a surprising value isn't so hard
+# to identify.
+
+setup
+{
+    CREATE TABLE sto1 (c int NOT NULL);
+    INSERT INTO sto1 SELECT generate_series(1, 1000);
+    CREATE TABLE sto2 (c int NOT NULL);
+}
+setup
+{
+    VACUUM ANALYZE sto1;
+}
+
+teardown
+{
+    DROP TABLE sto1, sto2;
+}
+
+session "s1"
+setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step "s1f1"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
+step "s1sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
+step "s1f2"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
+teardown		{ COMMIT; }
+
+session "s2"
+step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1; }
diff --git a/src/test/modules/snapshot_too_old/sto.conf b/src/test/modules/snapshot_too_old/sto.conf
new file mode 100644
index 0000000..ce8048f
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/sto.conf
@@ -0,0 +1,3 @@
+autovacuum = off
+old_snapshot_threshold = 0
+

#21

Jeff Janes

jeff.janes@gmail.com

almost 10 years ago

In reply to: Kevin Grittner (#20)

Re: snapshot too old, configured by time

On Thu, Mar 17, 2016 at 2:15 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

New patch just to merge in recent commits -- it was starting to
show some bit-rot. Tests folded in with main patch.

I'm not sure if this is operating as expected.

I set the value to 1min.

I set up a test like this:

pgbench -i

pgbench -c4 -j4 -T 3600 &

### watch the size of branches table
while (true) ; do psql -c "\dt+" | fgrep _branches; sleep 10; done &

### set up a long lived snapshot.
psql -c 'begin; set transaction isolation level repeatable read;
select sum(bbalance) from pgbench_branches; select pg_sleep(300);
select sum(bbalance) from pgbench_branches;'

As this runs, I can see pgbench_branches start bloating once the
snapshot is taken, and it continues bloating at a linear rate for the
full 300 seconds.

Once the 300 second pg_sleep is up, the long-lived snapshot holder
receives an error once it tries to access the table again, and then
the bloat stops increasing. But shouldn't the bloat have stopped
increasing as soon as the snapshot became doomed, which would be after
a minute or so?

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22

Peter Geoghegan

pg@heroku.com

almost 10 years ago

In reply to: Kevin Grittner (#20)

Re: snapshot too old, configured by time

On Thu, Mar 17, 2016 at 2:15 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

New patch just to merge in recent commits -- it was starting to
show some bit-rot. Tests folded in with main patch.

I haven't read the patch, but I wonder: What are the implications here
for B-Tree page recycling by VACUUM? I know that you understand this
topic well, so I don't assume that you didn't address it.

Offhand, I imagine that there'd be some special considerations. Why is
it okay that an index scan could land on a deleted page with no
interlock against VACUUM's page recycling? Or, what prevents that from
happening in the first place?

I worry that something weird could happen there. For example, perhaps
the page LSN on what is actually a newly recycled page could be set
such that the backend following a stale right spuriously raises a
"snapshot too old" error.

I suggest you consider making amcheck [1] a part of your testing
strategy. I think that this patch is a good idea, and I'd be happy to
take feedback from you on how to make amcheck more effective for
testing this patch in particular.

[1]: https://commitfest.postgresql.org/9/561/
--
Peter Geoghegan

#23

Peter Geoghegan

pg@heroku.com

almost 10 years ago

In reply to: Peter Geoghegan (#22)

Re: snapshot too old, configured by time

On Sun, Mar 20, 2016 at 4:25 PM, Peter Geoghegan <pg@heroku.com> wrote:

I worry that something weird could happen there. For example, perhaps
the page LSN on what is actually a newly recycled page could be set
such that the backend following a stale right spuriously raises a
"snapshot too old" error.

I mean a stale right-link, of course.

--
Peter Geoghegan

#24

Thom Brown

thom@linux.com

almost 10 years ago

In reply to: Kevin Grittner (#20)

Re: snapshot too old, configured by time

On 17 March 2016 at 21:15, Kevin Grittner <kgrittn@gmail.com> wrote:

New patch just to merge in recent commits -- it was starting to
show some bit-rot. Tests folded in with main patch.

In session 1, I've run:

# begin transaction isolation level repeatable read ;
BEGIN

*# declare stuff scroll cursor for select * from test where num between 5 and 9;
DECLARE CURSOR

*# fetch forward 5 from stuff;
id | num | thing
-----+-----+------------------------------------
2 | 8 | hellofji djf odsjfiojdsif ojdsiof
3 | 7 | hellofji djf odsjfiojdsif ojdsiof
112 | 9 | hellofji djf odsjfiojdsif ojdsiof
115 | 6 | hellofji djf odsjfiojdsif ojdsiof
119 | 8 | hellofji djf odsjfiojdsif ojdsiof
(5 rows)

In session 2, over a min later:

# update test set num = 12 where num between 5 and 9 and id between 120 and 250;
ERROR: snapshot too old

Then back to session 1:

*# fetch forward 5 from stuff;
ERROR: snapshot too old

Should session 2 be getting that error?

Thom

#25

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Thom Brown (#24)

Re: snapshot too old, configured by time

Thanks to all for the feedback; I will try to respond later this
week. First I'm trying to get my reviews for other patches posted.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#26

Michael Paquier

michael.paquier@gmail.com

almost 10 years ago

In reply to: Kevin Grittner (#25)

Re: snapshot too old, configured by time

On Tue, Mar 22, 2016 at 5:05 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

Thanks to all for the feedback; I will try to respond later this
week. First I'm trying to get my reviews for other patches posted.

I have been looking at 4a, the test module, and things are looking
good IMO. One thing that I think would be an improvement is to define
the options for isolation tests in a variable, like ISOLATION_OPTS, to
allow MSVC scripts to fetch those option values more easily.

+submake-test_decoding:
+   $(MAKE) -C $(top_builddir)/src/test/modules/snapshot_too_old
The target name here is incorrect. This should refer to snapshot_too_old.

Regarding the main patch:
+ <primary><varname>old_snapshot_threshold</> configuration parameter</primary>
snapshot_valid_limit?

page = BufferGetPage(buf);
+ TestForOldSnapshot(scan->xs_snapshot, rel, page);
This sequence is repeated many times in this patch; a new routine,
say BufferGetPageExtended with a uint8 flag, one flag being used to
test old snapshots, would be more appropriate. But this would require
creating a header dependency between the buffer manager and
SnapshotData.. Or more simply we may want a common code path when
fetching a page that a backend is going to use to fetch tuples. I am
afraid of TestForOldSnapshot() being something that could be easily
forgotten in code paths introduced in future patches...
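
A minimal sketch of the opt-in wrapper suggested above, assuming simplified stand-in types rather than the real buffer manager and SnapshotData definitions. The names MockPage, MockSnapshot, BGP_TEST_FOR_OLD_SNAPSHOT and mock_buffer_get_page_extended are hypothetical; only BufferGetPageExtended and the uint8 flag come from the suggestion itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real PostgreSQL definitions. */
typedef struct { uint64_t lsn; } MockPage;
typedef struct { uint64_t lsn; } MockSnapshot;

#define BGP_TEST_FOR_OLD_SNAPSHOT 0x01  /* hypothetical flag bit */

/* Counts how often the old-snapshot test would have been run. */
static int old_snapshot_checks = 0;

/* Fetch a page; run the old-snapshot test only when the caller asks for it. */
static MockPage *
mock_buffer_get_page_extended(MockPage *pages, int buf,
                              const MockSnapshot *snapshot, uint8_t flags)
{
    MockPage *page = &pages[buf];

    if (flags & BGP_TEST_FOR_OLD_SNAPSHOT)
    {
        assert(snapshot != NULL);   /* the test needs a snapshot to compare against */
        old_snapshot_checks++;      /* stands in for TestForOldSnapshot() */
    }
    return page;
}
```

With this shape a caller that passes no flag gets the page without any check, which is exactly the "easily forgotten" risk raised in the paragraph above.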

+   if (whenTaken < 0)
+   {
+       elog(DEBUG1,
+            "MaintainOldSnapshotTimeMapping called with negative whenTaken = %ld",
+            (long) whenTaken);
+       return;
+   }
+   if (!TransactionIdIsNormal(xmin))
+   {
+       elog(DEBUG1,
+            "MaintainOldSnapshotTimeMapping called with xmin = %lu",
+            (unsigned long) xmin);
+       return;
+   }
Material for two assertions?
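
As a rough illustration of what "two assertions" could look like, here is a self-contained sketch. It assumes a 64-bit integer timestamp and uses hypothetical mock_ names; the only PostgreSQL facts relied on are that XIDs are 32 bits wide and that normal XIDs start at 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;   /* same width as PostgreSQL's XID */
typedef int64_t MockTimestamp;    /* hypothetical stand-in for the timestamp type */

/* XIDs 0..2 are reserved in PostgreSQL; normal XIDs start at 3. */
#define MOCK_FIRST_NORMAL_XID ((TransactionId) 3)

static bool
mock_xid_is_normal(TransactionId xid)
{
    return xid >= MOCK_FIRST_NORMAL_XID;
}

static int mappings_recorded = 0;

/* Hypothetical version of the routine with the DEBUG1 early returns
 * replaced by assertions, so that bad input fails loudly in assert-enabled
 * builds instead of being logged and ignored. */
static void
mock_maintain_time_mapping(MockTimestamp whenTaken, TransactionId xmin)
{
    assert(whenTaken >= 0);
    assert(mock_xid_is_normal(xmin));
    mappings_recorded++;          /* stands in for the real bookkeeping */
}
```

The trade-off is that assertions vanish in production builds, whereas the elog(DEBUG1) version keeps a quiet safety net there.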
-- 
Michael

#27

David Steele

david@pgmasters.net

almost 10 years ago

In reply to: Kevin Grittner (#25)

Re: snapshot too old, configured by time

Hi Kevin,

On 3/21/16 4:05 PM, Kevin Grittner wrote:

Thanks to all for the feedback; I will try to respond later this
week. First I'm trying to get my reviews for other patches posted.

We're getting to the end of the CF now. Do you know when you'll have an
updated patch ready?

Thanks,
--
-David
david@pgmasters.net

#28

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: David Steele (#27)

Re: snapshot too old, configured by time

On Tue, Mar 29, 2016 at 8:58 AM, David Steele <david@pgmasters.net> wrote:

We're getting to the end of the CF now. Do you know when you'll have an
updated patch ready?

I am working on it right now. Hopefully I can get it all sorted today.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#29

Alvaro Herrera

alvherre@2ndquadrant.com

almost 10 years ago

In reply to: Michael Paquier (#26)

Re: snapshot too old, configured by time

Michael Paquier wrote:

page = BufferGetPage(buf);
+ TestForOldSnapshot(scan->xs_snapshot, rel, page);
This sequence is repeated many times in this patch; a new routine,
say BufferGetPageExtended with a uint8 flag, one flag being used to
test old snapshots, would be more appropriate. But this would require
creating a header dependency between the buffer manager and
SnapshotData.. Or more simply we may want a common code path when
fetching a page that a backend is going to use to fetch tuples. I am
afraid of TestForOldSnapshot() being something that could be easily
forgotten in code paths introduced in future patches...

I said exactly the same thing, and Kevin dismissed it.

I would be worried about your specific proposal though, because it's
easy to just call BufferGetPage() (i.e. the not-extended version) and
forget the old-snapshot protection completely.

I think a safer proposition would be to replace all current
BufferGetPage() calls (there are about 500) by adding the necessary
arguments: buffer, snapshot, rel, and an integer "flags". All this
without adding the feature. Then a subsequent commit would add the
TestForOldSnapshot inside BufferGetPage, *except* when a
BUFFER_NO_SNAPSHOT_TEST flag is passed. That way, new code always gets
the snapshot test by default.
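
The opt-out variant described above might look roughly like the following sketch. The types and the mock_buffer_get_page name are hypothetical stand-ins, and the value given to BUFFER_NO_SNAPSHOT_TEST is an assumption; only the argument list and the flag name come from the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are the real PostgreSQL definitions. */
typedef struct { uint64_t lsn; } MockPage;
typedef struct { uint64_t lsn; } MockSnapshot;
typedef struct { int relid; } MockRelation;

#define BUFFER_NO_SNAPSHOT_TEST 0x01    /* flag name from the proposal; value assumed */

static int snapshot_tests_run = 0;

/* The test runs by default; a caller has to opt out explicitly. */
static MockPage *
mock_buffer_get_page(MockPage *pages, int buffer, const MockSnapshot *snapshot,
                     const MockRelation *rel, int flags)
{
    MockPage *page = &pages[buffer];

    if (!(flags & BUFFER_NO_SNAPSHOT_TEST))
    {
        assert(snapshot != NULL && rel != NULL);
        snapshot_tests_run++;   /* stands in for TestForOldSnapshot(snapshot, rel, page) */
    }
    return page;
}
```

Here a caller who passes 0 for flags still gets the check, at the cost of every call site having to supply a snapshot and relation or the opt-out flag.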

I don't like the new header dependency either, though.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#30

Tom Lane

tgl@sss.pgh.pa.us

almost 10 years ago

In reply to: Alvaro Herrera (#29)

Re: snapshot too old, configured by time

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I think a safer proposition would be to replace all current
BufferGetPage() calls (there are about 500) by adding the necessary
arguments: buffer, snapshot, rel, and an integer "flags". All this
without adding the feature. Then a subsequent commit would add the
TestForOldSnapshot inside BufferGetPage, *except* when a
BUFFER_NO_SNAPSHOT_TEST flag is passed. That way, new code always gets
the snapshot test by default.

That seems awfully invasive, not to mention performance-killing if
the expectation is that most such calls are going to need a snapshot
check. (Quite aside from the calls themselves, are they all in
routines that are being passed the right snapshot today?)

TBH, I think that shoving in something like this at the end of the last
commitfest would be a bad idea even if there were widespread consensus
that we wanted the feature ... which I am not sure there is.

I think it might be time to bounce this one to 9.7.

regards, tom lane

#31

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Tom Lane (#30)

Re: snapshot too old, configured by time

On Wed, Mar 30, 2016 at 11:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I think a safer proposition would be to replace all current
BufferGetPage() calls (there are about 500) by adding the necessary
arguments: buffer, snapshot, rel, and an integer "flags". All this
without adding the feature. Then a subsequent commit would add the
TestForOldSnapshot inside BufferGetPage, *except* when a
BUFFER_NO_SNAPSHOT_TEST flag is passed. That way, new code always gets
the snapshot test by default.

That seems awfully invasive,

That's the argument I made, which Álvaro described as "dismissing"
his suggestion. In this post from October of 2015, I pointed out
that there are 36 calls where we need a snapshot and 450 where we
don't.

/messages/by-id/56479263.1140984.1444945639606.JavaMail.yahoo@mail.yahoo.com

not to mention performance-killing if the expectation is that
most such calls are going to need a snapshot check.

This patch has allowed a customer whose performance requirements we
could not otherwise meet to pass them. It is the opposite of
performance-killing.

(Quite aside from the calls themselves, are they all in
routines that are being passed the right snapshot today?)

I went over that very carefully when the patch was first proposed
in January of 2015, and have kept an eye on things to try to avoid
bit-rot which might introduce new calls which need to be touched.
The customer for which this was initially developed uses a 30 day
test run with very complex production releases driven by a
simulated user load with large numbers of users. EDB has
back-patched it to 9.4 where an earlier version of it is being used
in production by this (large) customer.

TBH, I think that shoving in something like this at the end of the last
commitfest would be a bad idea even if there were widespread consensus
that we wanted the feature ... which I am not sure there is.

I don't recall anyone opposing the feature itself except you,
and it has had enthusiastic support from many. Before I was made
aware of a relevant isolation tester feature, there were many
objections to my efforts at regression testing, and Álvaro has
argued for touching 450 locations in the code that otherwise don't
need it, just as "reminders" to people to consider whether newly
added calls might need a snapshot, and he doesn't like the new
header dependencies. Simon seemed to want it in 9.5, but that was
clearly premature, IMO.

Characterizing this as being shoved in at the last moment seems
odd, since the big hang-up from the November CF was the testing
methodology in the patch. It has been feature-complete since the
September CF, and modified based on feedback. Granted, some
additional testing in this CF brought up a couple things that merit
a look, but this patch is hardly unique in that regard.

I think it might be time to bounce this one to 9.7.

If there is a consensus for that, sure, or if I can't sort out the
latest issues by feature freeze (which is admittedly looming).

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#32

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Peter Geoghegan (#22)

Re: snapshot too old, configured by time

On Sun, Mar 20, 2016 at 6:25 PM, Peter Geoghegan <pg@heroku.com> wrote:

I haven't read the patch, but I wonder: What are the implications here
for B-Tree page recycling by VACUUM?

Offhand, I imagine that there'd be some special considerations. Why is
it okay that an index scan could land on a deleted page with no
interlock against VACUUM's page recycling? Or, what prevents that from
happening in the first place?

When the initial "proof of concept" patch was tested by the
customer, it was not effective due to issues related to what you
raise. Autovacuum workers were blocking due to the page pins for
scans using these old snapshots, causing the bloat to accumulate in
spite of this particular patch. This was addressed, at least to a
degree sufficient for this customer, with this patch:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ed5b87f96d473962ec5230fd820abfeaccb2069

Basically, for most common cases the "super-exclusive" locking has
been eliminated from btree logic; and I have been happy to see that
Simon has been working on dealing with the corner cases where I
hadn't rooted it out.

I worry that something weird could happen there. For example, perhaps
the page LSN on what is actually a newly recycled page could be set
such that the backend following a stale right spuriously raises a
"snapshot too old" error.

That particular detail doesn't seem to be a realistic concern,
though -- if a page has been deleted, assigned to a new place in
the index with an LSN corresponding to that action, it would be a
pretty big bug if a right-pointer still referenced it.

I suggest you consider making amcheck [1] a part of your testing
strategy. I think that this patch is a good idea, and I'd be happy to
take feedback from you on how to make amcheck more effective for
testing this patch in particular.

I'm not sure how that would fit in; could you elaborate?

The biggest contradiction making testing hard is that everyone (and
I mean everyone!) preferred to see this configured by time rather
than number of transactions, so there is no change in behavior
without some sort of wait for elapsed time. But nobody wants to
drive time needed for running regression tests too high. Testing
was far easier when a transaction count was used for configuration.
old_snapshot_threshold = -1 (the default) completely disables the
new behavior, and I basically abused a configuration setting of 0
to mean a few seconds so I could get some basic testing added to
make check-world while keeping the additional time for the tests
(barely) below one minute.
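
To make the configuration semantics concrete, here is a simplified sketch of the decision, not the patch's actual code. It assumes the threshold is expressed in minutes, that "modified after the snapshot was built" can be modelled by comparing a page LSN with an LSN recorded for the snapshot, and that the age comparison is inclusive; the function and parameter names are hypothetical. It does not model the special few-seconds handling of a zero setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decision function: should a read of this page by this
 * snapshot raise "snapshot too old"? */
static bool
mock_snapshot_too_old(int threshold_minutes, int64_t snapshot_age_seconds,
                      uint64_t snapshot_lsn, uint64_t page_lsn)
{
    assert(snapshot_age_seconds >= 0);

    if (threshold_minutes < 0)
        return false;               /* -1 (the default) disables the feature */
    if (page_lsn <= snapshot_lsn)
        return false;               /* page not modified since the snapshot was built */

    /* Assumed: the snapshot is doomed once its age reaches the threshold. */
    return snapshot_age_seconds >= (int64_t) threshold_minutes * 60;
}
```

Under these assumptions a threshold of 0 makes any snapshot eligible as soon as it touches a modified page, which matches the spec file comment that a statement could error on its own changes without the sleep.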

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#33

Peter Geoghegan

pg@heroku.com

almost 10 years ago

In reply to: Kevin Grittner (#32)

Re: snapshot too old, configured by time

On Wed, Mar 30, 2016 at 11:53 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

When the initial "proof of concept" patch was tested by the
customer, it was not effective due to issues related to what you
raise. Autovacuum workers were blocking due to the page pins for
scans using these old snapshots, causing the bloat to accumulate in
spite of this particular patch. This was addressed, at least to a
degree sufficient for this customer, with this patch:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ed5b87f96d473962ec5230fd820abfeaccb2069

Basically, for most common cases the "super-exclusive" locking has
been eliminated from btree logic; and I have been happy to see that
Simon has been working on dealing with the corner cases where I
hadn't rooted it out.

I worry that something weird could happen there. For example, perhaps
the page LSN on what is actually a newly recycled page could be set
such that the backend following a stale right spuriously raises a
"snapshot too old" error.

That particular detail doesn't seem to be a realistic concern,
though -- if a page has been deleted, assigned to a new place in
the index with an LSN corresponding to that action, it would be a
pretty big bug if a right-pointer still referenced it.

Yes, that would be a big bug. But I wasn't talking about
"super-exclusive" locking. Rather, I was talking about the general way
in which index scans are guaranteed to not land on an already-recycled
page (not a half-dead page, and not a fully deleted page -- a fully
reclaimed/recycled page). This works without buffer pins or buffer
locks needing to be held at all -- there is a global interlock against
page *recycling* based on RecentGlobalXmin, per the nbtree README. So,
this vague concern of mine is about VACUUM's B-Tree page recycling.

During an index scan, we expect to be able to land on the next page,
and to be at least able to reason about it being deleted, even though
we don't hold a pin on anything for a period. We certainly are shy
about explaining all this, but if you look at a routine like
_bt_search() carefully (the routine that is used to descend a B-Tree),
it doesn't actually hold a pin concurrently, as we drop a level (and
certainly not a buffer lock, either). The next page/block should be
substantively the same page as expected from the downlink (or right
link) we followed, entirely because of the RecentGlobalXmin interlock.
Backwards scans also rely on this.
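
The recycling interlock described above can be sketched roughly as follows. This is a simplified illustration, not the nbtree code: the struct, its fields, and the function names are hypothetical stand-ins, and the XID comparison ignores the special (non-normal) transaction IDs that the real comparison has to handle. The idea is only that a deleted page becomes reusable once the XID stamped at deletion is older than the oldest xmin any backend could still hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Circular XID comparison: true if a is logically older than b. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/* Hypothetical stand-in for the relevant bits of a B-Tree page. */
typedef struct SketchBTPage
{
	bool		is_deleted;		/* page was fully deleted by VACUUM */
	TransactionId deletion_xid; /* XID stamped on the page at deletion */
} SketchBTPage;

/*
 * A page may be recycled only if it is deleted and no scan that could still
 * hold a link to it can be in flight, i.e. its deletion XID is older than
 * the global xmin horizon.
 */
static bool
page_is_recyclable(const SketchBTPage *page, TransactionId recent_global_xmin)
{
	return page->is_deleted &&
		xid_precedes(page->deletion_xid, recent_global_xmin);
}
```

Under this rule a scan that followed a link before the page was deleted is still holding back the horizon, so the block cannot be handed out for reuse underneath it.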

This is just a vague concern, and possibly this is completely
irrelevant. I haven't read the patch.

I suggest you consider making amcheck [1] a part of your testing
strategy. I think that this patch is a good idea, and I'd be happy to
take feedback from you on how to make amcheck more effective for
testing this patch in particular.

I'm not sure how that would fit in; could you elaborate?

Well, amcheck is a tool that in essence makes sure that B-Trees look
structurally sound, and respect invariants like having every item on
each page in logical order. That could catch a bug of the kind I just
described, because it's quite likely that the recycled page would
happen to have items that didn't comport with the ordering on the
page. The block has been reused essentially at random. Importantly,
amcheck can catch this without anything more than an AccessShareLock,
so you have some hope of catching this kind of race condition (the
stale downlink that you followed to get to the
spuriously-recycled-early page doesn't stay stale for long). Or, maybe
it would happen to catch some other random problem. Difficult to say.

Again, this is based on a speculation that might be wrong. But it's
worth considering.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#34

Alvaro Herrera

alvherre@2ndquadrant.com

almost 10 years ago

In reply to: Kevin Grittner (#31)

Re: snapshot too old, configured by time

Kevin Grittner wrote:

On Wed, Mar 30, 2016 at 11:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I think a safer proposition would be to replace all current
BufferGetPage() calls (there are about 500) by adding the necessary
arguments: buffer, snapshot, rel, and an integer "flags". All this
without adding the feature. Then a subsequent commit would add the
TestForOldSnapshot inside BufferGetPage, *except* when a
BUFFER_NO_SNAPSHOT_TEST flag is passed. That way, new code always gets
the snapshot test by default.

That seems awfully invasive,

That's the argument I made, which Álvaro described as "dismissing"
his suggestion. In this post from October of 2015, I pointed out
that there are 36 calls where we need a snapshot and 450 where we
don't.

/messages/by-id/56479263.1140984.1444945639606.JavaMail.yahoo@mail.yahoo.com

I understand the invasiveness argument, but to me the danger of
introducing new bugs trumps that. The problem is not the current code,
but future patches: it is just too easy to make the mistake of not
checking the snapshot in new additions of BufferGetPage. And you said
that the result of missing a check is silent wrong results from queries
that should instead be cancelled, which seems fairly bad to me. My
impression was that you were actually considering doing something about
that -- sorry for the lack of clarity.

We have made similarly invasive changes in the past -- the
SearchSysCache API for instance.

not to mention performance-killing if the expectation is that
most such calls are going to need a snapshot check.

This patch has allowed a customer whose performance requirements we
could not otherwise meet to pass them. It is the
opposite of performance-killing.

I think Tom misunderstood what I said and you misunderstood what Tom
said. Let me attempt to set things straight.

I said that we should change BufferGetPage into having the snapshot
check built-in, except in the cases where a flag is passed; and the flag
would be passed in all cases except those 30-something you identified.
In other words, the behavior in all the current callsites would be
identical to what's there today; we could have a macro do the first
check so that we don't introduce the overhead of a function call in the
450 cases where it's not needed.

Tom said that my proposal would be performance-killing, not that your
patch would be performance-killing. But as I argue above, with my
proposal performance would stay the same, so we're actually okay.

I don't think anybody disputes that your patch is good in general.
I would be happy with it in 9.6, but I have my reservations about the
aforementioned problem.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#35

Peter Geoghegan

pg@heroku.com

almost 10 years ago

In reply to: Peter Geoghegan (#33)

Re: snapshot too old, configured by time

On Wed, Mar 30, 2016 at 12:21 PM, Peter Geoghegan <pg@heroku.com> wrote:

Well, amcheck is a tool that in essence makes sure that B-Trees look
structurally sound, and respect invariants like having every item on
each page in logical order. That could catch a bug of the kind I just
described, because it's quite likely that the recycled page would
happen to have items that didn't comport with the ordering on the
page.

I mean: didn't comport with the ordering of the last page "on the same
level" (but, due to this issue, maybe not actually on the same level).
We check if the first item on the "right page" (in actuality, due to
this bug, the new page image following a spurious early recycle) is
greater than or equal to the last item on the previous page (the page
whose right-link we followed). On each level, everything should be in order
-- that's an invariant that (it is posited by me) we can safely check
with only an AccessShareLock.
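
As a rough illustration of the invariant described above, the sketch below models pages as sorted arrays of integer keys. It is not amcheck code: the struct and function names are hypothetical, and real index tuples are compared with the operator class's comparison function rather than `>=` on integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one B-Tree page on a given level. */
typedef struct SketchPage
{
	const int  *items;			/* keys in physical item order */
	size_t		nitems;
} SketchPage;

/* Every item on a page must be in logical order. */
static bool
page_items_in_order(const SketchPage *page)
{
	for (size_t i = 1; i < page->nitems; i++)
	{
		if (page->items[i - 1] > page->items[i])
			return false;
	}
	return true;
}

/*
 * The first item on the page reached through the right-link must be greater
 * than or equal to the last item on the page we came from.
 */
static bool
right_sibling_in_order(const SketchPage *left, const SketchPage *right)
{
	if (left->nitems == 0 || right->nitems == 0)
		return true;			/* nothing to compare */
	return right->items[0] >= left->items[left->nitems - 1];
}
```

A block that was recycled too early and now holds unrelated keys would usually fail the second check, which is why the cross-page comparison is the interesting one here.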

Making the relation lock only an AccessShareLock is not just about
reducing the impact on production systems. It's also about making the
tool more effective at finding these kinds of transient races, with
subtle user-visible symptoms.

Again, I don't want to prejudice anyone against your patch, which I
haven't read.

--
Peter Geoghegan

#36

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Jeff Janes (#21)

Re: snapshot too old, configured by time

On Sat, Mar 19, 2016 at 1:27 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

I'm not sure if this is operating as expected.

I set the value to 1min.

I set up a test like this:

pgbench -i

pgbench -c4 -j4 -T 3600 &

### watch the size of branches table
while (true) ; do psql -c "\dt+" | fgrep _branches; sleep 10; done &

### set up a long lived snapshot.
psql -c 'begin; set transaction isolation level repeatable read;
select sum(bbalance) from pgbench_branches; select pg_sleep(300);
select sum(bbalance) from pgbench_branches;'

As this runs, I can see the size of pgbench_branches bloating once
the snapshot is taken, and it continues bloating at a linear rate for the
full 300 seconds.

Once the 300 second pg_sleep is up, the long-lived snapshot holder
receives an error once it tries to access the table again, and then
the bloat stops increasing. But shouldn't the bloat have stopped
increasing as soon as the snapshot became doomed, which would be after
a minute or so?

This is actually operating as intended, not a bug. Try running a
manual VACUUM command about two minutes after the snapshot is taken
and you should get a handle on what's going on. The old tuples
become eligible for vacuuming after one minute, but that doesn't
necessarily mean that autovacuum jumps in and that the space starts
getting reused. The manual vacuum will allow that, as you should
see on your monitoring window. A connection should not get the
error just because it is using a snapshot that tries to look at
data that might be wrong, and the connection holding the long-lived
snapshot doesn't do that until it awakes from the sleep and runs
the next SELECT command.

All is well as far as I can see here.

Thanks for checking, though! It is an interesting test!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#37

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Peter Geoghegan (#35)

Re: snapshot too old, configured by time

On Wed, Mar 30, 2016 at 2:29 PM, Peter Geoghegan <pg@heroku.com> wrote:

[Does the patch allow dangling page pointers?]

Again, I don't want to prejudice anyone against your patch, which I
haven't read.

I don't believe that the way the patch does its business opens any
new vulnerabilities of this type. If you see such after looking at
the patch, let me know.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#38

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Kevin Grittner (#36)

Re: snapshot too old, configured by time

On Wed, Mar 30, 2016 at 2:34 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

A connection should not get the
error just because it is using a snapshot that tries to look at
data that might be wrong, and the connection holding the long-lived
snapshot doesn't do that until it awakes from the sleep and runs
the next SELECT command.

Well, that came out wrong.

A connection should not get the "snapshot too old" error just
because it *holds* an old snapshot; it must actually attempt to
read an affected page *using* an old snapshot, and the connection
holding the long-lived snapshot doesn't do that until it awakes
from the sleep and runs the next SELECT command.
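
The condition described above can be sketched as a single predicate. This is a simplified model under stated assumptions, not the patch's TestForOldSnapshot(): the struct, the function name, and the use of plain seconds for time are all hypothetical, and the patch's time-to-xid mapping is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of a snapshot that matter here. */
typedef struct SketchSnapshot
{
	uint64_t	lsn;			/* WAL position when the snapshot was taken */
	int64_t		when_taken;		/* time the snapshot was taken, in seconds */
} SketchSnapshot;

/*
 * Return true if reading this page with this snapshot should fail with
 * "snapshot too old". Merely holding an old snapshot is not enough: the
 * page being read must also have been modified after the snapshot was taken.
 */
static bool
snapshot_too_old(const SketchSnapshot *snap, uint64_t page_lsn,
				 int64_t now, int64_t threshold_secs)
{
	if (threshold_secs < 0)
		return false;			/* feature disabled (the default, -1) */
	if (page_lsn <= snap->lsn)
		return false;			/* page unchanged since the snapshot */
	return (now - snap->when_taken) >= threshold_secs;
}
```

In Jeff's test the snapshot passes the one-minute threshold while the session sleeps, but nothing evaluates this predicate until the second SELECT touches a modified page, which is when the error appears.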

Sorry for any confusion from the sloppy editing.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#39

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Alvaro Herrera (#34)

Re: snapshot too old, configured by time

On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I understand the invasiveness argument, but to me the danger of
introducing new bugs trumps that. The problem is not the current code,
but future patches: it is just too easy to make the mistake of not
checking the snapshot in new additions of BufferGetPage. And you said
that the result of missing a check is silent wrong results from queries
that should instead be cancelled, which seems fairly bad to me.

Fair point.

I said that we should change BufferGetPage into having the snapshot
check built-in, except in the cases where a flag is passed; and the flag
would be passed in all cases except those 30-something you identified.
In other words, the behavior in all the current callsites would be
identical to what's there today; we could have a macro do the first
check so that we don't introduce the overhead of a function call in the
450 cases where it's not needed.

In many of the places that BufferGetPage is called there is not a
snapshot available. I assume that you would be OK with an Assert
that the flag was passed if the snapshot is NULL? I had been
picturing what you were requesting as just adding a snapshot
parameter and assuming that NULL meant not to check; adding two
parameters where the flag explicitly states that the check is not
needed might do more to prevent accidents, but I do wonder how much
it would help during copy/paste frenzy. Touching all spots to use
the new function signature would be a mechanical job with the
compiler catching any errors, so it doesn't seem crazy to refactor
that now, but I would like to hear what some others think about
this.

Tom said that my proposal would be performance-killing, not that your
patch would be performance-killing. But as I argue above, with my
proposal performance would stay the same, so we're actually okay.

I don't think anybody disputes that your patch is good in general.
I would be happy with it in 9.6, but I have my reservations about the
aforementioned problem.

We have a lot of places in our code where people need to know
things that they are not reminded of by the surrounding code, but
I'm not about to argue that's a good thing; if the consensus is
that this would help prevent future bugs when new BufferGetPage
calls are added, I can go with the flow.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#40

Alvaro Herrera

alvherre@2ndquadrant.com

almost 10 years ago

In reply to: Kevin Grittner (#39)

Re: snapshot too old, configured by time

Kevin Grittner wrote:

On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I said that we should change BufferGetPage into having the snapshot
check built-in, except in the cases where a flag is passed; and the flag
would be passed in all cases except those 30-something you identified.
In other words, the behavior in all the current callsites would be
identical to what's there today; we could have a macro do the first
check so that we don't introduce the overhead of a function call in the
450 cases where it's not needed.

In many of the places that BufferGetPage is called there is not a
snapshot available. I assume that you would be OK with an Assert
that the flag was passed if the snapshot is NULL?

Sure, that's fine.

BTW I said "a macro" but I was forgetting that we have static inline
functions in headers now, which means you can avoid the horrors of
actually writing a macro.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#41

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Michael Paquier (#26)

Re: snapshot too old, configured by time

On Thu, Mar 24, 2016 at 2:24 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I have been looking at 4a, the test module, and things are looking
good IMO. Something that I think would be appropriate would be to define
the options for isolation tests in a variable, like ISOLATION_OPTS to
allow MSVC scripts to fetch those option values more easily.

Maybe, but that seems like material for a separate patch.

+submake-test_decoding:
+   $(MAKE) -C $(top_builddir)/src/test/modules/snapshot_too_old
The target name here is incorrect. This should refer to snapshot_too_old.

Good catch. Fixed.

So far, pending the resolution of the suggestion to add three new
parameters to BufferGetPage in 450 places that otherwise don't need
to be touched, this is the only change from the flurry of recent
testing and review, so I'm holding off on posting a new patch just
for this.

Regarding the main patch:
+ <primary><varname>old_snapshot_threshold</> configuration
parameter</primary>
snapshot_valid_limit?

There have already been responses supporting
old_snapshot_threshold, so I would need to hear a few more votes to
consider a change.

page = BufferGetPage(buf);
+ TestForOldSnapshot(scan->xs_snapshot, rel, page);
This is a sequence repeated many times in this patch. A new routine,
say BufferGetPageExtended with a uint8 flag, one flag being used to
test old snapshots, would be more suitable. But this would require
creating a header dependency between the buffer manager and
SnapshotData. Or more simply we may want a common code path when
fetching a page that a backend is going to use to fetch tuples. I am
afraid of TestForOldSnapshot() being something that could be easily
forgotten in code paths introduced in future patches...

Let's keep that discussion on the other branch of this thread.

+   if (whenTaken < 0)
+   {
+       elog(DEBUG1,
+            "MaintainOldSnapshotTimeMapping called with negative
whenTaken = %ld",
+            (long) whenTaken);
+       return;
+   }
+   if (!TransactionIdIsNormal(xmin))
+   {
+       elog(DEBUG1,
+            "MaintainOldSnapshotTimeMapping called with xmin = %lu",
+            (unsigned long) xmin);
+       return;
+   }
Material for two assertions?

You omitted the immediately preceding comment block:

+    /*
+     * We don't want to do something stupid with unusual values, but we don't
+     * want to litter the log with warnings or break otherwise normal
+     * processing for this feature; so if something seems unreasonable, just
+     * log at DEBUG level and return without doing anything.
+     */

I'm not clear that more drastic action is a good idea, since the
"fallback" is existing behavior. I fear that doing something more
aggressive might force other logic to become more precise about
aggressive clean-up, adding overhead beyond the value gained.

Thanks for the review!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#42

Michael Paquier

michael.paquier@gmail.com

almost 10 years ago

In reply to: Kevin Grittner (#39)

Re: snapshot too old, configured by time

On Thu, Mar 31, 2016 at 5:09 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I understand the invasiveness argument, but to me the danger of
introducing new bugs trumps that. The problem is not the current code,
but future patches: it is just too easy to make the mistake of not
checking the snapshot in new additions of BufferGetPage. And you said
that the result of missing a check is silent wrong results from queries
that should instead be cancelled, which seems fairly bad to me.

Fair point.

That's my main concern after going through the patch, and the patch
as written does not do much to help future users. This could be easily
forgotten by committers as well.

I said that we should change BufferGetPage into having the snapshot
check built-in, except in the cases where a flag is passed; and the flag
would be passed in all cases except those 30-something you identified.
In other words, the behavior in all the current callsites would be
identical to what's there today; we could have a macro do the first
check so that we don't introduce the overhead of a function call in the
450 cases where it's not needed.

In many of the places that BufferGetPage is called there is not a
snapshot available. I assume that you would be OK with an Assert
that the flag was passed if the snapshot is NULL? I had been
picturing what you were requesting as just adding a snapshot
parameter and assuming that NULL meant not to check; adding two
parameters where the flag explicitly states that the check is not
needed might do more to prevent accidents, but I do wonder how much
it would help during copy/paste frenzy. Touching all spots to use
the new function signature would be a mechanical job with the
compiler catching any errors, so it doesn't seem crazy to refactor
that now, but I would like to hear what some others think about
this.

That's better than the existing patch, for sure. When calling
BufferGetPage() one could be tempted to forget to set snapshot to NULL
though. It should be clearly documented in the header of
BufferGetPage() where and for which purpose a snapshot should be set,
and in which code paths it is expected to be used. In our case, that's
mainly when a page is fetched from shared buffers and is used
for reading tuples from it.

Just a note: I began looking at the tests, but ended up looking at the
entire patch out of curiosity. Regarding the integration of
this patch for 9.6, I think that bumping that to 9.7 would be wiser
because the patch needs to be largely rewritten, and that's never a
good sign at this point of the development cycle.
--
Michael

#43

Alvaro Herrera

alvherre@2ndquadrant.com

almost 10 years ago

In reply to: Michael Paquier (#42)

Re: snapshot too old, configured by time

Michael Paquier wrote:

Just a note: I began looking at the tests, but ended up looking at the
entire patch out of curiosity. Regarding the integration of
this patch for 9.6, I think that bumping that to 9.7 would be wiser
because the patch needs to be largely rewritten, and that's never a
good sign at this point of the development cycle.

Not rewritten, surely? It will need a very large mechanical change to
existing BufferGetPage calls, but to me that doesn't equate to "rewriting"
it.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#44

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Alvaro Herrera (#43)

Re: snapshot too old, configured by time

On Wed, Mar 30, 2016 at 9:19 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Michael Paquier wrote:

Just a note: I began looking at the tests, but ended up looking at the
entire patch out of curiosity. Regarding the integration of
this patch for 9.6, I think that bumping that to 9.7 would be wiser
because the patch needs to be largely rewritten, and that's never a
good sign at this point of the development cycle.

Not rewritten, surely? It will need a very large mechanical change to
existing BufferGetPage calls, but to me that doesn't equate to "rewriting"
it.

I'll submit patches later today to make the mechanical change to
the nearly 500 BufferGetPage() calls and to tweak the 36 places
to use the new "test" flag with the new signature rather than
adding a line for the test.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#45

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Alvaro Herrera (#40)

1 attachment(s)

Re: snapshot too old, configured by time

On Wed, Mar 30, 2016 at 3:26 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Kevin Grittner wrote:

On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I said that we should change BufferGetPage into having the snapshot
check built-in, except in the cases where a flag is passed; and the flag
would be passed in all cases except those 30-something you identified.
In other words, the behavior in all the current callsites would be
identical to what's there today; we could have a macro do the first
check so that we don't introduce the overhead of a function call in the
450 cases where it's not needed.

In many of the places that BufferGetPage is called there is not a
snapshot available. I assume that you would be OK with an Assert
that the flag was passed if the snapshot is NULL?

Sure, that's fine.

BTW I said "a macro" but I was forgetting that we have static inline
functions in headers now, which means you can avoid the horrors of
actually writing a macro.

Attached is what I think you're talking about for the first patch.
AFAICS this should generate identical executable code to unpatched.
Then the patch to actually implement the feature would, instead
of adding 30-some lines with TestForOldSnapshot(), would implement
that as the behavior for the other enum value, and alter those
30-some BufferGetPage() calls.
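
For readers skimming the thread, the two-step plan can be sketched as below. This is an illustration under stated assumptions rather than the attached patch: the types are stand-ins for the real PostgreSQL definitions, the function names carry a "Sketch" suffix to mark them as hypothetical, and BGP_TEST_FOR_OLD_SNAPSHOT is an assumed name for "the other enum value" (only BGP_NO_SNAPSHOT_TEST appears in the patch text here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types; the real ones come from PostgreSQL headers. */
typedef struct SnapshotData { int unused; } SnapshotData;
typedef SnapshotData *Snapshot;
typedef struct RelationData { int unused; } RelationData;
typedef RelationData *Relation;
typedef char *Page;
typedef int Buffer;

typedef enum BufferGetPageAgeTest
{
	BGP_NO_SNAPSHOT_TEST,		/* name used in the prep patch */
	BGP_TEST_FOR_OLD_SNAPSHOT	/* assumed name for the second value */
} BufferGetPageAgeTest;

static char fake_pages[4][8192];	/* pretend shared buffers */
static int	old_snapshot_tests_run = 0;

/* Placeholder for the real check; here it only counts invocations. */
static void
TestForOldSnapshotSketch(Snapshot snapshot, Relation rel, Page page)
{
	(void) snapshot;
	(void) rel;
	(void) page;
	old_snapshot_tests_run++;
}

/* static inline, so the no-test case costs no extra function call */
static inline Page
BufferGetPageSketch(Buffer buffer, Snapshot snapshot, Relation rel,
					BufferGetPageAgeTest agetest)
{
	Page		page = fake_pages[buffer];

	/* A NULL snapshot or relation is only legal when the test is waived. */
	assert(agetest == BGP_NO_SNAPSHOT_TEST ||
		   (snapshot != NULL && rel != NULL));

	if (agetest == BGP_TEST_FOR_OLD_SNAPSHOT)
		TestForOldSnapshotSketch(snapshot, rel, page);

	return page;
}
```

The design point is that every caller must state its intent: the roughly 450 call sites that have no snapshot pass BGP_NO_SNAPSHOT_TEST explicitly, so a new call site cannot skip the check by omission.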

Álvaro and Michael, is this what you were looking for?

Is everyone else OK with this approach?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

snapshot-too-old-BufferGetPage-prep-v1.patch

diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index d088ce5..cdeffe3 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -90,7 +90,7 @@ typedef struct BTPageStat
 static void
 GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat *stat)
 {
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	PageHeader	phdr = (PageHeader) page;
 	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
 	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
@@ -317,7 +317,9 @@ bt_page_items(PG_FUNCTION_ARGS)
 		uargs = palloc(sizeof(struct user_args));
 
 		uargs->page = palloc(BLCKSZ);
-		memcpy(uargs->page, BufferGetPage(buffer), BLCKSZ);
+		memcpy(uargs->page,
+			   BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
+			   BLCKSZ);
 
 		UnlockReleaseBuffer(buffer);
 		relation_close(rel, AccessShareLock);
@@ -447,7 +449,7 @@ bt_metap(PG_FUNCTION_ARGS)
 	buffer = ReadBuffer(rel, 0);
 	LockBuffer(buffer, BUFFER_LOCK_SHARE);
 
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	metad = BTPageGetMeta(page);
 
 	/* Build a tuple descriptor for our result type */
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index 71d0c8d..139419a 100644
--- a/contrib/pageinspect/rawpage.c
+++ b/contrib/pageinspect/rawpage.c
@@ -147,7 +147,9 @@ get_raw_page_internal(text *relname, ForkNumber forknum, BlockNumber blkno)
 	buf = ReadBufferExtended(rel, forknum, blkno, RBM_NORMAL, NULL);
 	LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-	memcpy(raw_page_data, BufferGetPage(buf), BLCKSZ);
+	memcpy(raw_page_data,
+		   BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
+		   BLCKSZ);
 
 	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 	ReleaseBuffer(buf);
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 5e5c7cc..4a626c2 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -107,7 +107,7 @@ pg_visibility(PG_FUNCTION_ARGS)
 	buffer = ReadBuffer(rel, blkno);
 	LockBuffer(buffer, BUFFER_LOCK_SHARE);
 
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	values[2] = BoolGetDatum(PageIsAllVisible(page));
 
 	UnlockReleaseBuffer(buffer);
@@ -333,7 +333,7 @@ collect_visibility_data(Oid relid, bool include_pd)
 										bstrategy);
 			LockBuffer(buffer, BUFFER_LOCK_SHARE);
 
-			page = BufferGetPage(buffer);
+			page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			if (PageIsAllVisible(page))
 				info->bits[blkno] |= (1 << 2);
 
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index 5d08c73..83cc6dd 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -100,7 +100,7 @@ statapprox_heap(Relation rel, output_type *stat)
 
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		/*
 		 * It's not safe to call PageGetHeapFreeSpace() on new pages, so we
diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index 9f1377c..4596632 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -173,7 +173,7 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
 	 */
 	{
 		Buffer		buffer = ReadBufferExtended(rel, MAIN_FORKNUM, 0, RBM_NORMAL, bstrategy);
-		Page		page = BufferGetPage(buffer);
+		Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		BTMetaPageData *metad = BTPageGetMeta(page);
 
 		indexStat.version = metad->btm_version;
@@ -211,7 +211,7 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
 		buffer = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
 		LockBuffer(buffer, BUFFER_LOCK_SHARE);
 
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 		/* Determine page type, and update totals */
@@ -399,7 +399,7 @@ pgstatginindex(PG_FUNCTION_ARGS)
 	 */
 	buffer = ReadBuffer(rel, GIN_METAPAGE_BLKNO);
 	LockBuffer(buffer, GIN_SHARE);
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	metadata = GinPageGetMeta(page);
 
 	stats.version = metadata->ginVersion;
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index c1122b4..46655ac 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -320,7 +320,8 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 			buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block,
 										RBM_NORMAL, scan->rs_strategy);
 			LockBuffer(buffer, BUFFER_LOCK_SHARE);
-			stat.free_space += PageGetHeapFreeSpace((Page) BufferGetPage(buffer));
+			stat.free_space += PageGetHeapFreeSpace
+				(BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST));
 			UnlockReleaseBuffer(buffer);
 			block++;
 		}
@@ -333,7 +334,8 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 		buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block,
 									RBM_NORMAL, scan->rs_strategy);
 		LockBuffer(buffer, BUFFER_LOCK_SHARE);
-		stat.free_space += PageGetHeapFreeSpace((Page) BufferGetPage(buffer));
+		stat.free_space += PageGetHeapFreeSpace
+			(BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST));
 		UnlockReleaseBuffer(buffer);
 		block++;
 	}
@@ -358,7 +360,7 @@ pgstat_btree_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno,
 
 	buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
 	LockBuffer(buf, BT_READ);
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	/* Page is valid, see what to do with it */
 	if (PageIsNew(page))
@@ -402,7 +404,7 @@ pgstat_hash_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno,
 
 	_hash_getlock(rel, blkno, HASH_SHARE);
 	buf = _hash_getbuf_with_strategy(rel, blkno, HASH_READ, 0, bstrategy);
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	if (PageGetSpecialSize(page) == MAXALIGN(sizeof(HashPageOpaqueData)))
 	{
@@ -447,7 +449,7 @@ pgstat_gist_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno,
 	buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
 	LockBuffer(buf, GIST_SHARE);
 	gistcheckpage(rel, buf);
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	if (GistPageIsLeaf(page))
 	{
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index c740952..6f6f1b1 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -208,7 +208,8 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 		}
 		else
 		{
-			Page		page = BufferGetPage(buf);
+			Page		page = BufferGetPage(buf, NULL, NULL,
+											 BGP_NO_SNAPSHOT_TEST);
 			ItemId		lp = PageGetItemId(page, off);
 			Size		origsz;
 			BrinTuple  *origtup;
@@ -617,7 +618,8 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	Assert(BufferGetBlockNumber(meta) == BRIN_METAPAGE_BLKNO);
 	LockBuffer(meta, BUFFER_LOCK_EXCLUSIVE);
 
-	brin_metapage_init(BufferGetPage(meta), BrinGetPagesPerRange(index),
+	brin_metapage_init(BufferGetPage(meta, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
+					   BrinGetPagesPerRange(index),
 					   BRIN_CURRENT_VERSION);
 	MarkBufferDirty(meta);
 
@@ -636,7 +638,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 
 		recptr = XLogInsert(RM_BRIN_ID, XLOG_BRIN_CREATE_INDEX);
 
-		page = BufferGetPage(meta);
+		page = BufferGetPage(meta, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		PageSetLSN(page, recptr);
 	}
 
@@ -686,7 +688,9 @@ brinbuildempty(Relation index)
 
 	/* Initialize and xlog metabuffer. */
 	START_CRIT_SECTION();
-	brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
+	brin_metapage_init(BufferGetPage(metabuf, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST),
+					   BrinGetPagesPerRange(index),
 					   BRIN_CURRENT_VERSION);
 	MarkBufferDirty(metabuf);
 	log_newpage_buffer(metabuf, false);
@@ -941,7 +945,8 @@ terminate_brin_buildstate(BrinBuildState *state)
 	{
 		Page		page;
 
-		page = BufferGetPage(state->bs_currentInsertBuf);
+		page = BufferGetPage(state->bs_currentInsertBuf, NULL, NULL,
+							 BGP_NO_SNAPSHOT_TEST);
 		RecordPageWithFreeSpace(state->bs_irel,
 							BufferGetBlockNumber(state->bs_currentInsertBuf),
 								PageGetFreeSpace(page));
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index d0ca485..a522b0b 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -110,7 +110,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 		newbuf = InvalidBuffer;
 		extended = false;
 	}
-	oldpage = BufferGetPage(oldbuf);
+	oldpage = BufferGetPage(oldbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	oldlp = PageGetItemId(oldpage, oldoff);
 
 	/*
@@ -228,7 +228,8 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 		 * Not enough free space on the oldpage. Put the new tuple on the new
 		 * page, and update the revmap.
 		 */
-		Page		newpage = BufferGetPage(newbuf);
+		Page		newpage = BufferGetPage(newbuf, NULL, NULL,
+											BGP_NO_SNAPSHOT_TEST);
 		Buffer		revmapbuf;
 		ItemPointerData newtid;
 		OffsetNumber newoff;
@@ -245,7 +246,9 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 		 * need to do that here.
 		 */
 		if (extended)
-			brin_page_init(BufferGetPage(newbuf), BRIN_PAGETYPE_REGULAR);
+			brin_page_init(BufferGetPage(newbuf, NULL, NULL,
+										 BGP_NO_SNAPSHOT_TEST),
+						   BRIN_PAGETYPE_REGULAR);
 
 		PageIndexDeleteNoCompact(oldpage, &oldoff, 1);
 		newoff = PageAddItem(newpage, (Item) newtup, newsz,
@@ -298,7 +301,9 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 
 			PageSetLSN(oldpage, recptr);
 			PageSetLSN(newpage, recptr);
-			PageSetLSN(BufferGetPage(revmapbuf), recptr);
+			PageSetLSN(BufferGetPage(revmapbuf, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST),
+					   recptr);
 		}
 
 		END_CRIT_SECTION();
@@ -326,7 +331,9 @@ brin_can_do_samepage_update(Buffer buffer, Size origsz, Size newsz)
 {
 	return
 		((newsz <= origsz) ||
-		 PageGetExactFreeSpace(BufferGetPage(buffer)) >= (newsz - origsz));
+		 PageGetExactFreeSpace(BufferGetPage(buffer, NULL, NULL,
+											 BGP_NO_SNAPSHOT_TEST))
+			>= (newsz - origsz));
 }
 
 /*
@@ -381,7 +388,9 @@ brin_doinsert(Relation idxrel, BlockNumber pagesPerRange,
 		 * it's still a regular page.
 		 */
 		LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
-		if (br_page_get_freespace(BufferGetPage(*buffer)) < itemsz)
+		if (br_page_get_freespace(BufferGetPage(*buffer, NULL, NULL,
+												BGP_NO_SNAPSHOT_TEST))
+			< itemsz)
 		{
 			UnlockReleaseBuffer(*buffer);
 			*buffer = InvalidBuffer;
@@ -404,13 +413,15 @@ brin_doinsert(Relation idxrel, BlockNumber pagesPerRange,
 	/* Now obtain lock on revmap buffer */
 	revmapbuf = brinLockRevmapPageForUpdate(revmap, heapBlk);
 
-	page = BufferGetPage(*buffer);
+	page = BufferGetPage(*buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	blk = BufferGetBlockNumber(*buffer);
 
 	/* Execute the actual insertion */
 	START_CRIT_SECTION();
 	if (extended)
-		brin_page_init(BufferGetPage(*buffer), BRIN_PAGETYPE_REGULAR);
+		brin_page_init(BufferGetPage(*buffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST),
+					   BRIN_PAGETYPE_REGULAR);
 	off = PageAddItem(page, (Item) tup, itemsz, InvalidOffsetNumber,
 					  false, false);
 	if (off == InvalidOffsetNumber)
@@ -447,7 +458,8 @@ brin_doinsert(Relation idxrel, BlockNumber pagesPerRange,
 		recptr = XLogInsert(RM_BRIN_ID, info);
 
 		PageSetLSN(page, recptr);
-		PageSetLSN(BufferGetPage(revmapbuf), recptr);
+		PageSetLSN(BufferGetPage(revmapbuf, NULL, NULL,
+								 BGP_NO_SNAPSHOT_TEST), recptr);
 	}
 
 	END_CRIT_SECTION();
@@ -515,7 +527,7 @@ brin_start_evacuating_page(Relation idxRel, Buffer buf)
 	OffsetNumber maxoff;
 	Page		page;
 
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	if (PageIsNew(page))
 		return false;
@@ -551,7 +563,7 @@ brin_evacuate_page(Relation idxRel, BlockNumber pagesPerRange,
 	OffsetNumber maxoff;
 	Page		page;
 
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	Assert(BrinPageFlags(page) & BRIN_EVACUATE_PAGE);
 
@@ -598,7 +610,7 @@ brin_evacuate_page(Relation idxRel, BlockNumber pagesPerRange,
 bool
 brin_page_cleanup(Relation idxrel, Buffer buf)
 {
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	Size		freespace;
 
 	/*
@@ -627,8 +639,10 @@ brin_page_cleanup(Relation idxrel, Buffer buf)
 	}
 
 	/* Nothing to be done for non-regular index pages */
-	if (BRIN_IS_META_PAGE(BufferGetPage(buf)) ||
-		BRIN_IS_REVMAP_PAGE(BufferGetPage(buf)))
+	if (BRIN_IS_META_PAGE(BufferGetPage(buf, NULL, NULL,
+										BGP_NO_SNAPSHOT_TEST)) ||
+		BRIN_IS_REVMAP_PAGE(BufferGetPage(buf, NULL, NULL,
+										  BGP_NO_SNAPSHOT_TEST)))
 		return false;
 
 	/* Measure free space and record it */
@@ -738,7 +752,8 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
 		if (BufferIsValid(oldbuf) && oldblk < newblk)
 		{
 			LockBuffer(oldbuf, BUFFER_LOCK_EXCLUSIVE);
-			if (!BRIN_IS_REGULAR_PAGE(BufferGetPage(oldbuf)))
+			if (!BRIN_IS_REGULAR_PAGE(BufferGetPage(oldbuf, NULL, NULL,
+													BGP_NO_SNAPSHOT_TEST)))
 			{
 				LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
 
@@ -770,7 +785,7 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
 		if (extensionLockHeld)
 			UnlockRelationForExtension(irel, ExclusiveLock);
 
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		/*
 		 * We have a new buffer to insert into.  Check that the new page has
@@ -805,7 +820,8 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
 			if (BufferIsValid(oldbuf) && oldblk > newblk)
 			{
 				LockBuffer(oldbuf, BUFFER_LOCK_EXCLUSIVE);
-				Assert(BRIN_IS_REGULAR_PAGE(BufferGetPage(oldbuf)));
+				Assert(BRIN_IS_REGULAR_PAGE(BufferGetPage(oldbuf, NULL, NULL,
+														  BGP_NO_SNAPSHOT_TEST)));
 			}
 
 			return buf;
@@ -862,7 +878,7 @@ brin_initialize_empty_new_buffer(Relation idxrel, Buffer buffer)
 			   BufferGetBlockNumber(buffer)));
 
 	START_CRIT_SECTION();
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	brin_page_init(page, BRIN_PAGETYPE_REGULAR);
 	MarkBufferDirty(buffer);
 	log_newpage_buffer(buffer, true);
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index b2c273d..a9c2584 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -76,7 +76,9 @@ brinRevmapInitialize(Relation idxrel, BlockNumber *pagesPerRange)
 
 	meta = ReadBuffer(idxrel, BRIN_METAPAGE_BLKNO);
 	LockBuffer(meta, BUFFER_LOCK_SHARE);
-	metadata = (BrinMetaPageData *) PageGetContents(BufferGetPage(meta));
+	metadata = (BrinMetaPageData *)
+		PageGetContents(BufferGetPage(meta, NULL, NULL,
+									  BGP_NO_SNAPSHOT_TEST));
 
 	revmap = palloc(sizeof(BrinRevmap));
 	revmap->rm_irel = idxrel;
@@ -159,7 +161,7 @@ brinSetHeapBlockItemptr(Buffer buf, BlockNumber pagesPerRange,
 	Page		page;
 
 	/* The correct page should already be pinned and locked */
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	contents = (RevmapContents *) PageGetContents(page);
 	iptr = (ItemPointerData *) contents->rm_tids;
 	iptr += HEAPBLK_TO_REVMAP_INDEX(pagesPerRange, heapBlk);
@@ -226,7 +228,8 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
 		LockBuffer(revmap->rm_currBuf, BUFFER_LOCK_SHARE);
 
 		contents = (RevmapContents *)
-			PageGetContents(BufferGetPage(revmap->rm_currBuf));
+			PageGetContents(BufferGetPage(revmap->rm_currBuf, NULL, NULL,
+										  BGP_NO_SNAPSHOT_TEST));
 		iptr = contents->rm_tids;
 		iptr += HEAPBLK_TO_REVMAP_INDEX(revmap->rm_pagesPerRange, heapBlk);
 
@@ -261,7 +264,7 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
 			*buf = ReadBuffer(idxRel, blk);
 		}
 		LockBuffer(*buf, mode);
-		page = BufferGetPage(*buf);
+		page = BufferGetPage(*buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		/* If we land on a revmap page, start over */
 		if (BRIN_IS_REGULAR_PAGE(page))
@@ -393,7 +396,8 @@ revmap_physical_extend(BrinRevmap *revmap)
 	 * another backend can extend the index with regular BRIN pages.
 	 */
 	LockBuffer(revmap->rm_metaBuf, BUFFER_LOCK_EXCLUSIVE);
-	metapage = BufferGetPage(revmap->rm_metaBuf);
+	metapage = BufferGetPage(revmap->rm_metaBuf, NULL, NULL,
+							 BGP_NO_SNAPSHOT_TEST);
 	metadata = (BrinMetaPageData *) PageGetContents(metapage);
 
 	/*
@@ -413,7 +417,7 @@ revmap_physical_extend(BrinRevmap *revmap)
 	{
 		buf = ReadBuffer(irel, mapBlk);
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	}
 	else
 	{
@@ -436,7 +440,7 @@ revmap_physical_extend(BrinRevmap *revmap)
 			return;
 		}
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		if (needLock)
 			UnlockRelationForExtension(irel, ExclusiveLock);
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index deb7af4..36e4a99 100644
--- a/src/backend/access/brin/brin_xlog.c
+++ b/src/backend/access/brin/brin_xlog.c
@@ -30,7 +30,7 @@ brin_xlog_createidx(XLogReaderState *record)
 	/* create the index' metapage */
 	buf = XLogInitBufferForRedo(record, 0);
 	Assert(BufferIsValid(buf));
-	page = (Page) BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	brin_metapage_init(page, xlrec->pagesPerRange, xlrec->version);
 	PageSetLSN(page, lsn);
 	MarkBufferDirty(buf);
@@ -58,7 +58,7 @@ brin_xlog_insert_update(XLogReaderState *record,
 	if (XLogRecGetInfo(record) & XLOG_BRIN_INIT_PAGE)
 	{
 		buffer = XLogInitBufferForRedo(record, 0);
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		brin_page_init(page, BRIN_PAGETYPE_REGULAR);
 		action = BLK_NEEDS_REDO;
 	}
@@ -81,7 +81,7 @@ brin_xlog_insert_update(XLogReaderState *record,
 
 		Assert(tuple->bt_blkno == xlrec->heapBlk);
 
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		offnum = xlrec->offnum;
 		if (PageGetMaxOffsetNumber(page) + 1 < offnum)
 			elog(PANIC, "brin_xlog_insert_update: invalid max offset number");
@@ -103,7 +103,7 @@ brin_xlog_insert_update(XLogReaderState *record,
 		ItemPointerData tid;
 
 		ItemPointerSet(&tid, regpgno, xlrec->offnum);
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		brinSetHeapBlockItemptr(buffer, xlrec->pagesPerRange, xlrec->heapBlk,
 								tid);
@@ -145,7 +145,7 @@ brin_xlog_update(XLogReaderState *record)
 		Page		page;
 		OffsetNumber offnum;
 
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		offnum = xlrec->oldOffnum;
 		if (PageGetMaxOffsetNumber(page) + 1 < offnum)
@@ -186,7 +186,7 @@ brin_xlog_samepage_update(XLogReaderState *record)
 
 		brintuple = (BrinTuple *) XLogRecGetBlockData(record, 0, &tuplen);
 
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		offnum = xlrec->offnum;
 		if (PageGetMaxOffsetNumber(page) + 1 < offnum)
@@ -232,7 +232,7 @@ brin_xlog_revmap_extend(XLogReaderState *record)
 		Page		metapg;
 		BrinMetaPageData *metadata;
 
-		metapg = BufferGetPage(metabuf);
+		metapg = BufferGetPage(metabuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		metadata = (BrinMetaPageData *) PageGetContents(metapg);
 
 		Assert(metadata->lastRevmapPage == xlrec->targetBlk - 1);
@@ -248,7 +248,7 @@ brin_xlog_revmap_extend(XLogReaderState *record)
 	 */
 
 	buf = XLogInitBufferForRedo(record, 1);
-	page = (Page) BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	brin_page_init(page, BRIN_PAGETYPE_REVMAP);
 
 	PageSetLSN(page, lsn);
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 06ba9cb..10dded4 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -36,7 +36,7 @@ ginTraverseLock(Buffer buffer, bool searchMode)
 	int			access = GIN_SHARE;
 
 	LockBuffer(buffer, GIN_SHARE);
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	if (GinPageIsLeaf(page))
 	{
 		if (searchMode == FALSE)
@@ -89,7 +89,7 @@ ginFindLeafPage(GinBtree btree, bool searchMode)
 
 		stack->off = InvalidOffsetNumber;
 
-		page = BufferGetPage(stack->buffer);
+		page = BufferGetPage(stack->buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		access = ginTraverseLock(stack->buffer, searchMode);
 
@@ -115,7 +115,7 @@ ginFindLeafPage(GinBtree btree, bool searchMode)
 
 			stack->buffer = ginStepRight(stack->buffer, btree->index, access);
 			stack->blkno = rightlink;
-			page = BufferGetPage(stack->buffer);
+			page = BufferGetPage(stack->buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 			if (!searchMode && GinPageIsIncompleteSplit(page))
 				ginFinishSplit(btree, stack, false, NULL);
@@ -161,7 +161,7 @@ Buffer
 ginStepRight(Buffer buffer, Relation index, int lockmode)
 {
 	Buffer		nextbuffer;
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	bool		isLeaf = GinPageIsLeaf(page);
 	bool		isData = GinPageIsData(page);
 	BlockNumber blkno = GinPageGetOpaque(page)->rightlink;
@@ -171,7 +171,7 @@ ginStepRight(Buffer buffer, Relation index, int lockmode)
 	UnlockReleaseBuffer(buffer);
 
 	/* Sanity check that the page we stepped to is of similar kind. */
-	page = BufferGetPage(nextbuffer);
+	page = BufferGetPage(nextbuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	if (isLeaf != GinPageIsLeaf(page) || isData != GinPageIsData(page))
 		elog(ERROR, "right sibling of GIN page is of different type");
 
@@ -243,7 +243,7 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack)
 	for (;;)
 	{
 		LockBuffer(buffer, GIN_EXCLUSIVE);
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		if (GinPageIsLeaf(page))
 			elog(ERROR, "Lost path");
 
@@ -274,7 +274,7 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack)
 				break;
 			}
 			buffer = ginStepRight(buffer, btree->index, GIN_EXCLUSIVE);
-			page = BufferGetPage(buffer);
+			page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 			/* finish any incomplete splits, as above */
 			if (GinPageIsIncompleteSplit(page))
@@ -325,7 +325,8 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 			   void *insertdata, BlockNumber updateblkno,
 			   Buffer childbuf, GinStatsData *buildStats)
 {
-	Page		page = BufferGetPage(stack->buffer);
+	Page		page = BufferGetPage(stack->buffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST);
 	GinPlaceToPageRC rc;
 	uint16		xlflags = 0;
 	Page		childpage = NULL;
@@ -344,7 +345,7 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 	{
 		Assert(BufferIsValid(childbuf));
 		Assert(updateblkno != InvalidBlockNumber);
-		childpage = BufferGetPage(childbuf);
+		childpage = BufferGetPage(childbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	}
 
 	/*
@@ -456,7 +457,8 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 		data.flags = xlflags;
 		if (childbuf != InvalidBuffer)
 		{
-			Page		childpage = BufferGetPage(childbuf);
+			Page		childpage = BufferGetPage(childbuf, NULL, NULL,
+												  BGP_NO_SNAPSHOT_TEST);
 
 			GinPageGetOpaque(childpage)->flags &= ~GIN_INCOMPLETE_SPLIT;
 
@@ -538,14 +540,21 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 		if (stack->parent == NULL)
 		{
 			MarkBufferDirty(lbuffer);
-			memcpy(BufferGetPage(stack->buffer), newrootpg, BLCKSZ);
-			memcpy(BufferGetPage(lbuffer), newlpage, BLCKSZ);
-			memcpy(BufferGetPage(rbuffer), newrpage, BLCKSZ);
+			memcpy(BufferGetPage(stack->buffer, NULL, NULL,
+								 BGP_NO_SNAPSHOT_TEST),
+				   newrootpg, BLCKSZ);
+			memcpy(BufferGetPage(lbuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
+				   newlpage, BLCKSZ);
+			memcpy(BufferGetPage(rbuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
+				   newrpage, BLCKSZ);
 		}
 		else
 		{
-			memcpy(BufferGetPage(stack->buffer), newlpage, BLCKSZ);
-			memcpy(BufferGetPage(rbuffer), newrpage, BLCKSZ);
+			memcpy(BufferGetPage(stack->buffer, NULL, NULL,
+								 BGP_NO_SNAPSHOT_TEST),
+				   newlpage, BLCKSZ);
+			memcpy(BufferGetPage(rbuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
+				   newrpage, BLCKSZ);
 		}
 
 		/* write WAL record */
@@ -577,10 +586,16 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 			XLogRegisterData((char *) &data, sizeof(ginxlogSplit));
 
 			recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_SPLIT);
-			PageSetLSN(BufferGetPage(stack->buffer), recptr);
-			PageSetLSN(BufferGetPage(rbuffer), recptr);
+			PageSetLSN(BufferGetPage(stack->buffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST),
+					   recptr);
+			PageSetLSN(BufferGetPage(rbuffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST),
+					   recptr);
 			if (stack->parent == NULL)
-				PageSetLSN(BufferGetPage(lbuffer), recptr);
+				PageSetLSN(BufferGetPage(lbuffer, NULL, NULL,
+										 BGP_NO_SNAPSHOT_TEST),
+						   recptr);
 			if (BufferIsValid(childbuf))
 				PageSetLSN(childpage, recptr);
 		}
@@ -662,11 +677,12 @@ ginFinishSplit(GinBtree btree, GinBtreeStack *stack, bool freestack,
 		 * page that has no downlink in the parent, and splitting it further
 		 * would fail.
 		 */
-		if (GinPageIsIncompleteSplit(BufferGetPage(parent->buffer)))
+		if (GinPageIsIncompleteSplit(BufferGetPage(parent->buffer, NULL, NULL,
+												   BGP_NO_SNAPSHOT_TEST)))
 			ginFinishSplit(btree, parent, false, buildStats);
 
 		/* move right if it's needed */
-		page = BufferGetPage(parent->buffer);
+		page = BufferGetPage(parent->buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		while ((parent->off = btree->findChildPtr(btree, page, stack->blkno, parent->off)) == InvalidOffsetNumber)
 		{
 			if (GinPageRightMost(page))
@@ -684,15 +700,17 @@ ginFinishSplit(GinBtree btree, GinBtreeStack *stack, bool freestack,
 
 			parent->buffer = ginStepRight(parent->buffer, btree->index, GIN_EXCLUSIVE);
 			parent->blkno = BufferGetBlockNumber(parent->buffer);
-			page = BufferGetPage(parent->buffer);
+			page = BufferGetPage(parent->buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
-			if (GinPageIsIncompleteSplit(BufferGetPage(parent->buffer)))
+			if (GinPageIsIncompleteSplit(BufferGetPage(parent->buffer, NULL, NULL,
+													   BGP_NO_SNAPSHOT_TEST)))
 				ginFinishSplit(btree, parent, false, buildStats);
 		}
 
 		/* insert the downlink */
 		insertdata = btree->prepareDownlink(btree, stack->buffer);
-		updateblkno = GinPageGetOpaque(BufferGetPage(stack->buffer))->rightlink;
+		updateblkno = GinPageGetOpaque(BufferGetPage(stack->buffer, NULL, NULL,
+													 BGP_NO_SNAPSHOT_TEST))->rightlink;
 		done = ginPlaceToPage(btree, parent,
 							  insertdata, updateblkno,
 							  stack->buffer, buildStats);
@@ -742,7 +760,8 @@ ginInsertValue(GinBtree btree, GinBtreeStack *stack, void *insertdata,
 	bool		done;
 
 	/* If the leaf page was incompletely split, finish the split first */
-	if (GinPageIsIncompleteSplit(BufferGetPage(stack->buffer)))
+	if (GinPageIsIncompleteSplit(BufferGetPage(stack->buffer, NULL, NULL,
+											   BGP_NO_SNAPSHOT_TEST)))
 		ginFinishSplit(btree, stack, false, buildStats);
 
 	done = ginPlaceToPage(btree, stack,
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index a55bb4a..9c501a1 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -246,7 +246,7 @@ dataLocateItem(GinBtree btree, GinBtreeStack *stack)
 				maxoff;
 	PostingItem *pitem = NULL;
 	int			result;
-	Page		page = BufferGetPage(stack->buffer);
+	Page		page = BufferGetPage(stack->buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	Assert(!GinPageIsLeaf(page));
 	Assert(GinPageIsData(page));
@@ -432,7 +432,7 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
 	GinBtreeDataLeafInsertData *items = insertdata;
 	ItemPointer newItems = &items->items[items->curitem];
 	int			maxitems = items->nitem - items->curitem;
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	int			i;
 	ItemPointerData rbound;
 	ItemPointerData lbound;
@@ -714,7 +714,7 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
 void
 ginVacuumPostingTreeLeaf(Relation indexrel, Buffer buffer, GinVacuumState *gvs)
 {
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	disassembledLeaf *leaf;
 	bool		removedsomething = false;
 	dlist_iter	iter;
@@ -953,7 +953,7 @@ registerLeafRecompressWALData(Buffer buf, disassembledLeaf *leaf)
 static void
 dataPlaceToPageLeafRecompress(Buffer buf, disassembledLeaf *leaf)
 {
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	char	   *ptr;
 	int			newsize;
 	bool		modified = false;
@@ -1091,7 +1091,7 @@ dataPlaceToPageInternal(GinBtree btree, Buffer buf, GinBtreeStack *stack,
 						void *insertdata, BlockNumber updateblkno,
 						Page *newlpage, Page *newrpage)
 {
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	OffsetNumber off = stack->off;
 	PostingItem *pitem;
 
@@ -1141,7 +1141,7 @@ dataPlaceToPage(GinBtree btree, Buffer buf, GinBtreeStack *stack,
 				void *insertdata, BlockNumber updateblkno,
 				Page *newlpage, Page *newrpage)
 {
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	Assert(GinPageIsData(page));
 
@@ -1164,7 +1164,7 @@ dataSplitPageInternal(GinBtree btree, Buffer origbuf,
 					  void *insertdata, BlockNumber updateblkno,
 					  Page *newlpage, Page *newrpage)
 {
-	Page		oldpage = BufferGetPage(origbuf);
+	Page		oldpage = BufferGetPage(origbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	OffsetNumber off = stack->off;
 	int			nitems = GinPageGetOpaque(oldpage)->maxoff;
 	int			nleftitems;
@@ -1242,7 +1242,7 @@ static void *
 dataPrepareDownlink(GinBtree btree, Buffer lbuf)
 {
 	PostingItem *pitem = palloc(sizeof(PostingItem));
-	Page		lpage = BufferGetPage(lbuf);
+	Page		lpage = BufferGetPage(lbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	PostingItemSetBlockNumber(pitem, BufferGetBlockNumber(lbuf));
 	pitem->key = *GinDataPageGetRightBound(lpage);
@@ -1726,7 +1726,7 @@ createPostingTree(Relation index, ItemPointerData *items, uint32 nitems,
 	 * All set. Get a new physical page, and copy the in-memory page to it.
 	 */
 	buffer = GinNewBuffer(index);
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	blkno = BufferGetBlockNumber(buffer);
 
 	START_CRIT_SECTION();
diff --git a/src/backend/access/gin/ginentrypage.c b/src/backend/access/gin/ginentrypage.c
index 2512745..8a5d9e1 100644
--- a/src/backend/access/gin/ginentrypage.c
+++ b/src/backend/access/gin/ginentrypage.c
@@ -274,7 +274,8 @@ entryLocateEntry(GinBtree btree, GinBtreeStack *stack)
 				maxoff;
 	IndexTuple	itup = NULL;
 	int			result;
-	Page		page = BufferGetPage(stack->buffer);
+	Page		page = BufferGetPage(stack->buffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST);
 
 	Assert(!GinPageIsLeaf(page));
 	Assert(!GinPageIsData(page));
@@ -345,7 +346,8 @@ entryLocateEntry(GinBtree btree, GinBtreeStack *stack)
 static bool
 entryLocateLeafEntry(GinBtree btree, GinBtreeStack *stack)
 {
-	Page		page = BufferGetPage(stack->buffer);
+	Page		page = BufferGetPage(stack->buffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST);
 	OffsetNumber low,
 				high;
 
@@ -461,7 +463,7 @@ entryIsEnoughSpace(GinBtree btree, Buffer buf, OffsetNumber off,
 {
 	Size		releasedsz = 0;
 	Size		addedsz;
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	Assert(insertData->entry);
 	Assert(!GinPageIsData(page));
@@ -525,7 +527,7 @@ entryPlaceToPage(GinBtree btree, Buffer buf, GinBtreeStack *stack,
 				 Page *newlpage, Page *newrpage)
 {
 	GinBtreeEntryInsertData *insertData = insertPayload;
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	OffsetNumber off = stack->off;
 	OffsetNumber placed;
 
@@ -592,8 +594,10 @@ entrySplitPage(GinBtree btree, Buffer origbuf,
 	char	   *ptr;
 	IndexTuple	itup;
 	Page		page;
-	Page		lpage = PageGetTempPageCopy(BufferGetPage(origbuf));
-	Page		rpage = PageGetTempPageCopy(BufferGetPage(origbuf));
+	Page		lpage = PageGetTempPageCopy(BufferGetPage(origbuf, NULL, NULL,
+														  BGP_NO_SNAPSHOT_TEST));
+	Page		rpage = PageGetTempPageCopy(BufferGetPage(origbuf, NULL, NULL,
+														  BGP_NO_SNAPSHOT_TEST));
 	Size		pageSize = PageGetPageSize(lpage);
 	char		tupstore[2 * BLCKSZ];
 
@@ -674,7 +678,7 @@ static void *
 entryPrepareDownlink(GinBtree btree, Buffer lbuf)
 {
 	GinBtreeEntryInsertData *insertData;
-	Page		lpage = BufferGetPage(lbuf);
+	Page		lpage = BufferGetPage(lbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	BlockNumber lblkno = BufferGetBlockNumber(lbuf);
 	IndexTuple	itup;
 
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 2ddf568..08ec16f 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -53,7 +53,7 @@ static int32
 writeListPage(Relation index, Buffer buffer,
 			  IndexTuple *tuples, int32 ntuples, BlockNumber rightlink)
 {
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	int32		i,
 				freesize,
 				size = 0;
@@ -239,7 +239,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
 	data.newRightlink = data.prevTail = InvalidBlockNumber;
 
 	metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
-	metapage = BufferGetPage(metabuffer);
+	metapage = BufferGetPage(metabuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	if (collector->sumsize + collector->ntuples * sizeof(ItemIdData) > GinListPageSize)
 	{
@@ -310,7 +310,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
 
 			buffer = ReadBuffer(index, metadata->tail);
 			LockBuffer(buffer, GIN_EXCLUSIVE);
-			page = BufferGetPage(buffer);
+			page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 			Assert(GinPageGetOpaque(page)->rightlink == InvalidBlockNumber);
 
@@ -344,7 +344,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
 
 		buffer = ReadBuffer(index, metadata->tail);
 		LockBuffer(buffer, GIN_EXCLUSIVE);
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		off = (PageIsEmpty(page)) ? FirstOffsetNumber :
 			OffsetNumberNext(PageGetMaxOffsetNumber(page));
@@ -514,7 +514,7 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
 	GinMetaPageData *metadata;
 	BlockNumber blknoToDelete;
 
-	metapage = BufferGetPage(metabuffer);
+	metapage = BufferGetPage(metabuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	metadata = GinPageGetMeta(metapage);
 	blknoToDelete = metadata->head;
 
@@ -533,7 +533,7 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
 			freespace[data.ndeleted] = blknoToDelete;
 			buffers[data.ndeleted] = ReadBuffer(index, blknoToDelete);
 			LockBuffer(buffers[data.ndeleted], GIN_EXCLUSIVE);
-			page = BufferGetPage(buffers[data.ndeleted]);
+			page = BufferGetPage(buffers[data.ndeleted], NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 			data.ndeleted++;
 
@@ -582,7 +582,7 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
 
 		for (i = 0; i < data.ndeleted; i++)
 		{
-			page = BufferGetPage(buffers[i]);
+			page = BufferGetPage(buffers[i], NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			GinPageGetOpaque(page)->flags = GIN_DELETED;
 			MarkBufferDirty(buffers[i]);
 		}
@@ -606,7 +606,7 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
 
 			for (i = 0; i < data.ndeleted; i++)
 			{
-				page = BufferGetPage(buffers[i]);
+				page = BufferGetPage(buffers[i], NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 				PageSetLSN(page, recptr);
 			}
 		}
@@ -760,7 +760,7 @@ ginInsertCleanup(GinState *ginstate,
 
 	metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
 	LockBuffer(metabuffer, GIN_SHARE);
-	metapage = BufferGetPage(metabuffer);
+	metapage = BufferGetPage(metabuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	metadata = GinPageGetMeta(metapage);
 
 	if (metadata->head == InvalidBlockNumber)
@@ -776,7 +776,7 @@ ginInsertCleanup(GinState *ginstate,
 	blkno = metadata->head;
 	buffer = ReadBuffer(index, blkno);
 	LockBuffer(buffer, GIN_SHARE);
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	LockBuffer(metabuffer, GIN_UNLOCK);
 
@@ -943,7 +943,7 @@ ginInsertCleanup(GinState *ginstate,
 		vacuum_delay_point();
 		buffer = ReadBuffer(index, blkno);
 		LockBuffer(buffer, GIN_SHARE);
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	}
 
 	ReleaseBuffer(metabuffer);
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 53290a4..8b9629c 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -39,7 +39,8 @@ typedef struct pendingPosition
 static bool
 moveRightIfItNeeded(GinBtreeData *btree, GinBtreeStack *stack)
 {
-	Page		page = BufferGetPage(stack->buffer);
+	Page		page = BufferGetPage(stack->buffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST);
 
 	if (stack->off > PageGetMaxOffsetNumber(page))
 	{
@@ -82,7 +83,7 @@ scanPostingTree(Relation index, GinScanEntry scanEntry,
 	 */
 	for (;;)
 	{
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		if ((GinPageGetOpaque(page)->flags & GIN_DELETED) == 0)
 		{
 			int			n = GinDataLeafPageGetItemsToTbm(page, scanEntry->matchBitmap);
@@ -144,7 +145,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
 		if (moveRightIfItNeeded(btree, stack) == false)
 			return true;
 
-		page = BufferGetPage(stack->buffer);
+		page = BufferGetPage(stack->buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, stack->off));
 
 		/*
@@ -231,7 +232,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
 			 * might have occurred, so we need to re-find our position.
 			 */
 			LockBuffer(stack->buffer, GIN_SHARE);
-			page = BufferGetPage(stack->buffer);
+			page = BufferGetPage(stack->buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			if (!GinPageIsLeaf(page))
 			{
 				/*
@@ -251,7 +252,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
 				if (moveRightIfItNeeded(btree, stack) == false)
 					elog(ERROR, "lost saved point in index");	/* must not happen !!! */
 
-				page = BufferGetPage(stack->buffer);
+				page = BufferGetPage(stack->buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 				itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, stack->off));
 
 				if (gintuple_get_attrnum(btree->ginstate, itup) != attnum)
@@ -319,7 +320,7 @@ restartScanEntry:
 						entry->queryKey, entry->queryCategory,
 						ginstate);
 	stackEntry = ginFindLeafPage(&btreeEntry, true);
-	page = BufferGetPage(stackEntry->buffer);
+	page = BufferGetPage(stackEntry->buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	needUnlock = TRUE;
 
 	entry->isFinished = TRUE;
@@ -393,7 +394,7 @@ restartScanEntry:
 			 */
 			IncrBufferRefCount(entry->buffer);
 
-			page = BufferGetPage(entry->buffer);
+			page = BufferGetPage(entry->buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 			/*
 			 * Load the first page into memory.
@@ -638,7 +639,7 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advan
 		 GinItemPointerGetOffsetNumber(&advancePast),
 		 !stepright);
 
-	page = BufferGetPage(entry->buffer);
+	page = BufferGetPage(entry->buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	for (;;)
 	{
 		entry->offset = InvalidOffsetNumber;
@@ -670,7 +671,7 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advan
 			entry->buffer = ginStepRight(entry->buffer,
 										 ginstate->index,
 										 GIN_SHARE);
-			page = BufferGetPage(entry->buffer);
+			page = BufferGetPage(entry->buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		}
 		stepright = true;
 
@@ -1331,7 +1332,7 @@ scanGetCandidate(IndexScanDesc scan, pendingPosition *pos)
 	ItemPointerSetInvalid(&pos->item);
 	for (;;)
 	{
-		page = BufferGetPage(pos->pendingBuffer);
+		page = BufferGetPage(pos->pendingBuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		maxoff = PageGetMaxOffsetNumber(page);
 		if (pos->firstOffset > maxoff)
@@ -1511,7 +1512,7 @@ collectMatchesForHeapRow(IndexScanDesc scan, pendingPosition *pos)
 		memset(datumExtracted + pos->firstOffset - 1, 0,
 			   sizeof(bool) * (pos->lastOffset - pos->firstOffset));
 
-		page = BufferGetPage(pos->pendingBuffer);
+		page = BufferGetPage(pos->pendingBuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		for (i = 0; i < so->nkeys; i++)
 		{
@@ -1703,7 +1704,8 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
 	*ntids = 0;
 
 	LockBuffer(metabuffer, GIN_SHARE);
-	blkno = GinPageGetMeta(BufferGetPage(metabuffer))->head;
+	blkno = GinPageGetMeta(BufferGetPage(metabuffer, NULL, NULL,
+										 BGP_NO_SNAPSHOT_TEST))->head;
 
 	/*
 	 * fetch head of list before unlocking metapage. head page must be pinned
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index cd21e0e..1265011 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -193,7 +193,7 @@ ginEntryInsert(GinState *ginstate,
 	ginPrepareEntryScan(&btree, attnum, key, category, ginstate);
 
 	stack = ginFindLeafPage(&btree, false);
-	page = BufferGetPage(stack->buffer);
+	page = BufferGetPage(stack->buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	if (btree.findItem(&btree, stack))
 	{
@@ -352,10 +352,10 @@ ginbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 
 		recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_CREATE_INDEX);
 
-		page = BufferGetPage(RootBuffer);
+		page = BufferGetPage(RootBuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		PageSetLSN(page, recptr);
 
-		page = BufferGetPage(MetaBuffer);
+		page = BufferGetPage(MetaBuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		PageSetLSN(page, recptr);
 	}
 
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 9450267..de3532b 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -273,7 +273,8 @@ GinNewBuffer(Relation index)
 		 */
 		if (ConditionalLockBuffer(buffer))
 		{
-			Page		page = BufferGetPage(buffer);
+			Page		page = BufferGetPage(buffer, NULL, NULL,
+											 BGP_NO_SNAPSHOT_TEST);
 
 			if (PageIsNew(page))
 				return buffer;	/* OK to use, if never initialized */
@@ -318,14 +319,15 @@ GinInitPage(Page page, uint32 f, Size pageSize)
 void
 GinInitBuffer(Buffer b, uint32 f)
 {
-	GinInitPage(BufferGetPage(b), f, BufferGetPageSize(b));
+	GinInitPage(BufferGetPage(b, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
+				f, BufferGetPageSize(b));
 }
 
 void
 GinInitMetabuffer(Buffer b)
 {
 	GinMetaPageData *metadata;
-	Page		page = BufferGetPage(b);
+	Page		page = BufferGetPage(b, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	GinInitPage(page, GIN_META, BufferGetPageSize(b));
 
@@ -605,7 +607,7 @@ ginGetStats(Relation index, GinStatsData *stats)
 
 	metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
 	LockBuffer(metabuffer, GIN_SHARE);
-	metapage = BufferGetPage(metabuffer);
+	metapage = BufferGetPage(metabuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	metadata = GinPageGetMeta(metapage);
 
 	stats->nPendingPages = metadata->nPendingPages;
@@ -632,7 +634,7 @@ ginUpdateStats(Relation index, const GinStatsData *stats)
 
 	metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
 	LockBuffer(metabuffer, GIN_EXCLUSIVE);
-	metapage = BufferGetPage(metabuffer);
+	metapage = BufferGetPage(metabuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	metadata = GinPageGetMeta(metapage);
 
 	START_CRIT_SECTION();
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index 6a4b98a..f26dc79 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -87,7 +87,8 @@ ginVacuumItemPointers(GinVacuumState *gvs, ItemPointerData *items,
 static void
 xlogVacuumPage(Relation index, Buffer buffer)
 {
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST);
 	XLogRecPtr	recptr;
 
 	/* This is only used for entry tree leaf pages. */
@@ -118,7 +119,7 @@ ginVacuumPostingTreeLeaves(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
 
 	buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, blkno,
 								RBM_NORMAL, gvs->strategy);
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	/*
 	 * We should be sure that we don't concurrent with inserts, insert process
@@ -212,14 +213,14 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
 	START_CRIT_SECTION();
 
 	/* Unlink the page by changing left sibling's rightlink */
-	page = BufferGetPage(dBuffer);
+	page = BufferGetPage(dBuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	rightlink = GinPageGetOpaque(page)->rightlink;
 
-	page = BufferGetPage(lBuffer);
+	page = BufferGetPage(lBuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	GinPageGetOpaque(page)->rightlink = rightlink;
 
 	/* Delete downlink from parent */
-	parentPage = BufferGetPage(pBuffer);
+	parentPage = BufferGetPage(pBuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 #ifdef USE_ASSERT_CHECKING
 	do
 	{
@@ -230,7 +231,7 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
 #endif
 	GinPageDeletePostingItem(parentPage, myoff);
 
-	page = BufferGetPage(dBuffer);
+	page = BufferGetPage(dBuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	/*
 	 * we shouldn't change rightlink field to save workability of running
@@ -268,7 +269,8 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
 		recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_DELETE_PAGE);
 		PageSetLSN(page, recptr);
 		PageSetLSN(parentPage, recptr);
-		PageSetLSN(BufferGetPage(lBuffer), recptr);
+		PageSetLSN(BufferGetPage(lBuffer, NULL, NULL,
+								 BGP_NO_SNAPSHOT_TEST), recptr);
 	}
 
 	if (!isParentRoot)
@@ -324,7 +326,7 @@ ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
 
 	buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, blkno,
 								RBM_NORMAL, gvs->strategy);
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	Assert(GinPageIsData(page));
 
@@ -407,7 +409,8 @@ ginVacuumPostingTree(GinVacuumState *gvs, BlockNumber rootBlkno)
 static Page
 ginVacuumEntryPage(GinVacuumState *gvs, Buffer buffer, BlockNumber *roots, uint32 *nroot)
 {
-	Page		origpage = BufferGetPage(buffer),
+	Page		origpage = BufferGetPage(buffer, NULL, NULL,
+										 BGP_NO_SNAPSHOT_TEST),
 				tmppage;
 	OffsetNumber i,
 				maxoff = PageGetMaxOffsetNumber(origpage);
@@ -554,7 +557,8 @@ ginbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 	/* find leaf page */
 	for (;;)
 	{
-		Page		page = BufferGetPage(buffer);
+		Page		page = BufferGetPage(buffer, NULL, NULL,
+										 BGP_NO_SNAPSHOT_TEST);
 		IndexTuple	itup;
 
 		LockBuffer(buffer, GIN_SHARE);
@@ -589,7 +593,8 @@ ginbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 
 	for (;;)
 	{
-		Page		page = BufferGetPage(buffer);
+		Page		page = BufferGetPage(buffer, NULL, NULL,
+										 BGP_NO_SNAPSHOT_TEST);
 		Page		resPage;
 		uint32		i;
 
@@ -703,7 +708,7 @@ ginvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 		buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
 									RBM_NORMAL, info->strategy);
 		LockBuffer(buffer, GIN_SHARE);
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		if (PageIsNew(page) || GinPageIsDeleted(page))
 		{
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index b4d310f..8bfa7ec 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -28,7 +28,7 @@ ginRedoClearIncompleteSplit(XLogReaderState *record, uint8 block_id)
 
 	if (XLogReadBufferForRedo(record, block_id, &buffer) == BLK_NEEDS_REDO)
 	{
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		GinPageGetOpaque(page)->flags &= ~GIN_INCOMPLETE_SPLIT;
 
 		PageSetLSN(page, lsn);
@@ -48,7 +48,7 @@ ginRedoCreateIndex(XLogReaderState *record)
 
 	MetaBuffer = XLogInitBufferForRedo(record, 0);
 	Assert(BufferGetBlockNumber(MetaBuffer) == GIN_METAPAGE_BLKNO);
-	page = (Page) BufferGetPage(MetaBuffer);
+	page = BufferGetPage(MetaBuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	GinInitMetabuffer(MetaBuffer);
 
@@ -57,7 +57,7 @@ ginRedoCreateIndex(XLogReaderState *record)
 
 	RootBuffer = XLogInitBufferForRedo(record, 1);
 	Assert(BufferGetBlockNumber(RootBuffer) == GIN_ROOT_BLKNO);
-	page = (Page) BufferGetPage(RootBuffer);
+	page = BufferGetPage(RootBuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	GinInitBuffer(RootBuffer, GIN_LEAF);
 
@@ -78,7 +78,7 @@ ginRedoCreatePTree(XLogReaderState *record)
 	Page		page;
 
 	buffer = XLogInitBufferForRedo(record, 0);
-	page = (Page) BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	GinInitBuffer(buffer, GIN_DATA | GIN_LEAF | GIN_COMPRESSED);
 
@@ -98,7 +98,7 @@ ginRedoCreatePTree(XLogReaderState *record)
 static void
 ginRedoInsertEntry(Buffer buffer, bool isLeaf, BlockNumber rightblkno, void *rdata)
 {
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	ginxlogInsertEntry *data = (ginxlogInsertEntry *) rdata;
 	OffsetNumber offset = data->offset;
 	IndexTuple	itup;
@@ -293,7 +293,7 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 static void
 ginRedoInsertData(Buffer buffer, bool isLeaf, BlockNumber rightblkno, void *rdata)
 {
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	if (isLeaf)
 	{
@@ -350,7 +350,7 @@ ginRedoInsert(XLogReaderState *record)
 
 	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
 	{
-		Page		page = BufferGetPage(buffer);
+		Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		Size		len;
 		char	   *payload = XLogRecGetBlockData(record, 0, &len);
 
@@ -431,7 +431,7 @@ ginRedoVacuumDataLeafPage(XLogReaderState *record)
 
 	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
 	{
-		Page		page = BufferGetPage(buffer);
+		Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		Size		len;
 		ginxlogVacuumDataLeafPage *xlrec;
 
@@ -460,7 +460,7 @@ ginRedoDeletePage(XLogReaderState *record)
 
 	if (XLogReadBufferForRedo(record, 0, &dbuffer) == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(dbuffer);
+		page = BufferGetPage(dbuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		Assert(GinPageIsData(page));
 		GinPageGetOpaque(page)->flags = GIN_DELETED;
 		PageSetLSN(page, lsn);
@@ -469,7 +469,7 @@ ginRedoDeletePage(XLogReaderState *record)
 
 	if (XLogReadBufferForRedo(record, 1, &pbuffer) == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(pbuffer);
+		page = BufferGetPage(pbuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		Assert(GinPageIsData(page));
 		Assert(!GinPageIsLeaf(page));
 		GinPageDeletePostingItem(page, data->parentOffset);
@@ -479,7 +479,7 @@ ginRedoDeletePage(XLogReaderState *record)
 
 	if (XLogReadBufferForRedo(record, 2, &lbuffer) == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(lbuffer);
+		page = BufferGetPage(lbuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		Assert(GinPageIsData(page));
 		GinPageGetOpaque(page)->rightlink = data->rightLink;
 		PageSetLSN(page, lsn);
@@ -510,7 +510,7 @@ ginRedoUpdateMetapage(XLogReaderState *record)
 	 */
 	metabuffer = XLogInitBufferForRedo(record, 0);
 	Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
-	metapage = BufferGetPage(metabuffer);
+	metapage = BufferGetPage(metabuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
 	memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
@@ -524,7 +524,7 @@ ginRedoUpdateMetapage(XLogReaderState *record)
 		 */
 		if (XLogReadBufferForRedo(record, 1, &buffer) == BLK_NEEDS_REDO)
 		{
-			Page		page = BufferGetPage(buffer);
+			Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			OffsetNumber off;
 			int			i;
 			Size		tupsize;
@@ -572,7 +572,7 @@ ginRedoUpdateMetapage(XLogReaderState *record)
 		 */
 		if (XLogReadBufferForRedo(record, 1, &buffer) == BLK_NEEDS_REDO)
 		{
-			Page		page = BufferGetPage(buffer);
+			Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 			GinPageGetOpaque(page)->rightlink = data->newRightlink;
 
@@ -603,7 +603,7 @@ ginRedoInsertListPage(XLogReaderState *record)
 
 	/* We always re-initialize the page. */
 	buffer = XLogInitBufferForRedo(record, 0);
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	GinInitBuffer(buffer, GIN_LIST);
 	GinPageGetOpaque(page)->rightlink = data->rightlink;
@@ -652,7 +652,7 @@ ginRedoDeleteListPages(XLogReaderState *record)
 
 	metabuffer = XLogInitBufferForRedo(record, 0);
 	Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
-	metapage = BufferGetPage(metabuffer);
+	metapage = BufferGetPage(metabuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
 
@@ -681,7 +681,7 @@ ginRedoDeleteListPages(XLogReaderState *record)
 		Page		page;
 
 		buffer = XLogInitBufferForRedo(record, i + 1);
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		GinInitBuffer(buffer, GIN_DELETED);
 
 		PageSetLSN(page, lsn);
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 996363c..999e71c 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -211,7 +211,8 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
 				bool markfollowright)
 {
 	BlockNumber blkno = BufferGetBlockNumber(buffer);
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST);
 	bool		is_leaf = (GistPageIsLeaf(page)) ? true : false;
 	XLogRecPtr	recptr;
 	int			i;
@@ -316,7 +317,9 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
 
 			dist->buffer = buffer;
 			dist->block.blkno = BufferGetBlockNumber(buffer);
-			dist->page = PageGetTempPageCopySpecial(BufferGetPage(buffer));
+			dist->page =
+				PageGetTempPageCopySpecial(BufferGetPage(buffer, NULL, NULL,
+														 BGP_NO_SNAPSHOT_TEST));
 
 			/* clean all flags except F_LEAF */
 			GistPageGetOpaque(dist->page)->flags = (is_leaf) ? F_LEAF : 0;
@@ -328,7 +331,7 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
 			/* Allocate new page */
 			ptr->buffer = gistNewBuffer(rel);
 			GISTInitBuffer(ptr->buffer, (is_leaf) ? F_LEAF : 0);
-			ptr->page = BufferGetPage(ptr->buffer);
+			ptr->page = BufferGetPage(ptr->buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			ptr->block.blkno = BufferGetBlockNumber(ptr->buffer);
 		}
 
@@ -354,7 +357,10 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
 			int			i;
 
 			rootpg.buffer = buffer;
-			rootpg.page = PageGetTempPageCopySpecial(BufferGetPage(rootpg.buffer));
+			rootpg.page =
+				PageGetTempPageCopySpecial(BufferGetPage(rootpg.buffer,
+														 NULL, NULL,
+														 BGP_NO_SNAPSHOT_TEST));
 			GistPageGetOpaque(rootpg.page)->flags = 0;
 
 			/* Prepare a vector of all the downlinks */
@@ -462,8 +468,11 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
 		 * The first page in the chain was a temporary working copy meant to
 		 * replace the old page. Copy it over the old page.
 		 */
-		PageRestoreTempPage(dist->page, BufferGetPage(dist->buffer));
-		dist->page = BufferGetPage(dist->buffer);
+		PageRestoreTempPage(dist->page, BufferGetPage(dist->buffer,
+													  NULL, NULL,
+													  BGP_NO_SNAPSHOT_TEST));
+		dist->page = BufferGetPage(dist->buffer, NULL, NULL,
+								   BGP_NO_SNAPSHOT_TEST);
 
 		/* Write the WAL record */
 		if (RelationNeedsWAL(rel))
@@ -554,7 +563,8 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
 	 */
 	if (BufferIsValid(leftchildbuf))
 	{
-		Page		leftpg = BufferGetPage(leftchildbuf);
+		Page		leftpg = BufferGetPage(leftchildbuf, NULL, NULL,
+										   BGP_NO_SNAPSHOT_TEST);
 
 		GistPageSetNSN(leftpg, recptr);
 		GistClearFollowRight(leftpg);
@@ -614,7 +624,8 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, GISTSTATE *giststate)
 			gistcheckpage(state.r, stack->buffer);
 		}
 
-		stack->page = (Page) BufferGetPage(stack->buffer);
+		stack->page = BufferGetPage(stack->buffer, NULL, NULL,
+									BGP_NO_SNAPSHOT_TEST);
 		stack->lsn = PageGetLSN(stack->page);
 		Assert(!RelationNeedsWAL(state.r) || !XLogRecPtrIsInvalid(stack->lsn));
 
@@ -699,7 +710,8 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, GISTSTATE *giststate)
 					LockBuffer(stack->buffer, GIST_UNLOCK);
 					LockBuffer(stack->buffer, GIST_EXCLUSIVE);
 					xlocked = true;
-					stack->page = (Page) BufferGetPage(stack->buffer);
+					stack->page = BufferGetPage(stack->buffer, NULL, NULL,
+												BGP_NO_SNAPSHOT_TEST);
 
 					if (PageGetLSN(stack->page) != stack->lsn)
 					{
@@ -763,7 +775,8 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, GISTSTATE *giststate)
 				LockBuffer(stack->buffer, GIST_UNLOCK);
 				LockBuffer(stack->buffer, GIST_EXCLUSIVE);
 				xlocked = true;
-				stack->page = (Page) BufferGetPage(stack->buffer);
+				stack->page = BufferGetPage(stack->buffer, NULL, NULL,
+											BGP_NO_SNAPSHOT_TEST);
 				stack->lsn = PageGetLSN(stack->page);
 
 				if (stack->blkno == GIST_ROOT_BLKNO)
@@ -853,7 +866,7 @@ gistFindPath(Relation r, BlockNumber child, OffsetNumber *downlinkoffnum)
 		buffer = ReadBuffer(r, top->blkno);
 		LockBuffer(buffer, GIST_SHARE);
 		gistcheckpage(r, buffer);
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		if (GistPageIsLeaf(page))
 		{
@@ -941,7 +954,8 @@ gistFindCorrectParent(Relation r, GISTInsertStack *child)
 	GISTInsertStack *parent = child->parent;
 
 	gistcheckpage(r, parent->buffer);
-	parent->page = (Page) BufferGetPage(parent->buffer);
+	parent->page = BufferGetPage(parent->buffer, NULL, NULL,
+								 BGP_NO_SNAPSHOT_TEST);
 
 	/* here we don't need to distinguish between split and page update */
 	if (child->downlinkoffnum == InvalidOffsetNumber ||
@@ -982,7 +996,8 @@ gistFindCorrectParent(Relation r, GISTInsertStack *child)
 			parent->buffer = ReadBuffer(r, parent->blkno);
 			LockBuffer(parent->buffer, GIST_EXCLUSIVE);
 			gistcheckpage(r, parent->buffer);
-			parent->page = (Page) BufferGetPage(parent->buffer);
+			parent->page = BufferGetPage(parent->buffer, NULL, NULL,
+										 BGP_NO_SNAPSHOT_TEST);
 		}
 
 		/*
@@ -1006,7 +1021,8 @@ gistFindCorrectParent(Relation r, GISTInsertStack *child)
 		while (ptr)
 		{
 			ptr->buffer = ReadBuffer(r, ptr->blkno);
-			ptr->page = (Page) BufferGetPage(ptr->buffer);
+			ptr->page = BufferGetPage(ptr->buffer, NULL, NULL,
+									  BGP_NO_SNAPSHOT_TEST);
 			ptr = ptr->parent;
 		}
 
@@ -1028,7 +1044,7 @@ static IndexTuple
 gistformdownlink(Relation rel, Buffer buf, GISTSTATE *giststate,
 				 GISTInsertStack *stack)
 {
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	OffsetNumber maxoff;
 	OffsetNumber offset;
 	IndexTuple	downlink = NULL;
@@ -1109,7 +1125,7 @@ gistfixsplit(GISTInsertState *state, GISTSTATE *giststate)
 		GISTPageSplitInfo *si = palloc(sizeof(GISTPageSplitInfo));
 		IndexTuple	downlink;
 
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		/* Form the new downlink tuples to insert to parent */
 		downlink = gistformdownlink(state->r, buf, giststate, stack);
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index 4e43a69..8e7389c 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -169,7 +169,7 @@ gistbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	/* initialize the root page */
 	buffer = gistNewBuffer(index);
 	Assert(BufferGetBlockNumber(buffer) == GIST_ROOT_BLKNO);
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	START_CRIT_SECTION();
 
@@ -589,7 +589,7 @@ gistProcessItup(GISTBuildState *buildstate, IndexTuple itup,
 		buffer = ReadBuffer(indexrel, blkno);
 		LockBuffer(buffer, GIST_EXCLUSIVE);
 
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		childoffnum = gistchoose(indexrel, page, itup, giststate);
 		iid = PageGetItemId(page, childoffnum);
 		idxtuple = (IndexTuple) PageGetItem(page, iid);
@@ -699,7 +699,8 @@ gistbufferinginserttuples(GISTBuildState *buildstate, Buffer buffer, int level,
 	 */
 	if (is_split && BufferGetBlockNumber(buffer) == GIST_ROOT_BLKNO)
 	{
-		Page		page = BufferGetPage(buffer);
+		Page		page = BufferGetPage(buffer, NULL, NULL,
+										 BGP_NO_SNAPSHOT_TEST);
 		OffsetNumber off;
 		OffsetNumber maxoff;
 
@@ -866,7 +867,7 @@ gistBufferingFindCorrectParent(GISTBuildState *buildstate,
 	}
 
 	buffer = ReadBuffer(buildstate->indexrel, parent);
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	LockBuffer(buffer, GIST_EXCLUSIVE);
 	gistcheckpage(buildstate->indexrel, buffer);
 	maxoff = PageGetMaxOffsetNumber(page);
@@ -1067,7 +1068,7 @@ gistGetMaxLevel(Relation index)
 		 * pro forma.
 		 */
 		LockBuffer(buffer, GIST_SHARE);
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		if (GistPageIsLeaf(page))
 		{
@@ -1167,7 +1168,8 @@ gistMemorizeAllDownlinks(GISTBuildState *buildstate, Buffer parentbuf)
 	OffsetNumber maxoff;
 	OffsetNumber off;
 	BlockNumber parentblkno = BufferGetBlockNumber(parentbuf);
-	Page		page = BufferGetPage(parentbuf);
+	Page		page = BufferGetPage(parentbuf, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST);
 
 	Assert(!GistPageIsLeaf(page));
 
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 8138383..13a0399 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -54,7 +54,7 @@ gistkillitems(IndexScanDesc scan)
 
 	LockBuffer(buffer, GIST_SHARE);
 	gistcheckpage(scan->indexRelation, buffer);
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	/*
 	 * If page LSN differs it means that the page was modified since the last read.
@@ -336,7 +336,7 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 	buffer = ReadBuffer(scan->indexRelation, pageItem->blkno);
 	LockBuffer(buffer, GIST_SHARE);
 	gistcheckpage(scan->indexRelation, buffer);
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = GistPageGetOpaque(page);
 
 	/*
diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index fac166d..5d16cf5 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -701,7 +701,7 @@ GISTInitBuffer(Buffer b, uint32 f)
 	Size		pageSize;
 
 	pageSize = BufferGetPageSize(b);
-	page = BufferGetPage(b);
+	page = BufferGetPage(b, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	PageInit(page, pageSize, sizeof(GISTPageOpaqueData));
 
 	opaque = GistPageGetOpaque(page);
@@ -718,7 +718,7 @@ GISTInitBuffer(Buffer b, uint32 f)
 void
 gistcheckpage(Relation rel, Buffer buf)
 {
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	/*
 	 * ReadBuffer verifies that every newly-read page passes
@@ -776,7 +776,7 @@ gistNewBuffer(Relation r)
 		 */
 		if (ConditionalLockBuffer(buffer))
 		{
-			Page		page = BufferGetPage(buffer);
+			Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 			if (PageIsNew(page))
 				return buffer;	/* OK to use, if never initialized */
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index 7947ff9..9d9f5dc 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -75,7 +75,7 @@ gistvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 		buffer = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
 									info->strategy);
 		LockBuffer(buffer, GIST_SHARE);
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		if (PageIsNew(page) || GistPageIsDeleted(page))
 		{
@@ -166,7 +166,7 @@ gistbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 									RBM_NORMAL, info->strategy);
 		LockBuffer(buffer, GIST_SHARE);
 		gistcheckpage(rel, buffer);
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		if (GistPageIsLeaf(page))
 		{
@@ -176,7 +176,7 @@ gistbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 			LockBuffer(buffer, GIST_UNLOCK);
 			LockBuffer(buffer, GIST_EXCLUSIVE);
 
-			page = (Page) BufferGetPage(buffer);
+			page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			if (stack->blkno == GIST_ROOT_BLKNO && !GistPageIsLeaf(page))
 			{
 				/* only the root can become non-leaf during relock */
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index b48e97c..8ef6e98 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -46,7 +46,7 @@ gistRedoClearFollowRight(XLogReaderState *record, uint8 block_id)
 	action = XLogReadBufferForRedo(record, block_id, &buffer);
 	if (action == BLK_NEEDS_REDO || action == BLK_RESTORED)
 	{
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		GistPageSetNSN(page, lsn);
 		GistClearFollowRight(page);
@@ -78,7 +78,7 @@ gistRedoPageUpdateRecord(XLogReaderState *record)
 
 		data = begin = XLogRecGetBlockData(record, 0, &datalen);
 
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		/* Delete old tuples */
 		if (xldata->ntodelete > 0)
@@ -199,7 +199,7 @@ gistRedoPageSplitRecord(XLogReaderState *record)
 		}
 
 		buffer = XLogInitBufferForRedo(record, i + 1);
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		data = XLogRecGetBlockData(record, i + 1, &datalen);
 
 		tuples = decodePageSplitRecord(data, datalen, &num);
@@ -265,7 +265,7 @@ gistRedoCreateIndex(XLogReaderState *record)
 
 	buffer = XLogInitBufferForRedo(record, 0);
 	Assert(BufferGetBlockNumber(buffer) == GIST_ROOT_BLKNO);
-	page = (Page) BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	GISTInitBuffer(buffer, F_LEAF);
 
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 3d48c4f..a5032e1 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -278,7 +278,7 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir)
 
 		buf = so->hashso_curbuf;
 		Assert(BufferIsValid(buf));
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		maxoffnum = PageGetMaxOffsetNumber(page);
 		for (offnum = ItemPointerGetOffsetNumber(current);
 			 offnum <= maxoffnum;
@@ -327,7 +327,8 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir)
 		while (res)
 		{
 			offnum = ItemPointerGetOffsetNumber(current);
-			page = BufferGetPage(so->hashso_curbuf);
+			page = BufferGetPage(so->hashso_curbuf, NULL, NULL,
+								 BGP_NO_SNAPSHOT_TEST);
 			if (!ItemIdIsDead(PageGetItemId(page, offnum)))
 				break;
 			res = _hash_next(scan, dir);
@@ -370,7 +371,8 @@ hashgetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 			OffsetNumber offnum;
 
 			offnum = ItemPointerGetOffsetNumber(&(so->hashso_curpos));
-			page = BufferGetPage(so->hashso_curbuf);
+			page = BufferGetPage(so->hashso_curbuf, NULL, NULL,
+								 BGP_NO_SNAPSHOT_TEST);
 			add_tuple = !ItemIdIsDead(PageGetItemId(page, offnum));
 		}
 		else
@@ -515,7 +517,8 @@ hashbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 	 * each bucket.
 	 */
 	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	metap = HashPageGetMeta(BufferGetPage(metabuf));
+	metap = HashPageGetMeta(BufferGetPage(metabuf, NULL, NULL,
+										  BGP_NO_SNAPSHOT_TEST));
 	orig_maxbucket = metap->hashm_maxbucket;
 	orig_ntuples = metap->hashm_ntuples;
 	memcpy(&local_metapage, metap, sizeof(local_metapage));
@@ -559,7 +562,7 @@ loop_top:
 			buf = _hash_getbuf_with_strategy(rel, blkno, HASH_WRITE,
 										   LH_BUCKET_PAGE | LH_OVERFLOW_PAGE,
 											 info->strategy);
-			page = BufferGetPage(buf);
+			page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			opaque = (HashPageOpaque) PageGetSpecialPointer(page);
 			Assert(opaque->hasho_bucket == cur_bucket);
 
@@ -614,7 +617,8 @@ loop_top:
 
 	/* Write-lock metapage and check for split since we started */
 	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_WRITE, LH_META_PAGE);
-	metap = HashPageGetMeta(BufferGetPage(metabuf));
+	metap = HashPageGetMeta(BufferGetPage(metabuf, NULL, NULL,
+										  BGP_NO_SNAPSHOT_TEST));
 
 	if (cur_maxbucket != metap->hashm_maxbucket)
 	{
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index acd2e64..92152e3 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -53,7 +53,8 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 
 	/* Read the metapage */
 	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	metap = HashPageGetMeta(BufferGetPage(metabuf));
+	metap = HashPageGetMeta(BufferGetPage(metabuf, NULL, NULL,
+										  BGP_NO_SNAPSHOT_TEST));
 
 	/*
 	 * Check whether the item can fit on a hash page at all. (Eventually, we
@@ -111,7 +112,7 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 
 	/* Fetch the primary bucket page for the bucket */
 	buf = _hash_getbuf(rel, blkno, HASH_WRITE, LH_BUCKET_PAGE);
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
 	Assert(pageopaque->hasho_bucket == bucket);
 
@@ -131,7 +132,7 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 			 */
 			_hash_relbuf(rel, buf);
 			buf = _hash_getbuf(rel, nextblkno, HASH_WRITE, LH_OVERFLOW_PAGE);
-			page = BufferGetPage(buf);
+			page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		}
 		else
 		{
@@ -145,7 +146,7 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 
 			/* chain to a new overflow page */
 			buf = _hash_addovflpage(rel, metabuf, buf);
-			page = BufferGetPage(buf);
+			page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 			/* should fit now, given test above */
 			Assert(PageGetFreeSpace(page) >= itemsz);
@@ -206,7 +207,7 @@ _hash_pgaddtup(Relation rel, Buffer buf, Size itemsize, IndexTuple itup)
 	uint32		hashkey;
 
 	_hash_checkpage(rel, buf, LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	/* Find where to insert the tuple (preserving page's hashkey ordering) */
 	hashkey = _hash_get_indextuple_hashkey(itup);
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index db3e268..3a8916a 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -123,7 +123,7 @@ _hash_addovflpage(Relation rel, Buffer metabuf, Buffer buf)
 	{
 		BlockNumber nextblkno;
 
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
 		nextblkno = pageopaque->hasho_nextblkno;
 
@@ -137,7 +137,7 @@ _hash_addovflpage(Relation rel, Buffer metabuf, Buffer buf)
 	}
 
 	/* now that we have correct backlink, initialize new overflow page */
-	ovflpage = BufferGetPage(ovflbuf);
+	ovflpage = BufferGetPage(ovflbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
 	ovflopaque->hasho_prevblkno = BufferGetBlockNumber(buf);
 	ovflopaque->hasho_nextblkno = InvalidBlockNumber;
@@ -186,7 +186,8 @@ _hash_getovflpage(Relation rel, Buffer metabuf)
 	_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE);
 
 	_hash_checkpage(rel, metabuf, LH_META_PAGE);
-	metap = HashPageGetMeta(BufferGetPage(metabuf));
+	metap = HashPageGetMeta(BufferGetPage(metabuf, NULL, NULL,
+										  BGP_NO_SNAPSHOT_TEST));
 
 	/* start search at hashm_firstfree */
 	orig_firstfree = metap->hashm_firstfree;
@@ -224,7 +225,7 @@ _hash_getovflpage(Relation rel, Buffer metabuf)
 		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
 
 		mapbuf = _hash_getbuf(rel, mapblkno, HASH_WRITE, LH_BITMAP_PAGE);
-		mappage = BufferGetPage(mapbuf);
+		mappage = BufferGetPage(mapbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		freep = HashPageGetBitmap(mappage);
 
 		for (; bit <= last_inpage; j++, bit += BITS_PER_MAP)
@@ -396,7 +397,7 @@ _hash_freeovflpage(Relation rel, Buffer ovflbuf,
 	/* Get information from the doomed page */
 	_hash_checkpage(rel, ovflbuf, LH_OVERFLOW_PAGE);
 	ovflblkno = BufferGetBlockNumber(ovflbuf);
-	ovflpage = BufferGetPage(ovflbuf);
+	ovflpage = BufferGetPage(ovflbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
 	nextblkno = ovflopaque->hasho_nextblkno;
 	prevblkno = ovflopaque->hasho_prevblkno;
@@ -423,7 +424,7 @@ _hash_freeovflpage(Relation rel, Buffer ovflbuf,
 														 HASH_WRITE,
 										   LH_BUCKET_PAGE | LH_OVERFLOW_PAGE,
 														 bstrategy);
-		Page		prevpage = BufferGetPage(prevbuf);
+		Page		prevpage = BufferGetPage(prevbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		HashPageOpaque prevopaque = (HashPageOpaque) PageGetSpecialPointer(prevpage);
 
 		Assert(prevopaque->hasho_bucket == bucket);
@@ -437,7 +438,7 @@ _hash_freeovflpage(Relation rel, Buffer ovflbuf,
 														 HASH_WRITE,
 														 LH_OVERFLOW_PAGE,
 														 bstrategy);
-		Page		nextpage = BufferGetPage(nextbuf);
+		Page		nextpage = BufferGetPage(nextbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		HashPageOpaque nextopaque = (HashPageOpaque) PageGetSpecialPointer(nextpage);
 
 		Assert(nextopaque->hasho_bucket == bucket);
@@ -449,7 +450,8 @@ _hash_freeovflpage(Relation rel, Buffer ovflbuf,
 
 	/* Read the metapage so we can determine which bitmap page to use */
 	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	metap = HashPageGetMeta(BufferGetPage(metabuf));
+	metap = HashPageGetMeta(BufferGetPage(metabuf, NULL, NULL,
+										  BGP_NO_SNAPSHOT_TEST));
 
 	/* Identify which bit to set */
 	ovflbitno = blkno_to_bitno(metap, ovflblkno);
@@ -466,7 +468,7 @@ _hash_freeovflpage(Relation rel, Buffer ovflbuf,
 
 	/* Clear the bitmap bit to indicate that this overflow page is free */
 	mapbuf = _hash_getbuf(rel, blkno, HASH_WRITE, LH_BITMAP_PAGE);
-	mappage = BufferGetPage(mapbuf);
+	mappage = BufferGetPage(mapbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	freep = HashPageGetBitmap(mappage);
 	Assert(ISSET(freep, bitmapbit));
 	CLRBIT(freep, bitmapbit);
@@ -521,7 +523,7 @@ _hash_initbitmap(Relation rel, HashMetaPage metap, BlockNumber blkno,
 	 * that it's not worth worrying about.
 	 */
 	buf = _hash_getnewbuf(rel, blkno, forkNum);
-	pg = BufferGetPage(buf);
+	pg = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	/* initialize the page's special space */
 	op = (HashPageOpaque) PageGetSpecialPointer(pg);
@@ -601,7 +603,7 @@ _hash_squeezebucket(Relation rel,
 									  HASH_WRITE,
 									  LH_BUCKET_PAGE,
 									  bstrategy);
-	wpage = BufferGetPage(wbuf);
+	wpage = BufferGetPage(wbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	wopaque = (HashPageOpaque) PageGetSpecialPointer(wpage);
 
 	/*
@@ -631,7 +633,7 @@ _hash_squeezebucket(Relation rel,
 										  HASH_WRITE,
 										  LH_OVERFLOW_PAGE,
 										  bstrategy);
-		rpage = BufferGetPage(rbuf);
+		rpage = BufferGetPage(rbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		ropaque = (HashPageOpaque) PageGetSpecialPointer(rpage);
 		Assert(ropaque->hasho_bucket == bucket);
 	} while (BlockNumberIsValid(ropaque->hasho_nextblkno));
@@ -696,7 +698,7 @@ _hash_squeezebucket(Relation rel,
 												  HASH_WRITE,
 												  LH_OVERFLOW_PAGE,
 												  bstrategy);
-				wpage = BufferGetPage(wbuf);
+				wpage = BufferGetPage(wbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 				wopaque = (HashPageOpaque) PageGetSpecialPointer(wpage);
 				Assert(wopaque->hasho_bucket == bucket);
 				wbuf_dirty = false;
@@ -752,7 +754,7 @@ _hash_squeezebucket(Relation rel,
 										  HASH_WRITE,
 										  LH_OVERFLOW_PAGE,
 										  bstrategy);
-		rpage = BufferGetPage(rbuf);
+		rpage = BufferGetPage(rbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		ropaque = (HashPageOpaque) PageGetSpecialPointer(rpage);
 		Assert(ropaque->hasho_bucket == bucket);
 	}
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 178463f..2e2588b 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -161,7 +161,8 @@ _hash_getinitbuf(Relation rel, BlockNumber blkno)
 	/* ref count and lock type are correct */
 
 	/* initialize the page */
-	_hash_pageinit(BufferGetPage(buf), BufferGetPageSize(buf));
+	_hash_pageinit(BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
+				   BufferGetPageSize(buf));
 
 	return buf;
 }
@@ -210,7 +211,8 @@ _hash_getnewbuf(Relation rel, BlockNumber blkno, ForkNumber forkNum)
 	/* ref count and lock type are correct */
 
 	/* initialize the page */
-	_hash_pageinit(BufferGetPage(buf), BufferGetPageSize(buf));
+	_hash_pageinit(BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
+				   BufferGetPageSize(buf));
 
 	return buf;
 }
@@ -384,7 +386,7 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 	 * the physical index length.
 	 */
 	metabuf = _hash_getnewbuf(rel, HASH_METAPAGE, forkNum);
-	pg = BufferGetPage(metabuf);
+	pg = BufferGetPage(metabuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
 	pageopaque->hasho_prevblkno = InvalidBlockNumber;
@@ -452,7 +454,7 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 		CHECK_FOR_INTERRUPTS();
 
 		buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i), forkNum);
-		pg = BufferGetPage(buf);
+		pg = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
 		pageopaque->hasho_prevblkno = InvalidBlockNumber;
 		pageopaque->hasho_nextblkno = InvalidBlockNumber;
@@ -517,7 +519,8 @@ _hash_expandtable(Relation rel, Buffer metabuf)
 	_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE);
 
 	_hash_checkpage(rel, metabuf, LH_META_PAGE);
-	metap = HashPageGetMeta(BufferGetPage(metabuf));
+	metap = HashPageGetMeta(BufferGetPage(metabuf, NULL, NULL,
+										  BGP_NO_SNAPSHOT_TEST));
 
 	/*
 	 * Check to see if split is still needed; someone else might have already
@@ -774,10 +777,10 @@ _hash_splitbucket(Relation rel,
 	 * either bucket.
 	 */
 	obuf = _hash_getbuf(rel, start_oblkno, HASH_WRITE, LH_BUCKET_PAGE);
-	opage = BufferGetPage(obuf);
+	opage = BufferGetPage(obuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	oopaque = (HashPageOpaque) PageGetSpecialPointer(opage);
 
-	npage = BufferGetPage(nbuf);
+	npage = BufferGetPage(nbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	/* initialize the new bucket's primary page */
 	nopaque = (HashPageOpaque) PageGetSpecialPointer(npage);
@@ -841,7 +844,7 @@ _hash_splitbucket(Relation rel,
 					_hash_chgbufaccess(rel, nbuf, HASH_WRITE, HASH_NOLOCK);
 					/* chain to a new overflow page */
 					nbuf = _hash_addovflpage(rel, metabuf, nbuf);
-					npage = BufferGetPage(nbuf);
+					npage = BufferGetPage(nbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 					/* we don't need nopaque within the loop */
 				}
 
@@ -888,7 +891,7 @@ _hash_splitbucket(Relation rel,
 
 		/* Else, advance to next old page */
 		obuf = _hash_getbuf(rel, oblkno, HASH_WRITE, LH_OVERFLOW_PAGE);
-		opage = BufferGetPage(obuf);
+		opage = BufferGetPage(obuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		oopaque = (HashPageOpaque) PageGetSpecialPointer(opage);
 	}
 
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 6025a3f..e63f6d3 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -55,7 +55,7 @@ _hash_next(IndexScanDesc scan, ScanDirection dir)
 	current = &(so->hashso_curpos);
 	offnum = ItemPointerGetOffsetNumber(current);
 	_hash_checkpage(rel, buf, LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
 	so->hashso_heappos = itup->t_tid;
 
@@ -79,7 +79,7 @@ _hash_readnext(Relation rel,
 	if (BlockNumberIsValid(blkno))
 	{
 		*bufp = _hash_getbuf(rel, blkno, HASH_READ, LH_OVERFLOW_PAGE);
-		*pagep = BufferGetPage(*bufp);
+		*pagep = BufferGetPage(*bufp, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		*opaquep = (HashPageOpaque) PageGetSpecialPointer(*pagep);
 	}
 }
@@ -102,7 +102,7 @@ _hash_readprev(Relation rel,
 	{
 		*bufp = _hash_getbuf(rel, blkno, HASH_READ,
 							 LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
-		*pagep = BufferGetPage(*bufp);
+		*pagep = BufferGetPage(*bufp, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		*opaquep = (HashPageOpaque) PageGetSpecialPointer(*pagep);
 	}
 }
@@ -188,7 +188,8 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	/* Read the metapage */
 	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	metap = HashPageGetMeta(BufferGetPage(metabuf));
+	metap = HashPageGetMeta(BufferGetPage(metabuf, NULL, NULL,
+										  BGP_NO_SNAPSHOT_TEST));
 
 	/*
 	 * Loop until we get a lock on the correct target bucket.
@@ -240,7 +241,7 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	/* Fetch the primary bucket page for the bucket */
 	buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
 	Assert(opaque->hasho_bucket == bucket);
 
@@ -258,7 +259,7 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	/* if we're here, _hash_step found a valid tuple */
 	offnum = ItemPointerGetOffsetNumber(current);
 	_hash_checkpage(rel, buf, LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
 	so->hashso_heappos = itup->t_tid;
 
@@ -294,7 +295,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 
 	buf = *bufP;
 	_hash_checkpage(rel, buf, LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
 
 	/*
diff --git a/src/backend/access/hash/hashutil.c b/src/backend/access/hash/hashutil.c
index 456954b..5dbc2a4 100644
--- a/src/backend/access/hash/hashutil.c
+++ b/src/backend/access/hash/hashutil.c
@@ -155,7 +155,7 @@ _hash_log2(uint32 num)
 void
 _hash_checkpage(Relation rel, Buffer buf, int flags)
 {
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	/*
 	 * ReadBuffer verifies that every newly-read page passes
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34ba385..66b2354 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -394,7 +394,7 @@ heapgetpage(HeapScanDesc scan, BlockNumber page)
 	 */
 	LockBuffer(buffer, BUFFER_LOCK_SHARE);
 
-	dp = (Page) BufferGetPage(buffer);
+	dp = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	lines = PageGetMaxOffsetNumber(dp);
 	ntup = 0;
 
@@ -537,7 +537,7 @@ heapgettup(HeapScanDesc scan,
 
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		dp = BufferGetPage(scan->rs_cbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		lines = PageGetMaxOffsetNumber(dp);
 		/* page and lineoff now reference the physically next tid */
 
@@ -582,7 +582,7 @@ heapgettup(HeapScanDesc scan,
 
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		dp = BufferGetPage(scan->rs_cbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		lines = PageGetMaxOffsetNumber(dp);
 
 		if (!scan->rs_inited)
@@ -616,7 +616,7 @@ heapgettup(HeapScanDesc scan,
 			heapgetpage(scan, page);
 
 		/* Since the tuple was previously fetched, needn't lock page here */
-		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		dp = BufferGetPage(scan->rs_cbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		lineoff = ItemPointerGetOffsetNumber(&(tuple->t_self));
 		lpp = PageGetItemId(dp, lineoff);
 		Assert(ItemIdIsNormal(lpp));
@@ -745,7 +745,7 @@ heapgettup(HeapScanDesc scan,
 
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		dp = BufferGetPage(scan->rs_cbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		lines = PageGetMaxOffsetNumber((Page) dp);
 		linesleft = lines;
 		if (backward)
@@ -832,7 +832,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 			lineindex = scan->rs_cindex + 1;
 		}
 
-		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		dp = BufferGetPage(scan->rs_cbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		lines = scan->rs_ntuples;
 		/* page and lineindex now reference the next visible tid */
 
@@ -875,7 +875,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 			page = scan->rs_cblock;		/* current page */
 		}
 
-		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		dp = BufferGetPage(scan->rs_cbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		lines = scan->rs_ntuples;
 
 		if (!scan->rs_inited)
@@ -908,7 +908,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 			heapgetpage(scan, page);
 
 		/* Since the tuple was previously fetched, needn't lock page here */
-		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		dp = BufferGetPage(scan->rs_cbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		lineoff = ItemPointerGetOffsetNumber(&(tuple->t_self));
 		lpp = PageGetItemId(dp, lineoff);
 		Assert(ItemIdIsNormal(lpp));
@@ -1027,7 +1027,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 
 		heapgetpage(scan, page);
 
-		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		dp = BufferGetPage(scan->rs_cbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		lines = scan->rs_ntuples;
 		linesleft = lines;
 		if (backward)
@@ -1871,7 +1871,7 @@ heap_fetch(Relation relation,
 	 * Need share lock on buffer to examine tuple commit status.
 	 */
 	LockBuffer(buffer, BUFFER_LOCK_SHARE);
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	/*
 	 * We'd better check for out-of-range offnum in case of VACUUM since the
@@ -1986,7 +1986,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 					   Snapshot snapshot, HeapTuple heapTuple,
 					   bool *all_dead, bool first_call)
 {
-	Page		dp = (Page) BufferGetPage(buffer);
+	Page		dp = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	TransactionId prev_xmax = InvalidTransactionId;
 	OffsetNumber offnum;
 	bool		at_chain_start;
@@ -2200,7 +2200,7 @@ heap_get_latest_tid(Relation relation,
 		 */
 		buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&ctid));
 		LockBuffer(buffer, BUFFER_LOCK_SHARE);
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		/*
 		 * Check for bogus item number.  This is not treated as an error
@@ -2418,10 +2418,12 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(BufferGetPage(buffer, NULL, NULL,
+									   BGP_NO_SNAPSHOT_TEST)))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(BufferGetPage(buffer, NULL, NULL,
+										  BGP_NO_SNAPSHOT_TEST));
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer);
@@ -2446,7 +2448,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
+		Page		page = BufferGetPage(buffer, NULL, NULL,
+										 BGP_NO_SNAPSHOT_TEST);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2705,7 +2708,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 		buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len,
 										   InvalidBuffer, options, bistate,
 										   &vmbuffer, NULL);
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		/* NO EREPORT(ERROR) from here till changes are logged */
 		START_CRIT_SECTION();
@@ -3019,7 +3022,7 @@ heap_delete(Relation relation, ItemPointer tid,
 
 	block = ItemPointerGetBlockNumber(tid);
 	buffer = ReadBuffer(relation, block);
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	/*
 	 * Before locking the buffer, pin the visibility map page if it appears to
@@ -3509,7 +3512,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 
 	block = ItemPointerGetBlockNumber(otid);
 	buffer = ReadBuffer(relation, block);
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	/*
 	 * Before locking the buffer, pin the visibility map page if it appears to
@@ -4110,17 +4113,22 @@ l2:
 	oldtup.t_data->t_ctid = heaptup->t_self;
 
 	/* clear PD_ALL_VISIBLE flags */
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(BufferGetPage(buffer, NULL, NULL,
+									   BGP_NO_SNAPSHOT_TEST)))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(BufferGetPage(buffer, NULL, NULL,
+										  BGP_NO_SNAPSHOT_TEST));
 		visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
 							vmbuffer);
 	}
-	if (newbuf != buffer && PageIsAllVisible(BufferGetPage(newbuf)))
+	if (newbuf != buffer &&
+		PageIsAllVisible(BufferGetPage(newbuf, NULL, NULL,
+									   BGP_NO_SNAPSHOT_TEST)))
 	{
 		all_visible_cleared_new = true;
-		PageClearAllVisible(BufferGetPage(newbuf));
+		PageClearAllVisible(BufferGetPage(newbuf, NULL, NULL,
+										  BGP_NO_SNAPSHOT_TEST));
 		visibilitymap_clear(relation, BufferGetBlockNumber(newbuf),
 							vmbuffer_new);
 	}
@@ -4151,9 +4159,12 @@ l2:
 								 all_visible_cleared_new);
 		if (newbuf != buffer)
 		{
-			PageSetLSN(BufferGetPage(newbuf), recptr);
+			PageSetLSN(BufferGetPage(newbuf, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST),
+					   recptr);
 		}
-		PageSetLSN(BufferGetPage(buffer), recptr);
+		PageSetLSN(BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
+				   recptr);
 	}
 
 	END_CRIT_SECTION();
@@ -4517,7 +4528,7 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
 	*buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
 	LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
 
-	page = BufferGetPage(*buffer);
+	page = BufferGetPage(*buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
 	Assert(ItemIdIsNormal(lp));
 
@@ -5695,7 +5706,8 @@ l4:
 		{
 			xl_heap_lock_updated xlrec;
 			XLogRecPtr	recptr;
-			Page		page = BufferGetPage(buf);
+			Page		page = BufferGetPage(buf, NULL, NULL,
+											 BGP_NO_SNAPSHOT_TEST);
 
 			XLogBeginInsert();
 			XLogRegisterBuffer(0, buf, REGBUF_STANDARD);
@@ -5802,7 +5814,7 @@ heap_finish_speculative(Relation relation, HeapTuple tuple)
 
 	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
-	page = (Page) BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	offnum = ItemPointerGetOffsetNumber(&(tuple->t_self));
 	if (PageGetMaxOffsetNumber(page) >= offnum)
@@ -5896,7 +5908,7 @@ heap_abort_speculative(Relation relation, HeapTuple tuple)
 
 	block = ItemPointerGetBlockNumber(tid);
 	buffer = ReadBuffer(relation, block);
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 
@@ -6043,7 +6055,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
 
 	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
-	page = (Page) BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	offnum = ItemPointerGetOffsetNumber(&(tuple->t_self));
 	if (PageGetMaxOffsetNumber(page) >= offnum)
@@ -7298,7 +7310,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
 	uint16		prefixlen = 0,
 				suffixlen = 0;
 	XLogRecPtr	recptr;
-	Page		page = BufferGetPage(newbuf);
+	Page		page = BufferGetPage(newbuf, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST);
 	bool		need_tuple_data = RelationIsLogicallyLogged(reln);
 	bool		init;
 	int			bufflags;
@@ -7747,7 +7760,8 @@ heap_xlog_clean(XLogReaderState *record)
 										   &buffer);
 	if (action == BLK_NEEDS_REDO)
 	{
-		Page		page = (Page) BufferGetPage(buffer);
+		Page		page = BufferGetPage(buffer, NULL, NULL,
+										 BGP_NO_SNAPSHOT_TEST);
 		OffsetNumber *end;
 		OffsetNumber *redirected;
 		OffsetNumber *nowdead;
@@ -7853,7 +7867,7 @@ heap_xlog_visible(XLogReaderState *record)
 		 * XLOG record's LSN, we mustn't mark the page all-visible, because
 		 * the subsequent update won't be replayed to clear the flag.
 		 */
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		PageSetAllVisible(page);
 
@@ -7879,7 +7893,8 @@ heap_xlog_visible(XLogReaderState *record)
 	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
 									  &vmbuffer) == BLK_NEEDS_REDO)
 	{
-		Page		vmpage = BufferGetPage(vmbuffer);
+		Page		vmpage = BufferGetPage(vmbuffer, NULL, NULL,
+										   BGP_NO_SNAPSHOT_TEST);
 		Relation	reln;
 
 		/* initialize the page if it was read as zeros */
@@ -7946,7 +7961,8 @@ heap_xlog_freeze_page(XLogReaderState *record)
 
 	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
 	{
-		Page		page = BufferGetPage(buffer);
+		Page		page = BufferGetPage(buffer, NULL, NULL,
+										 BGP_NO_SNAPSHOT_TEST);
 		xl_heap_freeze_tuple *tuples;
 
 		tuples = (xl_heap_freeze_tuple *) XLogRecGetBlockData(record, 0, NULL);
@@ -8033,7 +8049,7 @@ heap_xlog_delete(XLogReaderState *record)
 
 	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		if (PageGetMaxOffsetNumber(page) >= xlrec->offnum)
 			lp = PageGetItemId(page, xlrec->offnum);
@@ -8116,7 +8132,7 @@ heap_xlog_insert(XLogReaderState *record)
 	if (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE)
 	{
 		buffer = XLogInitBufferForRedo(record, 0);
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		PageInit(page, BufferGetPageSize(buffer), 0);
 		action = BLK_NEEDS_REDO;
 	}
@@ -8127,7 +8143,7 @@ heap_xlog_insert(XLogReaderState *record)
 		Size		datalen;
 		char	   *data;
 
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		if (PageGetMaxOffsetNumber(page) + 1 < xlrec->offnum)
 			elog(PANIC, "invalid max offset number");
@@ -8232,7 +8248,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	if (isinit)
 	{
 		buffer = XLogInitBufferForRedo(record, 0);
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		PageInit(page, BufferGetPageSize(buffer), 0);
 		action = BLK_NEEDS_REDO;
 	}
@@ -8248,7 +8264,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		tupdata = XLogRecGetBlockData(record, 0, &len);
 		endptr = tupdata + len;
 
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		for (i = 0; i < xlrec->ntuples; i++)
 		{
@@ -8399,7 +8415,7 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 									  &obuffer);
 	if (oldaction == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(obuffer);
+		page = BufferGetPage(obuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		offnum = xlrec->old_offnum;
 		if (PageGetMaxOffsetNumber(page) >= offnum)
 			lp = PageGetItemId(page, offnum);
@@ -8446,7 +8462,7 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 	else if (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE)
 	{
 		nbuffer = XLogInitBufferForRedo(record, 0);
-		page = (Page) BufferGetPage(nbuffer);
+		page = BufferGetPage(nbuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		PageInit(page, BufferGetPageSize(nbuffer), 0);
 		newaction = BLK_NEEDS_REDO;
 	}
@@ -8479,7 +8495,7 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		recdata = XLogRecGetBlockData(record, 0, &datalen);
 		recdata_end = recdata + datalen;
 
-		page = BufferGetPage(nbuffer);
+		page = BufferGetPage(nbuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		offnum = xlrec->new_offnum;
 		if (PageGetMaxOffsetNumber(page) + 1 < offnum)
@@ -8609,7 +8625,7 @@ heap_xlog_confirm(XLogReaderState *record)
 
 	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		offnum = xlrec->offnum;
 		if (PageGetMaxOffsetNumber(page) >= offnum)
@@ -8645,7 +8661,7 @@ heap_xlog_lock(XLogReaderState *record)
 
 	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
 	{
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		offnum = xlrec->offnum;
 		if (PageGetMaxOffsetNumber(page) >= offnum)
@@ -8695,7 +8711,7 @@ heap_xlog_lock_updated(XLogReaderState *record)
 
 	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		offnum = xlrec->offnum;
 		if (PageGetMaxOffsetNumber(page) >= offnum)
@@ -8734,7 +8750,7 @@ heap_xlog_inplace(XLogReaderState *record)
 	{
 		char	   *newtup = XLogRecGetBlockData(record, 0, &newlen);
 
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		offnum = xlrec->offnum;
 		if (PageGetMaxOffsetNumber(page) >= offnum)
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 8140418..c1d30bb 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -48,7 +48,7 @@ RelationPutHeapTuple(Relation relation,
 	Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));
 
 	/* Add the tuple to the page */
-	pageHeader = BufferGetPage(buffer);
+	pageHeader = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	offnum = PageAddItem(pageHeader, (Item) tuple->t_data,
 						 tuple->t_len, InvalidOffsetNumber, false, true);
@@ -132,10 +132,13 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
 	while (1)
 	{
 		/* Figure out which pins we need but don't have. */
-		need_to_pin_buffer1 = PageIsAllVisible(BufferGetPage(buffer1))
+		need_to_pin_buffer1 =
+			PageIsAllVisible(BufferGetPage(buffer1, NULL, NULL,
+										   BGP_NO_SNAPSHOT_TEST))
 			&& !visibilitymap_pin_ok(block1, *vmbuffer1);
 		need_to_pin_buffer2 = buffer2 != InvalidBuffer
-			&& PageIsAllVisible(BufferGetPage(buffer2))
+			&& PageIsAllVisible(BufferGetPage(buffer2, NULL, NULL,
+											  BGP_NO_SNAPSHOT_TEST))
 			&& !visibilitymap_pin_ok(block2, *vmbuffer2);
 		if (!need_to_pin_buffer1 && !need_to_pin_buffer2)
 			return;
@@ -327,7 +330,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
 		{
 			/* easy case */
 			buffer = ReadBufferBI(relation, targetBlock, bistate);
-			if (PageIsAllVisible(BufferGetPage(buffer)))
+			if (PageIsAllVisible(BufferGetPage(buffer, NULL, NULL,
+											   BGP_NO_SNAPSHOT_TEST)))
 				visibilitymap_pin(relation, targetBlock, vmbuffer);
 			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 		}
@@ -335,7 +339,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
 		{
 			/* also easy case */
 			buffer = otherBuffer;
-			if (PageIsAllVisible(BufferGetPage(buffer)))
+			if (PageIsAllVisible(BufferGetPage(buffer, NULL, NULL,
+											   BGP_NO_SNAPSHOT_TEST)))
 				visibilitymap_pin(relation, targetBlock, vmbuffer);
 			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 		}
@@ -343,7 +348,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
 		{
 			/* lock other buffer first */
 			buffer = ReadBuffer(relation, targetBlock);
-			if (PageIsAllVisible(BufferGetPage(buffer)))
+			if (PageIsAllVisible(BufferGetPage(buffer, NULL, NULL,
+											   BGP_NO_SNAPSHOT_TEST)))
 				visibilitymap_pin(relation, targetBlock, vmbuffer);
 			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
 			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
@@ -352,7 +358,8 @@ RelationGetBufferForTuple(Relation relation, Size len,
 		{
 			/* lock target buffer first */
 			buffer = ReadBuffer(relation, targetBlock);
-			if (PageIsAllVisible(BufferGetPage(buffer)))
+			if (PageIsAllVisible(BufferGetPage(buffer, NULL, NULL,
+											   BGP_NO_SNAPSHOT_TEST)))
 				visibilitymap_pin(relation, targetBlock, vmbuffer);
 			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
@@ -392,7 +399,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
 		 * Now we can check to see if there's enough free space here. If so,
 		 * we're done.
 		 */
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		pageFreeSpace = PageGetHeapFreeSpace(page);
 		if (len + saveFreeSpace <= pageFreeSpace)
 		{
@@ -477,7 +484,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	 * is empty (this should never happen, but if it does we don't want to
 	 * risk wiping out valid data).
 	 */
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	if (!PageIsNew(page))
 		elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 59beadd..19201b0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -74,7 +74,7 @@ static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
 void
 heap_page_prune_opt(Relation relation, Buffer buffer)
 {
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	Size		minfree;
 	TransactionId OldestXmin;
 
@@ -174,7 +174,7 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
 				bool report_stats, TransactionId *latestRemovedXid)
 {
 	int			ndeleted = 0;
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	OffsetNumber offnum,
 				maxoff;
 	PruneState	prstate;
@@ -261,7 +261,8 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
 									prstate.nowunused, prstate.nunused,
 									prstate.latestRemovedXid);
 
-			PageSetLSN(BufferGetPage(buffer), recptr);
+			PageSetLSN(BufferGetPage(buffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST), recptr);
 		}
 	}
 	else
@@ -347,7 +348,7 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 				 PruneState *prstate)
 {
 	int			ndeleted = 0;
-	Page		dp = (Page) BufferGetPage(buffer);
+	Page		dp = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	TransactionId priorXmax = InvalidTransactionId;
 	ItemId		rootlp;
 	HeapTupleHeader htup;
@@ -673,7 +674,8 @@ heap_page_prune_execute(Buffer buffer,
 						OffsetNumber *nowdead, int ndead,
 						OffsetNumber *nowunused, int nunused)
 {
-	Page		page = (Page) BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST);
 	OffsetNumber *offnum;
 	int			i;
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index eaab4be..694d784 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -179,7 +179,8 @@ visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer buf)
 		elog(ERROR, "wrong buffer passed to visibilitymap_clear");
 
 	LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
-	map = PageGetContents(BufferGetPage(buf));
+	map = PageGetContents(BufferGetPage(buf, NULL, NULL,
+										BGP_NO_SNAPSHOT_TEST));
 
 	if (map[mapByte] & mask)
 	{
@@ -287,7 +288,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
 		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
 
-	page = BufferGetPage(vmBuf);
+	page = BufferGetPage(vmBuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	map = (uint8 *)PageGetContents(page);
 	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
 
@@ -312,7 +313,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
 				 */
 				if (XLogHintBitIsNeeded())
 				{
-					Page		heapPage = BufferGetPage(heapBuf);
+					Page		heapPage = BufferGetPage(heapBuf, NULL, NULL,
+														 BGP_NO_SNAPSHOT_TEST);
 
 					/* caller is expected to set PD_ALL_VISIBLE first */
 					Assert(PageIsAllVisible(heapPage));
@@ -377,7 +379,8 @@ visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *buf)
 			return false;
 	}
 
-	map = PageGetContents(BufferGetPage(*buf));
+	map = PageGetContents(BufferGetPage(*buf, NULL, NULL,
+										BGP_NO_SNAPSHOT_TEST));
 
 	/*
 	 * A single byte read is atomic.  There could be memory-ordering effects
@@ -426,7 +429,8 @@ visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_fro
 		 * immediately stale anyway if anyone is concurrently setting or
 		 * clearing bits, and we only really need an approximate value.
 		 */
-		map = (unsigned char *) PageGetContents(BufferGetPage(mapBuffer));
+		map = (unsigned char *) PageGetContents(BufferGetPage
+			(mapBuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST));
 
 		for (i = 0; i < MAPSIZE; i++)
 		{
@@ -493,7 +497,7 @@ visibilitymap_truncate(Relation rel, BlockNumber nheapblocks)
 			return;
 		}
 
-		page = BufferGetPage(mapBuffer);
+		page = BufferGetPage(mapBuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		map = PageGetContents(page);
 
 		LockBuffer(mapBuffer, BUFFER_LOCK_EXCLUSIVE);
@@ -587,8 +591,9 @@ vm_readbuf(Relation rel, BlockNumber blkno, bool extend)
 	 */
 	buf = ReadBufferExtended(rel, VISIBILITYMAP_FORKNUM, blkno,
 							 RBM_ZERO_ON_ERROR, NULL);
-	if (PageIsNew(BufferGetPage(buf)))
-		PageInit(BufferGetPage(buf), BLCKSZ, 0);
+	if (PageIsNew(BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST)))
+		PageInit(BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
+				 BLCKSZ, 0);
 	return buf;
 }
 
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index e3c55eb..31449b0 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -255,7 +255,7 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
 
 	InitDirtySnapshot(SnapshotDirty);
 
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 	maxoff = PageGetMaxOffsetNumber(page);
 
@@ -464,7 +464,7 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
 			{
 				nblkno = opaque->btpo_next;
 				nbuf = _bt_relandgetbuf(rel, nbuf, nblkno, BT_READ);
-				page = BufferGetPage(nbuf);
+				page = BufferGetPage(nbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 				opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 				if (!P_IGNORE(opaque))
 					break;
@@ -538,7 +538,7 @@ _bt_findinsertloc(Relation rel,
 				  Relation heapRel)
 {
 	Buffer		buf = *bufptr;
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	Size		itemsz;
 	BTPageOpaque lpageop;
 	bool		movedright,
@@ -638,7 +638,7 @@ _bt_findinsertloc(Relation rel,
 		for (;;)
 		{
 			rbuf = _bt_relandgetbuf(rel, rbuf, rblkno, BT_WRITE);
-			page = BufferGetPage(rbuf);
+			page = BufferGetPage(rbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			lpageop = (BTPageOpaque) PageGetSpecialPointer(page);
 
 			/*
@@ -734,7 +734,7 @@ _bt_insertonpg(Relation rel,
 	OffsetNumber firstright = InvalidOffsetNumber;
 	Size		itemsz;
 
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	lpageop = (BTPageOpaque) PageGetSpecialPointer(page);
 
 	/* child buffer must be given iff inserting on an internal page */
@@ -816,7 +816,7 @@ _bt_insertonpg(Relation rel,
 			Assert(!P_ISLEAF(lpageop));
 
 			metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
-			metapg = BufferGetPage(metabuf);
+			metapg = BufferGetPage(metabuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			metad = BTPageGetMeta(metapg);
 
 			if (metad->btm_fastlevel >= lpageop->btpo.level)
@@ -846,7 +846,7 @@ _bt_insertonpg(Relation rel,
 		/* clear INCOMPLETE_SPLIT flag on child if inserting a downlink */
 		if (BufferIsValid(cbuf))
 		{
-			Page		cpage = BufferGetPage(cbuf);
+			Page		cpage = BufferGetPage(cbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			BTPageOpaque cpageop = (BTPageOpaque) PageGetSpecialPointer(cpage);
 
 			Assert(P_INCOMPLETE_SPLIT(cpageop));
@@ -914,7 +914,8 @@ _bt_insertonpg(Relation rel,
 			}
 			if (BufferIsValid(cbuf))
 			{
-				PageSetLSN(BufferGetPage(cbuf), recptr);
+				PageSetLSN(BufferGetPage(cbuf, NULL, NULL,
+										 BGP_NO_SNAPSHOT_TEST), recptr);
 			}
 
 			PageSetLSN(page, recptr);
@@ -987,9 +988,9 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 	 * possibly-confusing junk behind, we are careful to rewrite rightpage as
 	 * zeroes before throwing any error.
 	 */
-	origpage = BufferGetPage(buf);
+	origpage = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	leftpage = PageGetTempPage(origpage);
-	rightpage = BufferGetPage(rbuf);
+	rightpage = BufferGetPage(rbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	origpagenumber = BufferGetBlockNumber(buf);
 	rightpagenumber = BufferGetBlockNumber(rbuf);
@@ -1178,7 +1179,7 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 	if (!P_RIGHTMOST(oopaque))
 	{
 		sbuf = _bt_getbuf(rel, oopaque->btpo_next, BT_WRITE);
-		spage = BufferGetPage(sbuf);
+		spage = BufferGetPage(sbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		sopaque = (BTPageOpaque) PageGetSpecialPointer(spage);
 		if (sopaque->btpo_prev != origpagenumber)
 		{
@@ -1248,7 +1249,8 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 	 */
 	if (!isleaf)
 	{
-		Page		cpage = BufferGetPage(cbuf);
+		Page		cpage = BufferGetPage(cbuf, NULL, NULL,
+										  BGP_NO_SNAPSHOT_TEST);
 		BTPageOpaque cpageop = (BTPageOpaque) PageGetSpecialPointer(cpage);
 
 		cpageop->btpo_flags &= ~BTP_INCOMPLETE_SPLIT;
@@ -1335,7 +1337,8 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 		}
 		if (!isleaf)
 		{
-			PageSetLSN(BufferGetPage(cbuf), recptr);
+			PageSetLSN(BufferGetPage(cbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
+					   recptr);
 		}
 	}
 
@@ -1658,7 +1661,7 @@ _bt_insert_parent(Relation rel,
 	{
 		BlockNumber bknum = BufferGetBlockNumber(buf);
 		BlockNumber rbknum = BufferGetBlockNumber(rbuf);
-		Page		page = BufferGetPage(buf);
+		Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		IndexTuple	new_item;
 		BTStackData fakestack;
 		IndexTuple	ritem;
@@ -1733,7 +1736,7 @@ _bt_insert_parent(Relation rel,
 void
 _bt_finish_split(Relation rel, Buffer lbuf, BTStack stack)
 {
-	Page		lpage = BufferGetPage(lbuf);
+	Page		lpage = BufferGetPage(lbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	BTPageOpaque lpageop = (BTPageOpaque) PageGetSpecialPointer(lpage);
 	Buffer		rbuf;
 	Page		rpage;
@@ -1745,7 +1748,7 @@ _bt_finish_split(Relation rel, Buffer lbuf, BTStack stack)
 
 	/* Lock right sibling, the one missing the downlink */
 	rbuf = _bt_getbuf(rel, lpageop->btpo_next, BT_WRITE);
-	rpage = BufferGetPage(rbuf);
+	rpage = BufferGetPage(rbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	rpageop = (BTPageOpaque) PageGetSpecialPointer(rpage);
 
 	/* Could this be a root split? */
@@ -1757,7 +1760,7 @@ _bt_finish_split(Relation rel, Buffer lbuf, BTStack stack)
 
 		/* acquire lock on the metapage */
 		metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
-		metapg = BufferGetPage(metabuf);
+		metapg = BufferGetPage(metabuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		metad = BTPageGetMeta(metapg);
 
 		was_root = (metad->btm_root == BufferGetBlockNumber(lbuf));
@@ -1805,7 +1808,7 @@ _bt_getstackbuf(Relation rel, BTStack stack, int access)
 		BTPageOpaque opaque;
 
 		buf = _bt_getbuf(rel, blkno, access);
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 		if (access == BT_WRITE && P_INCOMPLETE_SPLIT(opaque))
@@ -1931,17 +1934,17 @@ _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf)
 
 	lbkno = BufferGetBlockNumber(lbuf);
 	rbkno = BufferGetBlockNumber(rbuf);
-	lpage = BufferGetPage(lbuf);
+	lpage = BufferGetPage(lbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	lopaque = (BTPageOpaque) PageGetSpecialPointer(lpage);
 
 	/* get a new root page */
 	rootbuf = _bt_getbuf(rel, P_NEW, BT_WRITE);
-	rootpage = BufferGetPage(rootbuf);
+	rootpage = BufferGetPage(rootbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	rootblknum = BufferGetBlockNumber(rootbuf);
 
 	/* acquire lock on the metapage */
 	metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
-	metapg = BufferGetPage(metabuf);
+	metapg = BufferGetPage(metabuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	metad = BTPageGetMeta(metapg);
 
 	/*
@@ -2165,7 +2168,7 @@ _bt_vacuum_one_page(Relation rel, Buffer buffer, Relation heapRel)
 	OffsetNumber offnum,
 				minoff,
 				maxoff;
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 	/*
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 67755d7..36b1804 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -130,7 +130,7 @@ _bt_getroot(Relation rel, int access)
 		rootlevel = metad->btm_fastlevel;
 
 		rootbuf = _bt_getbuf(rel, rootblkno, BT_READ);
-		rootpage = BufferGetPage(rootbuf);
+		rootpage = BufferGetPage(rootbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		rootopaque = (BTPageOpaque) PageGetSpecialPointer(rootpage);
 
 		/*
@@ -156,7 +156,7 @@ _bt_getroot(Relation rel, int access)
 	}
 
 	metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
-	metapg = BufferGetPage(metabuf);
+	metapg = BufferGetPage(metabuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	metaopaque = (BTPageOpaque) PageGetSpecialPointer(metapg);
 	metad = BTPageGetMeta(metapg);
 
@@ -213,7 +213,7 @@ _bt_getroot(Relation rel, int access)
 		 */
 		rootbuf = _bt_getbuf(rel, P_NEW, BT_WRITE);
 		rootblkno = BufferGetBlockNumber(rootbuf);
-		rootpage = BufferGetPage(rootbuf);
+		rootpage = BufferGetPage(rootbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		rootopaque = (BTPageOpaque) PageGetSpecialPointer(rootpage);
 		rootopaque->btpo_prev = rootopaque->btpo_next = P_NONE;
 		rootopaque->btpo_flags = (BTP_LEAF | BTP_ROOT);
@@ -295,7 +295,7 @@ _bt_getroot(Relation rel, int access)
 		for (;;)
 		{
 			rootbuf = _bt_relandgetbuf(rel, rootbuf, rootblkno, BT_READ);
-			rootpage = BufferGetPage(rootbuf);
+			rootpage = BufferGetPage(rootbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			rootopaque = (BTPageOpaque) PageGetSpecialPointer(rootpage);
 
 			if (!P_IGNORE(rootopaque))
@@ -360,7 +360,7 @@ _bt_gettrueroot(Relation rel)
 	rel->rd_amcache = NULL;
 
 	metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
-	metapg = BufferGetPage(metabuf);
+	metapg = BufferGetPage(metabuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	metaopaque = (BTPageOpaque) PageGetSpecialPointer(metapg);
 	metad = BTPageGetMeta(metapg);
 
@@ -397,7 +397,7 @@ _bt_gettrueroot(Relation rel)
 	for (;;)
 	{
 		rootbuf = _bt_relandgetbuf(rel, rootbuf, rootblkno, BT_READ);
-		rootpage = BufferGetPage(rootbuf);
+		rootpage = BufferGetPage(rootbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		rootopaque = (BTPageOpaque) PageGetSpecialPointer(rootpage);
 
 		if (!P_IGNORE(rootopaque))
@@ -446,7 +446,7 @@ _bt_getrootheight(Relation rel)
 		BTPageOpaque metaopaque;
 
 		metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
-		metapg = BufferGetPage(metabuf);
+		metapg = BufferGetPage(metabuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		metaopaque = (BTPageOpaque) PageGetSpecialPointer(metapg);
 		metad = BTPageGetMeta(metapg);
 
@@ -501,7 +501,7 @@ _bt_getrootheight(Relation rel)
 void
 _bt_checkpage(Relation rel, Buffer buf)
 {
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	/*
 	 * ReadBuffer verifies that every newly-read page passes
@@ -616,7 +616,7 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
 			buf = ReadBuffer(rel, blkno);
 			if (ConditionalLockBuffer(buf))
 			{
-				page = BufferGetPage(buf);
+				page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 				if (_bt_page_recyclable(page))
 				{
 					/*
@@ -674,7 +674,7 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
 			UnlockRelationForExtension(rel, ExclusiveLock);
 
 		/* Initialize the new page before returning it */
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		Assert(PageIsNew(page));
 		_bt_pageinit(page, BufferGetPageSize(buf));
 	}
@@ -789,7 +789,7 @@ _bt_delitems_vacuum(Relation rel, Buffer buf,
 					OffsetNumber *itemnos, int nitems,
 					BlockNumber lastBlockVacuumed)
 {
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	BTPageOpaque opaque;
 
 	/* No ereport(ERROR) until changes are logged */
@@ -862,7 +862,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
 					OffsetNumber *itemnos, int nitems,
 					Relation heapRel)
 {
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	BTPageOpaque opaque;
 
 	/* Shouldn't be called unless there's something to do */
@@ -931,7 +931,7 @@ _bt_is_page_halfdead(Relation rel, BlockNumber blk)
 	bool		result;
 
 	buf = _bt_getbuf(rel, blk, BT_READ);
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 	result = P_ISHALFDEAD(opaque);
@@ -991,7 +991,7 @@ _bt_lock_branch_parent(Relation rel, BlockNumber child, BTStack stack,
 	parent = stack->bts_blkno;
 	poffset = stack->bts_offset;
 
-	page = BufferGetPage(pbuf);
+	page = BufferGetPage(pbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 	maxoff = PageGetMaxOffsetNumber(page);
 
@@ -1035,7 +1035,7 @@ _bt_lock_branch_parent(Relation rel, BlockNumber child, BTStack stack,
 				BTPageOpaque lopaque;
 
 				lbuf = _bt_getbuf(rel, leftsib, BT_READ);
-				lpage = BufferGetPage(lbuf);
+				lpage = BufferGetPage(lbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 				lopaque = (BTPageOpaque) PageGetSpecialPointer(lpage);
 
 				/*
@@ -1126,7 +1126,7 @@ _bt_pagedel(Relation rel, Buffer buf)
 
 	for (;;)
 	{
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 		/*
@@ -1231,7 +1231,7 @@ _bt_pagedel(Relation rel, Buffer buf)
 					Page		lpage;
 
 					lbuf = _bt_getbuf(rel, leftsib, BT_READ);
-					lpage = BufferGetPage(lbuf);
+					lpage = BufferGetPage(lbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 					lopaque = (BTPageOpaque) PageGetSpecialPointer(lpage);
 
 					/*
@@ -1332,7 +1332,7 @@ _bt_mark_page_halfdead(Relation rel, Buffer leafbuf, BTStack stack)
 	IndexTuple	itup;
 	IndexTupleData trunctuple;
 
-	page = BufferGetPage(leafbuf);
+	page = BufferGetPage(leafbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 	Assert(!P_RIGHTMOST(opaque) && !P_ISROOT(opaque) && !P_ISDELETED(opaque) &&
@@ -1385,7 +1385,7 @@ _bt_mark_page_halfdead(Relation rel, Buffer leafbuf, BTStack stack)
 	 * contents.  The test on the next-child downlink is known to sometimes
 	 * fail in the field, though.
 	 */
-	page = BufferGetPage(topparent);
+	page = BufferGetPage(topparent, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 #ifdef USE_ASSERT_CHECKING
@@ -1417,7 +1417,7 @@ _bt_mark_page_halfdead(Relation rel, Buffer leafbuf, BTStack stack)
 	 * to copy the right sibling's downlink over the target downlink, and then
 	 * delete the following item.
 	 */
-	page = BufferGetPage(topparent);
+	page = BufferGetPage(topparent, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 	itemid = PageGetItemId(page, topoff);
@@ -1432,7 +1432,7 @@ _bt_mark_page_halfdead(Relation rel, Buffer leafbuf, BTStack stack)
 	 * highest internal page in the branch we're deleting.  We use the tid of
 	 * the high key to store it.
 	 */
-	page = BufferGetPage(leafbuf);
+	page = BufferGetPage(leafbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 	opaque->btpo_flags |= BTP_HALF_DEAD;
 
@@ -1469,7 +1469,7 @@ _bt_mark_page_halfdead(Relation rel, Buffer leafbuf, BTStack stack)
 		XLogRegisterBuffer(0, leafbuf, REGBUF_WILL_INIT);
 		XLogRegisterBuffer(1, topparent, REGBUF_STANDARD);
 
-		page = BufferGetPage(leafbuf);
+		page = BufferGetPage(leafbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		xlrec.leftblk = opaque->btpo_prev;
 		xlrec.rightblk = opaque->btpo_next;
@@ -1478,9 +1478,9 @@ _bt_mark_page_halfdead(Relation rel, Buffer leafbuf, BTStack stack)
 
 		recptr = XLogInsert(RM_BTREE_ID, XLOG_BTREE_MARK_PAGE_HALFDEAD);
 
-		page = BufferGetPage(topparent);
+		page = BufferGetPage(topparent, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		PageSetLSN(page, recptr);
-		page = BufferGetPage(leafbuf);
+		page = BufferGetPage(leafbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		PageSetLSN(page, recptr);
 	}
 
@@ -1525,7 +1525,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 	ItemPointer leafhikey;
 	BlockNumber nextchild;
 
-	page = BufferGetPage(leafbuf);
+	page = BufferGetPage(leafbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 	Assert(P_ISLEAF(opaque) && P_ISHALFDEAD(opaque));
@@ -1551,7 +1551,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 
 		/* fetch the block number of the topmost parent's left sibling */
 		buf = _bt_getbuf(rel, target, BT_READ);
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		leftsib = opaque->btpo_prev;
 		targetlevel = opaque->btpo.level;
@@ -1589,7 +1589,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 	if (leftsib != P_NONE)
 	{
 		lbuf = _bt_getbuf(rel, leftsib, BT_WRITE);
-		page = BufferGetPage(lbuf);
+		page = BufferGetPage(lbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		while (P_ISDELETED(opaque) || opaque->btpo_next != target)
 		{
@@ -1603,7 +1603,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 				return false;
 			}
 			lbuf = _bt_getbuf(rel, leftsib, BT_WRITE);
-			page = BufferGetPage(lbuf);
+			page = BufferGetPage(lbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		}
 	}
@@ -1616,7 +1616,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 	 * empty page.
 	 */
 	LockBuffer(buf, BT_WRITE);
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 	/*
@@ -1660,7 +1660,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 	 */
 	rightsib = opaque->btpo_next;
 	rbuf = _bt_getbuf(rel, rightsib, BT_WRITE);
-	page = BufferGetPage(rbuf);
+	page = BufferGetPage(rbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 	if (opaque->btpo_prev != target)
 		elog(ERROR, "right sibling's left-link doesn't match: "
@@ -1684,13 +1684,13 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 	 */
 	if (leftsib == P_NONE && rightsib_is_rightmost)
 	{
-		page = BufferGetPage(rbuf);
+		page = BufferGetPage(rbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		if (P_RIGHTMOST(opaque))
 		{
 			/* rightsib will be the only one left on the level */
 			metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
-			metapg = BufferGetPage(metabuf);
+			metapg = BufferGetPage(metabuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			metad = BTPageGetMeta(metapg);
 
 			/*
@@ -1721,12 +1721,12 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 	 */
 	if (BufferIsValid(lbuf))
 	{
-		page = BufferGetPage(lbuf);
+		page = BufferGetPage(lbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		Assert(opaque->btpo_next == target);
 		opaque->btpo_next = rightsib;
 	}
-	page = BufferGetPage(rbuf);
+	page = BufferGetPage(rbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 	Assert(opaque->btpo_prev == target);
 	opaque->btpo_prev = leftsib;
@@ -1754,7 +1754,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 	 * will continue to do so, holding back RecentGlobalXmin, for the duration
 	 * of that scan.
 	 */
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 	opaque->btpo_flags &= ~BTP_HALF_DEAD;
 	opaque->btpo_flags |= BTP_DELETED;
@@ -1826,18 +1826,18 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
 		{
 			PageSetLSN(metapg, recptr);
 		}
-		page = BufferGetPage(rbuf);
+		page = BufferGetPage(rbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		PageSetLSN(page, recptr);
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		PageSetLSN(page, recptr);
 		if (BufferIsValid(lbuf))
 		{
-			page = BufferGetPage(lbuf);
+			page = BufferGetPage(lbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			PageSetLSN(page, recptr);
 		}
 		if (target != leafblkno)
 		{
-			page = BufferGetPage(leafbuf);
+			page = BufferGetPage(leafbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			PageSetLSN(page, recptr);
 		}
 	}
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index f2905cb..a1b5798 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -913,7 +913,7 @@ restart:
 	buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
 							 info->strategy);
 	LockBuffer(buf, BT_READ);
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	if (!PageIsNew(page))
 	{
 		_bt_checkpage(rel, buf);
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 14dffe0..761e014 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -127,7 +127,7 @@ _bt_search(Relation rel, int keysz, ScanKey scankey, bool nextkey,
 							  BT_READ);
 
 		/* if this is a leaf page, we're done */
-		page = BufferGetPage(*bufP);
+		page = BufferGetPage(*bufP, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		if (P_ISLEAF(opaque))
 			break;
@@ -231,7 +231,7 @@ _bt_moveright(Relation rel,
 
 	for (;;)
 	{
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 		if (P_RIGHTMOST(opaque))
@@ -319,7 +319,7 @@ _bt_binsrch(Relation rel,
 	int32		result,
 				cmpval;
 
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 	low = P_FIRSTDATAKEY(opaque);
@@ -1141,7 +1141,7 @@ _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
 	 */
 	Assert(BufferIsValid(so->currPos.buf));
 
-	page = BufferGetPage(so->currPos.buf);
+	page = BufferGetPage(so->currPos.buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 	minoff = P_FIRSTDATAKEY(opaque);
 	maxoff = PageGetMaxOffsetNumber(page);
@@ -1335,7 +1335,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 			/* step right one page */
 			so->currPos.buf = _bt_getbuf(rel, blkno, BT_READ);
 			/* check for deleted page */
-			page = BufferGetPage(so->currPos.buf);
+			page = BufferGetPage(so->currPos.buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 			if (!P_IGNORE(opaque))
 			{
@@ -1408,7 +1408,7 @@ _bt_steppage(IndexScanDesc scan, ScanDirection dir)
 			 * it's not half-dead and contains matching tuples. Else loop back
 			 * and do it all again.
 			 */
-			page = BufferGetPage(so->currPos.buf);
+			page = BufferGetPage(so->currPos.buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 			if (!P_IGNORE(opaque))
 			{
@@ -1447,7 +1447,7 @@ _bt_walk_left(Relation rel, Buffer buf)
 	Page		page;
 	BTPageOpaque opaque;
 
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 	for (;;)
@@ -1471,7 +1471,7 @@ _bt_walk_left(Relation rel, Buffer buf)
 		/* check for interrupts while we're not holding any buffer lock */
 		CHECK_FOR_INTERRUPTS();
 		buf = _bt_getbuf(rel, blkno, BT_READ);
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 		/*
@@ -1497,13 +1497,13 @@ _bt_walk_left(Relation rel, Buffer buf)
 				break;
 			blkno = opaque->btpo_next;
 			buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
-			page = BufferGetPage(buf);
+			page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		}
 
 		/* Return to the original page to see what's up */
 		buf = _bt_relandgetbuf(rel, buf, obknum, BT_READ);
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		if (P_ISDELETED(opaque))
 		{
@@ -1520,7 +1520,7 @@ _bt_walk_left(Relation rel, Buffer buf)
 						 RelationGetRelationName(rel));
 				blkno = opaque->btpo_next;
 				buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
-				page = BufferGetPage(buf);
+				page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 				opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 				if (!P_ISDELETED(opaque))
 					break;
@@ -1579,7 +1579,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
 	if (!BufferIsValid(buf))
 		return InvalidBuffer;
 
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 
 	for (;;)
@@ -1598,7 +1598,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
 				elog(ERROR, "fell off the end of index \"%s\"",
 					 RelationGetRelationName(rel));
 			buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
-			page = BufferGetPage(buf);
+			page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 		}
 
@@ -1619,7 +1619,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
 		blkno = ItemPointerGetBlockNumber(&(itup->t_tid));
 
 		buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 	}
 
@@ -1665,7 +1665,7 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir)
 	}
 
 	PredicateLockPage(rel, BufferGetBlockNumber(buf), scan->xs_snapshot);
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 	Assert(P_ISLEAF(opaque));
 
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 83c553c..edd36f9 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -1756,7 +1756,7 @@ _bt_killitems(IndexScanDesc scan)
 		 */
 		LockBuffer(so->currPos.buf, BT_READ);
 
-		page = BufferGetPage(so->currPos.buf);
+		page = BufferGetPage(so->currPos.buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	}
 	else
 	{
@@ -1769,7 +1769,7 @@ _bt_killitems(IndexScanDesc scan)
 		if (!BufferIsValid(buf))
 			return;
 
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		if (PageGetLSN(page) == so->currPos.lsn)
 			so->currPos.buf = buf;
 		else
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 0d094ca..c4ade48 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -89,7 +89,7 @@ _bt_restore_meta(XLogReaderState *record, uint8 block_id)
 	Assert(len == sizeof(xl_btree_metadata));
 	Assert(BufferGetBlockNumber(metabuf) == BTREE_METAPAGE);
 	xlrec = (xl_btree_metadata *) ptr;
-	metapg = BufferGetPage(metabuf);
+	metapg = BufferGetPage(metabuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	_bt_pageinit(metapg, BufferGetPageSize(metabuf));
 
@@ -130,7 +130,8 @@ _bt_clear_incomplete_split(XLogReaderState *record, uint8 block_id)
 
 	if (XLogReadBufferForRedo(record, block_id, &buf) == BLK_NEEDS_REDO)
 	{
-		Page		page = (Page) BufferGetPage(buf);
+		Page		page = BufferGetPage(buf, NULL, NULL,
+										 BGP_NO_SNAPSHOT_TEST);
 		BTPageOpaque pageop = (BTPageOpaque) PageGetSpecialPointer(page);
 
 		Assert((pageop->btpo_flags & BTP_INCOMPLETE_SPLIT) != 0);
@@ -167,7 +168,7 @@ btree_xlog_insert(bool isleaf, bool ismeta, XLogReaderState *record)
 		Size		datalen;
 		char	   *datapos = XLogRecGetBlockData(record, 0, &datalen);
 
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		if (PageAddItem(page, (Item) datapos, datalen, xlrec->offnum,
 						false, false) == InvalidOffsetNumber)
@@ -224,7 +225,7 @@ btree_xlog_split(bool onleft, bool isroot, XLogReaderState *record)
 	/* Reconstruct right (new) sibling page from scratch */
 	rbuf = XLogInitBufferForRedo(record, 1);
 	datapos = XLogRecGetBlockData(record, 1, &datalen);
-	rpage = (Page) BufferGetPage(rbuf);
+	rpage = BufferGetPage(rbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	_bt_pageinit(rpage, BufferGetPageSize(rbuf));
 	ropaque = (BTPageOpaque) PageGetSpecialPointer(rpage);
@@ -266,7 +267,8 @@ btree_xlog_split(bool onleft, bool isroot, XLogReaderState *record)
 		 * but it helps debugging.  See also _bt_restore_page(), which does
 		 * the same for the right page.
 		 */
-		Page		lpage = (Page) BufferGetPage(lbuf);
+		Page		lpage = BufferGetPage(lbuf, NULL, NULL,
+										  BGP_NO_SNAPSHOT_TEST);
 		BTPageOpaque lopaque = (BTPageOpaque) PageGetSpecialPointer(lpage);
 		OffsetNumber off;
 		Item		newitem = NULL;
@@ -368,7 +370,8 @@ btree_xlog_split(bool onleft, bool isroot, XLogReaderState *record)
 
 		if (XLogReadBufferForRedo(record, 2, &buffer) == BLK_NEEDS_REDO)
 		{
-			Page		page = (Page) BufferGetPage(buffer);
+			Page		page = BufferGetPage(buffer, NULL, NULL,
+											 BGP_NO_SNAPSHOT_TEST);
 			BTPageOpaque pageop = (BTPageOpaque) PageGetSpecialPointer(page);
 
 			pageop->btpo_prev = rightsib;
@@ -471,7 +474,7 @@ btree_xlog_vacuum(XLogReaderState *record)
 
 		ptr = XLogRecGetBlockData(record, 0, &len);
 
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		if (len > 0)
 		{
@@ -565,7 +568,7 @@ btree_xlog_delete_get_latestRemovedXid(XLogReaderState *record)
 	if (!BufferIsValid(ibuffer))
 		return InvalidTransactionId;
 	LockBuffer(ibuffer, BT_READ);
-	ipage = (Page) BufferGetPage(ibuffer);
+	ipage = BufferGetPage(ibuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	/*
 	 * Loop through the deleted index items to obtain the TransactionId from
@@ -592,7 +595,7 @@ btree_xlog_delete_get_latestRemovedXid(XLogReaderState *record)
 			return InvalidTransactionId;
 		}
 		LockBuffer(hbuffer, BUFFER_LOCK_SHARE);
-		hpage = (Page) BufferGetPage(hbuffer);
+		hpage = BufferGetPage(hbuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		/*
 		 * Look up the heap tuple header that the index tuple points at by
@@ -688,7 +691,7 @@ btree_xlog_delete(XLogReaderState *record)
 	 */
 	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
 	{
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		if (XLogRecGetDataLen(record) > SizeOfBtreeDelete)
 		{
@@ -740,7 +743,7 @@ btree_xlog_mark_page_halfdead(uint8 info, XLogReaderState *record)
 		OffsetNumber nextoffset;
 		BlockNumber rightsib;
 
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		pageop = (BTPageOpaque) PageGetSpecialPointer(page);
 
 		poffset = xlrec->poffset;
@@ -764,7 +767,7 @@ btree_xlog_mark_page_halfdead(uint8 info, XLogReaderState *record)
 
 	/* Rewrite the leaf page as a halfdead page */
 	buffer = XLogInitBufferForRedo(record, 0);
-	page = (Page) BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	_bt_pageinit(page, BufferGetPageSize(buffer));
 	pageop = (BTPageOpaque) PageGetSpecialPointer(page);
@@ -820,7 +823,7 @@ btree_xlog_unlink_page(uint8 info, XLogReaderState *record)
 	/* Fix left-link of right sibling */
 	if (XLogReadBufferForRedo(record, 2, &buffer) == BLK_NEEDS_REDO)
 	{
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		pageop = (BTPageOpaque) PageGetSpecialPointer(page);
 		pageop->btpo_prev = leftsib;
 
@@ -835,7 +838,7 @@ btree_xlog_unlink_page(uint8 info, XLogReaderState *record)
 	{
 		if (XLogReadBufferForRedo(record, 1, &buffer) == BLK_NEEDS_REDO)
 		{
-			page = (Page) BufferGetPage(buffer);
+			page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			pageop = (BTPageOpaque) PageGetSpecialPointer(page);
 			pageop->btpo_next = rightsib;
 
@@ -848,7 +851,7 @@ btree_xlog_unlink_page(uint8 info, XLogReaderState *record)
 
 	/* Rewrite target page as empty deleted page */
 	buffer = XLogInitBufferForRedo(record, 0);
-	page = (Page) BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	_bt_pageinit(page, BufferGetPageSize(buffer));
 	pageop = (BTPageOpaque) PageGetSpecialPointer(page);
@@ -877,7 +880,7 @@ btree_xlog_unlink_page(uint8 info, XLogReaderState *record)
 		IndexTupleData trunctuple;
 
 		buffer = XLogInitBufferForRedo(record, 3);
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		_bt_pageinit(page, BufferGetPageSize(buffer));
 		pageop = (BTPageOpaque) PageGetSpecialPointer(page);
@@ -921,7 +924,7 @@ btree_xlog_newroot(XLogReaderState *record)
 	Size		len;
 
 	buffer = XLogInitBufferForRedo(record, 0);
-	page = (Page) BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	_bt_pageinit(page, BufferGetPageSize(buffer));
 	pageop = (BTPageOpaque) PageGetSpecialPointer(page);
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index f090ca5..b780bfe 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -451,7 +451,7 @@ moveLeafs(Relation index, SpGistState *state,
 	/* Find a leaf page that will hold them */
 	nbuf = SpGistGetBuffer(index, GBUF_LEAF | (isNulls ? GBUF_NULLS : 0),
 						   size, &xlrec.newPage);
-	npage = BufferGetPage(nbuf);
+	npage = BufferGetPage(nbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	nblkno = BufferGetBlockNumber(nbuf);
 	Assert(nblkno != current->blkno);
 
@@ -1037,7 +1037,8 @@ doPickSplit(Relation index, SpGistState *state,
 		nodePageSelect = (uint8 *) palloc(sizeof(uint8) * out.nNodes);
 
 		curspace = currentFreeSpace;
-		newspace = PageGetExactFreeSpace(BufferGetPage(newLeafBuffer));
+		newspace = PageGetExactFreeSpace
+			(BufferGetPage(newLeafBuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST));
 		for (i = 0; i < out.nNodes; i++)
 		{
 			if (leafSizes[i] <= curspace)
@@ -1070,7 +1071,9 @@ doPickSplit(Relation index, SpGistState *state,
 
 			/* Repeat the node assignment process --- should succeed now */
 			curspace = currentFreeSpace;
-			newspace = PageGetExactFreeSpace(BufferGetPage(newLeafBuffer));
+			newspace = PageGetExactFreeSpace
+				(BufferGetPage(newLeafBuffer, NULL, NULL,
+							   BGP_NO_SNAPSHOT_TEST));
 			for (i = 0; i < out.nNodes; i++)
 			{
 				if (leafSizes[i] <= curspace)
@@ -1201,7 +1204,9 @@ doPickSplit(Relation index, SpGistState *state,
 			it->nextOffset = InvalidOffsetNumber;
 
 		/* Insert it on page */
-		newoffset = SpGistPageAddNewItem(state, BufferGetPage(leafBuffer),
+		newoffset = SpGistPageAddNewItem(state,
+										 BufferGetPage(leafBuffer, NULL, NULL,
+													   BGP_NO_SNAPSHOT_TEST),
 										 (Item) it, it->size,
 										 &startOffsets[leafPageSelect[i]],
 										 false);
@@ -1275,7 +1280,8 @@ doPickSplit(Relation index, SpGistState *state,
 		/* Repoint "current" at the new inner tuple */
 		current->buffer = newInnerBuffer;
 		current->blkno = BufferGetBlockNumber(current->buffer);
-		current->page = BufferGetPage(current->buffer);
+		current->page = BufferGetPage(current->buffer, NULL, NULL,
+									  BGP_NO_SNAPSHOT_TEST);
 		xlrec.offnumInner = current->offnum =
 			SpGistPageAddNewItem(state, current->page,
 								 (Item) innerTuple, innerTuple->size,
@@ -1391,24 +1397,22 @@ doPickSplit(Relation index, SpGistState *state,
 		/* Update page LSNs on all affected pages */
 		if (newLeafBuffer != InvalidBuffer)
 		{
-			Page		page = BufferGetPage(newLeafBuffer);
-
+			Page		page = BufferGetPage(newLeafBuffer, NULL, NULL,
+											 BGP_NO_SNAPSHOT_TEST);
 			PageSetLSN(page, recptr);
 		}
 
 		if (saveCurrent.buffer != InvalidBuffer)
 		{
-			Page		page = BufferGetPage(saveCurrent.buffer);
-
+			Page		page = BufferGetPage(saveCurrent.buffer, NULL, NULL,
+											 BGP_NO_SNAPSHOT_TEST);
 			PageSetLSN(page, recptr);
 		}
 
 		PageSetLSN(current->page, recptr);
 
 		if (parent->buffer != InvalidBuffer)
-		{
 			PageSetLSN(parent->page, recptr);
-		}
 	}
 
 	END_CRIT_SECTION();
@@ -1578,7 +1582,8 @@ spgAddNodeAction(Relation index, SpGistState *state,
 									newInnerTuple->size + sizeof(ItemIdData),
 										  &xlrec.newPage);
 		current->blkno = BufferGetBlockNumber(current->buffer);
-		current->page = BufferGetPage(current->buffer);
+		current->page = BufferGetPage(current->buffer, NULL, NULL,
+									  BGP_NO_SNAPSHOT_TEST);
 
 		/*
 		 * Let's just make real sure new current isn't same as old.  Right now
@@ -1793,7 +1798,9 @@ spgSplitNodeAction(Relation index, SpGistState *state,
 	{
 		postfixBlkno = BufferGetBlockNumber(newBuffer);
 		xlrec.offnumPostfix = postfixOffset =
-			SpGistPageAddNewItem(state, BufferGetPage(newBuffer),
+			SpGistPageAddNewItem(state,
+								 BufferGetPage(newBuffer, NULL, NULL,
+											   BGP_NO_SNAPSHOT_TEST),
 								 (Item) postfixTuple, postfixTuple->size,
 								 NULL, false);
 		MarkBufferDirty(newBuffer);
@@ -1840,7 +1847,8 @@ spgSplitNodeAction(Relation index, SpGistState *state,
 
 		if (newBuffer != InvalidBuffer)
 		{
-			PageSetLSN(BufferGetPage(newBuffer), recptr);
+			PageSetLSN(BufferGetPage(newBuffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST), recptr);
 		}
 	}
 
@@ -1984,7 +1992,8 @@ spgdoinsert(Relation index, SpGistState *state,
 			/* inner tuple can be stored on the same page as parent one */
 			current.buffer = parent.buffer;
 		}
-		current.page = BufferGetPage(current.buffer);
+		current.page = BufferGetPage(current.buffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST);
 
 		/* should not arrive at a page of the wrong type */
 		if (isnull ? !SpGistPageStoresNulls(current.page) :
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index 44fd644..3e16b51 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -92,7 +92,8 @@ spgbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 
 	START_CRIT_SECTION();
 
-	SpGistInitMetapage(BufferGetPage(metabuffer));
+	SpGistInitMetapage(BufferGetPage(metabuffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST));
 	MarkBufferDirty(metabuffer);
 	SpGistInitBuffer(rootbuffer, SPGIST_LEAF);
 	MarkBufferDirty(rootbuffer);
@@ -115,9 +116,12 @@ spgbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 
 		recptr = XLogInsert(RM_SPGIST_ID, XLOG_SPGIST_CREATE_INDEX);
 
-		PageSetLSN(BufferGetPage(metabuffer), recptr);
-		PageSetLSN(BufferGetPage(rootbuffer), recptr);
-		PageSetLSN(BufferGetPage(nullbuffer), recptr);
+		PageSetLSN(BufferGetPage(metabuffer, NULL, NULL,
+								 BGP_NO_SNAPSHOT_TEST), recptr);
+		PageSetLSN(BufferGetPage(rootbuffer, NULL, NULL,
+								 BGP_NO_SNAPSHOT_TEST), recptr);
+		PageSetLSN(BufferGetPage(nullbuffer, NULL, NULL,
+								 BGP_NO_SNAPSHOT_TEST), recptr);
 	}
 
 	END_CRIT_SECTION();
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 8aa28ec..c656cdb 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -341,7 +341,7 @@ redirect:
 		}
 		/* else new pointer points to the same page, no work needed */
 
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		isnull = SpGistPageStoresNulls(page) ? true : false;
 
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 201203f..f4bcbee 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -126,7 +126,8 @@ spgGetCache(Relation index)
 		metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
 		LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
 
-		metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
+		metadata = SpGistPageGetMeta
+			(BufferGetPage(metabuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST));
 
 		if (metadata->magicNumber != SPGIST_MAGIC_NUMBER)
 			elog(ERROR, "index \"%s\" is not an SP-GiST index",
@@ -206,7 +207,8 @@ SpGistNewBuffer(Relation index)
 		 */
 		if (ConditionalLockBuffer(buffer))
 		{
-			Page		page = BufferGetPage(buffer);
+			Page		page = BufferGetPage(buffer, NULL, NULL,
+											 BGP_NO_SNAPSHOT_TEST);
 
 			if (PageIsNew(page))
 				return buffer;	/* OK to use, if never initialized */
@@ -256,7 +258,8 @@ SpGistUpdateMetaPage(Relation index)
 
 		if (ConditionalLockBuffer(metabuffer))
 		{
-			metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
+			metadata = SpGistPageGetMeta
+				(BufferGetPage(metabuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST));
 			metadata->lastUsedPages = cache->lastUsedPages;
 
 			MarkBufferDirty(metabuffer);
@@ -333,7 +336,9 @@ allocNewBuffer(Relation index, int flags)
 					blkFlags |= GBUF_NULLS;
 				cache->lastUsedPages.cachedPage[blkFlags].blkno = blkno;
 				cache->lastUsedPages.cachedPage[blkFlags].freeSpace =
-					PageGetExactFreeSpace(BufferGetPage(buffer));
+					PageGetExactFreeSpace
+						(BufferGetPage(buffer, NULL, NULL,
+									   BGP_NO_SNAPSHOT_TEST));
 				UnlockReleaseBuffer(buffer);
 			}
 		}
@@ -401,7 +406,7 @@ SpGistGetBuffer(Relation index, int flags, int needSpace, bool *isNew)
 			return allocNewBuffer(index, flags);
 		}
 
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		if (PageIsNew(page) || SpGistPageIsDeleted(page) || PageIsEmpty(page))
 		{
@@ -460,7 +465,7 @@ SpGistSetLastUsedPage(Relation index, Buffer buffer)
 	SpGistCache *cache = spgGetCache(index);
 	SpGistLastUsedPage *lup;
 	int			freeSpace;
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	BlockNumber blkno = BufferGetBlockNumber(buffer);
 	int			flags;
 
@@ -508,7 +513,7 @@ void
 SpGistInitBuffer(Buffer b, uint16 f)
 {
 	Assert(BufferGetPageSize(b) == BLCKSZ);
-	SpGistInitPage(BufferGetPage(b), f);
+	SpGistInitPage(BufferGetPage(b, NULL, NULL, BGP_NO_SNAPSHOT_TEST), f);
 }
 
 /*
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 15b867f..6b57790 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -125,7 +125,8 @@ static void
 vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
 			   bool forPending)
 {
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST);
 	spgxlogVacuumLeaf xlrec;
 	OffsetNumber toDead[MaxIndexTuplesPerPage];
 	OffsetNumber toPlaceholder[MaxIndexTuplesPerPage];
@@ -405,7 +406,8 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
 static void
 vacuumLeafRoot(spgBulkDeleteState *bds, Relation index, Buffer buffer)
 {
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST);
 	spgxlogVacuumRoot xlrec;
 	OffsetNumber toDelete[MaxIndexTuplesPerPage];
 	OffsetNumber i,
@@ -490,7 +492,8 @@ vacuumLeafRoot(spgBulkDeleteState *bds, Relation index, Buffer buffer)
 static void
 vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
 {
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST);
 	SpGistPageOpaque opaque = SpGistPageGetOpaque(page);
 	OffsetNumber i,
 				max = PageGetMaxOffsetNumber(page),
@@ -615,7 +618,7 @@ spgvacuumpage(spgBulkDeleteState *bds, BlockNumber blkno)
 	buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
 								RBM_NORMAL, bds->info->strategy);
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
-	page = (Page) BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	if (PageIsNew(page))
 	{
@@ -696,7 +699,7 @@ spgprocesspending(spgBulkDeleteState *bds)
 		buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
 									RBM_NORMAL, bds->info->strategy);
 		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
-		page = (Page) BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		if (PageIsNew(page) || SpGistPageIsDeleted(page))
 		{
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 01a4e0f..b5fc266 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -79,7 +79,7 @@ spgRedoCreateIndex(XLogReaderState *record)
 
 	buffer = XLogInitBufferForRedo(record, 0);
 	Assert(BufferGetBlockNumber(buffer) == SPGIST_METAPAGE_BLKNO);
-	page = (Page) BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	SpGistInitMetapage(page);
 	PageSetLSN(page, lsn);
 	MarkBufferDirty(buffer);
@@ -88,7 +88,7 @@ spgRedoCreateIndex(XLogReaderState *record)
 	buffer = XLogInitBufferForRedo(record, 1);
 	Assert(BufferGetBlockNumber(buffer) == SPGIST_ROOT_BLKNO);
 	SpGistInitBuffer(buffer, SPGIST_LEAF);
-	page = (Page) BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	PageSetLSN(page, lsn);
 	MarkBufferDirty(buffer);
 	UnlockReleaseBuffer(buffer);
@@ -96,7 +96,7 @@ spgRedoCreateIndex(XLogReaderState *record)
 	buffer = XLogInitBufferForRedo(record, 2);
 	Assert(BufferGetBlockNumber(buffer) == SPGIST_NULL_BLKNO);
 	SpGistInitBuffer(buffer, SPGIST_LEAF | SPGIST_NULLS);
-	page = (Page) BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	PageSetLSN(page, lsn);
 	MarkBufferDirty(buffer);
 	UnlockReleaseBuffer(buffer);
@@ -136,7 +136,7 @@ spgRedoAddLeaf(XLogReaderState *record)
 
 	if (action == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		/* insert new tuple */
 		if (xldata->offnumLeaf != xldata->offnumHeadLeaf)
@@ -183,7 +183,7 @@ spgRedoAddLeaf(XLogReaderState *record)
 
 			XLogRecGetBlockTag(record, 0, NULL, NULL, &blknoLeaf);
 
-			page = BufferGetPage(buffer);
+			page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 			tuple = (SpGistInnerTuple) PageGetItem(page,
 								  PageGetItemId(page, xldata->offnumParent));
@@ -249,7 +249,7 @@ spgRedoMoveLeafs(XLogReaderState *record)
 	{
 		int			i;
 
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		for (i = 0; i < nInsert; i++)
 		{
@@ -278,7 +278,7 @@ spgRedoMoveLeafs(XLogReaderState *record)
 	/* Delete tuples from the source page, inserting a redirection pointer */
 	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		spgPageIndexMultiDelete(&state, page, toDelete, xldata->nMoves,
 						state.isBuild ? SPGIST_PLACEHOLDER : SPGIST_REDIRECT,
@@ -297,7 +297,7 @@ spgRedoMoveLeafs(XLogReaderState *record)
 	{
 		SpGistInnerTuple tuple;
 
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		tuple = (SpGistInnerTuple) PageGetItem(page,
 								  PageGetItemId(page, xldata->offnumParent));
@@ -338,7 +338,7 @@ spgRedoAddNode(XLogReaderState *record)
 		Assert(xldata->parentBlk == -1);
 		if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
 		{
-			page = BufferGetPage(buffer);
+			page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 			PageIndexTupleDelete(page, xldata->offnum);
 			if (PageAddItem(page, (Item) innerTuple, innerTupleHdr.size,
@@ -381,7 +381,7 @@ spgRedoAddNode(XLogReaderState *record)
 			action = XLogReadBufferForRedo(record, 1, &buffer);
 		if (action == BLK_NEEDS_REDO)
 		{
-			page = BufferGetPage(buffer);
+			page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 			addOrReplaceTuple(page, (Item) innerTuple,
 							  innerTupleHdr.size, xldata->offnumNew);
@@ -410,7 +410,7 @@ spgRedoAddNode(XLogReaderState *record)
 		{
 			SpGistDeadTuple dt;
 
-			page = BufferGetPage(buffer);
+			page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 			if (state.isBuild)
 				dt = spgFormDeadTuple(&state, SPGIST_PLACEHOLDER,
@@ -462,7 +462,7 @@ spgRedoAddNode(XLogReaderState *record)
 			{
 				SpGistInnerTuple parentTuple;
 
-				page = BufferGetPage(buffer);
+				page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 				parentTuple = (SpGistInnerTuple) PageGetItem(page,
 								  PageGetItemId(page, xldata->offnumParent));
@@ -522,7 +522,7 @@ spgRedoSplitTuple(XLogReaderState *record)
 			action = XLogReadBufferForRedo(record, 1, &buffer);
 		if (action == BLK_NEEDS_REDO)
 		{
-			page = BufferGetPage(buffer);
+			page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 			addOrReplaceTuple(page, (Item) postfixTuple,
 							  postfixTupleHdr.size, xldata->offnumPostfix);
@@ -537,7 +537,7 @@ spgRedoSplitTuple(XLogReaderState *record)
 	/* now handle the original page */
 	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		PageIndexTupleDelete(page, xldata->offnumPrefix);
 		if (PageAddItem(page, (Item) prefixTuple, prefixTupleHdr.size,
@@ -608,7 +608,7 @@ spgRedoPickSplit(XLogReaderState *record)
 	{
 		/* just re-init the source page */
 		srcBuffer = XLogInitBufferForRedo(record, 0);
-		srcPage = (Page) BufferGetPage(srcBuffer);
+		srcPage = BufferGetPage(srcBuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		SpGistInitBuffer(srcBuffer,
 					 SPGIST_LEAF | (xldata->storesNulls ? SPGIST_NULLS : 0));
@@ -625,7 +625,7 @@ spgRedoPickSplit(XLogReaderState *record)
 		srcPage = NULL;
 		if (XLogReadBufferForRedo(record, 0, &srcBuffer) == BLK_NEEDS_REDO)
 		{
-			srcPage = BufferGetPage(srcBuffer);
+			srcPage = BufferGetPage(srcBuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 			/*
 			 * We have it a bit easier here than in doPickSplit(), because we
@@ -661,7 +661,8 @@ spgRedoPickSplit(XLogReaderState *record)
 	{
 		/* just re-init the dest page */
 		destBuffer = XLogInitBufferForRedo(record, 1);
-		destPage = (Page) BufferGetPage(destBuffer);
+		destPage = BufferGetPage(destBuffer, NULL, NULL,
+								 BGP_NO_SNAPSHOT_TEST);
 
 		SpGistInitBuffer(destBuffer,
 					 SPGIST_LEAF | (xldata->storesNulls ? SPGIST_NULLS : 0));
@@ -674,7 +675,8 @@ spgRedoPickSplit(XLogReaderState *record)
 		 * full-page-image case, but for safety let's hold it till later.
 		 */
 		if (XLogReadBufferForRedo(record, 1, &destBuffer) == BLK_NEEDS_REDO)
-			destPage = (Page) BufferGetPage(destBuffer);
+			destPage = BufferGetPage(destBuffer, NULL, NULL,
+									 BGP_NO_SNAPSHOT_TEST);
 		else
 			destPage = NULL;	/* don't do any page updates */
 	}
@@ -722,7 +724,7 @@ spgRedoPickSplit(XLogReaderState *record)
 
 	if (action == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(innerBuffer);
+		page = BufferGetPage(innerBuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		addOrReplaceTuple(page, (Item) innerTuple, innerTupleHdr.size,
 						  xldata->offnumInner);
@@ -762,7 +764,8 @@ spgRedoPickSplit(XLogReaderState *record)
 		{
 			SpGistInnerTuple parent;
 
-			page = BufferGetPage(parentBuffer);
+			page = BufferGetPage(parentBuffer, NULL, NULL,
+								 BGP_NO_SNAPSHOT_TEST);
 
 			parent = (SpGistInnerTuple) PageGetItem(page,
 								  PageGetItemId(page, xldata->offnumParent));
@@ -813,7 +816,7 @@ spgRedoVacuumLeaf(XLogReaderState *record)
 
 	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		spgPageIndexMultiDelete(&state, page,
 								toDead, xldata->nDead,
@@ -876,7 +879,7 @@ spgRedoVacuumRoot(XLogReaderState *record)
 
 	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		/* The tuple numbers are in order */
 		PageIndexMultiDelete(page, toDelete, xldata->nDelete);
@@ -917,7 +920,8 @@ spgRedoVacuumRedirect(XLogReaderState *record)
 
 	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
 	{
-		Page		page = BufferGetPage(buffer);
+		Page		page = BufferGetPage(buffer, NULL, NULL,
+										 BGP_NO_SNAPSHOT_TEST);
 		SpGistPageOpaque opaque = SpGistPageGetOpaque(page);
 		int			i;
 
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index c37003a..1e336ed 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -228,7 +228,7 @@ XLogRegisterBuffer(uint8 block_id, Buffer buffer, uint8 flags)
 	regbuf = &registered_buffers[block_id];
 
 	BufferGetTag(buffer, &regbuf->rnode, &regbuf->forkno, &regbuf->block);
-	regbuf->page = BufferGetPage(buffer);
+	regbuf->page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	regbuf->flags = flags;
 	regbuf->rdata_tail = (XLogRecData *) &regbuf->rdata_head;
 	regbuf->rdata_len = 0;
@@ -825,7 +825,7 @@ XLogCheckBufferNeedsBackup(Buffer buffer)
 
 	GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
 
-	page = BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	if (doPageWrites && PageGetLSN(page) <= RedoRecPtr)
 		return true;			/* buffer requires backup */
@@ -896,7 +896,7 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
 		if (buffer_std)
 		{
 			/* Assume we can omit data between pd_lower and pd_upper */
-			Page		page = BufferGetPage(buffer);
+			Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 			uint16		lower = ((PageHeader) page)->pd_lower;
 			uint16		upper = ((PageHeader) page)->pd_upper;
 
@@ -973,7 +973,7 @@ log_newpage(RelFileNode *rnode, ForkNumber forkNum, BlockNumber blkno,
 XLogRecPtr
 log_newpage_buffer(Buffer buffer, bool page_std)
 {
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	RelFileNode rnode;
 	ForkNumber	forkNum;
 	BlockNumber blkno;
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index f6ca2b9..c3213ac 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -358,7 +358,7 @@ XLogReadBufferForRedoExtended(XLogReaderState *record,
 	{
 		*buf = XLogReadBufferExtended(rnode, forknum, blkno,
 		   get_cleanup_lock ? RBM_ZERO_AND_CLEANUP_LOCK : RBM_ZERO_AND_LOCK);
-		page = BufferGetPage(*buf);
+		page = BufferGetPage(*buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		if (!RestoreBlockImage(record, block_id, page))
 			elog(ERROR, "failed to restore block image");
 
@@ -396,7 +396,8 @@ XLogReadBufferForRedoExtended(XLogReaderState *record,
 				else
 					LockBuffer(*buf, BUFFER_LOCK_EXCLUSIVE);
 			}
-			if (lsn <= PageGetLSN(BufferGetPage(*buf)))
+			if (lsn <= PageGetLSN(BufferGetPage(*buf, NULL, NULL,
+												BGP_NO_SNAPSHOT_TEST)))
 				return BLK_DONE;
 			else
 				return BLK_NEEDS_REDO;
@@ -502,7 +503,8 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	if (mode == RBM_NORMAL)
 	{
 		/* check that page has been initialized */
-		Page		page = (Page) BufferGetPage(buffer);
+		Page		page = BufferGetPage(buffer, NULL, NULL,
+										 BGP_NO_SNAPSHOT_TEST);
 
 		/*
 		 * We assume that PageIsNew is safe without a lock. During recovery,
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 31a1438..f8398dd 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2306,7 +2306,7 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 		 */
 		if (scan->rs_cblock != root_blkno)
 		{
-			Page		page = BufferGetPage(scan->rs_cbuf);
+			Page		page = BufferGetPage(scan->rs_cbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 			heap_get_root_tuples(page, root_offsets);
@@ -3016,7 +3016,7 @@ validate_index_heapscan(Relation heapRelation,
 		 */
 		if (scan->rs_cblock != root_blkno)
 		{
-			Page		page = BufferGetPage(scan->rs_cbuf);
+			Page		page = BufferGetPage(scan->rs_cbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 			heap_get_root_tuples(page, root_offsets);
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8a5f07c..849ebee 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1025,7 +1025,7 @@ acquire_sample_rows(Relation onerel, int elevel,
 		targbuffer = ReadBufferExtended(onerel, MAIN_FORKNUM, targblock,
 										RBM_NORMAL, vac_strategy);
 		LockBuffer(targbuffer, BUFFER_LOCK_SHARE);
-		targpage = BufferGetPage(targbuffer);
+		targpage = BufferGetPage(targbuffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		maxoffset = PageGetMaxOffsetNumber(targpage);
 
 		/* Inner loop over all tuples on the selected page */
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index c98f981..f38126f 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -337,7 +337,7 @@ fill_seq_with_data(Relation rel, HeapTuple tuple)
 	buf = ReadBuffer(rel, P_NEW);
 	Assert(BufferGetBlockNumber(buf) == 0);
 
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	PageInit(page, BufferGetPageSize(buf), sizeof(sequence_magic));
 	sm = (sequence_magic *) PageGetSpecialPointer(page);
@@ -462,7 +462,7 @@ AlterSequence(AlterSeqStmt *stmt)
 	{
 		xl_seq_rec	xlrec;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buf);
+		Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		XLogBeginInsert();
 		XLogRegisterBuffer(0, buf, REGBUF_WILL_INIT);
@@ -584,7 +584,7 @@ nextval_internal(Oid relid)
 
 	/* lock page' buffer and read tuple */
 	seq = read_seq_tuple(elm, seqrel, &buf, &seqtuple);
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	last = next = result = seq->last_value;
 	incby = seq->increment_by;
@@ -923,7 +923,7 @@ do_setval(Oid relid, int64 next, bool iscalled)
 	{
 		xl_seq_rec	xlrec;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buf);
+		Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		XLogBeginInsert();
 		XLogRegisterBuffer(0, buf, REGBUF_WILL_INIT);
@@ -1115,7 +1115,7 @@ read_seq_tuple(SeqTable elm, Relation rel, Buffer *buf, HeapTuple seqtuple)
 	*buf = ReadBuffer(rel, 0);
 	LockBuffer(*buf, BUFFER_LOCK_EXCLUSIVE);
 
-	page = BufferGetPage(*buf);
+	page = BufferGetPage(*buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	sm = (sequence_magic *) PageGetSpecialPointer(page);
 
 	if (sm->magic != SEQ_MAGIC)
@@ -1591,7 +1591,7 @@ seq_redo(XLogReaderState *record)
 		elog(PANIC, "seq_redo: unknown op code %u", info);
 
 	buffer = XLogInitBufferForRedo(record, 0);
-	page = (Page) BufferGetPage(buffer);
+	page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	/*
 	 * We always reinit the page.  However, since this WAL record type is also
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 5429aab..f7f21e4 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2798,7 +2798,7 @@ ltrmark:;
 		 */
 		LockBuffer(buffer, BUFFER_LOCK_SHARE);
 
-		page = BufferGetPage(buffer);
+		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
 
 		Assert(ItemIdIsNormal(lp));
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 52e19b3..3f48ef4 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -803,7 +803,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 
 		vacrelstats->scanned_pages++;
 
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		if (PageIsNew(page))
 		{
@@ -1378,7 +1378,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 									&vmbuffer);
 
 		/* Now that we've compacted the page, record its available space */
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		freespace = PageGetHeapFreeSpace(page);
 
 		UnlockReleaseBuffer(buf);
@@ -1414,7 +1414,7 @@ static int
 lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer)
 {
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	OffsetNumber unused[MaxOffsetNumber];
 	int			uncnt = 0;
 	TransactionId visibility_cutoff_xid;
@@ -1511,7 +1511,7 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 static bool
 lazy_check_needs_freeze(Buffer buf, bool *hastup)
 {
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	OffsetNumber offnum,
 				maxoff;
 	HeapTupleHeader tupleheader;
@@ -1863,7 +1863,7 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
 		/* In this phase we only need shared access to the buffer */
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-		page = BufferGetPage(buf);
+		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 		if (PageIsNew(page) || PageIsEmpty(page))
 		{
@@ -2031,7 +2031,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId *visibility_cutoff_xid,
 						 bool *all_frozen)
 {
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	BlockNumber blockno = BufferGetBlockNumber(buf);
 	OffsetNumber offnum,
 				maxoff;
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 449aacb..b7a2ca7 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -257,7 +257,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
 		 * Okay to fetch the tuple
 		 */
 		targoffset = scan->rs_vistuples[scan->rs_cindex];
-		dp = (Page) BufferGetPage(scan->rs_cbuf);
+		dp = BufferGetPage(scan->rs_cbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		lp = PageGetItemId(dp, targoffset);
 		Assert(ItemIdIsNormal(lp));
 
@@ -375,7 +375,7 @@ bitgetpage(HeapScanDesc scan, TBMIterateResult *tbmres)
 		 * Bitmap is lossy, so we must examine each item pointer on the page.
 		 * But we can ignore HOT chains, since we'll check each tuple anyway.
 		 */
-		Page		dp = (Page) BufferGetPage(buffer);
+		Page		dp = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		OffsetNumber maxoff = PageGetMaxOffsetNumber(dp);
 		OffsetNumber offnum;
 
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 9ce7c02..e12b424 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -435,7 +435,7 @@ tablesample_getnext(SampleScanState *scanstate)
 	if (!pagemode)
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-	page = (Page) BufferGetPage(scan->rs_cbuf);
+	page = BufferGetPage(scan->rs_cbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	all_visible = PageIsAllVisible(page) && !snapshot->takenDuringRecovery;
 	maxoffset = PageGetMaxOffsetNumber(page);
 
@@ -546,7 +546,7 @@ tablesample_getnext(SampleScanState *scanstate)
 		if (!pagemode)
 			LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 
-		page = (Page) BufferGetPage(scan->rs_cbuf);
+		page = BufferGetPage(scan->rs_cbuf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 		all_visible = PageIsAllVisible(page) && !snapshot->takenDuringRecovery;
 		maxoffset = PageGetMaxOffsetNumber(page);
 	}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 6dd7c6e..fe48aab 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2734,7 +2734,7 @@ XLogRecPtr
 BufferGetLSNAtomic(Buffer buffer)
 {
 	BufferDesc *bufHdr = GetBufferDescriptor(buffer - 1);
-	char	   *page = BufferGetPage(buffer);
+	char	   *page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	XLogRecPtr	lsn;
 
 	/*
@@ -3269,7 +3269,7 @@ void
 MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 {
 	BufferDesc *bufHdr;
-	Page		page = BufferGetPage(buffer);
+	Page		page = BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	if (!BufferIsValid(buffer))
 		elog(ERROR, "bad buffer ID: %d", buffer);
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 2631080..02ab2d9 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -211,7 +211,7 @@ XLogRecordPageWithFreeSpace(RelFileNode rnode, BlockNumber heapBlk,
 	buf = XLogReadBufferExtended(rnode, FSM_FORKNUM, blkno, RBM_ZERO_ON_ERROR);
 	LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	if (PageIsNew(page))
 		PageInit(page, BLCKSZ, 0);
 
@@ -238,7 +238,8 @@ GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk)
 	buf = fsm_readbuf(rel, addr, false);
 	if (!BufferIsValid(buf))
 		return 0;
-	cat = fsm_get_avail(BufferGetPage(buf), slot);
+	cat = fsm_get_avail(BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
+						slot);
 	ReleaseBuffer(buf);
 
 	return fsm_space_cat_to_avail(cat);
@@ -285,7 +286,9 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 		if (!BufferIsValid(buf))
 			return;				/* nothing to do; the FSM was already smaller */
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
-		fsm_truncate_avail(BufferGetPage(buf), first_removed_slot);
+		fsm_truncate_avail(BufferGetPage(buf, NULL, NULL,
+										 BGP_NO_SNAPSHOT_TEST),
+						   first_removed_slot);
 		MarkBufferDirtyHint(buf, false);
 		UnlockReleaseBuffer(buf);
 
@@ -535,8 +538,9 @@ fsm_readbuf(Relation rel, FSMAddress addr, bool extend)
 	 * headers, for example.
 	 */
 	buf = ReadBufferExtended(rel, FSM_FORKNUM, blkno, RBM_ZERO_ON_ERROR, NULL);
-	if (PageIsNew(BufferGetPage(buf)))
-		PageInit(BufferGetPage(buf), BLCKSZ, 0);
+	if (PageIsNew(BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST)))
+		PageInit(BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
+				 BLCKSZ, 0);
 	return buf;
 }
 
@@ -615,7 +619,7 @@ fsm_set_and_search(Relation rel, FSMAddress addr, uint16 slot,
 	buf = fsm_readbuf(rel, addr, true);
 	LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	if (fsm_set_avail(page, slot, newValue))
 		MarkBufferDirtyHint(buf, false);
@@ -659,7 +663,9 @@ fsm_search(Relation rel, uint8 min_cat)
 									(addr.level == FSM_BOTTOM_LEVEL),
 									false);
 			if (slot == -1)
-				max_avail = fsm_get_max_avail(BufferGetPage(buf));
+				max_avail =
+					fsm_get_max_avail(BufferGetPage(buf, NULL, NULL,
+													BGP_NO_SNAPSHOT_TEST));
 			UnlockReleaseBuffer(buf);
 		}
 		else
@@ -741,7 +747,7 @@ fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
 	else
 		*eof_p = false;
 
-	page = BufferGetPage(buf);
+	page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 
 	/*
 	 * Recurse into children, and fix the information stored about them at
@@ -768,14 +774,17 @@ fsm_vacuum_page(Relation rel, FSMAddress addr, bool *eof_p)
 			if (fsm_get_avail(page, slot) != child_avail)
 			{
 				LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
-				fsm_set_avail(BufferGetPage(buf), slot, child_avail);
+				fsm_set_avail(BufferGetPage(buf, NULL, NULL,
+											BGP_NO_SNAPSHOT_TEST),
+							  slot, child_avail);
 				MarkBufferDirtyHint(buf, false);
 				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 			}
 		}
 	}
 
-	max_avail = fsm_get_max_avail(BufferGetPage(buf));
+	max_avail = fsm_get_max_avail(BufferGetPage(buf, NULL, NULL,
+												BGP_NO_SNAPSHOT_TEST));
 
 	/*
 	 * Reset the next slot pointer. This encourages the use of low-numbered
diff --git a/src/backend/storage/freespace/fsmpage.c b/src/backend/storage/freespace/fsmpage.c
index 535a471..baceee7 100644
--- a/src/backend/storage/freespace/fsmpage.c
+++ b/src/backend/storage/freespace/fsmpage.c
@@ -158,7 +158,7 @@ int
 fsm_search_avail(Buffer buf, uint8 minvalue, bool advancenext,
 				 bool exclusive_lock_held)
 {
-	Page		page = BufferGetPage(buf);
+	Page		page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
 	FSMPage		fsmpage = (FSMPage) PageGetContents(page);
 	int			nodeno;
 	int			target;
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 7d57c04..9e31fef 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -45,6 +45,19 @@ typedef enum
 								 * replay; otherwise same as RBM_NORMAL */
 } ReadBufferMode;
 
+/*
+ * Forced choice for whether BufferGetPage() must check snapshot age
+ *
+ * A scan must test for old snapshot, unless the test would be redundant (for
+ * example, to tests already made at a lower level on all code paths).
+ * Positioning for DML or vacuuming does not need this sort of test.
+ */
+typedef enum
+{
+	BGP_NO_SNAPSHOT_TEST,		/* Not used for scan, or is redundant */
+	BGP_TEST_FOR_OLD_SNAPSHOT	/* Test for old snapshot is needed */
+} BufferGetPageAgeTest;
+
 /* forward declared, to avoid having to expose buf_internals.h here */
 struct WritebackContext;
 
@@ -165,7 +178,7 @@ extern PGDLLIMPORT int32 *LocalRefCount;
  * BufferGetPage
  *		Returns the page associated with a buffer.
  */
-#define BufferGetPage(buffer) ((Page)BufferGetBlock(buffer))
+#define BufferGetPage(buffer, snapshot, relation, agetest) ((Page)BufferGetBlock(buffer)) 
 
 /*
  * prototypes for functions in bufmgr.c

#46

Alvaro Herrera

alvherre@2ndquadrant.com

almost 10 years ago

In reply to: Kevin Grittner (#45)

Re: snapshot too old, configured by time

Kevin Grittner wrote:

Attached is what I think you're talking about for the first patch.
AFAICS this should generate identical executable code to unpatched.
Then the patch to actually implement the feature would, instead
of adding 30-some lines with TestForOldSnapshot() would implement
that as the behavior for the other enum value, and alter those
30-some BufferGetPage() calls.

Álvaro and Michael, is this what you were looking for?

Yes, this is what I was thinking, thanks.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#47

Michael Paquier

michael.paquier@gmail.com

almost 10 years ago

In reply to: Alvaro Herrera (#46)

Re: snapshot too old, configured by time

On Fri, Apr 1, 2016 at 11:45 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Kevin Grittner wrote:

Attached is what I think you're talking about for the first patch.
AFAICS this should generate identical executable code to unpatched.
Then the patch to actually implement the feature would, instead
of adding 30-some lines with TestForOldSnapshot() would implement
that as the behavior for the other enum value, and alter those
30-some BufferGetPage() calls.

Álvaro and Michael, is this what you were looking for?

Yes, this is what I was thinking, thanks.

A small thing:
$ git diff master --check
src/include/storage/bufmgr.h:181: trailing whitespace.
+#define BufferGetPage(buffer, snapshot, relation, agetest) ((Page)BufferGetBlock(buffer))

-   Page        page = BufferGetPage(buf);
+   Page        page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
Having a BufferGetPageExtended() with some flags and a default
corresponding to NO_SNAPSHOT_TEST would reduce the diff impact. And as
long as the check is integrated with BufferGetPage[Extended]() I would
not complain, the patch as proposed being 174kB...
-- 
Michael


#48

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Michael Paquier (#47)

Re: snapshot too old, configured by time

On Sat, Apr 2, 2016 at 7:12 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Apr 1, 2016 at 11:45 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Kevin Grittner wrote:

Attached is what I think you're talking about for the first patch.
AFAICS this should generate identical executable code to unpatched.
Then the patch to actually implement the feature would, instead
of adding 30-some lines with TestForOldSnapshot() would implement
that as the behavior for the other enum value, and alter those
30-some BufferGetPage() calls.

Álvaro and Michael, is this what you were looking for?

Yes, this is what I was thinking, thanks.

A small thing:
$ git diff master --check
src/include/storage/bufmgr.h:181: trailing whitespace.
+#define BufferGetPage(buffer, snapshot, relation, agetest) ((Page)BufferGetBlock(buffer))
-   Page        page = BufferGetPage(buf);
+   Page        page = BufferGetPage(buf, NULL, NULL, BGP_NO_SNAPSHOT_TEST);
Having a BufferGetPageExtended() with some flags and a default
corresponding to NO_SNAPSHOT_TEST would reduce the diff impact. And as
long as the check is integrated with BufferGetPage[Extended]() I would
not complain, the patch as proposed being 174kB...

If you are saying that the 450 places that don't need the check
would remain unchanged, and the only difference would be to use
BufferGetPageExtended() instead of BufferGetPage() followed by
TestForOldSnapshot() in the 30-some places that need the check, I
don't see the point. That would eliminate the "forced choice"
aspect of what Álvaro is asking for, and it seems to me that it
would do next to nothing to prevent the errors of omission that are
the concern here.
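
To make the "forced choice" idea concrete, here is a minimal standalone
sketch. It is not code from the patch: the Mock* types, the
mock_threshold_time variable and the error counter are all hypothetical
stand-ins, and the age test is reduced to "page changed after the
snapshot was taken, and the snapshot is older than the threshold".
Only the enum is copied from the bufmgr.h hunk above.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL types; all Mock* names are hypothetical. */
typedef char *Page;

typedef struct MockSnapshot
{
	uint64_t	lsn;			/* WAL position when the snapshot was taken */
	int64_t		when_taken;		/* time the snapshot was taken */
} MockSnapshot;

typedef struct MockBuffer
{
	char		data[64];		/* page contents */
	uint64_t	page_lsn;		/* WAL position of the last page change */
} MockBuffer;

/* Same enum as in the bufmgr.h hunk of the patch. */
typedef enum
{
	BGP_NO_SNAPSHOT_TEST,		/* Not used for scan, or is redundant */
	BGP_TEST_FOR_OLD_SNAPSHOT	/* Test for old snapshot is needed */
} BufferGetPageAgeTest;

/* Assumed state: snapshots taken before this time count as too old. */
static int64_t mock_threshold_time = 0;

/* Counts the cases where a real server would raise "snapshot too old". */
static int	mock_too_old_errors = 0;

/* Assumed, simplified age test; the real check is not shown in this thread. */
static void
MockTestForOldSnapshot(const MockSnapshot *snapshot, const MockBuffer *buf)
{
	if (snapshot != NULL &&
		buf->page_lsn > snapshot->lsn &&
		snapshot->when_taken < mock_threshold_time)
		mock_too_old_errors++;
}

/*
 * Every caller has to pass an agetest value, so forgetting to decide
 * becomes a compile error rather than a silent omission.  The relation
 * argument is unused in this mock.
 */
static Page
MockBufferGetPage(MockBuffer *buf, const MockSnapshot *snapshot,
				  const void *relation, BufferGetPageAgeTest agetest)
{
	(void) relation;
	if (agetest == BGP_TEST_FOR_OLD_SNAPSHOT)
		MockTestForOldSnapshot(snapshot, buf);
	return buf->data;
}
```

With this shape, the roughly 450 non-scan call sites pass
BGP_NO_SNAPSHOT_TEST and pay nothing extra, while the 30-some scan
sites pass a snapshot and BGP_TEST_FOR_OLD_SNAPSHOT.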

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#49

Jeff Janes

jeff.janes@gmail.com

almost 10 years ago

In reply to: Kevin Grittner (#36)

Re: snapshot too old, configured by time

On Wed, Mar 30, 2016 at 12:34 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Sat, Mar 19, 2016 at 1:27 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

I'm not sure if this is operating as expected.

I set the value to 1min.

I set up a test like this:

pgbench -i

pgbench -c4 -j4 -T 3600 &

### watch the size of branches table
while (true) ; do psql -c "\dt+" | fgrep _branches; sleep 10; done &

### set up a long lived snapshot.
psql -c 'begin; set transaction isolation level repeatable read;
select sum(bbalance) from pgbench_branches; select pg_sleep(300);
select sum(bbalance) from pgbench_branches;'

As this runs, I can see the pgbench_branches table start to bloat once
the snapshot is taken, and it continues to bloat at a linear rate for
the full 300 seconds.

Once the 300 second pg_sleep is up, the long-lived snapshot holder
receives an error once it tries to access the table again, and then
the bloat stops increasing. But shouldn't the bloat have stopped
increasing as soon as the snapshot became doomed, which would be after
a minute or so?

This is actually operating as intended, not a bug. Try running a
manual VACUUM command about two minutes after the snapshot is taken
and you should get a handle on what's going on. The old tuples
become eligible for vacuuming after one minute, but that doesn't
necessarily mean that autovacuum jumps in and that the space starts
getting reused.

I can verify that a manual vacuum does stop the bloat from continuing
to increase. But I don't see why autovacuum is not already stopping
the bloat. It is running often enough that it really ought to do so
(as verified by setting log_autovacuum_min_duration = 0 and looking in
the log files to see that it is vacuuming the table once per nap-time,
although it is not accomplishing much by doing so as no tuples can be
removed.)

Also, HOT-cleanup should stop the bloat increase once the snapshot
crosses the old_snapshot_threshold without even needing to wait until
the next autovac runs.

Does the code intentionally only work for manual vacuums? If so, that
seems quite surprising. Or perhaps I am missing something else here.

Thanks,

Jeff


#50

Peter Geoghegan

pg@heroku.com

almost 10 years ago

In reply to: Kevin Grittner (#37)

Re: snapshot too old, configured by time

On Wed, Mar 30, 2016 at 12:46 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Wed, Mar 30, 2016 at 2:29 PM, Peter Geoghegan <pg@heroku.com> wrote:

[Does the patch allow dangling page pointers?]

Again, I don't want to prejudice anyone against your patch, which I
haven't read.

I don't believe that the way the patch does its business opens any
new vulnerabilities of this type. If you see such after looking at
the patch, let me know.

Okay, let me be more concrete about this. The patch does this:

--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -92,12 +92,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
     * need to use the horizon that includes slots, otherwise the data-only
     * horizon can be used. Note that the toast relation of user defined
     * relations are *not* considered catalog relations.
+    *
+    * It is OK to apply the old snapshot limit before acquiring the cleanup
+    * lock because the worst that can happen is that we are not quite as
+    * aggressive about the cleanup (by however many transaction IDs are
+    * consumed between this point and acquiring the lock).  This allows us to
+    * save significant overhead in the case where the page is found not to be
+    * prunable.
     */
    if (IsCatalogRelation(relation) ||
        RelationIsAccessibleInLogicalDecoding(relation))
        OldestXmin = RecentGlobalXmin;
    else
-       OldestXmin = RecentGlobalDataXmin;
+       OldestXmin =
+               TransactionIdLimitedForOldSnapshots(RecentGlobalDataXmin,
+                                                   relation);

This new intermediary function TransactionIdLimitedForOldSnapshots()
is called to decide what OldestXmin actually gets to be above, based
in part on the new GUC old_snapshot_threshold:

+/*
+ * TransactionIdLimitedForOldSnapshots
+ *
+ * Apply old snapshot limit, if any.  This is intended to be called for page
+ * pruning and table vacuuming, to allow old_snapshot_threshold to override
+ * the normal global xmin value.  Actual testing for snapshot too old will be
+ * based on whether a snapshot timestamp is prior to the threshold timestamp
+ * set in this function.
+ */
+TransactionId
+TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
+                                   Relation relation)

It might not be RecentGlobalDataXmin that is usually returned as
OldestXmin as it is today, which is exactly the point of this patch:
VACUUM can be more aggressive in cleaning up bloat, not unlike the
non-catalog logical decoding case, on the theory that we can reliably
detect when that causes failures for old snapshots, and just raise a
"snapshot too old" error. (RecentGlobalDataXmin is morally about the
same as RecentGlobalXmin, as far as this patch goes).
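
A rough model of that horizon choice, as a standalone sketch. The body
of TransactionIdLimitedForOldSnapshots() is not quoted in this thread,
so the OldSnapshotModel struct, its threshold_xmin field and the
function names below are assumptions made for illustration; only the
catalog/non-catalog branch mirrors the hunk quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Wraparound-aware "id1 is newer than id2" using modulo-2^32 arithmetic. */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	return (int32_t) (id1 - id2) > 0;
}

/* Hypothetical model of the state behind old_snapshot_threshold. */
typedef struct OldSnapshotModel
{
	int			threshold_minutes;	/* -1 disables the feature */
	TransactionId threshold_xmin;	/* assumed: xmin horizon as of the
									 * threshold time, Invalid if unknown */
} OldSnapshotModel;

/* Assumed behavior: only ever move the horizon forward, never back. */
static TransactionId
model_limited_xmin(const OldSnapshotModel *model, TransactionId recent_xmin)
{
	if (model->threshold_minutes < 0)
		return recent_xmin;
	if (model->threshold_xmin != InvalidTransactionId &&
		xid_follows(model->threshold_xmin, recent_xmin))
		return model->threshold_xmin;
	return recent_xmin;
}

/* Mirrors the if/else in the quoted heap_page_prune_opt() hunk. */
static TransactionId
model_pruning_horizon(const OldSnapshotModel *model, bool is_catalog,
					  TransactionId recent_global_xmin,
					  TransactionId recent_global_data_xmin)
{
	if (is_catalog)
		return recent_global_xmin;
	return model_limited_xmin(model, recent_global_data_xmin);
}
```

In this model, an old snapshot that pins RecentGlobalDataXmin at 500
no longer holds back pruning of a user table once the threshold xmin
has advanced to, say, 900; catalog relations still use the
unmodified horizon.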

So far, so good. It's okay that _bt_page_recyclable() never got the
memo about any of this...:

/*
 * _bt_page_recyclable() -- Is an existing page recyclable?
 *
 * This exists to make sure _bt_getbuf and btvacuumscan have the same
 * policy about whether a page is safe to re-use.
 */
bool
_bt_page_recyclable(Page page)
{
    BTPageOpaque opaque;

    ...

    /*
     * Otherwise, recycle if deleted and too old to have any processes
     * interested in it.
     */
    opaque = (BTPageOpaque) PageGetSpecialPointer(page);
    if (P_ISDELETED(opaque) &&
        TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin))
        return true;
    return false;
}

...because this patch does nothing to advance RecentGlobalXmin (or
RecentGlobalDataXmin) itself more aggressively. It does make
vacuum_set_xid_limits() get a more aggressive cutoff point, but we
don't see that being passed back down by lazy vacuum here; within
_bt_page_recyclable(), we rely on the more conservative
RecentGlobalXmin, which is not subject to any clever optimization in
the patch.

Fortunately, this seems correct, since index scans will always succeed
in finding a deleted page, per nbtree README notes on
RecentGlobalXmin. Unfortunately, this does stop recycling from
happening early for B-Tree pages, even though that's probably safe in
principle. This is probably not so bad -- it just needs to be
considered when reviewing this patch (the same is true of logical
decoding's RecentGlobalDataXmin; it also doesn't appear in
_bt_page_recyclable(), and I guess that that was never a problem).
Index relations will not get smaller in some important cases, but they
will be made less bloated by VACUUM in a sense that's still probably
very useful. Maybe that explains some of what Jeff talked about.
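
The asymmetry described above can be shown with a small standalone
model. This is only a sketch: MockBtreePage and both mock_* functions
are hypothetical simplifications, and the transaction ID values used
in the note below are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Wraparound-aware "id1 is older than id2" using modulo-2^32 arithmetic. */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
	return (int32_t) (id1 - id2) < 0;
}

/* Hypothetical, reduced view of a deleted B-Tree page. */
typedef struct MockBtreePage
{
	bool		deleted;		/* page has been deleted from the tree */
	TransactionId deleted_xact; /* XID stamped on the page at deletion */
} MockBtreePage;

/* Same shape as the test in _bt_page_recyclable(): uses RecentGlobalXmin. */
static bool
mock_bt_page_recyclable(const MockBtreePage *page,
						TransactionId recent_global_xmin)
{
	return page->deleted &&
		xid_precedes(page->deleted_xact, recent_global_xmin);
}

/* Simplified heap rule: a dead tuple can go once its deleter is older
 * than whatever horizon the caller chose. */
static bool
mock_heap_tuple_prunable(TransactionId deleter_xid, TransactionId oldest_xmin)
{
	return xid_precedes(deleter_xid, oldest_xmin);
}
```

Suppose an old snapshot holds RecentGlobalXmin at 500 while the
threshold-limited horizon has reached 900. A heap tuple deleted by
XID 700 is prunable against 900, but a B-Tree page deleted at XID 700
stays unrecyclable until RecentGlobalXmin itself moves past 700.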

I think another part of the problems that Jeff mentioned (with
pruning) could be this existing code within heap_hot_search_buffer():

/*
 * If we can't see it, maybe no one else can either. At caller
 * request, check whether all chain members are dead to all
 * transactions.
 */
if (all_dead && *all_dead &&
    !HeapTupleIsSurelyDead(heapTuple, RecentGlobalXmin))
    *all_dead = false;

This is used within routines like btgettuple(), to do the LP_DEAD
thing to kill index tuples (not HOT chain pruning).

Aside: Not sure offhand why it might be okay, performance-wise, that
this code doesn't care about RecentGlobalDataXmin; pruning was a big
part of why RecentGlobalDataXmin was added for logical decoding, I
thought, although I guess the _bt_killitems() stuff doesn't count as
pruning.

--
Peter Geoghegan


#51

Peter Geoghegan

pg@heroku.com

almost 10 years ago

In reply to: Jeff Janes (#49)

Re: snapshot too old, configured by time

On Sun, Apr 3, 2016 at 2:09 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Also, HOT-cleanup should stop the bloat increase once the snapshot
crosses the old_snapshot_threshold without even needing to wait until
the next autovac runs.

Does the code intentionally only work for manual vacuums? If so, that
seems quite surprising. Or perhaps I am missing something else here.

What proportion of the statements in your simulated workload were
updates? Per my last mail to this thread, I'm interested in knowing if
this was a delete heavy workload.

--
Peter Geoghegan


#52

Jeff Janes

jeff.janes@gmail.com

almost 10 years ago

In reply to: Peter Geoghegan (#51)

Re: snapshot too old, configured by time

On Mon, Apr 4, 2016 at 8:38 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Sun, Apr 3, 2016 at 2:09 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Also, HOT-cleanup should stop the bloat increase once the snapshot
crosses the old_snapshot_threshold without even needing to wait until
the next autovac runs.

Does the code intentionally only work for manual vacuums? If so, that
seems quite surprising. Or perhaps I am missing something else here.

What proportion of the statements in your simulated workload were
updates? Per my last mail to this thread, I'm interested in knowing if
this was a delete heavy workload.

It was pgbench's built in TPC-B-like, so 3 UPDATE, 1 SELECT, 1 INSERT
per transaction.

So I would say that it is ridiculously update heavy compared to almost
any real-world use patterns.

That is the active workload. The long-term snapshot holder just does
a sum(bbalance) at the repeatable read level in order to force a
snapshot to be taken and held, and then goes idle for a long time.

Cheers,

Jeff


#53

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Jeff Janes (#49)

1 attachment(s)

Re: snapshot too old, configured by time

On Sun, Apr 3, 2016 at 4:09 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Mar 30, 2016 at 12:34 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Sat, Mar 19, 2016 at 1:27 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

I set the value to 1min.

I set up a test like this:

pgbench -i

pgbench -c4 -j4 -T 3600 &

### watch the size of branches table
while (true) ; do psql -c "\dt+" | fgrep _branches; sleep 10; done &

### set up a long lived snapshot.
psql -c 'begin; set transaction isolation level repeatable read;
select sum(bbalance) from pgbench_branches; select pg_sleep(300);
select sum(bbalance) from pgbench_branches;'

As this runs, I can see the pgbench_branches table start to bloat once
the snapshot is taken, and it continues to bloat at a linear rate for
the full 300 seconds.

I'm not seeing that on my i7 box.

Once the 300 second pg_sleep is up, the long-lived snapshot holder
receives an error once it tries to access the table again, and then
the bloat stops increasing. But shouldn't the bloat have stopped
increasing as soon as the snapshot became doomed, which would be after
a minute or so?

It will, limited by how well your autovacuum can keep up with the
workload on your system. See attached graph. I ran 5 times each
in 3 configurations and graphed the median and average of each.

master: development checkout from today, no config changes

patch: patch with no config changes except:
old_snapshot_threshold = '1min'

patch + av: patch with these config changes:
old_snapshot_threshold = '1min'
autovacuum_max_workers = 8
autovacuum_vacuum_cost_limit = 2000
autovacuum_naptime = '10s'
autovacuum_work_mem = '1GB'

As expected, differences are minimal at first, then the patch
starts to win, and wins even better with more aggressive
autovacuum.

I can verify that a manual vacuum does stop the bloat from continuing
to increase. But I don't see why autovacuum is not already stopping
the bloat. It is running often enough that it really ought to do so
(as verified by setting log_autovacuum_min_duration = 0 and looking in
the log files to see that it is vacuuming the table once per nap-time,
although it is not accomplishing much by doing so as no tuples can be
removed.)

Perhaps the CPUs I have or the way I have my machine tuned allows
autovacuum to be more effective in the face of the pgbench load
than on yours?

Also, HOT-cleanup should stop the bloat increase once the snapshot
crosses the old_snapshot_threshold without even needing to wait until
the next autovac runs.

It should help some, but you really need a vacuum in there to take
care of things thoroughly.

Does the code intentionally only work for manual vacuums? If so, that
seems quite surprising. Or perhaps I am missing something else here.

Perhaps it is that VACUUM tries harder to get the work done, while
autovacuum steps out of the way when it detects that it is blocking
something. This is a pretty small table (it never gets to 1MB even
when bloating) with multiple clients pounding on it. It might just
be that your system doesn't allow much autovacuum activity before
it finds itself blocking a pgbench process.

FWIW, our customer's 30-day test runs were on databases of hundreds
of GB and showed similar benefits -- linear growth indefinitely
without the patch, settling in to a pretty steady state after a few
hours with the patch.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#54

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Peter Geoghegan (#50)

Re: snapshot too old, configured by time

On Mon, Apr 4, 2016 at 9:15 PM, Peter Geoghegan <pg@heroku.com> wrote:

The patch does this:

--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -92,12 +92,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
     * need to use the horizon that includes slots, otherwise the data-only
     * horizon can be used. Note that the toast relation of user defined
     * relations are *not* considered catalog relations.
+    *
+    * It is OK to apply the old snapshot limit before acquiring the cleanup
+    * lock because the worst that can happen is that we are not quite as
+    * aggressive about the cleanup (by however many transaction IDs are
+    * consumed between this point and acquiring the lock).  This allows us to
+    * save significant overhead in the case where the page is found not to be
+    * prunable.
     */
    if (IsCatalogRelation(relation) ||
        RelationIsAccessibleInLogicalDecoding(relation))
        OldestXmin = RecentGlobalXmin;
    else
-       OldestXmin = RecentGlobalDataXmin;
+       OldestXmin =
+               TransactionIdLimitedForOldSnapshots(RecentGlobalDataXmin,
+                                                   relation);

This new intermediary function TransactionIdLimitedForOldSnapshots()
is called to decide what OldestXmin actually gets to be above, based
in part on the new GUC old_snapshot_threshold:

+/*
+ * TransactionIdLimitedForOldSnapshots
+ *
+ * Apply old snapshot limit, if any.  This is intended to be called for page
+ * pruning and table vacuuming, to allow old_snapshot_threshold to override
+ * the normal global xmin value.  Actual testing for snapshot too old will be
+ * based on whether a snapshot timestamp is prior to the threshold timestamp
+ * set in this function.
+ */
+TransactionId
+TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
+                                   Relation relation)

It might not be RecentGlobalDataXmin that is usually returned as
OldestXmin as it is today, which is exactly the point of this patch:

Right.

VACUUM can be more aggressive in cleaning up bloat, [...], on the
theory that we can reliably detect when that causes failures for
old snapshots, and just raise a "snapshot too old" error.

Right.

...because this patch does nothing to advance RecentGlobalXmin (or
RecentGlobalDataXmin) itself more aggressively. It does make
vacuum_set_xid_limits() get a more aggressive cutoff point, but we
don't see that being passed back down by lazy vacuum here; within
_bt_page_recyclable(), we rely on the more conservative
RecentGlobalXmin, which is not subject to any clever optimization in
the patch.

Right.

Fortunately, this seems correct, since index scans will always succeed
in finding a deleted page, per nbtree README notes on
RecentGlobalXmin.

Right.

Unfortunately, this does stop recycling from
happening early for B-Tree pages, even though that's probably safe in
principle. This is probably not so bad -- it just needs to be
considered when reviewing this patch (the same is true of logical
decoding's RecentGlobalDataXmin; it also doesn't appear in
_bt_page_recyclable(), and I guess that that was never a problem).
Index relations will not get smaller in some important cases, but they
will be made less bloated by VACUUM in a sense that's still probably
very useful.

As I see it, if the long-running transaction(s) have written (and
thus acquired transaction IDs), we can't safely advance the global
Xmin values until they complete. If the long-running transactions
with the old snapshots don't have transaction IDs, the bloat will
be contained.

Maybe that explains some of what Jeff talked about.

I think he just didn't have autovacuum configured to where it was
being very effective on the tiny tables involved. See my reply to
Jeff and the graphs from running his test on my system. I don't
think the lines could get much more flat than what I'm seeing with
the patch.

It may be that this general approach could be made more aggressive
and effective by pushing things in the direction you suggest, but
we are far past the time to consider that sort of change for 9.6.
This patch has been through several rounds of 30-day testing; a
change like you propose would require that those tests be redone.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#55

Peter Geoghegan

pg@heroku.com

almost 10 years ago

In reply to: Kevin Grittner (#54)

Re: snapshot too old, configured by time

On Thu, Apr 7, 2016 at 3:56 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

Unfortunately, this does stop recycling from
happening early for B-Tree pages, even though that's probably safe in
principle. This is probably not so bad -- it just needs to be
considered when reviewing this patch (the same is true of logical
decoding's RecentGlobalDataXmin; it also doesn't appear in
_bt_page_recyclable(), and I guess that that was never a problem).
Index relations will not get smaller in some important cases, but they
will be made less bloated by VACUUM in a sense that's still probably
very useful.

As I see it, if the long-running transaction(s) have written (and
thus acquired transaction IDs), we can't safely advance the global
Xmin values until they complete. If the long-running transactions
with the old snapshots don't have transaction IDs, the bloat will
be contained.

I'm not really that concerned about it. I'm mostly just explaining my
thought process.

Maybe that explains some of what Jeff talked about.

I think he just didn't have autovacuum configured to where it was
being very effective on the tiny tables involved. See my reply to
Jeff and the graphs from running his test on my system. I don't
think the lines could get much more flat than what I'm seeing with
the patch.

It may be that this general approach could be made more aggressive
and effective by pushing things in the direction you suggest, but
we are far past the time to consider that sort of change for 9.6.
This patch has been through several rounds of 30-day testing; a
change like you propose would require that those tests be redone.

I think that there is a good argument in favor of this patch that you
may have failed to make yourself, which is: it limits bloat in a way
that's analogous to how RecentGlobalDataXmin can do so for logical
decoding (i.e. where wal_level = logical, and RecentGlobalXmin and
RecentGlobalDataXmin could actually differ). Therefore, it benefits to
a significant degree from the testing that Andres did to make sure
logical decoding doesn't cause excessive bloat when RecentGlobalXmin
is pinned to make historic MVCC catalog snapshots work (he did so at
my insistence at the time; pruning turned out to be very important for
many common workloads, and Andres got that right). I can't really
imagine a way that what you have here could be any less effective than
what Andres did for logical decoding. This is reassuring, since that
mechanism has to be pretty well battle-hardened by now.

--
Peter Geoghegan

#56

Kevin Grittner

kgrittn@gmail.com

almost 10 years ago

In reply to: Peter Geoghegan (#55)

Re: snapshot too old, configured by time

On Thu, Apr 7, 2016 at 6:12 PM, Peter Geoghegan <pg@heroku.com> wrote:

I think that there is a good argument in favor of this patch that you
may have failed to make yourself, which is: it limits bloat in a way
that's analogous to how RecentGlobalDataXmin can do so for logical
decoding (i.e. where wal_level = logical, and RecentGlobalXmin and
RecentGlobalDataXmin could actually differ). Therefore, it benefits to
a significant degree from the testing that Andres did to make sure
logical decoding doesn't cause excessive bloat when RecentGlobalXmin
is pinned to make historic MVCC catalog snapshots work (he did so at
my insistence at the time; pruning turned out to be very important for
many common workloads, and Andres got that right). I can't really
imagine a way that what you have here could be any less effective than
what Andres did for logical decoding. This is reassuring, since that
mechanism has to be pretty well battle-hardened by now.

Interesting. I had not noticed that relationship.

Anyway, pushed as two patches -- the no-op patch to create the "forced
choice" on whether to do the test at each BufferGetPage point, and the
actual feature.

Sadly, I forgot to include the reviewer information when writing the
commit messages. :-(

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#57

Peter Geoghegan

pg@heroku.com

almost 10 years ago

In reply to: Kevin Grittner (#56)

Re: snapshot too old, configured by time

On Fri, Apr 8, 2016 at 12:55 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

Sadly, I forgot to include the reviewer information when writing the
commit messages. :-(

Oh well. I'm just glad we got the patch over the line. I think that
there are some types of users that will very significantly benefit
from this patch.

I am reminded of this blog post, written by a friend and former
co-worker: https://brandur.org/postgres-queues

--
Peter Geoghegan

#58

David Steele

david@pgmasters.net

almost 10 years ago

In reply to: Peter Geoghegan (#57)

Re: snapshot too old, configured by time

On 4/8/16 4:30 PM, Peter Geoghegan wrote:

On Fri, Apr 8, 2016 at 12:55 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

Sadly, I forgot to include the reviewer information when writing the
commit messages. :-(

Oh well. I'm just glad we got the patch over the line. I think that
there are some types of users that will very significantly benefit
from this patch.

I'm also very happy to see this go in. While I used to dread the
"snapshot too old" error back in my O****e architect days it's nice to
have the option, especially when it can be configured by time rather
than by size.

--
-David
david@pgmasters.net

#59

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Kevin Grittner (#45)

Re: snapshot too old, configured by time

Kevin Grittner <kgrittn@gmail.com> writes:

On Wed, Mar 30, 2016 at 3:26 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Kevin Grittner wrote:

On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I said that we should change BufferGetPage into having the snapshot
check built-in, except in the cases where a flag is passed; and the flag
would be passed in all cases except those 30-something you identified.
In other words, the behavior in all the current callsites would be
identical to what's there today; we could have a macro do the first
check so that we don't introduce the overhead of a function call in the
450 cases where it's not needed.

Attached is what I think you're talking about for the first patch.
AFAICS this should generate identical executable code to unpatched.
Then the patch to actually implement the feature would, instead
of adding 30-some lines with TestForOldSnapshot() would implement
that as the behavior for the other enum value, and alter those
30-some BufferGetPage() calls.

Álvaro and Michael, is this what you were looking for?

Is everyone else OK with this approach?

After struggling with back-patching a GIN bug fix, I wish to offer up the
considered opinion that this was an impressively bad idea. It's inserted
450 or so pain points for back-patching, which we will have to deal with
for the next five years. Moreover, I do not believe that it will do a
damn thing for ensuring that future calls of BufferGetPage think about
what to do; they'll most likely be copied-and-pasted from nearby calls,
just as people have always done. With luck, the nearby calls will have
the right semantics, but this change won't help very much at all if they
don't.

I think we should revert BufferGetPage to be what it was before (with
no snapshot test) and invent BufferGetPageExtended or similar to be
used in the small number of places that need a snapshot test.
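
A minimal sketch of the shape this could take, using toy types rather than the real buffer manager headers. `BufferGetPageExtended` is only the name floated above; its signature, the `too_old` out-parameter (standing in for raising an error), and the toy structs are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; the real types are considerably richer. */
typedef struct ToyPage { unsigned long lsn; } ToyPage;
typedef struct ToySnapshot { unsigned long lsn; bool past_threshold; } ToySnapshot;
typedef struct ToyBuffer { ToyPage page; } ToyBuffer;

/* Unchanged, cheap accessor: what the ~450 ordinary call sites keep using. */
static ToyPage *
BufferGetPage(ToyBuffer *buf)
{
    return &buf->page;
}

/* Stand-in for the snapshot age test; reports via *too_old instead of erroring. */
static void
TestForOldSnapshot(const ToySnapshot *snap, const ToyPage *page, bool *too_old)
{
    *too_old = snap != NULL && snap->past_threshold && page->lsn > snap->lsn;
}

/* Hypothetical wrapper for the small number of scan paths that need the test. */
static ToyPage *
BufferGetPageExtended(ToyBuffer *buf, const ToySnapshot *snap, bool *too_old)
{
    ToyPage *page = BufferGetPage(buf);

    TestForOldSnapshot(snap, page, too_old);
    return page;
}
```

Because the one-argument accessor is untouched in this arrangement, code shared with older branches would not need to change.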

regards, tom lane

#60

Kevin Grittner

kgrittn@gmail.com

over 9 years ago

In reply to: Tom Lane (#59)

Re: snapshot too old, configured by time

On Sun, Apr 17, 2016 at 5:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Kevin Grittner <kgrittn@gmail.com> writes:

On Wed, Mar 30, 2016 at 3:26 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Kevin Grittner wrote:

On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I said that we should change BufferGetPage into having the snapshot
check built-in, except in the cases where a flag is passed; and the flag
would be passed in all cases except those 30-something you identified.
In other words, the behavior in all the current callsites would be
identical to what's there today; we could have a macro do the first
check so that we don't introduce the overhead of a function call in the
450 cases where it's not needed.

Attached is what I think you're talking about for the first patch.
AFAICS this should generate identical executable code to unpatched.
Then the patch to actually implement the feature would, instead
of adding 30-some lines with TestForOldSnapshot() would implement
that as the behavior for the other enum value, and alter those
30-some BufferGetPage() calls.

Álvaro and Michael, is this what you were looking for?

Is everyone else OK with this approach?

After struggling with back-patching a GIN bug fix, I wish to offer up the
considered opinion that this was an impressively bad idea. It's inserted
450 or so pain points for back-patching, which we will have to deal with
for the next five years. Moreover, I do not believe that it will do a
damn thing for ensuring that future calls of BufferGetPage think about
what to do; they'll most likely be copied-and-pasted from nearby calls,
just as people have always done. With luck, the nearby calls will have
the right semantics, but this change won't help very much at all if they
don't.

I think we should revert BufferGetPage to be what it was before (with
no snapshot test) and invent BufferGetPageExtended or similar to be
used in the small number of places that need a snapshot test.

I'm not sure what BufferGetPageExtended() buys us over simply
inserting TestForOldSnapshot() where it is needed. Other than that
question, I have no objections to the course outlined, but figure I
should not jump on it without allowing at least a couple days for
discussion. That also may give me time to perform the benchmarks I
wanted -- VPN issues have blocked me from the big test machines so
far. I think I see where the time may be going when the feature is
disabled, and if I'm right I have a fix; but without a big NUMA
machine there is no way to confirm it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#61

Michael Paquier

michael.paquier@gmail.com

over 9 years ago

In reply to: Kevin Grittner (#60)

Re: snapshot too old, configured by time

On Mon, Apr 18, 2016 at 9:52 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Sun, Apr 17, 2016 at 5:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Kevin Grittner <kgrittn@gmail.com> writes:

On Wed, Mar 30, 2016 at 3:26 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Kevin Grittner wrote:

I think we should revert BufferGetPage to be what it was before (with
no snapshot test) and invent BufferGetPageExtended or similar to be
used in the small number of places that need a snapshot test.

I'm not sure what BufferGetPageExtended() buys us over simply
inserting TestForOldSnapshot() where it is needed. Other than that
question, I have no objections to the course outlined, but figure I
should not jump on it without allowing at least a couple days for
discussion. That also may give me time to perform the benchmarks I
wanted -- VPN issues have blocked me from the big test machines so
far. I think I see where the time may be going when the feature is
disabled, and if I'm right I have a fix; but without a big NUMA
machine there is no way to confirm it.

TBH, BufferGetPageExtended() still looks like a good idea to me.
Backpatching those code paths is going to make the maintenance far
harder, on top of the compilation of extensions for perhaps no good
reason. Even if this is a low-level change, if this feature goes in
with 9.6, it would be really good to mention as well that callers of
BufferGetPage should update their calls accordingly if they care about
the checks with the old snapshot. This is a routine used a lot in many
plugins and extensions. Usually such low-level things are not
mentioned in the release notes, but this time I think it's really
important to say it loudly.
--
Michael

#62

Alvaro Herrera

alvherre@2ndquadrant.com

over 9 years ago

In reply to: Tom Lane (#59)

Re: snapshot too old, configured by time

Tom Lane wrote:

After struggling with back-patching a GIN bug fix, I wish to offer up the
considered opinion that this was an impressively bad idea. It's inserted
450 or so pain points for back-patching, which we will have to deal with
for the next five years. Moreover, I do not believe that it will do a
damn thing for ensuring that future calls of BufferGetPage think about
what to do; they'll most likely be copied-and-pasted from nearby calls,
just as people have always done. With luck, the nearby calls will have
the right semantics, but this change won't help very much at all if they
don't.

I disagree. A developer that sees an unadorned BufferGetPage() call
doesn't stop to think twice about whether they need to add a snapshot
test. Many reviewers will miss the necessary addition also. A
developer that sees BufferGetPage(NO_SNAPSHOT_TEST) will at least
consider the idea that the flag might be right; if that developer
doesn't think about it, some reviewer may notice a new call with the
flag and consider the idea that the flag may be wrong.
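
To make the argument concrete, here is a toy sketch of a flag-carrying accessor. `NO_SNAPSHOT_TEST` is the spelling used above; `TEST_FOR_OLD_SNAPSHOT`, `toy_buffer_get_page`, the structs and the `too_old` out-parameter are hypothetical and do not reproduce the committed signature.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the real page, snapshot and buffer types. */
typedef struct ToyPage { unsigned long lsn; } ToyPage;
typedef struct ToySnapshot { unsigned long lsn; bool past_threshold; } ToySnapshot;
typedef struct ToyBuffer { ToyPage page; } ToyBuffer;

/* Every caller has to pick one of these, which is the "forced choice". */
typedef enum ToySnapshotTest
{
    NO_SNAPSHOT_TEST,
    TEST_FOR_OLD_SNAPSHOT
} ToySnapshotTest;

static ToyPage *
toy_buffer_get_page(ToyBuffer *buf, const ToySnapshot *snap,
                    ToySnapshotTest mode, bool *too_old)
{
    ToyPage *page = &buf->page;

    *too_old = false;
    if (mode == TEST_FOR_OLD_SNAPSHOT && snap != NULL)
        *too_old = snap->past_threshold && page->lsn > snap->lsn;
    return page;
}
```

The flag is visible at every call site, which is the property being argued for; the cost is that every existing call has to be rewritten to carry it.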

I understand the backpatching pain argument, but my opinion was the
contrary of yours even so.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#63

Kevin Grittner

kgrittn@gmail.com

over 9 years ago

In reply to: Alvaro Herrera (#62)

Re: snapshot too old, configured by time

On Sun, Apr 17, 2016 at 10:38 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Tom Lane wrote:

After struggling with back-patching a GIN bug fix, I wish to offer up the
considered opinion that this was an impressively bad idea. It's inserted
450 or so pain points for back-patching, which we will have to deal with
for the next five years.

I understand the backpatching pain argument, but my opinion was the
contrary of yours even so.

The other possibility would be to backpatch the no-op patch which
just uses the new syntax without any change in semantics.

I'm not arguing for that; just putting it on the table....

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#64

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Kevin Grittner (#63)

Re: snapshot too old, configured by time

Kevin Grittner <kgrittn@gmail.com> writes:

On Sun, Apr 17, 2016 at 10:38 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I understand the backpatching pain argument, but my opinion was the
contrary of yours even so.

The other possibility would be to backpatch the no-op patch which
just uses the new syntax without any change in semantics.

That would break 3rd-party extensions in a minor release, wouldn't it?
Or do I misunderstand your suggestion?

regards, tom lane

#65

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Alvaro Herrera (#62)

Re: snapshot too old, configured by time

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I disagree. A developer that sees an unadorned BufferGetPage() call
doesn't stop to think twice about whether they need to add a snapshot
test. Many reviewers will miss the necessary addition also. A
developer that sees BufferGetPage(NO_SNAPSHOT_TEST) will at least
consider the idea that the flag might be right; if that developer
doesn't think about it, some reviewer may notice a new call with the
flag and consider the idea that the flag may be wrong.

I'm unconvinced ...

I understand the backpatching pain argument, but my opinion was the
contrary of yours even so.

I merely point out that the problem came up less than ten days after
that patch hit the tree. If that does not give you pause about the
size of the back-patching problem we've just introduced, it should.

TBH, there is nothing that I like about this feature: not the underlying
concept, not the invasiveness of the implementation, nothing. I would
dearly like to see it reverted altogether. I do not think it is worth
the pain that the current implementation will impose, both on developers
and on potential users. Surely there was another way to get a similar
end result without mucking with things at the level of BufferGetPage.

regards, tom lane

#66

Kevin Grittner

kgrittn@gmail.com

over 9 years ago

In reply to: Tom Lane (#64)

Re: snapshot too old, configured by time

On Mon, Apr 18, 2016 at 8:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Kevin Grittner <kgrittn@gmail.com> writes:

On Sun, Apr 17, 2016 at 10:38 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

I understand the backpatching pain argument, but my opinion was the
contrary of yours even so.

The other possibility would be to backpatch the no-op patch which
just uses the new syntax without any change in semantics.

That would break 3rd-party extensions in a minor release, wouldn't it?
Or do I misunderstand your suggestion?

With a little bit of a change to the headers I think we could avoid
that breakage.

The original no-op patch didn't change the executable code, but it
would have interfered with 3rd-party compiles; but with a minor
adjustment (using a modified name for the BufferGetPage with the
extra parameters), we could avoid that problem. That would seem to
address Álvaro's concern while avoiding five years of backpatch
nightmares.
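
One way the header adjustment might look in a back branch, as a sketch only. `BufferGetPageNew` is a made-up name, the toy structs replace the real ones, and the extra arguments are deliberately ignored so that behavior is unchanged.

```c
/* Toy stand-ins for the real page and buffer types. */
typedef struct ToyPage { unsigned long lsn; } ToyPage;
typedef struct ToyBuffer { ToyPage page; } ToyBuffer;

/*
 * Existing one-argument form, left untouched so that third-party code
 * written against the old header keeps compiling.
 */
#define BufferGetPage(buf) (&(buf)->page)

/*
 * Hypothetical new spelling for a back branch: it accepts the extra
 * arguments used on master but evaluates and discards them (a no-op).
 */
#define BufferGetPageNew(buf, snapshot, relation, mode) \
    ((void) (snapshot), (void) (relation), (void) (mode), BufferGetPage(buf))
```

Core code could then use the longer form on every branch, so patches would apply cleanly, while extensions calling the short form would be unaffected.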

I don't claim it's an *elegant* solution, but it might be a workable compromise.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#67

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Tom Lane (#59)

Re: snapshot too old, configured by time

On Sun, Apr 17, 2016 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

After struggling with back-patching a GIN bug fix, I wish to offer up the
considered opinion that this was an impressively bad idea. It's inserted
450 or so pain points for back-patching, which we will have to deal with
for the next five years. Moreover, I do not believe that it will do a
damn thing for ensuring that future calls of BufferGetPage think about
what to do; they'll most likely be copied-and-pasted from nearby calls,
just as people have always done. With luck, the nearby calls will have
the right semantics, but this change won't help very much at all if they
don't.

I hit this problem over the weekend, too, when I tried to rebase a
patch a colleague of mine is working on. So I tend to agree.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#68

Alvaro Herrera

alvherre@2ndquadrant.com

over 9 years ago

In reply to: Tom Lane (#65)

Re: snapshot too old, configured by time

Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I disagree. A developer that sees an unadorned BufferGetPage() call
doesn't stop to think twice about whether they need to add a snapshot
test. Many reviewers will miss the necessary addition also. A
developer that sees BufferGetPage(NO_SNAPSHOT_TEST) will at least
consider the idea that the flag might be right; if that developer
doesn't think about it, some reviewer may notice a new call with the
flag and consider the idea that the flag may be wrong.

I'm unconvinced ...

Well, nobody opposed this when I proposed it originally. Robert just
stated that it caused a problem for him while backpatching but didn't
state an opinion on whether to revert that change. Maybe we should call
for a vote here.

I understand the backpatching pain argument, but my opinion was the
contrary of yours even so.

I merely point out that the problem came up less than ten days after
that patch hit the tree. If that does not give you pause about the
size of the back-patching problem we've just introduced, it should.

Understood. Kevin's idea of applying a no-op syntax change is on the
table. I don't like it either, but ISTM better than the other options
so far.

TBH, there is nothing that I like about this feature: not the underlying
concept, not the invasiveness of the implementation, nothing. I would
dearly like to see it reverted altogether. I do not think it is worth
the pain that the current implementation will impose, both on developers
and on potential users. Surely there was another way to get a similar
end result without mucking with things at the level of BufferGetPage.

Ah well, that's a completely different angle, and perhaps we should
explore this before doing anything in the back branches.

So it seems to me we have these options:

1) revert the whole feature
2) revert the BufferGetPage syntax change
3) apply a no-op syntax change so that BufferGetPage looks the same on
backbranches as it does on master today, keeping API and ABI
compatibility with existing code
4) do nothing

Any others? (If we decide to call for a vote, I suggest we open a new
thread)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#69

Kevin Grittner

kgrittn@gmail.com

over 9 years ago

In reply to: Tom Lane (#65)

Re: snapshot too old, configured by time

On Mon, Apr 18, 2016 at 8:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Surely there was another way to get a similar end result without
mucking with things at the level of BufferGetPage.

To get the feature that some customers have been demanding, a check
has to be made somewhere near where any page is read in a scan. It
didn't take me long in working on this to notice that grepping for
BufferGetPage() calls was a good way to find candidate spots to
insert the check (even if only 7% of BufferGetPage() calls need to
be followed by such a check) -- but the BufferGetPage() itself
clearly does *not* need to be modified to implement the feature.

We could:

(1) Add calls to a check function where needed, and just document
that addition of a BufferGetPage() call should be considered a clue
that a new check might be needed. (original plan)

(2) Replace the 7% of the BufferGetPage() calls that need to check
the age of the snapshot with something that wraps the two function
calls, and does nothing but call one and then the other. (favored
by Michael)

(3) Add parameters to BufferGetPage() to specify whether the check
is needed and provide sufficient information to perform the check
if it is. (current master)

(4) Replace BufferGetPage() with some other function name having
the characteristics of (3) to minimize back-patch pain.
(grudgingly favored by Álvaro)

(5) Revert from community code, leaving it as an EDB value-add
Advanced Server feature.

Does someone see another (better) alternative?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#70

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Kevin Grittner (#69)

Re: snapshot too old, configured by time

Kevin Grittner <kgrittn@gmail.com> writes:

On Mon, Apr 18, 2016 at 8:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Surely there was another way to get a similar end result without
mucking with things at the level of BufferGetPage.

To get the feature that some customers have been demanding, a check
has to be made somewhere near where any page is read in a scan.

I'm not really convinced that we need to define it exactly like that,
though. In particular, why not just kill transactions as soon as their
oldest snapshot is too old? That might not work exactly like this does,
but it would have some pretty substantial benefits --- for one, that the
timeout could be configured locally per session rather than having to be
PGC_POSTMASTER. And it would likely be far easier to limit the
performance side-effects.
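
As a rough sketch of that alternative, with everything assumed: `ToySession`, its fields, the units (seconds) and `should_cancel_for_old_snapshot` are hypothetical. The point is only that the decision needs the session's own state and a per-session setting, and never looks at a page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-session state; times are in seconds. */
typedef struct ToySession
{
    int64_t oldest_snapshot_at; /* when the oldest snapshot was taken; -1 if none */
    int64_t snapshot_timeout;   /* per-session limit; -1 disables the check */
} ToySession;

/*
 * Returns true when the session's transaction should be cancelled
 * because its oldest snapshot has outlived the configured limit.
 */
static bool
should_cancel_for_old_snapshot(const ToySession *s, int64_t now)
{
    if (s->snapshot_timeout < 0 || s->oldest_snapshot_at < 0)
        return false;
    return (now - s->oldest_snapshot_at) > s->snapshot_timeout;
}
```

Unlike the page-level check, this cancels a transaction whether or not it would ever have read modified data.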

I complained about this back when the feature was first discussed,
and you insisted that that answer was no good, and I figured I'd hold
my nose and look the other way as long as the final patch wasn't too
invasive. Well, now we've seen the end result, and it's very invasive
and has got performance issues as well. It's time to reconsider.

Or in short: this is a whole lot further than I'm prepared to go to
satisfy one customer with a badly-designed application. And from what
I can tell from the Feb 2015 discussion, that's what this has been
written for.

regards, tom lane

#71

Michael Paquier

michael.paquier@gmail.com

over 9 years ago

In reply to: Tom Lane (#70)

Re: snapshot too old, configured by time

On Tue, Apr 19, 2016 at 3:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Or in short: this is a whole lot further than I'm prepared to go to
satisfy one customer with a badly-designed application. And from what
I can tell from the Feb 2015 discussion, that's what this has been
written for.

This holds true. I imagine that a lot of people at least on this list
have already spent some time in tracking down long-running
transactions in someone's application and actually tuned the
application so that the bloat gets reduced and things perform better for
other transactions taking a shorter time. Without the need of this
feature.
--
Michael

#72

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Michael Paquier (#71)

Re: snapshot too old, configured by time

On Tue, Apr 19, 2016 at 1:23 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Apr 19, 2016 at 3:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Or in short: this is a whole lot further than I'm prepared to go to
satisfy one customer with a badly-designed application. And from what
I can tell from the Feb 2015 discussion, that's what this has been
written for.

This holds true. I imagine that a lot of people at least on this list
have already spent some time in tracking down long-running
transactions in someone's application and actually tuned the
application so that the bloat gets reduced and things perform better for
other transactions taking a shorter time. Without the need of this
feature.

So I don't want to be too vociferous in defending a feature that (a)
was written by a colleague and (b) obviously isn't perfect, but I will
point out that:

1. There was a surprising amount of positive reaction when Kevin first
proposed this. I expected a lot more people to say this kind of thing
at the beginning, when Kevin first brought this up, but in fact a
number of people wrote in to say they'd really like to have this.
Those positive reactions shouldn't be forgotten just because those
people aren't wading into a discussion about the merits of adding
arguments to BufferGetPage.

2. Without this feature, you can kill sessions or transactions to
control bloat, but this feature is properly thought of as a way to
avoid bloat *without* killing sessions or transactions. You can let
the session live, without having it generate bloat, just so long as it
doesn't try to touch any data that has been recently modified. We
have no other feature in PostgreSQL that does something like that.

At the moment, what I see happening is that Tom, the one person who
has hated this feature since the beginning, still hates it, and we're
on the verge of asking Kevin to revert it because (1) Tom hates it and
(2) Kevin changed the BufferGetPage stuff in the way that Alvaro
requested. I think that's not quite fair. If we want to demand that
this feature be reverted because it causes a performance loss even
when turned off, I get that. If we think that it's badly implemented,
fine, I get that, too. But asking somebody to revert a patch because
the author adjusted things to match what the reviewer wanted is not
fair. The right thing to do about that is just change it back to the
way Kevin had it originally.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#73

Kevin Grittner

kgrittn@gmail.com

over 9 years ago

In reply to: Robert Haas (#72)

Re: snapshot too old, configured by time

On Tue, Apr 19, 2016 at 6:38 AM, Robert Haas <robertmhaas@gmail.com> wrote:

The right thing to do about that is just change it back to the
way Kevin had it originally.

Since this change to BufferGetPage() has caused severe back-patch
pain for at least two committers so far, I will revert that (very
recent) change to this patch later today unless I hear any
objections.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#74

Alvaro Herrera

alvherre@2ndquadrant.com

over 9 years ago

In reply to: Kevin Grittner (#73)

Re: snapshot too old, configured by time

Kevin Grittner wrote:

On Tue, Apr 19, 2016 at 6:38 AM, Robert Haas <robertmhaas@gmail.com> wrote:

The right thing to do about that is just change it back to the
way Kevin had it originally.

Since this change to BufferGetPage() has caused severe back-patch
pain for at least two committers so far, I will revert that (very
recent) change to this patch later today unless I hear any
objections.

I vote for back-patching a no-op change instead, as discussed elsewhere.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#75

Andres Freund

andres@anarazel.de

over 9 years ago

In reply to: Alvaro Herrera (#74)

Re: snapshot too old, configured by time

On 2016-04-19 12:03:22 -0300, Alvaro Herrera wrote:

Kevin Grittner wrote:

On Tue, Apr 19, 2016 at 6:38 AM, Robert Haas <robertmhaas@gmail.com> wrote:

The right thing to do about that is just change it back to the
way Kevin had it originally.

Since this change to BufferGetPage() has caused severe back-patch
pain for at least two committers so far, I will revert that (very
recent) change to this patch later today unless I hear any
objections.

I vote for back-patching a no-op change instead, as discussed elsewhere.

What about Tom's argument that that'd be problematic for external code?


#76

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Alvaro Herrera (#74)

Re: snapshot too old, configured by time

On Tue, Apr 19, 2016 at 11:03 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Kevin Grittner wrote:

On Tue, Apr 19, 2016 at 6:38 AM, Robert Haas <robertmhaas@gmail.com> wrote:

The right thing to do about that is just change it back to the
way Kevin had it originally.

Since this change to BufferGetPage() has caused severe back-patch
pain for at least two committers so far, I will revert that (very
recent) change to this patch later today unless I hear any
objections.

I vote for back-patching a no-op change instead, as discussed elsewhere.

That wouldn't have fixed my problem, which involved rebasing a patch.
I really think it's also a bad precedent to back-patch things into
older branches that are not themselves bug fixes. Users count on us
not to destabilize older branches, and that means being minimalist
about what we put into them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#77

Alvaro Herrera

alvherre@2ndquadrant.com

over 9 years ago

In reply to: Robert Haas (#76)

Re: snapshot too old, configured by time

Andres Freund wrote:

On 2016-04-19 12:03:22 -0300, Alvaro Herrera wrote:

Since this change to BufferGetPage() has caused severe back-patch
pain for at least two committers so far, I will revert that (very
recent) change to this patch later today unless I hear any
objections.

I vote for back-patching a no-op change instead, as discussed elsewhere.

What about Tom's argument that that'd be problematic for external code?

Kevin offered to code it in a way that maintains ABI and API
compatibility with some trickery.

Robert Haas wrote:

That wouldn't have fixed my problem, which involved rebasing a patch.

True. I note that it's possible to munge a patch mechanically to sort
out this situation.

I really think it's also a bad precedent to back-patch things into
older branches that are not themselves bug fixes. Users count on us
not to destabilize older branches, and that means being minimalist
about what we put into them.

Well, this wouldn't change the inner working of the code at all, only
how it looks, so it wouldn't affect users. I grant that it would affect
developers of forks.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



#78

Kevin Grittner

kgrittn@gmail.com

over 9 years ago

In reply to: Alvaro Herrera (#77)

Re: snapshot too old, configured by time

On Tue, Apr 19, 2016 at 11:02 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Andres Freund wrote:

On 2016-04-19 12:03:22 -0300, Alvaro Herrera wrote:

Since this change to BufferGetPage() has caused severe back-patch
pain for at least two committers so far, I will revert that (very
recent) change to this patch later today unless I hear any
objections.

I vote for back-patching a no-op change instead, as discussed elsewhere.

What about Tom's argument that that'd be problematic for external code?

Kevin offered to code it in a way that maintains ABI and API
compatibility with some trickery.

I pointed out that it would be possible to do so, but specifically
said I wasn't arguing for that. We would need to create a new name
for what BufferGetPage() does on master, and have that call the old
BufferGetPage() on back-branches. That seems pretty ugly.

I tend to think that the original approach, while it puts the
burden on coders to recognize when TestForOldSnapshot() must be
called, is no more onerous than many existing issues coders must
worry about -- like whether to add something to outfuncs.c, as an
example. I have been skeptical of the nanny approach all along,
and after seeing the impact of having it in the tree for a few
days, I really am inclined to pull back and put this on the same
footing as the other things hackers need to learn and tend to as
they code.

Robert Haas wrote:

That wouldn't have fixed my problem, which involved rebasing a patch.

True. I note that it's possible to munge a patch mechanically to sort
out this situation.

I admit it is possible. I'm becoming more convinced with each post
that it's the wrong approach. I feel like I have been in the
modern version of an Æsop fable here:

http://www.bartleby.com/17/1/62.html

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#79

Alvaro Herrera

alvherre@2ndquadrant.com

over 9 years ago

In reply to: Kevin Grittner (#78)

Re: snapshot too old, configured by time

Kevin Grittner wrote:

On Tue, Apr 19, 2016 at 11:02 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Robert Haas wrote:

That wouldn't have fixed my problem, which involved rebasing a patch.

True. I note that it's possible to munge a patch mechanically to sort
out this situation.

I admit it is possible. I'm becoming more convinced with each post
that it's the wrong approach. I feel like I have been in the
modern version of an Æsop fable here:

http://www.bartleby.com/17/1/62.html

LOL.

Well, it seems I'm outvoted. I leave you to do with your donkey as you
please.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#80

Kevin Grittner

kgrittn@gmail.com

over 9 years ago

In reply to: Alvaro Herrera (#79)

Re: snapshot too old, configured by time

On Tue, Apr 19, 2016 at 12:49 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Well, it seems I'm outvoted.

The no-op change to attempt to force an explicit choice of whether
to test for snapshot age after calling BufferGetPage() has been
reverted. This eliminates about 500 back-patching pain points in
65 files.

In case anyone notices some code left at the bottom of bufmgr.h
related to inline functions, that was left on purpose, because I am
pretty sure that the fix for the performance regression observed
when the "snapshot too old" feature is disabled will involve making
at least part of TestForOldSnapshot() an inline function -- so it
seemed dumb to rip that out now only to put it back again right
away.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#81

Kevin Grittner

kgrittn@gmail.com

over 9 years ago

In reply to: Kevin Grittner (#80)

Re: snapshot too old, configured by time

On Wed, Apr 20, 2016 at 8:50 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

In case anyone notices some code left at the bottom of bufmgr.h
related to inline functions, that was left on purpose, because I am
pretty sure that the fix for the performance regression observed
when the "snapshot too old" feature is disabled will involve making
at least part of TestForOldSnapshot() an inline function -- so it
seemed dumb to rip that out now only to put it back again right
away.

I pushed something along those lines. I didn't want to inline the
whole function because IsCatalogRelation() and
RelationIsAccessibleInLogicalDecoding() seemed kinda big to inline
and require rel.h to be included; so bringing them into bufmgr.h
would have spread that around too far. Putting the quick tests in
an inline function which calls a non-inlined _impl function seemed
like the best compromise.
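
The split described above, with cheap tests inline and the heavier work behind an out-of-line _impl call, might look roughly like the sketch below. This is only an illustration under stated assumptions: the types, variables, and function names are hypothetical stand-ins, not the definitions in bufmgr.h, and the slow path is reduced to a plain age comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the PostgreSQL types. */
typedef uint64_t Lsn;

typedef struct Snapshot
{
    Lsn     lsn;        /* WAL position when the snapshot was taken */
    int64_t taken_at;   /* time the snapshot was taken, in seconds */
} Snapshot;

/* -1 disables the check, mirroring old_snapshot_threshold = -1. */
static int64_t threshold_secs = -1;
static int64_t now_secs = 0;    /* assumed clock, set by the caller */
static int     impl_calls = 0;  /* counts how often the slow path runs */

/*
 * Slow path. In a real tree this would live out of line in a .c file so
 * that its dependencies stay out of the widely included header; it is
 * static here only to keep the sketch in one file.
 */
static bool
snapshot_too_old_impl(const Snapshot *snap)
{
    impl_calls++;
    return (now_secs - snap->taken_at) > threshold_secs;
}

/* Fast path: only cheap comparisons, small enough to inline everywhere. */
static inline bool
snapshot_too_old(const Snapshot *snap, Lsn page_lsn)
{
    assert(snap != NULL);
    if (threshold_secs < 0)
        return false;           /* feature disabled */
    if (page_lsn <= snap->lsn)
        return false;           /* page not modified since the snapshot */
    return snapshot_too_old_impl(snap);
}
```

With the feature disabled, every call returns after a single comparison and never reaches the out-of-line function, which is the case the reported regression concerns.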

My connectivity problems to our big NUMA machines have not yet
been resolved, so I didn't have a better test case for this than
200 read-only clients at saturation on my single-socket i7, which
was only a 2.2% to 2.3% regression -- so I encourage anyone who was
able to create something more significant with
old_snapshot_threshold = -1 to try with the latest and report the
impact for your environment. I'm not sure whether any more is
needed here.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#82

Bruce Momjian

bruce@momjian.us

over 9 years ago

In reply to: Robert Haas (#72)

Re: snapshot too old, configured by time

On Tue, Apr 19, 2016 at 07:38:04AM -0400, Robert Haas wrote:

2. Without this feature, you can kill sessions or transactions to
control bloat, but this feature is properly thought of as a way to
avoid bloat *without* killing sessions or transactions. You can let
the session live, without having it generate bloat, just so long as it
doesn't try to touch any data that has been recently modified. We
have no other feature in PostgreSQL that does something like that.

I kind of agreed with Tom about just aborting transactions that held
snapshots for too long, and liked the idea this could be set per
session, but the idea that we abort only if a backend actually touches
the old data is very nice. I can see why the patch author worked hard
to do that.

How does/did Oracle handle this? I assume we can't find a way to set
this per session, right?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +


#83

Amit Kapila

amit.kapila16@gmail.com

over 9 years ago

In reply to: Bruce Momjian (#82)

Re: snapshot too old, configured by time

On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian <bruce@momjian.us> wrote:

On Tue, Apr 19, 2016 at 07:38:04AM -0400, Robert Haas wrote:

2. Without this feature, you can kill sessions or transactions to
control bloat, but this feature is properly thought of as a way to
avoid bloat *without* killing sessions or transactions. You can let
the session live, without having it generate bloat, just so long as it
doesn't try to touch any data that has been recently modified. We
have no other feature in PostgreSQL that does something like that.

I kind of agreed with Tom about just aborting transactions that held
snapshots for too long, and liked the idea this could be set per
session, but the idea that we abort only if a backend actually touches
the old data is very nice. I can see why the patch author worked hard
to do that.

How does/did Oracle handle this?

IIRC then Oracle gives this error when the space in undo tablespace (aka
rollback segment) is low. When the rollback segment gets full, it
overwrites the changed data which might be required by some old snapshot
and when that old snapshot statement tries to access the data (which is
already overwritten), it gets "snapshot too old" error. Assuming there is
enough space in rollback segment, Oracle seems to provide a way via Alter
System set undo_retention = <time_in_secs>.

Now, if the above understanding of mine is correct, then I think the
current implementation done by Kevin is closer to what Oracle provides.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#84

Bruce Momjian

bruce@momjian.us

over 9 years ago

In reply to: Amit Kapila (#83)

Re: snapshot too old, configured by time

On Sat, Apr 23, 2016 at 12:48:08PM +0530, Amit Kapila wrote:

On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian <bruce@momjian.us> wrote:

I kind of agreed with Tom about just aborting transactions that held
snapshots for too long, and liked the idea this could be set per
session, but the idea that we abort only if a backend actually touches
the old data is very nice. I can see why the patch author worked hard
to do that.

How does/did Oracle handle this?

IIRC then Oracle gives this error when the space in undo tablespace (aka
rollback segment) is low. When the rollback segment gets full, it overwrites
the changed data which might be required by some old snapshot and when that old
snapshot statement tries to access the data (which is already overwritten), it
gets "snapshot too old" error. Assuming there is enough space in rollback
segment, Oracle seems to provide a way via Alter System set undo_retention =
<time_in_secs>.

Now, if the above understanding of mine is correct, then I think the current
implementation done by Kevin is closer to what Oracle provides.

But does the rollback only happen if the long-running Oracle transaction
tries to _access_ specific data that was in the undo segment, or _any_
data that potentially could have been in the undo segment? If the
latter, it seems Kevin's approach is better because you would have to
actually access old data that was there to be canceled, not just
any data that could have been overwritten based on the xid.

Also, it seems we have similar behavior already in applying WAL on the
standby --- we delay WAL replay when there is a long-running
transaction. Once the time expires, we apply the WAL. Do we cancel the
long-running transaction at that time, or wait for the long-running
transaction to touch some WAL we just applied? If the former, does
Kevin's new code allow us to do the latter?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +


#85

Amit Kapila

amit.kapila16@gmail.com

over 9 years ago

In reply to: Bruce Momjian (#84)

1 attachment(s)

Re: snapshot too old, configured by time

On Sat, Apr 23, 2016 at 7:50 PM, Bruce Momjian <bruce@momjian.us> wrote:

On Sat, Apr 23, 2016 at 12:48:08PM +0530, Amit Kapila wrote:

On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian <bruce@momjian.us> wrote:

I kind of agreed with Tom about just aborting transactions that held
snapshots for too long, and liked the idea this could be set per
session, but the idea that we abort only if a backend actually touches
the old data is very nice. I can see why the patch author worked hard
to do that.

How does/did Oracle handle this?

IIRC then Oracle gives this error when the space in undo tablespace (aka
rollback segment) is low. When the rollback segment gets full, it
overwrites the changed data which might be required by some old snapshot
and when that old snapshot statement tries to access the data (which is
already overwritten), it gets "snapshot too old" error. Assuming there is
enough space in rollback segment, Oracle seems to provide a way via Alter
System set undo_retention = <time_in_secs>.

Now, if the above understanding of mine is correct, then I think the
current implementation done by Kevin is closer to what Oracle provides.

But does the rollback only happen if the long-running Oracle transaction
tries to _access_ specific data that was in the undo segment, or _any_
data that potentially could have been in the undo segment?

It happens only when the long-running transaction tries to access
specific data. For more detail, see slides 7~29 of the attached
presentation (with focus on slides 28 and 29).

If the
latter, it seems Kevin's approach is better because you would have to
actually access old data that was there to be canceled, not just
any data that could have been overwritten based on the xid.

Also, it seems we have similar behavior already in applying WAL on the
standby --- we delay WAL replay when there is a long-running
transaction. Once the time expires, we apply the WAL. Do we cancel the
long-running transaction at that time, or wait for the long-running
transaction to touch some WAL we just applied?

As per my understanding, the error is given when any transaction tries to
access the data.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#86

Alexander Korotkov

a.korotkov@postgrespro.ru

over 9 years ago

In reply to: Bruce Momjian (#84)

Re: snapshot too old, configured by time

On Sat, Apr 23, 2016 at 5:20 PM, Bruce Momjian <bruce@momjian.us> wrote:

On Sat, Apr 23, 2016 at 12:48:08PM +0530, Amit Kapila wrote:

On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian <bruce@momjian.us> wrote:

I kind of agreed with Tom about just aborting transactions that held
snapshots for too long, and liked the idea this could be set per
session, but the idea that we abort only if a backend actually touches
the old data is very nice. I can see why the patch author worked hard
to do that.

How does/did Oracle handle this?

IIRC then Oracle gives this error when the space in undo tablespace (aka
rollback segment) is low. When the rollback segment gets full, it
overwrites the changed data which might be required by some old snapshot
and when that old snapshot statement tries to access the data (which is
already overwritten), it gets "snapshot too old" error. Assuming there is
enough space in rollback segment, Oracle seems to provide a way via Alter
System set undo_retention = <time_in_secs>.

Now, if the above understanding of mine is correct, then I think the
current implementation done by Kevin is closer to what Oracle provides.

But does the rollback only happen if the long-running Oracle transaction
tries to _access_ specific data that was in the undo segment, or _any_
data that potentially could have been in the undo segment? If the
latter, it seems Kevin's approach is better because you would have to
actually access old data that was there to be canceled, not just
any data that could have been overwritten based on the xid.

I'm not sure that we should rely that much on Oracle's behavior. It has a
very different MVCC model, so we can't carry its features over one by one:
they would have different pros and cons for us.

Also, it seems we have similar behavior already in applying WAL on the
standby --- we delay WAL replay when there is a long-running
transaction. Once the time expires, we apply the WAL. Do we cancel the
long-running transaction at that time, or wait for the long-running
transaction to touch some WAL we just applied? If the former, does
Kevin's new code allow us to do the latter?

That makes sense to me. If we could improve read-only queries on slaves
this way, Kevin's new code becomes much more justified.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#87

Bruce Momjian

bruce@momjian.us

over 9 years ago

In reply to: Bruce Momjian (#84)

Re: snapshot too old, configured by time

On Sat, Apr 23, 2016 at 10:20:19AM -0400, Bruce Momjian wrote:

On Sat, Apr 23, 2016 at 12:48:08PM +0530, Amit Kapila wrote:

On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian <bruce@momjian.us> wrote:

I kind of agreed with Tom about just aborting transactions that held
snapshots for too long, and liked the idea this could be set per
session, but the idea that we abort only if a backend actually touches
the old data is very nice. I can see why the patch author worked hard
to do that.

As I understand it, a transaction trying to access a shared buffer
aborts if there was a cleanup on the page that removed rows it might be
interested in. How does this handle cases where vacuum removes _pages_
from the table? Does vacuum avoid this when there are running
transactions?

Also, it seems we have similar behavior already in applying WAL on the
standby --- we delay WAL replay when there is a long-running
transaction. Once the time expires, we apply the WAL. Do we cancel the
long-running transaction at that time, or wait for the long-running
transaction to touch some WAL we just applied? If the former, does
Kevin's new code allow us to do the latter?

Is this a TODO item?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +


#88

Kevin Grittner

kgrittn@gmail.com

over 9 years ago

In reply to: Bruce Momjian (#87)

Re: snapshot too old, configured by time

On Sun, May 1, 2016 at 11:54 PM, Bruce Momjian <bruce@momjian.us> wrote:

On Sat, Apr 23, 2016 at 10:20:19AM -0400, Bruce Momjian wrote:

On Sat, Apr 23, 2016 at 12:48:08PM +0530, Amit Kapila wrote:

On Sat, Apr 23, 2016 at 8:34 AM, Bruce Momjian <bruce@momjian.us> wrote:

I kind of agreed with Tom about just aborting transactions that held
snapshots for too long, and liked the idea this could be set per
session, but the idea that we abort only if a backend actually touches
the old data is very nice. I can see why the patch author worked hard
to do that.

As I understand it, a transaction trying to access a shared buffer
aborts if there was a cleanup on the page that removed rows it might be
interested in. How does this handle cases where vacuum removes _pages_
from the table?

(1) When the "snapshot too old" feature is enabled
(old_snapshot_threshold >= 0) relations are not truncated, so pages
cannot be removed that way. This mainly protects against a seq
scan having the problem you describe.

(2) Other than a seq scan, you could miss a page when descending
through an index or following sibling pointers within an index. In
either case you can't remove a page without modifying the page
pointing to it to no longer do so, so the modified LSN on the
parent or sibling will trigger the error.
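
As a rough illustration of point (2), the toy model below shows why unlinking a page is visible to a reader with an old snapshot: the unlink has to rewrite the page that pointed at it, which advances that page's LSN past the snapshot's. All names here (IndexPage, unlink_child, page_changed_since) are hypothetical and greatly simplified; this is not the actual index or buffer code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;

/* Hypothetical, simplified index page: one downlink plus the LSN of the
 * last modification made to this page. */
typedef struct IndexPage
{
    Lsn lsn;
    int child;      /* page number of the child, or -1 if none */
} IndexPage;

static Lsn current_lsn = 0;     /* assumed global WAL position */

/* Removing a child means rewriting the parent so it no longer points to
 * the child, and that rewrite stamps the parent with a newer LSN. */
static void
unlink_child(IndexPage *parent)
{
    assert(parent->child >= 0);
    parent->child = -1;
    parent->lsn = ++current_lsn;
}

/* A reader with an expired snapshot raises the error when it visits a
 * page that was modified after the snapshot was taken. */
static bool
page_changed_since(const IndexPage *page, Lsn snapshot_lsn)
{
    return page->lsn > snapshot_lsn;
}
```

In this model, a reader descending through the parent after the unlink sees a parent LSN newer than its snapshot LSN, so the check fires even though the removed page itself is never visited.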

Note that a question has recently been raised regarding hash
indexes (which should perhaps be generalized to any non-WAL-logged
index on a permanent table). Since that is a correctness issue,
rather than a performance issue that only affects how many people
will find the feature useful, I will add it to the release blockers
list and prioritize it ahead of the performance issues.

Does vacuum avoid this when there are running transactions?

I'm not sure I understand the question.

Also, it seems we have similar behavior already in applying WAL on the
standby --- we delay WAL replay when there is a long-running
transaction. Once the time expires, we apply the WAL. Do we cancel the
long-running transaction at that time, or wait for the long-running
transaction to touch some WAL we just applied? If the former, does
Kevin's new code allow us to do the latter?

Is this a TODO item?

I'm not aware of any TODO items existing or needed here. The
feature operates by adjusting the xmin used by vacuum and pruning,
and leaving all the other mechanisms functioning as they were.
That looked to me like it should interact with replication streams
correctly. If someone sees something that needs adjustment please
speak up Real Soon Now.
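
A minimal sketch of what "adjusting the xmin used by vacuum and pruning" could mean, under loud assumptions: the function name is hypothetical, xmin_at_threshold is assumed to be the oldest xmin that was current old_snapshot_threshold ago, and transaction ID wraparound is ignored entirely.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Xid;

/*
 * Returns the xmin that vacuum and pruning may use as their horizon.
 * Hypothetical helper; xid wraparound is deliberately ignored.
 *
 * oldest_snapshot_xmin: horizon implied by the oldest registered snapshot
 * xmin_at_threshold:    assumed xmin as of (now - threshold)
 * threshold_minutes:    -1 disables the adjustment
 */
static Xid
limited_xmin(Xid oldest_snapshot_xmin, Xid xmin_at_threshold,
             int threshold_minutes)
{
    assert(threshold_minutes >= -1);
    if (threshold_minutes < 0)
        return oldest_snapshot_xmin;    /* feature off: nothing changes */
    /* Never move the horizon backwards, only forwards. */
    return xmin_at_threshold > oldest_snapshot_xmin
        ? xmin_at_threshold
        : oldest_snapshot_xmin;
}
```

Everything downstream of the horizon is left alone in this model; only the value handed to cleanup changes, which matches the description of leaving the other mechanisms functioning as they were.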

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#89

Bruce Momjian

bruce@momjian.us

over 9 years ago

In reply to: Kevin Grittner (#88)

Re: snapshot too old, configured by time

On Mon, May 2, 2016 at 03:50:36PM -0500, Kevin Grittner wrote:

Also, it seems we have similar behavior already in applying WAL on the
standby --- we delay WAL replay when there is a long-running
transaction. Once the time expires, we apply the WAL. Do we cancel the
long-running transaction at that time, or wait for the long-running
transaction to touch some WAL we just applied? If the former, does
Kevin's new code allow us to do the latter?

Is this a TODO item?

I'm not aware of any TODO items existing or needed here. The
feature operates by adjusting the xmin used by vacuum and pruning,
and leaving all the other mechanisms functioning as they were.
That looked to me like it should interact with replication streams
correctly. If someone sees something that needs adjustment please
speak up Real Soon Now.

My question is whether this method could also be used to avoid read-only
query cancel when we force replay of a conflicting wal record. Could we
wait for the read-only query to try to _access_ some old data before
cancelling it?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
