fixing old_snapshot_threshold&#39;s time-&gt;xid mapping

robertmhaas@gmail.com

over 5 years ago

In reply to: Andres Freund (#2)

Re: fixing old_snapshot_threshold's time->xid mapping

On Thu, Apr 16, 2020 at 1:14 PM Andres Freund <andres@anarazel.de> wrote:

I still think we need a way to test this without waiting for hours to
hit various edge cases. You argued against a fixed binning of
old_snapshot_threshold/100 arguing its too coarse. How about a 1000 or
so? For 60 days, the current max for old_snapshot_threshold, that'd be a
granularity of 01:26:24, which seems fine. The best way I can think of
that'd keep current GUC values sensible is to change
old_snapshot_threshold to be float. Ugly, but ...?

Yeah, 1000 would be a lot better. However, if we switch to a fixed
number of bins, it's going to be a lot more code churn. What did you
think of my suggestion of making head_timestamp artificially move
backward to simulate the passage of time?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Andres Freund

andres@anarazel.de

over 5 years ago

In reply to: Robert Haas (#3)

Re: fixing old_snapshot_threshold's time->xid mapping

Hi,

On 2020-04-16 13:34:39 -0400, Robert Haas wrote:

On Thu, Apr 16, 2020 at 1:14 PM Andres Freund <andres@anarazel.de> wrote:

I still think we need a way to test this without waiting for hours to
hit various edge cases. You argued against a fixed binning of
old_snapshot_threshold/100 arguing its too coarse. How about a 1000 or
so? For 60 days, the current max for old_snapshot_threshold, that'd be a
granularity of 01:26:24, which seems fine. The best way I can think of
that'd keep current GUC values sensible is to change
old_snapshot_threshold to be float. Ugly, but ...?

Yeah, 1000 would be a lot better. However, if we switch to a fixed
number of bins, it's going to be a lot more code churn.

Given the number of things that need to be addressed around the feature,
I am not too concerned about that.

What did you think of my suggestion of making head_timestamp
artificially move backward to simulate the passage of time?

I don't think it allows to exercise the various cases well enough. We
need to be able to test this feature both interactively as well as in a
scripted manner. Edge cases like wrapping around in the time mapping imo
can not easily be tested by moving the head timestamp back.

Greetings,

Andres Freund

thomas.munro@gmail.com

over 5 years ago

In reply to: Andres Freund (#4)

1 attachment(s)

Re: fixing old_snapshot_threshold's time->xid mapping

On Fri, Apr 17, 2020 at 5:46 AM Andres Freund <andres@anarazel.de> wrote:

On 2020-04-16 13:34:39 -0400, Robert Haas wrote:

On Thu, Apr 16, 2020 at 1:14 PM Andres Freund <andres@anarazel.de> wrote:

I still think we need a way to test this without waiting for hours to
hit various edge cases. You argued against a fixed binning of
old_snapshot_threshold/100 arguing its too coarse. How about a 1000 or
so? For 60 days, the current max for old_snapshot_threshold, that'd be a
granularity of 01:26:24, which seems fine. The best way I can think of
that'd keep current GUC values sensible is to change
old_snapshot_threshold to be float. Ugly, but ...?

Yeah, 1000 would be a lot better. However, if we switch to a fixed
number of bins, it's going to be a lot more code churn.

Given the number of things that need to be addressed around the feature,
I am not too concerned about that.

What did you think of my suggestion of making head_timestamp
artificially move backward to simulate the passage of time?

I don't think it allows to exercise the various cases well enough. We
need to be able to test this feature both interactively as well as in a
scripted manner. Edge cases like wrapping around in the time mapping imo
can not easily be tested by moving the head timestamp back.

What about a contrib function that lets you clobber
oldSnapshotControl->current_timestamp? It looks like all times in
this system come ultimately from GetSnapshotCurrentTimestamp(), which
uses that variable to make sure that time never goes backwards.
Perhaps you could abuse that, like so, from test scripts:

postgres=# select * from pg_old_snapshot_time_mapping();
array_offset | end_timestamp | newest_xmin
--------------+------------------------+-------------
0 | 3000-01-01 13:00:00+13 | 490
(1 row)

postgres=# select pg_clobber_current_snapshot_timestamp('3000-01-01 00:01:00Z');
pg_clobber_current_snapshot_timestamp
---------------------------------------

(1 row)

postgres=# select * from pg_old_snapshot_time_mapping();
array_offset | end_timestamp | newest_xmin
--------------+------------------------+-------------
0 | 3000-01-01 13:01:00+13 | 490
1 | 3000-01-01 13:02:00+13 | 490
(2 rows)

Attachments:

0003-Add-pg_clobber_current_snapshot_timestamp.patchtext/x-patch; charset=US-ASCII; name=0003-Add-pg_clobber_current_snapshot_timestamp.patchDownload

From 76fe56b732cdc420aeb7cb3b2adcc1e45343b0f7 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 17 Apr 2020 14:10:35 +1200
Subject: [PATCH 3/3] Add pg_clobber_current_snapshot_timestamp().

---
 contrib/old_snapshot/old_snapshot--1.0.sql |  5 +++++
 contrib/old_snapshot/time_mapping.c        | 13 +++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/contrib/old_snapshot/old_snapshot--1.0.sql b/contrib/old_snapshot/old_snapshot--1.0.sql
index 9ebb8829e3..aacf1704b5 100644
--- a/contrib/old_snapshot/old_snapshot--1.0.sql
+++ b/contrib/old_snapshot/old_snapshot--1.0.sql
@@ -11,4 +11,9 @@ RETURNS SETOF record
 AS 'MODULE_PATHNAME', 'pg_old_snapshot_time_mapping'
 LANGUAGE C STRICT;
 
+CREATE FUNCTION pg_clobber_current_snapshot_timestamp(now timestamptz)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'pg_clobber_current_snapshot_timestamp'
+LANGUAGE C STRICT;
+
 -- XXX. Do we want REVOKE commands here?
diff --git a/contrib/old_snapshot/time_mapping.c b/contrib/old_snapshot/time_mapping.c
index 37e0055a00..8728c4ddb5 100644
--- a/contrib/old_snapshot/time_mapping.c
+++ b/contrib/old_snapshot/time_mapping.c
@@ -36,6 +36,7 @@ typedef struct
 
 PG_MODULE_MAGIC;
 PG_FUNCTION_INFO_V1(pg_old_snapshot_time_mapping);
+PG_FUNCTION_INFO_V1(pg_clobber_current_snapshot_timestamp);
 
 static OldSnapshotTimeMapping *GetOldSnapshotTimeMapping(void);
 static TupleDesc MakeOldSnapshotTimeMappingTupleDesc(void);
@@ -157,3 +158,15 @@ MakeOldSnapshotTimeMappingTuple(TupleDesc tupdesc, OldSnapshotTimeMapping *mappi
 
 	return heap_form_tuple(tupdesc, values, nulls);
 }
+
+Datum
+pg_clobber_current_snapshot_timestamp(PG_FUNCTION_ARGS)
+{
+	TimestampTz new_current_timestamp = PG_GETARG_TIMESTAMPTZ(0);
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+	oldSnapshotControl->current_timestamp = new_current_timestamp;
+	LWLockRelease(OldSnapshotTimeMapLock);
+
+	PG_RETURN_NULL();
+}
-- 
2.20.1

thomas.munro@gmail.com

over 5 years ago

In reply to: Thomas Munro (#5)

6 attachment(s)

Re: fixing old_snapshot_threshold's time->xid mapping

On Fri, Apr 17, 2020 at 2:12 PM Thomas Munro <thomas.munro@gmail.com> wrote:

What about a contrib function that lets you clobber
oldSnapshotControl->current_timestamp? It looks like all times in
this system come ultimately from GetSnapshotCurrentTimestamp(), which
uses that variable to make sure that time never goes backwards.

Here's a draft TAP test that uses that technique successfully, as a
POC. It should probably be extended to cover more cases, but I
thought I'd check what people thought of the concept first before
going further. I didn't see a way to do overlapping transactions with
PostgresNode.pm, so I invented one (please excuse the bad perl); am I
missing something? Maybe it'd be better to do 002 with an isolation
test instead, but I suppose 001 can't be in an isolation test, since
it needs to connect to multiple databases, and it seemed better to do
them both the same way. It's also not entirely clear to me that
isolation tests can expect a database to be fresh and then mess with
dangerous internal state, whereas TAP tests set up and tear down a
cluster each time.

I think I found another bug in MaintainOldSnapshotTimeMapping(): if
you make time jump by more than old_snapshot_threshold in one go, then
the map gets cleared and then no early pruning or snapshot-too-old
errors happen. That's why in 002_too_old.pl it currently advances
time by 10 minutes twice, instead of 20 minutes once. To be
continued.

Attachments:

v2-0001-Expose-oldSnapshotControl.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Expose-oldSnapshotControl.patchDownload

From b78644a0f9580934b136ca8413366de91198203f Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 09:37:31 -0400
Subject: [PATCH v2 1/6] Expose oldSnapshotControl.

---
 src/backend/utils/time/snapmgr.c | 55 +----------------------
 src/include/utils/old_snapshot.h | 75 ++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+), 53 deletions(-)
 create mode 100644 src/include/utils/old_snapshot.h

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 1c063c592c..abaaea569a 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -63,6 +63,7 @@
 #include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/old_snapshot.h"
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/snapmgr.h"
@@ -74,59 +75,7 @@
  */
 int			old_snapshot_threshold; /* number of minutes, -1 disables */
 
-/*
- * Structure for dealing with old_snapshot_threshold implementation.
- */
-typedef struct OldSnapshotControlData
-{
-	/*
-	 * Variables for old snapshot handling are shared among processes and are
-	 * only allowed to move forward.
-	 */
-	slock_t		mutex_current;	/* protect current_timestamp */
-	TimestampTz current_timestamp;	/* latest snapshot timestamp */
-	slock_t		mutex_latest_xmin;	/* protect latest_xmin and next_map_update */
-	TransactionId latest_xmin;	/* latest snapshot xmin */
-	TimestampTz next_map_update;	/* latest snapshot valid up to */
-	slock_t		mutex_threshold;	/* protect threshold fields */
-	TimestampTz threshold_timestamp;	/* earlier snapshot is old */
-	TransactionId threshold_xid;	/* earlier xid may be gone */
-
-	/*
-	 * Keep one xid per minute for old snapshot error handling.
-	 *
-	 * Use a circular buffer with a head offset, a count of entries currently
-	 * used, and a timestamp corresponding to the xid at the head offset.  A
-	 * count_used value of zero means that there are no times stored; a
-	 * count_used value of OLD_SNAPSHOT_TIME_MAP_ENTRIES means that the buffer
-	 * is full and the head must be advanced to add new entries.  Use
-	 * timestamps aligned to minute boundaries, since that seems less
-	 * surprising than aligning based on the first usage timestamp.  The
-	 * latest bucket is effectively stored within latest_xmin.  The circular
-	 * buffer is updated when we get a new xmin value that doesn't fall into
-	 * the same interval.
-	 *
-	 * It is OK if the xid for a given time slot is from earlier than
-	 * calculated by adding the number of minutes corresponding to the
-	 * (possibly wrapped) distance from the head offset to the time of the
-	 * head entry, since that just results in the vacuuming of old tuples
-	 * being slightly less aggressive.  It would not be OK for it to be off in
-	 * the other direction, since it might result in vacuuming tuples that are
-	 * still expected to be there.
-	 *
-	 * Use of an SLRU was considered but not chosen because it is more
-	 * heavyweight than is needed for this, and would probably not be any less
-	 * code to implement.
-	 *
-	 * Persistence is not needed.
-	 */
-	int			head_offset;	/* subscript of oldest tracked time */
-	TimestampTz head_timestamp; /* time corresponding to head xid */
-	int			count_used;		/* how many slots are in use */
-	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
-} OldSnapshotControlData;
-
-static volatile OldSnapshotControlData *oldSnapshotControl;
+volatile OldSnapshotControlData *oldSnapshotControl;
 
 
 /*
diff --git a/src/include/utils/old_snapshot.h b/src/include/utils/old_snapshot.h
new file mode 100644
index 0000000000..284af7d508
--- /dev/null
+++ b/src/include/utils/old_snapshot.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * old_snapshot.h
+ *		Data structures for 'snapshot too old'
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/old_snapshot.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef OLD_SNAPSHOT_H
+#define OLD_SNAPSHOT_H
+
+#include "datatype/timestamp.h"
+#include "storage/s_lock.h"
+
+/*
+ * Structure for dealing with old_snapshot_threshold implementation.
+ */
+typedef struct OldSnapshotControlData
+{
+	/*
+	 * Variables for old snapshot handling are shared among processes and are
+	 * only allowed to move forward.
+	 */
+	slock_t		mutex_current;	/* protect current_timestamp */
+	TimestampTz current_timestamp;	/* latest snapshot timestamp */
+	slock_t		mutex_latest_xmin;	/* protect latest_xmin and next_map_update */
+	TransactionId latest_xmin;	/* latest snapshot xmin */
+	TimestampTz next_map_update;	/* latest snapshot valid up to */
+	slock_t		mutex_threshold;	/* protect threshold fields */
+	TimestampTz threshold_timestamp;	/* earlier snapshot is old */
+	TransactionId threshold_xid;	/* earlier xid may be gone */
+
+	/*
+	 * Keep one xid per minute for old snapshot error handling.
+	 *
+	 * Use a circular buffer with a head offset, a count of entries currently
+	 * used, and a timestamp corresponding to the xid at the head offset.  A
+	 * count_used value of zero means that there are no times stored; a
+	 * count_used value of OLD_SNAPSHOT_TIME_MAP_ENTRIES means that the buffer
+	 * is full and the head must be advanced to add new entries.  Use
+	 * timestamps aligned to minute boundaries, since that seems less
+	 * surprising than aligning based on the first usage timestamp.  The
+	 * latest bucket is effectively stored within latest_xmin.  The circular
+	 * buffer is updated when we get a new xmin value that doesn't fall into
+	 * the same interval.
+	 *
+	 * It is OK if the xid for a given time slot is from earlier than
+	 * calculated by adding the number of minutes corresponding to the
+	 * (possibly wrapped) distance from the head offset to the time of the
+	 * head entry, since that just results in the vacuuming of old tuples
+	 * being slightly less aggressive.  It would not be OK for it to be off in
+	 * the other direction, since it might result in vacuuming tuples that are
+	 * still expected to be there.
+	 *
+	 * Use of an SLRU was considered but not chosen because it is more
+	 * heavyweight than is needed for this, and would probably not be any less
+	 * code to implement.
+	 *
+	 * Persistence is not needed.
+	 */
+	int			head_offset;	/* subscript of oldest tracked time */
+	TimestampTz head_timestamp; /* time corresponding to head xid */
+	int			count_used;		/* how many slots are in use */
+	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
+} OldSnapshotControlData;
+
+extern volatile OldSnapshotControlData *oldSnapshotControl;
+
+#endif
-- 
2.20.1

v2-0002-contrib-old_snapshot-time-xid-mapping.patchtext/x-patch; charset=US-ASCII; name=v2-0002-contrib-old_snapshot-time-xid-mapping.patchDownload

From e23d46f8482561fc2369525d5625058dad0ded5c Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 12:14:32 -0400
Subject: [PATCH v2 2/6] contrib/old_snapshot: time->xid mapping.

---
 contrib/Makefile                           |   1 +
 contrib/old_snapshot/Makefile              |  24 ++++
 contrib/old_snapshot/old_snapshot--1.0.sql |  14 ++
 contrib/old_snapshot/old_snapshot.control  |   5 +
 contrib/old_snapshot/time_mapping.c        | 159 +++++++++++++++++++++
 5 files changed, 203 insertions(+)
 create mode 100644 contrib/old_snapshot/Makefile
 create mode 100644 contrib/old_snapshot/old_snapshot--1.0.sql
 create mode 100644 contrib/old_snapshot/old_snapshot.control
 create mode 100644 contrib/old_snapshot/time_mapping.c

diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..452ade0782 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -27,6 +27,7 @@ SUBDIRS = \
 		lo		\
 		ltree		\
 		oid2name	\
+		old_snapshot	\
 		pageinspect	\
 		passwordcheck	\
 		pg_buffercache	\
diff --git a/contrib/old_snapshot/Makefile b/contrib/old_snapshot/Makefile
new file mode 100644
index 0000000000..091231f25f
--- /dev/null
+++ b/contrib/old_snapshot/Makefile
@@ -0,0 +1,24 @@
+# contrib/old_snapshot/Makefile
+
+MODULE_big = old_snapshot
+OBJS = \
+	$(WIN32RES) \
+	time_mapping.o
+PG_CPPFLAGS = -I$(libpq_srcdir)
+
+EXTENSION = old_snapshot
+DATA = old_snapshot--1.0.sql
+PGFILEDESC = "old_snapshot - utilities in support of old_snapshot_threshold"
+
+REGRESS = old_snapshot
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/old_snapshot
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/old_snapshot/old_snapshot--1.0.sql b/contrib/old_snapshot/old_snapshot--1.0.sql
new file mode 100644
index 0000000000..9ebb8829e3
--- /dev/null
+++ b/contrib/old_snapshot/old_snapshot--1.0.sql
@@ -0,0 +1,14 @@
+/* contrib/old_snapshot/old_snapshot--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION old_snapshot" to load this file. \quit
+
+-- Show visibility map and page-level visibility information for each block.
+CREATE FUNCTION pg_old_snapshot_time_mapping(array_offset OUT int4,
+											 end_timestamp OUT timestamptz,
+											 newest_xmin OUT xid)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_old_snapshot_time_mapping'
+LANGUAGE C STRICT;
+
+-- XXX. Do we want REVOKE commands here?
diff --git a/contrib/old_snapshot/old_snapshot.control b/contrib/old_snapshot/old_snapshot.control
new file mode 100644
index 0000000000..491eec536c
--- /dev/null
+++ b/contrib/old_snapshot/old_snapshot.control
@@ -0,0 +1,5 @@
+# old_snapshot extension
+comment = 'utilities in support of old_snapshot_threshold'
+default_version = '1.0'
+module_pathname = '$libdir/old_snapshot'
+relocatable = true
diff --git a/contrib/old_snapshot/time_mapping.c b/contrib/old_snapshot/time_mapping.c
new file mode 100644
index 0000000000..37e0055a00
--- /dev/null
+++ b/contrib/old_snapshot/time_mapping.c
@@ -0,0 +1,159 @@
+/*-------------------------------------------------------------------------
+ *
+ * time_mapping.c
+ *	  time to XID mapping information
+ *
+ * Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *	  contrib/old_snapshot/time_mapping.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "storage/lwlock.h"
+#include "utils/old_snapshot.h"
+#include "utils/snapmgr.h"
+#include "utils/timestamp.h"
+
+/*
+ * Backend-private copy of the information from oldSnapshotControl which relates
+ * to the time to XID mapping, plus an index so that we can iterate.
+ *
+ * Note that the length of the xid_by_minute array is given by
+ * OLD_SNAPSHOT_TIME_MAP_ENTRIES (which is not a compile-time constant).
+ */
+typedef struct
+{
+	int				current_index;
+	int				head_offset;
+	TimestampTz		head_timestamp;
+	int				count_used;
+	TransactionId	xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
+} OldSnapshotTimeMapping;
+
+#define NUM_TIME_MAPPING_COLUMNS 3
+
+PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(pg_old_snapshot_time_mapping);
+
+static OldSnapshotTimeMapping *GetOldSnapshotTimeMapping(void);
+static TupleDesc MakeOldSnapshotTimeMappingTupleDesc(void);
+static HeapTuple MakeOldSnapshotTimeMappingTuple(TupleDesc tupdesc,
+												 OldSnapshotTimeMapping *mapping);
+
+/*
+ * SQL-callable set-returning function.
+ */
+Datum
+pg_old_snapshot_time_mapping(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+	OldSnapshotTimeMapping *mapping;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		MemoryContext	oldcontext;
+
+		funcctx = SRF_FIRSTCALL_INIT();
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+		mapping = GetOldSnapshotTimeMapping();
+		funcctx->user_fctx = mapping;
+		funcctx->tuple_desc = MakeOldSnapshotTimeMappingTupleDesc();
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	mapping = (OldSnapshotTimeMapping *) funcctx->user_fctx;
+
+	while (mapping->current_index < mapping->count_used)
+	{
+		HeapTuple	tuple;
+
+		tuple = MakeOldSnapshotTimeMappingTuple(funcctx->tuple_desc, mapping);
+		++mapping->current_index;
+		SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Get the old snapshot time mapping data from shared memory.
+ */
+static OldSnapshotTimeMapping *
+GetOldSnapshotTimeMapping(void)
+{
+	OldSnapshotTimeMapping *mapping;
+
+	mapping = palloc(offsetof(OldSnapshotTimeMapping, xid_by_minute)
+					 + sizeof(TransactionId) * OLD_SNAPSHOT_TIME_MAP_ENTRIES);
+	mapping->current_index = 0;
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+	mapping->head_offset = oldSnapshotControl->head_offset;
+	mapping->head_timestamp = oldSnapshotControl->head_timestamp;
+	mapping->count_used = oldSnapshotControl->count_used;
+	for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
+		mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
+	LWLockRelease(OldSnapshotTimeMapLock);
+
+	return mapping;
+}
+
+/*
+ * Build a tuple descriptor for the pg_old_snapshot_time_mapping() SRF.
+ */
+static TupleDesc
+MakeOldSnapshotTimeMappingTupleDesc(void)
+{
+	TupleDesc	tupdesc;
+
+	tupdesc = CreateTemplateTupleDesc(NUM_TIME_MAPPING_COLUMNS);
+
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "array_offset",
+					   INT4OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "end_timestamp",
+					   TIMESTAMPTZOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "newest_xmin",
+					   XIDOID, -1, 0);
+
+	return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * Convert one entry from the old snapshot time mapping to a HeapTuple.
+ */
+static HeapTuple
+MakeOldSnapshotTimeMappingTuple(TupleDesc tupdesc, OldSnapshotTimeMapping *mapping)
+{
+	Datum	values[NUM_TIME_MAPPING_COLUMNS];
+	bool	nulls[NUM_TIME_MAPPING_COLUMNS];
+	int		array_position;
+	TimestampTz	timestamp;
+
+	/*
+	 * Figure out the array position corresponding to the current index.
+	 *
+	 * Index 0 means the oldest entry in the mapping, which is stored at
+	 * mapping->head_offset. Index 1 means the next-oldest entry, which is a the
+	 * following index, and so on. We wrap around when we reach the end of the array.
+	 */
+	array_position = (mapping->head_offset + mapping->current_index)
+		% OLD_SNAPSHOT_TIME_MAP_ENTRIES;
+
+	/*
+	 * No explicit timestamp is stored for any entry other than the oldest one,
+	 * but each entry corresponds to 1-minute period, so we can just add.
+	 */
+	timestamp = TimestampTzPlusMilliseconds(mapping->head_timestamp,
+											mapping->current_index * 60000);
+
+	/* Initialize nulls and values arrays. */
+	memset(nulls, 0, sizeof(nulls));
+	values[0] = Int32GetDatum(array_position);
+	values[1] = TimestampTzGetDatum(timestamp);
+	values[2] = TransactionIdGetDatum(mapping->xid_by_minute[array_position]);
+
+	return heap_form_tuple(tupdesc, values, nulls);
+}
-- 
2.20.1

v2-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchtext/x-patch; charset=US-ASCII; name=v2-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchDownload

From 73d5a1c55b9c3f0166623d7a89b7fb1a73160c42 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 12:15:57 -0400
Subject: [PATCH v2 3/6] Fix bugs in MaintainOldSnapshotTimeMapping.

---
 src/backend/utils/time/snapmgr.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index abaaea569a..72b2c61a07 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1926,10 +1926,32 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 	else
 	{
 		/* We need a new bucket, but it might not be the very next one. */
-		int			advance = ((ts - oldSnapshotControl->head_timestamp)
-							   / USECS_PER_MINUTE);
+		int			distance_to_new_tail;
+		int			distance_to_current_tail;
+		int			advance;
 
-		oldSnapshotControl->head_timestamp = ts;
+		/*
+		 * Our goal is for the new "tail" of the mapping, that is, the entry
+		 * which is newest and thus furthest from the "head" entry, to
+		 * correspond to "ts". Since there's one entry per minute, the
+		 * distance between the current head and the new tail is just the
+		 * number of minutes of difference between ts and the current
+		 * head_timestamp.
+		 *
+		 * The distance from the current head to the current tail is one
+		 * less than the number of entries in the mapping, because the
+		 * entry at the head_offset is for 0 minutes after head_timestamp.
+		 *
+		 * The difference between these two values is the number of minutes
+		 * by which we need to advance the mapping, either adding new entries
+		 * or rotating old ones out.
+		 */
+		distance_to_new_tail =
+			(ts - oldSnapshotControl->head_timestamp) / USECS_PER_MINUTE;
+		distance_to_current_tail =
+			oldSnapshotControl->count_used - 1;
+		advance = distance_to_new_tail - distance_to_current_tail;
+		Assert(advance > 0);
 
 		if (advance >= OLD_SNAPSHOT_TIME_MAP_ENTRIES)
 		{
@@ -1937,6 +1959,7 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 			oldSnapshotControl->head_offset = 0;
 			oldSnapshotControl->count_used = 1;
 			oldSnapshotControl->xid_by_minute[0] = xmin;
+			oldSnapshotControl->head_timestamp = ts;
 		}
 		else
 		{
@@ -1955,6 +1978,7 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 					else
 						oldSnapshotControl->head_offset = old_head + 1;
 					oldSnapshotControl->xid_by_minute[old_head] = xmin;
+					oldSnapshotControl->head_timestamp += USECS_PER_MINUTE;
 				}
 				else
 				{
-- 
2.20.1

v2-0004-Add-pg_clobber_current_snapshot_timestamp.patchtext/x-patch; charset=US-ASCII; name=v2-0004-Add-pg_clobber_current_snapshot_timestamp.patchDownload

From a594fb6f551ee545f66182c7f2098c73e196416c Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 17 Apr 2020 14:10:35 +1200
Subject: [PATCH v2 4/6] Add pg_clobber_current_snapshot_timestamp().

---
 contrib/old_snapshot/old_snapshot--1.0.sql |  5 +++++
 contrib/old_snapshot/time_mapping.c        | 13 +++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/contrib/old_snapshot/old_snapshot--1.0.sql b/contrib/old_snapshot/old_snapshot--1.0.sql
index 9ebb8829e3..aacf1704b5 100644
--- a/contrib/old_snapshot/old_snapshot--1.0.sql
+++ b/contrib/old_snapshot/old_snapshot--1.0.sql
@@ -11,4 +11,9 @@ RETURNS SETOF record
 AS 'MODULE_PATHNAME', 'pg_old_snapshot_time_mapping'
 LANGUAGE C STRICT;
 
+CREATE FUNCTION pg_clobber_current_snapshot_timestamp(now timestamptz)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'pg_clobber_current_snapshot_timestamp'
+LANGUAGE C STRICT;
+
 -- XXX. Do we want REVOKE commands here?
diff --git a/contrib/old_snapshot/time_mapping.c b/contrib/old_snapshot/time_mapping.c
index 37e0055a00..8728c4ddb5 100644
--- a/contrib/old_snapshot/time_mapping.c
+++ b/contrib/old_snapshot/time_mapping.c
@@ -36,6 +36,7 @@ typedef struct
 
 PG_MODULE_MAGIC;
 PG_FUNCTION_INFO_V1(pg_old_snapshot_time_mapping);
+PG_FUNCTION_INFO_V1(pg_clobber_current_snapshot_timestamp);
 
 static OldSnapshotTimeMapping *GetOldSnapshotTimeMapping(void);
 static TupleDesc MakeOldSnapshotTimeMappingTupleDesc(void);
@@ -157,3 +158,15 @@ MakeOldSnapshotTimeMappingTuple(TupleDesc tupdesc, OldSnapshotTimeMapping *mappi
 
 	return heap_form_tuple(tupdesc, values, nulls);
 }
+
+Datum
+pg_clobber_current_snapshot_timestamp(PG_FUNCTION_ARGS)
+{
+	TimestampTz new_current_timestamp = PG_GETARG_TIMESTAMPTZ(0);
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+	oldSnapshotControl->current_timestamp = new_current_timestamp;
+	LWLockRelease(OldSnapshotTimeMapLock);
+
+	PG_RETURN_NULL();
+}
-- 
2.20.1

v2-0005-Truncate-old-snapshot-XIDs-before-truncating-CLOG.patchtext/x-patch; charset=US-ASCII; name=v2-0005-Truncate-old-snapshot-XIDs-before-truncating-CLOG.patchDownload

From 81163b9592f7762d83549962ad4e50e2abb80c07 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 17 Apr 2020 15:18:49 +1200
Subject: [PATCH v2 5/6] Truncate old snapshot XIDs before truncating CLOG.

---
 contrib/old_snapshot/Makefile          |  2 +-
 contrib/old_snapshot/t/001_truncate.pl | 80 ++++++++++++++++++++++++++
 src/backend/commands/vacuum.c          |  3 +
 src/backend/utils/time/snapmgr.c       | 21 +++++++
 src/include/utils/snapmgr.h            |  1 +
 5 files changed, 106 insertions(+), 1 deletion(-)
 create mode 100644 contrib/old_snapshot/t/001_truncate.pl

diff --git a/contrib/old_snapshot/Makefile b/contrib/old_snapshot/Makefile
index 091231f25f..c839d30346 100644
--- a/contrib/old_snapshot/Makefile
+++ b/contrib/old_snapshot/Makefile
@@ -10,7 +10,7 @@ EXTENSION = old_snapshot
 DATA = old_snapshot--1.0.sql
 PGFILEDESC = "old_snapshot - utilities in support of old_snapshot_threshold"
 
-REGRESS = old_snapshot
+TAP_TESTS = 1
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/old_snapshot/t/001_truncate.pl b/contrib/old_snapshot/t/001_truncate.pl
new file mode 100644
index 0000000000..d6c0def00f
--- /dev/null
+++ b/contrib/old_snapshot/t/001_truncate.pl
@@ -0,0 +1,80 @@
+# Test truncation of the old snapshot time mapping, to check
+# that we can't get into trouble when xids wrap around.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('master');
+$node->init;
+$node->append_conf("postgresql.conf", "timezone = UTC");
+$node->append_conf("postgresql.conf", "old_snapshot_threshold=10");
+$node->append_conf("postgresql.conf", "max_prepared_transactions=10");
+$node->start;
+$node->psql('postgres', 'update pg_database set datallowconn = true');
+$node->psql('postgres', 'create extension old_snapshot');
+
+note "check time map is truncated when CLOG is";
+
+# build up a time map with 4 entries
+$node->psql('postgres', "select pg_clobber_current_snapshot_timestamp('3000-01-01 00:00:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select pg_clobber_current_snapshot_timestamp('3000-01-01 00:01:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select pg_clobber_current_snapshot_timestamp('3000-01-01 00:02:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select pg_clobber_current_snapshot_timestamp('3000-01-01 00:03:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+my $count;
+$node->psql('postgres', "select count(*) from pg_old_snapshot_time_mapping()", stdout => \$count);
+is($count, 4, "expected to have 4 entries in the old snapshot time map");
+
+# now cause frozen XID to advance
+$node->psql('postgres', 'vacuum freeze');
+$node->psql('template0', 'vacuum freeze');
+$node->psql('template1', 'vacuum freeze');
+
+# we expect all XIDs to have been truncated
+$node->psql('postgres', "select count(*) from pg_old_snapshot_time_mapping()", stdout => \$count);
+is($count, 0, "expected to have 0 entries in the old snapshot time map");
+
+# put two more in the map
+$node->psql('postgres', "select pg_clobber_current_snapshot_timestamp('3000-01-01 00:04:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select pg_clobber_current_snapshot_timestamp('3000-01-01 00:05:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select count(*) from pg_old_snapshot_time_mapping()", stdout => \$count);
+is($count, 2, "expected to have 2 entries in the old snapshot time map");
+
+# prepare a transaction, to stop xmin from getting further ahead
+$node->psql('postgres', "begin; select pg_current_xact_id(); prepare transaction 'tx1'");
+
+# add 16 more minutes (this tests wrapping around the mapping array, which is of size 10 + 10)...
+$node->psql('postgres', "select pg_clobber_current_snapshot_timestamp('3000-01-01 00:21:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select count(*) from pg_old_snapshot_time_mapping()", stdout => \$count);
+is($count, 18, "expected to have 18 entries in the old snapshot time map");
+
+# now cause frozen XID to advance
+$node->psql('postgres', 'vacuum freeze');
+$node->psql('template0', 'vacuum freeze');
+$node->psql('template1', 'vacuum freeze');
+
+# this should leave just 16
+$node->psql('postgres', "select count(*) from pg_old_snapshot_time_mapping()", stdout => \$count);
+is($count, 16, "expected to have 16 entries in the old snapshot time map");
+
+# commit tx1, and then freeze again to get rid of all of them
+$node->psql('postgres', "commit prepared 'tx1'");
+
+# now cause frozen XID to advance
+$node->psql('postgres', 'vacuum freeze');
+$node->psql('template0', 'vacuum freeze');
+$node->psql('template1', 'vacuum freeze');
+
+# we should now be back to empty
+$node->psql('postgres', "select count(*) from pg_old_snapshot_time_mapping()", stdout => \$count);
+is($count, 0, "expected to have 0 entries in the old snapshot time map");
+
+$node->stop;
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5a110edb07..37ead45fa5 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1627,6 +1627,9 @@ vac_truncate_clog(TransactionId frozenXID,
 	 */
 	AdvanceOldestCommitTsXid(frozenXID);
 
+	/* Make sure snapshot_too_old drops old XIDs. */
+	TruncateOldSnapshotTimeMapping(frozenXID);
+
 	/*
 	 * Truncate CLOG, multixact and CommitTs to the oldest computed value.
 	 */
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 72b2c61a07..d604e69270 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1998,6 +1998,27 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 }
 
 
+/*
+ * Remove old xids from the timing map, so the CLOG can be truncated.
+ */
+void
+TruncateOldSnapshotTimeMapping(TransactionId frozenXID)
+{
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+	while (oldSnapshotControl->count_used > 0 &&
+		   TransactionIdPrecedes(oldSnapshotControl->xid_by_minute[oldSnapshotControl->head_offset],
+								 frozenXID))
+	{
+		oldSnapshotControl->head_timestamp += USECS_PER_MINUTE;
+		oldSnapshotControl->head_offset =
+			(oldSnapshotControl->head_offset + 1) %
+			OLD_SNAPSHOT_TIME_MAP_ENTRIES;
+		oldSnapshotControl->count_used--;
+	}
+	LWLockRelease(OldSnapshotTimeMapLock);
+}
+
+
 /*
  * Setup a snapshot that replaces normal catalog snapshots that allows catalog
  * access to behave just like it did at a certain point in the past.
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index b28d13ce84..4f53aad956 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -135,6 +135,7 @@ extern TransactionId TransactionIdLimitedForOldSnapshots(TransactionId recentXmi
 														 Relation relation);
 extern void MaintainOldSnapshotTimeMapping(TimestampTz whenTaken,
 										   TransactionId xmin);
+extern void TruncateOldSnapshotTimeMapping(TransactionId frozenXID);
 
 extern char *ExportSnapshot(Snapshot snapshot);
 
-- 
2.20.1

v2-0006-Add-TAP-test-for-snapshot-too-old.patchtext/x-patch; charset=US-ASCII; name=v2-0006-Add-TAP-test-for-snapshot-too-old.patchDownload

From e2ab8832fa476b14b44f7452985b1b553bd44b6a Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 18 Apr 2020 16:16:06 +1200
Subject: [PATCH v2 6/6] Add TAP test for snapshot-too-old.

---
 contrib/old_snapshot/t/002_too_old.pl |  75 ++++++++++++++++
 src/test/perl/PsqlSession.pm          | 122 ++++++++++++++++++++++++++
 2 files changed, 197 insertions(+)
 create mode 100644 contrib/old_snapshot/t/002_too_old.pl
 create mode 100644 src/test/perl/PsqlSession.pm

diff --git a/contrib/old_snapshot/t/002_too_old.pl b/contrib/old_snapshot/t/002_too_old.pl
new file mode 100644
index 0000000000..097a464763
--- /dev/null
+++ b/contrib/old_snapshot/t/002_too_old.pl
@@ -0,0 +1,75 @@
+# Simple test of early pruning and snapshot-too-old errors.
+use strict;
+use warnings;
+use PostgresNode;
+use PsqlSession;
+use TestLib;
+use Test::More tests => 5;
+
+my $node = get_new_node('master');
+$node->init;
+$node->append_conf("postgresql.conf", "timezone = UTC");
+$node->append_conf("postgresql.conf", "old_snapshot_threshold=10");
+$node->append_conf("postgresql.conf", "max_prepared_transactions=10");
+$node->start;
+$node->psql('postgres', 'create extension old_snapshot');
+$node->psql('postgres', 'create table t (i int)');
+$node->psql('postgres', 'insert into t select generate_series(1, 42)');
+
+# start an interactive session that we can use to interleave statements
+my $session = PsqlSession->new($node, "postgres");
+$session->send("\\set PROMPT1 ''\n", 2);
+$session->send("\\set PROMPT2 ''\n", 1);
+
+my @lines;
+my $command_tag;
+my $result;
+
+# begin a session that we can interleave with vacuum activity
+@lines = $session->send("begin transaction isolation level repeatable read;\n", 2);
+shift @lines;
+$command_tag = shift @lines;
+is($command_tag, "BEGIN");
+
+# take a snapshot at time 0
+$node->psql('postgres', "select pg_clobber_current_snapshot_timestamp('3000-01-01 00:00:00Z')");
+@lines = $session->send("select * from t order by i limit 1;\n", 2);
+shift @lines;
+$result = shift @lines;
+is($result, "1");
+
+# advance time by 10 minutes, then UPDATE and VACUUM the table
+$node->psql('postgres', "select pg_clobber_current_snapshot_timestamp('3000-01-01 00:10:00Z')");
+$node->psql('postgres', "update t set i = 1001 where i = 1");
+$node->psql('postgres', "vacuum analyze t");
+
+# our snapshot is not too old yet, so we can still use it
+@lines = $session->send("select * from t order by i limit 1;\n", 2);
+shift @lines;
+$result = shift @lines;
+is($result, "1");
+
+# advance time by 10 more minutes, then UPDATE and VACUUM the table
+$node->psql('postgres', "select pg_clobber_current_snapshot_timestamp('3000-01-01 00:20:00Z')");
+$node->psql('postgres', "update t set i = 1001 where i = 1");
+$node->psql('postgres', "vacuum analyze t");
+
+# our snapshot is not too old yet, so we can still use it
+@lines = $session->send("select * from t order by i limit 1;\n", 2);
+shift @lines;
+$result = shift @lines;
+is($result, "1");
+
+# advance time by just one more minute, then UPDATE and VACUUM the table
+$node->psql('postgres', "select pg_clobber_current_snapshot_timestamp('3000-01-01 00:21:00Z')");
+$node->psql('postgres', "update t set i = 1002 where i = 1");
+$node->psql('postgres', "vacuum analyze t");
+
+# our snapshot is too old!  the thing it wants to see has been removed
+@lines = $session->send("select * from t order by i limit 1;\n", 2);
+shift @lines;
+$result = shift @lines;
+is($result, "ERROR:  snapshot too old");
+
+$session->close;
+$node->stop;
diff --git a/src/test/perl/PsqlSession.pm b/src/test/perl/PsqlSession.pm
new file mode 100644
index 0000000000..b34dcca502
--- /dev/null
+++ b/src/test/perl/PsqlSession.pm
@@ -0,0 +1,122 @@
+=pod
+
+=head1 NAME
+
+PsqlSession - class representing psql connection
+
+=head1 SYNOPSIS
+
+  use PsqlSession;
+
+  my $node = PostgresNode->get_new_node('mynode');
+  my $session = PsqlSession->new($node, "dbname");
+
+  # send simple query and wait for one line response
+  my $result = $session->send("SELECT 42;", 1);
+
+  # close connection
+  $session->close();
+
+=head1 DESCRIPTION
+
+PsqlSession allows for tests of interleaved operations, similar to
+isolation tests.
+
+=cut
+
+package PsqlSession;
+
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use IPC::Run qw(pump finish timer);
+
+our @EXPORT = qw(
+  new
+  send
+  close
+);
+
+=pod
+
+=head1 METHODS
+
+=over
+
+=item PsqlSession::new($class, $node, $dbname)
+
+Create a new PsqlSession instance, connected to a database.
+
+=cut
+
+sub new
+{
+	my ($class, $node, $dbname) = @_;
+	my $timer = timer(5);
+	my $stdin = '';
+	my $stdout = '';
+	my $harness = $node->interactive_psql($dbname, \$stdin, \$stdout, $timer);
+	my $self = {
+		_harness => $harness,
+		_stdin => \$stdin,
+		_stdout => \$stdout,
+		_timer => $timer
+	};
+	bless $self, $class;
+	return $self;
+}
+
+=pod
+
+=item $session->send($input, $lines)
+
+Send the given input to psql, and then wait for the given number of lines
+of output, or a timeout.
+
+=cut
+
+sub count_lines
+{
+	my ($s) = @_;
+	return $s =~ tr/\n//;
+}
+
+sub send
+{
+	my ($self, $statement, $lines) = @_;
+	${$self->{_stdout}} = '';
+	${$self->{_stdin}} .= $statement;
+	$self->{_timer}->start(5);
+	pump $self->{_harness} until count_lines(${$self->{_stdout}}) == $lines || $self->{_timer}->is_expired;
+	die "expected ${lines} lines but after timeout, received only: ${$self->{_stdout}}" if $self->{_timer}->is_expired;
+	my @result = split /\n/, ${$self->{_stdout}};
+	chop(@result);
+	return @result;
+}
+
+=pod
+
+=item $session->close()
+
+Close a PsqlSession connection.
+
+=cut
+
+sub close
+{
+	my ($self) = @_;
+	$self->{_timer}->start(5);
+	${$self->{_stdin}} .= "\\q\n";
+	finish $self->{_harness} or die "psql returned $?";
+	$self->{_timer}->reset;
+}
+
+=pod
+
+=back
+
+=cut
+
+1;
-- 
2.20.1

dilipbalaut@gmail.com

over 5 years ago

In reply to: Thomas Munro (#6)

Re: fixing old_snapshot_threshold's time->xid mapping

On Sat, Apr 18, 2020 at 11:47 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Fri, Apr 17, 2020 at 2:12 PM Thomas Munro <thomas.munro@gmail.com> wrote:

What about a contrib function that lets you clobber
oldSnapshotControl->current_timestamp? It looks like all times in
this system come ultimately from GetSnapshotCurrentTimestamp(), which
uses that variable to make sure that time never goes backwards.

Here's a draft TAP test that uses that technique successfully, as a
POC. It should probably be extended to cover more cases, but I
thought I'd check what people thought of the concept first before
going further. I didn't see a way to do overlapping transactions with
PostgresNode.pm, so I invented one (please excuse the bad perl); am I
missing something? Maybe it'd be better to do 002 with an isolation
test instead, but I suppose 001 can't be in an isolation test, since
it needs to connect to multiple databases, and it seemed better to do
them both the same way. It's also not entirely clear to me that
isolation tests can expect a database to be fresh and then mess with
dangerous internal state, whereas TAP tests set up and tear down a
cluster each time.

I think I found another bug in MaintainOldSnapshotTimeMapping(): if
you make time jump by more than old_snapshot_threshold in one go, then
the map gets cleared and then no early pruning or snapshot-too-old
errors happen. That's why in 002_too_old.pl it currently advances
time by 10 minutes twice, instead of 20 minutes once. To be
continued.

IMHO that doesn't seems to be a problem. Because even if we jump more
than old_snapshot_threshold in one go we don't clean complete map
right. The latest snapshot timestamp will become the headtimestamp.
So in TransactionIdLimitedForOldSnapshots if (current_ts -
old_snapshot_threshold) is still >= head_timestap then we can still do
early pruning.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Andres Freund

andres@anarazel.de

over 5 years ago

In reply to: Thomas Munro (#5)

Re: fixing old_snapshot_threshold's time->xid mapping

Hi,

On 2020-04-17 14:12:44 +1200, Thomas Munro wrote:

What about a contrib function that lets you clobber
oldSnapshotControl->current_timestamp? It looks like all times in
this system come ultimately from GetSnapshotCurrentTimestamp(), which
uses that variable to make sure that time never goes backwards.

It'd be better than the current test situation, and probably would be
good to have as part of testing anyway (since it'd allow to make the
tests not take long / be racy on slow machines). But I still don't think
it really allows to test the feature in a natural way. It makes it
easier to test for know edge cases / problems, but not really discover
unknown ones. For that I think we need more granular bins.

- Andres

dilipbalaut@gmail.com

over 5 years ago

In reply to: Robert Haas (#1)

Re: fixing old_snapshot_threshold's time->xid mapping

On Thu, Apr 16, 2020 at 10:12 PM Robert Haas <robertmhaas@gmail.com> wrote:

Hi,

I'm starting a new thread for this, because the recent discussion of
problems with old_snapshot_threshold[1] touched on a lot of separate
issues, and I think it will be too confusing if we discuss all of them
on one thread. Attached are three patches.

0001 makes oldSnapshotControl "extern" rather than "static" and
exposes the struct definition via a header.

0002 adds a contrib module called old_snapshot which makes it possible
to examine the time->XID mapping via SQL. As Andres said, the comments
are not really adequate in the existing code, and the code itself is
buggy, so it was a little hard to be sure that I was understanding the
intended meaning of the different fields correctly. However, I gave it
a shot.

0003 attempts to fix bugs in MaintainOldSnapshotTimeMapping() so that
it produces a sensible mapping. I encountered and tried to fix two
issues here:

First, as previously discussed, the branch that advances the mapping
should not categorically do "oldSnapshotControl->head_timestamp = ts;"
assuming that the head_timestamp is supposed to be the timestamp for
the oldest bucket rather than the newest one. Rather, there are three
cases: (1) resetting the mapping resets head_timestamp, (2) extending
the mapping by an entry without dropping an entry leaves
head_timestamp alone, and (3) overwriting the previous head with a new
entry advances head_timestamp by 1 minute.

Second, the calculation of the number of entries by which the mapping
should advance is incorrect. It thinks that it should advance by the
number of minutes between the current head_timestamp and the incoming
timestamp. That would be correct if head_timestamp were the most
recent entry in the mapping, but it's actually the oldest. As a
result, without this fix, every time we move into a new minute, we
advance the mapping much further than we actually should. Instead of
advancing by 1, we advance by the number of entries that already exist
in the mapping - which means we now have entries that correspond to
times which are in the future, and don't advance the mapping again
until those future timestamps are in the past.

With these fixes, I seem to get reasonably sensible mappings, at least
in light testing. I tried running this in one window with \watch 10:

select *, age(newest_xmin), clock_timestamp() from
pg_old_snapshot_time_mapping();

And in another window I ran:

pgbench -T 300 -R 10

And the age does in fact advance by ~600 transactions per minute.

I have started reviewing these patches. I think, the fixes looks right to me.

+ LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+ mapping->head_offset = oldSnapshotControl->head_offset;
+ mapping->head_timestamp = oldSnapshotControl->head_timestamp;
+ mapping->count_used = oldSnapshotControl->count_used;
+ for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
+ mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
+ LWLockRelease(OldSnapshotTimeMapLock);

I think memcpy would be a better choice instead of looping it for all
the entries, since we are doing this under a lock?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#10

thomas.munro@gmail.com

over 5 years ago

In reply to: Dilip Kumar (#7)

2 attachment(s)

Re: fixing old_snapshot_threshold's time->xid mapping

On Sat, Apr 18, 2020 at 9:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sat, Apr 18, 2020 at 11:47 AM Thomas Munro <thomas.munro@gmail.com> wrote:

I think I found another bug in MaintainOldSnapshotTimeMapping(): if
you make time jump by more than old_snapshot_threshold in one go, then
the map gets cleared and then no early pruning or snapshot-too-old
errors happen. That's why in 002_too_old.pl it currently advances
time by 10 minutes twice, instead of 20 minutes once. To be
continued.

IMHO that doesn't seems to be a problem. Because even if we jump more
than old_snapshot_threshold in one go we don't clean complete map
right. The latest snapshot timestamp will become the headtimestamp.
So in TransactionIdLimitedForOldSnapshots if (current_ts -
old_snapshot_threshold) is still >= head_timestap then we can still do
early pruning.

Right, thanks. I got confused about that, and misdiagnosed something
I was seeing.

Here's a new version:

0004: Instead of writing a new kind of TAP test to demonstrate
snapshot-too-old errors, I adjusted the existing isolation tests to
use the same absolute time control technique. Previously I had
invented a way to do isolation tester-like stuff in TAP tests, which
might be interesting but strange new perl is not necessary for this.

0005: Truncates the time map when the CLOG is truncated. Its test is
now under src/test/module/snapshot_too_old/t/001_truncate.sql.

These apply on top of Robert's patches, but the only dependency is on
his patch 0001 "Expose oldSnapshotControl.", because now I have stuff
in src/test/module/snapshot_too_old/test_sto.c that wants to mess with
that object too.

Is this an improvement? I realise that there is still nothing to
actually verify that early pruning has actually happened. I haven't
thought of a good way to do that yet (stats, page inspection, ...).

Attachments:

v3-0004-Rewrite-the-snapshot_too_old-tests-with-absolute-.patchtext/x-patch; charset=US-ASCII; name=v3-0004-Rewrite-the-snapshot_too_old-tests-with-absolute-.patchDownload

From 9640c654e97e8704b041e207f2fd02070d2dd057 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 20 Apr 2020 16:23:02 +1200
Subject: [PATCH v3 4/5] Rewrite the "snapshot_too_old" tests with absolute
 timestamps.

Previously the snapshot too old feature used a special test value for
old_snapshot_threshold.  Instead, use a new approach based on clobbering
the "current" timestamp used in snapshot-too-old book keeping, so that
we can control the timeline precisely without having to resort to
sleeping or special test branches in the code that are different than
what is used in production.
---
 src/backend/utils/time/snapmgr.c              | 24 ------
 src/test/modules/snapshot_too_old/Makefile    | 21 ++---
 .../expected/sto_using_cursor.out             | 75 +++++++----------
 .../expected/sto_using_hash_index.out         | 26 ++++--
 .../expected/sto_using_select.out             | 80 ++++++++++++-------
 .../specs/sto_using_cursor.spec               | 30 ++++---
 .../specs/sto_using_hash_index.spec           | 19 ++++-
 .../specs/sto_using_select.spec               | 32 +++++---
 src/test/modules/snapshot_too_old/sto.conf    |  2 +-
 .../snapshot_too_old/test_sto--1.0.sql        | 14 ++++
 src/test/modules/snapshot_too_old/test_sto.c  | 74 +++++++++++++++++
 .../modules/snapshot_too_old/test_sto.control |  5 ++
 12 files changed, 264 insertions(+), 138 deletions(-)
 create mode 100644 src/test/modules/snapshot_too_old/test_sto--1.0.sql
 create mode 100644 src/test/modules/snapshot_too_old/test_sto.c
 create mode 100644 src/test/modules/snapshot_too_old/test_sto.control

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 72b2c61a07..19e6c52b80 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1739,26 +1739,6 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
 		update_ts = oldSnapshotControl->next_map_update;
 		SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
 
-		/*
-		 * Zero threshold always overrides to latest xmin, if valid.  Without
-		 * some heuristic it will find its own snapshot too old on, for
-		 * example, a simple UPDATE -- which would make it useless for most
-		 * testing, but there is no principled way to ensure that it doesn't
-		 * fail in this way.  Use a five-second delay to try to get useful
-		 * testing behavior, but this may need adjustment.
-		 */
-		if (old_snapshot_threshold == 0)
-		{
-			if (TransactionIdPrecedes(latest_xmin, MyPgXact->xmin)
-				&& TransactionIdFollows(latest_xmin, xlimit))
-				xlimit = latest_xmin;
-
-			ts -= 5 * USECS_PER_SEC;
-			SetOldSnapshotThresholdTimestamp(ts, xlimit);
-
-			return xlimit;
-		}
-
 		ts = AlignTimestampToMinuteBoundary(ts)
 			- (old_snapshot_threshold * USECS_PER_MINUTE);
 
@@ -1860,10 +1840,6 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 	if (!map_update_required)
 		return;
 
-	/* No further tracking needed for 0 (used for testing). */
-	if (old_snapshot_threshold == 0)
-		return;
-
 	/*
 	 * We don't want to do something stupid with unusual values, but we don't
 	 * want to litter the log with warnings or break otherwise normal
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index dfb4537f63..be5ad77b7e 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -1,14 +1,20 @@
 # src/test/modules/snapshot_too_old/Makefile
 
-# Note: because we don't tell the Makefile there are any regression tests,
-# we have to clean those result files explicitly
-EXTRA_CLEAN = $(pg_regress_clean_files)
+MODULE_big = test_sto
+OBJS = \
+	$(WIN32RES) \
+	test_sto.o
+
+EXTENSION = test_sto
+DATA = test_sto--1.0.sql
+PGDESCFILE = "test_sto -- internal testing for snapshot too old errors"
 
 ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index
 ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf
 
-# Disabled because these tests require "old_snapshot_threshold" >= 0, which
-# typical installcheck users do not have (e.g. buildfarm clients).
+# Disabled because these tests require "old_snapshot_threshold" = 10, which
+# typical installcheck users do not have (e.g. buildfarm clients) and also
+# because it'd be dangerous on a production system.
 NO_INSTALLCHECK = 1
 
 ifdef USE_PGXS
@@ -21,8 +27,3 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
-
-# But it can nonetheless be very helpful to run tests on preexisting
-# installation, allow to do so, but only if requested explicitly.
-installcheck-force:
-	$(pg_isolation_regress_installcheck) $(ISOLATION)
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out b/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
index 8cc29ec82f..b007e2dc17 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
@@ -1,73 +1,60 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: s1decl s1f1 s1sleep s1f2 s2u
+starting permutation: t00 s1decl s1f1 t10 s2u s1f2 t20 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
 step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
 step s1f2: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step t20: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1decl s1f1 s1sleep s2u s1f2
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s1f1: FETCH FIRST FROM cursor1;
-c              
+               
+step s1f3: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+test_sto_reset_all_state
 
-1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+               
 
-0                             
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+starting permutation: t00 s1decl s1f1 t10 s1f2 t20 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1decl s1f1 s2u s1sleep s1f2
+               
 step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
 step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
 step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
-
-starting permutation: s1decl s2u s1f1 s1sleep s1f2
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
-
-0                             
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+step t20: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s2u s1decl s1f1 s1sleep s1f2
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s1f1: FETCH FIRST FROM cursor1;
+               
+step s1f3: FETCH FIRST FROM cursor1;
 c              
 
-2              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+1              
+test_sto_reset_all_state
 
-0                             
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+               
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out b/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
index bf94054790..091c212adc 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
@@ -1,15 +1,31 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: noseq s1f1 s2sleep s2u s1f2
+starting permutation: t00 noseq s1f1 t10 s2u s2v1 s1f2 t22 s2v2 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step noseq: SET enable_seqscan = false;
 step s1f1: SELECT c FROM sto1 where c = 1000;
 c              
 
 1000           
-step s2sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
 step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1000;
+step s2v1: VACUUM sto1;
 step s1f2: SELECT c FROM sto1 where c = 1001;
+c              
+
+step t22: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:22:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2v2: VACUUM sto1;
+step s1f3: SELECT c FROM sto1 where c = 1001;
 ERROR:  snapshot too old
+test_sto_reset_all_state
+
+               
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_select.out b/src/test/modules/snapshot_too_old/expected/sto_using_select.out
index eb15bc23bf..d1402ae8ce 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_select.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_select.out
@@ -1,55 +1,77 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: s1f1 s1sleep s1f2 s2u
+starting permutation: t00 s1f1 t01 s2u s1f2 t10 s1f3 t11 s1f4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1f1 s1sleep s2u s1f2
-step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+               
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t11: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:11:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
-ERROR:  snapshot too old
+               
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+test_sto_reset_all_state
 
-starting permutation: s1f1 s2u s1sleep s1f2
+               
+
+starting permutation: t00 s1f1 t01 s1f2 t10 s1f3 t11 s1f4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
-ERROR:  snapshot too old
+c              
 
-starting permutation: s2u s1f1 s1sleep s1f2
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+1              
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
-2              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+1              
+step t11: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:11:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
-step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
-ERROR:  snapshot too old
+               
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+test_sto_reset_all_state
+
+               
+unused step name: t02
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec b/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
index eac18ca5b9..3be084b8fe 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
@@ -1,19 +1,14 @@
 # This test provokes a "snapshot too old" error using a cursor.
 #
-# The sleep is needed because with a threshold of zero a statement could error
-# on changes it made.  With more normal settings no external delay is needed,
-# but we don't want these tests to run long enough to see that, since
-# granularity is in minutes.
-#
-# Since results depend on the value of old_snapshot_threshold, sneak that into
-# the line generated by the sleep, so that a surprising values isn't so hard
-# to identify.
+# Expects old_snapshot_threshold = 10.  Not suitable for installcheck since it
+# messes with internal snapmgr.c state.
 
 setup
 {
+    CREATE EXTENSION IF NOT EXISTS test_sto;
+    SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
-    CREATE TABLE sto2 (c int NOT NULL);
 }
 setup
 {
@@ -22,16 +17,29 @@ setup
 
 teardown
 {
-    DROP TABLE sto1, sto2;
+    DROP TABLE sto1;
+    SELECT test_sto_reset_all_state();
 }
 
 session "s1"
 setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "s1decl"	{ DECLARE cursor1 CURSOR FOR SELECT c FROM sto1; }
 step "s1f1"		{ FETCH FIRST FROM cursor1; }
-step "s1sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s1f2"		{ FETCH FIRST FROM cursor1; }
+step "s1f3"		{ FETCH FIRST FROM cursor1; }
 teardown		{ COMMIT; }
 
 session "s2"
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1; }
+
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t10"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z'); }
+step "t20"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z'); }
+
+# if there's an update, we get a snapshot too old error at time 00:20 (not before,
+# because we need page pruning to see the xmin level change from 10 minutes earlier)
+permutation "t00" "s1decl" "s1f1" "t10" "s2u" "s1f2" "t20" "s1f3"
+
+# if there's no update, no snapshot too old error at time 00:20
+permutation "t00" "s1decl" "s1f1" "t10"       "s1f2" "t20" "s1f3"
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec b/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
index 33d91ff852..f90bca3b7a 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
@@ -1,8 +1,12 @@
 # This test is like sto_using_select, except that we test access via a
-# hash index.
+# hash index.  Explicit vacuuming is required in this version because
+# there is are no incidental calls to heap_page_prune_opt() that can
+# call SetOldSnapshotThresholdTimestamp().
 
 setup
 {
+	CREATE EXTENSION IF NOT EXISTS test_sto;
+	SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
     CREATE INDEX idx_sto1 ON sto1 USING HASH (c);
@@ -15,6 +19,7 @@ setup
 teardown
 {
     DROP TABLE sto1;
+	SELECT test_sto_reset_all_state();
 }
 
 session "s1"
@@ -22,10 +27,18 @@ setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "noseq"	{ SET enable_seqscan = false; }
 step "s1f1"		{ SELECT c FROM sto1 where c = 1000; }
 step "s1f2"		{ SELECT c FROM sto1 where c = 1001; }
+step "s1f3"		{ SELECT c FROM sto1 where c = 1001; }
 teardown		{ ROLLBACK; }
 
 session "s2"
-step "s2sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1000; }
+step "s2v1"		{ VACUUM sto1; }
+step "s2v2"		{ VACUUM sto1; }
 
-permutation "noseq" "s1f1" "s2sleep" "s2u" "s1f2"
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t10"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z'); }
+step "t22"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:22:00Z'); }
+
+# snapshot too old at t22
+permutation "t00" "noseq" "s1f1" "t10" "s2u" "s2v1" "s1f2" "t22" "s2v2" "s1f3"
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_select.spec b/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
index d7c34f3d89..21f25ca73a 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
@@ -1,19 +1,14 @@
 # This test provokes a "snapshot too old" error using SELECT statements.
 #
-# The sleep is needed because with a threshold of zero a statement could error
-# on changes it made.  With more normal settings no external delay is needed,
-# but we don't want these tests to run long enough to see that, since
-# granularity is in minutes.
-#
-# Since results depend on the value of old_snapshot_threshold, sneak that into
-# the line generated by the sleep, so that a surprising values isn't so hard
-# to identify.
+# Expects old_snapshot_threshold = 10.  Not suitable for installcheck since it
+# messes with internal snapmgr.c state.
 
 setup
 {
+	CREATE EXTENSION IF NOT EXISTS test_sto;
+	SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
-    CREATE TABLE sto2 (c int NOT NULL);
 }
 setup
 {
@@ -22,15 +17,30 @@ setup
 
 teardown
 {
-    DROP TABLE sto1, sto2;
+	DROP TABLE sto1;
+    SELECT test_sto_reset_all_state();
 }
 
 session "s1"
 setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "s1f1"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
-step "s1sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s1f2"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
+step "s1f3"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
+step "s1f4"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
 teardown		{ COMMIT; }
 
 session "s2"
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1; }
+
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t01"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z'); }
+step "t02"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:02:00Z'); }
+step "t10"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z'); }
+step "t11"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:11:00Z'); }
+
+# if there's an update, we get a snapshot too old error at time 00:11
+permutation "t00" "s1f1" "t01" "s2u" "s1f2" "t10" "s1f3" "t11" "s1f4"
+
+# without the update, we get no snapshot too old error at time 00:11
+permutation "t00" "s1f1" "t01"       "s1f2" "t10" "s1f3" "t11" "s1f4"
diff --git a/src/test/modules/snapshot_too_old/sto.conf b/src/test/modules/snapshot_too_old/sto.conf
index 7eeaeeb0dc..5ed46b3560 100644
--- a/src/test/modules/snapshot_too_old/sto.conf
+++ b/src/test/modules/snapshot_too_old/sto.conf
@@ -1,2 +1,2 @@
 autovacuum = off
-old_snapshot_threshold = 0
+old_snapshot_threshold = 10
diff --git a/src/test/modules/snapshot_too_old/test_sto--1.0.sql b/src/test/modules/snapshot_too_old/test_sto--1.0.sql
new file mode 100644
index 0000000000..c10afcf23a
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto--1.0.sql
@@ -0,0 +1,14 @@
+/* src/test/modules/snapshot_too_old/test_sto--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_sto" to load this file. \quit
+
+CREATE FUNCTION test_sto_clobber_snapshot_timestamp(now timestamptz)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'test_sto_clobber_snapshot_timestamp'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_sto_reset_all_state()
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'test_sto_reset_all_state'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/snapshot_too_old/test_sto.c b/src/test/modules/snapshot_too_old/test_sto.c
new file mode 100644
index 0000000000..f6c9a1a000
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto.c
@@ -0,0 +1,74 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_sto.c
+ *	  Functions to support isolation tests for snapshot too old.
+ *
+ * These functions are not intended for use in a production database and
+ * could cause corruption.
+ *
+ * Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *	  src/test/modules/snapshot_too_old/test_sto.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "storage/lwlock.h"
+#include "utils/old_snapshot.h"
+#include "utils/snapmgr.h"
+#include "utils/timestamp.h"
+
+PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(test_sto_reset_all_state);
+PG_FUNCTION_INFO_V1(test_sto_clobber_snapshot_timestamp);
+
+/*
+ * Revert to initial state.  This is not safe except in carefully
+ * controlled tests.
+ */
+Datum
+test_sto_reset_all_state(PG_FUNCTION_ARGS)
+{
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+	oldSnapshotControl->count_used = 0;
+	oldSnapshotControl->current_timestamp = 0;
+	oldSnapshotControl->head_offset = 0;
+	oldSnapshotControl->head_timestamp = 0;
+	LWLockRelease(OldSnapshotTimeMapLock);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_latest_xmin);
+	oldSnapshotControl->latest_xmin = InvalidTransactionId;
+	oldSnapshotControl->next_map_update = 0;
+	SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_current);
+	oldSnapshotControl->current_timestamp = 0;
+	SpinLockRelease(&oldSnapshotControl->mutex_current);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
+	oldSnapshotControl->threshold_timestamp = 0;
+	oldSnapshotControl->threshold_xid = InvalidTransactionId;
+	SpinLockRelease(&oldSnapshotControl->mutex_threshold);
+
+	PG_RETURN_NULL();
+}
+
+/*
+ * Update the minimum time used in snapshot-too-old code.  If set ahead of the
+ * current wall clock time (for example, the year 3000), this allows testing
+ * with arbitrary times.  This is not safe except in carefully controlled
+ * tests.
+ */
+Datum
+test_sto_clobber_snapshot_timestamp(PG_FUNCTION_ARGS)
+{
+	TimestampTz new_current_timestamp = PG_GETARG_TIMESTAMPTZ(0);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_current);
+	oldSnapshotControl->current_timestamp = new_current_timestamp;
+	SpinLockRelease(&oldSnapshotControl->mutex_current);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/snapshot_too_old/test_sto.control b/src/test/modules/snapshot_too_old/test_sto.control
new file mode 100644
index 0000000000..e497e5450e
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto.control
@@ -0,0 +1,5 @@
+# test_sto test module
+comment = 'functions for internal testing of snapshot too old errors'
+default_version = '1.0'
+module_pathname = '$libdir/test_sto'
+relocatable = true
-- 
2.20.1

v3-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchtext/x-patch; charset=US-ASCII; name=v3-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchDownload

From c56dc1f15a71a988f84e573f4a54322643236143 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 20 Apr 2020 17:05:42 +1200
Subject: [PATCH v3 5/5] Truncate snapshot-too-old time map when CLOG is
 truncated.

It's not safe to leave xids in the map that have wrapped around.

Reported-by: Andres Freund
---
 src/backend/commands/vacuum.c                 |  3 +
 src/backend/utils/time/snapmgr.c              | 21 +++++
 src/include/utils/snapmgr.h                   |  1 +
 src/test/modules/snapshot_too_old/Makefile    |  2 +
 .../snapshot_too_old/t/001_truncate.pl        | 80 +++++++++++++++++++
 .../snapshot_too_old/test_sto--1.0.sql        |  5 ++
 src/test/modules/snapshot_too_old/test_sto.c  | 16 ++++
 7 files changed, 128 insertions(+)
 create mode 100644 src/test/modules/snapshot_too_old/t/001_truncate.pl

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5a110edb07..37ead45fa5 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1627,6 +1627,9 @@ vac_truncate_clog(TransactionId frozenXID,
 	 */
 	AdvanceOldestCommitTsXid(frozenXID);
 
+	/* Make sure snapshot_too_old drops old XIDs. */
+	TruncateOldSnapshotTimeMapping(frozenXID);
+
 	/*
 	 * Truncate CLOG, multixact and CommitTs to the oldest computed value.
 	 */
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 19e6c52b80..edb47c9664 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1974,6 +1974,27 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 }
 
 
+/*
+ * Remove old xids from the timing map, so the CLOG can be truncated.
+ */
+void
+TruncateOldSnapshotTimeMapping(TransactionId frozenXID)
+{
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+	while (oldSnapshotControl->count_used > 0 &&
+		   TransactionIdPrecedes(oldSnapshotControl->xid_by_minute[oldSnapshotControl->head_offset],
+								 frozenXID))
+	{
+		oldSnapshotControl->head_timestamp += USECS_PER_MINUTE;
+		oldSnapshotControl->head_offset =
+			(oldSnapshotControl->head_offset + 1) %
+			OLD_SNAPSHOT_TIME_MAP_ENTRIES;
+		oldSnapshotControl->count_used--;
+	}
+	LWLockRelease(OldSnapshotTimeMapLock);
+}
+
+
 /*
  * Setup a snapshot that replaces normal catalog snapshots that allows catalog
  * access to behave just like it did at a certain point in the past.
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index b28d13ce84..4f53aad956 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -135,6 +135,7 @@ extern TransactionId TransactionIdLimitedForOldSnapshots(TransactionId recentXmi
 														 Relation relation);
 extern void MaintainOldSnapshotTimeMapping(TimestampTz whenTaken,
 										   TransactionId xmin);
+extern void TruncateOldSnapshotTimeMapping(TransactionId frozenXID);
 
 extern char *ExportSnapshot(Snapshot snapshot);
 
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index be5ad77b7e..0a69f3a232 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -17,6 +17,8 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/s
 # because it'd be dangerous on a production system.
 NO_INSTALLCHECK = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/src/test/modules/snapshot_too_old/t/001_truncate.pl b/src/test/modules/snapshot_too_old/t/001_truncate.pl
new file mode 100644
index 0000000000..849c70c5ec
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/t/001_truncate.pl
@@ -0,0 +1,80 @@
+# Test truncation of the old snapshot time mapping, to check
+# that we can't get into trouble when xids wrap around.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('master');
+$node->init;
+$node->append_conf("postgresql.conf", "timezone = UTC");
+$node->append_conf("postgresql.conf", "old_snapshot_threshold=10");
+$node->append_conf("postgresql.conf", "max_prepared_transactions=10");
+$node->start;
+$node->psql('postgres', 'update pg_database set datallowconn = true');
+$node->psql('postgres', 'create extension test_sto');
+
+note "check time map is truncated when CLOG is";
+
+# build up a time map with 4 entries
+$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('3000-01-01 00:02:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('3000-01-01 00:03:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+my $count;
+$node->psql('postgres', "select test_sto_time_map_size()", stdout => \$count);
+is($count, 4, "expected to have 4 entries in the old snapshot time map");
+
+# now cause frozen XID to advance
+$node->psql('postgres', 'vacuum freeze');
+$node->psql('template0', 'vacuum freeze');
+$node->psql('template1', 'vacuum freeze');
+
+# we expect all XIDs to have been truncated
+$node->psql('postgres', "select test_sto_time_map_size()", stdout => \$count);
+is($count, 0, "expected to have 0 entries in the old snapshot time map");
+
+# put two more in the map
+$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('3000-01-01 00:04:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('3000-01-01 00:05:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select test_sto_time_map_size()", stdout => \$count);
+is($count, 2, "expected to have 2 entries in the old snapshot time map");
+
+# prepare a transaction, to stop xmin from getting further ahead
+$node->psql('postgres', "begin; select pg_current_xact_id(); prepare transaction 'tx1'");
+
+# add 16 more minutes (this tests wrapping around the mapping array, which is of size 10 + 10)...
+$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('3000-01-01 00:21:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select test_sto_time_map_size()", stdout => \$count);
+is($count, 18, "expected to have 18 entries in the old snapshot time map");
+
+# now cause frozen XID to advance
+$node->psql('postgres', 'vacuum freeze');
+$node->psql('template0', 'vacuum freeze');
+$node->psql('template1', 'vacuum freeze');
+
+# this should leave just 16
+$node->psql('postgres', "select test_sto_time_map_size()", stdout => \$count);
+is($count, 16, "expected to have 16 entries in the old snapshot time map");
+
+# commit tx1, and then freeze again to get rid of all of them
+$node->psql('postgres', "commit prepared 'tx1'");
+
+# now cause frozen XID to advance
+$node->psql('postgres', 'vacuum freeze');
+$node->psql('template0', 'vacuum freeze');
+$node->psql('template1', 'vacuum freeze');
+
+# we should now be back to empty
+$node->psql('postgres', "select test_sto_time_map_size()", stdout => \$count);
+is($count, 0, "expected to have 0 entries in the old snapshot time map");
+
+$node->stop;
diff --git a/src/test/modules/snapshot_too_old/test_sto--1.0.sql b/src/test/modules/snapshot_too_old/test_sto--1.0.sql
index c10afcf23a..b4cc970ba4 100644
--- a/src/test/modules/snapshot_too_old/test_sto--1.0.sql
+++ b/src/test/modules/snapshot_too_old/test_sto--1.0.sql
@@ -12,3 +12,8 @@ CREATE FUNCTION test_sto_reset_all_state()
 RETURNS VOID
 AS 'MODULE_PATHNAME', 'test_sto_reset_all_state'
 LANGUAGE C STRICT;
+
+CREATE FUNCTION test_sto_time_map_size()
+RETURNS int4
+AS 'MODULE_PATHNAME', 'test_sto_time_map_size'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/snapshot_too_old/test_sto.c b/src/test/modules/snapshot_too_old/test_sto.c
index f6c9a1a000..5b874a9641 100644
--- a/src/test/modules/snapshot_too_old/test_sto.c
+++ b/src/test/modules/snapshot_too_old/test_sto.c
@@ -22,6 +22,7 @@
 PG_MODULE_MAGIC;
 PG_FUNCTION_INFO_V1(test_sto_reset_all_state);
 PG_FUNCTION_INFO_V1(test_sto_clobber_snapshot_timestamp);
+PG_FUNCTION_INFO_V1(test_sto_time_map_size);
 
 /*
  * Revert to initial state.  This is not safe except in carefully
@@ -72,3 +73,18 @@ test_sto_clobber_snapshot_timestamp(PG_FUNCTION_ARGS)
 
 	PG_RETURN_NULL();
 }
+
+/*
+ * How many xids are currently in the xid/time map?  Used only for testing.
+ */
+Datum
+test_sto_time_map_size(PG_FUNCTION_ARGS)
+{
+	int		result;
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+	result = oldSnapshotControl->count_used;
+	LWLockRelease(OldSnapshotTimeMapLock);
+
+	PG_RETURN_INT32(result);
+}
-- 
2.20.1

#11

dilipbalaut@gmail.com

over 5 years ago

In reply to: Thomas Munro (#10)

Re: fixing old_snapshot_threshold's time->xid mapping

On Mon, Apr 20, 2020 at 11:24 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sat, Apr 18, 2020 at 9:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sat, Apr 18, 2020 at 11:47 AM Thomas Munro <thomas.munro@gmail.com> wrote:

I think I found another bug in MaintainOldSnapshotTimeMapping(): if
you make time jump by more than old_snapshot_threshold in one go, then
the map gets cleared and then no early pruning or snapshot-too-old
errors happen. That's why in 002_too_old.pl it currently advances
time by 10 minutes twice, instead of 20 minutes once. To be
continued.

IMHO that doesn't seems to be a problem. Because even if we jump more
than old_snapshot_threshold in one go we don't clean complete map
right. The latest snapshot timestamp will become the headtimestamp.
So in TransactionIdLimitedForOldSnapshots if (current_ts -
old_snapshot_threshold) is still >= head_timestap then we can still do
early pruning.

Right, thanks. I got confused about that, and misdiagnosed something
I was seeing.

Here's a new version:

0004: Instead of writing a new kind of TAP test to demonstrate
snapshot-too-old errors, I adjusted the existing isolation tests to
use the same absolute time control technique. Previously I had
invented a way to do isolation tester-like stuff in TAP tests, which
might be interesting but strange new perl is not necessary for this.

0005: Truncates the time map when the CLOG is truncated. Its test is
now under src/test/module/snapshot_too_old/t/001_truncate.sql.

These apply on top of Robert's patches, but the only dependency is on
his patch 0001 "Expose oldSnapshotControl.", because now I have stuff
in src/test/module/snapshot_too_old/test_sto.c that wants to mess with
that object too.

Is this an improvement? I realise that there is still nothing to
actually verify that early pruning has actually happened. I haven't
thought of a good way to do that yet (stats, page inspection, ...).

Could we test the early pruning using xid-burn patch? Basically, in
xid_by_minute we have some xids with the current epoch. Now, we burns
more than 2b xid and then if we try to vacuum we might hit the case of
early pruning no. Do you wnated to this case or you had some other
case in mind which you wnated to test?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#12

thomas.munro@gmail.com

over 5 years ago

In reply to: Dilip Kumar (#11)

Re: fixing old_snapshot_threshold's time->xid mapping

On Mon, Apr 20, 2020 at 6:35 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Apr 20, 2020 at 11:24 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sat, Apr 18, 2020 at 9:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sat, Apr 18, 2020 at 11:47 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Is this an improvement? I realise that there is still nothing to
actually verify that early pruning has actually happened. I haven't
thought of a good way to do that yet (stats, page inspection, ...).

Could we test the early pruning using xid-burn patch? Basically, in
xid_by_minute we have some xids with the current epoch. Now, we burns
more than 2b xid and then if we try to vacuum we might hit the case of
early pruning no. Do you wnated to this case or you had some other
case in mind which you wnated to test?

I mean I want to verify that VACUUM or heap prune actually removed a
tuple that was visible to an old snapshot. An idea I just had: maybe
sto_using_select.spec should check the visibility map (somehow). For
example, the sto_using_select.spec (the version in the patch I just
posted) just checks that after time 00:11, the old snapshot gets a
snapshot-too-old error. Perhaps we could add a VACUUM before that,
and then check that the page has become all visible, meaning that the
dead tuple our snapshot could see has now been removed.

#13

dilipbalaut@gmail.com

over 5 years ago

In reply to: Thomas Munro (#12)

Re: fixing old_snapshot_threshold's time->xid mapping

On Mon, Apr 20, 2020 at 12:29 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Mon, Apr 20, 2020 at 6:35 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Apr 20, 2020 at 11:24 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sat, Apr 18, 2020 at 9:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sat, Apr 18, 2020 at 11:47 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Is this an improvement? I realise that there is still nothing to
actually verify that early pruning has actually happened. I haven't
thought of a good way to do that yet (stats, page inspection, ...).

Could we test the early pruning using xid-burn patch? Basically, in
xid_by_minute we have some xids with the current epoch. Now, we burns
more than 2b xid and then if we try to vacuum we might hit the case of
early pruning no. Do you wnated to this case or you had some other
case in mind which you wnated to test?

I mean I want to verify that VACUUM or heap prune actually removed a
tuple that was visible to an old snapshot. An idea I just had: maybe
sto_using_select.spec should check the visibility map (somehow). For
example, the sto_using_select.spec (the version in the patch I just
posted) just checks that after time 00:11, the old snapshot gets a
snapshot-too-old error. Perhaps we could add a VACUUM before that,
and then check that the page has become all visible, meaning that the
dead tuple our snapshot could see has now been removed.

Okay, got your point. Can we try to implement some test functions
that can just call visibilitymap_get_status function internally? I
agree that we will have to pass the correct block number but that we
can find using TID. Or for testing, we can create a very small
relation that just has 1 block?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#14

robertmhaas@gmail.com

over 5 years ago

In reply to: Dilip Kumar (#9)

Re: fixing old_snapshot_threshold's time->xid mapping

On Mon, Apr 20, 2020 at 12:10 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have started reviewing these patches. I think, the fixes looks right to me.
+ LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+ mapping->head_offset = oldSnapshotControl->head_offset;
+ mapping->head_timestamp = oldSnapshotControl->head_timestamp;
+ mapping->count_used = oldSnapshotControl->count_used;
+ for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
+ mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
+ LWLockRelease(OldSnapshotTimeMapLock);
I think memcpy would be a better choice instead of looping it for all
the entries, since we are doing this under a lock?

When I did it that way, it complained about "const" and I couldn't
immediately figure out how to fix it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15

thomas.munro@gmail.com

over 5 years ago

In reply to: Dilip Kumar (#13)

2 attachment(s)

Re: fixing old_snapshot_threshold's time->xid mapping

On Mon, Apr 20, 2020 at 8:02 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Apr 20, 2020 at 12:29 PM Thomas Munro <thomas.munro@gmail.com> wrote:

I mean I want to verify that VACUUM or heap prune actually removed a
tuple that was visible to an old snapshot. An idea I just had: maybe
sto_using_select.spec should check the visibility map (somehow). For
example, the sto_using_select.spec (the version in the patch I just
posted) just checks that after time 00:11, the old snapshot gets a
snapshot-too-old error. Perhaps we could add a VACUUM before that,
and then check that the page has become all visible, meaning that the
dead tuple our snapshot could see has now been removed.

Okay, got your point. Can we try to implement some test functions
that can just call visibilitymap_get_status function internally? I
agree that we will have to pass the correct block number but that we
can find using TID. Or for testing, we can create a very small
relation that just has 1 block?

I think it's enough to check SELECT EVERY(all_visible) FROM
pg_visibility_map('sto1'::regclass). I realised that
src/test/module/snapshot_too_old is allowed to install
contrib/pg_visibility with EXTRA_INSTALL, so here's a new version to
try that idea. It extends sto_using_select.spec to VACUUM and check
the vis map at key times. That allows us to check that early pruning
really happens once the snapshot becomes too old. There are other
ways you could check that but this seems quite "light touch" compared
to something based on page inspection.

I also changed src/test/module/snapshot_too_old/t/001_truncate.pl back
to using Robert's contrib/old_snapshot extension to know the size of
the time/xid map, allowing an introspection function to be dropped
from test_sto.c.

As before, these two apply on top of Robert's patches (or at least his
0001 and 0002).

Attachments:

v4-0004-Rewrite-the-snapshot_too_old-tests.patchtext/x-patch; charset=US-ASCII; name=v4-0004-Rewrite-the-snapshot_too_old-tests.patchDownload

From 28f81e93e8058d64c80d2b2b605a57f48ceebc0b Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 20 Apr 2020 16:23:02 +1200
Subject: [PATCH v4 4/5] Rewrite the "snapshot_too_old" tests.

Previously the snapshot too old feature used a special test value for
old_snapshot_threshold.  Instead, use a new approach based on explicitly
advancing the "current" timestamp used in snapshot-too-old book keeping,
so that we can control the timeline precisely without having to resort
to sleeping or special test branches in the code.

Also check that early pruning actually happens, by vacuuming and
inspecting the visibility map at key points in the test schedule.
---
 src/backend/utils/time/snapmgr.c              |  24 ---
 src/test/modules/snapshot_too_old/Makefile    |  23 +--
 .../expected/sto_using_cursor.out             |  75 ++++-----
 .../expected/sto_using_hash_index.out         |  26 ++-
 .../expected/sto_using_select.out             | 157 +++++++++++++++---
 .../specs/sto_using_cursor.spec               |  30 ++--
 .../specs/sto_using_hash_index.spec           |  19 ++-
 .../specs/sto_using_select.spec               |  50 ++++--
 src/test/modules/snapshot_too_old/sto.conf    |   2 +-
 .../snapshot_too_old/test_sto--1.0.sql        |  14 ++
 src/test/modules/snapshot_too_old/test_sto.c  |  74 +++++++++
 .../modules/snapshot_too_old/test_sto.control |   5 +
 12 files changed, 366 insertions(+), 133 deletions(-)
 create mode 100644 src/test/modules/snapshot_too_old/test_sto--1.0.sql
 create mode 100644 src/test/modules/snapshot_too_old/test_sto.c
 create mode 100644 src/test/modules/snapshot_too_old/test_sto.control

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 72b2c61a07..19e6c52b80 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1739,26 +1739,6 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
 		update_ts = oldSnapshotControl->next_map_update;
 		SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
 
-		/*
-		 * Zero threshold always overrides to latest xmin, if valid.  Without
-		 * some heuristic it will find its own snapshot too old on, for
-		 * example, a simple UPDATE -- which would make it useless for most
-		 * testing, but there is no principled way to ensure that it doesn't
-		 * fail in this way.  Use a five-second delay to try to get useful
-		 * testing behavior, but this may need adjustment.
-		 */
-		if (old_snapshot_threshold == 0)
-		{
-			if (TransactionIdPrecedes(latest_xmin, MyPgXact->xmin)
-				&& TransactionIdFollows(latest_xmin, xlimit))
-				xlimit = latest_xmin;
-
-			ts -= 5 * USECS_PER_SEC;
-			SetOldSnapshotThresholdTimestamp(ts, xlimit);
-
-			return xlimit;
-		}
-
 		ts = AlignTimestampToMinuteBoundary(ts)
 			- (old_snapshot_threshold * USECS_PER_MINUTE);
 
@@ -1860,10 +1840,6 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 	if (!map_update_required)
 		return;
 
-	/* No further tracking needed for 0 (used for testing). */
-	if (old_snapshot_threshold == 0)
-		return;
-
 	/*
 	 * We don't want to do something stupid with unusual values, but we don't
 	 * want to litter the log with warnings or break otherwise normal
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index dfb4537f63..81836e9953 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -1,14 +1,22 @@
 # src/test/modules/snapshot_too_old/Makefile
 
-# Note: because we don't tell the Makefile there are any regression tests,
-# we have to clean those result files explicitly
-EXTRA_CLEAN = $(pg_regress_clean_files)
+MODULE_big = test_sto
+OBJS = \
+	$(WIN32RES) \
+	test_sto.o
+
+EXTENSION = test_sto
+DATA = test_sto--1.0.sql
+PGDESCFILE = "test_sto -- internal testing for snapshot too old errors"
+
+EXTRA_INSTALL = contrib/pg_visibility
 
 ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index
 ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf
 
-# Disabled because these tests require "old_snapshot_threshold" >= 0, which
-# typical installcheck users do not have (e.g. buildfarm clients).
+# Disabled because these tests require "old_snapshot_threshold" = 10, which
+# typical installcheck users do not have (e.g. buildfarm clients) and also
+# because it'd be dangerous on a production system.
 NO_INSTALLCHECK = 1
 
 ifdef USE_PGXS
@@ -21,8 +29,3 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
-
-# But it can nonetheless be very helpful to run tests on preexisting
-# installation, allow to do so, but only if requested explicitly.
-installcheck-force:
-	$(pg_isolation_regress_installcheck) $(ISOLATION)
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out b/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
index 8cc29ec82f..b007e2dc17 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
@@ -1,73 +1,60 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: s1decl s1f1 s1sleep s1f2 s2u
+starting permutation: t00 s1decl s1f1 t10 s2u s1f2 t20 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
 step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
 step s1f2: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step t20: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1decl s1f1 s1sleep s2u s1f2
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s1f1: FETCH FIRST FROM cursor1;
-c              
+               
+step s1f3: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+test_sto_reset_all_state
 
-1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+               
 
-0                             
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+starting permutation: t00 s1decl s1f1 t10 s1f2 t20 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1decl s1f1 s2u s1sleep s1f2
+               
 step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
 step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
 step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
-
-starting permutation: s1decl s2u s1f1 s1sleep s1f2
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
-
-0                             
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+step t20: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s2u s1decl s1f1 s1sleep s1f2
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s1f1: FETCH FIRST FROM cursor1;
+               
+step s1f3: FETCH FIRST FROM cursor1;
 c              
 
-2              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+1              
+test_sto_reset_all_state
 
-0                             
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+               
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out b/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
index bf94054790..091c212adc 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
@@ -1,15 +1,31 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: noseq s1f1 s2sleep s2u s1f2
+starting permutation: t00 noseq s1f1 t10 s2u s2v1 s1f2 t22 s2v2 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step noseq: SET enable_seqscan = false;
 step s1f1: SELECT c FROM sto1 where c = 1000;
 c              
 
 1000           
-step s2sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
 step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1000;
+step s2v1: VACUUM sto1;
 step s1f2: SELECT c FROM sto1 where c = 1001;
+c              
+
+step t22: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:22:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2v2: VACUUM sto1;
+step s1f3: SELECT c FROM sto1 where c = 1001;
 ERROR:  snapshot too old
+test_sto_reset_all_state
+
+               
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_select.out b/src/test/modules/snapshot_too_old/expected/sto_using_select.out
index eb15bc23bf..201c106754 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_select.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_select.out
@@ -1,55 +1,164 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: s1f1 s1sleep s1f2 s2u
+starting permutation: t00 s2vis1 s1f1 t01 s2u s2vis2 s1f2 t11 s2vac1 s2vis3 s1f3 t12 s1f4 s2vac2 s2vis4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vis1: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s2vis2: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
 
-0                             
+f              
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step t11: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:11:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1f1 s1sleep s2u s1f2
-step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+               
+step s2vac1: VACUUM sto1;
+step s2vis3: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+f              
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t12: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+               
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 ERROR:  snapshot too old
+step s2vac2: VACUUM sto1;
+step s2vis4: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+test_sto_reset_all_state
+
+               
+
+starting permutation: t00 s2vis1 s1f1 t01 s2u s2vis2 s1f2 t11 s2vac1 s2vis3 s1f3 t12 s2vac2 s2vis4 s1f4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1f1 s2u s1sleep s1f2
+               
+step s2vis1: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step s2vis2: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
 
-0                             
+f              
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step t11: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:11:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac1: VACUUM sto1;
+step s2vis3: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+f              
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step t12: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac2: VACUUM sto1;
+step s2vis4: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 ERROR:  snapshot too old
+test_sto_reset_all_state
 
-starting permutation: s2u s1f1 s1sleep s1f2
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+               
+
+starting permutation: t00 s2vis1 s1f1 t01 s2vis2 s1f2 t11 s2vac1 s2vis3 s1f3 t12 s2vac2 s2vis4 s1f4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vis1: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
-2              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+1              
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
+step s2vis2: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
-ERROR:  snapshot too old
+c              
+
+1              
+step t11: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:11:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac1: VACUUM sto1;
+step s2vis3: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step t12: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac2: VACUUM sto1;
+step s2vis4: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+test_sto_reset_all_state
+
+               
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec b/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
index eac18ca5b9..3be084b8fe 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
@@ -1,19 +1,14 @@
 # This test provokes a "snapshot too old" error using a cursor.
 #
-# The sleep is needed because with a threshold of zero a statement could error
-# on changes it made.  With more normal settings no external delay is needed,
-# but we don't want these tests to run long enough to see that, since
-# granularity is in minutes.
-#
-# Since results depend on the value of old_snapshot_threshold, sneak that into
-# the line generated by the sleep, so that a surprising values isn't so hard
-# to identify.
+# Expects old_snapshot_threshold = 10.  Not suitable for installcheck since it
+# messes with internal snapmgr.c state.
 
 setup
 {
+    CREATE EXTENSION IF NOT EXISTS test_sto;
+    SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
-    CREATE TABLE sto2 (c int NOT NULL);
 }
 setup
 {
@@ -22,16 +17,29 @@ setup
 
 teardown
 {
-    DROP TABLE sto1, sto2;
+    DROP TABLE sto1;
+    SELECT test_sto_reset_all_state();
 }
 
 session "s1"
 setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "s1decl"	{ DECLARE cursor1 CURSOR FOR SELECT c FROM sto1; }
 step "s1f1"		{ FETCH FIRST FROM cursor1; }
-step "s1sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s1f2"		{ FETCH FIRST FROM cursor1; }
+step "s1f3"		{ FETCH FIRST FROM cursor1; }
 teardown		{ COMMIT; }
 
 session "s2"
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1; }
+
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t10"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z'); }
+step "t20"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z'); }
+
+# if there's an update, we get a snapshot too old error at time 00:20 (not before,
+# because we need page pruning to see the xmin level change from 10 minutes earlier)
+permutation "t00" "s1decl" "s1f1" "t10" "s2u" "s1f2" "t20" "s1f3"
+
+# if there's no update, no snapshot too old error at time 00:20
+permutation "t00" "s1decl" "s1f1" "t10"       "s1f2" "t20" "s1f3"
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec b/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
index 33d91ff852..f90bca3b7a 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
@@ -1,8 +1,12 @@
 # This test is like sto_using_select, except that we test access via a
-# hash index.
+# hash index.  Explicit vacuuming is required in this version because
+# there is are no incidental calls to heap_page_prune_opt() that can
+# call SetOldSnapshotThresholdTimestamp().
 
 setup
 {
+	CREATE EXTENSION IF NOT EXISTS test_sto;
+	SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
     CREATE INDEX idx_sto1 ON sto1 USING HASH (c);
@@ -15,6 +19,7 @@ setup
 teardown
 {
     DROP TABLE sto1;
+	SELECT test_sto_reset_all_state();
 }
 
 session "s1"
@@ -22,10 +27,18 @@ setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "noseq"	{ SET enable_seqscan = false; }
 step "s1f1"		{ SELECT c FROM sto1 where c = 1000; }
 step "s1f2"		{ SELECT c FROM sto1 where c = 1001; }
+step "s1f3"		{ SELECT c FROM sto1 where c = 1001; }
 teardown		{ ROLLBACK; }
 
 session "s2"
-step "s2sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1000; }
+step "s2v1"		{ VACUUM sto1; }
+step "s2v2"		{ VACUUM sto1; }
 
-permutation "noseq" "s1f1" "s2sleep" "s2u" "s1f2"
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t10"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z'); }
+step "t22"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:22:00Z'); }
+
+# snapshot too old at t22
+permutation "t00" "noseq" "s1f1" "t10" "s2u" "s2v1" "s1f2" "t22" "s2v2" "s1f3"
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_select.spec b/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
index d7c34f3d89..cd7a97b742 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
@@ -1,19 +1,15 @@
 # This test provokes a "snapshot too old" error using SELECT statements.
 #
-# The sleep is needed because with a threshold of zero a statement could error
-# on changes it made.  With more normal settings no external delay is needed,
-# but we don't want these tests to run long enough to see that, since
-# granularity is in minutes.
-#
-# Since results depend on the value of old_snapshot_threshold, sneak that into
-# the line generated by the sleep, so that a surprising values isn't so hard
-# to identify.
+# Expects old_snapshot_threshold = 10.  Not suitable for installcheck since it
+# messes with internal snapmgr.c state.
 
 setup
 {
+	CREATE EXTENSION IF NOT EXISTS test_sto;
+	CREATE EXTENSION IF NOT EXISTS pg_visibility;
+	SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
-    CREATE TABLE sto2 (c int NOT NULL);
 }
 setup
 {
@@ -22,15 +18,47 @@ setup
 
 teardown
 {
-    DROP TABLE sto1, sto2;
+	DROP TABLE sto1;
+    SELECT test_sto_reset_all_state();
 }
 
 session "s1"
 setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "s1f1"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
-step "s1sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s1f2"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
+step "s1f3"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
+step "s1f4"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
 teardown		{ COMMIT; }
 
 session "s2"
+step "s2vis1"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1; }
+step "s2vis2"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
+step "s2vac1"	{ VACUUM sto1; }
+step "s2vis3"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
+step "s2vac2"	{ VACUUM sto1; }
+step "s2vis4"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
+
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t01"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z'); }
+step "t11"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:11:00Z'); }
+step "t12"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z'); }
+
+# If there's an update, we get a snapshot too old error at time 00:12, and
+# VACUUM is allowed to remove the tuple our snapshot could see, which we know
+# because we see that the relation becomes all visible.  The earlier VACUUMs
+# were unable to remove the tuple we could see, which is is obvious because we
+# can see the row with value 1, and from the relation not being all visible
+# after the VACUUM.
+permutation "t00" "s2vis1" "s1f1" "t01" "s2u" "s2vis2" "s1f2" "t11" "s2vac1" "s2vis3" "s1f3" "t12" "s1f4" "s2vac2" "s2vis4"
+
+# Almost the same schedule, but this time we'll put s2vac2 and s2vis4 before
+# s1f4 just to demonstrate that the early pruning is allowed before the error
+# aborts s1's transaction.
+permutation "t00" "s2vis1" "s1f1" "t01" "s2u" "s2vis2" "s1f2" "t11" "s2vac1" "s2vis3" "s1f3" "t12" "s2vac2" "s2vis4" "s1f4"
+
+# If we run the same schedule as above but without the update, we get no
+# snapshot too old error (even though our snapshot is older than the
+# threshold), and the relation remains all visible.
+permutation "t00" "s2vis1" "s1f1" "t01"       "s2vis2" "s1f2" "t11" "s2vac1" "s2vis3" "s1f3" "t12" "s2vac2" "s2vis4" "s1f4"
diff --git a/src/test/modules/snapshot_too_old/sto.conf b/src/test/modules/snapshot_too_old/sto.conf
index 7eeaeeb0dc..5ed46b3560 100644
--- a/src/test/modules/snapshot_too_old/sto.conf
+++ b/src/test/modules/snapshot_too_old/sto.conf
@@ -1,2 +1,2 @@
 autovacuum = off
-old_snapshot_threshold = 0
+old_snapshot_threshold = 10
diff --git a/src/test/modules/snapshot_too_old/test_sto--1.0.sql b/src/test/modules/snapshot_too_old/test_sto--1.0.sql
new file mode 100644
index 0000000000..c10afcf23a
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto--1.0.sql
@@ -0,0 +1,14 @@
+/* src/test/modules/snapshot_too_old/test_sto--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_sto" to load this file. \quit
+
+CREATE FUNCTION test_sto_clobber_snapshot_timestamp(now timestamptz)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'test_sto_clobber_snapshot_timestamp'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_sto_reset_all_state()
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'test_sto_reset_all_state'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/snapshot_too_old/test_sto.c b/src/test/modules/snapshot_too_old/test_sto.c
new file mode 100644
index 0000000000..f6c9a1a000
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto.c
@@ -0,0 +1,74 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_sto.c
+ *	  Functions to support isolation tests for snapshot too old.
+ *
+ * These functions are not intended for use in a production database and
+ * could cause corruption.
+ *
+ * Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *	  src/test/modules/snapshot_too_old/test_sto.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "storage/lwlock.h"
+#include "utils/old_snapshot.h"
+#include "utils/snapmgr.h"
+#include "utils/timestamp.h"
+
+PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(test_sto_reset_all_state);
+PG_FUNCTION_INFO_V1(test_sto_clobber_snapshot_timestamp);
+
+/*
+ * Revert to initial state.  This is not safe except in carefully
+ * controlled tests.
+ */
+Datum
+test_sto_reset_all_state(PG_FUNCTION_ARGS)
+{
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+	oldSnapshotControl->count_used = 0;
+	oldSnapshotControl->current_timestamp = 0;
+	oldSnapshotControl->head_offset = 0;
+	oldSnapshotControl->head_timestamp = 0;
+	LWLockRelease(OldSnapshotTimeMapLock);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_latest_xmin);
+	oldSnapshotControl->latest_xmin = InvalidTransactionId;
+	oldSnapshotControl->next_map_update = 0;
+	SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_current);
+	oldSnapshotControl->current_timestamp = 0;
+	SpinLockRelease(&oldSnapshotControl->mutex_current);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
+	oldSnapshotControl->threshold_timestamp = 0;
+	oldSnapshotControl->threshold_xid = InvalidTransactionId;
+	SpinLockRelease(&oldSnapshotControl->mutex_threshold);
+
+	PG_RETURN_NULL();
+}
+
+/*
+ * Update the minimum time used in snapshot-too-old code.  If set ahead of the
+ * current wall clock time (for example, the year 3000), this allows testing
+ * with arbitrary times.  This is not safe except in carefully controlled
+ * tests.
+ */
+Datum
+test_sto_clobber_snapshot_timestamp(PG_FUNCTION_ARGS)
+{
+	TimestampTz new_current_timestamp = PG_GETARG_TIMESTAMPTZ(0);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_current);
+	oldSnapshotControl->current_timestamp = new_current_timestamp;
+	SpinLockRelease(&oldSnapshotControl->mutex_current);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/snapshot_too_old/test_sto.control b/src/test/modules/snapshot_too_old/test_sto.control
new file mode 100644
index 0000000000..e497e5450e
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto.control
@@ -0,0 +1,5 @@
+# test_sto test module
+comment = 'functions for internal testing of snapshot too old errors'
+default_version = '1.0'
+module_pathname = '$libdir/test_sto'
+relocatable = true
-- 
2.20.1

v4-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchtext/x-patch; charset=US-ASCII; name=v4-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchDownload

From 1a1168caa9d4bfe5b36ab14ed31ae0ef98d267e8 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 20 Apr 2020 17:05:42 +1200
Subject: [PATCH v4 5/5] Truncate snapshot-too-old time map when CLOG is
 truncated.

It's not safe to leave xids in the map that have wrapped around.

Reported-by: Andres Freund
---
 src/backend/commands/vacuum.c                 |  3 +
 src/backend/utils/time/snapmgr.c              | 21 +++++
 src/include/utils/snapmgr.h                   |  1 +
 src/test/modules/snapshot_too_old/Makefile    |  4 +-
 .../snapshot_too_old/t/001_truncate.pl        | 81 +++++++++++++++++++
 5 files changed, 109 insertions(+), 1 deletion(-)
 create mode 100644 src/test/modules/snapshot_too_old/t/001_truncate.pl

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5a110edb07..37ead45fa5 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1627,6 +1627,9 @@ vac_truncate_clog(TransactionId frozenXID,
 	 */
 	AdvanceOldestCommitTsXid(frozenXID);
 
+	/* Make sure snapshot_too_old drops old XIDs. */
+	TruncateOldSnapshotTimeMapping(frozenXID);
+
 	/*
 	 * Truncate CLOG, multixact and CommitTs to the oldest computed value.
 	 */
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 19e6c52b80..edb47c9664 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1974,6 +1974,27 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 }
 
 
+/*
+ * Remove old xids from the timing map, so the CLOG can be truncated.
+ */
+void
+TruncateOldSnapshotTimeMapping(TransactionId frozenXID)
+{
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+	while (oldSnapshotControl->count_used > 0 &&
+		   TransactionIdPrecedes(oldSnapshotControl->xid_by_minute[oldSnapshotControl->head_offset],
+								 frozenXID))
+	{
+		oldSnapshotControl->head_timestamp += USECS_PER_MINUTE;
+		oldSnapshotControl->head_offset =
+			(oldSnapshotControl->head_offset + 1) %
+			OLD_SNAPSHOT_TIME_MAP_ENTRIES;
+		oldSnapshotControl->count_used--;
+	}
+	LWLockRelease(OldSnapshotTimeMapLock);
+}
+
+
 /*
  * Setup a snapshot that replaces normal catalog snapshots that allows catalog
  * access to behave just like it did at a certain point in the past.
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index b28d13ce84..4f53aad956 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -135,6 +135,7 @@ extern TransactionId TransactionIdLimitedForOldSnapshots(TransactionId recentXmi
 														 Relation relation);
 extern void MaintainOldSnapshotTimeMapping(TimestampTz whenTaken,
 										   TransactionId xmin);
+extern void TruncateOldSnapshotTimeMapping(TransactionId frozenXID);
 
 extern char *ExportSnapshot(Snapshot snapshot);
 
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index 81836e9953..f919944984 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -9,7 +9,7 @@ EXTENSION = test_sto
 DATA = test_sto--1.0.sql
 PGDESCFILE = "test_sto -- internal testing for snapshot too old errors"
 
-EXTRA_INSTALL = contrib/pg_visibility
+EXTRA_INSTALL = contrib/pg_visibility contrib/old_snapshot
 
 ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index
 ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf
@@ -19,6 +19,8 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/s
 # because it'd be dangerous on a production system.
 NO_INSTALLCHECK = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/src/test/modules/snapshot_too_old/t/001_truncate.pl b/src/test/modules/snapshot_too_old/t/001_truncate.pl
new file mode 100644
index 0000000000..c3bff45f65
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/t/001_truncate.pl
@@ -0,0 +1,81 @@
+# Test truncation of the old snapshot time mapping, to check
+# that we can't get into trouble when xids wrap around.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('master');
+$node->init;
+$node->append_conf("postgresql.conf", "timezone = UTC");
+$node->append_conf("postgresql.conf", "old_snapshot_threshold=10");
+$node->append_conf("postgresql.conf", "max_prepared_transactions=10");
+$node->start;
+$node->psql('postgres', 'update pg_database set datallowconn = true');
+$node->psql('postgres', 'create extension old_snapshot');
+$node->psql('postgres', 'create extension test_sto');
+
+note "check time map is truncated when CLOG is";
+
+# build up a time map with 4 entries
+$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('3000-01-01 00:02:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('3000-01-01 00:03:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+my $count;
+$node->psql('postgres', "select count(*) from pg_old_snapshot_time_mapping()", stdout => \$count);
+is($count, 4, "expected to have 4 entries in the old snapshot time map");
+
+# now cause frozen XID to advance
+$node->psql('postgres', 'vacuum freeze');
+$node->psql('template0', 'vacuum freeze');
+$node->psql('template1', 'vacuum freeze');
+
+# we expect all XIDs to have been truncated
+$node->psql('postgres', "select count(*) from pg_old_snapshot_time_mapping()", stdout => \$count);
+is($count, 0, "expected to have 0 entries in the old snapshot time map");
+
+# put two more in the map
+$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('3000-01-01 00:04:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('3000-01-01 00:05:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select count(*) from pg_old_snapshot_time_mapping()", stdout => \$count);
+is($count, 2, "expected to have 2 entries in the old snapshot time map");
+
+# prepare a transaction, to stop xmin from getting further ahead
+$node->psql('postgres', "begin; select pg_current_xact_id(); prepare transaction 'tx1'");
+
+# add 16 more minutes (this tests wrapping around the mapping array, which is of size 10 + 10)...
+$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('3000-01-01 00:21:00Z')");
+$node->psql('postgres', "select pg_current_xact_id()");
+$node->psql('postgres', "select count(*) from pg_old_snapshot_time_mapping()", stdout => \$count);
+is($count, 18, "expected to have 18 entries in the old snapshot time map");
+
+# now cause frozen XID to advance
+$node->psql('postgres', 'vacuum freeze');
+$node->psql('template0', 'vacuum freeze');
+$node->psql('template1', 'vacuum freeze');
+
+# this should leave just 16
+$node->psql('postgres', "select count(*) from pg_old_snapshot_time_mapping()", stdout => \$count);
+is($count, 16, "expected to have 16 entries in the old snapshot time map");
+
+# commit tx1, and then freeze again to get rid of all of them
+$node->psql('postgres', "commit prepared 'tx1'");
+
+# now cause frozen XID to advance
+$node->psql('postgres', 'vacuum freeze');
+$node->psql('template0', 'vacuum freeze');
+$node->psql('template1', 'vacuum freeze');
+
+# we should now be back to empty
+$node->psql('postgres', "select count(*) from pg_old_snapshot_time_mapping()", stdout => \$count);
+is($count, 0, "expected to have 0 entries in the old snapshot time map");
+
+$node->stop;
-- 
2.20.1

#16

dilipbalaut@gmail.com

over 5 years ago

In reply to: Robert Haas (#14)

Re: fixing old_snapshot_threshold's time->xid mapping

On Mon, Apr 20, 2020 at 11:31 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Apr 20, 2020 at 12:10 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have started reviewing these patches. I think, the fixes looks right to me.
+ LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+ mapping->head_offset = oldSnapshotControl->head_offset;
+ mapping->head_timestamp = oldSnapshotControl->head_timestamp;
+ mapping->count_used = oldSnapshotControl->count_used;
+ for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
+ mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
+ LWLockRelease(OldSnapshotTimeMapLock);
I think memcpy would be a better choice instead of looping it for all
the entries, since we are doing this under a lock?
When I did it that way, it complained about "const" and I couldn't
immediately figure out how to fix it.

I think we can typecast to (const void *). After below change, I did
not get the warning.

diff --git a/contrib/old_snapshot/time_mapping.c
b/contrib/old_snapshot/time_mapping.c
index 37e0055..cc53bdd 100644
--- a/contrib/old_snapshot/time_mapping.c
+++ b/contrib/old_snapshot/time_mapping.c
@@ -94,8 +94,9 @@ GetOldSnapshotTimeMapping(void)
        mapping->head_offset = oldSnapshotControl->head_offset;
        mapping->head_timestamp = oldSnapshotControl->head_timestamp;
        mapping->count_used = oldSnapshotControl->count_used;
-       for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
-               mapping->xid_by_minute[i] =
oldSnapshotControl->xid_by_minute[i];
+       memcpy(mapping->xid_by_minute,
+                  (const void *) oldSnapshotControl->xid_by_minute,
+                  sizeof(TransactionId) * OLD_SNAPSHOT_TIME_MAP_ENTRIES);
        LWLockRelease(OldSnapshotTimeMapLock);

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#17

thomas.munro@gmail.com

over 5 years ago

In reply to: Thomas Munro (#15)

6 attachment(s)

Re: fixing old_snapshot_threshold's time->xid mapping

On Tue, Apr 21, 2020 at 2:05 PM Thomas Munro <thomas.munro@gmail.com> wrote:

As before, these two apply on top of Robert's patches (or at least his
0001 and 0002).

While trying to figure out if Robert's 0003 patch was correct, I added
yet another patch to this stack to test it. 0006 does basic xid map
maintenance that exercises the cases 0003 fixes, and I think it
demonstrates that they now work correctly. Also some minor perl
improvements to 0005. I'll attach 0001-0004 again but they are
unchanged.

Since confusion about head vs tail seems to have been at the root of
the bugs addressed by 0003, I wonder if we should also rename
head_{timestamp,offset} to oldest_{timestamp,offset}.

Attachments:

v5-0001-Expose-oldSnapshotControl.patchtext/x-patch; charset=US-ASCII; name=v5-0001-Expose-oldSnapshotControl.patchDownload

From 192ff883fd834371cc1f674d6a4cdfee89729cb0 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 09:37:31 -0400
Subject: [PATCH v5 1/6] Expose oldSnapshotControl.

---
 src/backend/utils/time/snapmgr.c | 55 +----------------------
 src/include/utils/old_snapshot.h | 75 ++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+), 53 deletions(-)
 create mode 100644 src/include/utils/old_snapshot.h

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 1c063c592c..abaaea569a 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -63,6 +63,7 @@
 #include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/old_snapshot.h"
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/snapmgr.h"
@@ -74,59 +75,7 @@
  */
 int			old_snapshot_threshold; /* number of minutes, -1 disables */
 
-/*
- * Structure for dealing with old_snapshot_threshold implementation.
- */
-typedef struct OldSnapshotControlData
-{
-	/*
-	 * Variables for old snapshot handling are shared among processes and are
-	 * only allowed to move forward.
-	 */
-	slock_t		mutex_current;	/* protect current_timestamp */
-	TimestampTz current_timestamp;	/* latest snapshot timestamp */
-	slock_t		mutex_latest_xmin;	/* protect latest_xmin and next_map_update */
-	TransactionId latest_xmin;	/* latest snapshot xmin */
-	TimestampTz next_map_update;	/* latest snapshot valid up to */
-	slock_t		mutex_threshold;	/* protect threshold fields */
-	TimestampTz threshold_timestamp;	/* earlier snapshot is old */
-	TransactionId threshold_xid;	/* earlier xid may be gone */
-
-	/*
-	 * Keep one xid per minute for old snapshot error handling.
-	 *
-	 * Use a circular buffer with a head offset, a count of entries currently
-	 * used, and a timestamp corresponding to the xid at the head offset.  A
-	 * count_used value of zero means that there are no times stored; a
-	 * count_used value of OLD_SNAPSHOT_TIME_MAP_ENTRIES means that the buffer
-	 * is full and the head must be advanced to add new entries.  Use
-	 * timestamps aligned to minute boundaries, since that seems less
-	 * surprising than aligning based on the first usage timestamp.  The
-	 * latest bucket is effectively stored within latest_xmin.  The circular
-	 * buffer is updated when we get a new xmin value that doesn't fall into
-	 * the same interval.
-	 *
-	 * It is OK if the xid for a given time slot is from earlier than
-	 * calculated by adding the number of minutes corresponding to the
-	 * (possibly wrapped) distance from the head offset to the time of the
-	 * head entry, since that just results in the vacuuming of old tuples
-	 * being slightly less aggressive.  It would not be OK for it to be off in
-	 * the other direction, since it might result in vacuuming tuples that are
-	 * still expected to be there.
-	 *
-	 * Use of an SLRU was considered but not chosen because it is more
-	 * heavyweight than is needed for this, and would probably not be any less
-	 * code to implement.
-	 *
-	 * Persistence is not needed.
-	 */
-	int			head_offset;	/* subscript of oldest tracked time */
-	TimestampTz head_timestamp; /* time corresponding to head xid */
-	int			count_used;		/* how many slots are in use */
-	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
-} OldSnapshotControlData;
-
-static volatile OldSnapshotControlData *oldSnapshotControl;
+volatile OldSnapshotControlData *oldSnapshotControl;
 
 
 /*
diff --git a/src/include/utils/old_snapshot.h b/src/include/utils/old_snapshot.h
new file mode 100644
index 0000000000..284af7d508
--- /dev/null
+++ b/src/include/utils/old_snapshot.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * old_snapshot.h
+ *		Data structures for 'snapshot too old'
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/old_snapshot.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef OLD_SNAPSHOT_H
+#define OLD_SNAPSHOT_H
+
+#include "datatype/timestamp.h"
+#include "storage/s_lock.h"
+
+/*
+ * Structure for dealing with old_snapshot_threshold implementation.
+ */
+typedef struct OldSnapshotControlData
+{
+	/*
+	 * Variables for old snapshot handling are shared among processes and are
+	 * only allowed to move forward.
+	 */
+	slock_t		mutex_current;	/* protect current_timestamp */
+	TimestampTz current_timestamp;	/* latest snapshot timestamp */
+	slock_t		mutex_latest_xmin;	/* protect latest_xmin and next_map_update */
+	TransactionId latest_xmin;	/* latest snapshot xmin */
+	TimestampTz next_map_update;	/* latest snapshot valid up to */
+	slock_t		mutex_threshold;	/* protect threshold fields */
+	TimestampTz threshold_timestamp;	/* earlier snapshot is old */
+	TransactionId threshold_xid;	/* earlier xid may be gone */
+
+	/*
+	 * Keep one xid per minute for old snapshot error handling.
+	 *
+	 * Use a circular buffer with a head offset, a count of entries currently
+	 * used, and a timestamp corresponding to the xid at the head offset.  A
+	 * count_used value of zero means that there are no times stored; a
+	 * count_used value of OLD_SNAPSHOT_TIME_MAP_ENTRIES means that the buffer
+	 * is full and the head must be advanced to add new entries.  Use
+	 * timestamps aligned to minute boundaries, since that seems less
+	 * surprising than aligning based on the first usage timestamp.  The
+	 * latest bucket is effectively stored within latest_xmin.  The circular
+	 * buffer is updated when we get a new xmin value that doesn't fall into
+	 * the same interval.
+	 *
+	 * It is OK if the xid for a given time slot is from earlier than
+	 * calculated by adding the number of minutes corresponding to the
+	 * (possibly wrapped) distance from the head offset to the time of the
+	 * head entry, since that just results in the vacuuming of old tuples
+	 * being slightly less aggressive.  It would not be OK for it to be off in
+	 * the other direction, since it might result in vacuuming tuples that are
+	 * still expected to be there.
+	 *
+	 * Use of an SLRU was considered but not chosen because it is more
+	 * heavyweight than is needed for this, and would probably not be any less
+	 * code to implement.
+	 *
+	 * Persistence is not needed.
+	 */
+	int			head_offset;	/* subscript of oldest tracked time */
+	TimestampTz head_timestamp; /* time corresponding to head xid */
+	int			count_used;		/* how many slots are in use */
+	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
+} OldSnapshotControlData;
+
+extern volatile OldSnapshotControlData *oldSnapshotControl;
+
+#endif
-- 
2.20.1

v5-0002-contrib-old_snapshot-time-xid-mapping.patchtext/x-patch; charset=US-ASCII; name=v5-0002-contrib-old_snapshot-time-xid-mapping.patchDownload

From 4dd4e86de6b4435f3cc86af57c54a8862b2d5069 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 12:14:32 -0400
Subject: [PATCH v5 2/6] contrib/old_snapshot: time->xid mapping.

---
 contrib/Makefile                           |   1 +
 contrib/old_snapshot/Makefile              |  24 ++++
 contrib/old_snapshot/old_snapshot--1.0.sql |  14 ++
 contrib/old_snapshot/old_snapshot.control  |   5 +
 contrib/old_snapshot/time_mapping.c        | 159 +++++++++++++++++++++
 5 files changed, 203 insertions(+)
 create mode 100644 contrib/old_snapshot/Makefile
 create mode 100644 contrib/old_snapshot/old_snapshot--1.0.sql
 create mode 100644 contrib/old_snapshot/old_snapshot.control
 create mode 100644 contrib/old_snapshot/time_mapping.c

diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..452ade0782 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -27,6 +27,7 @@ SUBDIRS = \
 		lo		\
 		ltree		\
 		oid2name	\
+		old_snapshot	\
 		pageinspect	\
 		passwordcheck	\
 		pg_buffercache	\
diff --git a/contrib/old_snapshot/Makefile b/contrib/old_snapshot/Makefile
new file mode 100644
index 0000000000..091231f25f
--- /dev/null
+++ b/contrib/old_snapshot/Makefile
@@ -0,0 +1,24 @@
+# contrib/old_snapshot/Makefile
+
+MODULE_big = old_snapshot
+OBJS = \
+	$(WIN32RES) \
+	time_mapping.o
+PG_CPPFLAGS = -I$(libpq_srcdir)
+
+EXTENSION = old_snapshot
+DATA = old_snapshot--1.0.sql
+PGFILEDESC = "old_snapshot - utilities in support of old_snapshot_threshold"
+
+REGRESS = old_snapshot
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/old_snapshot
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/old_snapshot/old_snapshot--1.0.sql b/contrib/old_snapshot/old_snapshot--1.0.sql
new file mode 100644
index 0000000000..9ebb8829e3
--- /dev/null
+++ b/contrib/old_snapshot/old_snapshot--1.0.sql
@@ -0,0 +1,14 @@
+/* contrib/old_snapshot/old_snapshot--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION old_snapshot" to load this file. \quit
+
+-- Show visibility map and page-level visibility information for each block.
+CREATE FUNCTION pg_old_snapshot_time_mapping(array_offset OUT int4,
+											 end_timestamp OUT timestamptz,
+											 newest_xmin OUT xid)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_old_snapshot_time_mapping'
+LANGUAGE C STRICT;
+
+-- XXX. Do we want REVOKE commands here?
diff --git a/contrib/old_snapshot/old_snapshot.control b/contrib/old_snapshot/old_snapshot.control
new file mode 100644
index 0000000000..491eec536c
--- /dev/null
+++ b/contrib/old_snapshot/old_snapshot.control
@@ -0,0 +1,5 @@
+# old_snapshot extension
+comment = 'utilities in support of old_snapshot_threshold'
+default_version = '1.0'
+module_pathname = '$libdir/old_snapshot'
+relocatable = true
diff --git a/contrib/old_snapshot/time_mapping.c b/contrib/old_snapshot/time_mapping.c
new file mode 100644
index 0000000000..37e0055a00
--- /dev/null
+++ b/contrib/old_snapshot/time_mapping.c
@@ -0,0 +1,159 @@
+/*-------------------------------------------------------------------------
+ *
+ * time_mapping.c
+ *	  time to XID mapping information
+ *
+ * Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *	  contrib/old_snapshot/time_mapping.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "storage/lwlock.h"
+#include "utils/old_snapshot.h"
+#include "utils/snapmgr.h"
+#include "utils/timestamp.h"
+
+/*
+ * Backend-private copy of the information from oldSnapshotControl which relates
+ * to the time to XID mapping, plus an index so that we can iterate.
+ *
+ * Note that the length of the xid_by_minute array is given by
+ * OLD_SNAPSHOT_TIME_MAP_ENTRIES (which is not a compile-time constant).
+ */
+typedef struct
+{
+	int				current_index;
+	int				head_offset;
+	TimestampTz		head_timestamp;
+	int				count_used;
+	TransactionId	xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
+} OldSnapshotTimeMapping;
+
+#define NUM_TIME_MAPPING_COLUMNS 3
+
+PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(pg_old_snapshot_time_mapping);
+
+static OldSnapshotTimeMapping *GetOldSnapshotTimeMapping(void);
+static TupleDesc MakeOldSnapshotTimeMappingTupleDesc(void);
+static HeapTuple MakeOldSnapshotTimeMappingTuple(TupleDesc tupdesc,
+												 OldSnapshotTimeMapping *mapping);
+
+/*
+ * SQL-callable set-returning function.
+ */
+Datum
+pg_old_snapshot_time_mapping(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+	OldSnapshotTimeMapping *mapping;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		MemoryContext	oldcontext;
+
+		funcctx = SRF_FIRSTCALL_INIT();
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+		mapping = GetOldSnapshotTimeMapping();
+		funcctx->user_fctx = mapping;
+		funcctx->tuple_desc = MakeOldSnapshotTimeMappingTupleDesc();
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	mapping = (OldSnapshotTimeMapping *) funcctx->user_fctx;
+
+	while (mapping->current_index < mapping->count_used)
+	{
+		HeapTuple	tuple;
+
+		tuple = MakeOldSnapshotTimeMappingTuple(funcctx->tuple_desc, mapping);
+		++mapping->current_index;
+		SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Get the old snapshot time mapping data from shared memory.
+ */
+static OldSnapshotTimeMapping *
+GetOldSnapshotTimeMapping(void)
+{
+	OldSnapshotTimeMapping *mapping;
+
+	mapping = palloc(offsetof(OldSnapshotTimeMapping, xid_by_minute)
+					 + sizeof(TransactionId) * OLD_SNAPSHOT_TIME_MAP_ENTRIES);
+	mapping->current_index = 0;
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+	mapping->head_offset = oldSnapshotControl->head_offset;
+	mapping->head_timestamp = oldSnapshotControl->head_timestamp;
+	mapping->count_used = oldSnapshotControl->count_used;
+	for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
+		mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
+	LWLockRelease(OldSnapshotTimeMapLock);
+
+	return mapping;
+}
+
+/*
+ * Build a tuple descriptor for the pg_old_snapshot_time_mapping() SRF.
+ */
+static TupleDesc
+MakeOldSnapshotTimeMappingTupleDesc(void)
+{
+	TupleDesc	tupdesc;
+
+	tupdesc = CreateTemplateTupleDesc(NUM_TIME_MAPPING_COLUMNS);
+
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "array_offset",
+					   INT4OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "end_timestamp",
+					   TIMESTAMPTZOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "newest_xmin",
+					   XIDOID, -1, 0);
+
+	return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * Convert one entry from the old snapshot time mapping to a HeapTuple.
+ */
+static HeapTuple
+MakeOldSnapshotTimeMappingTuple(TupleDesc tupdesc, OldSnapshotTimeMapping *mapping)
+{
+	Datum	values[NUM_TIME_MAPPING_COLUMNS];
+	bool	nulls[NUM_TIME_MAPPING_COLUMNS];
+	int		array_position;
+	TimestampTz	timestamp;
+
+	/*
+	 * Figure out the array position corresponding to the current index.
+	 *
+	 * Index 0 means the oldest entry in the mapping, which is stored at
+	 * mapping->head_offset. Index 1 means the next-oldest entry, which is a the
+	 * following index, and so on. We wrap around when we reach the end of the array.
+	 */
+	array_position = (mapping->head_offset + mapping->current_index)
+		% OLD_SNAPSHOT_TIME_MAP_ENTRIES;
+
+	/*
+	 * No explicit timestamp is stored for any entry other than the oldest one,
+	 * but each entry corresponds to 1-minute period, so we can just add.
+	 */
+	timestamp = TimestampTzPlusMilliseconds(mapping->head_timestamp,
+											mapping->current_index * 60000);
+
+	/* Initialize nulls and values arrays. */
+	memset(nulls, 0, sizeof(nulls));
+	values[0] = Int32GetDatum(array_position);
+	values[1] = TimestampTzGetDatum(timestamp);
+	values[2] = TransactionIdGetDatum(mapping->xid_by_minute[array_position]);
+
+	return heap_form_tuple(tupdesc, values, nulls);
+}
-- 
2.20.1

v5-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchtext/x-patch; charset=US-ASCII; name=v5-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchDownload

From 428330154510850075cfa492340ab46d1df3042e Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 12:15:57 -0400
Subject: [PATCH v5 3/6] Fix bugs in MaintainOldSnapshotTimeMapping.

---
 src/backend/utils/time/snapmgr.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index abaaea569a..72b2c61a07 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1926,10 +1926,32 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 	else
 	{
 		/* We need a new bucket, but it might not be the very next one. */
-		int			advance = ((ts - oldSnapshotControl->head_timestamp)
-							   / USECS_PER_MINUTE);
+		int			distance_to_new_tail;
+		int			distance_to_current_tail;
+		int			advance;
 
-		oldSnapshotControl->head_timestamp = ts;
+		/*
+		 * Our goal is for the new "tail" of the mapping, that is, the entry
+		 * which is newest and thus furthest from the "head" entry, to
+		 * correspond to "ts". Since there's one entry per minute, the
+		 * distance between the current head and the new tail is just the
+		 * number of minutes of difference between ts and the current
+		 * head_timestamp.
+		 *
+		 * The distance from the current head to the current tail is one
+		 * less than the number of entries in the mapping, because the
+		 * entry at the head_offset is for 0 minutes after head_timestamp.
+		 *
+		 * The difference between these two values is the number of minutes
+		 * by which we need to advance the mapping, either adding new entries
+		 * or rotating old ones out.
+		 */
+		distance_to_new_tail =
+			(ts - oldSnapshotControl->head_timestamp) / USECS_PER_MINUTE;
+		distance_to_current_tail =
+			oldSnapshotControl->count_used - 1;
+		advance = distance_to_new_tail - distance_to_current_tail;
+		Assert(advance > 0);
 
 		if (advance >= OLD_SNAPSHOT_TIME_MAP_ENTRIES)
 		{
@@ -1937,6 +1959,7 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 			oldSnapshotControl->head_offset = 0;
 			oldSnapshotControl->count_used = 1;
 			oldSnapshotControl->xid_by_minute[0] = xmin;
+			oldSnapshotControl->head_timestamp = ts;
 		}
 		else
 		{
@@ -1955,6 +1978,7 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 					else
 						oldSnapshotControl->head_offset = old_head + 1;
 					oldSnapshotControl->xid_by_minute[old_head] = xmin;
+					oldSnapshotControl->head_timestamp += USECS_PER_MINUTE;
 				}
 				else
 				{
-- 
2.20.1

v5-0004-Rewrite-the-snapshot_too_old-tests.patchtext/x-patch; charset=US-ASCII; name=v5-0004-Rewrite-the-snapshot_too_old-tests.patchDownload

From 9bb6110de36c6df7a8117df602bbb08dbc643b71 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 20 Apr 2020 16:23:02 +1200
Subject: [PATCH v5 4/6] Rewrite the "snapshot_too_old" tests.

Previously the snapshot too old feature used a special test value for
old_snapshot_threshold.  Instead, use a new approach based on explicitly
advancing the "current" timestamp used in snapshot-too-old book keeping,
so that we can control the timeline precisely without having to resort
to sleeping or special test branches in the code.

Also check that early pruning actually happens, by vacuuming and
inspecting the visibility map at key points in the test schedule.
---
 src/backend/utils/time/snapmgr.c              |  24 ---
 src/test/modules/snapshot_too_old/Makefile    |  23 +--
 .../expected/sto_using_cursor.out             |  75 ++++-----
 .../expected/sto_using_hash_index.out         |  26 ++-
 .../expected/sto_using_select.out             | 157 +++++++++++++++---
 .../specs/sto_using_cursor.spec               |  30 ++--
 .../specs/sto_using_hash_index.spec           |  19 ++-
 .../specs/sto_using_select.spec               |  50 ++++--
 src/test/modules/snapshot_too_old/sto.conf    |   2 +-
 .../snapshot_too_old/test_sto--1.0.sql        |  14 ++
 src/test/modules/snapshot_too_old/test_sto.c  |  74 +++++++++
 .../modules/snapshot_too_old/test_sto.control |   5 +
 12 files changed, 366 insertions(+), 133 deletions(-)
 create mode 100644 src/test/modules/snapshot_too_old/test_sto--1.0.sql
 create mode 100644 src/test/modules/snapshot_too_old/test_sto.c
 create mode 100644 src/test/modules/snapshot_too_old/test_sto.control

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 72b2c61a07..19e6c52b80 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1739,26 +1739,6 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
 		update_ts = oldSnapshotControl->next_map_update;
 		SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
 
-		/*
-		 * Zero threshold always overrides to latest xmin, if valid.  Without
-		 * some heuristic it will find its own snapshot too old on, for
-		 * example, a simple UPDATE -- which would make it useless for most
-		 * testing, but there is no principled way to ensure that it doesn't
-		 * fail in this way.  Use a five-second delay to try to get useful
-		 * testing behavior, but this may need adjustment.
-		 */
-		if (old_snapshot_threshold == 0)
-		{
-			if (TransactionIdPrecedes(latest_xmin, MyPgXact->xmin)
-				&& TransactionIdFollows(latest_xmin, xlimit))
-				xlimit = latest_xmin;
-
-			ts -= 5 * USECS_PER_SEC;
-			SetOldSnapshotThresholdTimestamp(ts, xlimit);
-
-			return xlimit;
-		}
-
 		ts = AlignTimestampToMinuteBoundary(ts)
 			- (old_snapshot_threshold * USECS_PER_MINUTE);
 
@@ -1860,10 +1840,6 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 	if (!map_update_required)
 		return;
 
-	/* No further tracking needed for 0 (used for testing). */
-	if (old_snapshot_threshold == 0)
-		return;
-
 	/*
 	 * We don't want to do something stupid with unusual values, but we don't
 	 * want to litter the log with warnings or break otherwise normal
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index dfb4537f63..81836e9953 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -1,14 +1,22 @@
 # src/test/modules/snapshot_too_old/Makefile
 
-# Note: because we don't tell the Makefile there are any regression tests,
-# we have to clean those result files explicitly
-EXTRA_CLEAN = $(pg_regress_clean_files)
+MODULE_big = test_sto
+OBJS = \
+	$(WIN32RES) \
+	test_sto.o
+
+EXTENSION = test_sto
+DATA = test_sto--1.0.sql
+PGDESCFILE = "test_sto -- internal testing for snapshot too old errors"
+
+EXTRA_INSTALL = contrib/pg_visibility
 
 ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index
 ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf
 
-# Disabled because these tests require "old_snapshot_threshold" >= 0, which
-# typical installcheck users do not have (e.g. buildfarm clients).
+# Disabled because these tests require "old_snapshot_threshold" = 10, which
+# typical installcheck users do not have (e.g. buildfarm clients) and also
+# because it'd be dangerous on a production system.
 NO_INSTALLCHECK = 1
 
 ifdef USE_PGXS
@@ -21,8 +29,3 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
-
-# But it can nonetheless be very helpful to run tests on preexisting
-# installation, allow to do so, but only if requested explicitly.
-installcheck-force:
-	$(pg_isolation_regress_installcheck) $(ISOLATION)
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out b/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
index 8cc29ec82f..b007e2dc17 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
@@ -1,73 +1,60 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: s1decl s1f1 s1sleep s1f2 s2u
+starting permutation: t00 s1decl s1f1 t10 s2u s1f2 t20 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
 step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
 step s1f2: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step t20: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1decl s1f1 s1sleep s2u s1f2
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s1f1: FETCH FIRST FROM cursor1;
-c              
+               
+step s1f3: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+test_sto_reset_all_state
 
-1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+               
 
-0                             
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+starting permutation: t00 s1decl s1f1 t10 s1f2 t20 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1decl s1f1 s2u s1sleep s1f2
+               
 step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
 step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
 step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
-
-starting permutation: s1decl s2u s1f1 s1sleep s1f2
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
-
-0                             
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+step t20: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s2u s1decl s1f1 s1sleep s1f2
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s1f1: FETCH FIRST FROM cursor1;
+               
+step s1f3: FETCH FIRST FROM cursor1;
 c              
 
-2              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+1              
+test_sto_reset_all_state
 
-0                             
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+               
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out b/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
index bf94054790..091c212adc 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
@@ -1,15 +1,31 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: noseq s1f1 s2sleep s2u s1f2
+starting permutation: t00 noseq s1f1 t10 s2u s2v1 s1f2 t22 s2v2 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step noseq: SET enable_seqscan = false;
 step s1f1: SELECT c FROM sto1 where c = 1000;
 c              
 
 1000           
-step s2sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
 step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1000;
+step s2v1: VACUUM sto1;
 step s1f2: SELECT c FROM sto1 where c = 1001;
+c              
+
+step t22: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:22:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2v2: VACUUM sto1;
+step s1f3: SELECT c FROM sto1 where c = 1001;
 ERROR:  snapshot too old
+test_sto_reset_all_state
+
+               
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_select.out b/src/test/modules/snapshot_too_old/expected/sto_using_select.out
index eb15bc23bf..201c106754 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_select.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_select.out
@@ -1,55 +1,164 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: s1f1 s1sleep s1f2 s2u
+starting permutation: t00 s2vis1 s1f1 t01 s2u s2vis2 s1f2 t11 s2vac1 s2vis3 s1f3 t12 s1f4 s2vac2 s2vis4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vis1: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s2vis2: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
 
-0                             
+f              
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step t11: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:11:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1f1 s1sleep s2u s1f2
-step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+               
+step s2vac1: VACUUM sto1;
+step s2vis3: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+f              
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t12: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+               
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 ERROR:  snapshot too old
+step s2vac2: VACUUM sto1;
+step s2vis4: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+test_sto_reset_all_state
+
+               
+
+starting permutation: t00 s2vis1 s1f1 t01 s2u s2vis2 s1f2 t11 s2vac1 s2vis3 s1f3 t12 s2vac2 s2vis4 s1f4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1f1 s2u s1sleep s1f2
+               
+step s2vis1: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step s2vis2: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
 
-0                             
+f              
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step t11: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:11:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac1: VACUUM sto1;
+step s2vis3: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+f              
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step t12: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac2: VACUUM sto1;
+step s2vis4: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 ERROR:  snapshot too old
+test_sto_reset_all_state
 
-starting permutation: s2u s1f1 s1sleep s1f2
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+               
+
+starting permutation: t00 s2vis1 s1f1 t01 s2vis2 s1f2 t11 s2vac1 s2vis3 s1f3 t12 s2vac2 s2vis4 s1f4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vis1: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
-2              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+1              
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
+step s2vis2: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
-ERROR:  snapshot too old
+c              
+
+1              
+step t11: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:11:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac1: VACUUM sto1;
+step s2vis3: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step t12: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac2: VACUUM sto1;
+step s2vis4: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+test_sto_reset_all_state
+
+               
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec b/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
index eac18ca5b9..3be084b8fe 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
@@ -1,19 +1,14 @@
 # This test provokes a "snapshot too old" error using a cursor.
 #
-# The sleep is needed because with a threshold of zero a statement could error
-# on changes it made.  With more normal settings no external delay is needed,
-# but we don't want these tests to run long enough to see that, since
-# granularity is in minutes.
-#
-# Since results depend on the value of old_snapshot_threshold, sneak that into
-# the line generated by the sleep, so that a surprising values isn't so hard
-# to identify.
+# Expects old_snapshot_threshold = 10.  Not suitable for installcheck since it
+# messes with internal snapmgr.c state.
 
 setup
 {
+    CREATE EXTENSION IF NOT EXISTS test_sto;
+    SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
-    CREATE TABLE sto2 (c int NOT NULL);
 }
 setup
 {
@@ -22,16 +17,29 @@ setup
 
 teardown
 {
-    DROP TABLE sto1, sto2;
+    DROP TABLE sto1;
+    SELECT test_sto_reset_all_state();
 }
 
 session "s1"
 setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "s1decl"	{ DECLARE cursor1 CURSOR FOR SELECT c FROM sto1; }
 step "s1f1"		{ FETCH FIRST FROM cursor1; }
-step "s1sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s1f2"		{ FETCH FIRST FROM cursor1; }
+step "s1f3"		{ FETCH FIRST FROM cursor1; }
 teardown		{ COMMIT; }
 
 session "s2"
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1; }
+
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t10"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z'); }
+step "t20"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z'); }
+
+# if there's an update, we get a snapshot too old error at time 00:20 (not before,
+# because we need page pruning to see the xmin level change from 10 minutes earlier)
+permutation "t00" "s1decl" "s1f1" "t10" "s2u" "s1f2" "t20" "s1f3"
+
+# if there's no update, no snapshot too old error at time 00:20
+permutation "t00" "s1decl" "s1f1" "t10"       "s1f2" "t20" "s1f3"
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec b/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
index 33d91ff852..f90bca3b7a 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
@@ -1,8 +1,12 @@
 # This test is like sto_using_select, except that we test access via a
-# hash index.
+# hash index.  Explicit vacuuming is required in this version because
+# there is are no incidental calls to heap_page_prune_opt() that can
+# call SetOldSnapshotThresholdTimestamp().
 
 setup
 {
+	CREATE EXTENSION IF NOT EXISTS test_sto;
+	SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
     CREATE INDEX idx_sto1 ON sto1 USING HASH (c);
@@ -15,6 +19,7 @@ setup
 teardown
 {
     DROP TABLE sto1;
+	SELECT test_sto_reset_all_state();
 }
 
 session "s1"
@@ -22,10 +27,18 @@ setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "noseq"	{ SET enable_seqscan = false; }
 step "s1f1"		{ SELECT c FROM sto1 where c = 1000; }
 step "s1f2"		{ SELECT c FROM sto1 where c = 1001; }
+step "s1f3"		{ SELECT c FROM sto1 where c = 1001; }
 teardown		{ ROLLBACK; }
 
 session "s2"
-step "s2sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1000; }
+step "s2v1"		{ VACUUM sto1; }
+step "s2v2"		{ VACUUM sto1; }
 
-permutation "noseq" "s1f1" "s2sleep" "s2u" "s1f2"
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t10"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z'); }
+step "t22"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:22:00Z'); }
+
+# snapshot too old at t22
+permutation "t00" "noseq" "s1f1" "t10" "s2u" "s2v1" "s1f2" "t22" "s2v2" "s1f3"
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_select.spec b/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
index d7c34f3d89..cd7a97b742 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
@@ -1,19 +1,15 @@
 # This test provokes a "snapshot too old" error using SELECT statements.
 #
-# The sleep is needed because with a threshold of zero a statement could error
-# on changes it made.  With more normal settings no external delay is needed,
-# but we don't want these tests to run long enough to see that, since
-# granularity is in minutes.
-#
-# Since results depend on the value of old_snapshot_threshold, sneak that into
-# the line generated by the sleep, so that a surprising values isn't so hard
-# to identify.
+# Expects old_snapshot_threshold = 10.  Not suitable for installcheck since it
+# messes with internal snapmgr.c state.
 
 setup
 {
+	CREATE EXTENSION IF NOT EXISTS test_sto;
+	CREATE EXTENSION IF NOT EXISTS pg_visibility;
+	SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
-    CREATE TABLE sto2 (c int NOT NULL);
 }
 setup
 {
@@ -22,15 +18,47 @@ setup
 
 teardown
 {
-    DROP TABLE sto1, sto2;
+	DROP TABLE sto1;
+    SELECT test_sto_reset_all_state();
 }
 
 session "s1"
 setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "s1f1"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
-step "s1sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s1f2"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
+step "s1f3"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
+step "s1f4"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
 teardown		{ COMMIT; }
 
 session "s2"
+step "s2vis1"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1; }
+step "s2vis2"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
+step "s2vac1"	{ VACUUM sto1; }
+step "s2vis3"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
+step "s2vac2"	{ VACUUM sto1; }
+step "s2vis4"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
+
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t01"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z'); }
+step "t11"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:11:00Z'); }
+step "t12"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z'); }
+
+# If there's an update, we get a snapshot too old error at time 00:12, and
+# VACUUM is allowed to remove the tuple our snapshot could see, which we know
+# because we see that the relation becomes all visible.  The earlier VACUUMs
+# were unable to remove the tuple we could see, which is is obvious because we
+# can see the row with value 1, and from the relation not being all visible
+# after the VACUUM.
+permutation "t00" "s2vis1" "s1f1" "t01" "s2u" "s2vis2" "s1f2" "t11" "s2vac1" "s2vis3" "s1f3" "t12" "s1f4" "s2vac2" "s2vis4"
+
+# Almost the same schedule, but this time we'll put s2vac2 and s2vis4 before
+# s1f4 just to demonstrate that the early pruning is allowed before the error
+# aborts s1's transaction.
+permutation "t00" "s2vis1" "s1f1" "t01" "s2u" "s2vis2" "s1f2" "t11" "s2vac1" "s2vis3" "s1f3" "t12" "s2vac2" "s2vis4" "s1f4"
+
+# If we run the same schedule as above but without the update, we get no
+# snapshot too old error (even though our snapshot is older than the
+# threshold), and the relation remains all visible.
+permutation "t00" "s2vis1" "s1f1" "t01"       "s2vis2" "s1f2" "t11" "s2vac1" "s2vis3" "s1f3" "t12" "s2vac2" "s2vis4" "s1f4"
diff --git a/src/test/modules/snapshot_too_old/sto.conf b/src/test/modules/snapshot_too_old/sto.conf
index 7eeaeeb0dc..5ed46b3560 100644
--- a/src/test/modules/snapshot_too_old/sto.conf
+++ b/src/test/modules/snapshot_too_old/sto.conf
@@ -1,2 +1,2 @@
 autovacuum = off
-old_snapshot_threshold = 0
+old_snapshot_threshold = 10
diff --git a/src/test/modules/snapshot_too_old/test_sto--1.0.sql b/src/test/modules/snapshot_too_old/test_sto--1.0.sql
new file mode 100644
index 0000000000..c10afcf23a
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto--1.0.sql
@@ -0,0 +1,14 @@
+/* src/test/modules/snapshot_too_old/test_sto--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_sto" to load this file. \quit
+
+CREATE FUNCTION test_sto_clobber_snapshot_timestamp(now timestamptz)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'test_sto_clobber_snapshot_timestamp'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_sto_reset_all_state()
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'test_sto_reset_all_state'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/snapshot_too_old/test_sto.c b/src/test/modules/snapshot_too_old/test_sto.c
new file mode 100644
index 0000000000..f6c9a1a000
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto.c
@@ -0,0 +1,74 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_sto.c
+ *	  Functions to support isolation tests for snapshot too old.
+ *
+ * These functions are not intended for use in a production database and
+ * could cause corruption.
+ *
+ * Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *	  src/test/modules/snapshot_too_old/test_sto.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "storage/lwlock.h"
+#include "utils/old_snapshot.h"
+#include "utils/snapmgr.h"
+#include "utils/timestamp.h"
+
+PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(test_sto_reset_all_state);
+PG_FUNCTION_INFO_V1(test_sto_clobber_snapshot_timestamp);
+
+/*
+ * Revert to initial state.  This is not safe except in carefully
+ * controlled tests.
+ */
+Datum
+test_sto_reset_all_state(PG_FUNCTION_ARGS)
+{
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+	oldSnapshotControl->count_used = 0;
+	oldSnapshotControl->current_timestamp = 0;
+	oldSnapshotControl->head_offset = 0;
+	oldSnapshotControl->head_timestamp = 0;
+	LWLockRelease(OldSnapshotTimeMapLock);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_latest_xmin);
+	oldSnapshotControl->latest_xmin = InvalidTransactionId;
+	oldSnapshotControl->next_map_update = 0;
+	SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_current);
+	oldSnapshotControl->current_timestamp = 0;
+	SpinLockRelease(&oldSnapshotControl->mutex_current);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
+	oldSnapshotControl->threshold_timestamp = 0;
+	oldSnapshotControl->threshold_xid = InvalidTransactionId;
+	SpinLockRelease(&oldSnapshotControl->mutex_threshold);
+
+	PG_RETURN_NULL();
+}
+
+/*
+ * Update the minimum time used in snapshot-too-old code.  If set ahead of the
+ * current wall clock time (for example, the year 3000), this allows testing
+ * with arbitrary times.  This is not safe except in carefully controlled
+ * tests.
+ */
+Datum
+test_sto_clobber_snapshot_timestamp(PG_FUNCTION_ARGS)
+{
+	TimestampTz new_current_timestamp = PG_GETARG_TIMESTAMPTZ(0);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_current);
+	oldSnapshotControl->current_timestamp = new_current_timestamp;
+	SpinLockRelease(&oldSnapshotControl->mutex_current);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/snapshot_too_old/test_sto.control b/src/test/modules/snapshot_too_old/test_sto.control
new file mode 100644
index 0000000000..e497e5450e
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto.control
@@ -0,0 +1,5 @@
+# test_sto test module
+comment = 'functions for internal testing of snapshot too old errors'
+default_version = '1.0'
+module_pathname = '$libdir/test_sto'
+relocatable = true
-- 
2.20.1

v5-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchtext/x-patch; charset=US-ASCII; name=v5-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchDownload

From 49f29afa0f7ee3da13326fb42cc77d00c15160d7 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 20 Apr 2020 17:05:42 +1200
Subject: [PATCH v5 5/6] Truncate snapshot-too-old time map when CLOG is
 truncated.

It's not safe to leave xids in the map that have wrapped around,
although it's probably very hard to actually reach that state.

Reported-by: Andres Freund
---
 src/backend/commands/vacuum.c                 |   3 +
 src/backend/utils/time/snapmgr.c              |  21 ++++
 src/include/utils/snapmgr.h                   |   1 +
 src/test/modules/snapshot_too_old/Makefile    |   4 +-
 .../snapshot_too_old/t/001_truncate.pl        | 100 ++++++++++++++++++
 5 files changed, 128 insertions(+), 1 deletion(-)
 create mode 100644 src/test/modules/snapshot_too_old/t/001_truncate.pl

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5a110edb07..37ead45fa5 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1627,6 +1627,9 @@ vac_truncate_clog(TransactionId frozenXID,
 	 */
 	AdvanceOldestCommitTsXid(frozenXID);
 
+	/* Make sure snapshot_too_old drops old XIDs. */
+	TruncateOldSnapshotTimeMapping(frozenXID);
+
 	/*
 	 * Truncate CLOG, multixact and CommitTs to the oldest computed value.
 	 */
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 19e6c52b80..edb47c9664 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1974,6 +1974,27 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 }
 
 
+/*
+ * Remove old xids from the timing map, so the CLOG can be truncated.
+ */
+void
+TruncateOldSnapshotTimeMapping(TransactionId frozenXID)
+{
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+	while (oldSnapshotControl->count_used > 0 &&
+		   TransactionIdPrecedes(oldSnapshotControl->xid_by_minute[oldSnapshotControl->head_offset],
+								 frozenXID))
+	{
+		oldSnapshotControl->head_timestamp += USECS_PER_MINUTE;
+		oldSnapshotControl->head_offset =
+			(oldSnapshotControl->head_offset + 1) %
+			OLD_SNAPSHOT_TIME_MAP_ENTRIES;
+		oldSnapshotControl->count_used--;
+	}
+	LWLockRelease(OldSnapshotTimeMapLock);
+}
+
+
 /*
  * Setup a snapshot that replaces normal catalog snapshots that allows catalog
  * access to behave just like it did at a certain point in the past.
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index b28d13ce84..4f53aad956 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -135,6 +135,7 @@ extern TransactionId TransactionIdLimitedForOldSnapshots(TransactionId recentXmi
 														 Relation relation);
 extern void MaintainOldSnapshotTimeMapping(TimestampTz whenTaken,
 										   TransactionId xmin);
+extern void TruncateOldSnapshotTimeMapping(TransactionId frozenXID);
 
 extern char *ExportSnapshot(Snapshot snapshot);
 
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index 81836e9953..f919944984 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -9,7 +9,7 @@ EXTENSION = test_sto
 DATA = test_sto--1.0.sql
 PGDESCFILE = "test_sto -- internal testing for snapshot too old errors"
 
-EXTRA_INSTALL = contrib/pg_visibility
+EXTRA_INSTALL = contrib/pg_visibility contrib/old_snapshot
 
 ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index
 ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf
@@ -19,6 +19,8 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/s
 # because it'd be dangerous on a production system.
 NO_INSTALLCHECK = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/src/test/modules/snapshot_too_old/t/001_truncate.pl b/src/test/modules/snapshot_too_old/t/001_truncate.pl
new file mode 100644
index 0000000000..afcca232f2
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/t/001_truncate.pl
@@ -0,0 +1,100 @@
+# Test truncation of the old snapshot time mapping, to check
+# that we can't get into trouble when xids wrap around.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('master');
+$node->init;
+$node->append_conf("postgresql.conf", "timezone = UTC");
+$node->append_conf("postgresql.conf", "old_snapshot_threshold=10");
+$node->append_conf("postgresql.conf", "max_prepared_transactions=10");
+$node->append_conf("postgresql.conf", "autovacuum=off");
+$node->start;
+$node->psql('postgres', 'update pg_database set datallowconn = true');
+$node->psql('postgres', 'create extension old_snapshot');
+$node->psql('postgres', 'create extension test_sto');
+
+note "check time map is truncated when CLOG is";
+
+sub set_time
+{
+	my $time = shift;
+	$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('$time')");
+}
+
+sub advance_xid
+{
+	my $time = shift;
+	$node->psql('postgres', "select pg_current_xact_id()");
+}
+
+sub summarize_mapping
+{
+	my $out;
+	$node->psql('postgres',
+				"select count(*),
+						to_char(min(end_timestamp), 'HH24:MI:SS'),
+						to_char(max(end_timestamp), 'HH24:MI:SS')
+						from pg_old_snapshot_time_mapping()",
+				stdout => \$out);
+	return $out;
+}
+
+sub vacuum_freeze_all
+{
+	$node->psql('postgres', 'vacuum freeze');
+	$node->psql('template0', 'vacuum freeze');
+	$node->psql('template1', 'vacuum freeze');
+}
+
+# build up a time map with 4 entries
+set_time('3000-01-01 00:00:00Z');
+advance_xid();
+set_time('3000-01-01 00:01:00Z');
+advance_xid();
+set_time('3000-01-01 00:02:00Z');
+advance_xid();
+set_time('3000-01-01 00:03:00Z');
+advance_xid();
+is(summarize_mapping(), "4|00:00:00|00:03:00");
+
+# now cause frozen XID to advance
+vacuum_freeze_all();
+
+# we expect all XIDs to have been truncated
+is(summarize_mapping(), "0||");
+
+# put two more in the map
+set_time('3000-01-01 00:04:00Z');
+advance_xid();
+set_time('3000-01-01 00:05:00Z');
+advance_xid();
+is(summarize_mapping(), "2|00:04:00|00:05:00");
+
+# prepare a transaction, to stop xmin from getting further ahead
+$node->psql('postgres', "begin; select pg_current_xact_id(); prepare transaction 'tx1'");
+
+# add 16 more minutes; we should now have 18
+set_time('3000-01-01 00:21:00Z');
+advance_xid();
+is(summarize_mapping(), "18|00:04:00|00:21:00");
+
+# now cause frozen XID to advance
+vacuum_freeze_all();
+
+# this should leave just 16, because 2 were truncated
+is(summarize_mapping(), "16|00:06:00|00:21:00");
+
+# commit tx1, and then freeze again to get rid of all of them
+$node->psql('postgres', "commit prepared 'tx1'");
+
+# now cause frozen XID to advance
+vacuum_freeze_all();
+
+# we should now be back to empty
+is(summarize_mapping(), "0||");
+
+$node->stop;
-- 
2.20.1

v5-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patchtext/x-patch; charset=US-ASCII; name=v5-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patchDownload

From 6409467a8c3a2b7507349850906be85abaacbaac Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 21 Apr 2020 20:48:20 +1200
Subject: [PATCH v5 6/6] Add TAP test for snapshot too old time map
 maintenance.

---
 .../t/002_xid_map_maintenance.pl              | 63 +++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100644 src/test/modules/snapshot_too_old/t/002_xid_map_maintenance.pl

diff --git a/src/test/modules/snapshot_too_old/t/002_xid_map_maintenance.pl b/src/test/modules/snapshot_too_old/t/002_xid_map_maintenance.pl
new file mode 100644
index 0000000000..eddd0ce5ae
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/t/002_xid_map_maintenance.pl
@@ -0,0 +1,63 @@
+# Test xid various time/xid map maintenance edge cases
+# that were historically buggy.
+
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 8;
+
+my $node = get_new_node('master');
+$node->init;
+$node->append_conf("postgresql.conf", "timezone = UTC");
+$node->append_conf("postgresql.conf", "old_snapshot_threshold=10");
+$node->append_conf("postgresql.conf", "autovacuum = off");
+$node->start;
+$node->psql('postgres', 'create extension test_sto');
+$node->psql('postgres', 'create extension old_snapshot');
+
+sub set_time
+{
+	my $time = shift;
+	$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('$time')");
+}
+
+sub summarize_mapping
+{
+	my $out;
+	$node->psql('postgres',
+				"select count(*),
+						to_char(min(end_timestamp), 'HH24:MI:SS'),
+						to_char(max(end_timestamp), 'HH24:MI:SS')
+						from pg_old_snapshot_time_mapping()",
+				stdout => \$out);
+	return $out;
+}
+
+# fill the map up to maximum capacity
+set_time('3000-01-01 00:00:00Z');
+set_time('3000-01-01 00:19:00Z');
+is(summarize_mapping(), "20|00:00:00|00:19:00");
+
+# make a jump larger than capacity; the mapping is blown away,
+# and our new minute is now the only one
+set_time('3000-01-01 02:00:00Z');
+is(summarize_mapping(), "1|02:00:00|02:00:00");
+
+# test adding minutes while the map is not full
+set_time('3000-01-01 02:01:00Z');
+is(summarize_mapping(), "2|02:00:00|02:01:00");
+set_time('3000-01-01 02:05:00Z');
+is(summarize_mapping(), "6|02:00:00|02:05:00");
+set_time('3000-01-01 02:19:00Z');
+is(summarize_mapping(), "20|02:00:00|02:19:00");
+
+# test adding minutes while the map is full
+set_time('3000-01-01 02:20:00Z');
+is(summarize_mapping(), "20|02:01:00|02:20:00");
+set_time('3000-01-01 02:22:00Z');
+is(summarize_mapping(), "20|02:03:00|02:22:00");
+set_time('3000-01-01 02:22:01Z'); # one second past
+is(summarize_mapping(), "20|02:04:00|02:23:00");
+
+$node->stop;
-- 
2.20.1

#18

dilipbalaut@gmail.com

over 5 years ago

In reply to: Thomas Munro (#17)

Re: fixing old_snapshot_threshold's time->xid mapping

On Tue, Apr 21, 2020 at 3:44 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Tue, Apr 21, 2020 at 2:05 PM Thomas Munro <thomas.munro@gmail.com> wrote:

As before, these two apply on top of Robert's patches (or at least his
0001 and 0002).

While trying to figure out if Robert's 0003 patch was correct, I added
yet another patch to this stack to test it. 0006 does basic xid map
maintenance that exercises the cases 0003 fixes, and I think it
demonstrates that they now work correctly.

+1,  I think we should also add a way to test the case, where we
advance the timestamp by multiple slots.  I see that you have such
case
e.g
+# test adding minutes while the map is not full
+set_time('3000-01-01 02:01:00Z');
+is(summarize_mapping(), "2|02:00:00|02:01:00");
+set_time('3000-01-01 02:05:00Z');
+is(summarize_mapping(), "6|02:00:00|02:05:00");
+set_time('3000-01-01 02:19:00Z');
+is(summarize_mapping(), "20|02:00:00|02:19:00");

But, I think we should try to extend it to test that we have put the
new xid only in those slots where we suppose to and not in other
slots?.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#19

dilipbalaut@gmail.com

over 5 years ago

In reply to: Dilip Kumar (#18)

1 attachment(s)

Re: fixing old_snapshot_threshold's time->xid mapping

On Tue, Apr 21, 2020 at 4:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Apr 21, 2020 at 3:44 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Tue, Apr 21, 2020 at 2:05 PM Thomas Munro <thomas.munro@gmail.com> wrote:

As before, these two apply on top of Robert's patches (or at least his
0001 and 0002).

While trying to figure out if Robert's 0003 patch was correct, I added
yet another patch to this stack to test it. 0006 does basic xid map
maintenance that exercises the cases 0003 fixes, and I think it
demonstrates that they now work correctly.
+1,  I think we should also add a way to test the case, where we
advance the timestamp by multiple slots.  I see that you have such
case
e.g
+# test adding minutes while the map is not full
+set_time('3000-01-01 02:01:00Z');
+is(summarize_mapping(), "2|02:00:00|02:01:00");
+set_time('3000-01-01 02:05:00Z');
+is(summarize_mapping(), "6|02:00:00|02:05:00");
+set_time('3000-01-01 02:19:00Z');
+is(summarize_mapping(), "20|02:00:00|02:19:00");
But, I think we should try to extend it to test that we have put the
new xid only in those slots where we suppose to and not in other
slots?.

I feel that we should. probably fix this check as well? Because if ts

update_ts then it will go to else part then there it will finally

end up in the last slot only so I think we can use this case also as
fast exit.

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 93a0c04..644d9b1 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1831,7 +1831,7 @@
TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,

                if (!same_ts_as_threshold)
                {
-                       if (ts == update_ts)
+                       if (ts >= update_ts)
                        {
                                xlimit = latest_xmin;
                                if (NormalTransactionIdFollows(xlimit,
recentXmin))

This patch can be applied on top of other v5 patches.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v5-0007-Fix-check-while-computing-transaction-xid-limit.patchapplication/octet-stream; name=v5-0007-Fix-check-while-computing-transaction-xid-limit.patchDownload

From 4b383edd0046efaee97c9a09d8dcd27dc0ad3734 Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilipkumar@localhost.localdomain>
Date: Wed, 22 Apr 2020 11:05:30 +0530
Subject: [PATCH v5 7/7] Fix check while computing transaction xid limit

---
 src/backend/utils/time/snapmgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index dd934ba..296ae8f 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1760,7 +1760,7 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
 
 		if (!same_ts_as_threshold)
 		{
-			if (ts == update_ts)
+			if (ts >= update_ts)
 			{
 				xlimit = latest_xmin;
 				if (NormalTransactionIdFollows(xlimit, recentXmin))
-- 
1.8.3.1

#20

thomas.munro@gmail.com

over 5 years ago

In reply to: Dilip Kumar (#19)

6 attachment(s)

Re: fixing old_snapshot_threshold's time->xid mapping

On Wed, Apr 22, 2020 at 5:39 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

-                       if (ts == update_ts)
+                       if (ts >= update_ts)

Hi Dilip, I didn't follow this bit -- could you explain?

Here's a rebase. In the 0004 patch I chose to leave behind some
unnecessary braces to avoid reindenting a bunch of code after removing
an if branch, just for ease of review, but I'd probably remove those
in a committed version. I'm going to add this stuff to the next CF so
we don't lose track of it.

Attachments:

v6-0001-Expose-oldSnapshotControl.patchtext/x-patch; charset=US-ASCII; name=v6-0001-Expose-oldSnapshotControl.patchDownload

From 628eb4cfb7a7d67125c5fd9ebc1ecd69b3d9cb82 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 09:37:31 -0400
Subject: [PATCH v6 1/6] Expose oldSnapshotControl.

---
 src/backend/utils/time/snapmgr.c | 55 +----------------------
 src/include/utils/old_snapshot.h | 75 ++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+), 53 deletions(-)
 create mode 100644 src/include/utils/old_snapshot.h

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 604d823f68..9dc19b2f58 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -63,6 +63,7 @@
 #include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/old_snapshot.h"
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/snapmgr.h"
@@ -74,59 +75,7 @@
  */
 int			old_snapshot_threshold; /* number of minutes, -1 disables */
 
-/*
- * Structure for dealing with old_snapshot_threshold implementation.
- */
-typedef struct OldSnapshotControlData
-{
-	/*
-	 * Variables for old snapshot handling are shared among processes and are
-	 * only allowed to move forward.
-	 */
-	slock_t		mutex_current;	/* protect current_timestamp */
-	TimestampTz current_timestamp;	/* latest snapshot timestamp */
-	slock_t		mutex_latest_xmin;	/* protect latest_xmin and next_map_update */
-	TransactionId latest_xmin;	/* latest snapshot xmin */
-	TimestampTz next_map_update;	/* latest snapshot valid up to */
-	slock_t		mutex_threshold;	/* protect threshold fields */
-	TimestampTz threshold_timestamp;	/* earlier snapshot is old */
-	TransactionId threshold_xid;	/* earlier xid may be gone */
-
-	/*
-	 * Keep one xid per minute for old snapshot error handling.
-	 *
-	 * Use a circular buffer with a head offset, a count of entries currently
-	 * used, and a timestamp corresponding to the xid at the head offset.  A
-	 * count_used value of zero means that there are no times stored; a
-	 * count_used value of OLD_SNAPSHOT_TIME_MAP_ENTRIES means that the buffer
-	 * is full and the head must be advanced to add new entries.  Use
-	 * timestamps aligned to minute boundaries, since that seems less
-	 * surprising than aligning based on the first usage timestamp.  The
-	 * latest bucket is effectively stored within latest_xmin.  The circular
-	 * buffer is updated when we get a new xmin value that doesn't fall into
-	 * the same interval.
-	 *
-	 * It is OK if the xid for a given time slot is from earlier than
-	 * calculated by adding the number of minutes corresponding to the
-	 * (possibly wrapped) distance from the head offset to the time of the
-	 * head entry, since that just results in the vacuuming of old tuples
-	 * being slightly less aggressive.  It would not be OK for it to be off in
-	 * the other direction, since it might result in vacuuming tuples that are
-	 * still expected to be there.
-	 *
-	 * Use of an SLRU was considered but not chosen because it is more
-	 * heavyweight than is needed for this, and would probably not be any less
-	 * code to implement.
-	 *
-	 * Persistence is not needed.
-	 */
-	int			head_offset;	/* subscript of oldest tracked time */
-	TimestampTz head_timestamp; /* time corresponding to head xid */
-	int			count_used;		/* how many slots are in use */
-	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
-} OldSnapshotControlData;
-
-static volatile OldSnapshotControlData *oldSnapshotControl;
+volatile OldSnapshotControlData *oldSnapshotControl;
 
 
 /*
diff --git a/src/include/utils/old_snapshot.h b/src/include/utils/old_snapshot.h
new file mode 100644
index 0000000000..284af7d508
--- /dev/null
+++ b/src/include/utils/old_snapshot.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * old_snapshot.h
+ *		Data structures for 'snapshot too old'
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/old_snapshot.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef OLD_SNAPSHOT_H
+#define OLD_SNAPSHOT_H
+
+#include "datatype/timestamp.h"
+#include "storage/s_lock.h"
+
+/*
+ * Structure for dealing with old_snapshot_threshold implementation.
+ */
+typedef struct OldSnapshotControlData
+{
+	/*
+	 * Variables for old snapshot handling are shared among processes and are
+	 * only allowed to move forward.
+	 */
+	slock_t		mutex_current;	/* protect current_timestamp */
+	TimestampTz current_timestamp;	/* latest snapshot timestamp */
+	slock_t		mutex_latest_xmin;	/* protect latest_xmin and next_map_update */
+	TransactionId latest_xmin;	/* latest snapshot xmin */
+	TimestampTz next_map_update;	/* latest snapshot valid up to */
+	slock_t		mutex_threshold;	/* protect threshold fields */
+	TimestampTz threshold_timestamp;	/* earlier snapshot is old */
+	TransactionId threshold_xid;	/* earlier xid may be gone */
+
+	/*
+	 * Keep one xid per minute for old snapshot error handling.
+	 *
+	 * Use a circular buffer with a head offset, a count of entries currently
+	 * used, and a timestamp corresponding to the xid at the head offset.  A
+	 * count_used value of zero means that there are no times stored; a
+	 * count_used value of OLD_SNAPSHOT_TIME_MAP_ENTRIES means that the buffer
+	 * is full and the head must be advanced to add new entries.  Use
+	 * timestamps aligned to minute boundaries, since that seems less
+	 * surprising than aligning based on the first usage timestamp.  The
+	 * latest bucket is effectively stored within latest_xmin.  The circular
+	 * buffer is updated when we get a new xmin value that doesn't fall into
+	 * the same interval.
+	 *
+	 * It is OK if the xid for a given time slot is from earlier than
+	 * calculated by adding the number of minutes corresponding to the
+	 * (possibly wrapped) distance from the head offset to the time of the
+	 * head entry, since that just results in the vacuuming of old tuples
+	 * being slightly less aggressive.  It would not be OK for it to be off in
+	 * the other direction, since it might result in vacuuming tuples that are
+	 * still expected to be there.
+	 *
+	 * Use of an SLRU was considered but not chosen because it is more
+	 * heavyweight than is needed for this, and would probably not be any less
+	 * code to implement.
+	 *
+	 * Persistence is not needed.
+	 */
+	int			head_offset;	/* subscript of oldest tracked time */
+	TimestampTz head_timestamp; /* time corresponding to head xid */
+	int			count_used;		/* how many slots are in use */
+	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
+} OldSnapshotControlData;
+
+extern volatile OldSnapshotControlData *oldSnapshotControl;
+
+#endif
-- 
2.20.1

v6-0002-contrib-old_snapshot-time-xid-mapping.patchtext/x-patch; charset=US-ASCII; name=v6-0002-contrib-old_snapshot-time-xid-mapping.patchDownload

From d19633f2b3c284300ff64b4a97dbb40ac6292f5f Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 12:14:32 -0400
Subject: [PATCH v6 2/6] contrib/old_snapshot: time->xid mapping.

---
 contrib/Makefile                           |   1 +
 contrib/old_snapshot/Makefile              |  24 ++++
 contrib/old_snapshot/old_snapshot--1.0.sql |  14 ++
 contrib/old_snapshot/old_snapshot.control  |   5 +
 contrib/old_snapshot/time_mapping.c        | 159 +++++++++++++++++++++
 5 files changed, 203 insertions(+)
 create mode 100644 contrib/old_snapshot/Makefile
 create mode 100644 contrib/old_snapshot/old_snapshot--1.0.sql
 create mode 100644 contrib/old_snapshot/old_snapshot.control
 create mode 100644 contrib/old_snapshot/time_mapping.c

diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..452ade0782 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -27,6 +27,7 @@ SUBDIRS = \
 		lo		\
 		ltree		\
 		oid2name	\
+		old_snapshot	\
 		pageinspect	\
 		passwordcheck	\
 		pg_buffercache	\
diff --git a/contrib/old_snapshot/Makefile b/contrib/old_snapshot/Makefile
new file mode 100644
index 0000000000..091231f25f
--- /dev/null
+++ b/contrib/old_snapshot/Makefile
@@ -0,0 +1,24 @@
+# contrib/old_snapshot/Makefile
+
+MODULE_big = old_snapshot
+OBJS = \
+	$(WIN32RES) \
+	time_mapping.o
+PG_CPPFLAGS = -I$(libpq_srcdir)
+
+EXTENSION = old_snapshot
+DATA = old_snapshot--1.0.sql
+PGFILEDESC = "old_snapshot - utilities in support of old_snapshot_threshold"
+
+REGRESS = old_snapshot
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/old_snapshot
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/old_snapshot/old_snapshot--1.0.sql b/contrib/old_snapshot/old_snapshot--1.0.sql
new file mode 100644
index 0000000000..9ebb8829e3
--- /dev/null
+++ b/contrib/old_snapshot/old_snapshot--1.0.sql
@@ -0,0 +1,14 @@
+/* contrib/old_snapshot/old_snapshot--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION old_snapshot" to load this file. \quit
+
+-- Show visibility map and page-level visibility information for each block.
+CREATE FUNCTION pg_old_snapshot_time_mapping(array_offset OUT int4,
+											 end_timestamp OUT timestamptz,
+											 newest_xmin OUT xid)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_old_snapshot_time_mapping'
+LANGUAGE C STRICT;
+
+-- XXX. Do we want REVOKE commands here?
diff --git a/contrib/old_snapshot/old_snapshot.control b/contrib/old_snapshot/old_snapshot.control
new file mode 100644
index 0000000000..491eec536c
--- /dev/null
+++ b/contrib/old_snapshot/old_snapshot.control
@@ -0,0 +1,5 @@
+# old_snapshot extension
+comment = 'utilities in support of old_snapshot_threshold'
+default_version = '1.0'
+module_pathname = '$libdir/old_snapshot'
+relocatable = true
diff --git a/contrib/old_snapshot/time_mapping.c b/contrib/old_snapshot/time_mapping.c
new file mode 100644
index 0000000000..37e0055a00
--- /dev/null
+++ b/contrib/old_snapshot/time_mapping.c
@@ -0,0 +1,159 @@
+/*-------------------------------------------------------------------------
+ *
+ * time_mapping.c
+ *	  time to XID mapping information
+ *
+ * Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *	  contrib/old_snapshot/time_mapping.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "storage/lwlock.h"
+#include "utils/old_snapshot.h"
+#include "utils/snapmgr.h"
+#include "utils/timestamp.h"
+
+/*
+ * Backend-private copy of the information from oldSnapshotControl which relates
+ * to the time to XID mapping, plus an index so that we can iterate.
+ *
+ * Note that the length of the xid_by_minute array is given by
+ * OLD_SNAPSHOT_TIME_MAP_ENTRIES (which is not a compile-time constant).
+ */
+typedef struct
+{
+	int				current_index;
+	int				head_offset;
+	TimestampTz		head_timestamp;
+	int				count_used;
+	TransactionId	xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
+} OldSnapshotTimeMapping;
+
+#define NUM_TIME_MAPPING_COLUMNS 3
+
+PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(pg_old_snapshot_time_mapping);
+
+static OldSnapshotTimeMapping *GetOldSnapshotTimeMapping(void);
+static TupleDesc MakeOldSnapshotTimeMappingTupleDesc(void);
+static HeapTuple MakeOldSnapshotTimeMappingTuple(TupleDesc tupdesc,
+												 OldSnapshotTimeMapping *mapping);
+
+/*
+ * SQL-callable set-returning function.
+ */
+Datum
+pg_old_snapshot_time_mapping(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+	OldSnapshotTimeMapping *mapping;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		MemoryContext	oldcontext;
+
+		funcctx = SRF_FIRSTCALL_INIT();
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+		mapping = GetOldSnapshotTimeMapping();
+		funcctx->user_fctx = mapping;
+		funcctx->tuple_desc = MakeOldSnapshotTimeMappingTupleDesc();
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	mapping = (OldSnapshotTimeMapping *) funcctx->user_fctx;
+
+	while (mapping->current_index < mapping->count_used)
+	{
+		HeapTuple	tuple;
+
+		tuple = MakeOldSnapshotTimeMappingTuple(funcctx->tuple_desc, mapping);
+		++mapping->current_index;
+		SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Get the old snapshot time mapping data from shared memory.
+ */
+static OldSnapshotTimeMapping *
+GetOldSnapshotTimeMapping(void)
+{
+	OldSnapshotTimeMapping *mapping;
+
+	mapping = palloc(offsetof(OldSnapshotTimeMapping, xid_by_minute)
+					 + sizeof(TransactionId) * OLD_SNAPSHOT_TIME_MAP_ENTRIES);
+	mapping->current_index = 0;
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+	mapping->head_offset = oldSnapshotControl->head_offset;
+	mapping->head_timestamp = oldSnapshotControl->head_timestamp;
+	mapping->count_used = oldSnapshotControl->count_used;
+	for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
+		mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
+	LWLockRelease(OldSnapshotTimeMapLock);
+
+	return mapping;
+}
+
+/*
+ * Build a tuple descriptor for the pg_old_snapshot_time_mapping() SRF.
+ */
+static TupleDesc
+MakeOldSnapshotTimeMappingTupleDesc(void)
+{
+	TupleDesc	tupdesc;
+
+	tupdesc = CreateTemplateTupleDesc(NUM_TIME_MAPPING_COLUMNS);
+
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "array_offset",
+					   INT4OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "end_timestamp",
+					   TIMESTAMPTZOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "newest_xmin",
+					   XIDOID, -1, 0);
+
+	return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * Convert one entry from the old snapshot time mapping to a HeapTuple.
+ */
+static HeapTuple
+MakeOldSnapshotTimeMappingTuple(TupleDesc tupdesc, OldSnapshotTimeMapping *mapping)
+{
+	Datum	values[NUM_TIME_MAPPING_COLUMNS];
+	bool	nulls[NUM_TIME_MAPPING_COLUMNS];
+	int		array_position;
+	TimestampTz	timestamp;
+
+	/*
+	 * Figure out the array position corresponding to the current index.
+	 *
+	 * Index 0 means the oldest entry in the mapping, which is stored at
+	 * mapping->head_offset. Index 1 means the next-oldest entry, which is a the
+	 * following index, and so on. We wrap around when we reach the end of the array.
+	 */
+	array_position = (mapping->head_offset + mapping->current_index)
+		% OLD_SNAPSHOT_TIME_MAP_ENTRIES;
+
+	/*
+	 * No explicit timestamp is stored for any entry other than the oldest one,
+	 * but each entry corresponds to 1-minute period, so we can just add.
+	 */
+	timestamp = TimestampTzPlusMilliseconds(mapping->head_timestamp,
+											mapping->current_index * 60000);
+
+	/* Initialize nulls and values arrays. */
+	memset(nulls, 0, sizeof(nulls));
+	values[0] = Int32GetDatum(array_position);
+	values[1] = TimestampTzGetDatum(timestamp);
+	values[2] = TransactionIdGetDatum(mapping->xid_by_minute[array_position]);
+
+	return heap_form_tuple(tupdesc, values, nulls);
+}
-- 
2.20.1

v6-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchtext/x-patch; charset=US-ASCII; name=v6-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchDownload

From 62357a4cccfa0dbc7be7ae9e8f505e4c432f89f8 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 12:15:57 -0400
Subject: [PATCH v6 3/6] Fix bugs in MaintainOldSnapshotTimeMapping.

---
 src/backend/utils/time/snapmgr.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 9dc19b2f58..b41eadf905 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1944,10 +1944,32 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 	else
 	{
 		/* We need a new bucket, but it might not be the very next one. */
-		int			advance = ((ts - oldSnapshotControl->head_timestamp)
-							   / USECS_PER_MINUTE);
+		int			distance_to_new_tail;
+		int			distance_to_current_tail;
+		int			advance;
 
-		oldSnapshotControl->head_timestamp = ts;
+		/*
+		 * Our goal is for the new "tail" of the mapping, that is, the entry
+		 * which is newest and thus furthest from the "head" entry, to
+		 * correspond to "ts". Since there's one entry per minute, the
+		 * distance between the current head and the new tail is just the
+		 * number of minutes of difference between ts and the current
+		 * head_timestamp.
+		 *
+		 * The distance from the current head to the current tail is one
+		 * less than the number of entries in the mapping, because the
+		 * entry at the head_offset is for 0 minutes after head_timestamp.
+		 *
+		 * The difference between these two values is the number of minutes
+		 * by which we need to advance the mapping, either adding new entries
+		 * or rotating old ones out.
+		 */
+		distance_to_new_tail =
+			(ts - oldSnapshotControl->head_timestamp) / USECS_PER_MINUTE;
+		distance_to_current_tail =
+			oldSnapshotControl->count_used - 1;
+		advance = distance_to_new_tail - distance_to_current_tail;
+		Assert(advance > 0);
 
 		if (advance >= OLD_SNAPSHOT_TIME_MAP_ENTRIES)
 		{
@@ -1955,6 +1977,7 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 			oldSnapshotControl->head_offset = 0;
 			oldSnapshotControl->count_used = 1;
 			oldSnapshotControl->xid_by_minute[0] = xmin;
+			oldSnapshotControl->head_timestamp = ts;
 		}
 		else
 		{
@@ -1973,6 +1996,7 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 					else
 						oldSnapshotControl->head_offset = old_head + 1;
 					oldSnapshotControl->xid_by_minute[old_head] = xmin;
+					oldSnapshotControl->head_timestamp += USECS_PER_MINUTE;
 				}
 				else
 				{
-- 
2.20.1

v6-0004-Rewrite-the-snapshot_too_old-tests.patchtext/x-patch; charset=US-ASCII; name=v6-0004-Rewrite-the-snapshot_too_old-tests.patchDownload

From 9cd7890998b94df1c2f1ab108a53028e1edac480 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 20 Apr 2020 16:23:02 +1200
Subject: [PATCH v6 4/6] Rewrite the "snapshot_too_old" tests.

Previously the snapshot too old feature used a special test value for
old_snapshot_threshold.  Instead, use a new approach based on explicitly
advancing the "current" timestamp used in snapshot-too-old book keeping,
so that we can control the timeline precisely without having to resort
to sleeping or special test branches in the code.

Also check that early pruning actually happens, by vacuuming and
inspecting the visibility map at key points in the test schedule.
---
 src/backend/utils/time/snapmgr.c              |  21 ---
 src/test/modules/snapshot_too_old/Makefile    |  23 +--
 .../expected/sto_using_cursor.out             |  75 ++++-----
 .../expected/sto_using_hash_index.out         |  26 ++-
 .../expected/sto_using_select.out             | 157 +++++++++++++++---
 .../specs/sto_using_cursor.spec               |  30 ++--
 .../specs/sto_using_hash_index.spec           |  19 ++-
 .../specs/sto_using_select.spec               |  50 ++++--
 src/test/modules/snapshot_too_old/sto.conf    |   2 +-
 .../snapshot_too_old/test_sto--1.0.sql        |  14 ++
 src/test/modules/snapshot_too_old/test_sto.c  |  74 +++++++++
 .../modules/snapshot_too_old/test_sto.control |   5 +
 12 files changed, 366 insertions(+), 130 deletions(-)
 create mode 100644 src/test/modules/snapshot_too_old/test_sto--1.0.sql
 create mode 100644 src/test/modules/snapshot_too_old/test_sto.c
 create mode 100644 src/test/modules/snapshot_too_old/test_sto.control

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index b41eadf905..a94465235d 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1769,23 +1769,6 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
 	next_map_update_ts = oldSnapshotControl->next_map_update;
 	SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
 
-	/*
-	 * Zero threshold always overrides to latest xmin, if valid.  Without some
-	 * heuristic it will find its own snapshot too old on, for example, a
-	 * simple UPDATE -- which would make it useless for most testing, but
-	 * there is no principled way to ensure that it doesn't fail in this way.
-	 * Use a five-second delay to try to get useful testing behavior, but this
-	 * may need adjustment.
-	 */
-	if (old_snapshot_threshold == 0)
-	{
-		if (TransactionIdPrecedes(latest_xmin, MyPgXact->xmin)
-			&& TransactionIdFollows(latest_xmin, xlimit))
-			xlimit = latest_xmin;
-
-		ts -= 5 * USECS_PER_SEC;
-	}
-	else
 	{
 		ts = AlignTimestampToMinuteBoundary(ts)
 			- (old_snapshot_threshold * USECS_PER_MINUTE);
@@ -1878,10 +1861,6 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 	if (!map_update_required)
 		return;
 
-	/* No further tracking needed for 0 (used for testing). */
-	if (old_snapshot_threshold == 0)
-		return;
-
 	/*
 	 * We don't want to do something stupid with unusual values, but we don't
 	 * want to litter the log with warnings or break otherwise normal
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index dfb4537f63..81836e9953 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -1,14 +1,22 @@
 # src/test/modules/snapshot_too_old/Makefile
 
-# Note: because we don't tell the Makefile there are any regression tests,
-# we have to clean those result files explicitly
-EXTRA_CLEAN = $(pg_regress_clean_files)
+MODULE_big = test_sto
+OBJS = \
+	$(WIN32RES) \
+	test_sto.o
+
+EXTENSION = test_sto
+DATA = test_sto--1.0.sql
+PGDESCFILE = "test_sto -- internal testing for snapshot too old errors"
+
+EXTRA_INSTALL = contrib/pg_visibility
 
 ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index
 ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf
 
-# Disabled because these tests require "old_snapshot_threshold" >= 0, which
-# typical installcheck users do not have (e.g. buildfarm clients).
+# Disabled because these tests require "old_snapshot_threshold" = 10, which
+# typical installcheck users do not have (e.g. buildfarm clients) and also
+# because it'd be dangerous on a production system.
 NO_INSTALLCHECK = 1
 
 ifdef USE_PGXS
@@ -21,8 +29,3 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
-
-# But it can nonetheless be very helpful to run tests on preexisting
-# installation, allow to do so, but only if requested explicitly.
-installcheck-force:
-	$(pg_isolation_regress_installcheck) $(ISOLATION)
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out b/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
index 8cc29ec82f..b007e2dc17 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
@@ -1,73 +1,60 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: s1decl s1f1 s1sleep s1f2 s2u
+starting permutation: t00 s1decl s1f1 t10 s2u s1f2 t20 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
 step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
 step s1f2: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step t20: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1decl s1f1 s1sleep s2u s1f2
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s1f1: FETCH FIRST FROM cursor1;
-c              
+               
+step s1f3: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+test_sto_reset_all_state
 
-1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+               
 
-0                             
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+starting permutation: t00 s1decl s1f1 t10 s1f2 t20 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1decl s1f1 s2u s1sleep s1f2
+               
 step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
 step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
 step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
-
-starting permutation: s1decl s2u s1f1 s1sleep s1f2
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
-
-0                             
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+step t20: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s2u s1decl s1f1 s1sleep s1f2
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s1f1: FETCH FIRST FROM cursor1;
+               
+step s1f3: FETCH FIRST FROM cursor1;
 c              
 
-2              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+1              
+test_sto_reset_all_state
 
-0                             
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+               
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out b/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
index bf94054790..091c212adc 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
@@ -1,15 +1,31 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: noseq s1f1 s2sleep s2u s1f2
+starting permutation: t00 noseq s1f1 t10 s2u s2v1 s1f2 t22 s2v2 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step noseq: SET enable_seqscan = false;
 step s1f1: SELECT c FROM sto1 where c = 1000;
 c              
 
 1000           
-step s2sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
 step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1000;
+step s2v1: VACUUM sto1;
 step s1f2: SELECT c FROM sto1 where c = 1001;
+c              
+
+step t22: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:22:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2v2: VACUUM sto1;
+step s1f3: SELECT c FROM sto1 where c = 1001;
 ERROR:  snapshot too old
+test_sto_reset_all_state
+
+               
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_select.out b/src/test/modules/snapshot_too_old/expected/sto_using_select.out
index eb15bc23bf..ba27bc5261 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_select.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_select.out
@@ -1,55 +1,164 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: s1f1 s1sleep s1f2 s2u
+starting permutation: t00 s2vis1 s1f1 t01 s2u s2vis2 s1f2 t10 s2vac1 s2vis3 s1f3 t12 s1f4 s2vac2 s2vis4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vis1: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s2vis2: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
 
-0                             
+f              
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1f1 s1sleep s2u s1f2
-step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+               
+step s2vac1: VACUUM sto1;
+step s2vis3: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+f              
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t12: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+               
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 ERROR:  snapshot too old
+step s2vac2: VACUUM sto1;
+step s2vis4: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+test_sto_reset_all_state
+
+               
+
+starting permutation: t00 s2vis1 s1f1 t01 s2u s2vis2 s1f2 t10 s2vac1 s2vis3 s1f3 t12 s2vac2 s2vis4 s1f4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1f1 s2u s1sleep s1f2
+               
+step s2vis1: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step s2vis2: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
 
-0                             
+f              
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac1: VACUUM sto1;
+step s2vis3: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+f              
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step t12: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac2: VACUUM sto1;
+step s2vis4: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 ERROR:  snapshot too old
+test_sto_reset_all_state
 
-starting permutation: s2u s1f1 s1sleep s1f2
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+               
+
+starting permutation: t00 s2vis1 s1f1 t01 s2vis2 s1f2 t10 s2vac1 s2vis3 s1f3 t12 s2vac2 s2vis4 s1f4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vis1: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
-2              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+1              
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
+step s2vis2: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
-ERROR:  snapshot too old
+c              
+
+1              
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac1: VACUUM sto1;
+step s2vis3: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step t12: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac2: VACUUM sto1;
+step s2vis4: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+test_sto_reset_all_state
+
+               
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec b/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
index eac18ca5b9..3be084b8fe 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
@@ -1,19 +1,14 @@
 # This test provokes a "snapshot too old" error using a cursor.
 #
-# The sleep is needed because with a threshold of zero a statement could error
-# on changes it made.  With more normal settings no external delay is needed,
-# but we don't want these tests to run long enough to see that, since
-# granularity is in minutes.
-#
-# Since results depend on the value of old_snapshot_threshold, sneak that into
-# the line generated by the sleep, so that a surprising values isn't so hard
-# to identify.
+# Expects old_snapshot_threshold = 10.  Not suitable for installcheck since it
+# messes with internal snapmgr.c state.
 
 setup
 {
+    CREATE EXTENSION IF NOT EXISTS test_sto;
+    SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
-    CREATE TABLE sto2 (c int NOT NULL);
 }
 setup
 {
@@ -22,16 +17,29 @@ setup
 
 teardown
 {
-    DROP TABLE sto1, sto2;
+    DROP TABLE sto1;
+    SELECT test_sto_reset_all_state();
 }
 
 session "s1"
 setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "s1decl"	{ DECLARE cursor1 CURSOR FOR SELECT c FROM sto1; }
 step "s1f1"		{ FETCH FIRST FROM cursor1; }
-step "s1sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s1f2"		{ FETCH FIRST FROM cursor1; }
+step "s1f3"		{ FETCH FIRST FROM cursor1; }
 teardown		{ COMMIT; }
 
 session "s2"
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1; }
+
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t10"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z'); }
+step "t20"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z'); }
+
+# if there's an update, we get a snapshot too old error at time 00:20 (not before,
+# because we need page pruning to see the xmin level change from 10 minutes earlier)
+permutation "t00" "s1decl" "s1f1" "t10" "s2u" "s1f2" "t20" "s1f3"
+
+# if there's no update, no snapshot too old error at time 00:20
+permutation "t00" "s1decl" "s1f1" "t10"       "s1f2" "t20" "s1f3"
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec b/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
index 33d91ff852..f90bca3b7a 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
@@ -1,8 +1,12 @@
 # This test is like sto_using_select, except that we test access via a
-# hash index.
+# hash index.  Explicit vacuuming is required in this version because
+# there is are no incidental calls to heap_page_prune_opt() that can
+# call SetOldSnapshotThresholdTimestamp().
 
 setup
 {
+	CREATE EXTENSION IF NOT EXISTS test_sto;
+	SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
     CREATE INDEX idx_sto1 ON sto1 USING HASH (c);
@@ -15,6 +19,7 @@ setup
 teardown
 {
     DROP TABLE sto1;
+	SELECT test_sto_reset_all_state();
 }
 
 session "s1"
@@ -22,10 +27,18 @@ setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "noseq"	{ SET enable_seqscan = false; }
 step "s1f1"		{ SELECT c FROM sto1 where c = 1000; }
 step "s1f2"		{ SELECT c FROM sto1 where c = 1001; }
+step "s1f3"		{ SELECT c FROM sto1 where c = 1001; }
 teardown		{ ROLLBACK; }
 
 session "s2"
-step "s2sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1000; }
+step "s2v1"		{ VACUUM sto1; }
+step "s2v2"		{ VACUUM sto1; }
 
-permutation "noseq" "s1f1" "s2sleep" "s2u" "s1f2"
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t10"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z'); }
+step "t22"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:22:00Z'); }
+
+# snapshot too old at t22
+permutation "t00" "noseq" "s1f1" "t10" "s2u" "s2v1" "s1f2" "t22" "s2v2" "s1f3"
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_select.spec b/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
index d7c34f3d89..c7917d1b0a 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
@@ -1,19 +1,15 @@
 # This test provokes a "snapshot too old" error using SELECT statements.
 #
-# The sleep is needed because with a threshold of zero a statement could error
-# on changes it made.  With more normal settings no external delay is needed,
-# but we don't want these tests to run long enough to see that, since
-# granularity is in minutes.
-#
-# Since results depend on the value of old_snapshot_threshold, sneak that into
-# the line generated by the sleep, so that a surprising values isn't so hard
-# to identify.
+# Expects old_snapshot_threshold = 10.  Not suitable for installcheck since it
+# messes with internal snapmgr.c state.
 
 setup
 {
+	CREATE EXTENSION IF NOT EXISTS test_sto;
+	CREATE EXTENSION IF NOT EXISTS pg_visibility;
+	SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
-    CREATE TABLE sto2 (c int NOT NULL);
 }
 setup
 {
@@ -22,15 +18,47 @@ setup
 
 teardown
 {
-    DROP TABLE sto1, sto2;
+	DROP TABLE sto1;
+    SELECT test_sto_reset_all_state();
 }
 
 session "s1"
 setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "s1f1"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
-step "s1sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s1f2"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
+step "s1f3"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
+step "s1f4"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
 teardown		{ COMMIT; }
 
 session "s2"
+step "s2vis1"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1; }
+step "s2vis2"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
+step "s2vac1"	{ VACUUM sto1; }
+step "s2vis3"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
+step "s2vac2"	{ VACUUM sto1; }
+step "s2vis4"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
+
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t01"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z'); }
+step "t10"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z'); }
+step "t12"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z'); }
+
+# If there's an update, we get a snapshot too old error at time 00:12, and
+# VACUUM is allowed to remove the tuple our snapshot could see, which we know
+# because we see that the relation becomes all visible.  The earlier VACUUMs
+# were unable to remove the tuple we could see, which is is obvious because we
+# can see the row with value 1, and from the relation not being all visible
+# after the VACUUM.
+permutation "t00" "s2vis1" "s1f1" "t01" "s2u" "s2vis2" "s1f2" "t10" "s2vac1" "s2vis3" "s1f3" "t12" "s1f4" "s2vac2" "s2vis4"
+
+# Almost the same schedule, but this time we'll put s2vac2 and s2vis4 before
+# s1f4 just to demonstrate that the early pruning is allowed before the error
+# aborts s1's transaction.
+permutation "t00" "s2vis1" "s1f1" "t01" "s2u" "s2vis2" "s1f2" "t10" "s2vac1" "s2vis3" "s1f3" "t12" "s2vac2" "s2vis4" "s1f4"
+
+# If we run the same schedule as above but without the update, we get no
+# snapshot too old error (even though our snapshot is older than the
+# threshold), and the relation remains all visible.
+permutation "t00" "s2vis1" "s1f1" "t01"       "s2vis2" "s1f2" "t10" "s2vac1" "s2vis3" "s1f3" "t12" "s2vac2" "s2vis4" "s1f4"
diff --git a/src/test/modules/snapshot_too_old/sto.conf b/src/test/modules/snapshot_too_old/sto.conf
index 7eeaeeb0dc..5ed46b3560 100644
--- a/src/test/modules/snapshot_too_old/sto.conf
+++ b/src/test/modules/snapshot_too_old/sto.conf
@@ -1,2 +1,2 @@
 autovacuum = off
-old_snapshot_threshold = 0
+old_snapshot_threshold = 10
diff --git a/src/test/modules/snapshot_too_old/test_sto--1.0.sql b/src/test/modules/snapshot_too_old/test_sto--1.0.sql
new file mode 100644
index 0000000000..c10afcf23a
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto--1.0.sql
@@ -0,0 +1,14 @@
+/* src/test/modules/snapshot_too_old/test_sto--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_sto" to load this file. \quit
+
+CREATE FUNCTION test_sto_clobber_snapshot_timestamp(now timestamptz)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'test_sto_clobber_snapshot_timestamp'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_sto_reset_all_state()
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'test_sto_reset_all_state'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/snapshot_too_old/test_sto.c b/src/test/modules/snapshot_too_old/test_sto.c
new file mode 100644
index 0000000000..f6c9a1a000
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto.c
@@ -0,0 +1,74 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_sto.c
+ *	  Functions to support isolation tests for snapshot too old.
+ *
+ * These functions are not intended for use in a production database and
+ * could cause corruption.
+ *
+ * Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *	  src/test/modules/snapshot_too_old/test_sto.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "storage/lwlock.h"
+#include "utils/old_snapshot.h"
+#include "utils/snapmgr.h"
+#include "utils/timestamp.h"
+
+PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(test_sto_reset_all_state);
+PG_FUNCTION_INFO_V1(test_sto_clobber_snapshot_timestamp);
+
+/*
+ * Revert to initial state.  This is not safe except in carefully
+ * controlled tests.
+ */
+Datum
+test_sto_reset_all_state(PG_FUNCTION_ARGS)
+{
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+	oldSnapshotControl->count_used = 0;
+	oldSnapshotControl->current_timestamp = 0;
+	oldSnapshotControl->head_offset = 0;
+	oldSnapshotControl->head_timestamp = 0;
+	LWLockRelease(OldSnapshotTimeMapLock);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_latest_xmin);
+	oldSnapshotControl->latest_xmin = InvalidTransactionId;
+	oldSnapshotControl->next_map_update = 0;
+	SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_current);
+	oldSnapshotControl->current_timestamp = 0;
+	SpinLockRelease(&oldSnapshotControl->mutex_current);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
+	oldSnapshotControl->threshold_timestamp = 0;
+	oldSnapshotControl->threshold_xid = InvalidTransactionId;
+	SpinLockRelease(&oldSnapshotControl->mutex_threshold);
+
+	PG_RETURN_NULL();
+}
+
+/*
+ * Update the minimum time used in snapshot-too-old code.  If set ahead of the
+ * current wall clock time (for example, the year 3000), this allows testing
+ * with arbitrary times.  This is not safe except in carefully controlled
+ * tests.
+ */
+Datum
+test_sto_clobber_snapshot_timestamp(PG_FUNCTION_ARGS)
+{
+	TimestampTz new_current_timestamp = PG_GETARG_TIMESTAMPTZ(0);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_current);
+	oldSnapshotControl->current_timestamp = new_current_timestamp;
+	SpinLockRelease(&oldSnapshotControl->mutex_current);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/snapshot_too_old/test_sto.control b/src/test/modules/snapshot_too_old/test_sto.control
new file mode 100644
index 0000000000..e497e5450e
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto.control
@@ -0,0 +1,5 @@
+# test_sto test module
+comment = 'functions for internal testing of snapshot too old errors'
+default_version = '1.0'
+module_pathname = '$libdir/test_sto'
+relocatable = true
-- 
2.20.1

v6-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchtext/x-patch; charset=US-ASCII; name=v6-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchDownload

From 0725a21385afd4356452f3951f8613a0d7e50b31 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 20 Apr 2020 17:05:42 +1200
Subject: [PATCH v6 5/6] Truncate snapshot-too-old time map when CLOG is
 truncated.

It's not safe to leave xids in the map that have wrapped around,
although it's probably very hard to actually reach that state.

Reported-by: Andres Freund
---
 src/backend/commands/vacuum.c                 |   3 +
 src/backend/utils/time/snapmgr.c              |  21 ++++
 src/include/utils/snapmgr.h                   |   1 +
 src/test/modules/snapshot_too_old/Makefile    |   4 +-
 .../snapshot_too_old/t/001_truncate.pl        | 100 ++++++++++++++++++
 5 files changed, 128 insertions(+), 1 deletion(-)
 create mode 100644 src/test/modules/snapshot_too_old/t/001_truncate.pl

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 22228f5684..459c9126fc 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1645,6 +1645,9 @@ vac_truncate_clog(TransactionId frozenXID,
 	 */
 	AdvanceOldestCommitTsXid(frozenXID);
 
+	/* Make sure snapshot_too_old drops old XIDs. */
+	TruncateOldSnapshotTimeMapping(frozenXID);
+
 	/*
 	 * Truncate CLOG, multixact and CommitTs to the oldest computed value.
 	 */
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index a94465235d..6958df3265 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1995,6 +1995,27 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 }
 
 
+/*
+ * Remove old xids from the timing map, so the CLOG can be truncated.
+ */
+void
+TruncateOldSnapshotTimeMapping(TransactionId frozenXID)
+{
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+	while (oldSnapshotControl->count_used > 0 &&
+		   TransactionIdPrecedes(oldSnapshotControl->xid_by_minute[oldSnapshotControl->head_offset],
+								 frozenXID))
+	{
+		oldSnapshotControl->head_timestamp += USECS_PER_MINUTE;
+		oldSnapshotControl->head_offset =
+			(oldSnapshotControl->head_offset + 1) %
+			OLD_SNAPSHOT_TIME_MAP_ENTRIES;
+		oldSnapshotControl->count_used--;
+	}
+	LWLockRelease(OldSnapshotTimeMapLock);
+}
+
+
 /*
  * Setup a snapshot that replaces normal catalog snapshots that allows catalog
  * access to behave just like it did at a certain point in the past.
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index b6b403e293..4560f1f03b 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -141,6 +141,7 @@ extern bool TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
 extern void SetOldSnapshotThresholdTimestamp(TimestampTz ts, TransactionId xlimit);
 extern void MaintainOldSnapshotTimeMapping(TimestampTz whenTaken,
 										   TransactionId xmin);
+extern void TruncateOldSnapshotTimeMapping(TransactionId frozenXID);
 
 extern char *ExportSnapshot(Snapshot snapshot);
 
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index 81836e9953..f919944984 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -9,7 +9,7 @@ EXTENSION = test_sto
 DATA = test_sto--1.0.sql
 PGDESCFILE = "test_sto -- internal testing for snapshot too old errors"
 
-EXTRA_INSTALL = contrib/pg_visibility
+EXTRA_INSTALL = contrib/pg_visibility contrib/old_snapshot
 
 ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index
 ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf
@@ -19,6 +19,8 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/s
 # because it'd be dangerous on a production system.
 NO_INSTALLCHECK = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/src/test/modules/snapshot_too_old/t/001_truncate.pl b/src/test/modules/snapshot_too_old/t/001_truncate.pl
new file mode 100644
index 0000000000..afcca232f2
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/t/001_truncate.pl
@@ -0,0 +1,100 @@
+# Test truncation of the old snapshot time mapping, to check
+# that we can't get into trouble when xids wrap around.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('master');
+$node->init;
+$node->append_conf("postgresql.conf", "timezone = UTC");
+$node->append_conf("postgresql.conf", "old_snapshot_threshold=10");
+$node->append_conf("postgresql.conf", "max_prepared_transactions=10");
+$node->append_conf("postgresql.conf", "autovacuum=off");
+$node->start;
+$node->psql('postgres', 'update pg_database set datallowconn = true');
+$node->psql('postgres', 'create extension old_snapshot');
+$node->psql('postgres', 'create extension test_sto');
+
+note "check time map is truncated when CLOG is";
+
+sub set_time
+{
+	my $time = shift;
+	$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('$time')");
+}
+
+sub advance_xid
+{
+	my $time = shift;
+	$node->psql('postgres', "select pg_current_xact_id()");
+}
+
+sub summarize_mapping
+{
+	my $out;
+	$node->psql('postgres',
+				"select count(*),
+						to_char(min(end_timestamp), 'HH24:MI:SS'),
+						to_char(max(end_timestamp), 'HH24:MI:SS')
+						from pg_old_snapshot_time_mapping()",
+				stdout => \$out);
+	return $out;
+}
+
+sub vacuum_freeze_all
+{
+	$node->psql('postgres', 'vacuum freeze');
+	$node->psql('template0', 'vacuum freeze');
+	$node->psql('template1', 'vacuum freeze');
+}
+
+# build up a time map with 4 entries
+set_time('3000-01-01 00:00:00Z');
+advance_xid();
+set_time('3000-01-01 00:01:00Z');
+advance_xid();
+set_time('3000-01-01 00:02:00Z');
+advance_xid();
+set_time('3000-01-01 00:03:00Z');
+advance_xid();
+is(summarize_mapping(), "4|00:00:00|00:03:00");
+
+# now cause frozen XID to advance
+vacuum_freeze_all();
+
+# we expect all XIDs to have been truncated
+is(summarize_mapping(), "0||");
+
+# put two more in the map
+set_time('3000-01-01 00:04:00Z');
+advance_xid();
+set_time('3000-01-01 00:05:00Z');
+advance_xid();
+is(summarize_mapping(), "2|00:04:00|00:05:00");
+
+# prepare a transaction, to stop xmin from getting further ahead
+$node->psql('postgres', "begin; select pg_current_xact_id(); prepare transaction 'tx1'");
+
+# add 16 more minutes; we should now have 18
+set_time('3000-01-01 00:21:00Z');
+advance_xid();
+is(summarize_mapping(), "18|00:04:00|00:21:00");
+
+# now cause frozen XID to advance
+vacuum_freeze_all();
+
+# this should leave just 16, because 2 were truncated
+is(summarize_mapping(), "16|00:06:00|00:21:00");
+
+# commit tx1, and then freeze again to get rid of all of them
+$node->psql('postgres', "commit prepared 'tx1'");
+
+# now cause frozen XID to advance
+vacuum_freeze_all();
+
+# we should now be back to empty
+is(summarize_mapping(), "0||");
+
+$node->stop;
-- 
2.20.1

v6-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patchtext/x-patch; charset=US-ASCII; name=v6-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patchDownload

From c7cfeb8adb91e8a00b0ccfd9fbb925fdcf209872 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 21 Apr 2020 20:48:20 +1200
Subject: [PATCH v6 6/6] Add TAP test for snapshot too old time map
 maintenance.

---
 .../t/002_xid_map_maintenance.pl              | 63 +++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100644 src/test/modules/snapshot_too_old/t/002_xid_map_maintenance.pl

diff --git a/src/test/modules/snapshot_too_old/t/002_xid_map_maintenance.pl b/src/test/modules/snapshot_too_old/t/002_xid_map_maintenance.pl
new file mode 100644
index 0000000000..eddd0ce5ae
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/t/002_xid_map_maintenance.pl
@@ -0,0 +1,63 @@
+# Test xid various time/xid map maintenance edge cases
+# that were historically buggy.
+
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 8;
+
+my $node = get_new_node('master');
+$node->init;
+$node->append_conf("postgresql.conf", "timezone = UTC");
+$node->append_conf("postgresql.conf", "old_snapshot_threshold=10");
+$node->append_conf("postgresql.conf", "autovacuum = off");
+$node->start;
+$node->psql('postgres', 'create extension test_sto');
+$node->psql('postgres', 'create extension old_snapshot');
+
+sub set_time
+{
+	my $time = shift;
+	$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('$time')");
+}
+
+sub summarize_mapping
+{
+	my $out;
+	$node->psql('postgres',
+				"select count(*),
+						to_char(min(end_timestamp), 'HH24:MI:SS'),
+						to_char(max(end_timestamp), 'HH24:MI:SS')
+						from pg_old_snapshot_time_mapping()",
+				stdout => \$out);
+	return $out;
+}
+
+# fill the map up to maximum capacity
+set_time('3000-01-01 00:00:00Z');
+set_time('3000-01-01 00:19:00Z');
+is(summarize_mapping(), "20|00:00:00|00:19:00");
+
+# make a jump larger than capacity; the mapping is blown away,
+# and our new minute is now the only one
+set_time('3000-01-01 02:00:00Z');
+is(summarize_mapping(), "1|02:00:00|02:00:00");
+
+# test adding minutes while the map is not full
+set_time('3000-01-01 02:01:00Z');
+is(summarize_mapping(), "2|02:00:00|02:01:00");
+set_time('3000-01-01 02:05:00Z');
+is(summarize_mapping(), "6|02:00:00|02:05:00");
+set_time('3000-01-01 02:19:00Z');
+is(summarize_mapping(), "20|02:00:00|02:19:00");
+
+# test adding minutes while the map is full
+set_time('3000-01-01 02:20:00Z');
+is(summarize_mapping(), "20|02:01:00|02:20:00");
+set_time('3000-01-01 02:22:00Z');
+is(summarize_mapping(), "20|02:03:00|02:22:00");
+set_time('3000-01-01 02:22:01Z'); # one second past
+is(summarize_mapping(), "20|02:04:00|02:23:00");
+
+$node->stop;
-- 
2.20.1

#21

thomas.munro@gmail.com

over 5 years ago

In reply to: Thomas Munro (#20)

6 attachment(s)

Re: fixing old_snapshot_threshold's time->xid mapping

On Fri, Aug 14, 2020 at 12:52 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Here's a rebase.

And another, since I was too slow and v6 is already in conflict...
sorry for the high frequency patches.

Attachments:

v7-0001-Expose-oldSnapshotControl.patchtext/x-patch; charset=US-ASCII; name=v7-0001-Expose-oldSnapshotControl.patchDownload

From 1ced7b8c881676f21623c048f5e9e012ca8416ec Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 09:37:31 -0400
Subject: [PATCH v7 1/6] Expose oldSnapshotControl.

---
 src/backend/utils/time/snapmgr.c | 55 +----------------------
 src/include/utils/old_snapshot.h | 75 ++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+), 53 deletions(-)
 create mode 100644 src/include/utils/old_snapshot.h

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 752af0c10d..6cfb07e82b 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -63,6 +63,7 @@
 #include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/old_snapshot.h"
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/snapmgr.h"
@@ -74,59 +75,7 @@
  */
 int			old_snapshot_threshold; /* number of minutes, -1 disables */
 
-/*
- * Structure for dealing with old_snapshot_threshold implementation.
- */
-typedef struct OldSnapshotControlData
-{
-	/*
-	 * Variables for old snapshot handling are shared among processes and are
-	 * only allowed to move forward.
-	 */
-	slock_t		mutex_current;	/* protect current_timestamp */
-	TimestampTz current_timestamp;	/* latest snapshot timestamp */
-	slock_t		mutex_latest_xmin;	/* protect latest_xmin and next_map_update */
-	TransactionId latest_xmin;	/* latest snapshot xmin */
-	TimestampTz next_map_update;	/* latest snapshot valid up to */
-	slock_t		mutex_threshold;	/* protect threshold fields */
-	TimestampTz threshold_timestamp;	/* earlier snapshot is old */
-	TransactionId threshold_xid;	/* earlier xid may be gone */
-
-	/*
-	 * Keep one xid per minute for old snapshot error handling.
-	 *
-	 * Use a circular buffer with a head offset, a count of entries currently
-	 * used, and a timestamp corresponding to the xid at the head offset.  A
-	 * count_used value of zero means that there are no times stored; a
-	 * count_used value of OLD_SNAPSHOT_TIME_MAP_ENTRIES means that the buffer
-	 * is full and the head must be advanced to add new entries.  Use
-	 * timestamps aligned to minute boundaries, since that seems less
-	 * surprising than aligning based on the first usage timestamp.  The
-	 * latest bucket is effectively stored within latest_xmin.  The circular
-	 * buffer is updated when we get a new xmin value that doesn't fall into
-	 * the same interval.
-	 *
-	 * It is OK if the xid for a given time slot is from earlier than
-	 * calculated by adding the number of minutes corresponding to the
-	 * (possibly wrapped) distance from the head offset to the time of the
-	 * head entry, since that just results in the vacuuming of old tuples
-	 * being slightly less aggressive.  It would not be OK for it to be off in
-	 * the other direction, since it might result in vacuuming tuples that are
-	 * still expected to be there.
-	 *
-	 * Use of an SLRU was considered but not chosen because it is more
-	 * heavyweight than is needed for this, and would probably not be any less
-	 * code to implement.
-	 *
-	 * Persistence is not needed.
-	 */
-	int			head_offset;	/* subscript of oldest tracked time */
-	TimestampTz head_timestamp; /* time corresponding to head xid */
-	int			count_used;		/* how many slots are in use */
-	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
-} OldSnapshotControlData;
-
-static volatile OldSnapshotControlData *oldSnapshotControl;
+volatile OldSnapshotControlData *oldSnapshotControl;
 
 
 /*
diff --git a/src/include/utils/old_snapshot.h b/src/include/utils/old_snapshot.h
new file mode 100644
index 0000000000..284af7d508
--- /dev/null
+++ b/src/include/utils/old_snapshot.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * old_snapshot.h
+ *		Data structures for 'snapshot too old'
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/old_snapshot.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef OLD_SNAPSHOT_H
+#define OLD_SNAPSHOT_H
+
+#include "datatype/timestamp.h"
+#include "storage/s_lock.h"
+
+/*
+ * Structure for dealing with old_snapshot_threshold implementation.
+ */
+typedef struct OldSnapshotControlData
+{
+	/*
+	 * Variables for old snapshot handling are shared among processes and are
+	 * only allowed to move forward.
+	 */
+	slock_t		mutex_current;	/* protect current_timestamp */
+	TimestampTz current_timestamp;	/* latest snapshot timestamp */
+	slock_t		mutex_latest_xmin;	/* protect latest_xmin and next_map_update */
+	TransactionId latest_xmin;	/* latest snapshot xmin */
+	TimestampTz next_map_update;	/* latest snapshot valid up to */
+	slock_t		mutex_threshold;	/* protect threshold fields */
+	TimestampTz threshold_timestamp;	/* earlier snapshot is old */
+	TransactionId threshold_xid;	/* earlier xid may be gone */
+
+	/*
+	 * Keep one xid per minute for old snapshot error handling.
+	 *
+	 * Use a circular buffer with a head offset, a count of entries currently
+	 * used, and a timestamp corresponding to the xid at the head offset.  A
+	 * count_used value of zero means that there are no times stored; a
+	 * count_used value of OLD_SNAPSHOT_TIME_MAP_ENTRIES means that the buffer
+	 * is full and the head must be advanced to add new entries.  Use
+	 * timestamps aligned to minute boundaries, since that seems less
+	 * surprising than aligning based on the first usage timestamp.  The
+	 * latest bucket is effectively stored within latest_xmin.  The circular
+	 * buffer is updated when we get a new xmin value that doesn't fall into
+	 * the same interval.
+	 *
+	 * It is OK if the xid for a given time slot is from earlier than
+	 * calculated by adding the number of minutes corresponding to the
+	 * (possibly wrapped) distance from the head offset to the time of the
+	 * head entry, since that just results in the vacuuming of old tuples
+	 * being slightly less aggressive.  It would not be OK for it to be off in
+	 * the other direction, since it might result in vacuuming tuples that are
+	 * still expected to be there.
+	 *
+	 * Use of an SLRU was considered but not chosen because it is more
+	 * heavyweight than is needed for this, and would probably not be any less
+	 * code to implement.
+	 *
+	 * Persistence is not needed.
+	 */
+	int			head_offset;	/* subscript of oldest tracked time */
+	TimestampTz head_timestamp; /* time corresponding to head xid */
+	int			count_used;		/* how many slots are in use */
+	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
+} OldSnapshotControlData;
+
+extern volatile OldSnapshotControlData *oldSnapshotControl;
+
+#endif
-- 
2.20.1

v7-0002-contrib-old_snapshot-time-xid-mapping.patchtext/x-patch; charset=US-ASCII; name=v7-0002-contrib-old_snapshot-time-xid-mapping.patchDownload

From 9de038c7663802feaa5567450f70b22979998d40 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 12:14:32 -0400
Subject: [PATCH v7 2/6] contrib/old_snapshot: time->xid mapping.

---
 contrib/Makefile                           |   1 +
 contrib/old_snapshot/Makefile              |  24 ++++
 contrib/old_snapshot/old_snapshot--1.0.sql |  14 ++
 contrib/old_snapshot/old_snapshot.control  |   5 +
 contrib/old_snapshot/time_mapping.c        | 159 +++++++++++++++++++++
 5 files changed, 203 insertions(+)
 create mode 100644 contrib/old_snapshot/Makefile
 create mode 100644 contrib/old_snapshot/old_snapshot--1.0.sql
 create mode 100644 contrib/old_snapshot/old_snapshot.control
 create mode 100644 contrib/old_snapshot/time_mapping.c

diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..452ade0782 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -27,6 +27,7 @@ SUBDIRS = \
 		lo		\
 		ltree		\
 		oid2name	\
+		old_snapshot	\
 		pageinspect	\
 		passwordcheck	\
 		pg_buffercache	\
diff --git a/contrib/old_snapshot/Makefile b/contrib/old_snapshot/Makefile
new file mode 100644
index 0000000000..091231f25f
--- /dev/null
+++ b/contrib/old_snapshot/Makefile
@@ -0,0 +1,24 @@
+# contrib/old_snapshot/Makefile
+
+MODULE_big = old_snapshot
+OBJS = \
+	$(WIN32RES) \
+	time_mapping.o
+PG_CPPFLAGS = -I$(libpq_srcdir)
+
+EXTENSION = old_snapshot
+DATA = old_snapshot--1.0.sql
+PGFILEDESC = "old_snapshot - utilities in support of old_snapshot_threshold"
+
+REGRESS = old_snapshot
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/old_snapshot
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/old_snapshot/old_snapshot--1.0.sql b/contrib/old_snapshot/old_snapshot--1.0.sql
new file mode 100644
index 0000000000..9ebb8829e3
--- /dev/null
+++ b/contrib/old_snapshot/old_snapshot--1.0.sql
@@ -0,0 +1,14 @@
+/* contrib/old_snapshot/old_snapshot--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION old_snapshot" to load this file. \quit
+
+-- Show visibility map and page-level visibility information for each block.
+CREATE FUNCTION pg_old_snapshot_time_mapping(array_offset OUT int4,
+											 end_timestamp OUT timestamptz,
+											 newest_xmin OUT xid)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_old_snapshot_time_mapping'
+LANGUAGE C STRICT;
+
+-- XXX. Do we want REVOKE commands here?
diff --git a/contrib/old_snapshot/old_snapshot.control b/contrib/old_snapshot/old_snapshot.control
new file mode 100644
index 0000000000..491eec536c
--- /dev/null
+++ b/contrib/old_snapshot/old_snapshot.control
@@ -0,0 +1,5 @@
+# old_snapshot extension
+comment = 'utilities in support of old_snapshot_threshold'
+default_version = '1.0'
+module_pathname = '$libdir/old_snapshot'
+relocatable = true
diff --git a/contrib/old_snapshot/time_mapping.c b/contrib/old_snapshot/time_mapping.c
new file mode 100644
index 0000000000..37e0055a00
--- /dev/null
+++ b/contrib/old_snapshot/time_mapping.c
@@ -0,0 +1,159 @@
+/*-------------------------------------------------------------------------
+ *
+ * time_mapping.c
+ *	  time to XID mapping information
+ *
+ * Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *	  contrib/old_snapshot/time_mapping.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "storage/lwlock.h"
+#include "utils/old_snapshot.h"
+#include "utils/snapmgr.h"
+#include "utils/timestamp.h"
+
+/*
+ * Backend-private copy of the information from oldSnapshotControl which relates
+ * to the time to XID mapping, plus an index so that we can iterate.
+ *
+ * Note that the length of the xid_by_minute array is given by
+ * OLD_SNAPSHOT_TIME_MAP_ENTRIES (which is not a compile-time constant).
+ */
+typedef struct
+{
+	int				current_index;
+	int				head_offset;
+	TimestampTz		head_timestamp;
+	int				count_used;
+	TransactionId	xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
+} OldSnapshotTimeMapping;
+
+#define NUM_TIME_MAPPING_COLUMNS 3
+
+PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(pg_old_snapshot_time_mapping);
+
+static OldSnapshotTimeMapping *GetOldSnapshotTimeMapping(void);
+static TupleDesc MakeOldSnapshotTimeMappingTupleDesc(void);
+static HeapTuple MakeOldSnapshotTimeMappingTuple(TupleDesc tupdesc,
+												 OldSnapshotTimeMapping *mapping);
+
+/*
+ * SQL-callable set-returning function.
+ */
+Datum
+pg_old_snapshot_time_mapping(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+	OldSnapshotTimeMapping *mapping;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		MemoryContext	oldcontext;
+
+		funcctx = SRF_FIRSTCALL_INIT();
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+		mapping = GetOldSnapshotTimeMapping();
+		funcctx->user_fctx = mapping;
+		funcctx->tuple_desc = MakeOldSnapshotTimeMappingTupleDesc();
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	mapping = (OldSnapshotTimeMapping *) funcctx->user_fctx;
+
+	while (mapping->current_index < mapping->count_used)
+	{
+		HeapTuple	tuple;
+
+		tuple = MakeOldSnapshotTimeMappingTuple(funcctx->tuple_desc, mapping);
+		++mapping->current_index;
+		SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Get the old snapshot time mapping data from shared memory.
+ */
+static OldSnapshotTimeMapping *
+GetOldSnapshotTimeMapping(void)
+{
+	OldSnapshotTimeMapping *mapping;
+
+	mapping = palloc(offsetof(OldSnapshotTimeMapping, xid_by_minute)
+					 + sizeof(TransactionId) * OLD_SNAPSHOT_TIME_MAP_ENTRIES);
+	mapping->current_index = 0;
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+	mapping->head_offset = oldSnapshotControl->head_offset;
+	mapping->head_timestamp = oldSnapshotControl->head_timestamp;
+	mapping->count_used = oldSnapshotControl->count_used;
+	for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
+		mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
+	LWLockRelease(OldSnapshotTimeMapLock);
+
+	return mapping;
+}
+
+/*
+ * Build a tuple descriptor for the pg_old_snapshot_time_mapping() SRF.
+ */
+static TupleDesc
+MakeOldSnapshotTimeMappingTupleDesc(void)
+{
+	TupleDesc	tupdesc;
+
+	tupdesc = CreateTemplateTupleDesc(NUM_TIME_MAPPING_COLUMNS);
+
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "array_offset",
+					   INT4OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "end_timestamp",
+					   TIMESTAMPTZOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "newest_xmin",
+					   XIDOID, -1, 0);
+
+	return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * Convert one entry from the old snapshot time mapping to a HeapTuple.
+ */
+static HeapTuple
+MakeOldSnapshotTimeMappingTuple(TupleDesc tupdesc, OldSnapshotTimeMapping *mapping)
+{
+	Datum	values[NUM_TIME_MAPPING_COLUMNS];
+	bool	nulls[NUM_TIME_MAPPING_COLUMNS];
+	int		array_position;
+	TimestampTz	timestamp;
+
+	/*
+	 * Figure out the array position corresponding to the current index.
+	 *
+	 * Index 0 means the oldest entry in the mapping, which is stored at
+	 * mapping->head_offset. Index 1 means the next-oldest entry, which is a the
+	 * following index, and so on. We wrap around when we reach the end of the array.
+	 */
+	array_position = (mapping->head_offset + mapping->current_index)
+		% OLD_SNAPSHOT_TIME_MAP_ENTRIES;
+
+	/*
+	 * No explicit timestamp is stored for any entry other than the oldest one,
+	 * but each entry corresponds to 1-minute period, so we can just add.
+	 */
+	timestamp = TimestampTzPlusMilliseconds(mapping->head_timestamp,
+											mapping->current_index * 60000);
+
+	/* Initialize nulls and values arrays. */
+	memset(nulls, 0, sizeof(nulls));
+	values[0] = Int32GetDatum(array_position);
+	values[1] = TimestampTzGetDatum(timestamp);
+	values[2] = TransactionIdGetDatum(mapping->xid_by_minute[array_position]);
+
+	return heap_form_tuple(tupdesc, values, nulls);
+}
-- 
2.20.1

v7-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchtext/x-patch; charset=US-ASCII; name=v7-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchDownload

From de7937f2c5fc56fff35cece046baf5d511974ac2 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 12:15:57 -0400
Subject: [PATCH v7 3/6] Fix bugs in MaintainOldSnapshotTimeMapping.

---
 src/backend/utils/time/snapmgr.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 6cfb07e82b..781ab434a3 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1944,10 +1944,32 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 	else
 	{
 		/* We need a new bucket, but it might not be the very next one. */
-		int			advance = ((ts - oldSnapshotControl->head_timestamp)
-							   / USECS_PER_MINUTE);
+		int			distance_to_new_tail;
+		int			distance_to_current_tail;
+		int			advance;
 
-		oldSnapshotControl->head_timestamp = ts;
+		/*
+		 * Our goal is for the new "tail" of the mapping, that is, the entry
+		 * which is newest and thus furthest from the "head" entry, to
+		 * correspond to "ts". Since there's one entry per minute, the
+		 * distance between the current head and the new tail is just the
+		 * number of minutes of difference between ts and the current
+		 * head_timestamp.
+		 *
+		 * The distance from the current head to the current tail is one
+		 * less than the number of entries in the mapping, because the
+		 * entry at the head_offset is for 0 minutes after head_timestamp.
+		 *
+		 * The difference between these two values is the number of minutes
+		 * by which we need to advance the mapping, either adding new entries
+		 * or rotating old ones out.
+		 */
+		distance_to_new_tail =
+			(ts - oldSnapshotControl->head_timestamp) / USECS_PER_MINUTE;
+		distance_to_current_tail =
+			oldSnapshotControl->count_used - 1;
+		advance = distance_to_new_tail - distance_to_current_tail;
+		Assert(advance > 0);
 
 		if (advance >= OLD_SNAPSHOT_TIME_MAP_ENTRIES)
 		{
@@ -1955,6 +1977,7 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 			oldSnapshotControl->head_offset = 0;
 			oldSnapshotControl->count_used = 1;
 			oldSnapshotControl->xid_by_minute[0] = xmin;
+			oldSnapshotControl->head_timestamp = ts;
 		}
 		else
 		{
@@ -1973,6 +1996,7 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 					else
 						oldSnapshotControl->head_offset = old_head + 1;
 					oldSnapshotControl->xid_by_minute[old_head] = xmin;
+					oldSnapshotControl->head_timestamp += USECS_PER_MINUTE;
 				}
 				else
 				{
-- 
2.20.1

v7-0004-Rewrite-the-snapshot_too_old-tests.patchtext/x-patch; charset=US-ASCII; name=v7-0004-Rewrite-the-snapshot_too_old-tests.patchDownload

From 0c21a87cabe2ffa02745d191713dc7f89a26cef8 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 20 Apr 2020 16:23:02 +1200
Subject: [PATCH v7 4/6] Rewrite the "snapshot_too_old" tests.

Previously the snapshot too old feature used a special test value for
old_snapshot_threshold.  Instead, use a new approach based on explicitly
advancing the "current" timestamp used in snapshot-too-old book keeping,
so that we can control the timeline precisely without having to resort
to sleeping or special test branches in the code.

Also check that early pruning actually happens, by vacuuming and
inspecting the visibility map at key points in the test schedule.
---
 src/backend/utils/time/snapmgr.c              |  21 ---
 src/test/modules/snapshot_too_old/Makefile    |  23 +--
 .../expected/sto_using_cursor.out             |  75 ++++-----
 .../expected/sto_using_hash_index.out         |  26 ++-
 .../expected/sto_using_select.out             | 157 +++++++++++++++---
 .../specs/sto_using_cursor.spec               |  30 ++--
 .../specs/sto_using_hash_index.spec           |  19 ++-
 .../specs/sto_using_select.spec               |  50 ++++--
 src/test/modules/snapshot_too_old/sto.conf    |   2 +-
 .../snapshot_too_old/test_sto--1.0.sql        |  14 ++
 src/test/modules/snapshot_too_old/test_sto.c  |  74 +++++++++
 .../modules/snapshot_too_old/test_sto.control |   5 +
 12 files changed, 366 insertions(+), 130 deletions(-)
 create mode 100644 src/test/modules/snapshot_too_old/test_sto--1.0.sql
 create mode 100644 src/test/modules/snapshot_too_old/test_sto.c
 create mode 100644 src/test/modules/snapshot_too_old/test_sto.control

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 781ab434a3..7fe283bf29 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1769,23 +1769,6 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
 	next_map_update_ts = oldSnapshotControl->next_map_update;
 	SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
 
-	/*
-	 * Zero threshold always overrides to latest xmin, if valid.  Without some
-	 * heuristic it will find its own snapshot too old on, for example, a
-	 * simple UPDATE -- which would make it useless for most testing, but
-	 * there is no principled way to ensure that it doesn't fail in this way.
-	 * Use a five-second delay to try to get useful testing behavior, but this
-	 * may need adjustment.
-	 */
-	if (old_snapshot_threshold == 0)
-	{
-		if (TransactionIdPrecedes(latest_xmin, MyProc->xmin)
-			&& TransactionIdFollows(latest_xmin, xlimit))
-			xlimit = latest_xmin;
-
-		ts -= 5 * USECS_PER_SEC;
-	}
-	else
 	{
 		ts = AlignTimestampToMinuteBoundary(ts)
 			- (old_snapshot_threshold * USECS_PER_MINUTE);
@@ -1878,10 +1861,6 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 	if (!map_update_required)
 		return;
 
-	/* No further tracking needed for 0 (used for testing). */
-	if (old_snapshot_threshold == 0)
-		return;
-
 	/*
 	 * We don't want to do something stupid with unusual values, but we don't
 	 * want to litter the log with warnings or break otherwise normal
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index dfb4537f63..81836e9953 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -1,14 +1,22 @@
 # src/test/modules/snapshot_too_old/Makefile
 
-# Note: because we don't tell the Makefile there are any regression tests,
-# we have to clean those result files explicitly
-EXTRA_CLEAN = $(pg_regress_clean_files)
+MODULE_big = test_sto
+OBJS = \
+	$(WIN32RES) \
+	test_sto.o
+
+EXTENSION = test_sto
+DATA = test_sto--1.0.sql
+PGDESCFILE = "test_sto -- internal testing for snapshot too old errors"
+
+EXTRA_INSTALL = contrib/pg_visibility
 
 ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index
 ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf
 
-# Disabled because these tests require "old_snapshot_threshold" >= 0, which
-# typical installcheck users do not have (e.g. buildfarm clients).
+# Disabled because these tests require "old_snapshot_threshold" = 10, which
+# typical installcheck users do not have (e.g. buildfarm clients) and also
+# because it'd be dangerous on a production system.
 NO_INSTALLCHECK = 1
 
 ifdef USE_PGXS
@@ -21,8 +29,3 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
-
-# But it can nonetheless be very helpful to run tests on preexisting
-# installation, allow to do so, but only if requested explicitly.
-installcheck-force:
-	$(pg_isolation_regress_installcheck) $(ISOLATION)
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out b/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
index 8cc29ec82f..b007e2dc17 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
@@ -1,73 +1,60 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: s1decl s1f1 s1sleep s1f2 s2u
+starting permutation: t00 s1decl s1f1 t10 s2u s1f2 t20 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
 step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
 step s1f2: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step t20: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1decl s1f1 s1sleep s2u s1f2
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s1f1: FETCH FIRST FROM cursor1;
-c              
+               
+step s1f3: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+test_sto_reset_all_state
 
-1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+               
 
-0                             
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+starting permutation: t00 s1decl s1f1 t10 s1f2 t20 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1decl s1f1 s2u s1sleep s1f2
+               
 step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
 step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
 step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
-
-starting permutation: s1decl s2u s1f1 s1sleep s1f2
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
-
-0                             
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+step t20: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s2u s1decl s1f1 s1sleep s1f2
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s1f1: FETCH FIRST FROM cursor1;
+               
+step s1f3: FETCH FIRST FROM cursor1;
 c              
 
-2              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+1              
+test_sto_reset_all_state
 
-0                             
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+               
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out b/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
index bf94054790..091c212adc 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
@@ -1,15 +1,31 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: noseq s1f1 s2sleep s2u s1f2
+starting permutation: t00 noseq s1f1 t10 s2u s2v1 s1f2 t22 s2v2 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step noseq: SET enable_seqscan = false;
 step s1f1: SELECT c FROM sto1 where c = 1000;
 c              
 
 1000           
-step s2sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
 step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1000;
+step s2v1: VACUUM sto1;
 step s1f2: SELECT c FROM sto1 where c = 1001;
+c              
+
+step t22: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:22:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2v2: VACUUM sto1;
+step s1f3: SELECT c FROM sto1 where c = 1001;
 ERROR:  snapshot too old
+test_sto_reset_all_state
+
+               
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_select.out b/src/test/modules/snapshot_too_old/expected/sto_using_select.out
index eb15bc23bf..ba27bc5261 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_select.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_select.out
@@ -1,55 +1,164 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: s1f1 s1sleep s1f2 s2u
+starting permutation: t00 s2vis1 s1f1 t01 s2u s2vis2 s1f2 t10 s2vac1 s2vis3 s1f3 t12 s1f4 s2vac2 s2vis4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vis1: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s2vis2: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
 
-0                             
+f              
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1f1 s1sleep s2u s1f2
-step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+               
+step s2vac1: VACUUM sto1;
+step s2vis3: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+f              
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t12: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+               
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 ERROR:  snapshot too old
+step s2vac2: VACUUM sto1;
+step s2vis4: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+test_sto_reset_all_state
+
+               
+
+starting permutation: t00 s2vis1 s1f1 t01 s2u s2vis2 s1f2 t10 s2vac1 s2vis3 s1f3 t12 s2vac2 s2vis4 s1f4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1f1 s2u s1sleep s1f2
+               
+step s2vis1: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step s2vis2: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
 
-0                             
+f              
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac1: VACUUM sto1;
+step s2vis3: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+f              
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step t12: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac2: VACUUM sto1;
+step s2vis4: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 ERROR:  snapshot too old
+test_sto_reset_all_state
 
-starting permutation: s2u s1f1 s1sleep s1f2
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+               
+
+starting permutation: t00 s2vis1 s1f1 t01 s2vis2 s1f2 t10 s2vac1 s2vis3 s1f3 t12 s2vac2 s2vis4 s1f4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vis1: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
-2              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+1              
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
+step s2vis2: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
-ERROR:  snapshot too old
+c              
+
+1              
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac1: VACUUM sto1;
+step s2vis3: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step t12: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac2: VACUUM sto1;
+step s2vis4: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+test_sto_reset_all_state
+
+               
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec b/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
index eac18ca5b9..3be084b8fe 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
@@ -1,19 +1,14 @@
 # This test provokes a "snapshot too old" error using a cursor.
 #
-# The sleep is needed because with a threshold of zero a statement could error
-# on changes it made.  With more normal settings no external delay is needed,
-# but we don't want these tests to run long enough to see that, since
-# granularity is in minutes.
-#
-# Since results depend on the value of old_snapshot_threshold, sneak that into
-# the line generated by the sleep, so that a surprising values isn't so hard
-# to identify.
+# Expects old_snapshot_threshold = 10.  Not suitable for installcheck since it
+# messes with internal snapmgr.c state.
 
 setup
 {
+    CREATE EXTENSION IF NOT EXISTS test_sto;
+    SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
-    CREATE TABLE sto2 (c int NOT NULL);
 }
 setup
 {
@@ -22,16 +17,29 @@ setup
 
 teardown
 {
-    DROP TABLE sto1, sto2;
+    DROP TABLE sto1;
+    SELECT test_sto_reset_all_state();
 }
 
 session "s1"
 setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "s1decl"	{ DECLARE cursor1 CURSOR FOR SELECT c FROM sto1; }
 step "s1f1"		{ FETCH FIRST FROM cursor1; }
-step "s1sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s1f2"		{ FETCH FIRST FROM cursor1; }
+step "s1f3"		{ FETCH FIRST FROM cursor1; }
 teardown		{ COMMIT; }
 
 session "s2"
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1; }
+
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t10"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z'); }
+step "t20"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z'); }
+
+# if there's an update, we get a snapshot too old error at time 00:20 (not before,
+# because we need page pruning to see the xmin level change from 10 minutes earlier)
+permutation "t00" "s1decl" "s1f1" "t10" "s2u" "s1f2" "t20" "s1f3"
+
+# if there's no update, no snapshot too old error at time 00:20
+permutation "t00" "s1decl" "s1f1" "t10"       "s1f2" "t20" "s1f3"
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec b/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
index 33d91ff852..f90bca3b7a 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
@@ -1,8 +1,12 @@
 # This test is like sto_using_select, except that we test access via a
-# hash index.
+# hash index.  Explicit vacuuming is required in this version because
+# there is are no incidental calls to heap_page_prune_opt() that can
+# call SetOldSnapshotThresholdTimestamp().
 
 setup
 {
+	CREATE EXTENSION IF NOT EXISTS test_sto;
+	SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
     CREATE INDEX idx_sto1 ON sto1 USING HASH (c);
@@ -15,6 +19,7 @@ setup
 teardown
 {
     DROP TABLE sto1;
+	SELECT test_sto_reset_all_state();
 }
 
 session "s1"
@@ -22,10 +27,18 @@ setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "noseq"	{ SET enable_seqscan = false; }
 step "s1f1"		{ SELECT c FROM sto1 where c = 1000; }
 step "s1f2"		{ SELECT c FROM sto1 where c = 1001; }
+step "s1f3"		{ SELECT c FROM sto1 where c = 1001; }
 teardown		{ ROLLBACK; }
 
 session "s2"
-step "s2sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1000; }
+step "s2v1"		{ VACUUM sto1; }
+step "s2v2"		{ VACUUM sto1; }
 
-permutation "noseq" "s1f1" "s2sleep" "s2u" "s1f2"
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t10"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z'); }
+step "t22"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:22:00Z'); }
+
+# snapshot too old at t22
+permutation "t00" "noseq" "s1f1" "t10" "s2u" "s2v1" "s1f2" "t22" "s2v2" "s1f3"
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_select.spec b/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
index d7c34f3d89..c7917d1b0a 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
@@ -1,19 +1,15 @@
 # This test provokes a "snapshot too old" error using SELECT statements.
 #
-# The sleep is needed because with a threshold of zero a statement could error
-# on changes it made.  With more normal settings no external delay is needed,
-# but we don't want these tests to run long enough to see that, since
-# granularity is in minutes.
-#
-# Since results depend on the value of old_snapshot_threshold, sneak that into
-# the line generated by the sleep, so that a surprising values isn't so hard
-# to identify.
+# Expects old_snapshot_threshold = 10.  Not suitable for installcheck since it
+# messes with internal snapmgr.c state.
 
 setup
 {
+	CREATE EXTENSION IF NOT EXISTS test_sto;
+	CREATE EXTENSION IF NOT EXISTS pg_visibility;
+	SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
-    CREATE TABLE sto2 (c int NOT NULL);
 }
 setup
 {
@@ -22,15 +18,47 @@ setup
 
 teardown
 {
-    DROP TABLE sto1, sto2;
+	DROP TABLE sto1;
+    SELECT test_sto_reset_all_state();
 }
 
 session "s1"
 setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "s1f1"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
-step "s1sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s1f2"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
+step "s1f3"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
+step "s1f4"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
 teardown		{ COMMIT; }
 
 session "s2"
+step "s2vis1"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1; }
+step "s2vis2"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
+step "s2vac1"	{ VACUUM sto1; }
+step "s2vis3"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
+step "s2vac2"	{ VACUUM sto1; }
+step "s2vis4"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
+
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t01"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z'); }
+step "t10"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z'); }
+step "t12"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z'); }
+
+# If there's an update, we get a snapshot too old error at time 00:12, and
+# VACUUM is allowed to remove the tuple our snapshot could see, which we know
+# because we see that the relation becomes all visible.  The earlier VACUUMs
+# were unable to remove the tuple we could see, which is is obvious because we
+# can see the row with value 1, and from the relation not being all visible
+# after the VACUUM.
+permutation "t00" "s2vis1" "s1f1" "t01" "s2u" "s2vis2" "s1f2" "t10" "s2vac1" "s2vis3" "s1f3" "t12" "s1f4" "s2vac2" "s2vis4"
+
+# Almost the same schedule, but this time we'll put s2vac2 and s2vis4 before
+# s1f4 just to demonstrate that the early pruning is allowed before the error
+# aborts s1's transaction.
+permutation "t00" "s2vis1" "s1f1" "t01" "s2u" "s2vis2" "s1f2" "t10" "s2vac1" "s2vis3" "s1f3" "t12" "s2vac2" "s2vis4" "s1f4"
+
+# If we run the same schedule as above but without the update, we get no
+# snapshot too old error (even though our snapshot is older than the
+# threshold), and the relation remains all visible.
+permutation "t00" "s2vis1" "s1f1" "t01"       "s2vis2" "s1f2" "t10" "s2vac1" "s2vis3" "s1f3" "t12" "s2vac2" "s2vis4" "s1f4"
diff --git a/src/test/modules/snapshot_too_old/sto.conf b/src/test/modules/snapshot_too_old/sto.conf
index 7eeaeeb0dc..5ed46b3560 100644
--- a/src/test/modules/snapshot_too_old/sto.conf
+++ b/src/test/modules/snapshot_too_old/sto.conf
@@ -1,2 +1,2 @@
 autovacuum = off
-old_snapshot_threshold = 0
+old_snapshot_threshold = 10
diff --git a/src/test/modules/snapshot_too_old/test_sto--1.0.sql b/src/test/modules/snapshot_too_old/test_sto--1.0.sql
new file mode 100644
index 0000000000..c10afcf23a
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto--1.0.sql
@@ -0,0 +1,14 @@
+/* src/test/modules/snapshot_too_old/test_sto--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_sto" to load this file. \quit
+
+CREATE FUNCTION test_sto_clobber_snapshot_timestamp(now timestamptz)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'test_sto_clobber_snapshot_timestamp'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_sto_reset_all_state()
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'test_sto_reset_all_state'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/snapshot_too_old/test_sto.c b/src/test/modules/snapshot_too_old/test_sto.c
new file mode 100644
index 0000000000..f6c9a1a000
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto.c
@@ -0,0 +1,74 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_sto.c
+ *	  Functions to support isolation tests for snapshot too old.
+ *
+ * These functions are not intended for use in a production database and
+ * could cause corruption.
+ *
+ * Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *	  src/test/modules/snapshot_too_old/test_sto.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "storage/lwlock.h"
+#include "utils/old_snapshot.h"
+#include "utils/snapmgr.h"
+#include "utils/timestamp.h"
+
+PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(test_sto_reset_all_state);
+PG_FUNCTION_INFO_V1(test_sto_clobber_snapshot_timestamp);
+
+/*
+ * Revert to initial state.  This is not safe except in carefully
+ * controlled tests.
+ */
+Datum
+test_sto_reset_all_state(PG_FUNCTION_ARGS)
+{
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+	oldSnapshotControl->count_used = 0;
+	oldSnapshotControl->current_timestamp = 0;
+	oldSnapshotControl->head_offset = 0;
+	oldSnapshotControl->head_timestamp = 0;
+	LWLockRelease(OldSnapshotTimeMapLock);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_latest_xmin);
+	oldSnapshotControl->latest_xmin = InvalidTransactionId;
+	oldSnapshotControl->next_map_update = 0;
+	SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_current);
+	oldSnapshotControl->current_timestamp = 0;
+	SpinLockRelease(&oldSnapshotControl->mutex_current);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
+	oldSnapshotControl->threshold_timestamp = 0;
+	oldSnapshotControl->threshold_xid = InvalidTransactionId;
+	SpinLockRelease(&oldSnapshotControl->mutex_threshold);
+
+	PG_RETURN_NULL();
+}
+
+/*
+ * Update the minimum time used in snapshot-too-old code.  If set ahead of the
+ * current wall clock time (for example, the year 3000), this allows testing
+ * with arbitrary times.  This is not safe except in carefully controlled
+ * tests.
+ */
+Datum
+test_sto_clobber_snapshot_timestamp(PG_FUNCTION_ARGS)
+{
+	TimestampTz new_current_timestamp = PG_GETARG_TIMESTAMPTZ(0);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_current);
+	oldSnapshotControl->current_timestamp = new_current_timestamp;
+	SpinLockRelease(&oldSnapshotControl->mutex_current);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/snapshot_too_old/test_sto.control b/src/test/modules/snapshot_too_old/test_sto.control
new file mode 100644
index 0000000000..e497e5450e
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto.control
@@ -0,0 +1,5 @@
+# test_sto test module
+comment = 'functions for internal testing of snapshot too old errors'
+default_version = '1.0'
+module_pathname = '$libdir/test_sto'
+relocatable = true
-- 
2.20.1

v7-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchtext/x-patch; charset=US-ASCII; name=v7-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchDownload

From 136a7c80a2cc0d6d7e407a5642f612b8aa31fd40 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 20 Apr 2020 17:05:42 +1200
Subject: [PATCH v7 5/6] Truncate snapshot-too-old time map when CLOG is
 truncated.

It's not safe to leave xids in the map that have wrapped around,
although it's probably very hard to actually reach that state.

Reported-by: Andres Freund
---
 src/backend/commands/vacuum.c                 |   3 +
 src/backend/utils/time/snapmgr.c              |  21 ++++
 src/include/utils/snapmgr.h                   |   1 +
 src/test/modules/snapshot_too_old/Makefile    |   4 +-
 .../snapshot_too_old/t/001_truncate.pl        | 100 ++++++++++++++++++
 5 files changed, 128 insertions(+), 1 deletion(-)
 create mode 100644 src/test/modules/snapshot_too_old/t/001_truncate.pl

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 22228f5684..459c9126fc 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1645,6 +1645,9 @@ vac_truncate_clog(TransactionId frozenXID,
 	 */
 	AdvanceOldestCommitTsXid(frozenXID);
 
+	/* Make sure snapshot_too_old drops old XIDs. */
+	TruncateOldSnapshotTimeMapping(frozenXID);
+
 	/*
 	 * Truncate CLOG, multixact and CommitTs to the oldest computed value.
 	 */
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 7fe283bf29..0237e62f7d 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1995,6 +1995,27 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 }
 
 
+/*
+ * Remove old xids from the timing map, so the CLOG can be truncated.
+ */
+void
+TruncateOldSnapshotTimeMapping(TransactionId frozenXID)
+{
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+	while (oldSnapshotControl->count_used > 0 &&
+		   TransactionIdPrecedes(oldSnapshotControl->xid_by_minute[oldSnapshotControl->head_offset],
+								 frozenXID))
+	{
+		oldSnapshotControl->head_timestamp += USECS_PER_MINUTE;
+		oldSnapshotControl->head_offset =
+			(oldSnapshotControl->head_offset + 1) %
+			OLD_SNAPSHOT_TIME_MAP_ENTRIES;
+		oldSnapshotControl->count_used--;
+	}
+	LWLockRelease(OldSnapshotTimeMapLock);
+}
+
+
 /*
  * Setup a snapshot that replaces normal catalog snapshots that allows catalog
  * access to behave just like it did at a certain point in the past.
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index b6b403e293..4560f1f03b 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -141,6 +141,7 @@ extern bool TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
 extern void SetOldSnapshotThresholdTimestamp(TimestampTz ts, TransactionId xlimit);
 extern void MaintainOldSnapshotTimeMapping(TimestampTz whenTaken,
 										   TransactionId xmin);
+extern void TruncateOldSnapshotTimeMapping(TransactionId frozenXID);
 
 extern char *ExportSnapshot(Snapshot snapshot);
 
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index 81836e9953..f919944984 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -9,7 +9,7 @@ EXTENSION = test_sto
 DATA = test_sto--1.0.sql
 PGDESCFILE = "test_sto -- internal testing for snapshot too old errors"
 
-EXTRA_INSTALL = contrib/pg_visibility
+EXTRA_INSTALL = contrib/pg_visibility contrib/old_snapshot
 
 ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index
 ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf
@@ -19,6 +19,8 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/s
 # because it'd be dangerous on a production system.
 NO_INSTALLCHECK = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/src/test/modules/snapshot_too_old/t/001_truncate.pl b/src/test/modules/snapshot_too_old/t/001_truncate.pl
new file mode 100644
index 0000000000..afcca232f2
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/t/001_truncate.pl
@@ -0,0 +1,100 @@
+# Test truncation of the old snapshot time mapping, to check
+# that we can't get into trouble when xids wrap around.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('master');
+$node->init;
+$node->append_conf("postgresql.conf", "timezone = UTC");
+$node->append_conf("postgresql.conf", "old_snapshot_threshold=10");
+$node->append_conf("postgresql.conf", "max_prepared_transactions=10");
+$node->append_conf("postgresql.conf", "autovacuum=off");
+$node->start;
+$node->psql('postgres', 'update pg_database set datallowconn = true');
+$node->psql('postgres', 'create extension old_snapshot');
+$node->psql('postgres', 'create extension test_sto');
+
+note "check time map is truncated when CLOG is";
+
+sub set_time
+{
+	my $time = shift;
+	$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('$time')");
+}
+
+sub advance_xid
+{
+	my $time = shift;
+	$node->psql('postgres', "select pg_current_xact_id()");
+}
+
+sub summarize_mapping
+{
+	my $out;
+	$node->psql('postgres',
+				"select count(*),
+						to_char(min(end_timestamp), 'HH24:MI:SS'),
+						to_char(max(end_timestamp), 'HH24:MI:SS')
+						from pg_old_snapshot_time_mapping()",
+				stdout => \$out);
+	return $out;
+}
+
+sub vacuum_freeze_all
+{
+	$node->psql('postgres', 'vacuum freeze');
+	$node->psql('template0', 'vacuum freeze');
+	$node->psql('template1', 'vacuum freeze');
+}
+
+# build up a time map with 4 entries
+set_time('3000-01-01 00:00:00Z');
+advance_xid();
+set_time('3000-01-01 00:01:00Z');
+advance_xid();
+set_time('3000-01-01 00:02:00Z');
+advance_xid();
+set_time('3000-01-01 00:03:00Z');
+advance_xid();
+is(summarize_mapping(), "4|00:00:00|00:03:00");
+
+# now cause frozen XID to advance
+vacuum_freeze_all();
+
+# we expect all XIDs to have been truncated
+is(summarize_mapping(), "0||");
+
+# put two more in the map
+set_time('3000-01-01 00:04:00Z');
+advance_xid();
+set_time('3000-01-01 00:05:00Z');
+advance_xid();
+is(summarize_mapping(), "2|00:04:00|00:05:00");
+
+# prepare a transaction, to stop xmin from getting further ahead
+$node->psql('postgres', "begin; select pg_current_xact_id(); prepare transaction 'tx1'");
+
+# add 16 more minutes; we should now have 18
+set_time('3000-01-01 00:21:00Z');
+advance_xid();
+is(summarize_mapping(), "18|00:04:00|00:21:00");
+
+# now cause frozen XID to advance
+vacuum_freeze_all();
+
+# this should leave just 16, because 2 were truncated
+is(summarize_mapping(), "16|00:06:00|00:21:00");
+
+# commit tx1, and then freeze again to get rid of all of them
+$node->psql('postgres', "commit prepared 'tx1'");
+
+# now cause frozen XID to advance
+vacuum_freeze_all();
+
+# we should now be back to empty
+is(summarize_mapping(), "0||");
+
+$node->stop;
-- 
2.20.1

v7-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patchtext/x-patch; charset=US-ASCII; name=v7-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patchDownload

From bc29fbe82b81e0f19ade8cb12076cc27ef02c24a Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 21 Apr 2020 20:48:20 +1200
Subject: [PATCH v7 6/6] Add TAP test for snapshot too old time map
 maintenance.

---
 .../t/002_xid_map_maintenance.pl              | 63 +++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100644 src/test/modules/snapshot_too_old/t/002_xid_map_maintenance.pl

diff --git a/src/test/modules/snapshot_too_old/t/002_xid_map_maintenance.pl b/src/test/modules/snapshot_too_old/t/002_xid_map_maintenance.pl
new file mode 100644
index 0000000000..eddd0ce5ae
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/t/002_xid_map_maintenance.pl
@@ -0,0 +1,63 @@
+# Test xid various time/xid map maintenance edge cases
+# that were historically buggy.
+
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 8;
+
+my $node = get_new_node('master');
+$node->init;
+$node->append_conf("postgresql.conf", "timezone = UTC");
+$node->append_conf("postgresql.conf", "old_snapshot_threshold=10");
+$node->append_conf("postgresql.conf", "autovacuum = off");
+$node->start;
+$node->psql('postgres', 'create extension test_sto');
+$node->psql('postgres', 'create extension old_snapshot');
+
+sub set_time
+{
+	my $time = shift;
+	$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('$time')");
+}
+
+sub summarize_mapping
+{
+	my $out;
+	$node->psql('postgres',
+				"select count(*),
+						to_char(min(end_timestamp), 'HH24:MI:SS'),
+						to_char(max(end_timestamp), 'HH24:MI:SS')
+						from pg_old_snapshot_time_mapping()",
+				stdout => \$out);
+	return $out;
+}
+
+# fill the map up to maximum capacity
+set_time('3000-01-01 00:00:00Z');
+set_time('3000-01-01 00:19:00Z');
+is(summarize_mapping(), "20|00:00:00|00:19:00");
+
+# make a jump larger than capacity; the mapping is blown away,
+# and our new minute is now the only one
+set_time('3000-01-01 02:00:00Z');
+is(summarize_mapping(), "1|02:00:00|02:00:00");
+
+# test adding minutes while the map is not full
+set_time('3000-01-01 02:01:00Z');
+is(summarize_mapping(), "2|02:00:00|02:01:00");
+set_time('3000-01-01 02:05:00Z');
+is(summarize_mapping(), "6|02:00:00|02:05:00");
+set_time('3000-01-01 02:19:00Z');
+is(summarize_mapping(), "20|02:00:00|02:19:00");
+
+# test adding minutes while the map is full
+set_time('3000-01-01 02:20:00Z');
+is(summarize_mapping(), "20|02:01:00|02:20:00");
+set_time('3000-01-01 02:22:00Z');
+is(summarize_mapping(), "20|02:03:00|02:22:00");
+set_time('3000-01-01 02:22:01Z'); # one second past
+is(summarize_mapping(), "20|02:04:00|02:23:00");
+
+$node->stop;
-- 
2.20.1

#22

thomas.munro@gmail.com

over 5 years ago

In reply to: Thomas Munro (#21)

6 attachment(s)

Re: fixing old_snapshot_threshold's time->xid mapping

On Fri, Aug 14, 2020 at 1:04 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Fri, Aug 14, 2020 at 12:52 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Here's a rebase.

And another, since I was too slow and v6 is already in conflict...
sorry for the high frequency patches.

And ... now that this has a commitfest entry, cfbot told me about a
small problem in a makefile. Third time lucky?

Attachments:

v8-0001-Expose-oldSnapshotControl.patchtext/x-patch; charset=US-ASCII; name=v8-0001-Expose-oldSnapshotControl.patchDownload

From c28576db2eaf7d9c68d0bb3566599638dd79deb1 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 09:37:31 -0400
Subject: [PATCH v8 1/6] Expose oldSnapshotControl.

---
 src/backend/utils/time/snapmgr.c | 55 +----------------------
 src/include/utils/old_snapshot.h | 75 ++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+), 53 deletions(-)
 create mode 100644 src/include/utils/old_snapshot.h

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 752af0c10d..6cfb07e82b 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -63,6 +63,7 @@
 #include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/old_snapshot.h"
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/snapmgr.h"
@@ -74,59 +75,7 @@
  */
 int			old_snapshot_threshold; /* number of minutes, -1 disables */
 
-/*
- * Structure for dealing with old_snapshot_threshold implementation.
- */
-typedef struct OldSnapshotControlData
-{
-	/*
-	 * Variables for old snapshot handling are shared among processes and are
-	 * only allowed to move forward.
-	 */
-	slock_t		mutex_current;	/* protect current_timestamp */
-	TimestampTz current_timestamp;	/* latest snapshot timestamp */
-	slock_t		mutex_latest_xmin;	/* protect latest_xmin and next_map_update */
-	TransactionId latest_xmin;	/* latest snapshot xmin */
-	TimestampTz next_map_update;	/* latest snapshot valid up to */
-	slock_t		mutex_threshold;	/* protect threshold fields */
-	TimestampTz threshold_timestamp;	/* earlier snapshot is old */
-	TransactionId threshold_xid;	/* earlier xid may be gone */
-
-	/*
-	 * Keep one xid per minute for old snapshot error handling.
-	 *
-	 * Use a circular buffer with a head offset, a count of entries currently
-	 * used, and a timestamp corresponding to the xid at the head offset.  A
-	 * count_used value of zero means that there are no times stored; a
-	 * count_used value of OLD_SNAPSHOT_TIME_MAP_ENTRIES means that the buffer
-	 * is full and the head must be advanced to add new entries.  Use
-	 * timestamps aligned to minute boundaries, since that seems less
-	 * surprising than aligning based on the first usage timestamp.  The
-	 * latest bucket is effectively stored within latest_xmin.  The circular
-	 * buffer is updated when we get a new xmin value that doesn't fall into
-	 * the same interval.
-	 *
-	 * It is OK if the xid for a given time slot is from earlier than
-	 * calculated by adding the number of minutes corresponding to the
-	 * (possibly wrapped) distance from the head offset to the time of the
-	 * head entry, since that just results in the vacuuming of old tuples
-	 * being slightly less aggressive.  It would not be OK for it to be off in
-	 * the other direction, since it might result in vacuuming tuples that are
-	 * still expected to be there.
-	 *
-	 * Use of an SLRU was considered but not chosen because it is more
-	 * heavyweight than is needed for this, and would probably not be any less
-	 * code to implement.
-	 *
-	 * Persistence is not needed.
-	 */
-	int			head_offset;	/* subscript of oldest tracked time */
-	TimestampTz head_timestamp; /* time corresponding to head xid */
-	int			count_used;		/* how many slots are in use */
-	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
-} OldSnapshotControlData;
-
-static volatile OldSnapshotControlData *oldSnapshotControl;
+volatile OldSnapshotControlData *oldSnapshotControl;
 
 
 /*
diff --git a/src/include/utils/old_snapshot.h b/src/include/utils/old_snapshot.h
new file mode 100644
index 0000000000..284af7d508
--- /dev/null
+++ b/src/include/utils/old_snapshot.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * old_snapshot.h
+ *		Data structures for 'snapshot too old'
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/old_snapshot.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef OLD_SNAPSHOT_H
+#define OLD_SNAPSHOT_H
+
+#include "datatype/timestamp.h"
+#include "storage/s_lock.h"
+
+/*
+ * Structure for dealing with old_snapshot_threshold implementation.
+ */
+typedef struct OldSnapshotControlData
+{
+	/*
+	 * Variables for old snapshot handling are shared among processes and are
+	 * only allowed to move forward.
+	 */
+	slock_t		mutex_current;	/* protect current_timestamp */
+	TimestampTz current_timestamp;	/* latest snapshot timestamp */
+	slock_t		mutex_latest_xmin;	/* protect latest_xmin and next_map_update */
+	TransactionId latest_xmin;	/* latest snapshot xmin */
+	TimestampTz next_map_update;	/* latest snapshot valid up to */
+	slock_t		mutex_threshold;	/* protect threshold fields */
+	TimestampTz threshold_timestamp;	/* earlier snapshot is old */
+	TransactionId threshold_xid;	/* earlier xid may be gone */
+
+	/*
+	 * Keep one xid per minute for old snapshot error handling.
+	 *
+	 * Use a circular buffer with a head offset, a count of entries currently
+	 * used, and a timestamp corresponding to the xid at the head offset.  A
+	 * count_used value of zero means that there are no times stored; a
+	 * count_used value of OLD_SNAPSHOT_TIME_MAP_ENTRIES means that the buffer
+	 * is full and the head must be advanced to add new entries.  Use
+	 * timestamps aligned to minute boundaries, since that seems less
+	 * surprising than aligning based on the first usage timestamp.  The
+	 * latest bucket is effectively stored within latest_xmin.  The circular
+	 * buffer is updated when we get a new xmin value that doesn't fall into
+	 * the same interval.
+	 *
+	 * It is OK if the xid for a given time slot is from earlier than
+	 * calculated by adding the number of minutes corresponding to the
+	 * (possibly wrapped) distance from the head offset to the time of the
+	 * head entry, since that just results in the vacuuming of old tuples
+	 * being slightly less aggressive.  It would not be OK for it to be off in
+	 * the other direction, since it might result in vacuuming tuples that are
+	 * still expected to be there.
+	 *
+	 * Use of an SLRU was considered but not chosen because it is more
+	 * heavyweight than is needed for this, and would probably not be any less
+	 * code to implement.
+	 *
+	 * Persistence is not needed.
+	 */
+	int			head_offset;	/* subscript of oldest tracked time */
+	TimestampTz head_timestamp; /* time corresponding to head xid */
+	int			count_used;		/* how many slots are in use */
+	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
+} OldSnapshotControlData;
+
+extern volatile OldSnapshotControlData *oldSnapshotControl;
+
+#endif
-- 
2.20.1

v8-0002-contrib-old_snapshot-time-xid-mapping.patchtext/x-patch; charset=US-ASCII; name=v8-0002-contrib-old_snapshot-time-xid-mapping.patchDownload

From 312fcda0adeac6aca862dc83744eee1b0c693453 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 12:14:32 -0400
Subject: [PATCH v8 2/6] contrib/old_snapshot: time->xid mapping.

---
 contrib/Makefile                           |   1 +
 contrib/old_snapshot/Makefile              |  22 +++
 contrib/old_snapshot/old_snapshot--1.0.sql |  14 ++
 contrib/old_snapshot/old_snapshot.control  |   5 +
 contrib/old_snapshot/time_mapping.c        | 159 +++++++++++++++++++++
 5 files changed, 201 insertions(+)
 create mode 100644 contrib/old_snapshot/Makefile
 create mode 100644 contrib/old_snapshot/old_snapshot--1.0.sql
 create mode 100644 contrib/old_snapshot/old_snapshot.control
 create mode 100644 contrib/old_snapshot/time_mapping.c

diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..452ade0782 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -27,6 +27,7 @@ SUBDIRS = \
 		lo		\
 		ltree		\
 		oid2name	\
+		old_snapshot	\
 		pageinspect	\
 		passwordcheck	\
 		pg_buffercache	\
diff --git a/contrib/old_snapshot/Makefile b/contrib/old_snapshot/Makefile
new file mode 100644
index 0000000000..77c85df322
--- /dev/null
+++ b/contrib/old_snapshot/Makefile
@@ -0,0 +1,22 @@
+# contrib/old_snapshot/Makefile
+
+MODULE_big = old_snapshot
+OBJS = \
+	$(WIN32RES) \
+	time_mapping.o
+PG_CPPFLAGS = -I$(libpq_srcdir)
+
+EXTENSION = old_snapshot
+DATA = old_snapshot--1.0.sql
+PGFILEDESC = "old_snapshot - utilities in support of old_snapshot_threshold"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/old_snapshot
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/old_snapshot/old_snapshot--1.0.sql b/contrib/old_snapshot/old_snapshot--1.0.sql
new file mode 100644
index 0000000000..9ebb8829e3
--- /dev/null
+++ b/contrib/old_snapshot/old_snapshot--1.0.sql
@@ -0,0 +1,14 @@
+/* contrib/old_snapshot/old_snapshot--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION old_snapshot" to load this file. \quit
+
+-- Show visibility map and page-level visibility information for each block.
+CREATE FUNCTION pg_old_snapshot_time_mapping(array_offset OUT int4,
+											 end_timestamp OUT timestamptz,
+											 newest_xmin OUT xid)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_old_snapshot_time_mapping'
+LANGUAGE C STRICT;
+
+-- XXX. Do we want REVOKE commands here?
diff --git a/contrib/old_snapshot/old_snapshot.control b/contrib/old_snapshot/old_snapshot.control
new file mode 100644
index 0000000000..491eec536c
--- /dev/null
+++ b/contrib/old_snapshot/old_snapshot.control
@@ -0,0 +1,5 @@
+# old_snapshot extension
+comment = 'utilities in support of old_snapshot_threshold'
+default_version = '1.0'
+module_pathname = '$libdir/old_snapshot'
+relocatable = true
diff --git a/contrib/old_snapshot/time_mapping.c b/contrib/old_snapshot/time_mapping.c
new file mode 100644
index 0000000000..37e0055a00
--- /dev/null
+++ b/contrib/old_snapshot/time_mapping.c
@@ -0,0 +1,159 @@
+/*-------------------------------------------------------------------------
+ *
+ * time_mapping.c
+ *	  time to XID mapping information
+ *
+ * Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *	  contrib/old_snapshot/time_mapping.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "storage/lwlock.h"
+#include "utils/old_snapshot.h"
+#include "utils/snapmgr.h"
+#include "utils/timestamp.h"
+
+/*
+ * Backend-private copy of the information from oldSnapshotControl which relates
+ * to the time to XID mapping, plus an index so that we can iterate.
+ *
+ * Note that the length of the xid_by_minute array is given by
+ * OLD_SNAPSHOT_TIME_MAP_ENTRIES (which is not a compile-time constant).
+ */
+typedef struct
+{
+	int				current_index;
+	int				head_offset;
+	TimestampTz		head_timestamp;
+	int				count_used;
+	TransactionId	xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
+} OldSnapshotTimeMapping;
+
+#define NUM_TIME_MAPPING_COLUMNS 3
+
+PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(pg_old_snapshot_time_mapping);
+
+static OldSnapshotTimeMapping *GetOldSnapshotTimeMapping(void);
+static TupleDesc MakeOldSnapshotTimeMappingTupleDesc(void);
+static HeapTuple MakeOldSnapshotTimeMappingTuple(TupleDesc tupdesc,
+												 OldSnapshotTimeMapping *mapping);
+
+/*
+ * SQL-callable set-returning function.
+ */
+Datum
+pg_old_snapshot_time_mapping(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+	OldSnapshotTimeMapping *mapping;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		MemoryContext	oldcontext;
+
+		funcctx = SRF_FIRSTCALL_INIT();
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+		mapping = GetOldSnapshotTimeMapping();
+		funcctx->user_fctx = mapping;
+		funcctx->tuple_desc = MakeOldSnapshotTimeMappingTupleDesc();
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	mapping = (OldSnapshotTimeMapping *) funcctx->user_fctx;
+
+	while (mapping->current_index < mapping->count_used)
+	{
+		HeapTuple	tuple;
+
+		tuple = MakeOldSnapshotTimeMappingTuple(funcctx->tuple_desc, mapping);
+		++mapping->current_index;
+		SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Get the old snapshot time mapping data from shared memory.
+ */
+static OldSnapshotTimeMapping *
+GetOldSnapshotTimeMapping(void)
+{
+	OldSnapshotTimeMapping *mapping;
+
+	mapping = palloc(offsetof(OldSnapshotTimeMapping, xid_by_minute)
+					 + sizeof(TransactionId) * OLD_SNAPSHOT_TIME_MAP_ENTRIES);
+	mapping->current_index = 0;
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+	mapping->head_offset = oldSnapshotControl->head_offset;
+	mapping->head_timestamp = oldSnapshotControl->head_timestamp;
+	mapping->count_used = oldSnapshotControl->count_used;
+	for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
+		mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
+	LWLockRelease(OldSnapshotTimeMapLock);
+
+	return mapping;
+}
+
+/*
+ * Build a tuple descriptor for the pg_old_snapshot_time_mapping() SRF.
+ */
+static TupleDesc
+MakeOldSnapshotTimeMappingTupleDesc(void)
+{
+	TupleDesc	tupdesc;
+
+	tupdesc = CreateTemplateTupleDesc(NUM_TIME_MAPPING_COLUMNS);
+
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "array_offset",
+					   INT4OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "end_timestamp",
+					   TIMESTAMPTZOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "newest_xmin",
+					   XIDOID, -1, 0);
+
+	return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * Convert one entry from the old snapshot time mapping to a HeapTuple.
+ */
+static HeapTuple
+MakeOldSnapshotTimeMappingTuple(TupleDesc tupdesc, OldSnapshotTimeMapping *mapping)
+{
+	Datum	values[NUM_TIME_MAPPING_COLUMNS];
+	bool	nulls[NUM_TIME_MAPPING_COLUMNS];
+	int		array_position;
+	TimestampTz	timestamp;
+
+	/*
+	 * Figure out the array position corresponding to the current index.
+	 *
+	 * Index 0 means the oldest entry in the mapping, which is stored at
+	 * mapping->head_offset. Index 1 means the next-oldest entry, which is a the
+	 * following index, and so on. We wrap around when we reach the end of the array.
+	 */
+	array_position = (mapping->head_offset + mapping->current_index)
+		% OLD_SNAPSHOT_TIME_MAP_ENTRIES;
+
+	/*
+	 * No explicit timestamp is stored for any entry other than the oldest one,
+	 * but each entry corresponds to 1-minute period, so we can just add.
+	 */
+	timestamp = TimestampTzPlusMilliseconds(mapping->head_timestamp,
+											mapping->current_index * 60000);
+
+	/* Initialize nulls and values arrays. */
+	memset(nulls, 0, sizeof(nulls));
+	values[0] = Int32GetDatum(array_position);
+	values[1] = TimestampTzGetDatum(timestamp);
+	values[2] = TransactionIdGetDatum(mapping->xid_by_minute[array_position]);
+
+	return heap_form_tuple(tupdesc, values, nulls);
+}
-- 
2.20.1

v8-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchtext/x-patch; charset=US-ASCII; name=v8-0003-Fix-bugs-in-MaintainOldSnapshotTimeMapping.patchDownload

From 56fa7dd44060f5088cb839c19529f29d39929c5f Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 12:15:57 -0400
Subject: [PATCH v8 3/6] Fix bugs in MaintainOldSnapshotTimeMapping.

---
 src/backend/utils/time/snapmgr.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 6cfb07e82b..781ab434a3 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1944,10 +1944,32 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 	else
 	{
 		/* We need a new bucket, but it might not be the very next one. */
-		int			advance = ((ts - oldSnapshotControl->head_timestamp)
-							   / USECS_PER_MINUTE);
+		int			distance_to_new_tail;
+		int			distance_to_current_tail;
+		int			advance;
 
-		oldSnapshotControl->head_timestamp = ts;
+		/*
+		 * Our goal is for the new "tail" of the mapping, that is, the entry
+		 * which is newest and thus furthest from the "head" entry, to
+		 * correspond to "ts". Since there's one entry per minute, the
+		 * distance between the current head and the new tail is just the
+		 * number of minutes of difference between ts and the current
+		 * head_timestamp.
+		 *
+		 * The distance from the current head to the current tail is one
+		 * less than the number of entries in the mapping, because the
+		 * entry at the head_offset is for 0 minutes after head_timestamp.
+		 *
+		 * The difference between these two values is the number of minutes
+		 * by which we need to advance the mapping, either adding new entries
+		 * or rotating old ones out.
+		 */
+		distance_to_new_tail =
+			(ts - oldSnapshotControl->head_timestamp) / USECS_PER_MINUTE;
+		distance_to_current_tail =
+			oldSnapshotControl->count_used - 1;
+		advance = distance_to_new_tail - distance_to_current_tail;
+		Assert(advance > 0);
 
 		if (advance >= OLD_SNAPSHOT_TIME_MAP_ENTRIES)
 		{
@@ -1955,6 +1977,7 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 			oldSnapshotControl->head_offset = 0;
 			oldSnapshotControl->count_used = 1;
 			oldSnapshotControl->xid_by_minute[0] = xmin;
+			oldSnapshotControl->head_timestamp = ts;
 		}
 		else
 		{
@@ -1973,6 +1996,7 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 					else
 						oldSnapshotControl->head_offset = old_head + 1;
 					oldSnapshotControl->xid_by_minute[old_head] = xmin;
+					oldSnapshotControl->head_timestamp += USECS_PER_MINUTE;
 				}
 				else
 				{
-- 
2.20.1

v8-0004-Rewrite-the-snapshot_too_old-tests.patchtext/x-patch; charset=US-ASCII; name=v8-0004-Rewrite-the-snapshot_too_old-tests.patchDownload

From 17921ccab7087b058a94e4539d8c12b9aea206ae Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 20 Apr 2020 16:23:02 +1200
Subject: [PATCH v8 4/6] Rewrite the "snapshot_too_old" tests.

Previously the snapshot too old feature used a special test value for
old_snapshot_threshold.  Instead, use a new approach based on explicitly
advancing the "current" timestamp used in snapshot-too-old book keeping,
so that we can control the timeline precisely without having to resort
to sleeping or special test branches in the code.

Also check that early pruning actually happens, by vacuuming and
inspecting the visibility map at key points in the test schedule.
---
 src/backend/utils/time/snapmgr.c              |  21 ---
 src/test/modules/snapshot_too_old/Makefile    |  23 +--
 .../expected/sto_using_cursor.out             |  75 ++++-----
 .../expected/sto_using_hash_index.out         |  26 ++-
 .../expected/sto_using_select.out             | 157 +++++++++++++++---
 .../specs/sto_using_cursor.spec               |  30 ++--
 .../specs/sto_using_hash_index.spec           |  19 ++-
 .../specs/sto_using_select.spec               |  50 ++++--
 src/test/modules/snapshot_too_old/sto.conf    |   2 +-
 .../snapshot_too_old/test_sto--1.0.sql        |  14 ++
 src/test/modules/snapshot_too_old/test_sto.c  |  74 +++++++++
 .../modules/snapshot_too_old/test_sto.control |   5 +
 12 files changed, 366 insertions(+), 130 deletions(-)
 create mode 100644 src/test/modules/snapshot_too_old/test_sto--1.0.sql
 create mode 100644 src/test/modules/snapshot_too_old/test_sto.c
 create mode 100644 src/test/modules/snapshot_too_old/test_sto.control

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 781ab434a3..7fe283bf29 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1769,23 +1769,6 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
 	next_map_update_ts = oldSnapshotControl->next_map_update;
 	SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
 
-	/*
-	 * Zero threshold always overrides to latest xmin, if valid.  Without some
-	 * heuristic it will find its own snapshot too old on, for example, a
-	 * simple UPDATE -- which would make it useless for most testing, but
-	 * there is no principled way to ensure that it doesn't fail in this way.
-	 * Use a five-second delay to try to get useful testing behavior, but this
-	 * may need adjustment.
-	 */
-	if (old_snapshot_threshold == 0)
-	{
-		if (TransactionIdPrecedes(latest_xmin, MyProc->xmin)
-			&& TransactionIdFollows(latest_xmin, xlimit))
-			xlimit = latest_xmin;
-
-		ts -= 5 * USECS_PER_SEC;
-	}
-	else
 	{
 		ts = AlignTimestampToMinuteBoundary(ts)
 			- (old_snapshot_threshold * USECS_PER_MINUTE);
@@ -1878,10 +1861,6 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 	if (!map_update_required)
 		return;
 
-	/* No further tracking needed for 0 (used for testing). */
-	if (old_snapshot_threshold == 0)
-		return;
-
 	/*
 	 * We don't want to do something stupid with unusual values, but we don't
 	 * want to litter the log with warnings or break otherwise normal
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index dfb4537f63..81836e9953 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -1,14 +1,22 @@
 # src/test/modules/snapshot_too_old/Makefile
 
-# Note: because we don't tell the Makefile there are any regression tests,
-# we have to clean those result files explicitly
-EXTRA_CLEAN = $(pg_regress_clean_files)
+MODULE_big = test_sto
+OBJS = \
+	$(WIN32RES) \
+	test_sto.o
+
+EXTENSION = test_sto
+DATA = test_sto--1.0.sql
+PGDESCFILE = "test_sto -- internal testing for snapshot too old errors"
+
+EXTRA_INSTALL = contrib/pg_visibility
 
 ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index
 ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf
 
-# Disabled because these tests require "old_snapshot_threshold" >= 0, which
-# typical installcheck users do not have (e.g. buildfarm clients).
+# Disabled because these tests require "old_snapshot_threshold" = 10, which
+# typical installcheck users do not have (e.g. buildfarm clients) and also
+# because it'd be dangerous on a production system.
 NO_INSTALLCHECK = 1
 
 ifdef USE_PGXS
@@ -21,8 +29,3 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
-
-# But it can nonetheless be very helpful to run tests on preexisting
-# installation, allow to do so, but only if requested explicitly.
-installcheck-force:
-	$(pg_isolation_regress_installcheck) $(ISOLATION)
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out b/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
index 8cc29ec82f..b007e2dc17 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_cursor.out
@@ -1,73 +1,60 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: s1decl s1f1 s1sleep s1f2 s2u
+starting permutation: t00 s1decl s1f1 t10 s2u s1f2 t20 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
 step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
 step s1f2: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step t20: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1decl s1f1 s1sleep s2u s1f2
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s1f1: FETCH FIRST FROM cursor1;
-c              
+               
+step s1f3: FETCH FIRST FROM cursor1;
+ERROR:  snapshot too old
+test_sto_reset_all_state
 
-1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+               
 
-0                             
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+starting permutation: t00 s1decl s1f1 t10 s1f2 t20 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1decl s1f1 s2u s1sleep s1f2
+               
 step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
 step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
 step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
-
-starting permutation: s1decl s2u s1f1 s1sleep s1f2
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f1: FETCH FIRST FROM cursor1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
-
-0                             
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+step t20: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s2u s1decl s1f1 s1sleep s1f2
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1decl: DECLARE cursor1 CURSOR FOR SELECT c FROM sto1;
-step s1f1: FETCH FIRST FROM cursor1;
+               
+step s1f3: FETCH FIRST FROM cursor1;
 c              
 
-2              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+1              
+test_sto_reset_all_state
 
-0                             
-step s1f2: FETCH FIRST FROM cursor1;
-ERROR:  snapshot too old
+               
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out b/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
index bf94054790..091c212adc 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_hash_index.out
@@ -1,15 +1,31 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: noseq s1f1 s2sleep s2u s1f2
+starting permutation: t00 noseq s1f1 t10 s2u s2v1 s1f2 t22 s2v2 s1f3
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step noseq: SET enable_seqscan = false;
 step s1f1: SELECT c FROM sto1 where c = 1000;
 c              
 
 1000           
-step s2sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
 step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1000;
+step s2v1: VACUUM sto1;
 step s1f2: SELECT c FROM sto1 where c = 1001;
+c              
+
+step t22: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:22:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2v2: VACUUM sto1;
+step s1f3: SELECT c FROM sto1 where c = 1001;
 ERROR:  snapshot too old
+test_sto_reset_all_state
+
+               
diff --git a/src/test/modules/snapshot_too_old/expected/sto_using_select.out b/src/test/modules/snapshot_too_old/expected/sto_using_select.out
index eb15bc23bf..ba27bc5261 100644
--- a/src/test/modules/snapshot_too_old/expected/sto_using_select.out
+++ b/src/test/modules/snapshot_too_old/expected/sto_using_select.out
@@ -1,55 +1,164 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: s1f1 s1sleep s1f2 s2u
+starting permutation: t00 s2vis1 s1f1 t01 s2u s2vis2 s1f2 t10 s2vac1 s2vis3 s1f3 t12 s1f4 s2vac2 s2vis4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vis1: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step s2vis2: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
 
-0                             
+f              
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1f1 s1sleep s2u s1f2
-step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+               
+step s2vac1: VACUUM sto1;
+step s2vis3: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+f              
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step t12: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+               
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 ERROR:  snapshot too old
+step s2vac2: VACUUM sto1;
+step s2vis4: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+test_sto_reset_all_state
+
+               
+
+starting permutation: t00 s2vis1 s1f1 t01 s2u s2vis2 s1f2 t10 s2vac1 s2vis3 s1f3 t12 s2vac2 s2vis4 s1f4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
 
-starting permutation: s1f1 s2u s1sleep s1f2
+               
+step s2vis1: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
 1              
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
 step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+step s2vis2: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
 
-0                             
+f              
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac1: VACUUM sto1;
+step s2vis3: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+f              
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step t12: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac2: VACUUM sto1;
+step s2vis4: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 ERROR:  snapshot too old
+test_sto_reset_all_state
 
-starting permutation: s2u s1f1 s1sleep s1f2
-step s2u: UPDATE sto1 SET c = 1001 WHERE c = 1;
+               
+
+starting permutation: t00 s2vis1 s1f1 t01 s2vis2 s1f2 t10 s2vac1 s2vis3 s1f3 t12 s2vac2 s2vis4 s1f4
+step t00: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vis1: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f1: SELECT c FROM sto1 ORDER BY c LIMIT 1;
 c              
 
-2              
-step s1sleep: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
-setting        pg_sleep       
+1              
+step t01: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z');
+test_sto_clobber_snapshot_timestamp
 
-0                             
+               
+step s2vis2: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
 step s1f2: SELECT c FROM sto1 ORDER BY c LIMIT 1;
-ERROR:  snapshot too old
+c              
+
+1              
+step t10: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac1: VACUUM sto1;
+step s2vis3: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+step s1f3: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+step t12: SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z');
+test_sto_clobber_snapshot_timestamp
+
+               
+step s2vac2: VACUUM sto1;
+step s2vis4: SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass);
+every          
+
+t              
+step s1f4: SELECT c FROM sto1 ORDER BY c LIMIT 1;
+c              
+
+1              
+test_sto_reset_all_state
+
+               
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec b/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
index eac18ca5b9..3be084b8fe 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_cursor.spec
@@ -1,19 +1,14 @@
 # This test provokes a "snapshot too old" error using a cursor.
 #
-# The sleep is needed because with a threshold of zero a statement could error
-# on changes it made.  With more normal settings no external delay is needed,
-# but we don't want these tests to run long enough to see that, since
-# granularity is in minutes.
-#
-# Since results depend on the value of old_snapshot_threshold, sneak that into
-# the line generated by the sleep, so that a surprising values isn't so hard
-# to identify.
+# Expects old_snapshot_threshold = 10.  Not suitable for installcheck since it
+# messes with internal snapmgr.c state.
 
 setup
 {
+    CREATE EXTENSION IF NOT EXISTS test_sto;
+    SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
-    CREATE TABLE sto2 (c int NOT NULL);
 }
 setup
 {
@@ -22,16 +17,29 @@ setup
 
 teardown
 {
-    DROP TABLE sto1, sto2;
+    DROP TABLE sto1;
+    SELECT test_sto_reset_all_state();
 }
 
 session "s1"
 setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "s1decl"	{ DECLARE cursor1 CURSOR FOR SELECT c FROM sto1; }
 step "s1f1"		{ FETCH FIRST FROM cursor1; }
-step "s1sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s1f2"		{ FETCH FIRST FROM cursor1; }
+step "s1f3"		{ FETCH FIRST FROM cursor1; }
 teardown		{ COMMIT; }
 
 session "s2"
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1; }
+
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t10"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z'); }
+step "t20"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:20:00Z'); }
+
+# if there's an update, we get a snapshot too old error at time 00:20 (not before,
+# because we need page pruning to see the xmin level change from 10 minutes earlier)
+permutation "t00" "s1decl" "s1f1" "t10" "s2u" "s1f2" "t20" "s1f3"
+
+# if there's no update, no snapshot too old error at time 00:20
+permutation "t00" "s1decl" "s1f1" "t10"       "s1f2" "t20" "s1f3"
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec b/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
index 33d91ff852..f90bca3b7a 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_hash_index.spec
@@ -1,8 +1,12 @@
 # This test is like sto_using_select, except that we test access via a
-# hash index.
+# hash index.  Explicit vacuuming is required in this version because
+# there is are no incidental calls to heap_page_prune_opt() that can
+# call SetOldSnapshotThresholdTimestamp().
 
 setup
 {
+	CREATE EXTENSION IF NOT EXISTS test_sto;
+	SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
     CREATE INDEX idx_sto1 ON sto1 USING HASH (c);
@@ -15,6 +19,7 @@ setup
 teardown
 {
     DROP TABLE sto1;
+	SELECT test_sto_reset_all_state();
 }
 
 session "s1"
@@ -22,10 +27,18 @@ setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "noseq"	{ SET enable_seqscan = false; }
 step "s1f1"		{ SELECT c FROM sto1 where c = 1000; }
 step "s1f2"		{ SELECT c FROM sto1 where c = 1001; }
+step "s1f3"		{ SELECT c FROM sto1 where c = 1001; }
 teardown		{ ROLLBACK; }
 
 session "s2"
-step "s2sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1000; }
+step "s2v1"		{ VACUUM sto1; }
+step "s2v2"		{ VACUUM sto1; }
 
-permutation "noseq" "s1f1" "s2sleep" "s2u" "s1f2"
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t10"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z'); }
+step "t22"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:22:00Z'); }
+
+# snapshot too old at t22
+permutation "t00" "noseq" "s1f1" "t10" "s2u" "s2v1" "s1f2" "t22" "s2v2" "s1f3"
diff --git a/src/test/modules/snapshot_too_old/specs/sto_using_select.spec b/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
index d7c34f3d89..c7917d1b0a 100644
--- a/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
+++ b/src/test/modules/snapshot_too_old/specs/sto_using_select.spec
@@ -1,19 +1,15 @@
 # This test provokes a "snapshot too old" error using SELECT statements.
 #
-# The sleep is needed because with a threshold of zero a statement could error
-# on changes it made.  With more normal settings no external delay is needed,
-# but we don't want these tests to run long enough to see that, since
-# granularity is in minutes.
-#
-# Since results depend on the value of old_snapshot_threshold, sneak that into
-# the line generated by the sleep, so that a surprising values isn't so hard
-# to identify.
+# Expects old_snapshot_threshold = 10.  Not suitable for installcheck since it
+# messes with internal snapmgr.c state.
 
 setup
 {
+	CREATE EXTENSION IF NOT EXISTS test_sto;
+	CREATE EXTENSION IF NOT EXISTS pg_visibility;
+	SELECT test_sto_reset_all_state();
     CREATE TABLE sto1 (c int NOT NULL);
     INSERT INTO sto1 SELECT generate_series(1, 1000);
-    CREATE TABLE sto2 (c int NOT NULL);
 }
 setup
 {
@@ -22,15 +18,47 @@ setup
 
 teardown
 {
-    DROP TABLE sto1, sto2;
+	DROP TABLE sto1;
+    SELECT test_sto_reset_all_state();
 }
 
 session "s1"
 setup			{ BEGIN ISOLATION LEVEL REPEATABLE READ; }
 step "s1f1"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
-step "s1sleep"	{ SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold'; }
 step "s1f2"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
+step "s1f3"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
+step "s1f4"		{ SELECT c FROM sto1 ORDER BY c LIMIT 1; }
 teardown		{ COMMIT; }
 
 session "s2"
+step "s2vis1"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
 step "s2u"		{ UPDATE sto1 SET c = 1001 WHERE c = 1; }
+step "s2vis2"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
+step "s2vac1"	{ VACUUM sto1; }
+step "s2vis3"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
+step "s2vac2"	{ VACUUM sto1; }
+step "s2vis4"	{ SELECT EVERY(all_visible) FROM pg_visibility_map('sto1'::regclass); }
+
+session "time"
+step "t00"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:00:00Z'); }
+step "t01"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:01:00Z'); }
+step "t10"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:10:00Z'); }
+step "t12"		{ SELECT test_sto_clobber_snapshot_timestamp('3000-01-01 00:12:00Z'); }
+
+# If there's an update, we get a snapshot too old error at time 00:12, and
+# VACUUM is allowed to remove the tuple our snapshot could see, which we know
+# because we see that the relation becomes all visible.  The earlier VACUUMs
+# were unable to remove the tuple we could see, which is is obvious because we
+# can see the row with value 1, and from the relation not being all visible
+# after the VACUUM.
+permutation "t00" "s2vis1" "s1f1" "t01" "s2u" "s2vis2" "s1f2" "t10" "s2vac1" "s2vis3" "s1f3" "t12" "s1f4" "s2vac2" "s2vis4"
+
+# Almost the same schedule, but this time we'll put s2vac2 and s2vis4 before
+# s1f4 just to demonstrate that the early pruning is allowed before the error
+# aborts s1's transaction.
+permutation "t00" "s2vis1" "s1f1" "t01" "s2u" "s2vis2" "s1f2" "t10" "s2vac1" "s2vis3" "s1f3" "t12" "s2vac2" "s2vis4" "s1f4"
+
+# If we run the same schedule as above but without the update, we get no
+# snapshot too old error (even though our snapshot is older than the
+# threshold), and the relation remains all visible.
+permutation "t00" "s2vis1" "s1f1" "t01"       "s2vis2" "s1f2" "t10" "s2vac1" "s2vis3" "s1f3" "t12" "s2vac2" "s2vis4" "s1f4"
diff --git a/src/test/modules/snapshot_too_old/sto.conf b/src/test/modules/snapshot_too_old/sto.conf
index 7eeaeeb0dc..5ed46b3560 100644
--- a/src/test/modules/snapshot_too_old/sto.conf
+++ b/src/test/modules/snapshot_too_old/sto.conf
@@ -1,2 +1,2 @@
 autovacuum = off
-old_snapshot_threshold = 0
+old_snapshot_threshold = 10
diff --git a/src/test/modules/snapshot_too_old/test_sto--1.0.sql b/src/test/modules/snapshot_too_old/test_sto--1.0.sql
new file mode 100644
index 0000000000..c10afcf23a
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto--1.0.sql
@@ -0,0 +1,14 @@
+/* src/test/modules/snapshot_too_old/test_sto--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_sto" to load this file. \quit
+
+CREATE FUNCTION test_sto_clobber_snapshot_timestamp(now timestamptz)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'test_sto_clobber_snapshot_timestamp'
+LANGUAGE C STRICT;
+
+CREATE FUNCTION test_sto_reset_all_state()
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'test_sto_reset_all_state'
+LANGUAGE C STRICT;
diff --git a/src/test/modules/snapshot_too_old/test_sto.c b/src/test/modules/snapshot_too_old/test_sto.c
new file mode 100644
index 0000000000..f6c9a1a000
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto.c
@@ -0,0 +1,74 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_sto.c
+ *	  Functions to support isolation tests for snapshot too old.
+ *
+ * These functions are not intended for use in a production database and
+ * could cause corruption.
+ *
+ * Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *	  src/test/modules/snapshot_too_old/test_sto.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "storage/lwlock.h"
+#include "utils/old_snapshot.h"
+#include "utils/snapmgr.h"
+#include "utils/timestamp.h"
+
+PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(test_sto_reset_all_state);
+PG_FUNCTION_INFO_V1(test_sto_clobber_snapshot_timestamp);
+
+/*
+ * Revert to initial state.  This is not safe except in carefully
+ * controlled tests.
+ */
+Datum
+test_sto_reset_all_state(PG_FUNCTION_ARGS)
+{
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+	oldSnapshotControl->count_used = 0;
+	oldSnapshotControl->current_timestamp = 0;
+	oldSnapshotControl->head_offset = 0;
+	oldSnapshotControl->head_timestamp = 0;
+	LWLockRelease(OldSnapshotTimeMapLock);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_latest_xmin);
+	oldSnapshotControl->latest_xmin = InvalidTransactionId;
+	oldSnapshotControl->next_map_update = 0;
+	SpinLockRelease(&oldSnapshotControl->mutex_latest_xmin);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_current);
+	oldSnapshotControl->current_timestamp = 0;
+	SpinLockRelease(&oldSnapshotControl->mutex_current);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
+	oldSnapshotControl->threshold_timestamp = 0;
+	oldSnapshotControl->threshold_xid = InvalidTransactionId;
+	SpinLockRelease(&oldSnapshotControl->mutex_threshold);
+
+	PG_RETURN_NULL();
+}
+
+/*
+ * Update the minimum time used in snapshot-too-old code.  If set ahead of the
+ * current wall clock time (for example, the year 3000), this allows testing
+ * with arbitrary times.  This is not safe except in carefully controlled
+ * tests.
+ */
+Datum
+test_sto_clobber_snapshot_timestamp(PG_FUNCTION_ARGS)
+{
+	TimestampTz new_current_timestamp = PG_GETARG_TIMESTAMPTZ(0);
+
+	SpinLockAcquire(&oldSnapshotControl->mutex_current);
+	oldSnapshotControl->current_timestamp = new_current_timestamp;
+	SpinLockRelease(&oldSnapshotControl->mutex_current);
+
+	PG_RETURN_NULL();
+}
diff --git a/src/test/modules/snapshot_too_old/test_sto.control b/src/test/modules/snapshot_too_old/test_sto.control
new file mode 100644
index 0000000000..e497e5450e
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/test_sto.control
@@ -0,0 +1,5 @@
+# test_sto test module
+comment = 'functions for internal testing of snapshot too old errors'
+default_version = '1.0'
+module_pathname = '$libdir/test_sto'
+relocatable = true
-- 
2.20.1

v8-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchtext/x-patch; charset=US-ASCII; name=v8-0005-Truncate-snapshot-too-old-time-map-when-CLOG-is-t.patchDownload

From 3a1609049980c3f0fc6a7dcf56e38c681c0c167e Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 20 Apr 2020 17:05:42 +1200
Subject: [PATCH v8 5/6] Truncate snapshot-too-old time map when CLOG is
 truncated.

It's not safe to leave xids in the map that have wrapped around,
although it's probably very hard to actually reach that state.

Reported-by: Andres Freund
---
 src/backend/commands/vacuum.c                 |   3 +
 src/backend/utils/time/snapmgr.c              |  21 ++++
 src/include/utils/snapmgr.h                   |   1 +
 src/test/modules/snapshot_too_old/Makefile    |   4 +-
 .../snapshot_too_old/t/001_truncate.pl        | 100 ++++++++++++++++++
 5 files changed, 128 insertions(+), 1 deletion(-)
 create mode 100644 src/test/modules/snapshot_too_old/t/001_truncate.pl

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 22228f5684..459c9126fc 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1645,6 +1645,9 @@ vac_truncate_clog(TransactionId frozenXID,
 	 */
 	AdvanceOldestCommitTsXid(frozenXID);
 
+	/* Make sure snapshot_too_old drops old XIDs. */
+	TruncateOldSnapshotTimeMapping(frozenXID);
+
 	/*
 	 * Truncate CLOG, multixact and CommitTs to the oldest computed value.
 	 */
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 7fe283bf29..0237e62f7d 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1995,6 +1995,27 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 }
 
 
+/*
+ * Remove old xids from the timing map, so the CLOG can be truncated.
+ */
+void
+TruncateOldSnapshotTimeMapping(TransactionId frozenXID)
+{
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_EXCLUSIVE);
+	while (oldSnapshotControl->count_used > 0 &&
+		   TransactionIdPrecedes(oldSnapshotControl->xid_by_minute[oldSnapshotControl->head_offset],
+								 frozenXID))
+	{
+		oldSnapshotControl->head_timestamp += USECS_PER_MINUTE;
+		oldSnapshotControl->head_offset =
+			(oldSnapshotControl->head_offset + 1) %
+			OLD_SNAPSHOT_TIME_MAP_ENTRIES;
+		oldSnapshotControl->count_used--;
+	}
+	LWLockRelease(OldSnapshotTimeMapLock);
+}
+
+
 /*
  * Setup a snapshot that replaces normal catalog snapshots that allows catalog
  * access to behave just like it did at a certain point in the past.
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index b6b403e293..4560f1f03b 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -141,6 +141,7 @@ extern bool TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
 extern void SetOldSnapshotThresholdTimestamp(TimestampTz ts, TransactionId xlimit);
 extern void MaintainOldSnapshotTimeMapping(TimestampTz whenTaken,
 										   TransactionId xmin);
+extern void TruncateOldSnapshotTimeMapping(TransactionId frozenXID);
 
 extern char *ExportSnapshot(Snapshot snapshot);
 
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index 81836e9953..f919944984 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -9,7 +9,7 @@ EXTENSION = test_sto
 DATA = test_sto--1.0.sql
 PGDESCFILE = "test_sto -- internal testing for snapshot too old errors"
 
-EXTRA_INSTALL = contrib/pg_visibility
+EXTRA_INSTALL = contrib/pg_visibility contrib/old_snapshot
 
 ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index
 ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf
@@ -19,6 +19,8 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/s
 # because it'd be dangerous on a production system.
 NO_INSTALLCHECK = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/src/test/modules/snapshot_too_old/t/001_truncate.pl b/src/test/modules/snapshot_too_old/t/001_truncate.pl
new file mode 100644
index 0000000000..afcca232f2
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/t/001_truncate.pl
@@ -0,0 +1,100 @@
+# Test truncation of the old snapshot time mapping, to check
+# that we can't get into trouble when xids wrap around.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('master');
+$node->init;
+$node->append_conf("postgresql.conf", "timezone = UTC");
+$node->append_conf("postgresql.conf", "old_snapshot_threshold=10");
+$node->append_conf("postgresql.conf", "max_prepared_transactions=10");
+$node->append_conf("postgresql.conf", "autovacuum=off");
+$node->start;
+$node->psql('postgres', 'update pg_database set datallowconn = true');
+$node->psql('postgres', 'create extension old_snapshot');
+$node->psql('postgres', 'create extension test_sto');
+
+note "check time map is truncated when CLOG is";
+
+sub set_time
+{
+	my $time = shift;
+	$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('$time')");
+}
+
+sub advance_xid
+{
+	my $time = shift;
+	$node->psql('postgres', "select pg_current_xact_id()");
+}
+
+sub summarize_mapping
+{
+	my $out;
+	$node->psql('postgres',
+				"select count(*),
+						to_char(min(end_timestamp), 'HH24:MI:SS'),
+						to_char(max(end_timestamp), 'HH24:MI:SS')
+						from pg_old_snapshot_time_mapping()",
+				stdout => \$out);
+	return $out;
+}
+
+sub vacuum_freeze_all
+{
+	$node->psql('postgres', 'vacuum freeze');
+	$node->psql('template0', 'vacuum freeze');
+	$node->psql('template1', 'vacuum freeze');
+}
+
+# build up a time map with 4 entries
+set_time('3000-01-01 00:00:00Z');
+advance_xid();
+set_time('3000-01-01 00:01:00Z');
+advance_xid();
+set_time('3000-01-01 00:02:00Z');
+advance_xid();
+set_time('3000-01-01 00:03:00Z');
+advance_xid();
+is(summarize_mapping(), "4|00:00:00|00:03:00");
+
+# now cause frozen XID to advance
+vacuum_freeze_all();
+
+# we expect all XIDs to have been truncated
+is(summarize_mapping(), "0||");
+
+# put two more in the map
+set_time('3000-01-01 00:04:00Z');
+advance_xid();
+set_time('3000-01-01 00:05:00Z');
+advance_xid();
+is(summarize_mapping(), "2|00:04:00|00:05:00");
+
+# prepare a transaction, to stop xmin from getting further ahead
+$node->psql('postgres', "begin; select pg_current_xact_id(); prepare transaction 'tx1'");
+
+# add 16 more minutes; we should now have 18
+set_time('3000-01-01 00:21:00Z');
+advance_xid();
+is(summarize_mapping(), "18|00:04:00|00:21:00");
+
+# now cause frozen XID to advance
+vacuum_freeze_all();
+
+# this should leave just 16, because 2 were truncated
+is(summarize_mapping(), "16|00:06:00|00:21:00");
+
+# commit tx1, and then freeze again to get rid of all of them
+$node->psql('postgres', "commit prepared 'tx1'");
+
+# now cause frozen XID to advance
+vacuum_freeze_all();
+
+# we should now be back to empty
+is(summarize_mapping(), "0||");
+
+$node->stop;
-- 
2.20.1

v8-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patchtext/x-patch; charset=US-ASCII; name=v8-0006-Add-TAP-test-for-snapshot-too-old-time-map-mainte.patchDownload

From 36236bf3684028503a41855167123bb7269459f8 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 21 Apr 2020 20:48:20 +1200
Subject: [PATCH v8 6/6] Add TAP test for snapshot too old time map
 maintenance.

---
 .../t/002_xid_map_maintenance.pl              | 63 +++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100644 src/test/modules/snapshot_too_old/t/002_xid_map_maintenance.pl

diff --git a/src/test/modules/snapshot_too_old/t/002_xid_map_maintenance.pl b/src/test/modules/snapshot_too_old/t/002_xid_map_maintenance.pl
new file mode 100644
index 0000000000..eddd0ce5ae
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/t/002_xid_map_maintenance.pl
@@ -0,0 +1,63 @@
+# Test xid various time/xid map maintenance edge cases
+# that were historically buggy.
+
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 8;
+
+my $node = get_new_node('master');
+$node->init;
+$node->append_conf("postgresql.conf", "timezone = UTC");
+$node->append_conf("postgresql.conf", "old_snapshot_threshold=10");
+$node->append_conf("postgresql.conf", "autovacuum = off");
+$node->start;
+$node->psql('postgres', 'create extension test_sto');
+$node->psql('postgres', 'create extension old_snapshot');
+
+sub set_time
+{
+	my $time = shift;
+	$node->psql('postgres', "select test_sto_clobber_snapshot_timestamp('$time')");
+}
+
+sub summarize_mapping
+{
+	my $out;
+	$node->psql('postgres',
+				"select count(*),
+						to_char(min(end_timestamp), 'HH24:MI:SS'),
+						to_char(max(end_timestamp), 'HH24:MI:SS')
+						from pg_old_snapshot_time_mapping()",
+				stdout => \$out);
+	return $out;
+}
+
+# fill the map up to maximum capacity
+set_time('3000-01-01 00:00:00Z');
+set_time('3000-01-01 00:19:00Z');
+is(summarize_mapping(), "20|00:00:00|00:19:00");
+
+# make a jump larger than capacity; the mapping is blown away,
+# and our new minute is now the only one
+set_time('3000-01-01 02:00:00Z');
+is(summarize_mapping(), "1|02:00:00|02:00:00");
+
+# test adding minutes while the map is not full
+set_time('3000-01-01 02:01:00Z');
+is(summarize_mapping(), "2|02:00:00|02:01:00");
+set_time('3000-01-01 02:05:00Z');
+is(summarize_mapping(), "6|02:00:00|02:05:00");
+set_time('3000-01-01 02:19:00Z');
+is(summarize_mapping(), "20|02:00:00|02:19:00");
+
+# test adding minutes while the map is full
+set_time('3000-01-01 02:20:00Z');
+is(summarize_mapping(), "20|02:01:00|02:20:00");
+set_time('3000-01-01 02:22:00Z');
+is(summarize_mapping(), "20|02:03:00|02:22:00");
+set_time('3000-01-01 02:22:01Z'); # one second past
+is(summarize_mapping(), "20|02:04:00|02:23:00");
+
+$node->stop;
-- 
2.20.1

#23

Michael Paquier

michael@paquier.xyz

over 5 years ago

In reply to: Thomas Munro (#22)

Re: fixing old_snapshot_threshold's time->xid mapping

On Sat, Aug 15, 2020 at 10:09:15AM +1200, Thomas Munro wrote:

And ... now that this has a commitfest entry, cfbot told me about a
small problem in a makefile. Third time lucky?

Still lucky since then, and the CF bot does not complain. So... The
meat of the patch is in 0003 which is fixing an actual bug. Robert,
Thomas, anything specific you are waiting for here? As this is a bug
fix, perhaps it would be better to just move on with some portions of
the set?

Kevin, I really think that you should chime in here. This is
originally your feature.
--
Michael

#24

robertmhaas@gmail.com

over 5 years ago

In reply to: Michael Paquier (#23)

Re: fixing old_snapshot_threshold's time->xid mapping

On Thu, Sep 17, 2020 at 1:47 AM Michael Paquier <michael@paquier.xyz> wrote:

On Sat, Aug 15, 2020 at 10:09:15AM +1200, Thomas Munro wrote:

And ... now that this has a commitfest entry, cfbot told me about a
small problem in a makefile. Third time lucky?

Still lucky since then, and the CF bot does not complain. So... The
meat of the patch is in 0003 which is fixing an actual bug. Robert,
Thomas, anything specific you are waiting for here? As this is a bug
fix, perhaps it would be better to just move on with some portions of
the set?

Yeah, I plan to push forward with 0001 through 0003 soon, but 0001
needs to be revised with a PGDLLIMPORT marking, I think, and 0002
needs documentation. Not sure whether there's going to be adequate
support for back-patching given that it's adding a new contrib module
for observability and not just fixing a bug, so my tentative plan is
to just push into master. If there is a great clamor for back-patching
then I can, but I'm not very excited about pushing the bug fix into
the back-branches without the observability stuff, because then if
somebody claims that it's not working properly, it'll be almost
impossible to understand why.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#25

robertmhaas@gmail.com

over 5 years ago

In reply to: Robert Haas (#24)

3 attachment(s)

Re: fixing old_snapshot_threshold's time->xid mapping

On Thu, Sep 17, 2020 at 10:40 AM Robert Haas <robertmhaas@gmail.com> wrote:

Yeah, I plan to push forward with 0001 through 0003 soon, but 0001
needs to be revised with a PGDLLIMPORT marking, I think, and 0002
needs documentation.

So here's an updated version of those three, with proposed commit
messages, a PGDLLIMPORT for 0001, and docs for 0002.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

v9-0003-Fix-two-bugs-in-MaintainOldSnapshotTimeMapping.patchapplication/octet-stream; name=v9-0003-Fix-two-bugs-in-MaintainOldSnapshotTimeMapping.patchDownload

From 61501c5a8dfd8214451f6c1bf90db7d52a426f39 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 12:15:57 -0400
Subject: [PATCH v9 3/3] Fix two bugs in MaintainOldSnapshotTimeMapping.

The previous coding was confused about whether head_timestamp was
intended to represent the timestamp for the newest bucket in the
mapping or the oldest timestamp for the oldest bucket in the mapping.
Decide that it's intended to be the oldest one, and repair
accordingly.

To do that, we need to do two things. First, when advancing to a
new bucket, don't categorically set head_timestamp to the new
timestamp. Do this only if we're blowing out the map completely
because a lot of time has passed since we last maintained it. If
we're replacing entries one by one, advance head_timestamp by
1 minute for each; if we're filling in unused entries, don't
advance head_timestamp at all.

Second, fix the computation of how many buckets we need to advance.
The previous formula would be correct if head_timestamp were the
timestamp for the new bucket, but we're now making all the code
agree that it's the timestamp for the oldest bucket, so adjust the
formula accordingly.

Patch by me, reviewed by Thomas Munro and Dilip Kumar.

Discussion: http://postgr.es/m/CA+TgmoY=aqf0zjTD+3dUWYkgMiNDegDLFjo+6ze=Wtpik+3XqA@mail.gmail.com
---
 src/backend/utils/time/snapmgr.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 650c2aa815..8c41483e87 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1949,10 +1949,32 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 	else
 	{
 		/* We need a new bucket, but it might not be the very next one. */
-		int			advance = ((ts - oldSnapshotControl->head_timestamp)
-							   / USECS_PER_MINUTE);
+		int			distance_to_new_tail;
+		int			distance_to_current_tail;
+		int			advance;
 
-		oldSnapshotControl->head_timestamp = ts;
+		/*
+		 * Our goal is for the new "tail" of the mapping, that is, the entry
+		 * which is newest and thus furthest from the "head" entry, to
+		 * correspond to "ts". Since there's one entry per minute, the
+		 * distance between the current head and the new tail is just the
+		 * number of minutes of difference between ts and the current
+		 * head_timestamp.
+		 *
+		 * The distance from the current head to the current tail is one
+		 * less than the number of entries in the mapping, because the
+		 * entry at the head_offset is for 0 minutes after head_timestamp.
+		 *
+		 * The difference between these two values is the number of minutes
+		 * by which we need to advance the mapping, either adding new entries
+		 * or rotating old ones out.
+		 */
+		distance_to_new_tail =
+			(ts - oldSnapshotControl->head_timestamp) / USECS_PER_MINUTE;
+		distance_to_current_tail =
+			oldSnapshotControl->count_used - 1;
+		advance = distance_to_new_tail - distance_to_current_tail;
+		Assert(advance > 0);
 
 		if (advance >= OLD_SNAPSHOT_TIME_MAP_ENTRIES)
 		{
@@ -1960,6 +1982,7 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 			oldSnapshotControl->head_offset = 0;
 			oldSnapshotControl->count_used = 1;
 			oldSnapshotControl->xid_by_minute[0] = xmin;
+			oldSnapshotControl->head_timestamp = ts;
 		}
 		else
 		{
@@ -1978,6 +2001,7 @@ MaintainOldSnapshotTimeMapping(TimestampTz whenTaken, TransactionId xmin)
 					else
 						oldSnapshotControl->head_offset = old_head + 1;
 					oldSnapshotControl->xid_by_minute[old_head] = xmin;
+					oldSnapshotControl->head_timestamp += USECS_PER_MINUTE;
 				}
 				else
 				{
-- 
2.24.3 (Apple Git-128)

v9-0002-Add-new-old_snapshot-contrib-module.patchapplication/octet-stream; name=v9-0002-Add-new-old_snapshot-contrib-module.patchDownload

From 4236b1332ae5200bdcd57667249f0ef808f5140d Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 12:14:32 -0400
Subject: [PATCH v9 2/3] Add new 'old_snapshot' contrib module.

You can use this to view the contents of the time to XID mapping
which the server maintains when old_snapshot_threshold != -1.
Being able to view that information may be interesting for users,
and it's definitely useful for figuring out whether the mapping
is being maintained correctly. It isn't, so that will need to be
fixed in a subsequent commit.

Patch by me, reviewed by Thomas Munro and Dilip Kumar.

Discussion: http://postgr.es/m/CA+TgmoY=aqf0zjTD+3dUWYkgMiNDegDLFjo+6ze=Wtpik+3XqA@mail.gmail.com
---
 contrib/Makefile                           |   1 +
 contrib/old_snapshot/Makefile              |  22 +++
 contrib/old_snapshot/old_snapshot--1.0.sql |  14 ++
 contrib/old_snapshot/old_snapshot.control  |   5 +
 contrib/old_snapshot/time_mapping.c        | 159 +++++++++++++++++++++
 doc/src/sgml/contrib.sgml                  |   1 +
 doc/src/sgml/filelist.sgml                 |   1 +
 doc/src/sgml/oldsnapshot.sgml              |  33 +++++
 8 files changed, 236 insertions(+)
 create mode 100644 contrib/old_snapshot/Makefile
 create mode 100644 contrib/old_snapshot/old_snapshot--1.0.sql
 create mode 100644 contrib/old_snapshot/old_snapshot.control
 create mode 100644 contrib/old_snapshot/time_mapping.c
 create mode 100644 doc/src/sgml/oldsnapshot.sgml

diff --git a/contrib/Makefile b/contrib/Makefile
index c8d2a16273..7a4866e338 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -27,6 +27,7 @@ SUBDIRS = \
 		lo		\
 		ltree		\
 		oid2name	\
+		old_snapshot	\
 		pageinspect	\
 		passwordcheck	\
 		pg_buffercache	\
diff --git a/contrib/old_snapshot/Makefile b/contrib/old_snapshot/Makefile
new file mode 100644
index 0000000000..77c85df322
--- /dev/null
+++ b/contrib/old_snapshot/Makefile
@@ -0,0 +1,22 @@
+# contrib/old_snapshot/Makefile
+
+MODULE_big = old_snapshot
+OBJS = \
+	$(WIN32RES) \
+	time_mapping.o
+PG_CPPFLAGS = -I$(libpq_srcdir)
+
+EXTENSION = old_snapshot
+DATA = old_snapshot--1.0.sql
+PGFILEDESC = "old_snapshot - utilities in support of old_snapshot_threshold"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/old_snapshot
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/old_snapshot/old_snapshot--1.0.sql b/contrib/old_snapshot/old_snapshot--1.0.sql
new file mode 100644
index 0000000000..9ebb8829e3
--- /dev/null
+++ b/contrib/old_snapshot/old_snapshot--1.0.sql
@@ -0,0 +1,14 @@
+/* contrib/old_snapshot/old_snapshot--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION old_snapshot" to load this file. \quit
+
+-- Show visibility map and page-level visibility information for each block.
+CREATE FUNCTION pg_old_snapshot_time_mapping(array_offset OUT int4,
+											 end_timestamp OUT timestamptz,
+											 newest_xmin OUT xid)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_old_snapshot_time_mapping'
+LANGUAGE C STRICT;
+
+-- XXX. Do we want REVOKE commands here?
diff --git a/contrib/old_snapshot/old_snapshot.control b/contrib/old_snapshot/old_snapshot.control
new file mode 100644
index 0000000000..491eec536c
--- /dev/null
+++ b/contrib/old_snapshot/old_snapshot.control
@@ -0,0 +1,5 @@
+# old_snapshot extension
+comment = 'utilities in support of old_snapshot_threshold'
+default_version = '1.0'
+module_pathname = '$libdir/old_snapshot'
+relocatable = true
diff --git a/contrib/old_snapshot/time_mapping.c b/contrib/old_snapshot/time_mapping.c
new file mode 100644
index 0000000000..37e0055a00
--- /dev/null
+++ b/contrib/old_snapshot/time_mapping.c
@@ -0,0 +1,159 @@
+/*-------------------------------------------------------------------------
+ *
+ * time_mapping.c
+ *	  time to XID mapping information
+ *
+ * Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ *	  contrib/old_snapshot/time_mapping.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "funcapi.h"
+#include "storage/lwlock.h"
+#include "utils/old_snapshot.h"
+#include "utils/snapmgr.h"
+#include "utils/timestamp.h"
+
+/*
+ * Backend-private copy of the information from oldSnapshotControl which relates
+ * to the time to XID mapping, plus an index so that we can iterate.
+ *
+ * Note that the length of the xid_by_minute array is given by
+ * OLD_SNAPSHOT_TIME_MAP_ENTRIES (which is not a compile-time constant).
+ */
+typedef struct
+{
+	int				current_index;
+	int				head_offset;
+	TimestampTz		head_timestamp;
+	int				count_used;
+	TransactionId	xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
+} OldSnapshotTimeMapping;
+
+#define NUM_TIME_MAPPING_COLUMNS 3
+
+PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(pg_old_snapshot_time_mapping);
+
+static OldSnapshotTimeMapping *GetOldSnapshotTimeMapping(void);
+static TupleDesc MakeOldSnapshotTimeMappingTupleDesc(void);
+static HeapTuple MakeOldSnapshotTimeMappingTuple(TupleDesc tupdesc,
+												 OldSnapshotTimeMapping *mapping);
+
+/*
+ * SQL-callable set-returning function.
+ */
+Datum
+pg_old_snapshot_time_mapping(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+	OldSnapshotTimeMapping *mapping;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		MemoryContext	oldcontext;
+
+		funcctx = SRF_FIRSTCALL_INIT();
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+		mapping = GetOldSnapshotTimeMapping();
+		funcctx->user_fctx = mapping;
+		funcctx->tuple_desc = MakeOldSnapshotTimeMappingTupleDesc();
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+	mapping = (OldSnapshotTimeMapping *) funcctx->user_fctx;
+
+	while (mapping->current_index < mapping->count_used)
+	{
+		HeapTuple	tuple;
+
+		tuple = MakeOldSnapshotTimeMappingTuple(funcctx->tuple_desc, mapping);
+		++mapping->current_index;
+		SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * Get the old snapshot time mapping data from shared memory.
+ */
+static OldSnapshotTimeMapping *
+GetOldSnapshotTimeMapping(void)
+{
+	OldSnapshotTimeMapping *mapping;
+
+	mapping = palloc(offsetof(OldSnapshotTimeMapping, xid_by_minute)
+					 + sizeof(TransactionId) * OLD_SNAPSHOT_TIME_MAP_ENTRIES);
+	mapping->current_index = 0;
+
+	LWLockAcquire(OldSnapshotTimeMapLock, LW_SHARED);
+	mapping->head_offset = oldSnapshotControl->head_offset;
+	mapping->head_timestamp = oldSnapshotControl->head_timestamp;
+	mapping->count_used = oldSnapshotControl->count_used;
+	for (int i = 0; i < OLD_SNAPSHOT_TIME_MAP_ENTRIES; ++i)
+		mapping->xid_by_minute[i] = oldSnapshotControl->xid_by_minute[i];
+	LWLockRelease(OldSnapshotTimeMapLock);
+
+	return mapping;
+}
+
+/*
+ * Build a tuple descriptor for the pg_old_snapshot_time_mapping() SRF.
+ */
+static TupleDesc
+MakeOldSnapshotTimeMappingTupleDesc(void)
+{
+	TupleDesc	tupdesc;
+
+	tupdesc = CreateTemplateTupleDesc(NUM_TIME_MAPPING_COLUMNS);
+
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "array_offset",
+					   INT4OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "end_timestamp",
+					   TIMESTAMPTZOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "newest_xmin",
+					   XIDOID, -1, 0);
+
+	return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * Convert one entry from the old snapshot time mapping to a HeapTuple.
+ */
+static HeapTuple
+MakeOldSnapshotTimeMappingTuple(TupleDesc tupdesc, OldSnapshotTimeMapping *mapping)
+{
+	Datum	values[NUM_TIME_MAPPING_COLUMNS];
+	bool	nulls[NUM_TIME_MAPPING_COLUMNS];
+	int		array_position;
+	TimestampTz	timestamp;
+
+	/*
+	 * Figure out the array position corresponding to the current index.
+	 *
+	 * Index 0 means the oldest entry in the mapping, which is stored at
+	 * mapping->head_offset. Index 1 means the next-oldest entry, which is a the
+	 * following index, and so on. We wrap around when we reach the end of the array.
+	 */
+	array_position = (mapping->head_offset + mapping->current_index)
+		% OLD_SNAPSHOT_TIME_MAP_ENTRIES;
+
+	/*
+	 * No explicit timestamp is stored for any entry other than the oldest one,
+	 * but each entry corresponds to 1-minute period, so we can just add.
+	 */
+	timestamp = TimestampTzPlusMilliseconds(mapping->head_timestamp,
+											mapping->current_index * 60000);
+
+	/* Initialize nulls and values arrays. */
+	memset(nulls, 0, sizeof(nulls));
+	values[0] = Int32GetDatum(array_position);
+	values[1] = TimestampTzGetDatum(timestamp);
+	values[2] = TransactionIdGetDatum(mapping->xid_by_minute[array_position]);
+
+	return heap_form_tuple(tupdesc, values, nulls);
+}
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index c82dde2726..4e833d79ef 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -116,6 +116,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
  &isn;
  &lo;
  &ltree;
+ &oldsnapshot;
  &pageinspect;
  &passwordcheck;
  &pgbuffercache;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 828396d4a9..47271addc1 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -129,6 +129,7 @@
 <!ENTITY lo              SYSTEM "lo.sgml">
 <!ENTITY ltree           SYSTEM "ltree.sgml">
 <!ENTITY oid2name        SYSTEM "oid2name.sgml">
+<!ENTITY oldsnapshot     SYSTEM "oldsnapshot.sgml">
 <!ENTITY pageinspect     SYSTEM "pageinspect.sgml">
 <!ENTITY passwordcheck   SYSTEM "passwordcheck.sgml">
 <!ENTITY pgbuffercache   SYSTEM "pgbuffercache.sgml">
diff --git a/doc/src/sgml/oldsnapshot.sgml b/doc/src/sgml/oldsnapshot.sgml
new file mode 100644
index 0000000000..a665ae72e7
--- /dev/null
+++ b/doc/src/sgml/oldsnapshot.sgml
@@ -0,0 +1,33 @@
+<!-- doc/src/sgml/oldsnapshot.sgml -->
+
+<sect1 id="oldsnapshot" xreflabel="old_snapshot">
+ <title>old_snapshot</title>
+
+ <indexterm zone="oldsnapshot">
+  <primary>old_snapshot</primary>
+ </indexterm>
+
+ <para>
+  The <filename>old_snapshot</filename> module allows inspection
+  of the server state that is used to implement
+  <xref linkend="guc-old-snapshot-threshold" />.
+ </para>
+
+ <sect2>
+  <title>Functions</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><function>pg_old_snapshot_time_mapping(array_offset OUT int4, end_timestamp OUT timestamptz, newest_xmin OUT xid) returns setof record</function></term>
+    <listitem>
+     <para>
+      Returns all of the entries in the server's timestamp to XID mapping.
+      Each entry represents the newest xmin of any snapshot taken in the
+      corresponding minute.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </sect2>
+
+</sect1>
-- 
2.24.3 (Apple Git-128)

v9-0001-Expose-oldSnapshotControl-definition-via-new-head.patchapplication/octet-stream; name=v9-0001-Expose-oldSnapshotControl-definition-via-new-head.patchDownload

From 7f7c11c562541c04e525ca64650c6f15a33bd778 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 16 Apr 2020 09:37:31 -0400
Subject: [PATCH v9 1/3] Expose oldSnapshotControl definition via new header.

This makes it possible for code outside snapmgr.c to examine the
contents of this data structure. This commit does not add any code
which actually does so; a subsequent commit will make that change.

Patch by me, reviewed by Thomas Munro and Dilip Kumar.

Discussion: http://postgr.es/m/CA+TgmoY=aqf0zjTD+3dUWYkgMiNDegDLFjo+6ze=Wtpik+3XqA@mail.gmail.com
---
 src/backend/utils/time/snapmgr.c | 55 +----------------------
 src/include/utils/old_snapshot.h | 75 ++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+), 53 deletions(-)
 create mode 100644 src/include/utils/old_snapshot.h

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index ed7f5239a0..650c2aa815 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -64,6 +64,7 @@
 #include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/old_snapshot.h"
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/snapmgr.h"
@@ -76,59 +77,7 @@
  */
 int			old_snapshot_threshold; /* number of minutes, -1 disables */
 
-/*
- * Structure for dealing with old_snapshot_threshold implementation.
- */
-typedef struct OldSnapshotControlData
-{
-	/*
-	 * Variables for old snapshot handling are shared among processes and are
-	 * only allowed to move forward.
-	 */
-	slock_t		mutex_current;	/* protect current_timestamp */
-	TimestampTz current_timestamp;	/* latest snapshot timestamp */
-	slock_t		mutex_latest_xmin;	/* protect latest_xmin and next_map_update */
-	TransactionId latest_xmin;	/* latest snapshot xmin */
-	TimestampTz next_map_update;	/* latest snapshot valid up to */
-	slock_t		mutex_threshold;	/* protect threshold fields */
-	TimestampTz threshold_timestamp;	/* earlier snapshot is old */
-	TransactionId threshold_xid;	/* earlier xid may be gone */
-
-	/*
-	 * Keep one xid per minute for old snapshot error handling.
-	 *
-	 * Use a circular buffer with a head offset, a count of entries currently
-	 * used, and a timestamp corresponding to the xid at the head offset.  A
-	 * count_used value of zero means that there are no times stored; a
-	 * count_used value of OLD_SNAPSHOT_TIME_MAP_ENTRIES means that the buffer
-	 * is full and the head must be advanced to add new entries.  Use
-	 * timestamps aligned to minute boundaries, since that seems less
-	 * surprising than aligning based on the first usage timestamp.  The
-	 * latest bucket is effectively stored within latest_xmin.  The circular
-	 * buffer is updated when we get a new xmin value that doesn't fall into
-	 * the same interval.
-	 *
-	 * It is OK if the xid for a given time slot is from earlier than
-	 * calculated by adding the number of minutes corresponding to the
-	 * (possibly wrapped) distance from the head offset to the time of the
-	 * head entry, since that just results in the vacuuming of old tuples
-	 * being slightly less aggressive.  It would not be OK for it to be off in
-	 * the other direction, since it might result in vacuuming tuples that are
-	 * still expected to be there.
-	 *
-	 * Use of an SLRU was considered but not chosen because it is more
-	 * heavyweight than is needed for this, and would probably not be any less
-	 * code to implement.
-	 *
-	 * Persistence is not needed.
-	 */
-	int			head_offset;	/* subscript of oldest tracked time */
-	TimestampTz head_timestamp; /* time corresponding to head xid */
-	int			count_used;		/* how many slots are in use */
-	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
-} OldSnapshotControlData;
-
-static volatile OldSnapshotControlData *oldSnapshotControl;
+volatile OldSnapshotControlData *oldSnapshotControl;
 
 
 /*
diff --git a/src/include/utils/old_snapshot.h b/src/include/utils/old_snapshot.h
new file mode 100644
index 0000000000..e6da1833a6
--- /dev/null
+++ b/src/include/utils/old_snapshot.h
@@ -0,0 +1,75 @@
+/*-------------------------------------------------------------------------
+ *
+ * old_snapshot.h
+ *		Data structures for 'snapshot too old'
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/old_snapshot.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef OLD_SNAPSHOT_H
+#define OLD_SNAPSHOT_H
+
+#include "datatype/timestamp.h"
+#include "storage/s_lock.h"
+
+/*
+ * Structure for dealing with old_snapshot_threshold implementation.
+ */
+typedef struct OldSnapshotControlData
+{
+	/*
+	 * Variables for old snapshot handling are shared among processes and are
+	 * only allowed to move forward.
+	 */
+	slock_t		mutex_current;	/* protect current_timestamp */
+	TimestampTz current_timestamp;	/* latest snapshot timestamp */
+	slock_t		mutex_latest_xmin;	/* protect latest_xmin and next_map_update */
+	TransactionId latest_xmin;	/* latest snapshot xmin */
+	TimestampTz next_map_update;	/* latest snapshot valid up to */
+	slock_t		mutex_threshold;	/* protect threshold fields */
+	TimestampTz threshold_timestamp;	/* earlier snapshot is old */
+	TransactionId threshold_xid;	/* earlier xid may be gone */
+
+	/*
+	 * Keep one xid per minute for old snapshot error handling.
+	 *
+	 * Use a circular buffer with a head offset, a count of entries currently
+	 * used, and a timestamp corresponding to the xid at the head offset.  A
+	 * count_used value of zero means that there are no times stored; a
+	 * count_used value of OLD_SNAPSHOT_TIME_MAP_ENTRIES means that the buffer
+	 * is full and the head must be advanced to add new entries.  Use
+	 * timestamps aligned to minute boundaries, since that seems less
+	 * surprising than aligning based on the first usage timestamp.  The
+	 * latest bucket is effectively stored within latest_xmin.  The circular
+	 * buffer is updated when we get a new xmin value that doesn't fall into
+	 * the same interval.
+	 *
+	 * It is OK if the xid for a given time slot is from earlier than
+	 * calculated by adding the number of minutes corresponding to the
+	 * (possibly wrapped) distance from the head offset to the time of the
+	 * head entry, since that just results in the vacuuming of old tuples
+	 * being slightly less aggressive.  It would not be OK for it to be off in
+	 * the other direction, since it might result in vacuuming tuples that are
+	 * still expected to be there.
+	 *
+	 * Use of an SLRU was considered but not chosen because it is more
+	 * heavyweight than is needed for this, and would probably not be any less
+	 * code to implement.
+	 *
+	 * Persistence is not needed.
+	 */
+	int			head_offset;	/* subscript of oldest tracked time */
+	TimestampTz head_timestamp; /* time corresponding to head xid */
+	int			count_used;		/* how many slots are in use */
+	TransactionId xid_by_minute[FLEXIBLE_ARRAY_MEMBER];
+} OldSnapshotControlData;
+
+extern PGDLLIMPORT volatile OldSnapshotControlData *oldSnapshotControl;
+
+#endif
-- 
2.24.3 (Apple Git-128)

#26

thomas.munro@gmail.com

over 5 years ago

In reply to: Robert Haas (#25)

Re: fixing old_snapshot_threshold's time->xid mapping

On Sat, Sep 19, 2020 at 12:19 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Sep 17, 2020 at 10:40 AM Robert Haas <robertmhaas@gmail.com> wrote:

Yeah, I plan to push forward with 0001 through 0003 soon, but 0001
needs to be revised with a PGDLLIMPORT marking, I think, and 0002
needs documentation.

So here's an updated version of those three, with proposed commit
messages, a PGDLLIMPORT for 0001, and docs for 0002.

LGTM.

#27

Hamid Akhtar

hamid.akhtar@gmail.com

over 5 years ago

In reply to: Thomas Munro (#26)

Re: fixing old_snapshot_threshold's time->xid mapping

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

Patch looks good to me.

#28

robertmhaas@gmail.com

over 5 years ago

In reply to: Thomas Munro (#26)

Re: fixing old_snapshot_threshold's time->xid mapping

On Wed, Sep 23, 2020 at 9:16 PM Thomas Munro <thomas.munro@gmail.com> wrote:

LGTM.

Committed.

Thomas, with respect to your part of this patch set, I wonder if we
can make the functions that you're using to write tests safe enough
that we could add them to contrib/old_snapshot and let users run them
if they want. As you have them, they are hedged around with vague and
scary warnings, but is that really justified? And if so, can it be
fixed? It would be nicer not to end up with two loadable modules here,
and maybe the right sorts of functions could even have some practical
use.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#29

Michael Paquier

michael@paquier.xyz

over 5 years ago

In reply to: Robert Haas (#28)

Re: fixing old_snapshot_threshold's time->xid mapping

On Thu, Sep 24, 2020 at 03:46:14PM -0400, Robert Haas wrote:

Committed.

Cool, thanks.

Thomas, with respect to your part of this patch set, I wonder if we
can make the functions that you're using to write tests safe enough
that we could add them to contrib/old_snapshot and let users run them
if they want. As you have them, they are hedged around with vague and
scary warnings, but is that really justified? And if so, can it be
fixed? It would be nicer not to end up with two loadable modules here,
and maybe the right sorts of functions could even have some practical
use.

I have switched this item as waiting on author in the CF app then, as
we are not completely done yet.
--
Michael

#30