old_snapshot_threshold allows heap:toast disagreement

Started by Robert Haasover 9 years ago11 messages
#1Robert Haas
robertmhaas@gmail.com
1 attachment(s)

On Thu, Jul 21, 2016 at 12:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jul 20, 2016 at 9:15 PM, Noah Misch <noah@leadboat.com> wrote:

This PostgreSQL 9.6 open item now needs a permanent owner. Would any other
committer like to take ownership? If this role interests you, please read
this thread and the policy linked above, then send an initial status update
bearing a date for your subsequent status update. If the item does not have a
permanent owner by 2016-07-24 02:00 UTC, I will resolve the item by reverting
commit 848ef42 and followups.

I will adopt this item. I will provide an initial patch for this
issue, or convince someone else to do so, within one week. Therefore,
expect a further status update from me on or before July 28th. I
expect that the patch will be based on ideas from these emails:

/messages/by-id/1AB8F80A-D16E-4154-9497-98FBB164253D@anarazel.de
/messages/by-id/20160720181213.f4io7gc6lyc377sw@alap3.anarazel.de

Here is a patch. Please review. I'm not suffering from an
overwhelming excess of confidence that this is correct. In
particular, I'm concerned about the fact that there isn't always a
registered snapshot - if you make it ERROR out when that happens
instead of doing what it actually does, the regression tests fail in
CLUSTER and CREATE VIEW. Also, I wonder why it's right to use
pairingheap_first() instead of looking at the oldest snapshot on the
active snapshot stack, or conversely why that is right and this is
not. Or maybe we need to check both.

The basic principle of testing both MVCC and TOAST snapshots for
snapshot-too-old problems seems sound. Furthermore, it seems clear
enough that if we're ever looking up a datum in the TOAST table whose
xmin is no longer included in our MyPgXact->xmin, then we're
vulnerable to having TOAST chunks vacuumed away under us even without
snapshot_too_old enabled. But there's a bit of spooky action at a
distance: if we don't see any snapshots, then we have to assume that
the scan is being performed with a non-MVCC snapshot and therefore we
don't need to worry. But there's no way to verify that via an
assertion, because the connection between the relevant MVCC snapshot
and the corresponding TOAST snapshot is pretty indirect, mediated only
by snapmgr.c. It's a bit like a 9-1-1 operator (or whatever your
local emergency number is) saying "hey, thanks for calling. if I
don't hear back from you about this issue again, I'll just assume
everything's OK." That might actually the correct approach in some
cases, but it certainly comes with a bit of risk.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

ost-heap-toast-disagreement-v1.patchapplication/x-download; name=ost-heap-toast-disagreement-v1.patchDownload
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 4cffd21..e76148a 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -40,6 +40,7 @@
 #include "utils/expandeddatum.h"
 #include "utils/fmgroids.h"
 #include "utils/rel.h"
+#include "utils/snapmgr.h"
 #include "utils/typcache.h"
 #include "utils/tqual.h"
 
@@ -81,6 +82,7 @@ static int toast_open_indexes(Relation toastrel,
 				   int *num_indexes);
 static void toast_close_indexes(Relation *toastidxs, int num_indexes,
 					LOCKMODE lock);
+static void init_toast_snapshot(Snapshot toast_snapshot);
 
 
 /* ----------
@@ -1665,6 +1667,7 @@ toast_delete_datum(Relation rel, Datum value)
 	HeapTuple	toasttup;
 	int			num_indexes;
 	int			validIndex;
+	SnapshotData	SnapshotToast;
 
 	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
 		return;
@@ -1696,8 +1699,9 @@ toast_delete_datum(Relation rel, Datum value)
 	 * sequence or not, but since we've already locked the index we might as
 	 * well use systable_beginscan_ordered.)
 	 */
+	init_toast_snapshot(&SnapshotToast);
 	toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
-										   SnapshotToast, 1, &toastkey);
+										   &SnapshotToast, 1, &toastkey);
 	while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
 	{
 		/*
@@ -1730,6 +1734,7 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
 	int			num_indexes;
 	int			validIndex;
 	Relation   *toastidxs;
+	SnapshotData	SnapshotToast;
 
 	/* Fetch a valid index relation */
 	validIndex = toast_open_indexes(toastrel,
@@ -1748,9 +1753,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
 	/*
 	 * Is there any such chunk?
 	 */
+	init_toast_snapshot(&SnapshotToast);
 	toastscan = systable_beginscan(toastrel,
 								   RelationGetRelid(toastidxs[validIndex]),
-								   true, SnapshotToast, 1, &toastkey);
+								   true, &SnapshotToast, 1, &toastkey);
 
 	if (systable_getnext(toastscan) != NULL)
 		result = true;
@@ -1813,6 +1819,7 @@ toast_fetch_datum(struct varlena * attr)
 	int32		chunksize;
 	int			num_indexes;
 	int			validIndex;
+	SnapshotData	SnapshotToast;
 
 	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
 		elog(ERROR, "toast_fetch_datum shouldn't be called for non-ondisk datums");
@@ -1859,8 +1866,9 @@ toast_fetch_datum(struct varlena * attr)
 	 */
 	nextidx = 0;
 
+	init_toast_snapshot(&SnapshotToast);
 	toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
-										   SnapshotToast, 1, &toastkey);
+										   &SnapshotToast, 1, &toastkey);
 	while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
 	{
 		/*
@@ -1990,6 +1998,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
 	int32		chcpyend;
 	int			num_indexes;
 	int			validIndex;
+	SnapshotData	SnapshotToast;
 
 	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
 		elog(ERROR, "toast_fetch_datum_slice shouldn't be called for non-ondisk datums");
@@ -2082,9 +2091,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
 	 *
 	 * The index is on (valueid, chunkidx) so they will come in order
 	 */
+	init_toast_snapshot(&SnapshotToast);
 	nextidx = startchunk;
 	toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
-										 SnapshotToast, nscankeys, toastkey);
+										 &SnapshotToast, nscankeys, toastkey);
 	while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
 	{
 		/*
@@ -2289,3 +2299,16 @@ toast_close_indexes(Relation *toastidxs, int num_indexes, LOCKMODE lock)
 		index_close(toastidxs[i], lock);
 	pfree(toastidxs);
 }
+
+/* ----------
+ * init_toast_snapshot
+ *
+ *	Initialize an appropriate TOAST snapshot.  We must use an MVCC snapshot
+ *	to initialize the TOAST snapshot; since we don't know which one to use,
+ *	just use the oldest one.
+ */
+static void
+init_toast_snapshot(Snapshot toast_snapshot)
+{
+	InitToastSnapshot(toast_snapshot, GetMinimumSnapshot());
+}
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index e1caf01..c62e895 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -394,6 +394,25 @@ GetLatestSnapshot(void)
 }
 
 /*
+ * GetMinimumSnapshot
+ *
+ *		Get the oldest snapshot for this backend.
+ */
+Snapshot
+GetMinimumSnapshot(void)
+{
+	Snapshot	minSnapshot;
+
+	if (pairingheap_is_empty(&RegisteredSnapshots))
+		return NULL;
+
+	minSnapshot = pairingheap_container(SnapshotData, ph_node,
+									pairingheap_first(&RegisteredSnapshots));
+
+	return minSnapshot;
+}
+
+/*
  * GetCatalogSnapshot
  *		Get a snapshot that is sufficiently up-to-date for scan of the
  *		system catalog with the specified OID.
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 2960455..d99c847 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -78,7 +78,6 @@
 /* Static variables representing various special snapshot semantics */
 SnapshotData SnapshotSelfData = {HeapTupleSatisfiesSelf};
 SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
-SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};
 
 /* local functions */
 static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
diff --git a/src/include/access/xlogdefs.h b/src/include/access/xlogdefs.h
index c2c6632..ea17aeb 100644
--- a/src/include/access/xlogdefs.h
+++ b/src/include/access/xlogdefs.h
@@ -28,6 +28,8 @@ typedef uint64 XLogRecPtr;
 #define InvalidXLogRecPtr	0
 #define XLogRecPtrIsInvalid(r)	((r) == InvalidXLogRecPtr)
 
+#define MaximumXLogRecPtr	PG_UINT64_MAX
+
 /*
  * XLogSegNo - physical log file sequence number.
  */
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 3d5dea7..fcd0c75 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -279,7 +279,8 @@ TestForOldSnapshot(Snapshot snapshot, Relation relation, Page page)
 
 	if (old_snapshot_threshold >= 0
 		&& (snapshot) != NULL
-		&& (snapshot)->satisfies == HeapTupleSatisfiesMVCC
+		&& ((snapshot)->satisfies == HeapTupleSatisfiesMVCC
+			|| (snapshot)->satisfies == HeapTupleSatisfiesToast)
 		&& !XLogRecPtrIsInvalid((snapshot)->lsn)
 		&& PageGetLSN(page) > (snapshot)->lsn)
 		TestForOldSnapshot_impl(snapshot, relation);
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index e941427..027a64d 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -64,6 +64,7 @@ extern TransactionId RecentGlobalDataXmin;
 extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
+extern Snapshot GetMinimumSnapshot(void);
 
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 2445944..2507e4a 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -15,26 +15,17 @@
 #ifndef TQUAL_H
 #define TQUAL_H
 
+#include "access/xlogdefs.h"
 #include "utils/snapshot.h"
 
 
 /* Static variables representing various special snapshot semantics */
 extern PGDLLIMPORT SnapshotData SnapshotSelfData;
 extern PGDLLIMPORT SnapshotData SnapshotAnyData;
-extern PGDLLIMPORT SnapshotData SnapshotToastData;
 extern PGDLLIMPORT SnapshotData CatalogSnapshotData;
 
 #define SnapshotSelf		(&SnapshotSelfData)
 #define SnapshotAny			(&SnapshotAnyData)
-#define SnapshotToast		(&SnapshotToastData)
-
-/*
- * We don't provide a static SnapshotDirty variable because it would be
- * non-reentrant.  Instead, users of that snapshot type should declare a
- * local variable of type SnapshotData, and initialize it with this macro.
- */
-#define InitDirtySnapshot(snapshotdata)  \
-	((snapshotdata).satisfies = HeapTupleSatisfiesDirty)
 
 /* This macro encodes the knowledge of which snapshots are MVCC-safe */
 #define IsMVCCSnapshot(snapshot)  \
@@ -100,4 +91,33 @@ extern bool ResolveCminCmaxDuringDecoding(struct HTAB *tuplecid_data,
 							  HeapTuple htup,
 							  Buffer buffer,
 							  CommandId *cmin, CommandId *cmax);
+
+/*
+ * We don't provide a static SnapshotDirty variable because it would be
+ * non-reentrant.  Instead, users of that snapshot type should declare a
+ * local variable of type SnapshotData, and initialize it with this macro.
+ */
+#define InitDirtySnapshot(snapshotdata)  \
+	((snapshotdata).satisfies = HeapTupleSatisfiesDirty)
+
+/*
+ * Similarly, some initialization is required for SnapshotToast.  We need
+ * to set lsn correctly to support snapshot_too_old; the idea is to initialize
+ * these values from the MVCC snapshot that was used to read the data from the
+ * heap.  In practice, we don't know which MVCC snapshot was used, but it's OK
+ * to use an older one; at worst, we will get a "snapshot too old" error that
+ * might have been avoided otherwise.
+ *
+ * If there's no snapshot from which to initialize, we assume that we need
+ * not worry about throwing a "snapshot too old" error.
+ */
+static inline void
+InitToastSnapshot(Snapshot toast_snapshot, Snapshot snapshot)
+{
+	toast_snapshot->satisfies = HeapTupleSatisfiesToast;
+	toast_snapshot->lsn = MaximumXLogRecPtr;
+	if (snapshot != NULL)
+		toast_snapshot->lsn = snapshot->lsn;
+}
+
 #endif   /* TQUAL_H */
#2Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#1)
Re: old_snapshot_threshold allows heap:toast disagreement

Hi,

On 2016-07-26 15:13:41 -0400, Robert Haas wrote:

On Thu, Jul 21, 2016 at 12:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jul 20, 2016 at 9:15 PM, Noah Misch <noah@leadboat.com> wrote:

This PostgreSQL 9.6 open item now needs a permanent owner. Would any other
committer like to take ownership? If this role interests you, please read
this thread and the policy linked above, then send an initial status update
bearing a date for your subsequent status update. If the item does not have a
permanent owner by 2016-07-24 02:00 UTC, I will resolve the item by reverting
commit 848ef42 and followups.

I will adopt this item. I will provide an initial patch for this
issue, or convince someone else to do so, within one week. Therefore,
expect a further status update from me on or before July 28th. I
expect that the patch will be based on ideas from these emails:

/messages/by-id/1AB8F80A-D16E-4154-9497-98FBB164253D@anarazel.de
/messages/by-id/20160720181213.f4io7gc6lyc377sw@alap3.anarazel.de

Here is a patch.

Cool!

Why did you decide to introduce MaximumXLogRecPtr? Wouldn't just using
InvalidXLogRecPtr do the trick? That already prevents errors.

Please review. I'm not suffering from an overwhelming excess of
confidence that this is correct. In particular, I'm concerned about
the fact that there isn't always a registered snapshot - if you make
it ERROR out when that happens instead of doing what it actually does,
the regression tests fail in CLUSTER and CREATE VIEW.

Right.

Also, I wonder why it's right to use
pairingheap_first() instead of looking at the oldest snapshot on the
active snapshot stack, or conversely why that is right and this is
not. Or maybe we need to check both.

Well, generally, the older the snapshot we use is, the more we'll error
out spuriously. The reason to use the oldest from the pairing heap is
that that'll be the most conservative value, right? If there's neither
an active, nor a registered snapshot, we'll not prevent pruning in the
first place (and xmin ought to be invalid). As registered snapshots are
usually "older" than active ones, we definitely have to check those. But
you're right, it seems safer to also check active ones - which squares
with SnapshotResetXmin().

The basic principle of testing both MVCC and TOAST snapshots for
snapshot-too-old problems seems sound. Furthermore, it seems clear
enough that if we're ever looking up a datum in the TOAST table whose
xmin is no longer included in our MyPgXact->xmin, then we're
vulnerable to having TOAST chunks vacuumed away under us even without
snapshot_too_old enabled.

Exactly.

But there's a bit of spooky action at a
distance: if we don't see any snapshots, then we have to assume that
the scan is being performed with a non-MVCC snapshot and therefore we
don't need to worry. But there's no way to verify that via an
assertion, because the connection between the relevant MVCC snapshot
and the corresponding TOAST snapshot is pretty indirect, mediated only
by snapmgr.c.

Hm. Could we perhaps assert that the session has a valid xmin?

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#2)
1 attachment(s)
Re: old_snapshot_threshold allows heap:toast disagreement

On Wed, Jul 27, 2016 at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:

Why did you decide to introduce MaximumXLogRecPtr? Wouldn't just using
InvalidXLogRecPtr do the trick? That already prevents errors.

Oh, right.

Also, I wonder why it's right to use
pairingheap_first() instead of looking at the oldest snapshot on the
active snapshot stack, or conversely why that is right and this is
not. Or maybe we need to check both.

Well, generally, the older the snapshot we use is, the more we'll error
out spuriously. The reason to use the oldest from the pairing heap is
that that'll be the most conservative value, right? If there's neither
an active, nor a registered snapshot, we'll not prevent pruning in the
first place (and xmin ought to be invalid). As registered snapshots are
usually "older" than active ones, we definitely have to check those. But
you're right, it seems safer to also check active ones - which squares
with SnapshotResetXmin().

OK. That's a bit inconvenient because we don't have an O(1) way to
access the bottom of the active snapshot stack; I've attempted to add
such a mechanism here.

But there's a bit of spooky action at a
distance: if we don't see any snapshots, then we have to assume that
the scan is being performed with a non-MVCC snapshot and therefore we
don't need to worry. But there's no way to verify that via an
assertion, because the connection between the relevant MVCC snapshot
and the corresponding TOAST snapshot is pretty indirect, mediated only
by snapmgr.c.

Hm. Could we perhaps assert that the session has a valid xmin?

I don't think so. CLUSTER?

New version attached.

(Official status update: I expect to have this issue wrapped up in the
next few days, assuming that review and discussion continue. If for
some reason that fails to be the case, I will provide a further
official status update no later than Tuesday of next week: August 2,
2016.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

ost-heap-toast-disagreement-v2.patchapplication/x-download; name=ost-heap-toast-disagreement-v2.patchDownload
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 4cffd21..91afba1 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -40,6 +40,7 @@
 #include "utils/expandeddatum.h"
 #include "utils/fmgroids.h"
 #include "utils/rel.h"
+#include "utils/snapmgr.h"
 #include "utils/typcache.h"
 #include "utils/tqual.h"
 
@@ -81,6 +82,7 @@ static int toast_open_indexes(Relation toastrel,
 				   int *num_indexes);
 static void toast_close_indexes(Relation *toastidxs, int num_indexes,
 					LOCKMODE lock);
+static void init_toast_snapshot(Snapshot toast_snapshot);
 
 
 /* ----------
@@ -1665,6 +1667,7 @@ toast_delete_datum(Relation rel, Datum value)
 	HeapTuple	toasttup;
 	int			num_indexes;
 	int			validIndex;
+	SnapshotData	SnapshotToast;
 
 	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
 		return;
@@ -1696,8 +1699,9 @@ toast_delete_datum(Relation rel, Datum value)
 	 * sequence or not, but since we've already locked the index we might as
 	 * well use systable_beginscan_ordered.)
 	 */
+	init_toast_snapshot(&SnapshotToast);
 	toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
-										   SnapshotToast, 1, &toastkey);
+										   &SnapshotToast, 1, &toastkey);
 	while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
 	{
 		/*
@@ -1730,6 +1734,7 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
 	int			num_indexes;
 	int			validIndex;
 	Relation   *toastidxs;
+	SnapshotData	SnapshotToast;
 
 	/* Fetch a valid index relation */
 	validIndex = toast_open_indexes(toastrel,
@@ -1748,9 +1753,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
 	/*
 	 * Is there any such chunk?
 	 */
+	init_toast_snapshot(&SnapshotToast);
 	toastscan = systable_beginscan(toastrel,
 								   RelationGetRelid(toastidxs[validIndex]),
-								   true, SnapshotToast, 1, &toastkey);
+								   true, &SnapshotToast, 1, &toastkey);
 
 	if (systable_getnext(toastscan) != NULL)
 		result = true;
@@ -1813,6 +1819,7 @@ toast_fetch_datum(struct varlena * attr)
 	int32		chunksize;
 	int			num_indexes;
 	int			validIndex;
+	SnapshotData	SnapshotToast;
 
 	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
 		elog(ERROR, "toast_fetch_datum shouldn't be called for non-ondisk datums");
@@ -1859,8 +1866,9 @@ toast_fetch_datum(struct varlena * attr)
 	 */
 	nextidx = 0;
 
+	init_toast_snapshot(&SnapshotToast);
 	toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
-										   SnapshotToast, 1, &toastkey);
+										   &SnapshotToast, 1, &toastkey);
 	while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
 	{
 		/*
@@ -1990,6 +1998,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
 	int32		chcpyend;
 	int			num_indexes;
 	int			validIndex;
+	SnapshotData	SnapshotToast;
 
 	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
 		elog(ERROR, "toast_fetch_datum_slice shouldn't be called for non-ondisk datums");
@@ -2082,9 +2091,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
 	 *
 	 * The index is on (valueid, chunkidx) so they will come in order
 	 */
+	init_toast_snapshot(&SnapshotToast);
 	nextidx = startchunk;
 	toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
-										 SnapshotToast, nscankeys, toastkey);
+										 &SnapshotToast, nscankeys, toastkey);
 	while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
 	{
 		/*
@@ -2289,3 +2299,17 @@ toast_close_indexes(Relation *toastidxs, int num_indexes, LOCKMODE lock)
 		index_close(toastidxs[i], lock);
 	pfree(toastidxs);
 }
+
+/* ----------
+ * init_toast_snapshot
+ *
+ *	Initialize an appropriate TOAST snapshot.  We must use an MVCC snapshot
+ *	to initialize the TOAST snapshot; since we don't know which one to use,
+ *	just use the oldest one.  This is safe: at worst, we will get a "snapshot
+ *	too old" error that might have been avoided otherwise.
+ */
+static void
+init_toast_snapshot(Snapshot toast_snapshot)
+{
+	InitToastSnapshot(toast_snapshot, GetMinimumSnapshotLSN());
+}
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index e1caf01..42bff83 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -188,6 +188,9 @@ typedef struct ActiveSnapshotElt
 /* Top of the stack of active snapshots */
 static ActiveSnapshotElt *ActiveSnapshot = NULL;
 
+/* Bottom of the stack of active snapshots */
+static ActiveSnapshotElt *OldestActiveSnapshot = NULL;
+
 /*
  * Currently registered Snapshots.  Ordered in a heap by xmin, so that we can
  * quickly find the one with lowest xmin, to advance our MyPgXat->xmin.
@@ -394,6 +397,35 @@ GetLatestSnapshot(void)
 }
 
 /*
+ * GetMinimumSnapshotLSN
+ *
+ *		Get the oldest LSN for any known snapshot.
+ */
+XLogRecPtr
+GetMinimumSnapshotLSN(void)
+{
+	XLogRecPtr	RegisteredLSN = InvalidXLogRecPtr;
+	XLogRecPtr	ActiveLSN = InvalidXLogRecPtr;
+
+	if (!pairingheap_is_empty(&RegisteredSnapshots))
+	{
+		Snapshot	minSnapshot;
+
+		minSnapshot = pairingheap_container(SnapshotData, ph_node,
+									pairingheap_first(&RegisteredSnapshots));
+		RegisteredLSN = minSnapshot->lsn;
+	}
+
+	if (OldestActiveSnapshot != NULL)
+		ActiveLSN = OldestActiveSnapshot->as_snap->lsn;
+
+	if (XLogRecPtrIsInvalid(RegisteredLSN) || RegisteredLSN > ActiveLSN)
+		return ActiveLSN;
+
+	return RegisteredLSN;
+}
+
+/*
  * GetCatalogSnapshot
  *		Get a snapshot that is sufficiently up-to-date for scan of the
  *		system catalog with the specified OID.
@@ -674,6 +706,8 @@ PushActiveSnapshot(Snapshot snap)
 	newactive->as_snap->active_count++;
 
 	ActiveSnapshot = newactive;
+	if (OldestActiveSnapshot == NULL)
+		OldestActiveSnapshot = ActiveSnapshot;
 }
 
 /*
@@ -744,6 +778,8 @@ PopActiveSnapshot(void)
 
 	pfree(ActiveSnapshot);
 	ActiveSnapshot = newstack;
+	if (ActiveSnapshot == NULL)
+		OldestActiveSnapshot = NULL;
 
 	SnapshotResetXmin();
 }
@@ -953,6 +989,8 @@ AtSubAbort_Snapshot(int level)
 		pfree(ActiveSnapshot);
 
 		ActiveSnapshot = next;
+		if (ActiveSnapshot == NULL)
+			OldestActiveSnapshot = NULL;
 	}
 
 	SnapshotResetXmin();
@@ -1037,6 +1075,7 @@ AtEOXact_Snapshot(bool isCommit)
 	 * it'll go away with TopTransactionContext.
 	 */
 	ActiveSnapshot = NULL;
+	OldestActiveSnapshot = NULL;
 	pairingheap_reset(&RegisteredSnapshots);
 
 	CurrentSnapshot = NULL;
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 2960455..d99c847 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -78,7 +78,6 @@
 /* Static variables representing various special snapshot semantics */
 SnapshotData SnapshotSelfData = {HeapTupleSatisfiesSelf};
 SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
-SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};
 
 /* local functions */
 static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 3d5dea7..fcd0c75 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -279,7 +279,8 @@ TestForOldSnapshot(Snapshot snapshot, Relation relation, Page page)
 
 	if (old_snapshot_threshold >= 0
 		&& (snapshot) != NULL
-		&& (snapshot)->satisfies == HeapTupleSatisfiesMVCC
+		&& ((snapshot)->satisfies == HeapTupleSatisfiesMVCC
+			|| (snapshot)->satisfies == HeapTupleSatisfiesToast)
 		&& !XLogRecPtrIsInvalid((snapshot)->lsn)
 		&& PageGetLSN(page) > (snapshot)->lsn)
 		TestForOldSnapshot_impl(snapshot, relation);
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index e941427..39d8193 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -64,6 +64,7 @@ extern TransactionId RecentGlobalDataXmin;
 extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
+extern XLogRecPtr GetMinimumSnapshotLSN(void);
 
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 2445944..b82e843 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -16,25 +16,16 @@
 #define TQUAL_H
 
 #include "utils/snapshot.h"
+#include "access/xlogdefs.h"
 
 
 /* Static variables representing various special snapshot semantics */
 extern PGDLLIMPORT SnapshotData SnapshotSelfData;
 extern PGDLLIMPORT SnapshotData SnapshotAnyData;
-extern PGDLLIMPORT SnapshotData SnapshotToastData;
 extern PGDLLIMPORT SnapshotData CatalogSnapshotData;
 
 #define SnapshotSelf		(&SnapshotSelfData)
 #define SnapshotAny			(&SnapshotAnyData)
-#define SnapshotToast		(&SnapshotToastData)
-
-/*
- * We don't provide a static SnapshotDirty variable because it would be
- * non-reentrant.  Instead, users of that snapshot type should declare a
- * local variable of type SnapshotData, and initialize it with this macro.
- */
-#define InitDirtySnapshot(snapshotdata)  \
-	((snapshotdata).satisfies = HeapTupleSatisfiesDirty)
 
 /* This macro encodes the knowledge of which snapshots are MVCC-safe */
 #define IsMVCCSnapshot(snapshot)  \
@@ -100,4 +91,24 @@ extern bool ResolveCminCmaxDuringDecoding(struct HTAB *tuplecid_data,
 							  HeapTuple htup,
 							  Buffer buffer,
 							  CommandId *cmin, CommandId *cmax);
+
+/*
+ * We don't provide a static SnapshotDirty variable because it would be
+ * non-reentrant.  Instead, users of that snapshot type should declare a
+ * local variable of type SnapshotData, and initialize it with this macro.
+ */
+#define InitDirtySnapshot(snapshotdata)  \
+	((snapshotdata).satisfies = HeapTupleSatisfiesDirty)
+
+/*
+ * Similarly, some initialization is required for SnapshotToast.  We need
+ * to set lsn correctly to support snapshot_too_old.
+ */
+static inline void
+InitToastSnapshot(Snapshot snapshot, XLogRecPtr lsn)
+{
+	snapshot->satisfies = HeapTupleSatisfiesToast;
+	snapshot->lsn = lsn;
+}
+
 #endif   /* TQUAL_H */
#4Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#3)
Re: old_snapshot_threshold allows heap:toast disagreement

On 2016-07-28 15:40:48 -0400, Robert Haas wrote:

On Wed, Jul 27, 2016 at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:

Also, I wonder why it's right to use
pairingheap_first() instead of looking at the oldest snapshot on the
active snapshot stack, or conversely why that is right and this is
not. Or maybe we need to check both.

Well, generally, the older the snapshot we use is, the more we'll error
out spuriously. The reason to use the oldest from the pairing heap is
that that'll be the most conservative value, right? If there's neither
an active, nor a registered snapshot, we'll not prevent pruning in the
first place (and xmin ought to be invalid). As registered snapshots are
usually "older" than active ones, we definitely have to check those. But
you're right, it seems safer to also check active ones - which squares
with SnapshotResetXmin().

OK. That's a bit inconvenient because we don't have an O(1) way to
access the bottom of the active snapshot stack; I've attempted to add
such a mechanism here.

I think just iterating through the active snapshots would have been
fine. Afaics there's no guarantee that the first active snapshot pushed
is the relevant one - in contrast to registered one, which are ordered
by virtue of the heap.

But there's a bit of spooky action at a
distance: if we don't see any snapshots, then we have to assume that
the scan is being performed with a non-MVCC snapshot and therefore we
don't need to worry. But there's no way to verify that via an
assertion, because the connection between the relevant MVCC snapshot
and the corresponding TOAST snapshot is pretty indirect, mediated only
by snapmgr.c.

Hm. Could we perhaps assert that the session has a valid xmin?

I don't think so. CLUSTER?

That should have one during any toast lookups afaics - the relevant code
is
/* Start a new transaction for each relation. */
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
/* Do the job. */
cluster_rel(rvtc->tableOid, rvtc->indexOid, true, stmt->verbose);
PopActiveSnapshot();
CommitTransactionCommand();
right? And
Snapshot
GetSnapshotData(Snapshot snapshot)
{
...

if (!TransactionIdIsValid(MyPgXact->xmin))
MyPgXact->xmin = TransactionXmin = xmin;
sets xmin.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#4)
Re: old_snapshot_threshold allows heap:toast disagreement

On Thu, Jul 28, 2016 at 7:29 PM, Andres Freund <andres@anarazel.de> wrote:

I think just iterating through the active snapshots would have been
fine. Afaics there's no guarantee that the first active snapshot pushed
is the relevant one - in contrast to registered one, which are ordered
by virtue of the heap.

I think the oldest snapshot has to be on the bottom of the stack; how not?

Hm. Could we perhaps assert that the session has a valid xmin?

I don't think so. CLUSTER?

That should have one during any toast lookups afaics - the relevant code
is
/* Start a new transaction for each relation. */
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
/* Do the job. */
cluster_rel(rvtc->tableOid, rvtc->indexOid, true, stmt->verbose);
PopActiveSnapshot();
CommitTransactionCommand();
right? And
Snapshot
GetSnapshotData(Snapshot snapshot)
{
...

if (!TransactionIdIsValid(MyPgXact->xmin))
MyPgXact->xmin = TransactionXmin = xmin;
sets xmin.

Hmm, OK, I'll have to check on that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#5)
Re: old_snapshot_threshold allows heap:toast disagreement

On 2016-07-28 23:08:13 -0400, Robert Haas wrote:

On Thu, Jul 28, 2016 at 7:29 PM, Andres Freund <andres@anarazel.de> wrote:

I think just iterating through the active snapshots would have been
fine. Afaics there's no guarantee that the first active snapshot pushed
is the relevant one - in contrast to registered one, which are ordered
by virtue of the heap.

I think the oldest snapshot has to be on the bottom of the stack; how not?

Well, one might push a previously acuired (and registered) snapshot onto
the stack. Afaics that'll only happen if the snapshot is already
registered, but I'm not sure.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#3)
Re: old_snapshot_threshold allows heap:toast disagreement

On Fri, Jul 29, 2016 at 1:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jul 27, 2016 at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:

New version attached.

+static inline void
+InitToastSnapshot(Snapshot snapshot, XLogRecPtr lsn)
+{
+ snapshot->satisfies = HeapTupleSatisfiesToast;
+ snapshot->lsn = lsn;
+}

Here, don't you need to initialize whenTaken as that is also used in
TestForOldSnapshot_impl() to report error "snapshot too old".

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#7)
Re: old_snapshot_threshold allows heap:toast disagreement

On Sat, Jul 30, 2016 at 8:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jul 29, 2016 at 1:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jul 27, 2016 at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:

New version attached.

+static inline void
+InitToastSnapshot(Snapshot snapshot, XLogRecPtr lsn)
+{
+ snapshot->satisfies = HeapTupleSatisfiesToast;
+ snapshot->lsn = lsn;
+}

Here, don't you need to initialize whenTaken as that is also used in
TestForOldSnapshot_impl() to report error "snapshot too old".

Hmm, yeah. This is actually a bit confusing. We want the "oldest"
snapshot, but there are three different notions of "oldest":

1. Smallest LSN.
2. Smallest whenTaken.
3. Smallest xmin.

Which one do we use?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#8)
Re: old_snapshot_threshold allows heap:toast disagreement

On Tue, Aug 2, 2016 at 9:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Jul 30, 2016 at 8:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jul 29, 2016 at 1:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jul 27, 2016 at 7:26 PM, Andres Freund <andres@anarazel.de> wrote:

New version attached.

+static inline void
+InitToastSnapshot(Snapshot snapshot, XLogRecPtr lsn)
+{
+ snapshot->satisfies = HeapTupleSatisfiesToast;
+ snapshot->lsn = lsn;
+}

Here, don't you need to initialize whenTaken as that is also used in
TestForOldSnapshot_impl() to report error "snapshot too old".

Hmm, yeah. This is actually a bit confusing. We want the "oldest"
snapshot, but there are three different notions of "oldest":

1. Smallest LSN.
2. Smallest whenTaken.
3. Smallest xmin.

Which one do we use?

Which ever notion we choose, I think we should use the LSN and
whenTaken from the same snapshot. I think we can choose the snapshot
with smallest xmin and then use the LSN and whenTaken from it. I
think xmin comparison makes more sense because you are already
retrieving smallest xmin snapshot from the registered snapshots.

Whatever value we choose here, I think we can't guarantee that it will
be in sync with the value used for heap. So there is a chance that we
might spuriously raise "snapshot too old" error. I think we should
mentioned this as a caveat in docs [1]https://www.postgresql.org/docs/9.6/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR where we explain behaviour of
about old_snapshot_threshold.

[1]: https://www.postgresql.org/docs/9.6/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#4)
1 attachment(s)
Re: old_snapshot_threshold allows heap:toast disagreement

On Thu, Jul 28, 2016 at 7:29 PM, Andres Freund <andres@anarazel.de> wrote:

I think just iterating through the active snapshots would have been
fine. Afaics there's no guarantee that the first active snapshot pushed
is the relevant one - in contrast to registered one, which are ordered
by virtue of the heap.

After a bit of scrutiny, I don't think this is a problem, and I don't
want to introduce belt-and-suspenders protections against can't-happen
conditions that may get cargo-culted into other places. (How's that
for cramming as many Tom-Lane-isms as possible into one sentence?
More might be done, but it would be some sort of bespoke crock of the
first water.)

Anyway, the reason I think it's OK is that PushActiveSnapshot() does
not copy the input snapshot unless it's CurrentSnapshot or
SecondarySnapshot -- so every snapshot that gets pushed on the active
snapshot stack is either one that's already registered or one that was
just taken (because CurrentSnapshot will get overwritten on next call
to GetTransactionSnapshot). In the former case, it's included in the
registered snapshots so it doesn't even matter whether we notice that
it is also on the active snapshot stack. In the latter case, having
just been taken, it must be newer than the oldest registered snapshot,
which was necessarily taken earlier. I think that the only time it
matters whether we even look at the active snapshot stack is when
there are no registered snapshots whatsoever. In many cases -- like
QueryDescs -- we register the snapshot first and then later put it on
the active snapshot stack, but there are some things like CLUSTER that
only make the snapshot active and don't register it. But if they do
that, they can only do it in first-in, first-out fashion.

But there's a bit of spooky action at a
distance: if we don't see any snapshots, then we have to assume that
the scan is being performed with a non-MVCC snapshot and therefore we
don't need to worry. But there's no way to verify that via an
assertion, because the connection between the relevant MVCC snapshot
and the corresponding TOAST snapshot is pretty indirect, mediated only
by snapmgr.c.

Hm. Could we perhaps assert that the session has a valid xmin?

I don't think so. CLUSTER?

That should have one during any toast lookups afaics - the relevant code
is
/* Start a new transaction for each relation. */
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
/* Do the job. */
cluster_rel(rvtc->tableOid, rvtc->indexOid, true, stmt->verbose);
PopActiveSnapshot();
CommitTransactionCommand();
right? And
Snapshot
GetSnapshotData(Snapshot snapshot)
{
...

if (!TransactionIdIsValid(MyPgXact->xmin))
MyPgXact->xmin = TransactionXmin = xmin;
sets xmin.

Yeah. Actually, consistent with the above, I discovered that as long
as we consult both the active snapshot stack and the pairingheap of
registered snapshots, it seems to be fine to just insist that we
always get a snapshot back from the snapmgr and use that to initialize
the TOAST snapshot. So I did it that way in the attached version.

New patch attached. Barring objections, I will commit this tomorrow.
If there are objections, I will send another status update tomorrow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

ost-heap-toast-disagreement-v3.patchapplication/x-download; name=ost-heap-toast-disagreement-v3.patchDownload
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 4cffd21..55b6b41 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -40,6 +40,7 @@
 #include "utils/expandeddatum.h"
 #include "utils/fmgroids.h"
 #include "utils/rel.h"
+#include "utils/snapmgr.h"
 #include "utils/typcache.h"
 #include "utils/tqual.h"
 
@@ -81,6 +82,7 @@ static int toast_open_indexes(Relation toastrel,
 				   int *num_indexes);
 static void toast_close_indexes(Relation *toastidxs, int num_indexes,
 					LOCKMODE lock);
+static void init_toast_snapshot(Snapshot toast_snapshot);
 
 
 /* ----------
@@ -1665,6 +1667,7 @@ toast_delete_datum(Relation rel, Datum value)
 	HeapTuple	toasttup;
 	int			num_indexes;
 	int			validIndex;
+	SnapshotData	SnapshotToast;
 
 	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
 		return;
@@ -1696,8 +1699,9 @@ toast_delete_datum(Relation rel, Datum value)
 	 * sequence or not, but since we've already locked the index we might as
 	 * well use systable_beginscan_ordered.)
 	 */
+	init_toast_snapshot(&SnapshotToast);
 	toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
-										   SnapshotToast, 1, &toastkey);
+										   &SnapshotToast, 1, &toastkey);
 	while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
 	{
 		/*
@@ -1730,6 +1734,7 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
 	int			num_indexes;
 	int			validIndex;
 	Relation   *toastidxs;
+	SnapshotData	SnapshotToast;
 
 	/* Fetch a valid index relation */
 	validIndex = toast_open_indexes(toastrel,
@@ -1748,9 +1753,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
 	/*
 	 * Is there any such chunk?
 	 */
+	init_toast_snapshot(&SnapshotToast);
 	toastscan = systable_beginscan(toastrel,
 								   RelationGetRelid(toastidxs[validIndex]),
-								   true, SnapshotToast, 1, &toastkey);
+								   true, &SnapshotToast, 1, &toastkey);
 
 	if (systable_getnext(toastscan) != NULL)
 		result = true;
@@ -1813,6 +1819,7 @@ toast_fetch_datum(struct varlena * attr)
 	int32		chunksize;
 	int			num_indexes;
 	int			validIndex;
+	SnapshotData	SnapshotToast;
 
 	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
 		elog(ERROR, "toast_fetch_datum shouldn't be called for non-ondisk datums");
@@ -1859,8 +1866,9 @@ toast_fetch_datum(struct varlena * attr)
 	 */
 	nextidx = 0;
 
+	init_toast_snapshot(&SnapshotToast);
 	toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
-										   SnapshotToast, 1, &toastkey);
+										   &SnapshotToast, 1, &toastkey);
 	while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
 	{
 		/*
@@ -1990,6 +1998,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
 	int32		chcpyend;
 	int			num_indexes;
 	int			validIndex;
+	SnapshotData	SnapshotToast;
 
 	if (!VARATT_IS_EXTERNAL_ONDISK(attr))
 		elog(ERROR, "toast_fetch_datum_slice shouldn't be called for non-ondisk datums");
@@ -2082,9 +2091,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
 	 *
 	 * The index is on (valueid, chunkidx) so they will come in order
 	 */
+	init_toast_snapshot(&SnapshotToast);
 	nextidx = startchunk;
 	toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
-										 SnapshotToast, nscankeys, toastkey);
+										 &SnapshotToast, nscankeys, toastkey);
 	while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
 	{
 		/*
@@ -2289,3 +2299,22 @@ toast_close_indexes(Relation *toastidxs, int num_indexes, LOCKMODE lock)
 		index_close(toastidxs[i], lock);
 	pfree(toastidxs);
 }
+
+/* ----------
+ * init_toast_snapshot
+ *
+ *	Initialize an appropriate TOAST snapshot.  We must use an MVCC snapshot
+ *	to initialize the TOAST snapshot; since we don't know which one to use,
+ *	just use the oldest one.  This is safe: at worst, we will get a "snapshot
+ *	too old" error that might have been avoided otherwise.
+ */
+static void
+init_toast_snapshot(Snapshot toast_snapshot)
+{
+	Snapshot	snapshot = GetOldestSnapshot();
+
+	if (snapshot == NULL)
+		elog(ERROR, "no known snapshots");
+
+	InitToastSnapshot(toast_snapshot, snapshot->lsn, snapshot->whenTaken);
+}
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index e1caf01..4bddaed 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -188,6 +188,9 @@ typedef struct ActiveSnapshotElt
 /* Top of the stack of active snapshots */
 static ActiveSnapshotElt *ActiveSnapshot = NULL;
 
+/* Bottom of the stack of active snapshots */
+static ActiveSnapshotElt *OldestActiveSnapshot = NULL;
+
 /*
  * Currently registered Snapshots.  Ordered in a heap by xmin, so that we can
  * quickly find the one with lowest xmin, to advance our MyPgXat->xmin.
@@ -394,6 +397,34 @@ GetLatestSnapshot(void)
 }
 
 /*
+ * GetOldestSnapshot
+ *
+ *		Get the oldest known snapshot, as judged by the LSN.
+ */
+Snapshot
+GetOldestSnapshot(void)
+{
+	Snapshot	OldestRegisteredSnapshot = NULL;
+	XLogRecPtr	RegisteredLSN = InvalidXLogRecPtr;
+	XLogRecPtr	ActiveLSN = InvalidXLogRecPtr;
+
+	if (!pairingheap_is_empty(&RegisteredSnapshots))
+	{
+		OldestRegisteredSnapshot = pairingheap_container(SnapshotData, ph_node,
+									pairingheap_first(&RegisteredSnapshots));
+		RegisteredLSN = OldestRegisteredSnapshot->lsn;
+	}
+
+	if (OldestActiveSnapshot != NULL)
+		ActiveLSN = OldestActiveSnapshot->as_snap->lsn;
+
+	if (XLogRecPtrIsInvalid(RegisteredLSN) || RegisteredLSN > ActiveLSN)
+		return OldestActiveSnapshot->as_snap;
+
+	return OldestRegisteredSnapshot;
+}
+
+/*
  * GetCatalogSnapshot
  *		Get a snapshot that is sufficiently up-to-date for scan of the
  *		system catalog with the specified OID.
@@ -674,6 +705,8 @@ PushActiveSnapshot(Snapshot snap)
 	newactive->as_snap->active_count++;
 
 	ActiveSnapshot = newactive;
+	if (OldestActiveSnapshot == NULL)
+		OldestActiveSnapshot = ActiveSnapshot;
 }
 
 /*
@@ -744,6 +777,8 @@ PopActiveSnapshot(void)
 
 	pfree(ActiveSnapshot);
 	ActiveSnapshot = newstack;
+	if (ActiveSnapshot == NULL)
+		OldestActiveSnapshot = NULL;
 
 	SnapshotResetXmin();
 }
@@ -953,6 +988,8 @@ AtSubAbort_Snapshot(int level)
 		pfree(ActiveSnapshot);
 
 		ActiveSnapshot = next;
+		if (ActiveSnapshot == NULL)
+			OldestActiveSnapshot = NULL;
 	}
 
 	SnapshotResetXmin();
@@ -1037,6 +1074,7 @@ AtEOXact_Snapshot(bool isCommit)
 	 * it'll go away with TopTransactionContext.
 	 */
 	ActiveSnapshot = NULL;
+	OldestActiveSnapshot = NULL;
 	pairingheap_reset(&RegisteredSnapshots);
 
 	CurrentSnapshot = NULL;
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 2960455..d99c847 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -78,7 +78,6 @@
 /* Static variables representing various special snapshot semantics */
 SnapshotData SnapshotSelfData = {HeapTupleSatisfiesSelf};
 SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
-SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};
 
 /* local functions */
 static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 3d5dea7..fcd0c75 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -279,7 +279,8 @@ TestForOldSnapshot(Snapshot snapshot, Relation relation, Page page)
 
 	if (old_snapshot_threshold >= 0
 		&& (snapshot) != NULL
-		&& (snapshot)->satisfies == HeapTupleSatisfiesMVCC
+		&& ((snapshot)->satisfies == HeapTupleSatisfiesMVCC
+			|| (snapshot)->satisfies == HeapTupleSatisfiesToast)
 		&& !XLogRecPtrIsInvalid((snapshot)->lsn)
 		&& PageGetLSN(page) > (snapshot)->lsn)
 		TestForOldSnapshot_impl(snapshot, relation);
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index e941427..9e38272 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -64,6 +64,7 @@ extern TransactionId RecentGlobalDataXmin;
 extern Snapshot GetTransactionSnapshot(void);
 extern Snapshot GetLatestSnapshot(void);
 extern void SnapshotSetCommandId(CommandId curcid);
+extern Snapshot GetOldestSnapshot(void);
 
 extern Snapshot GetCatalogSnapshot(Oid relid);
 extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
diff --git a/src/include/utils/tqual.h b/src/include/utils/tqual.h
index 2445944..8041e7b 100644
--- a/src/include/utils/tqual.h
+++ b/src/include/utils/tqual.h
@@ -16,25 +16,16 @@
 #define TQUAL_H
 
 #include "utils/snapshot.h"
+#include "access/xlogdefs.h"
 
 
 /* Static variables representing various special snapshot semantics */
 extern PGDLLIMPORT SnapshotData SnapshotSelfData;
 extern PGDLLIMPORT SnapshotData SnapshotAnyData;
-extern PGDLLIMPORT SnapshotData SnapshotToastData;
 extern PGDLLIMPORT SnapshotData CatalogSnapshotData;
 
 #define SnapshotSelf		(&SnapshotSelfData)
 #define SnapshotAny			(&SnapshotAnyData)
-#define SnapshotToast		(&SnapshotToastData)
-
-/*
- * We don't provide a static SnapshotDirty variable because it would be
- * non-reentrant.  Instead, users of that snapshot type should declare a
- * local variable of type SnapshotData, and initialize it with this macro.
- */
-#define InitDirtySnapshot(snapshotdata)  \
-	((snapshotdata).satisfies = HeapTupleSatisfiesDirty)
 
 /* This macro encodes the knowledge of which snapshots are MVCC-safe */
 #define IsMVCCSnapshot(snapshot)  \
@@ -100,4 +91,25 @@ extern bool ResolveCminCmaxDuringDecoding(struct HTAB *tuplecid_data,
 							  HeapTuple htup,
 							  Buffer buffer,
 							  CommandId *cmin, CommandId *cmax);
+
+/*
+ * We don't provide a static SnapshotDirty variable because it would be
+ * non-reentrant.  Instead, users of that snapshot type should declare a
+ * local variable of type SnapshotData, and initialize it with this macro.
+ */
+#define InitDirtySnapshot(snapshotdata)  \
+	((snapshotdata).satisfies = HeapTupleSatisfiesDirty)
+
+/*
+ * Similarly, some initialization is required for SnapshotToast.  We need
+ * to set lsn and whenTaken correctly to support snapshot_too_old.
+ */
+static inline void
+InitToastSnapshot(Snapshot snapshot, XLogRecPtr lsn, int64 whenTaken)
+{
+	snapshot->satisfies = HeapTupleSatisfiesToast;
+	snapshot->lsn = lsn;
+	snapshot->whenTaken = whenTaken;
+}
+
 #endif   /* TQUAL_H */
#11Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#10)
Re: old_snapshot_threshold allows heap:toast disagreement

On Wed, Aug 3, 2016 at 2:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jul 28, 2016 at 7:29 PM, Andres Freund <andres@anarazel.de> wrote:

Yeah. Actually, consistent with the above, I discovered that as long
as we consult both the active snapshot stack and the pairingheap of
registered snapshots, it seems to be fine to just insist that we
always get a snapshot back from the snapmgr and use that to initialize
the TOAST snapshot. So I did it that way in the attached version.

New patch attached.

Code looks good to me. I have not tested the patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers