Pinning a buffer in TupleTableSlot is unnecessary

Started by Heikki Linnakangas over 9 years ago · 25 messages
#1 Heikki Linnakangas
hlinnaka@iki.fi
1 attachment(s)

While profiling some queries and looking at executor overhead, I
realized that we're not making much use of TupleTableSlot's ability to
hold a buffer pin. In a SeqScan, the buffer is held pinned by the
underlying heap-scan anyway. Same with an IndexScan, and the SampleScan.
The only thing that relies on that feature is TidScan, but we could
easily teach TidScan to hold the buffer pin directly.
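
To illustrate, here is the relevant bit of SeqNext() as it stands today
(condensed from nodeSeqscan.c): the pin that ExecStoreTuple() takes on
scandesc->rs_cbuf merely duplicates the one the heap scan itself holds
until the next heap_getnext() call:

    tuple = heap_getnext(scandesc, direction); /* keeps rs_cbuf pinned */
    if (tuple)
        ExecStoreTuple(tuple,               /* tuple to store */
                       slot,                /* slot to store in */
                       scandesc->rs_cbuf,   /* slot takes a second pin */
                       false);              /* don't pfree */
    else
        ExecClearTuple(slot);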

So, how about we remove the ability of a TupleTableSlot to hold a buffer
pin, per the attached patch? It shaves a few cycles from
ExecStoreTuple() and ExecClearTuple(), which get called a lot. I
couldn't measure any actual difference from that, but it seems like a
good idea from a readability point of view anyway.

How much do we need to worry about breaking extensions? That is, to
what extent do we consider the TupleTableSlot API to be stable for
extensions to use? The FDW API uses TupleTableSlots - this patch had to
fix the ExecStoreTuple calls in postgres_fdw.

We could refrain from changing the signature of ExecStoreTuple(), and
throw an error if you try to pass a valid buffer to it. But I also have
some bigger changes to TupleTableSlot in mind. I think we could gain
some speed by making slots "read-only". For example, it would be nice if
a SeqScan could just replace the tts_tuple pointer in the slot, without
having to worry that an upper node might have materialized the slot and
that the copied tuple would then need to be freed; that happens very
rarely in practice. It would be better if, in those few places where we
currently materialize an existing slot, we created a copy of the slot
and left the original slot unmodified. I'll start a different thread on
that after
some more experimentation, but if we e.g. get rid of
ExecMaterializeSlot() in its current form, would that be OK?
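
To make that concrete with the primitives we already have (just an
illustration of the direction, not the final design; "myslot" stands
for a slot the consumer owns):

    /* Today: the consumer forces the scan's slot to own a copy, so the
     * scan can no longer simply overwrite tts_tuple on the next row: */
    ExecMaterializeSlot(slot);

    /* Instead: copy into the consumer's own slot, leaving the scan's
     * slot untouched: */
    ExecCopySlot(myslot, slot);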

- Heikki

Attachments:

remove-buffer-from-slot-1.patch (application/x-patch)
commit 44ba53760e83921817b4878e48a2e4ad2fd67493
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date:   Tue Aug 30 13:06:59 2016 +0300

    Remove support for holding a buffer pin in a TupleTableSlot.

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index daf0438..299fa62 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1387,7 +1387,6 @@ postgresIterateForeignScan(ForeignScanState *node)
 	 */
 	ExecStoreTuple(fsstate->tuples[fsstate->next_tuple++],
 				   slot,
-				   InvalidBuffer,
 				   false);
 
 	return slot;
@@ -3152,7 +3151,7 @@ store_returning_result(PgFdwModifyState *fmstate,
 											NULL,
 											fmstate->temp_cxt);
 		/* tuple will be deleted when it is cleared from the slot */
-		ExecStoreTuple(newtup, slot, InvalidBuffer, true);
+		ExecStoreTuple(newtup, slot, true);
 	}
 	PG_CATCH();
 	{
@@ -3258,7 +3257,7 @@ get_returning_data(ForeignScanState *node)
 												dmstate->retrieved_attrs,
 												NULL,
 												dmstate->temp_cxt);
-			ExecStoreTuple(newtup, slot, InvalidBuffer, false);
+			ExecStoreTuple(newtup, slot, false);
 		}
 		PG_CATCH();
 		{
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b0b43cf..91895fd 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2531,7 +2531,7 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 		MemoryContextReset(econtext->ecxt_per_tuple_memory);
 
 		/* Set up for predicate or expression evaluation */
-		ExecStoreTuple(heapTuple, slot, InvalidBuffer, false);
+		ExecStoreTuple(heapTuple, slot, false);
 
 		/*
 		 * In a partial index, discard tuples that don't satisfy the
@@ -2679,7 +2679,7 @@ IndexCheckExclusion(Relation heapRelation,
 		MemoryContextReset(econtext->ecxt_per_tuple_memory);
 
 		/* Set up for predicate or expression evaluation */
-		ExecStoreTuple(heapTuple, slot, InvalidBuffer, false);
+		ExecStoreTuple(heapTuple, slot, false);
 
 		/*
 		 * In a partial index, ignore tuples that don't satisfy the predicate.
@@ -3100,7 +3100,7 @@ validate_index_heapscan(Relation heapRelation,
 			MemoryContextReset(econtext->ecxt_per_tuple_memory);
 
 			/* Set up for predicate or expression evaluation */
-			ExecStoreTuple(heapTuple, slot, InvalidBuffer, false);
+			ExecStoreTuple(heapTuple, slot, false);
 
 			/*
 			 * In a partial index, discard tuples that don't satisfy the
diff --git a/src/backend/catalog/indexing.c b/src/backend/catalog/indexing.c
index b9fe102..faad4b9 100644
--- a/src/backend/catalog/indexing.c
+++ b/src/backend/catalog/indexing.c
@@ -96,7 +96,7 @@ CatalogIndexInsert(CatalogIndexState indstate, HeapTuple heapTuple)
 
 	/* Need a slot to hold the tuple being examined */
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation));
-	ExecStoreTuple(heapTuple, slot, InvalidBuffer, false);
+	ExecStoreTuple(heapTuple, slot, false);
 
 	/*
 	 * for each index, form and insert the index tuple
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index c617abb..6137dda 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -743,7 +743,7 @@ compute_index_stats(Relation onerel, double totalrows,
 			ResetExprContext(econtext);
 
 			/* Set up for predicate or expression evaluation */
-			ExecStoreTuple(heapTuple, slot, InvalidBuffer, false);
+			ExecStoreTuple(heapTuple, slot, false);
 
 			/* If index is partial, check predicate */
 			if (predicate != NIL)
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 26f9114..baf204c 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -124,7 +124,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(trigdata->tg_relation));
 
-	ExecStoreTuple(new_row, slot, InvalidBuffer, false);
+	ExecStoreTuple(new_row, slot, false);
 
 	/*
 	 * Typically the index won't have expressions, but if it does we need an
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5947e72..a2ab4b2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2435,7 +2435,7 @@ CopyFrom(CopyState cstate)
 
 		/* Place tuple in tuple slot --- but slot shouldn't free it */
 		slot = myslot;
-		ExecStoreTuple(tuple, slot, InvalidBuffer, false);
+		ExecStoreTuple(tuple, slot, false);
 
 		skip_tuple = false;
 
@@ -2603,7 +2603,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 			List	   *recheckIndexes;
 
 			cstate->cur_lineno = firstBufferedLineNo + i;
-			ExecStoreTuple(bufferedTuples[i], myslot, InvalidBuffer, false);
+			ExecStoreTuple(bufferedTuples[i], myslot, false);
 			recheckIndexes =
 				ExecInsertIndexTuples(myslot, &(bufferedTuples[i]->t_self),
 									  estate, false, NULL, NIL);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 86e9814..fdbd86e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -4142,7 +4142,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 				 * Process supplied expressions to replace selected columns.
 				 * Expression inputs come from the old tuple.
 				 */
-				ExecStoreTuple(tuple, oldslot, InvalidBuffer, false);
+				ExecStoreTuple(tuple, oldslot, false);
 				econtext->ecxt_scantuple = oldslot;
 
 				foreach(l, tab->newvals)
@@ -4173,7 +4173,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 			}
 
 			/* Now check any constraints on the possibly-changed tuple */
-			ExecStoreTuple(tuple, newslot, InvalidBuffer, false);
+			ExecStoreTuple(tuple, newslot, false);
 			econtext->ecxt_scantuple = newslot;
 
 			foreach(l, notnull_attrs)
@@ -7358,7 +7358,7 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup)
 
 	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
-		ExecStoreTuple(tuple, slot, InvalidBuffer, false);
+		ExecStoreTuple(tuple, slot, false);
 
 		if (!ExecQual(exprstate, econtext, true))
 			ereport(ERROR,
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 9de22a1..4ac7b8f 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2058,7 +2058,7 @@ ExecBRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 
 		if (newslot->tts_tupleDescriptor != tupdesc)
 			ExecSetSlotDescriptor(newslot, tupdesc);
-		ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
+		ExecStoreTuple(newtuple, newslot, false);
 		slot = newslot;
 	}
 	return slot;
@@ -2133,7 +2133,7 @@ ExecIRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 
 		if (newslot->tts_tupleDescriptor != tupdesc)
 			ExecSetSlotDescriptor(newslot, tupdesc);
-		ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
+		ExecStoreTuple(newtuple, newslot, false);
 		slot = newslot;
 	}
 	return slot;
@@ -2513,7 +2513,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 
 		if (newslot->tts_tupleDescriptor != tupdesc)
 			ExecSetSlotDescriptor(newslot, tupdesc);
-		ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
+		ExecStoreTuple(newtuple, newslot, false);
 		slot = newslot;
 	}
 	return slot;
@@ -2609,7 +2609,7 @@ ExecIRUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 
 		if (newslot->tts_tupleDescriptor != tupdesc)
 			ExecSetSlotDescriptor(newslot, tupdesc);
-		ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
+		ExecStoreTuple(newtuple, newslot, false);
 		slot = newslot;
 	}
 	return slot;
@@ -2925,7 +2925,7 @@ TriggerEnabled(EState *estate, ResultRelInfo *relinfo,
 			oldslot = estate->es_trig_oldtup_slot;
 			if (oldslot->tts_tupleDescriptor != tupdesc)
 				ExecSetSlotDescriptor(oldslot, tupdesc);
-			ExecStoreTuple(oldtup, oldslot, InvalidBuffer, false);
+			ExecStoreTuple(oldtup, oldslot, false);
 		}
 		if (HeapTupleIsValid(newtup))
 		{
@@ -2938,7 +2938,7 @@ TriggerEnabled(EState *estate, ResultRelInfo *relinfo,
 			newslot = estate->es_trig_newtup_slot;
 			if (newslot->tts_tupleDescriptor != tupdesc)
 				ExecSetSlotDescriptor(newslot, tupdesc);
-			ExecStoreTuple(newtup, newslot, InvalidBuffer, false);
+			ExecStoreTuple(newtup, newslot, false);
 		}
 
 		/*
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 0e2d834..bcd9894 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -753,7 +753,7 @@ retry:
 		 * Extract the index column values and isnull flags from the existing
 		 * tuple.
 		 */
-		ExecStoreTuple(tup, existing_slot, InvalidBuffer, false);
+		ExecStoreTuple(tup, existing_slot, false);
 		FormIndexDatum(indexInfo, existing_slot, estate,
 					   existing_values, existing_isnull);
 
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index fb0013d..36fb73b 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -79,7 +79,7 @@ ExecScanFetch(ScanState *node,
 
 			/* Store test tuple in the plan node's scan slot */
 			ExecStoreTuple(estate->es_epqTuple[scanrelid - 1],
-						   slot, InvalidBuffer, false);
+						   slot, false);
 
 			/* Check if it meets the access-method conditions */
 			if (!(*recheckMtd) (node, slot))
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 533050d..e298d40 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -118,7 +118,6 @@ MakeTupleTableSlot(void)
 	slot->tts_tuple = NULL;
 	slot->tts_tupleDescriptor = NULL;
 	slot->tts_mcxt = CurrentMemoryContext;
-	slot->tts_buffer = InvalidBuffer;
 	slot->tts_nvalid = 0;
 	slot->tts_values = NULL;
 	slot->tts_isnull = NULL;
@@ -289,14 +288,9 @@ ExecSetSlotDescriptor(TupleTableSlot *slot,		/* slot to change */
  *
  *		tuple:	tuple to store
  *		slot:	slot to store it in
- *		buffer: disk buffer if tuple is in a disk page, else InvalidBuffer
  *		shouldFree: true if ExecClearTuple should pfree() the tuple
  *					when done with it
  *
- * If 'buffer' is not InvalidBuffer, the tuple table code acquires a pin
- * on the buffer which is held until the slot is cleared, so that the tuple
- * won't go away on us.
- *
  * shouldFree is normally set 'true' for tuples constructed on-the-fly.
  * It must always be 'false' for tuples that are stored in disk pages,
  * since we don't want to try to pfree those.
@@ -322,7 +316,6 @@ ExecSetSlotDescriptor(TupleTableSlot *slot,		/* slot to change */
 TupleTableSlot *
 ExecStoreTuple(HeapTuple tuple,
 			   TupleTableSlot *slot,
-			   Buffer buffer,
 			   bool shouldFree)
 {
 	/*
@@ -354,24 +347,6 @@ ExecStoreTuple(HeapTuple tuple,
 	/* Mark extracted state invalid */
 	slot->tts_nvalid = 0;
 
-	/*
-	 * If tuple is on a disk page, keep the page pinned as long as we hold a
-	 * pointer into it.  We assume the caller already has such a pin.
-	 *
-	 * This is coded to optimize the case where the slot previously held a
-	 * tuple on the same disk page: in that case releasing and re-acquiring
-	 * the pin is a waste of cycles.  This is a common situation during
-	 * seqscans, so it's worth troubling over.
-	 */
-	if (slot->tts_buffer != buffer)
-	{
-		if (BufferIsValid(slot->tts_buffer))
-			ReleaseBuffer(slot->tts_buffer);
-		slot->tts_buffer = buffer;
-		if (BufferIsValid(buffer))
-			IncrBufferRefCount(buffer);
-	}
-
 	return slot;
 }
 
@@ -404,14 +379,6 @@ ExecStoreMinimalTuple(MinimalTuple mtup,
 		heap_free_minimal_tuple(slot->tts_mintuple);
 
 	/*
-	 * Drop the pin on the referenced buffer, if there is one.
-	 */
-	if (BufferIsValid(slot->tts_buffer))
-		ReleaseBuffer(slot->tts_buffer);
-
-	slot->tts_buffer = InvalidBuffer;
-
-	/*
 	 * Store the new tuple into the specified slot.
 	 */
 	slot->tts_isempty = false;
@@ -460,14 +427,6 @@ ExecClearTuple(TupleTableSlot *slot)	/* slot in which to store tuple */
 	slot->tts_shouldFreeMin = false;
 
 	/*
-	 * Drop the pin on the referenced buffer, if there is one.
-	 */
-	if (BufferIsValid(slot->tts_buffer))
-		ReleaseBuffer(slot->tts_buffer);
-
-	slot->tts_buffer = InvalidBuffer;
-
-	/*
 	 * Mark it empty.
 	 */
 	slot->tts_isempty = true;
@@ -755,14 +714,6 @@ ExecMaterializeSlot(TupleTableSlot *slot)
 	MemoryContextSwitchTo(oldContext);
 
 	/*
-	 * Drop the pin on the referenced buffer, if there is one.
-	 */
-	if (BufferIsValid(slot->tts_buffer))
-		ReleaseBuffer(slot->tts_buffer);
-
-	slot->tts_buffer = InvalidBuffer;
-
-	/*
 	 * Mark extracted state invalid.  This is important because the slot is
 	 * not supposed to depend any more on the previous external data; we
 	 * mustn't leave any dangling pass-by-reference datums in tts_values.
@@ -809,7 +760,7 @@ ExecCopySlot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
 	newTuple = ExecCopySlotTuple(srcslot);
 	MemoryContextSwitchTo(oldContext);
 
-	return ExecStoreTuple(newTuple, dstslot, InvalidBuffer, true);
+	return ExecStoreTuple(newTuple, dstslot, true);
 }
 
 
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index ce2fc28..8f9b4a0 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -2077,7 +2077,6 @@ agg_retrieve_direct(AggState *aggstate)
 				 */
 				ExecStoreTuple(aggstate->grp_firstTuple,
 							   firstSlot,
-							   InvalidBuffer,
 							   true);
 				aggstate->grp_firstTuple = NULL;		/* don't keep two
 														 * pointers */
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 449aacb..4c489e3 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -274,7 +274,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
 		 */
 		ExecStoreTuple(&scan->rs_ctup,
 					   slot,
-					   scan->rs_cbuf,
 					   false);
 
 		/*
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 438d1b2..832409c 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -295,8 +295,6 @@ gather_getnext(GatherState *gatherstate)
 			{
 				ExecStoreTuple(tup,		/* tuple to store */
 							   fslot,	/* slot in which to store the tuple */
-							   InvalidBuffer,	/* buffer associated with this
-												 * tuple */
 							   false);	/* slot should not pfree tuple */
 				return fslot;
 			}
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 3143bd9..187be4a 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -111,7 +111,6 @@ IndexNext(IndexScanState *node)
 		 */
 		ExecStoreTuple(tuple,	/* tuple to store */
 					   slot,	/* slot to store in */
-					   scandesc->xs_cbuf,		/* buffer containing tuple */
 					   false);	/* don't pfree */
 
 		/*
@@ -198,7 +197,7 @@ IndexNextWithReorder(IndexScanState *node)
 				tuple = reorderqueue_pop(node);
 
 				/* Pass 'true', as the tuple in the queue is a palloc'd copy */
-				ExecStoreTuple(tuple, slot, InvalidBuffer, true);
+				ExecStoreTuple(tuple, slot, true);
 				return slot;
 			}
 		}
@@ -230,7 +229,6 @@ next_indextuple:
 		 */
 		ExecStoreTuple(tuple,	/* tuple to store */
 					   slot,	/* slot to store in */
-					   scandesc->xs_cbuf,		/* buffer containing tuple */
 					   false);	/* don't pfree */
 
 		/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index af7b26c..c70288f 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -751,7 +751,7 @@ ldelete:;
 
 			if (slot->tts_tupleDescriptor != RelationGetDescr(resultRelationDesc))
 				ExecSetSlotDescriptor(slot, RelationGetDescr(resultRelationDesc));
-			ExecStoreTuple(&deltuple, slot, InvalidBuffer, false);
+			ExecStoreTuple(&deltuple, slot, false);
 		}
 
 		rslot = ExecProcessReturning(resultRelInfo, slot, planSlot);
@@ -1173,7 +1173,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	ExecCheckHeapTupleVisible(estate, &tuple, buffer);
 
 	/* Store target's existing tuple in the state's dedicated slot */
-	ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false);
+	ExecStoreTuple(&tuple, mtstate->mt_existing, false);
 
 	/*
 	 * Make tuple and any needed join variables available to ExecQual and
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 9ce7c02..9a9c9d2 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -65,7 +65,6 @@ SampleNext(SampleScanState *node)
 	if (tuple)
 		ExecStoreTuple(tuple,	/* tuple to store */
 					   slot,	/* slot to store in */
-					   node->ss.ss_currentScanDesc->rs_cbuf,	/* tuple's buffer */
 					   false);	/* don't pfree this pointer */
 	else
 		ExecClearTuple(slot);
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 00bf3a5..5e26594 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -83,15 +83,11 @@ SeqNext(SeqScanState *node)
 	 * save the tuple and the buffer returned to us by the access methods in
 	 * our scan tuple slot and return the slot.  Note: we pass 'false' because
 	 * tuples returned by heap_getnext() are pointers onto disk pages and were
-	 * not created with palloc() and so should not be pfree()'d.  Note also
-	 * that ExecStoreTuple will increment the refcount of the buffer; the
-	 * refcount will not be dropped until the tuple table slot is cleared.
+	 * not created with palloc() and so should not be pfree()'d.
 	 */
 	if (tuple)
 		ExecStoreTuple(tuple,	/* tuple to store */
 					   slot,	/* slot to store in */
-					   scandesc->rs_cbuf,		/* buffer associated with this
-												 * tuple */
 					   false);	/* don't pfree this pointer */
 	else
 		ExecClearTuple(slot);
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 633580b..052ceb9 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -273,7 +273,6 @@ setop_retrieve_direct(SetOpState *setopstate)
 		 */
 		ExecStoreTuple(setopstate->grp_firstTuple,
 					   resultTupleSlot,
-					   InvalidBuffer,
 					   true);
 		setopstate->grp_firstTuple = NULL;		/* don't keep two pointers */
 
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 2604103..6194453 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -256,7 +256,6 @@ TidNext(TidScanState *node)
 	Relation	heapRelation;
 	HeapTuple	tuple;
 	TupleTableSlot *slot;
-	Buffer		buffer = InvalidBuffer;
 	ItemPointerData *tidList;
 	int			numTids;
 	bool		bBackward;
@@ -318,7 +317,14 @@ TidNext(TidScanState *node)
 		if (node->tss_isCurrentOf)
 			heap_get_latest_tid(heapRelation, snapshot, &tuple->t_self);
 
-		if (heap_fetch(heapRelation, snapshot, tuple, &buffer, false, NULL))
+		/* Release old buffer, if any */
+		if (BufferIsValid(node->tss_buffer))
+		{
+			ReleaseBuffer(node->tss_buffer);
+			node->tss_buffer = InvalidBuffer;
+		}
+
+		if (heap_fetch(heapRelation, snapshot, tuple, &node->tss_buffer, false, NULL))
 		{
 			/*
 			 * store the scanned tuple in the scan tuple slot of the scan
@@ -329,15 +335,8 @@ TidNext(TidScanState *node)
 			 */
 			ExecStoreTuple(tuple,		/* tuple to store */
 						   slot,	/* slot to store in */
-						   buffer,		/* buffer associated with tuple  */
 						   false);		/* don't pfree */
 
-			/*
-			 * At this point we have an extra pin on the buffer, because
-			 * ExecStoreTuple incremented the pin count. Drop our local pin.
-			 */
-			ReleaseBuffer(buffer);
-
 			return slot;
 		}
 		/* Bad TID or failed snapshot qual; try next */
@@ -351,6 +350,12 @@ TidNext(TidScanState *node)
 	 * if we get here it means the tid scan failed so we are at the end of the
 	 * scan..
 	 */
+	if (BufferIsValid(node->tss_buffer))
+	{
+		ReleaseBuffer(node->tss_buffer);
+		node->tss_buffer = InvalidBuffer;
+	}
+
 	return ExecClearTuple(slot);
 }
 
@@ -422,6 +427,15 @@ void
 ExecEndTidScan(TidScanState *node)
 {
 	/*
+	 * Release the buffer
+	 */
+	if (BufferIsValid(node->tss_buffer))
+	{
+		ReleaseBuffer(node->tss_buffer);
+		node->tss_buffer = InvalidBuffer;
+	}
+
+	/*
 	 * Free the exprcontext
 	 */
 	ExecFreeExprContext(&node->ss.ps);
@@ -505,6 +519,7 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
 
 	tidstate->ss.ss_currentRelation = currentRelation;
 	tidstate->ss.ss_currentScanDesc = NULL;		/* no heap scan here */
+	tidstate->tss_buffer = InvalidBuffer;
 
 	/*
 	 * get the scan type from the relation descriptor.
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 56943f2..dad862b 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5138,7 +5138,7 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 										 indexscandir)) != NULL)
 				{
 					/* Extract the index column values from the heap tuple */
-					ExecStoreTuple(tup, slot, InvalidBuffer, false);
+					ExecStoreTuple(tup, slot, false);
 					FormIndexDatum(indexInfo, slot, estate,
 								   values, isnull);
 
@@ -5170,7 +5170,7 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 										 -indexscandir)) != NULL)
 				{
 					/* Extract the index column values from the heap tuple */
-					ExecStoreTuple(tup, slot, InvalidBuffer, false);
+					ExecStoreTuple(tup, slot, false);
 					FormIndexDatum(indexInfo, slot, estate,
 								   values, isnull);
 
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index c8fbcf8..3902068 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -4176,11 +4176,11 @@ comparetup_cluster(const SortTuple *a, const SortTuple *b,
 
 		ecxt_scantuple = GetPerTupleExprContext(state->estate)->ecxt_scantuple;
 
-		ExecStoreTuple(ltup, ecxt_scantuple, InvalidBuffer, false);
+		ExecStoreTuple(ltup, ecxt_scantuple, false);
 		FormIndexDatum(state->indexInfo, ecxt_scantuple, state->estate,
 					   l_index_values, l_index_isnull);
 
-		ExecStoreTuple(rtup, ecxt_scantuple, InvalidBuffer, false);
+		ExecStoreTuple(rtup, ecxt_scantuple, false);
 		FormIndexDatum(state->indexInfo, ecxt_scantuple, state->estate,
 					   r_index_values, r_index_isnull);
 
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index 5ac0b6a..7e9ad7b 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -82,12 +82,6 @@
  * When tts_shouldFree is true, the physical tuple is "owned" by the slot
  * and should be freed when the slot's reference to the tuple is dropped.
  *
- * If tts_buffer is not InvalidBuffer, then the slot is holding a pin
- * on the indicated buffer page; drop the pin when we release the
- * slot's reference to that buffer.  (tts_shouldFree should always be
- * false in such a case, since presumably tts_tuple is pointing at the
- * buffer page.)
- *
  * tts_nvalid indicates the number of valid columns in the tts_values/isnull
  * arrays.  When the slot is holding a "virtual" tuple this must be equal
  * to the descriptor's natts.  When the slot is holding a physical tuple
@@ -120,7 +114,6 @@ typedef struct TupleTableSlot
 	HeapTuple	tts_tuple;		/* physical tuple, or NULL if virtual */
 	TupleDesc	tts_tupleDescriptor;	/* slot's tuple descriptor */
 	MemoryContext tts_mcxt;		/* slot itself is in this context */
-	Buffer		tts_buffer;		/* tuple's buffer, or InvalidBuffer */
 	int			tts_nvalid;		/* # of valid values in tts_values */
 	Datum	   *tts_values;		/* current per-attribute values */
 	bool	   *tts_isnull;		/* current per-attribute isnull flags */
@@ -147,7 +140,6 @@ extern void ExecDropSingleTupleTableSlot(TupleTableSlot *slot);
 extern void ExecSetSlotDescriptor(TupleTableSlot *slot, TupleDesc tupdesc);
 extern TupleTableSlot *ExecStoreTuple(HeapTuple tuple,
 			   TupleTableSlot *slot,
-			   Buffer buffer,
 			   bool shouldFree);
 extern TupleTableSlot *ExecStoreMinimalTuple(MinimalTuple mtup,
 					  TupleTableSlot *slot,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e7fd7bd..cb9146e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1457,6 +1457,8 @@ typedef struct BitmapHeapScanState
  *		NumTids		   number of tids in this scan
  *		TidPtr		   index of currently fetched tid
  *		TidList		   evaluated item pointers (array of size NumTids)
+ *		htup		   currently fetched tuple
+ *		buffer		   tuple's buffer, or InvalidBuffer
  * ----------------
  */
 typedef struct TidScanState
@@ -1468,6 +1470,7 @@ typedef struct TidScanState
 	int			tss_TidPtr;
 	ItemPointerData *tss_TidList;
 	HeapTupleData tss_htup;
+	Buffer		tss_buffer;
 } TidScanState;
 
 /* ----------------
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#1)
Re: Pinning a buffer in TupleTableSlot is unnecessary

Heikki Linnakangas <hlinnaka@iki.fi> writes:

While profiling some queries and looking at executor overhead, I
realized that we're not making much use of TupleTableSlot's ability to
hold a buffer pin. In a SeqScan, the buffer is held pinned by the
underlying heap-scan anyway. Same with an IndexScan, and the SampleScan.

I think this is probably wrong, or at least very dangerous to remove.
The reason for the feature is that the slot may continue to point at
the tuple after the scan has moved on. You've given no evidence at all
that that scenario doesn't arise (or that we wouldn't need it in future).

At the very least I'd want to see this patch clearing the slot before
advancing the scan, so that it doesn't have actual dangling pointers
laying about.

So, how about we remove the ability of a TupleTableSlot to hold a buffer
pin, per the attached patch? It shaves a few cycles from
ExecStoreTuple() and ExecClearTuple(), which get called a lot. I
couldn't measure any actual difference from that, but it seems like a
good idea from a readability point of view anyway.

If you can't measure a clear performance improvement, I'm -1 on the
whole thing. You've got risk and breakage of third-party code, and
what to show for it?

I'll start a different thread on that after
some more experimentation, but if we e.g. get rid of
ExecMaterializeSlot() in its current form, would that be OK?

INSERT/UPDATE, for one, relies on that.

regards, tom lane


#3 Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Tom Lane (#2)
1 attachment(s)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On 08/30/2016 02:38 PM, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

While profiling some queries and looking at executor overhead, I
realized that we're not making much use of TupleTableSlot's ability to
hold a buffer pin. In a SeqScan, the buffer is held pinned by the
underlying heap-scan anyway. Same with an IndexScan, and the SampleScan.

I think this is probably wrong, or at least very dangerous to remove.
The reason for the feature is that the slot may continue to point at
the tuple after the scan has moved on. You've given no evidence at all
that that scenario doesn't arise (or that we wouldn't need it in future).

For a SeqScan, the buffer is pinned, and the tuple stays valid, because
we have an open HeapScan on it. It will stay valid until the next
heap_getnext() call.

At the very least I'd want to see this patch clearing the slot before
advancing the scan, so that it doesn't have actual dangling pointers
laying about.

Fair enough, although we're not 100% careful about that currently
either. For examples, see gather_getnext, CopyFrom, and others.
Personally I think that's OK, as long as the window between invalidating
the underlying buffer (or other source of the tuple) and storing a new
tuple in the slot is small enough. In particular, I think this is OK:

tuple = heap_getnext(); /* this leaves a dangling pointer in slot */
slot = ExecStoreTuple(tuple); /* and this makes it valid again */

So, how about we remove the ability of a TupleTableSlot to hold a buffer
pin, per the attached patch? It shaves a few cycles from
ExecStoreTuple() and ExecClearTuple(), which get called a lot. I
couldn't measure any actual difference from that, but it seems like a
good idea from a readability point of view anyway.

If you can't measure a clear performance improvement, I'm -1 on the
whole thing. You've got risk and breakage of third-party code, and
what to show for it?

Well, I think simplicity is a virtue all by itself. But ok, let me try a
bit harder to benchmark this:

I created a test table with:

create table foo as
select repeat('x', 500) || g from generate_series(1, 1000000) g;
vacuum freeze;

And then ran:

$ cat count.sql
select count(*) from foo;
$ pgbench -n -f count.sql postgres -t1000

This is pretty much a best case for this patch. A "count(*)" doesn't do
much more than put each tuple in the slot, and the tuples are wide
enough that you switch from one buffer to another every 15 tuples,
which exercises the buffer pinning code.
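(Each row is a bit over 500 bytes once you count the tuple header and
line pointer, so roughly 8192 / 540 ≈ 15 tuples fit on an 8 kB heap
page.)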

I ran that pgbench command three times with and without the patch, and
picked the best times:

Without patch:
tps = 12.413360 (excluding connections establishing)

With the patch:
tps = 13.183164 (excluding connections establishing)

That's about a 6% improvement.

This was with the attached version of this patch. Compared to the
previous, I added ExecClearTuple() calls to clear the slots before
advancing the heap/index scan, to avoid the dangling pointers. Doing
that added an extra function call to this hot codepath, however, so I
compensated for that by adding an inline version of ExecStoreTuple(),
for those callers that know that the slot is already empty.
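
The definition of that inline variant doesn't appear in the excerpt
below, so here is a sketch of its shape (an approximation, not the
patch's exact code): because the caller promises the slot is already
empty, it can skip freeing old tuples and dropping old state:

    static inline TupleTableSlot *
    ExecStoreTupleInEmptySlot(HeapTuple tuple, TupleTableSlot *slot,
                              bool shouldFree)
    {
        Assert(slot->tts_isempty);      /* caller guarantees this */
        slot->tts_isempty = false;
        slot->tts_shouldFree = shouldFree;
        slot->tts_tuple = tuple;
        slot->tts_mintuple = NULL;
        slot->tts_nvalid = 0;           /* no extracted datums yet */
        return slot;
    }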

BTW, turning ExecStoreVirtualTuple into a static inline function would
also be an easy way to shave some cycles. But I refrained from including
that change into this patch.

I'll start a different thread on that after
some more experimentation, but if we e.g. get rid of
ExecMaterializeSlot() in its current form, would that be OK?

INSERT/UPDATE, for one, relies on that.

Sure. I'm thinking of refactoring that so that it doesn't. Or maybe add
a new kind of TupleTableSlot that cannot be materialized, which would
make ExecStore/ClearTuple for that slot simpler, and use that in
SeqScan and friends. INSERT/UPDATE could continue to use
ExecMaterializeSlot(), but it would need to have a slot of its own,
instead of materializing the slot it got from the plan tree. Yes, this
is still very hand-wavy...

- Heikki

Attachments:

remove-buffer-from-slot-2.patch (application/x-patch)
commit 6f047ac81172c7ea559c9992bfbc1e05a4c3bedd
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date:   Tue Aug 30 20:11:25 2016 +0300

    Remove support for holding a buffer pin in a TupleTableSlot.

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index daf0438..299fa62 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1387,7 +1387,6 @@ postgresIterateForeignScan(ForeignScanState *node)
 	 */
 	ExecStoreTuple(fsstate->tuples[fsstate->next_tuple++],
 				   slot,
-				   InvalidBuffer,
 				   false);
 
 	return slot;
@@ -3152,7 +3151,7 @@ store_returning_result(PgFdwModifyState *fmstate,
 											NULL,
 											fmstate->temp_cxt);
 		/* tuple will be deleted when it is cleared from the slot */
-		ExecStoreTuple(newtup, slot, InvalidBuffer, true);
+		ExecStoreTuple(newtup, slot, true);
 	}
 	PG_CATCH();
 	{
@@ -3258,7 +3257,7 @@ get_returning_data(ForeignScanState *node)
 												dmstate->retrieved_attrs,
 												NULL,
 												dmstate->temp_cxt);
-			ExecStoreTuple(newtup, slot, InvalidBuffer, false);
+			ExecStoreTuple(newtup, slot, false);
 		}
 		PG_CATCH();
 		{
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b0b43cf..91895fd 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2531,7 +2531,7 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 		MemoryContextReset(econtext->ecxt_per_tuple_memory);
 
 		/* Set up for predicate or expression evaluation */
-		ExecStoreTuple(heapTuple, slot, InvalidBuffer, false);
+		ExecStoreTuple(heapTuple, slot, false);
 
 		/*
 		 * In a partial index, discard tuples that don't satisfy the
@@ -2679,7 +2679,7 @@ IndexCheckExclusion(Relation heapRelation,
 		MemoryContextReset(econtext->ecxt_per_tuple_memory);
 
 		/* Set up for predicate or expression evaluation */
-		ExecStoreTuple(heapTuple, slot, InvalidBuffer, false);
+		ExecStoreTuple(heapTuple, slot, false);
 
 		/*
 		 * In a partial index, ignore tuples that don't satisfy the predicate.
@@ -3100,7 +3100,7 @@ validate_index_heapscan(Relation heapRelation,
 			MemoryContextReset(econtext->ecxt_per_tuple_memory);
 
 			/* Set up for predicate or expression evaluation */
-			ExecStoreTuple(heapTuple, slot, InvalidBuffer, false);
+			ExecStoreTuple(heapTuple, slot, false);
 
 			/*
 			 * In a partial index, discard tuples that don't satisfy the
diff --git a/src/backend/catalog/indexing.c b/src/backend/catalog/indexing.c
index b9fe102..faad4b9 100644
--- a/src/backend/catalog/indexing.c
+++ b/src/backend/catalog/indexing.c
@@ -96,7 +96,7 @@ CatalogIndexInsert(CatalogIndexState indstate, HeapTuple heapTuple)
 
 	/* Need a slot to hold the tuple being examined */
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation));
-	ExecStoreTuple(heapTuple, slot, InvalidBuffer, false);
+	ExecStoreTuple(heapTuple, slot, false);
 
 	/*
 	 * for each index, form and insert the index tuple
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index c617abb..6137dda 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -743,7 +743,7 @@ compute_index_stats(Relation onerel, double totalrows,
 			ResetExprContext(econtext);
 
 			/* Set up for predicate or expression evaluation */
-			ExecStoreTuple(heapTuple, slot, InvalidBuffer, false);
+			ExecStoreTuple(heapTuple, slot, false);
 
 			/* If index is partial, check predicate */
 			if (predicate != NIL)
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index 26f9114..baf204c 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -124,7 +124,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	slot = MakeSingleTupleTableSlot(RelationGetDescr(trigdata->tg_relation));
 
-	ExecStoreTuple(new_row, slot, InvalidBuffer, false);
+	ExecStoreTuple(new_row, slot, false);
 
 	/*
 	 * Typically the index won't have expressions, but if it does we need an
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5947e72..a2ab4b2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2435,7 +2435,7 @@ CopyFrom(CopyState cstate)
 
 		/* Place tuple in tuple slot --- but slot shouldn't free it */
 		slot = myslot;
-		ExecStoreTuple(tuple, slot, InvalidBuffer, false);
+		ExecStoreTuple(tuple, slot, false);
 
 		skip_tuple = false;
 
@@ -2603,7 +2603,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 			List	   *recheckIndexes;
 
 			cstate->cur_lineno = firstBufferedLineNo + i;
-			ExecStoreTuple(bufferedTuples[i], myslot, InvalidBuffer, false);
+			ExecStoreTuple(bufferedTuples[i], myslot, false);
 			recheckIndexes =
 				ExecInsertIndexTuples(myslot, &(bufferedTuples[i]->t_self),
 									  estate, false, NULL, NIL);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 86e9814..fdbd86e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -4142,7 +4142,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 				 * Process supplied expressions to replace selected columns.
 				 * Expression inputs come from the old tuple.
 				 */
-				ExecStoreTuple(tuple, oldslot, InvalidBuffer, false);
+				ExecStoreTuple(tuple, oldslot, false);
 				econtext->ecxt_scantuple = oldslot;
 
 				foreach(l, tab->newvals)
@@ -4173,7 +4173,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 			}
 
 			/* Now check any constraints on the possibly-changed tuple */
-			ExecStoreTuple(tuple, newslot, InvalidBuffer, false);
+			ExecStoreTuple(tuple, newslot, false);
 			econtext->ecxt_scantuple = newslot;
 
 			foreach(l, notnull_attrs)
@@ -7358,7 +7358,7 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup)
 
 	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
-		ExecStoreTuple(tuple, slot, InvalidBuffer, false);
+		ExecStoreTuple(tuple, slot, false);
 
 		if (!ExecQual(exprstate, econtext, true))
 			ereport(ERROR,
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 9de22a1..4ac7b8f 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2058,7 +2058,7 @@ ExecBRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 
 		if (newslot->tts_tupleDescriptor != tupdesc)
 			ExecSetSlotDescriptor(newslot, tupdesc);
-		ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
+		ExecStoreTuple(newtuple, newslot, false);
 		slot = newslot;
 	}
 	return slot;
@@ -2133,7 +2133,7 @@ ExecIRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 
 		if (newslot->tts_tupleDescriptor != tupdesc)
 			ExecSetSlotDescriptor(newslot, tupdesc);
-		ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
+		ExecStoreTuple(newtuple, newslot, false);
 		slot = newslot;
 	}
 	return slot;
@@ -2513,7 +2513,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 
 		if (newslot->tts_tupleDescriptor != tupdesc)
 			ExecSetSlotDescriptor(newslot, tupdesc);
-		ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
+		ExecStoreTuple(newtuple, newslot, false);
 		slot = newslot;
 	}
 	return slot;
@@ -2609,7 +2609,7 @@ ExecIRUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 
 		if (newslot->tts_tupleDescriptor != tupdesc)
 			ExecSetSlotDescriptor(newslot, tupdesc);
-		ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
+		ExecStoreTuple(newtuple, newslot, false);
 		slot = newslot;
 	}
 	return slot;
@@ -2925,7 +2925,7 @@ TriggerEnabled(EState *estate, ResultRelInfo *relinfo,
 			oldslot = estate->es_trig_oldtup_slot;
 			if (oldslot->tts_tupleDescriptor != tupdesc)
 				ExecSetSlotDescriptor(oldslot, tupdesc);
-			ExecStoreTuple(oldtup, oldslot, InvalidBuffer, false);
+			ExecStoreTuple(oldtup, oldslot, false);
 		}
 		if (HeapTupleIsValid(newtup))
 		{
@@ -2938,7 +2938,7 @@ TriggerEnabled(EState *estate, ResultRelInfo *relinfo,
 			newslot = estate->es_trig_newtup_slot;
 			if (newslot->tts_tupleDescriptor != tupdesc)
 				ExecSetSlotDescriptor(newslot, tupdesc);
-			ExecStoreTuple(newtup, newslot, InvalidBuffer, false);
+			ExecStoreTuple(newtup, newslot, false);
 		}
 
 		/*
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 0e2d834..bcd9894 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -753,7 +753,7 @@ retry:
 		 * Extract the index column values and isnull flags from the existing
 		 * tuple.
 		 */
-		ExecStoreTuple(tup, existing_slot, InvalidBuffer, false);
+		ExecStoreTuple(tup, existing_slot, false);
 		FormIndexDatum(indexInfo, existing_slot, estate,
 					   existing_values, existing_isnull);
 
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index fb0013d..36fb73b 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -79,7 +79,7 @@ ExecScanFetch(ScanState *node,
 
 			/* Store test tuple in the plan node's scan slot */
 			ExecStoreTuple(estate->es_epqTuple[scanrelid - 1],
-						   slot, InvalidBuffer, false);
+						   slot, false);
 
 			/* Check if it meets the access-method conditions */
 			if (!(*recheckMtd) (node, slot))
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 533050d..2c3ba7e 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -2,10 +2,9 @@
  *
  * execTuples.c
  *	  Routines dealing with TupleTableSlots.  These are used for resource
- *	  management associated with tuples (eg, releasing buffer pins for
- *	  tuples in disk buffers, or freeing the memory occupied by transient
- *	  tuples).  Slots also provide access abstraction that lets us implement
- *	  "virtual" tuples to reduce data-copying overhead.
+ *	  management associated with tuples (eg, freeing the memory occupied by
+ *	  transient tuples).  Slots also provide access abstraction that lets us
+ *	  implement "virtual" tuples to reduce data-copying overhead.
  *
  *	  Routines dealing with the type information for tuples. Currently,
  *	  the type information for a tuple is an array of FormData_pg_attribute.
@@ -75,9 +74,9 @@
  *		The important thing to watch in the executor code is how pointers
  *		to the slots containing tuples are passed instead of the tuples
  *		themselves.  This facilitates the communication of related information
- *		(such as whether or not a tuple should be pfreed, what buffer contains
- *		this tuple, the tuple's tuple descriptor, etc).  It also allows us
- *		to avoid physically constructing projection tuples in many cases.
+ *		(such as whether or not a tuple should be pfreed, the tuple's tuple
+ *		descriptor, etc).  It also allows us to avoid physically constructing
+ *		projection tuples in many cases.
  */
 #include "postgres.h"
 
@@ -118,7 +117,6 @@ MakeTupleTableSlot(void)
 	slot->tts_tuple = NULL;
 	slot->tts_tupleDescriptor = NULL;
 	slot->tts_mcxt = CurrentMemoryContext;
-	slot->tts_buffer = InvalidBuffer;
 	slot->tts_nvalid = 0;
 	slot->tts_values = NULL;
 	slot->tts_isnull = NULL;
@@ -146,7 +144,7 @@ ExecAllocTableSlot(List **tupleTable)
 /* --------------------------------
  *		ExecResetTupleTable
  *
- *		This releases any resources (buffer pins, tupdesc refcounts)
+ *		This releases any resources (tupdesc refcounts)
  *		held by the tuple table, and optionally releases the memory
  *		occupied by the tuple table data structure.
  *		It is expected that this routine be called by EndPlan().
@@ -289,14 +287,9 @@ ExecSetSlotDescriptor(TupleTableSlot *slot,		/* slot to change */
  *
  *		tuple:	tuple to store
  *		slot:	slot to store it in
- *		buffer: disk buffer if tuple is in a disk page, else InvalidBuffer
  *		shouldFree: true if ExecClearTuple should pfree() the tuple
  *					when done with it
  *
- * If 'buffer' is not InvalidBuffer, the tuple table code acquires a pin
- * on the buffer which is held until the slot is cleared, so that the tuple
- * won't go away on us.
- *
  * shouldFree is normally set 'true' for tuples constructed on-the-fly.
  * It must always be 'false' for tuples that are stored in disk pages,
  * since we don't want to try to pfree those.
@@ -313,16 +306,14 @@ ExecSetSlotDescriptor(TupleTableSlot *slot,		/* slot to change */
  * Return value is just the passed-in slot pointer.
  *
  * NOTE: before PostgreSQL 8.1, this function would accept a NULL tuple
- * pointer and effectively behave like ExecClearTuple (though you could
- * still specify a buffer to pin, which would be an odd combination).
- * This saved a couple lines of code in a few places, but seemed more likely
- * to mask logic errors than to be really useful, so it's now disallowed.
+ * pointer and effectively behave like ExecClearTuple. This saved a couple
+ * lines of code in a few places, but seemed more likely to mask logic
+ * errors than to be really useful, so it's now disallowed.
  * --------------------------------
  */
 TupleTableSlot *
 ExecStoreTuple(HeapTuple tuple,
 			   TupleTableSlot *slot,
-			   Buffer buffer,
 			   bool shouldFree)
 {
 	/*
@@ -331,8 +322,6 @@ ExecStoreTuple(HeapTuple tuple,
 	Assert(tuple != NULL);
 	Assert(slot != NULL);
 	Assert(slot->tts_tupleDescriptor != NULL);
-	/* passing shouldFree=true for a tuple on a disk page is not sane */
-	Assert(BufferIsValid(buffer) ? (!shouldFree) : true);
 
 	/*
 	 * Free any old physical tuple belonging to the slot.
@@ -354,24 +343,6 @@ ExecStoreTuple(HeapTuple tuple,
 	/* Mark extracted state invalid */
 	slot->tts_nvalid = 0;
 
-	/*
-	 * If tuple is on a disk page, keep the page pinned as long as we hold a
-	 * pointer into it.  We assume the caller already has such a pin.
-	 *
-	 * This is coded to optimize the case where the slot previously held a
-	 * tuple on the same disk page: in that case releasing and re-acquiring
-	 * the pin is a waste of cycles.  This is a common situation during
-	 * seqscans, so it's worth troubling over.
-	 */
-	if (slot->tts_buffer != buffer)
-	{
-		if (BufferIsValid(slot->tts_buffer))
-			ReleaseBuffer(slot->tts_buffer);
-		slot->tts_buffer = buffer;
-		if (BufferIsValid(buffer))
-			IncrBufferRefCount(buffer);
-	}
-
 	return slot;
 }
 
@@ -379,8 +350,6 @@ ExecStoreTuple(HeapTuple tuple,
  *		ExecStoreMinimalTuple
  *
  *		Like ExecStoreTuple, but insert a "minimal" tuple into the slot.
- *
- * No 'buffer' parameter since minimal tuples are never stored in relations.
  * --------------------------------
  */
 TupleTableSlot *
@@ -404,14 +373,6 @@ ExecStoreMinimalTuple(MinimalTuple mtup,
 		heap_free_minimal_tuple(slot->tts_mintuple);
 
 	/*
-	 * Drop the pin on the referenced buffer, if there is one.
-	 */
-	if (BufferIsValid(slot->tts_buffer))
-		ReleaseBuffer(slot->tts_buffer);
-
-	slot->tts_buffer = InvalidBuffer;
-
-	/*
 	 * Store the new tuple into the specified slot.
 	 */
 	slot->tts_isempty = false;
@@ -460,14 +421,6 @@ ExecClearTuple(TupleTableSlot *slot)	/* slot in which to store tuple */
 	slot->tts_shouldFreeMin = false;
 
 	/*
-	 * Drop the pin on the referenced buffer, if there is one.
-	 */
-	if (BufferIsValid(slot->tts_buffer))
-		ReleaseBuffer(slot->tts_buffer);
-
-	slot->tts_buffer = InvalidBuffer;
-
-	/*
 	 * Mark it empty.
 	 */
 	slot->tts_isempty = true;
@@ -755,14 +708,6 @@ ExecMaterializeSlot(TupleTableSlot *slot)
 	MemoryContextSwitchTo(oldContext);
 
 	/*
-	 * Drop the pin on the referenced buffer, if there is one.
-	 */
-	if (BufferIsValid(slot->tts_buffer))
-		ReleaseBuffer(slot->tts_buffer);
-
-	slot->tts_buffer = InvalidBuffer;
-
-	/*
 	 * Mark extracted state invalid.  This is important because the slot is
 	 * not supposed to depend any more on the previous external data; we
 	 * mustn't leave any dangling pass-by-reference datums in tts_values.
@@ -809,7 +754,7 @@ ExecCopySlot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
 	newTuple = ExecCopySlotTuple(srcslot);
 	MemoryContextSwitchTo(oldContext);
 
-	return ExecStoreTuple(newTuple, dstslot, InvalidBuffer, true);
+	return ExecStoreTuple(newTuple, dstslot, true);
 }
 
 
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index ce2fc28..8f9b4a0 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -2077,7 +2077,6 @@ agg_retrieve_direct(AggState *aggstate)
 				 */
 				ExecStoreTuple(aggstate->grp_firstTuple,
 							   firstSlot,
-							   InvalidBuffer,
 							   true);
 				aggstate->grp_firstTuple = NULL;		/* don't keep two
 														 * pointers */
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 449aacb..55b5756 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -166,6 +166,14 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			}
 
 			/*
+			 * Advancing to the next heap page will invalidate any previous
+			 * reference in the slot. It won't be accessed by anyone between
+			 * here and when we store the next tuple in it, but let's clear
+			 * it just to be sure.
+			 */
+			ExecClearTuple(slot);
+
+			/*
 			 * Fetch the current heap page and identify candidate tuples.
 			 */
 			bitgetpage(scan, tbmres);
@@ -274,7 +282,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
 		 */
 		ExecStoreTuple(&scan->rs_ctup,
 					   slot,
-					   scan->rs_cbuf,
 					   false);
 
 		/*
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 438d1b2..832409c 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -295,8 +295,6 @@ gather_getnext(GatherState *gatherstate)
 			{
 				ExecStoreTuple(tup,		/* tuple to store */
 							   fslot,	/* slot in which to store the tuple */
-							   InvalidBuffer,	/* buffer associated with this
-												 * tuple */
 							   false);	/* slot should not pfree tuple */
 				return fslot;
 			}
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 3143bd9..3285cb1 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,17 +102,26 @@ IndexNext(IndexScanState *node)
 	/*
 	 * ok, now that we have what we need, fetch the next tuple.
 	 */
-	while ((tuple = index_getnext(scandesc, direction)) != NULL)
+	for (;;)
 	{
 		/*
+		 * Clear the slot before we advance the scan, as that invalidates the
+		 * previous tuple reference in the slot.
+		 */
+		ExecClearTuple(slot);
+
+		tuple = index_getnext(scandesc, direction);
+		if (tuple == NULL)
+			break;
+
+		/*
 		 * Store the scanned tuple in the scan tuple slot of the scan state.
 		 * Note: we pass 'false' because tuples returned by amgetnext are
 		 * pointers onto disk pages and must not be pfree()'d.
 		 */
-		ExecStoreTuple(tuple,	/* tuple to store */
-					   slot,	/* slot to store in */
-					   scandesc->xs_cbuf,		/* buffer containing tuple */
-					   false);	/* don't pfree */
+		ExecStoreTupleInEmptySlot(tuple,	/* tuple to store */
+								  slot,		/* slot to store in */
+								  false);	/* don't pfree */
 
 		/*
 		 * If the index was lossy, we have to recheck the index quals using
@@ -198,7 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 				tuple = reorderqueue_pop(node);
 
 				/* Pass 'true', as the tuple in the queue is a palloc'd copy */
-				ExecStoreTuple(tuple, slot, InvalidBuffer, true);
+				ExecStoreTuple(tuple, slot, true);
 				return slot;
 			}
 		}
@@ -212,6 +221,8 @@ IndexNextWithReorder(IndexScanState *node)
 		 * Fetch next tuple from the index.
 		 */
 next_indextuple:
+		ExecClearTuple(slot);
+
 		tuple = index_getnext(scandesc, ForwardScanDirection);
 		if (!tuple)
 		{
@@ -228,10 +239,9 @@ next_indextuple:
 		 * Note: we pass 'false' because tuples returned by amgetnext are
 		 * pointers onto disk pages and must not be pfree()'d.
 		 */
-		ExecStoreTuple(tuple,	/* tuple to store */
-					   slot,	/* slot to store in */
-					   scandesc->xs_cbuf,		/* buffer containing tuple */
-					   false);	/* don't pfree */
+		ExecStoreTupleInEmptySlot(tuple,	/* tuple to store */
+								  slot,	/* slot to store in */
+								  false);	/* don't pfree */
 
 		/*
 		 * If the index was lossy, we have to recheck the index quals and
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index af7b26c..c70288f 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -751,7 +751,7 @@ ldelete:;
 
 			if (slot->tts_tupleDescriptor != RelationGetDescr(resultRelationDesc))
 				ExecSetSlotDescriptor(slot, RelationGetDescr(resultRelationDesc));
-			ExecStoreTuple(&deltuple, slot, InvalidBuffer, false);
+			ExecStoreTuple(&deltuple, slot, false);
 		}
 
 		rslot = ExecProcessReturning(resultRelInfo, slot, planSlot);
@@ -1173,7 +1173,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 	ExecCheckHeapTupleVisible(estate, &tuple, buffer);
 
 	/* Store target's existing tuple in the state's dedicated slot */
-	ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false);
+	ExecStoreTuple(&tuple, mtstate->mt_existing, false);
 
 	/*
 	 * Make tuple and any needed join variables available to ExecQual and
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 9ce7c02..9a9c9d2 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -65,7 +65,6 @@ SampleNext(SampleScanState *node)
 	if (tuple)
 		ExecStoreTuple(tuple,	/* tuple to store */
 					   slot,	/* slot to store in */
-					   node->ss.ss_currentScanDesc->rs_cbuf,	/* tuple's buffer */
 					   false);	/* don't pfree this pointer */
 	else
 		ExecClearTuple(slot);
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 00bf3a5..e75776e 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -75,6 +75,12 @@ SeqNext(SeqScanState *node)
 	}
 
 	/*
+	 * Clear the slot before we advance the scan, as that invalidates the
+	 * previous tuple reference in the slot.
+	 */
+	ExecClearTuple(slot);
+
+	/*
 	 * get the next tuple from the table
 	 */
 	tuple = heap_getnext(scandesc, direction);
@@ -83,18 +89,16 @@ SeqNext(SeqScanState *node)
 	 * save the tuple and the buffer returned to us by the access methods in
 	 * our scan tuple slot and return the slot.  Note: we pass 'false' because
 	 * tuples returned by heap_getnext() are pointers onto disk pages and were
-	 * not created with palloc() and so should not be pfree()'d.  Note also
-	 * that ExecStoreTuple will increment the refcount of the buffer; the
-	 * refcount will not be dropped until the tuple table slot is cleared.
+	 * not created with palloc() and so should not be pfree()'d.
 	 */
 	if (tuple)
-		ExecStoreTuple(tuple,	/* tuple to store */
-					   slot,	/* slot to store in */
-					   scandesc->rs_cbuf,		/* buffer associated with this
-												 * tuple */
-					   false);	/* don't pfree this pointer */
+		ExecStoreTupleInEmptySlot(tuple,	/* tuple to store */
+								  slot,	/* slot to store in */
+								  false);	/* don't pfree this pointer */
 	else
-		ExecClearTuple(slot);
+	{
+		/* leave the slot empty */
+	}
 
 	return slot;
 }
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 633580b..052ceb9 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -273,7 +273,6 @@ setop_retrieve_direct(SetOpState *setopstate)
 		 */
 		ExecStoreTuple(setopstate->grp_firstTuple,
 					   resultTupleSlot,
-					   InvalidBuffer,
 					   true);
 		setopstate->grp_firstTuple = NULL;		/* don't keep two pointers */
 
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 2604103..de9b9e4 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -256,7 +256,6 @@ TidNext(TidScanState *node)
 	Relation	heapRelation;
 	HeapTuple	tuple;
 	TupleTableSlot *slot;
-	Buffer		buffer = InvalidBuffer;
 	ItemPointerData *tidList;
 	int			numTids;
 	bool		bBackward;
@@ -318,7 +317,15 @@ TidNext(TidScanState *node)
 		if (node->tss_isCurrentOf)
 			heap_get_latest_tid(heapRelation, snapshot, &tuple->t_self);
 
-		if (heap_fetch(heapRelation, snapshot, tuple, &buffer, false, NULL))
+		/* Clear the slot and release old buffer, if any */
+		ExecClearTuple(slot);
+		if (BufferIsValid(node->tss_buffer))
+		{
+			ReleaseBuffer(node->tss_buffer);
+			node->tss_buffer = InvalidBuffer;
+		}
+
+		if (heap_fetch(heapRelation, snapshot, tuple, &node->tss_buffer, false, NULL))
 		{
 			/*
 			 * store the scanned tuple in the scan tuple slot of the scan
@@ -327,16 +334,9 @@ TidNext(TidScanState *node)
 			 * pointers onto disk pages and were not created with palloc() and
 			 * so should not be pfree()'d.
 			 */
-			ExecStoreTuple(tuple,		/* tuple to store */
-						   slot,	/* slot to store in */
-						   buffer,		/* buffer associated with tuple  */
-						   false);		/* don't pfree */
-
-			/*
-			 * At this point we have an extra pin on the buffer, because
-			 * ExecStoreTuple incremented the pin count. Drop our local pin.
-			 */
-			ReleaseBuffer(buffer);
+			ExecStoreTupleInEmptySlot(tuple,		/* tuple to store */
+									  slot,			/* slot to store in */
+									  false);		/* don't pfree */
 
 			return slot;
 		}
@@ -351,6 +351,12 @@ TidNext(TidScanState *node)
 	 * if we get here it means the tid scan failed so we are at the end of the
 	 * scan..
 	 */
+	if (BufferIsValid(node->tss_buffer))
+	{
+		ReleaseBuffer(node->tss_buffer);
+		node->tss_buffer = InvalidBuffer;
+	}
+
 	return ExecClearTuple(slot);
 }
 
@@ -422,6 +428,15 @@ void
 ExecEndTidScan(TidScanState *node)
 {
 	/*
+	 * Release the buffer
+	 */
+	if (BufferIsValid(node->tss_buffer))
+	{
+		ReleaseBuffer(node->tss_buffer);
+		node->tss_buffer = InvalidBuffer;
+	}
+
+	/*
 	 * Free the exprcontext
 	 */
 	ExecFreeExprContext(&node->ss.ps);
@@ -505,6 +520,7 @@ ExecInitTidScan(TidScan *node, EState *estate, int eflags)
 
 	tidstate->ss.ss_currentRelation = currentRelation;
 	tidstate->ss.ss_currentScanDesc = NULL;		/* no heap scan here */
+	tidstate->tss_buffer = InvalidBuffer;
 
 	/*
 	 * get the scan type from the relation descriptor.
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 56943f2..dad862b 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5138,7 +5138,7 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 										 indexscandir)) != NULL)
 				{
 					/* Extract the index column values from the heap tuple */
-					ExecStoreTuple(tup, slot, InvalidBuffer, false);
+					ExecStoreTuple(tup, slot, false);
 					FormIndexDatum(indexInfo, slot, estate,
 								   values, isnull);
 
@@ -5170,7 +5170,7 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 										 -indexscandir)) != NULL)
 				{
 					/* Extract the index column values from the heap tuple */
-					ExecStoreTuple(tup, slot, InvalidBuffer, false);
+					ExecStoreTuple(tup, slot, false);
 					FormIndexDatum(indexInfo, slot, estate,
 								   values, isnull);
 
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index c8fbcf8..3902068 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -4176,11 +4176,11 @@ comparetup_cluster(const SortTuple *a, const SortTuple *b,
 
 		ecxt_scantuple = GetPerTupleExprContext(state->estate)->ecxt_scantuple;
 
-		ExecStoreTuple(ltup, ecxt_scantuple, InvalidBuffer, false);
+		ExecStoreTuple(ltup, ecxt_scantuple, false);
 		FormIndexDatum(state->indexInfo, ecxt_scantuple, state->estate,
 					   l_index_values, l_index_isnull);
 
-		ExecStoreTuple(rtup, ecxt_scantuple, InvalidBuffer, false);
+		ExecStoreTuple(rtup, ecxt_scantuple, false);
 		FormIndexDatum(state->indexInfo, ecxt_scantuple, state->estate,
 					   r_index_values, r_index_isnull);
 
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index 5ac0b6a..e6b29e2 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -21,21 +21,22 @@
 /*----------
  * The executor stores tuples in a "tuple table" which is a List of
  * independent TupleTableSlots.  There are several cases we need to handle:
- *		1. physical tuple in a disk buffer page
- *		2. physical tuple constructed in palloc'ed memory
- *		3. "minimal" physical tuple constructed in palloc'ed memory
- *		4. "virtual" tuple consisting of Datum/isnull arrays
+ *		1. physical heap or minimal tuple, in a disk buffer page
+ *		2. physical heap or minimal tuple, constructed in palloc'ed memory
+ *		3. "virtual" tuple consisting of Datum/isnull arrays
  *
  * The first two cases are similar in that they both deal with "materialized"
- * tuples, but resource management is different.  For a tuple in a disk page
- * we need to hold a pin on the buffer until the TupleTableSlot's reference
- * to the tuple is dropped; while for a palloc'd tuple we usually want the
- * tuple pfree'd when the TupleTableSlot's reference is dropped.
- *
- * A "minimal" tuple is handled similarly to a palloc'd regular tuple.
- * At present, minimal tuples never are stored in buffers, so there is no
- * parallel to case 1.  Note that a minimal tuple has no "system columns".
- * (Actually, it could have an OID, but we have no need to access the OID.)
+ * tuples, but resource management is different.  For a tuple in a disk page,
+ * the executor node that owns the slot needs to ensure that the tuple is valid
+ * for as long as the slot holds a reference to the tuple, by holding the
+ * buffer pinned.  Likewise, for a tuple that came from a tuplesort, you
+ * mustn't close the underlying tuplesort for as long as the slot holds a
+ * reference to it.  But for a palloc'd tuple, the slot itself will pfree the
+ * tuple, when the slot is cleared or another tuple is stored in it.
+ *
+ * A "minimal" tuple is handled similarly to a regular heap tuple. Note that
+ * a minimal tuple has no "system columns". (Actually, it could have an OID,
+ * but we have no need to access the OID.)
  *
  * A "virtual" tuple is an optimization used to minimize physical data
  * copying in a nest of plan nodes.  Any pass-by-reference Datums in the
@@ -68,8 +69,7 @@
  * A TupleTableSlot can also be "empty", holding no valid data.  This is
  * the only valid state for a freshly-created slot that has not yet had a
  * tuple descriptor assigned to it.  In this state, tts_isempty must be
- * TRUE, tts_shouldFree FALSE, tts_tuple NULL, tts_buffer InvalidBuffer,
- * and tts_nvalid zero.
+ * TRUE, tts_shouldFree FALSE, tts_tuple NULL, and tts_nvalid zero.
  *
  * The tupleDescriptor is simply referenced, not copied, by the TupleTableSlot
  * code.  The caller of ExecSetSlotDescriptor() is responsible for providing
@@ -82,12 +82,6 @@
  * When tts_shouldFree is true, the physical tuple is "owned" by the slot
  * and should be freed when the slot's reference to the tuple is dropped.
  *
- * If tts_buffer is not InvalidBuffer, then the slot is holding a pin
- * on the indicated buffer page; drop the pin when we release the
- * slot's reference to that buffer.  (tts_shouldFree should always be
- * false in such a case, since presumably tts_tuple is pointing at the
- * buffer page.)
- *
  * tts_nvalid indicates the number of valid columns in the tts_values/isnull
  * arrays.  When the slot is holding a "virtual" tuple this must be equal
  * to the descriptor's natts.  When the slot is holding a physical tuple
@@ -120,7 +114,6 @@ typedef struct TupleTableSlot
 	HeapTuple	tts_tuple;		/* physical tuple, or NULL if virtual */
 	TupleDesc	tts_tupleDescriptor;	/* slot's tuple descriptor */
 	MemoryContext tts_mcxt;		/* slot itself is in this context */
-	Buffer		tts_buffer;		/* tuple's buffer, or InvalidBuffer */
 	int			tts_nvalid;		/* # of valid values in tts_values */
 	Datum	   *tts_values;		/* current per-attribute values */
 	bool	   *tts_isnull;		/* current per-attribute isnull flags */
@@ -138,6 +131,31 @@ typedef struct TupleTableSlot
 #define TupIsNull(slot) \
 	((slot) == NULL || (slot)->tts_isempty)
 
+/*
+ * A faster, inlined, version of ExecStoreTuple, when you know that the slot is
+ * already empty.
+ */
+static inline void
+ExecStoreTupleInEmptySlot(HeapTuple tuple, TupleTableSlot *slot, bool shouldFree)
+{
+	/*
+	 * sanity checks
+	 */
+	Assert(tuple != NULL);
+	Assert(slot != NULL);
+	Assert(slot->tts_tupleDescriptor != NULL);
+	Assert(slot->tts_isempty);
+
+	/*
+	 * Store the new tuple into the specified slot.  Since the slot was
+	 * empty before, shouldFree, mintuple and nvalid are already false/0.
+	 */
+	slot->tts_isempty = false;
+	slot->tts_tuple = tuple;
+	if (shouldFree)
+		slot->tts_shouldFree = true;
+}
+
 /* in executor/execTuples.c */
 extern TupleTableSlot *MakeTupleTableSlot(void);
 extern TupleTableSlot *ExecAllocTableSlot(List **tupleTable);
@@ -147,7 +165,6 @@ extern void ExecDropSingleTupleTableSlot(TupleTableSlot *slot);
 extern void ExecSetSlotDescriptor(TupleTableSlot *slot, TupleDesc tupdesc);
 extern TupleTableSlot *ExecStoreTuple(HeapTuple tuple,
 			   TupleTableSlot *slot,
-			   Buffer buffer,
 			   bool shouldFree);
 extern TupleTableSlot *ExecStoreMinimalTuple(MinimalTuple mtup,
 					  TupleTableSlot *slot,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e7fd7bd..cb9146e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1457,6 +1457,8 @@ typedef struct BitmapHeapScanState
  *		NumTids		   number of tids in this scan
  *		TidPtr		   index of currently fetched tid
  *		TidList		   evaluated item pointers (array of size NumTids)
+ *		htup		   currently fetched tuple
+ *		buffer		   tuple's buffer, or InvalidBuffer
  * ----------------
  */
 typedef struct TidScanState
@@ -1468,6 +1470,7 @@ typedef struct TidScanState
 	int			tss_TidPtr;
 	ItemPointerData *tss_TidList;
 	HeapTupleData tss_htup;
+	Buffer		tss_buffer;
 } TidScanState;
 
 /* ----------------
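To illustrate the calling convention the patch establishes, here is a minimal
sketch (not part of the patch itself): the scan, not the slot, is responsible
for keeping the tuple's buffer pinned. ExampleScanNext is a hypothetical node;
it assumes the patched tuptable.h above.

static TupleTableSlot *
ExampleScanNext(ScanState *node)
{
    HeapScanDesc scandesc = node->ss_currentScanDesc;
    TupleTableSlot *slot = node->ss_ScanTupleSlot;
    HeapTuple   tuple;

    /* invalidate the previous tuple reference before moving the scan */
    ExecClearTuple(slot);

    /* the scan keeps rs_cbuf pinned until it advances again */
    tuple = heap_getnext(scandesc, ForwardScanDirection);
    if (tuple)
        ExecStoreTupleInEmptySlot(tuple, slot, false);  /* on-page: no pfree */

    return slot;                /* an empty slot signals end of scan */
}
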
#4Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#1)
Re: Pinning a buffer in TupleTableSlot is unnecessary

Hi,

On 2016-08-30 13:12:41 +0300, Heikki Linnakangas wrote:

While profiling some queries and looking at executor overhead, I realized
that we're not making much use of TupleTableSlot's ability to hold a buffer
pin.

FWIW, I came to a similar conclusion while working on making the
executor batched (passing batches around).

So, how about we remove the ability of a TupleTableSlot to hold a buffer
pin, per the attached patch? It shaves a few cycles from a ExecStoreTuple()
and ExecClearTuple(), which get called a lot. I couldn't measure any actual
difference from that, though, but it seems like a good idea from a
readability point of view, anyway.

I actually found that rather beneficial from a performance point of
view.

How much do we need to worry about breaking extensions? I.e. to what extent
do we consider the TupleTableSlot API to be stable, for extensions to use?
The FDW API uses TupleTableSlots - this patch had to fix the ExecStoreTuple
calls in postgres_fdw.

I think we're just going to have to deal with the fallout. Trying to
maintain backward compatibility with the TupleTableSlot API seems to
prevent some rather important improvement in the area.

We could refrain from changing the signature of ExecStoreTuple(), and throw
an error if you try to pass a valid buffer to it. But I also have some
bigger changes to TupleTableSlot in mind.

We should probably coordinate ;)

I think we could gain some speed
by making slots "read-only". For example, it would be nice if a SeqScan
could just replace the tts_tuple pointer in the slot, and not have to worry
that the upper nodes might have materialized the slot, and freeing the
copied tuple. Because that happens very rarely in practice. It would be
better if those few places where we currently materialize an existing slot,
we would create a copy of the slot, and leave the original slot unmodified.
I'll start a different thread on that after some more experimentation, but
if we e.g. get rid of ExecMaterializeSlot() in its current form, would that
be OK?

Hm.

Greetings,

Andres Freund


#5Greg Stark
stark@mit.edu
In reply to: Heikki Linnakangas (#1)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On Tue, Aug 30, 2016 at 11:12 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

While profiling some queries and looking at executor overhead, I realized
that we're not making much use of TupleTableSlot's ability to hold a buffer
pin. In a SeqScan, the buffer is held pinned by the underlying heap-scan
anyway. Same with an IndexScan, and the SampleScan. The only thing that
relies on that feature is TidScan, but we could easily teach TidScan to hold
the buffer pin directly.

So, how about we remove the ability of a TupleTableSlot to hold a buffer
pin, per the attached patch? It shaves a few cycles from a ExecStoreTuple()
and ExecClearTuple(), which get called a lot. I couldn't measure any actual
difference from that, though, but it seems like a good idea from a
readability point of view, anyway.

Out of curiosity, why go in this direction and not the other? Why not
move those other scans to use the TupleTableSlot API to manage the
pins? Offhand it sounds more readable, not less, to have the
TupleTableSlot be a self-contained data structure that guarantees the
lifetime of the pins matches the slot's, rather than having a dependency
on the code structure in some far-away scan.

--
greg


#6Andres Freund
andres@anarazel.de
In reply to: Greg Stark (#5)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On 2016-08-30 21:59:44 +0100, Greg Stark wrote:

On Tue, Aug 30, 2016 at 11:12 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

While profiling some queries and looking at executor overhead, I realized
that we're not making much use of TupleTableSlot's ability to hold a buffer
pin. In a SeqScan, the buffer is held pinned by the underlying heap-scan
anyway. Same with an IndexScan, and the SampleScan. The only thing that
relies on that feature is TidScan, but we could easily teach TidScan to hold
the buffer pin directly.

So, how about we remove the ability of a TupleTableSlot to hold a buffer
pin, per the attached patch? It shaves a few cycles from a ExecStoreTuple()
and ExecClearTuple(), which get called a lot. I couldn't measure any actual
difference from that, though, but it seems like a good idea from a
readability point of view, anyway.

Out of curiosity, why go in this direction and not the other? Why not
move those other scans to use the TupleTableSlot API to manage the
pins? Offhand it sounds more readable, not less, to have the
TupleTableSlot be a self-contained data structure that guarantees the
lifetime of the pins matches the slot's, rather than having a dependency
on the code structure in some far-away scan.

At least for heap scans the pins are page-level, and thus longer-lived
than the data in a slot. It's important that a scan holds a pin, because
it needs to rely on the page not being HOT-pruned, etc.
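A simplified sketch of that page-level pin management (the real logic lives
in heapgettup()/heapgetpage(); this hypothetical helper leaves out buffer
access strategies, pruning, and visibility checks):

static void
example_scan_advance_page(HeapScanDesc scan, BlockNumber blkno)
{
    /* release the pin on the page we are leaving, if any */
    if (BufferIsValid(scan->rs_cbuf))
        ReleaseBuffer(scan->rs_cbuf);

    /*
     * Pin the new page.  The pin is held until the scan advances again or
     * ends, so tuples already returned from this page stay valid, and the
     * page cannot be HOT-pruned underneath them.
     */
    scan->rs_cbuf = ReadBuffer(scan->rs_rd, blkno);
    scan->rs_cblock = blkno;
}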


#7Michael Paquier
michael.paquier@gmail.com
In reply to: Andres Freund (#6)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On Wed, Aug 31, 2016 at 6:03 AM, Andres Freund <andres@anarazel.de> wrote:

On 2016-08-30 21:59:44 +0100, Greg Stark wrote:

On Tue, Aug 30, 2016 at 11:12 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

While profiling some queries and looking at executor overhead, I realized
that we're not making much use of TupleTableSlot's ability to hold a buffer
pin. In a SeqScan, the buffer is held pinned by the underlying heap-scan
anyway. Same with an IndexScan, and the SampleScan. The only thing that
relies on that feature is TidScan, but we could easily teach TidScan to hold
the buffer pin directly.

So, how about we remove the ability of a TupleTableSlot to hold a buffer
pin, per the attached patch? It shaves a few cycles from a ExecStoreTuple()
and ExecClearTuple(), which get called a lot. I couldn't measure any actual
difference from that, though, but it seems like a good idea from a
readability point of view, anyway.

Out of curiosity, why go in this direction and not the other? Why not
move those other scans to use the TupleTableSlot API to manage the
pins? Offhand it sounds more readable, not less, to have the
TupleTableSlot be a self-contained data structure that guarantees the
lifetime of the pins matches the slot's, rather than having a dependency
on the code structure in some far-away scan.

At least for heap scans the pins are page-level, and thus longer-lived
than the data in a slot. It's important that a scan holds a pin, because
it needs to rely on the page not being HOT-pruned, etc.

I have moved this patch to next CF. Heikki, are you still planning to
work on it?
--
Michael


#8Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#2)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On 2016-08-30 07:38:10 -0400, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

While profiling some queries and looking at executor overhead, I
realized that we're not making much use of TupleTableSlot's ability to
hold a buffer pin. In a SeqScan, the buffer is held pinned by the
underlying heap-scan anyway. Same with an IndexScan, and the SampleScan.

I think this is probably wrong, or at least very dangerous to remove.
The reason for the feature is that the slot may continue to point at
the tuple after the scan has moved on.

FWIW, that's not safe to assume in upper layers *anyway*. If you want to
do that, the slot has to be materialized, and that'd make a local
copy. If you don't materialize tts_values/isnull can point into random
old memory (common e.g. for projections and virtual tuples in general).
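Concretely, a consumer that needs the returned data to survive the next
ExecProcNode() call has to do something like this (a sketch; example_consume
is hypothetical):

static void
example_consume(PlanState *outerPlan)
{
    TupleTableSlot *slot = ExecProcNode(outerPlan);

    if (!TupIsNull(slot))
    {
        /*
         * Force the slot to hold its own palloc'd copy of the tuple;
         * afterwards the data no longer depends on the child's buffer
         * pin or per-tuple memory.
         */
        ExecMaterializeSlot(slot);
    }
}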

Andres


#9Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#8)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On Sat, Nov 12, 2016 at 10:28 AM, Andres Freund <andres@anarazel.de> wrote:

On 2016-08-30 07:38:10 -0400, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

While profiling some queries and looking at executor overhead, I
realized that we're not making much use of TupleTableSlot's ability to
hold a buffer pin. In a SeqScan, the buffer is held pinned by the
underlying heap-scan anyway. Same with an IndexScan, and the SampleScan.

I think this is probably wrong, or at least very dangerous to remove.
The reason for the feature is that the slot may continue to point at
the tuple after the scan has moved on.

FWIW, that's not safe to assume in upper layers *anyway*. If you want to
do that, the slot has to be materialized, and that'd make a local
copy. If you don't materialize tts_values/isnull can point into random
old memory (common e.g. for projections and virtual tuples in general).

So, I think you are arguing in favor of proceeding with this patch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#9)
Re: Pinning a buffer in TupleTableSlot is unnecessary

Robert Haas <robertmhaas@gmail.com> writes:

On Sat, Nov 12, 2016 at 10:28 AM, Andres Freund <andres@anarazel.de> wrote:

On 2016-08-30 07:38:10 -0400, Tom Lane wrote:

I think this is probably wrong, or at least very dangerous to remove.
The reason for the feature is that the slot may continue to point at
the tuple after the scan has moved on.

FWIW, that's not safe to assume in upper layers *anyway*. If you want to
do that, the slot has to be materialized, and that'd make a local
copy. If you don't materialize tts_values/isnull can point into random
old memory (common e.g. for projections and virtual tuples in general).

So, I think you are arguing in favor of proceeding with this patch?

I don't believe Andres' claim anyway. There are certainly cases where an
allegedly-valid slot could be pointing at garbage, but table scans aren't
one of them, precisely because of the pin held by the slot. It would take
a fairly wide-ranging code review to convince me that it's okay to lose
that mechanism.

regards, tom lane


#11Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#9)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On 2016-11-14 10:09:02 -0500, Robert Haas wrote:

On Sat, Nov 12, 2016 at 10:28 AM, Andres Freund <andres@anarazel.de> wrote:

On 2016-08-30 07:38:10 -0400, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

While profiling some queries and looking at executor overhead, I
realized that we're not making much use of TupleTableSlot's ability to
hold a buffer pin. In a SeqScan, the buffer is held pinned by the
underlying heap-scan anyway. Same with an IndexScan, and the SampleScan.

I think this is probably wrong, or at least very dangerous to remove.
The reason for the feature is that the slot may continue to point at
the tuple after the scan has moved on.

FWIW, that's not safe to assume in upper layers *anyway*. If you want to
do that, the slot has to be materialized, and that'd make a local
copy. If you don't materialize tts_values/isnull can point into random
old memory (common e.g. for projections and virtual tuples in general).

So, I think you are arguing in favor of proceeding with this patch?

Not really, no. I don't buy the argument here against it. I do think
the overhead is quite noticeable. But I also think it has quite the
potential for subtle bugs. I think I'd feel better if we had some form
of instrumentation trapping buffer accesses without pins present. We've
previously discussed doing that with valgrind...

Andres


#12Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#10)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On Mon, Nov 14, 2016 at 10:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Sat, Nov 12, 2016 at 10:28 AM, Andres Freund <andres@anarazel.de> wrote:

On 2016-08-30 07:38:10 -0400, Tom Lane wrote:

I think this is probably wrong, or at least very dangerous to remove.
The reason for the feature is that the slot may continue to point at
the tuple after the scan has moved on.

FWIW, that's not safe to assume in upper layers *anyway*. If you want to
do that, the slot has to be materialized, and that'd make a local
copy. If you don't materialize tts_values/isnull can point into random
old memory (common e.g. for projections and virtual tuples in general).

So, I think you are arguing in favor of proceeding with this patch?

I don't believe Andres' claim anyway. There are certainly cases where an
allegedly-valid slot could be pointing at garbage, but table scans aren't
one of them, precisely because of the pin held by the slot. It would take
a fairly wide-ranging code review to convince me that it's okay to lose
that mechanism.

I don't understand your objection. It seems to me that the
TupleTableSlot is holding a pin, and the scan is also holding a pin,
so one of them is redundant. You speculated that the slot could
continue to point at the tuple after the scan has moved on, but how
could such a thing actually happen? Once the scan finds a tuple, it's
going to store the tuple in the slot and return. It won't get control
back to advance the scan until the next time it's asked for a tuple,
and then it has to update the slot before returning. So I don't see
the problem. If in the future somebody wants to write an executor
node that does random extra work - like advancing the scan - before
returning the tuple the scan already found, they'll have to take
special precautions, but why should anybody want to do that?

I'm kind of puzzled, here. Perhaps I am missing something obvious.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#12)
Re: Pinning a buffer in TupleTableSlot is unnecessary

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Nov 14, 2016 at 10:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't believe Andres' claim anyway. There are certainly cases where an
allegedly-valid slot could be pointing at garbage, but table scans aren't
one of them, precisely because of the pin held by the slot. It would take
a fairly wide-ranging code review to convince me that it's okay to lose
that mechanism.

I don't understand your objection. It seems to me that the
TupleTableSlot is holding a pin, and the scan is also holding a pin,
so one of them is redundant. You speculated that the slot could
continue to point at the tuple after the scan has moved on, but how
could such a thing actually happen?

You're implicitly assuming that a scan always returns its results in the
same slot, and that no other slot could contain a copy of that data, but
there is no guarantee of either. See bug #14344 and d8589946d for a
pretty recent example where that failed to be true --- admittedly, for
a tuplesort scan not a table scan.

We could certainly set up some global convention that would make this
universally true, but I think we'd have to do a lot of code-reading
to ensure that everything is following that convention.

Also, there might well be places that are relying on the ability of a
slot to hold a pin for slots that are not simply the return slot of
a plan node. We could perhaps remove the *use* of this slot feature in
the normal table-scan case, without removing the feature itself.

regards, tom lane


#14Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#13)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On Mon, Nov 14, 2016 at 11:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Nov 14, 2016 at 10:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't believe Andres' claim anyway. There are certainly cases where an
allegedly-valid slot could be pointing at garbage, but table scans aren't
one of them, precisely because of the pin held by the slot. It would take
a fairly wide-ranging code review to convince me that it's okay to lose
that mechanism.

I don't understand your objection. It seems to me that the
TupleTableSlot is holding a pin, and the scan is also holding a pin,
so one of them is redundant. You speculated that the slot could
continue to point at the tuple after the scan has moved on, but how
could such a thing actually happen?

You're implicitly assuming that a scan always returns its results in the
same slot, and that no other slot could contain a copy of that data, but
there is no guarantee of either. See bug #14344 and d8589946d for a
pretty recent example where that failed to be true --- admittedly, for
a tuplesort scan not a table scan.

We could certainly set up some global convention that would make this
universally true, but I think we'd have to do a lot of code-reading
to ensure that everything is following that convention.

Also, there might well be places that are relying on the ability of a
slot to hold a pin for slots that are not simply the return slot of
a plan node. We could perhaps remove the *use* of this slot feature in
the normal table-scan case, without removing the feature itself.

I'm still not getting it. We don't have to guess who is using this
feature; we can find out by looking at every place that calls
ExecStoreTuple and checking whether it passes InvalidBuffer.
As Heikki's original patch upthread demonstrates, most of them are.
The exceptions are in BitmapHeapNext, IndexNext, ExecOnConflictUpdate,
SampleNext, SeqNext, and TidNext. If all of those places are or can
be made to hold a buffer pin for as long as the slot would have held
the same pin, we're done.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#15Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Tom Lane (#13)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On 11/14/2016 06:18 PM, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Nov 14, 2016 at 10:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't believe Andres' claim anyway. There are certainly cases where an
allegedly-valid slot could be pointing at garbage, but table scans aren't
one of them, precisely because of the pin held by the slot. It would take
a fairly wide-ranging code review to convince me that it's okay to lose
that mechanism.

I don't understand your objection. It seems to me that the
TupleTableSlot is holding a pin, and the scan is also holding a pin,
so one of them is redundant. You speculated that the slot could
continue to point at the tuple after the scan has moved on, but how
could such a thing actually happen?

You're implicitly assuming that a scan always returns its results in the
same slot, and that no other slot could contain a copy of that data, but
there is no guarantee of either. See bug #14344 and d8589946d for a
pretty recent example where that failed to be true --- admittedly, for
a tuplesort scan not a table scan.

It's the other way round. ExecProcNode might not always return its
result in the same slot, but all the callers must assume that it might.

The tuplesort interface is slightly different: the caller passes a slot
to tuplesort_gettupleslot() as argument, and tuplesort_gettupleslot()
places the next tuple in that slot. With executor nodes, a node returns
a slot that it allocated itself, or it got from a child node. After you
call ExecProcNode(), you can *not* assume that the old tuple in the slot
is still valid. The child node can, and in most cases will, reuse the
same slot, overwriting its contents.
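Roughly, the two conventions look like this (a sketch; sortstate, myslot and
childnode are placeholder variables, and the signatures are simplified):

    /* tuplesort: the caller owns the slot; each call replaces its contents */
    while (tuplesort_gettupleslot(sortstate, true, myslot))
    {
        /* process myslot; its previous contents were just overwritten */
    }

    /*
     * Executor: the child owns the slot it returns and will typically
     * reuse it, so the previous return value must be treated as
     * invalidated on each call.
     */
    for (;;)
    {
        TupleTableSlot *slot = ExecProcNode(childnode);

        if (TupIsNull(slot))
            break;
        /* use slot now; copy or materialize it if the data must survive */
    }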

I think that difference in the API is exactly what caught Peter by
surprise and led to bug #14344. And I didn't see it either, until you
two debugged it.

Also, there might well be places that are relying on the ability of a
slot to hold a pin for slots that are not simply the return slot of
a plan node. We could perhaps remove the *use* of this slot feature in
the normal table-scan case, without removing the feature itself.

I didn't see any such use.

- Heikki


#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#15)
Re: Pinning a buffer in TupleTableSlot is unnecessary

Heikki Linnakangas <hlinnaka@iki.fi> writes:

On 11/14/2016 06:18 PM, Tom Lane wrote:

You're implicitly assuming that a scan always returns its results in the
same slot, and that no other slot could contain a copy of that data, but
there is no guarantee of either. See bug #14344 and d8589946d for a
pretty recent example where that failed to be true --- admittedly, for
a tuplesort scan not a table scan.

It's the other way round. ExecProcNode might not always return its
result in the same slot, but all the callers must assume that it might.

Basically my concern is that this restriction isn't documented anywhere
and I'm not entirely certain it's been adhered to everywhere. I'd feel
much better about it if there were some way we could verify that.

regards, tom lane


#17Peter Geoghegan
pg@heroku.com
In reply to: Heikki Linnakangas (#15)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On Mon, Nov 14, 2016 at 9:22 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

I think that difference in the API is exactly what caught Peter by surprise
and led to bug #14344. And I didn't see it either, until you two debugged
it.

That is accurate, of course.

--
Peter Geoghegan


#18Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#16)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On 2016-11-14 12:32:53 -0500, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

On 11/14/2016 06:18 PM, Tom Lane wrote:

You're implicitly assuming that a scan always returns its results in the
same slot, and that no other slot could contain a copy of that data, but
there is no guarantee of either. See bug #14344 and d8589946d for a
pretty recent example where that failed to be true --- admittedly, for
a tuplesort scan not a table scan.

It's the other way round. ExecProcNode might not always return its
result in the same slot, but all the callers must assume that it might.

Basically my concern is that this restriction isn't documented anywhere
and I'm not entirely certain it's been adhered to everywhere. I'd feel
much better about it if there were some way we could verify that.

Would support for valgrind complaining about access to unpinned buffers
suffice?

Andres


#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#18)
Re: Pinning a buffer in TupleTableSlot is unnecessary

Andres Freund <andres@anarazel.de> writes:

On 2016-11-14 12:32:53 -0500, Tom Lane wrote:

Basically my concern is that this restriction isn't documented anywhere
and I'm not entirely certain it's been adhered to everywhere. I'd feel
much better about it if there were some way we could verify that.

Would support for valgrind complaining about access to unpinned buffers
suffice?

I don't think it directly addresses the issue, but certainly it'd help.
Do you think that's easily doable?

regards, tom lane


#20Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#19)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On 2016-11-14 13:12:28 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2016-11-14 12:32:53 -0500, Tom Lane wrote:

Basically my concern is that this restriction isn't documented anywhere
and I'm not entirely certain it's been adhered to everywhere. I'd feel
much better about it if there were some way we could verify that.

Would support for valgrind complaining about access to unpinned buffers
suffice?

I don't think it directly addresses the issue, but certainly it'd help.

Well, it detects situations where removed pins cause "unprotected
access", but of course that doesn't protect against cases where
independent pins hide that issue.

Do you think that's easily doable?

I think so, yes. IIRC I discussed it with Noah and Peter G. at a
conference recently. We'd basically mark the content of shared buffers
inaccessible at backend startup, and mark it accessible whenever a
PinBuffer() happens, and then inaccessible during unpinning. We probably
have to exclude the page header though, as we intentionally access them
unpinned in some cases IIRC.
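A minimal sketch of that instrumentation using Memcheck's client macros; the
hook points are assumptions about where PinBuffer()/UnpinBuffer() would call
them, pin-count bookkeeping is omitted, and the page header is left accessible
per the caveat above:

#include <valgrind/memcheck.h>

/* called once the first local pin on the buffer is taken */
static void
mark_buffer_accessible(Buffer buf)
{
    VALGRIND_MAKE_MEM_DEFINED(BufferGetPage(buf), BLCKSZ);
}

/* called when the last local pin on the buffer is dropped */
static void
mark_buffer_inaccessible(Buffer buf)
{
    char       *page = (char *) BufferGetPage(buf);

    /* keep the header readable; poison only the rest of the page */
    VALGRIND_MAKE_MEM_NOACCESS(page + SizeOfPageHeaderData,
                               BLCKSZ - SizeOfPageHeaderData);
}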

- Andres


#21Peter Geoghegan
pg@heroku.com
In reply to: Andres Freund (#20)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On Mon, Nov 14, 2016 at 10:17 AM, Andres Freund <andres@anarazel.de> wrote:

I think so, yes. IIRC I discussed it with Noah and Peter G. at a
conference recently. We'd basically mark the content of shared buffers
inaccessible at backend startup, and mark it accessible whenever a
PinBuffer() happens, and then inaccessible during unpinning. We probably
have to exclude the page header though, as we intentionally access them
unpinned in some cases IIRC.

BTW, I recently noticed that the latest version of Valgrind, 3.12,
added this new feature:

* Memcheck:

- Added meta mempool support for describing a custom allocator which:
- Auto-frees all chunks assuming that destroying a pool destroys all
objects in the pool
- Uses itself to allocate other memory blocks

It occurred to me that we might be able to make good use of this. To
be clear, I don't think that there is reason to tie it to adding the
PinBuffer() stuff, which we've been talking about for years now. It
just caught my eye.

--
Peter Geoghegan


#22Noah Misch
noah@leadboat.com
In reply to: Peter Geoghegan (#21)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On Mon, Nov 14, 2016 at 10:21:53AM -0800, Peter Geoghegan wrote:

BTW, I recently noticed that the latest version of Valgrind, 3.12,
added this new feature:

* Memcheck:

- Added meta mempool support for describing a custom allocator which:
- Auto-frees all chunks assuming that destroying a pool destroys all
objects in the pool
- Uses itself to allocate other memory blocks

It occurred to me that we might be able to make good use of this.

PostgreSQL memory contexts don't acquire blocks from other contexts; they all
draw from malloc() directly. Therefore, I don't see this feature giving us
something that the older Memcheck MEMPOOL support does not give us.
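For reference, the older MEMPOOL support looks roughly like this for an
allocator that draws blocks straight from malloc(), which is the shape of
PostgreSQL's memory contexts (a sketch of the client requests, not the actual
instrumentation; the Example* names are hypothetical):

#include <stdlib.h>
#include <valgrind/memcheck.h>

typedef struct ExamplePool { int dummy; } ExamplePool;  /* pool handle */

static void
example_pool_create(ExamplePool *pool)
{
    /* rzB = 0 redzone bytes, is_zeroed = 0 */
    VALGRIND_CREATE_MEMPOOL(pool, 0, 0);
}

static void *
example_pool_alloc(ExamplePool *pool, size_t size)
{
    void       *chunk = malloc(size);

    /* tell Memcheck this chunk belongs to the pool */
    VALGRIND_MEMPOOL_ALLOC(pool, chunk, size);
    return chunk;
}

static void
example_pool_destroy(ExamplePool *pool)
{
    /*
     * Drops Memcheck's records of all chunks in the pool; the allocator
     * itself must still free() the underlying blocks.
     */
    VALGRIND_DESTROY_MEMPOOL(pool);
}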


#23Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Noah Misch (#22)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On Tue, Nov 15, 2016 at 1:17 PM, Noah Misch <noah@leadboat.com> wrote:

On Mon, Nov 14, 2016 at 10:21:53AM -0800, Peter Geoghegan wrote:

BTW, I recently noticed that the latest version of Valgrind, 3.12,
added this new feature:

* Memcheck:

- Added meta mempool support for describing a custom allocator which:
- Auto-frees all chunks assuming that destroying a pool destroys all
objects in the pool
- Uses itself to allocate other memory blocks

It occurred to me that we might be able to make good use of this.

PostgreSQL memory contexts don't acquire blocks from other contexts; they all
draw from malloc() directly. Therefore, I don't see this feature giving us
something that the older Memcheck MEMPOOL support does not give us.

Moved to next CF with "needs review" status.
Please feel free to update the status if the current status doesn't reflect
the status of the patch.

Regards,
Hari Babu
Fujitsu Australia

#24Michael Paquier
michael.paquier@gmail.com
In reply to: Haribabu Kommi (#23)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On Mon, Dec 5, 2016 at 11:32 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

On Tue, Nov 15, 2016 at 1:17 PM, Noah Misch <noah@leadboat.com> wrote:

On Mon, Nov 14, 2016 at 10:21:53AM -0800, Peter Geoghegan wrote:

BTW, I recently noticed that the latest version of Valgrind, 3.12,
added this new feature:

* Memcheck:

- Added meta mempool support for describing a custom allocator which:
- Auto-frees all chunks assuming that destroying a pool destroys all
objects in the pool
- Uses itself to allocate other memory blocks

It occurred to me that we might be able to make good use of this.

PostgreSQL memory contexts don't acquire blocks from other contexts; they all
draw from malloc() directly. Therefore, I don't see this feature giving us
something that the older Memcheck MEMPOOL support does not give us.

Moved to next CF with "needs review" status.
Please feel free to update the status if the current status doesn't reflect
the status of the patch.

The latest patch available still applies; one person has added his
name (John G) in October, though there have been no reviews. There have
been a couple of arguments against this patch, and the thread has had
no activity for the last month, so I would be inclined to mark it as
returned with feedback. Thoughts?
--
Michael


#25Michael Paquier
michael.paquier@gmail.com
In reply to: Michael Paquier (#24)
Re: Pinning a buffer in TupleTableSlot is unnecessary

On Wed, Jan 25, 2017 at 2:53 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

The latest patch available still applies; one person has added his
name (John G) in October, though there have been no reviews. There have
been a couple of arguments against this patch, and the thread has had
no activity for the last month, so I would be inclined to mark it as
returned with feedback. Thoughts?

And done as such. Feel free to update if there is something fresh.
--
Michael
