rewrite HeapSatisfiesHOTAndKey

Started by Alvaro Herrera, about 9 years ago. 15 messages.
#1Alvaro Herrera
alvherre@2ndquadrant.com
1 attachment(s)

Pursuant to my comments at
/messages/by-id/20161223192245.hx4rbrxbrwtgwj6i@alvherre.pgsql
and because of a stupid bug I found in my indirect indexes patch, I
rewrote (read: removed) HeapSatisfiesHOTAndKey. The replacement
function HeapDetermineModifiedColumns returns a bitmapset with a bit set
for each modified column, for those columns that are listed as
"interesting" -- currently that set is the ID columns, the "key"
columns, and the indexed columns. The new code is much simpler, at the
expense of a few bytes of additional memory used during heap_update().
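
For reviewers, the caller side in heap_update() boils down to roughly the
following (condensed from the attached patch; error handling, the lock-mode
else branch and the same-page check for HOT are omitted):

    interesting_attrs = bms_add_members(NULL, hot_attrs);
    interesting_attrs = bms_add_members(interesting_attrs, key_attrs);
    interesting_attrs = bms_add_members(interesting_attrs, id_attrs);

    modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
                                                  &oldtup, newtup);

    /* a weaker tuple lock suffices if no "key" column was changed */
    if (!bms_overlap(modified_attrs, key_attrs))
        *lockmode = LockTupleNoKeyExclusive;

    /* a HOT update is possible if no indexed column was changed */
    if (!bms_overlap(modified_attrs, hot_attrs))
        use_hot_update = true;

    /* log the old replica identity only if an "id" column was changed */
    old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
                                           bms_overlap(modified_attrs, id_attrs),
                                           &old_key_copied);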

All tests pass.

Both WARM and indirect indexes should be able to use this infrastructure
in a way that better serves them than the current HeapSatisfiesHOTAndKey
(both patches modify that function in a rather ugly way). I intend to
get this pushed shortly unless objections are raised.

The new coding no longer stops the check early as soon as the three result
booleans could be determined. However, both patches were adding reasons to
avoid stopping early anyway, so I don't think this is a significant loss.

Pavan, please rebase your WARM patch on top of this and let me know how
you like it. I'll post a new version of indirect indexes later this
week.

Comments welcome.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

interesting-attrs.patch (text/plain; charset=us-ascii)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ea579a0..1dbefda 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -95,11 +95,8 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup,
 				HeapTuple newtup, HeapTuple old_key_tup,
 				bool all_visible_cleared, bool new_all_visible_cleared);
-static void HeapSatisfiesHOTandKeyUpdate(Relation relation,
-							 Bitmapset *hot_attrs,
-							 Bitmapset *key_attrs, Bitmapset *id_attrs,
-							 bool *satisfies_hot, bool *satisfies_key,
-							 bool *satisfies_id,
+static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
+							 Bitmapset *interesting_cols,
 							 HeapTuple oldtup, HeapTuple newtup);
 static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
 					 LockTupleMode mode, LockWaitPolicy wait_policy,
@@ -3443,6 +3440,8 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	Bitmapset  *hot_attrs;
 	Bitmapset  *key_attrs;
 	Bitmapset  *id_attrs;
+	Bitmapset  *interesting_attrs;
+	Bitmapset  *modified_attrs;
 	ItemId		lp;
 	HeapTupleData oldtup;
 	HeapTuple	heaptup;
@@ -3460,9 +3459,6 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				pagefree;
 	bool		have_tuple_lock = false;
 	bool		iscombo;
-	bool		satisfies_hot;
-	bool		satisfies_key;
-	bool		satisfies_id;
 	bool		use_hot_update = false;
 	bool		key_intact;
 	bool		all_visible_cleared = false;
@@ -3504,6 +3500,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
 	id_attrs = RelationGetIndexAttrBitmap(relation,
 										  INDEX_ATTR_BITMAP_IDENTITY_KEY);
+	interesting_attrs = bms_add_members(NULL, hot_attrs);
+	interesting_attrs = bms_add_members(interesting_attrs, key_attrs);
+	interesting_attrs = bms_add_members(interesting_attrs, id_attrs);
+
 
 	block = ItemPointerGetBlockNumber(otid);
 	buffer = ReadBuffer(relation, block);
@@ -3524,7 +3524,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	Assert(ItemIdIsNormal(lp));
 
 	/*
-	 * Fill in enough data in oldtup for HeapSatisfiesHOTandKeyUpdate to work
+	 * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
 	 * properly.
 	 */
 	oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3551,6 +3551,13 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	}
 
 	/*
+	 * Determine which columns are being modified by the update.
+	 */
+
+	modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
+												  &oldtup, newtup);
+
+	/*
 	 * If we're not updating any "key" column, we can grab a weaker lock type.
 	 * This allows for more concurrency when we are running simultaneously
 	 * with foreign key checks.
@@ -3561,10 +3568,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	 * is updates that don't manipulate key columns, not those that
 	 * serendipitiously arrive at the same key values.
 	 */
-	HeapSatisfiesHOTandKeyUpdate(relation, hot_attrs, key_attrs, id_attrs,
-								 &satisfies_hot, &satisfies_key,
-								 &satisfies_id, &oldtup, newtup);
-	if (satisfies_key)
+	if (!bms_overlap(modified_attrs, key_attrs))
 	{
 		*lockmode = LockTupleNoKeyExclusive;
 		mxact_status = MultiXactStatusNoKeyUpdate;
@@ -3803,6 +3807,8 @@ l2:
 		bms_free(hot_attrs);
 		bms_free(key_attrs);
 		bms_free(id_attrs);
+		bms_free(modified_attrs);
+		bms_free(interesting_attrs);
 		return result;
 	}
 
@@ -4107,7 +4113,7 @@ l2:
 		 * to do a HOT update.  Check if any of the index columns have been
 		 * changed.  If not, then HOT update is possible.
 		 */
-		if (satisfies_hot)
+		if (!bms_overlap(modified_attrs, hot_attrs))
 			use_hot_update = true;
 	}
 	else
@@ -4122,7 +4128,9 @@ l2:
 	 * ExtractReplicaIdentity() will return NULL if nothing needs to be
 	 * logged.
 	 */
-	old_key_tuple = ExtractReplicaIdentity(relation, &oldtup, !satisfies_id, &old_key_copied);
+	old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
+										   bms_overlap(modified_attrs, id_attrs),
+										   &old_key_copied);
 
 	/* NO EREPORT(ERROR) from here till changes are logged */
 	START_CRIT_SECTION();
@@ -4270,6 +4278,8 @@ l2:
 	bms_free(hot_attrs);
 	bms_free(key_attrs);
 	bms_free(id_attrs);
+	bms_free(modified_attrs);
+	bms_free(interesting_attrs);
 
 	return HeapTupleMayBeUpdated;
 }
@@ -4369,100 +4379,24 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
  * modify columns used in the key; id_result is set to TRUE if the update does
  * not modify columns in any index marked as the REPLICA IDENTITY.
  */
-static void
-HeapSatisfiesHOTandKeyUpdate(Relation relation, Bitmapset *hot_attrs,
-							 Bitmapset *key_attrs, Bitmapset *id_attrs,
-							 bool *satisfies_hot, bool *satisfies_key,
-							 bool *satisfies_id,
+static Bitmapset *
+HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
 							 HeapTuple oldtup, HeapTuple newtup)
 {
-	int			next_hot_attnum;
-	int			next_key_attnum;
-	int			next_id_attnum;
-	bool		hot_result = true;
-	bool		key_result = true;
-	bool		id_result = true;
+	int		attnum;
+	Bitmapset *modified = NULL;
 
-	/* If REPLICA IDENTITY is set to FULL, id_attrs will be empty. */
-	Assert(bms_is_subset(id_attrs, key_attrs));
-	Assert(bms_is_subset(key_attrs, hot_attrs));
-
-	/*
-	 * If one of these sets contains no remaining bits, bms_first_member will
-	 * return -1, and after adding FirstLowInvalidHeapAttributeNumber (which
-	 * is negative!)  we'll get an attribute number that can't possibly be
-	 * real, and thus won't match any actual attribute number.
-	 */
-	next_hot_attnum = bms_first_member(hot_attrs);
-	next_hot_attnum += FirstLowInvalidHeapAttributeNumber;
-	next_key_attnum = bms_first_member(key_attrs);
-	next_key_attnum += FirstLowInvalidHeapAttributeNumber;
-	next_id_attnum = bms_first_member(id_attrs);
-	next_id_attnum += FirstLowInvalidHeapAttributeNumber;
-
-	for (;;)
+	while ((attnum = bms_first_member(interesting_cols)) >= 0)
 	{
-		bool		changed;
-		int			check_now;
+		attnum += FirstLowInvalidHeapAttributeNumber;
 
-		/*
-		 * Since the HOT attributes are a superset of the key attributes and
-		 * the key attributes are a superset of the id attributes, this logic
-		 * is guaranteed to identify the next column that needs to be checked.
-		 */
-		if (hot_result && next_hot_attnum > FirstLowInvalidHeapAttributeNumber)
-			check_now = next_hot_attnum;
-		else if (key_result && next_key_attnum > FirstLowInvalidHeapAttributeNumber)
-			check_now = next_key_attnum;
-		else if (id_result && next_id_attnum > FirstLowInvalidHeapAttributeNumber)
-			check_now = next_id_attnum;
-		else
-			break;
-
-		/* See whether it changed. */
-		changed = !heap_tuple_attr_equals(RelationGetDescr(relation),
-										  check_now, oldtup, newtup);
-		if (changed)
-		{
-			if (check_now == next_hot_attnum)
-				hot_result = false;
-			if (check_now == next_key_attnum)
-				key_result = false;
-			if (check_now == next_id_attnum)
-				id_result = false;
-
-			/* if all are false now, we can stop checking */
-			if (!hot_result && !key_result && !id_result)
-				break;
-		}
-
-		/*
-		 * Advance the next attribute numbers for the sets that contain the
-		 * attribute we just checked.  As we work our way through the columns,
-		 * the next_attnum values will rise; but when each set becomes empty,
-		 * bms_first_member() will return -1 and the attribute number will end
-		 * up with a value less than FirstLowInvalidHeapAttributeNumber.
-		 */
-		if (hot_result && check_now == next_hot_attnum)
-		{
-			next_hot_attnum = bms_first_member(hot_attrs);
-			next_hot_attnum += FirstLowInvalidHeapAttributeNumber;
-		}
-		if (key_result && check_now == next_key_attnum)
-		{
-			next_key_attnum = bms_first_member(key_attrs);
-			next_key_attnum += FirstLowInvalidHeapAttributeNumber;
-		}
-		if (id_result && check_now == next_id_attnum)
-		{
-			next_id_attnum = bms_first_member(id_attrs);
-			next_id_attnum += FirstLowInvalidHeapAttributeNumber;
-		}
+		if (!heap_tuple_attr_equals(RelationGetDescr(relation),
+								   attnum, oldtup, newtup))
+			modified = bms_add_member(modified,
+									  attnum - FirstLowInvalidHeapAttributeNumber);
 	}
 
-	*satisfies_hot = hot_result;
-	*satisfies_key = key_result;
-	*satisfies_id = id_result;
+	return modified;
 }
 
 /*
#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#1)
1 attachment(s)
Re: rewrite HeapSatisfiesHOTAndKey

Here's a version with fixed comments.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

interesting-attrs-2.patch (text/plain; charset=us-ascii)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ea579a0..19edbdf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -95,11 +95,8 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 				Buffer newbuf, HeapTuple oldtup,
 				HeapTuple newtup, HeapTuple old_key_tup,
 				bool all_visible_cleared, bool new_all_visible_cleared);
-static void HeapSatisfiesHOTandKeyUpdate(Relation relation,
-							 Bitmapset *hot_attrs,
-							 Bitmapset *key_attrs, Bitmapset *id_attrs,
-							 bool *satisfies_hot, bool *satisfies_key,
-							 bool *satisfies_id,
+static Bitmapset *HeapDetermineModifiedColumns(Relation relation,
+							 Bitmapset *interesting_cols,
 							 HeapTuple oldtup, HeapTuple newtup);
 static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
 					 LockTupleMode mode, LockWaitPolicy wait_policy,
@@ -3443,6 +3440,8 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	Bitmapset  *hot_attrs;
 	Bitmapset  *key_attrs;
 	Bitmapset  *id_attrs;
+	Bitmapset  *interesting_attrs;
+	Bitmapset  *modified_attrs;
 	ItemId		lp;
 	HeapTupleData oldtup;
 	HeapTuple	heaptup;
@@ -3460,9 +3459,6 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				pagefree;
 	bool		have_tuple_lock = false;
 	bool		iscombo;
-	bool		satisfies_hot;
-	bool		satisfies_key;
-	bool		satisfies_id;
 	bool		use_hot_update = false;
 	bool		key_intact;
 	bool		all_visible_cleared = false;
@@ -3489,21 +3485,30 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				 errmsg("cannot update tuples during a parallel operation")));
 
 	/*
-	 * Fetch the list of attributes to be checked for HOT update.  This is
-	 * wasted effort if we fail to update or have to put the new tuple on a
-	 * different page.  But we must compute the list before obtaining buffer
-	 * lock --- in the worst case, if we are doing an update on one of the
-	 * relevant system catalogs, we could deadlock if we try to fetch the list
-	 * later.  In any case, the relcache caches the data so this is usually
-	 * pretty cheap.
+	 * Fetch the list of attributes to be checked for various operations.
 	 *
-	 * Note that we get a copy here, so we need not worry about relcache flush
-	 * happening midway through.
+	 * For HOT considerations, this is wasted effort if we fail to update or
+	 * have to put the new tuple on a different page.  But we must compute the
+	 * list before obtaining buffer lock --- in the worst case, if we are doing
+	 * an update on one of the relevant system catalogs, we could deadlock if
+	 * we try to fetch the list later.  In any case, the relcache caches the
+	 * data so this is usually pretty cheap.
+	 *
+	 * We also need columns used by the replica identity, the columns that
+	 * are considered the "key" of rows in the table, and columns that are
+	 * part of indirect indexes.
+	 *
+	 * Note that we get copies of each bitmap, so we need not worry about
+	 * relcache flush happening midway through.
 	 */
 	hot_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_ALL);
 	key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
 	id_attrs = RelationGetIndexAttrBitmap(relation,
 										  INDEX_ATTR_BITMAP_IDENTITY_KEY);
+	interesting_attrs = bms_add_members(NULL, hot_attrs);
+	interesting_attrs = bms_add_members(interesting_attrs, key_attrs);
+	interesting_attrs = bms_add_members(interesting_attrs, id_attrs);
+
 
 	block = ItemPointerGetBlockNumber(otid);
 	buffer = ReadBuffer(relation, block);
@@ -3524,7 +3529,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	Assert(ItemIdIsNormal(lp));
 
 	/*
-	 * Fill in enough data in oldtup for HeapSatisfiesHOTandKeyUpdate to work
+	 * Fill in enough data in oldtup for HeapDetermineModifiedColumns to work
 	 * properly.
 	 */
 	oldtup.t_tableOid = RelationGetRelid(relation);
@@ -3550,6 +3555,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 		Assert(!(newtup->t_data->t_infomask & HEAP_HASOID));
 	}
 
+	/* Determine columns modified by the update. */
+	modified_attrs = HeapDetermineModifiedColumns(relation, interesting_attrs,
+												  &oldtup, newtup);
+
 	/*
 	 * If we're not updating any "key" column, we can grab a weaker lock type.
 	 * This allows for more concurrency when we are running simultaneously
@@ -3561,10 +3570,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	 * is updates that don't manipulate key columns, not those that
 	 * serendipitiously arrive at the same key values.
 	 */
-	HeapSatisfiesHOTandKeyUpdate(relation, hot_attrs, key_attrs, id_attrs,
-								 &satisfies_hot, &satisfies_key,
-								 &satisfies_id, &oldtup, newtup);
-	if (satisfies_key)
+	if (!bms_overlap(modified_attrs, key_attrs))
 	{
 		*lockmode = LockTupleNoKeyExclusive;
 		mxact_status = MultiXactStatusNoKeyUpdate;
@@ -3803,6 +3809,8 @@ l2:
 		bms_free(hot_attrs);
 		bms_free(key_attrs);
 		bms_free(id_attrs);
+		bms_free(modified_attrs);
+		bms_free(interesting_attrs);
 		return result;
 	}
 
@@ -4107,7 +4115,7 @@ l2:
 		 * to do a HOT update.  Check if any of the index columns have been
 		 * changed.  If not, then HOT update is possible.
 		 */
-		if (satisfies_hot)
+		if (!bms_overlap(modified_attrs, hot_attrs))
 			use_hot_update = true;
 	}
 	else
@@ -4122,7 +4130,9 @@ l2:
 	 * ExtractReplicaIdentity() will return NULL if nothing needs to be
 	 * logged.
 	 */
-	old_key_tuple = ExtractReplicaIdentity(relation, &oldtup, !satisfies_id, &old_key_copied);
+	old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
+										   bms_overlap(modified_attrs, id_attrs),
+										   &old_key_copied);
 
 	/* NO EREPORT(ERROR) from here till changes are logged */
 	START_CRIT_SECTION();
@@ -4270,13 +4280,15 @@ l2:
 	bms_free(hot_attrs);
 	bms_free(key_attrs);
 	bms_free(id_attrs);
+	bms_free(modified_attrs);
+	bms_free(interesting_attrs);
 
 	return HeapTupleMayBeUpdated;
 }
 
 /*
  * Check if the specified attribute's value is same in both given tuples.
- * Subroutine for HeapSatisfiesHOTandKeyUpdate.
+ * Subroutine for HeapDetermineModifiedColumns.
  */
 static bool
 heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
@@ -4310,7 +4322,7 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
 
 	/*
 	 * Extract the corresponding values.  XXX this is pretty inefficient if
-	 * there are many indexed columns.  Should HeapSatisfiesHOTandKeyUpdate do
+	 * there are many indexed columns.  Should HeapDetermineModifiedColumns do
 	 * a single heap_deform_tuple call on each tuple, instead?	But that
 	 * doesn't work for system columns ...
 	 */
@@ -4355,114 +4367,30 @@ heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum,
 /*
  * Check which columns are being updated.
  *
- * This simultaneously checks conditions for HOT updates, for FOR KEY
- * SHARE updates, and REPLICA IDENTITY concerns.  Since much of the time they
- * will be checking very similar sets of columns, and doing the same tests on
- * them, it makes sense to optimize and do them together.
+ * Given an updated tuple, determine (and return into the output bitmapset),
+ * from those listed as interesting, the set of columns that changed.
  *
- * We receive three bitmapsets comprising the three sets of columns we're
- * interested in.  Note these are destructively modified; that is OK since
- * this is invoked at most once in heap_update.
- *
- * hot_result is set to TRUE if it's okay to do a HOT update (i.e. it does not
- * modified indexed columns); key_result is set to TRUE if the update does not
- * modify columns used in the key; id_result is set to TRUE if the update does
- * not modify columns in any index marked as the REPLICA IDENTITY.
+ * The input bitmapset is destructively modified; that is OK since this is
+ * invoked at most once in heap_update.
  */
-static void
-HeapSatisfiesHOTandKeyUpdate(Relation relation, Bitmapset *hot_attrs,
-							 Bitmapset *key_attrs, Bitmapset *id_attrs,
-							 bool *satisfies_hot, bool *satisfies_key,
-							 bool *satisfies_id,
+static Bitmapset *
+HeapDetermineModifiedColumns(Relation relation, Bitmapset *interesting_cols,
 							 HeapTuple oldtup, HeapTuple newtup)
 {
-	int			next_hot_attnum;
-	int			next_key_attnum;
-	int			next_id_attnum;
-	bool		hot_result = true;
-	bool		key_result = true;
-	bool		id_result = true;
+	int		attnum;
+	Bitmapset *modified = NULL;
 
-	/* If REPLICA IDENTITY is set to FULL, id_attrs will be empty. */
-	Assert(bms_is_subset(id_attrs, key_attrs));
-	Assert(bms_is_subset(key_attrs, hot_attrs));
-
-	/*
-	 * If one of these sets contains no remaining bits, bms_first_member will
-	 * return -1, and after adding FirstLowInvalidHeapAttributeNumber (which
-	 * is negative!)  we'll get an attribute number that can't possibly be
-	 * real, and thus won't match any actual attribute number.
-	 */
-	next_hot_attnum = bms_first_member(hot_attrs);
-	next_hot_attnum += FirstLowInvalidHeapAttributeNumber;
-	next_key_attnum = bms_first_member(key_attrs);
-	next_key_attnum += FirstLowInvalidHeapAttributeNumber;
-	next_id_attnum = bms_first_member(id_attrs);
-	next_id_attnum += FirstLowInvalidHeapAttributeNumber;
-
-	for (;;)
+	while ((attnum = bms_first_member(interesting_cols)) >= 0)
 	{
-		bool		changed;
-		int			check_now;
+		attnum += FirstLowInvalidHeapAttributeNumber;
 
-		/*
-		 * Since the HOT attributes are a superset of the key attributes and
-		 * the key attributes are a superset of the id attributes, this logic
-		 * is guaranteed to identify the next column that needs to be checked.
-		 */
-		if (hot_result && next_hot_attnum > FirstLowInvalidHeapAttributeNumber)
-			check_now = next_hot_attnum;
-		else if (key_result && next_key_attnum > FirstLowInvalidHeapAttributeNumber)
-			check_now = next_key_attnum;
-		else if (id_result && next_id_attnum > FirstLowInvalidHeapAttributeNumber)
-			check_now = next_id_attnum;
-		else
-			break;
-
-		/* See whether it changed. */
-		changed = !heap_tuple_attr_equals(RelationGetDescr(relation),
-										  check_now, oldtup, newtup);
-		if (changed)
-		{
-			if (check_now == next_hot_attnum)
-				hot_result = false;
-			if (check_now == next_key_attnum)
-				key_result = false;
-			if (check_now == next_id_attnum)
-				id_result = false;
-
-			/* if all are false now, we can stop checking */
-			if (!hot_result && !key_result && !id_result)
-				break;
-		}
-
-		/*
-		 * Advance the next attribute numbers for the sets that contain the
-		 * attribute we just checked.  As we work our way through the columns,
-		 * the next_attnum values will rise; but when each set becomes empty,
-		 * bms_first_member() will return -1 and the attribute number will end
-		 * up with a value less than FirstLowInvalidHeapAttributeNumber.
-		 */
-		if (hot_result && check_now == next_hot_attnum)
-		{
-			next_hot_attnum = bms_first_member(hot_attrs);
-			next_hot_attnum += FirstLowInvalidHeapAttributeNumber;
-		}
-		if (key_result && check_now == next_key_attnum)
-		{
-			next_key_attnum = bms_first_member(key_attrs);
-			next_key_attnum += FirstLowInvalidHeapAttributeNumber;
-		}
-		if (id_result && check_now == next_id_attnum)
-		{
-			next_id_attnum = bms_first_member(id_attrs);
-			next_id_attnum += FirstLowInvalidHeapAttributeNumber;
-		}
+		if (!heap_tuple_attr_equals(RelationGetDescr(relation),
+								   attnum, oldtup, newtup))
+			modified = bms_add_member(modified,
+									  attnum - FirstLowInvalidHeapAttributeNumber);
 	}
 
-	*satisfies_hot = hot_result;
-	*satisfies_key = key_result;
-	*satisfies_id = id_result;
+	return modified;
 }
 
 /*
#3Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#1)
Re: rewrite HeapSatisfiesHOTAndKey

On Thu, Dec 29, 2016 at 4:50 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Pursuant to my comments at
/messages/by-id/20161223192245.hx4rbrxbrwtgwj6i@alvherre.pgsql
and because of a stupid bug I found in my indirect indexes patch, I
rewrote (read: removed) HeapSatisfiesHOTAndKey. The replacement
function HeapDetermineModifiedColumns returns a bitmapset with a bit set
for each modified column, for those columns that are listed as
"interesting" -- currently that set is the ID columns, the "key"
columns, and the indexed columns. The new code is much simpler, at the
expense of a few bytes of additional memory used during heap_update().

All tests pass.

Both WARM and indirect indexes should be able to use this infrastructure
in a way that better serves them than the current HeapSatisfiesHOTAndKey
(both patches modify that function in a rather ugly way). I intend to
get this pushed shortly unless objections are raised.

The new coding prevents stopping the check early as soon as the three
result booleans could be determined.

I think there is some chance that such a change could induce
regression for the cases when there are many index columns or I think
even when index is on multiple columns (consider index is on first and
eight column in a ten column table). The reason for such a suspicion
is that heap_getattr() is not so cheap that doing it multiple times is
free. Do you think it is worth doing a few tests before committing the
patch?

Noticed below comment in interesting-attrs-2.patch
+ * are considered the "key" of rows in the table, and columns that are
+ * part of indirect indexes.

Is it right to mention indirect indexes in the above comment,
considering indirect indexes are still not part of the core code?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Amit Kapila (#3)
Re: rewrite HeapSatisfiesHOTAndKey

On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think there is some chance that such a change could induce
regression for the cases when there are many index columns or I think
even when index is on multiple columns (consider index is on first and
eight column in a ten column table).

I don't see that as a problem because the routine only checks for columns
that are passed as "interesting_cols".

Noticed below comment in interesting-attrs-2.patch

+ * are considered the "key" of rows in the table, and columns that are
+ * part of indirect indexes.

Is it right to mention indirect indexes in the above comment,
considering indirect indexes are still not part of the core code?

I agree. We can add details about indirect indexes or WARM later, as and
when those patches get committed.

Pavan, please rebase your WARM patch on top of this and let me know how
you like it. I'll post a new version of indirect indexes later this
week.

I've rebased WARM on top of this patch and the proposed changes look fine
from WARM's perspective too. I'll send rebased patches separately.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Pavan Deolasee (#4)
Re: rewrite HeapSatisfiesHOTAndKey

On Mon, Jan 2, 2017 at 9:28 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:

On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think there is some chance that such a change could induce
regression for the cases when there are many index columns or I think
even when index is on multiple columns (consider index is on first and
eight column in a ten column table).

I don't see that as a problem because the routine only checks for columns
that are passed as "interesting_cols".

Right, but now it will evaluate for all interesting_cols whereas
previously it would just stop at first if that column is changed.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Amit Kapila (#5)
Re: rewrite HeapSatisfiesHOTAndKey

On Mon, Jan 2, 2017 at 10:17 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Mon, Jan 2, 2017 at 9:28 AM, Pavan Deolasee <pavan.deolasee@gmail.com>
wrote:

On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila <amit.kapila16@gmail.com>

wrote:

I think there is some chance that such a change could induce
regression for the cases when there are many index columns or I think
even when index is on multiple columns (consider index is on first and
eight column in a ten column table).

I don't see that as a problem because the routine only checks for columns
that are passed as "interesting_cols".

Right, but now it will evaluate for all interesting_cols whereas
previously it would just stop at first if that column is changed.

Ah ok. I read your comment "consider index is on first and eight column in a
ten column table" as s/eight/eighth. But maybe you're referring to the case
where there is an index on eight or nine columns of a ten column table.

You're right. That's an additional cost, as Alvaro himself explained in the
original post. But both indirect indexes and WARM need to know information
about all modified columns. So assuming either of these patches is going to
make it, we've to bear that cost. Having said that, given that attributes are
usually cached, the cost may not be significant, since for non-HOT updates the
attributes will be fetched later anyway while preparing index tuples. So we're
probably doing that work in advance.

Obviously, I'm not against doing additional performance tests to ensure that
the cost is not significant, especially if neither indirect indexes nor WARM
gets committed in 10.0

Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Pavan Deolasee (#6)
Re: rewrite HeapSatisfiesHOTAndKey

On Mon, Jan 2, 2017 at 10:59 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:

On Mon, Jan 2, 2017 at 10:17 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Mon, Jan 2, 2017 at 9:28 AM, Pavan Deolasee <pavan.deolasee@gmail.com>
wrote:

On Mon, Jan 2, 2017 at 8:52 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

I think there is some chance that such a change could induce
regression for the cases when there are many index columns or I think
even when index is on multiple columns (consider index is on first and
eight column in a ten column table).

I don't see that as a problem because the routine only checks for
columns
that are passed as "interesting_cols".

Right, but now it will evaluate for all interesting_cols whereas
previously it would just stop at first if that column is changed.

Ah ok. I read your comment "consider index is on first and
eight column in a ten column table" as s/eight/eighth. But may be you're
referring to
the case where there is an index on eight or nine columns of a ten column
table.

I am talking about both kinds of cases. The scenario where we can see
some performance impact is when there is a variable-width column before
the index column (in the above context, before the eighth column), as the
cached-offset optimization won't work for such a column.

You're right. That's an additional cost, as Alvaro himself explained in the
original post. But both indirect indexes and WARM need to know information
about all modified columns. So assuming either of these patches is going to
make it, we've to bear that cost.

Okay, but I think if we know how much is the additional cost in
average and worst case, then we can take a better call. Also, if we
agree, then doing an update-intensive test on an unlogged table or with
asynchronous commit mode can show us the overhead, if there is any.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#7)
Re: rewrite HeapSatisfiesHOTAndKey

On Mon, Jan 2, 2017 at 1:36 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, but I think if we know how much is the additional cost in
average and worst case, then we can take a better call.

Yeah. We shouldn't just rip out optimizations that are inconvenient
without doing some test of what the impact is on the cases where those
optimizations are likely to matter. I don't think it needs to be
anything incredibly laborious and if there's no discernable impact,
great. But it doesn't make sense to change things for the benefit of
WARM and indirect indexes and then discover later that we regressed
other cases we care about and have no plan to fix it. Better to find
out any potential problems of that kind now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Robert Haas (#8)
Re: rewrite HeapSatisfiesHOTAndKey

On Tue, Jan 3, 2017 at 9:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jan 2, 2017 at 1:36 AM, Amit Kapila <amit.kapila16@gmail.com>

wrote:

Okay, but I think if we know how much is the additional cost in
average and worst case, then we can take a better call.

Yeah. We shouldn't just rip out optimizations that are inconvenient
without doing some test of what the impact is on the cases where those
optimizations are likely to matter. I don't think it needs to be
anything incredibly laborious and if there's no discernable impact,
great.

So I performed some tests to measure if this causes any noticeable
regression. I used the following simple schema:

DROP TABLE IF EXISTS testtab;
CREATE UNLOGGED TABLE testtab (
col1 integer,
col2 text,
col3 float,
col4 text,
col5 text,
col6 char(30),
col7 text,
col8 date,
col9 text,
col10 text
);
INSERT INTO testtab
SELECT generate_series(1,100000),
md5(random()::text),
random(),
md5(random()::text),
md5(random()::text),
md5(random()::text)::char(30),
md5(random()::text),
now(),
md5(random()::text),
md5(random()::text);
CREATE INDEX testindx ON testtab (col1, col2, col3, col4, col5, col6, col7,
col8, col9);

I used a rather wide UNLOGGED table with an index on the first 9 columns, as
suggested by Amit. Also, the table has a reasonable number of rows, but not
more than what shared buffers (set to 512MB for these tests) can hold. This
should make the test mostly CPU bound.

A transaction then updates the second column in the table. So the
refactored patch will do heap_getattr() on more columns than the master
while checking if a HOT update is possible and before giving up. I believe we
are probably testing a somewhat worst case with this setup, though maybe I
could have tuned some other configuration parameters.

\set value random(1, 100000)
UPDATE testtab SET col2 = md5(random()::text) WHERE col1 = :value;

I tested with -c1 and -c8 -j4 and the results are:

1-client
Master Refactored
Run1 8774.089935 8979.068604
Run2 8509.2661 8943.613575
Run3 8879.484019 8950.994425

8-clients
Master Refactored
Run1 22520.422448 22672.798871
Run2 21967.812303 22022.969747
Run3 22305.073223 21909.945623

So at best there is some improvement with the patch, though I don't see any
reason why it should positively affect the performance. The results with
more clients look almost identical, probably because the bottleneck shifts
somewhere else. For all these tests, the table was dropped and recreated in
every iteration, so I don't think there was any error in testing. It might
be a good idea for someone else to repeat the tests to confirm the
improvement that I noticed.

Apart from this, I also ran "make check" multiple times and couldn't find
any significant difference in the average time.

I will leave it to Alvaro's judgement to decide whether it's worth committing
the patch now, or later when he or another committer looks at committing
WARM/indirect indexes, because without either of those patches this change
probably does not bring much value, if we ignore the slight improvement we
see here.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Pavan Deolasee (#9)
Re: rewrite HeapSatisfiesHOTAndKey

Pavan Deolasee wrote:

A transaction then updates the second column in the table. So the
refactored patch will do heap_getattr() on more columns than the master
while checking if a HOT update is possible and before giving up.

Thanks.

1-client
Master Refactored
Run1 8774.089935 8979.068604
Run2 8509.2661 8943.613575
Run3 8879.484019 8950.994425

8-clients
Master Refactored
Run1 22520.422448 22672.798871
Run2 21967.812303 22022.969747
Run3 22305.073223 21909.945623

Wow, this is very surprising. I was expecting a slight performance
decrease, not this. I will try to reproduce these numbers.

One thing worth mentioning is that the current implementation is not
very good -- I just kept the attribute check loop as it was in the
original, which uses heap_getattr. If we used incremental extraction
such as what we do in slot_getattr, it could be made a bit faster too.
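
To make that concrete, here is a rough, untested sketch of the idea (the
function name is hypothetical; it only uses existing backend facilities such
as heap_deform_tuple(), datumIsEqual() and the bitmapset routines, and it
punts on system columns, which would still have to go through the existing
heap_getattr()-based check):

static Bitmapset *
HeapDetermineModifiedColumnsDeform(Relation relation,
                                   Bitmapset *interesting_cols,
                                   HeapTuple oldtup, HeapTuple newtup)
{
    TupleDesc   tupdesc = RelationGetDescr(relation);
    int         natts = tupdesc->natts;
    Datum      *oldvals = palloc(natts * sizeof(Datum));
    Datum      *newvals = palloc(natts * sizeof(Datum));
    bool       *oldnulls = palloc(natts * sizeof(bool));
    bool       *newnulls = palloc(natts * sizeof(bool));
    Bitmapset  *modified = NULL;
    int         attnum;

    /* deform each tuple once, instead of one heap_getattr() per column */
    heap_deform_tuple(oldtup, tupdesc, oldvals, oldnulls);
    heap_deform_tuple(newtup, tupdesc, newvals, newnulls);

    while ((attnum = bms_first_member(interesting_cols)) >= 0)
    {
        Form_pg_attribute att;

        attnum += FirstLowInvalidHeapAttributeNumber;

        /* system columns (such as oid) would still need the old code path */
        if (attnum <= 0)
            continue;

        att = tupdesc->attrs[attnum - 1];
        if (oldnulls[attnum - 1] != newnulls[attnum - 1] ||
            (!oldnulls[attnum - 1] &&
             !datumIsEqual(oldvals[attnum - 1], newvals[attnum - 1],
                           att->attbyval, att->attlen)))
            modified = bms_add_member(modified,
                                attnum - FirstLowInvalidHeapAttributeNumber);
    }

    pfree(oldvals);
    pfree(newvals);
    pfree(oldnulls);
    pfree(newnulls);

    return modified;
}

As in the current code, this is a simple binary comparison, so it may still be
overly strict for values with multiple binary representations; the sketch only
changes how the attributes are extracted, not how they are compared.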

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Amit Kapila
amit.kapila16@gmail.com
In reply to: Pavan Deolasee (#9)
Re: rewrite HeapSatisfiesHOTAndKey

On Wed, Jan 4, 2017 at 11:45 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:

On Tue, Jan 3, 2017 at 9:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jan 2, 2017 at 1:36 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

Okay, but I think if we know how much is the additional cost in
average and worst case, then we can take a better call.

Yeah. We shouldn't just rip out optimizations that are inconvenient
without doing some test of what the impact is on the cases where those
optimizations are likely to matter. I don't think it needs to be
anything incredibly laborious and if there's no discernable impact,
great.

So I performed some tests to measure if this causes any noticeable
regression.

Your test and results look good, what kind of m/c you have used to
test this. Let me see if I or one of my colleague can do this and
similar test on some high-end m/c.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Amit Kapila (#11)
Re: rewrite HeapSatisfiesHOTAndKey

On Thu, Jan 5, 2017 at 6:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Your test and results look good, what kind of m/c you have used to
test this.

I ran it on my MacBook Pro, so nothing fancy. The code was compiled with a
simple ./configure and with no special flags. The only non-default setting
was shared_buffers = 512MB to ensure the table/index fits in memory.

Let me see if I or one of my colleague can do this and
similar test on some high-end m/c.

Sure. That'll be helpful given the slightly unexpected result. Maybe it's
just noise, or we may see a different result on a high-end m/c.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#13Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Amit Kapila (#11)
Re: rewrite HeapSatisfiesHOTAndKey

On Thu, Jan 5, 2017 at 6:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Your test and results look good, what kind of m/c you have used to
test this. Let me see if I or one of my colleague can do this and
similar test on some high-end m/c.

As discussed with Amit, I have tried to run the same tests with the
modifications below:
1. Increased the total rows to 10 million.
2. Set fsync off;
3. Changed tests as below. Updated all rows at a time.

VACUUM FULL;
BEGIN;
UPDATE testtab SET col2 = md5(random()::text);
ROLLBACK;

I have run these tests on IBM power2, which has sufficient RAM. I have
set shared_buffers = 32GB.

My results show that after this patch there is a slight increase in
response time (psql \timing was used) for the above update statement,
which is around a 5 to 10% delay.

Runs  Response time in ms (base)  Response time in ms (patch)  %INC

1     158863.501                  167443.767                   5.4010304104
2     151061.793                  168908.536                   11.8142004312
3     153750.577                  164071.994                   6.7130915548
4     153639.165                  168364.225                   9.5841838245
5     149607.139                  166498.44                    11.2904378179

Under the same conditions, running the original tests (that is, updating
rows which satisfy a condition col1 = :value1), I did not see any
regression.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Mithun Cy (#13)
1 attachment(s)
Re: rewrite HeapSatisfiesHOTAndKey

On Sat, Jan 7, 2017 at 11:27 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
Sorry, the auto plain-text setting has disturbed the table indentation.
Attaching the spreadsheet for the same.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

HeapSatisfiesHOTAndKey_perf_results.ods (application/vnd.oasis.opendocument.spreadsheet)
#15Amit Kapila
amit.kapila16@gmail.com
In reply to: Mithun Cy (#13)
Re: rewrite HeapSatisfiesHOTAndKey

On Sat, Jan 7, 2017 at 11:27 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Thu, Jan 5, 2017 at 6:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Your test and results look good, what kind of m/c you have used to
test this. Let me see if I or one of my colleague can do this and
similar test on some high-end m/c.

As discussed with Amit, I have tried to run the same tests with the
modifications below:
1. Increased the total rows to 10 million.
2. Set fsync off;
3. Changed tests as below. Updated all rows at a time.

VACUUM FULL;
BEGIN;
UPDATE testtab SET col2 = md5(random()::text);
ROLLBACK;

I have run these tests on IBM power2, which has sufficient RAM. I have
set shared_buffers = 32GB.

My results show that after this patch there is a slight increase in
response time (psql \timing was used) for the above update statement,
which is around a 5 to 10% delay.

I would like to add here that the intention of the test was to stress
the changes of the patch to see the overhead the patch can incur. Now,
surely this is a synthetic test prepared to exercise this patch, but I
think it indicates that the changes have some overhead, which might or
might not be ignorable depending on how important it is to get this
patch. I think if WARM tuples or indirect indexes need this patch and
can't do without it, then it is worth considering this patch along with
those patches. OTOH, if we can reduce the overhead, then it might be
okay to proceed with this patch on the basis of the simplicity it can
bring.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers