hash_search and out of memory

Started by Hitoshi Harada about 13 years ago · 7 messages
#1 Hitoshi Harada
umi.tanuki@gmail.com

If OOM happens during expand_table() in hash_search_with_hash_value()
for RelationCacheInsert, the hash table entry is allocated and stored
in the hash table, but idhentry->reldesc remains NULL. Since the OOM
error causes AbortTransaction(), AtEOXact_RelationCache() then
dereferences this NULL pointer and we hit SIGSEGV.
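
For illustration, the insertion pattern is roughly this (a simplified
sketch of RelationCacheInsert in relcache.c, not the exact macro text):

    RelIdCacheEnt *idhentry;
    bool           found;

    idhentry = (RelIdCacheEnt *) hash_search(RelationIdCache,
                                             (void *) &(relation->rd_id),
                                             HASH_ENTER, &found);
    /*
     * If expand_table() hits OOM inside hash_search(), the error is
     * thrown after the new entry is already linked into the table, so
     * we never reach the assignment below and reldesc stays NULL.
     */
    idhentry->reldesc = relation;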

The fix would be either to catch the OOM error with PG_TRY() and undo
the hash entry, or to do the expansion first and insert the entry
afterwards. The latter is a bit ugly because we have to re-calculate
the hash bucket after we decide to expand, so the former looks better.
Can you think of other solutions?

Thanks,
--
Hitoshi Harada

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hitoshi Harada (#1)
Re: hash_search and out of memory

Hitoshi Harada <umi.tanuki@gmail.com> writes:

If OOM happens during expand_table() in hash_search_with_hash_value()
for RelationCacheInsert,

What OOM? expand_table is supposed to return without doing anything
if it can't expand the table. If that's not happening, that's a bug
in the hash code.

regards, tom lane

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
Re: hash_search and out of memory

I wrote:

Hitoshi Harada <umi.tanuki@gmail.com> writes:

If OOM happens during expand_table() in hash_search_with_hash_value()
for RelationCacheInsert,

What OOM? expand_table is supposed to return without doing anything
if it can't expand the table. If that's not happening, that's a bug
in the hash code.

Oh, wait, I take that back --- the palloc-based allocator does throw
errors. I think that when that was designed, we were thinking that
palloc-based hash tables would be thrown away anyway after an error,
but of course that's not true for long-lived tables such as the relcache
hash table.
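
(For reference, dynahash's default allocator is essentially a thin
wrapper over palloc, roughly the sketch below; MemoryContextAlloc
ereports ERROR on out-of-memory rather than returning NULL.)

    /* rough sketch of the default allocator in dynahash.c */
    static void *
    DynaHashAlloc(Size size)
    {
        Assert(MemoryContextIsValid(CurrentDynaHashCxt));
        /* ereports ERROR on out-of-memory instead of returning NULL */
        return MemoryContextAlloc(CurrentDynaHashCxt, size);
    }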

I'm not terribly comfortable with trying to use a PG_TRY block to catch
an OOM error - there are too many ways that could break, and this code
path is by definition not very testable. I think moving up the
expand_table action is probably the best bet. Will you submit a patch?

regards, tom lane

#4 Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Tom Lane (#3)
1 attachment(s)
Re: hash_search and out of memory

On Thu, Oct 18, 2012 at 8:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Hitoshi Harada <umi.tanuki@gmail.com> writes:

If OOM happens during expand_table() in hash_search_with_hash_value()
for RelationCacheInsert,

the palloc-based allocator does throw
errors. I think that when that was designed, we were thinking that
palloc-based hash tables would be thrown away anyway after an error,
but of course that's not true for long-lived tables such as the relcache
hash table.

I'm not terribly comfortable with trying to use a PG_TRY block to catch
an OOM error - there are too many ways that could break, and this code
path is by definition not very testable. I think moving up the
expand_table action is probably the best bet. Will you submit a patch?

Here it is. I factored out the bucket finding code to re-calculate it
after expansion.

Thanks,
--
Hitoshi Harada

Attachments:

hashoom.patch (application/octet-stream)
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 6d0e0f5..210e1f2 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -195,6 +195,9 @@ static HASHSEGMENT seg_alloc(HTAB *hashp);
 static bool element_alloc(HTAB *hashp, int nelem);
 static bool dir_realloc(HTAB *hashp);
 static bool expand_table(HTAB *hashp);
+static HASHBUCKET get_matched_bucket(HTAB *hashp,
+				   const void *keyPtr, uint32 hashvalue,
+				   HASHBUCKET **prevBucketPtr);
 static HASHBUCKET get_hash_entry(HTAB *hashp);
 static void hdefault(HTAB *hashp);
 static int	choose_nelem_alloc(Size entrysize);
@@ -806,54 +809,15 @@ hash_search_with_hash_value(HTAB *hashp,
 							bool *foundPtr)
 {
 	HASHHDR    *hctl = hashp->hctl;
-	Size		keysize;
-	uint32		bucket;
-	long		segment_num;
-	long		segment_ndx;
-	HASHSEGMENT segp;
 	HASHBUCKET	currBucket;
 	HASHBUCKET *prevBucketPtr;
-	HashCompareFunc match;
 
 #if HASH_STATISTICS
 	hash_accesses++;
 	hctl->accesses++;
 #endif
 
-	/*
-	 * Do the initial lookup
-	 */
-	bucket = calc_bucket(hctl, hashvalue);
-
-	segment_num = bucket >> hashp->sshift;
-	segment_ndx = MOD(bucket, hashp->ssize);
-
-	segp = hashp->dir[segment_num];
-
-	if (segp == NULL)
-		hash_corrupted(hashp);
-
-	prevBucketPtr = &segp[segment_ndx];
-	currBucket = *prevBucketPtr;
-
-	/*
-	 * Follow collision chain looking for matching key
-	 */
-	match = hashp->match;		/* save one fetch in inner loop */
-	keysize = hashp->keysize;	/* ditto */
-
-	while (currBucket != NULL)
-	{
-		if (currBucket->hashvalue == hashvalue &&
-			match(ELEMENTKEY(currBucket), keyPtr, keysize) == 0)
-			break;
-		prevBucketPtr = &(currBucket->link);
-		currBucket = *prevBucketPtr;
-#if HASH_STATISTICS
-		hash_collisions++;
-		hctl->collisions++;
-#endif
-	}
+	currBucket = get_matched_bucket(hashp, keyPtr, hashvalue, &prevBucketPtr);
 
 	if (foundPtr)
 		*foundPtr = (bool) (currBucket != NULL);
@@ -915,6 +879,31 @@ hash_search_with_hash_value(HTAB *hashp,
 				elog(ERROR, "cannot insert into frozen hashtable \"%s\"",
 					 hashp->tabname);
 
+			/*
+			 * Check if it is time to split a bucket.  Can't split if running
+			 * in partitioned mode, nor if table is the subject of any active
+			 * hash_seq_search scans.  Strange order of these tests is to try
+			 * to check cheaper conditions first.
+			 */
+			if (!IS_PARTITIONED(hctl) &&
+			(hctl->nentries + 1) / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+				!has_seq_scans(hashp))
+			{
+				/*
+				 * We do this before allocating a hash entry, because some
+				 * long-lived hash tables like relcache could run into
+				 * trouble in transaction abort clean up if expand_table
+				 * errors out after we allocate an entry that has not been
+				 * filled out.
+				 */
+				expand_table(hashp);
+
+				/*
+				 * Calculate bucket to find the spot to insert the new entry.
+				 */
+				get_matched_bucket(hashp, keyPtr, hashvalue, &prevBucketPtr);
+			}
+
 			currBucket = get_hash_entry(hashp);
 			if (currBucket == NULL)
 			{
@@ -938,27 +927,10 @@ hash_search_with_hash_value(HTAB *hashp,
 
 			/* copy key into record */
 			currBucket->hashvalue = hashvalue;
-			hashp->keycopy(ELEMENTKEY(currBucket), keyPtr, keysize);
+			hashp->keycopy(ELEMENTKEY(currBucket), keyPtr, hashp->keysize);
 
 			/* caller is expected to fill the data field on return */
 
-			/*
-			 * Check if it is time to split a bucket.  Can't split if running
-			 * in partitioned mode, nor if table is the subject of any active
-			 * hash_seq_search scans.  Strange order of these tests is to try
-			 * to check cheaper conditions first.
-			 */
-			if (!IS_PARTITIONED(hctl) &&
-			hctl->nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
-				!has_seq_scans(hashp))
-			{
-				/*
-				 * NOTE: failure to expand table is not a fatal error, it just
-				 * means we have to run at higher fill factor than we wanted.
-				 */
-				expand_table(hashp);
-			}
-
 			return (void *) ELEMENTKEY(currBucket);
 	}
 
@@ -968,6 +940,60 @@ hash_search_with_hash_value(HTAB *hashp,
 }
 
 /*
+ * Get the entry that matches key, with the previous entry in the same bucket.
+ */
+static HASHBUCKET
+get_matched_bucket(HTAB *hashp, const void *keyPtr, uint32 hashvalue,
+				   HASHBUCKET **prevBucketPtr)
+{
+	HASHHDR	   *hctl = hashp->hctl;
+	Size		keysize;
+	uint32		bucket;
+	long		segment_num;
+	long		segment_ndx;
+	HASHSEGMENT	segp;
+	HASHBUCKET	currBucket;
+	HashCompareFunc match;
+
+	/*
+	 * Do the initial lookup
+	 */
+	bucket = calc_bucket(hctl, hashvalue);
+
+	segment_num = bucket >> hashp->sshift;
+	segment_ndx = MOD(bucket, hashp->ssize);
+
+	segp = hashp->dir[segment_num];
+
+	if (segp == NULL)
+		hash_corrupted(hashp);
+
+	*prevBucketPtr = &segp[segment_ndx];
+	currBucket = **prevBucketPtr;
+
+	/*
+	 * Follow collision chain looking for matching key
+	 */
+	match = hashp->match;		/* save one fetch in inner loop */
+	keysize = hashp->keysize;	/* ditto */
+
+	while (currBucket != NULL)
+	{
+		if (currBucket->hashvalue == hashvalue &&
+			match(ELEMENTKEY(currBucket), keyPtr, keysize) == 0)
+			break;
+		*prevBucketPtr = &(currBucket->link);
+		currBucket = **prevBucketPtr;
+#if HASH_STATISTICS
+		hash_collisions++;
+		hctl->collisions++;
+#endif
+	}
+
+	return currBucket;
+}
+
+/*
  * create a new entry if possible
  */
 static HASHBUCKET
#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hitoshi Harada (#4)
Re: hash_search and out of memory

Hitoshi Harada <umi.tanuki@gmail.com> writes:

On Thu, Oct 18, 2012 at 8:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm not terribly comfortable with trying to use a PG_TRY block to catch
an OOM error - there are too many ways that could break, and this code
path is by definition not very testable. I think moving up the
expand_table action is probably the best bet. Will you submit a patch?

Here it is. I factored out the bucket finding code to re-calculate it
after expansion.

I didn't like that too much. I think a better solution is just to do
the table expansion at the very start of the function, along the lines
of the attached patch. This requires an extra test of the "action"
parameter, but I think that's probably cheaper than an extra function
call. It's definitely cheaper than recalculating the hash etc when
a split does occur.
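
Roughly, the check moves to the top of hash_search_with_hash_value()
and gets guarded by a test of the action parameter, along these lines
(a sketch, not the literal patch):

    /* near the top of hash_search_with_hash_value(), before the lookup */
    if (action == HASH_ENTER || action == HASH_ENTER_NULL)
    {
        /*
         * Can't split if running in partitioned mode, nor if frozen, nor
         * if table is the subject of any active hash_seq_search scans.
         */
        if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
            hctl->nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
            !has_seq_scans(hashp))
            (void) expand_table(hashp);
    }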

regards, tom lane

#6 Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Tom Lane (#5)
Re: hash_search and out of memory

On Fri, Oct 19, 2012 at 11:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hitoshi Harada <umi.tanuki@gmail.com> writes:

On Thu, Oct 18, 2012 at 8:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm not terribly comfortable with trying to use a PG_TRY block to catch
an OOM error - there are too many ways that could break, and this code
path is by definition not very testable. I think moving up the
expand_table action is probably the best bet. Will you submit a patch?

Here it is. I factored out the bucket finding code to re-calculate it
after expansion.

I didn't like that too much. I think a better solution is just to do
the table expansion at the very start of the function, along the lines
of the attached patch. This requires an extra test of the "action"
parameter, but I think that's probably cheaper than an extra function
call. It's definitely cheaper than recalculating the hash etc when
a split does occur.

OK. Looks better. But isn't nentries a little bogus now?

+ 		/*
+ 		 * Can't split if running in partitioned mode, nor if frozen, nor if
+ 		 * table is the subject of any active hash_seq_search scans.  Strange
+ 		 * order of these tests is to try to check cheaper conditions first.
+ 		 */
+ 		if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
+ 			hctl->nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+ 			!has_seq_scans(hashp))
+ 			(void) expand_table(hashp);

hctl->nentries + 1 sounds more appropriate?

Thanks,
--
Hitoshi Harada

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hitoshi Harada (#6)
Re: hash_search and out of memory

Hitoshi Harada <umi.tanuki@gmail.com> writes:

OK. Looks better. But nentries should be bogus a little now?

No, I think it's fine as is. Essentially this logic says "attempt to
split when the new insertion would make us go over the target fill
factor", whereas the old logic split when the just-completed insertion
reached the target. There's not a lot of reason to care about the
precise boundary anyway, but to the extent that you believe that the
target is exact, I think this behavior is actually a bit more precise.
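
(Illustrative numbers, not from the patch: with ffactor = 1 and 16
buckets, the old code split right after the 16th entry went in, since
16/16 >= 1; the new code leaves the table at 16 entries in 16 buckets
and splits only when the 17th insertion arrives, because completing
that insertion without splitting would push the load past the target.)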

regards, tom lane