Speeding Up Bitmapset

Started by Ranier Vilelaover 2 years ago5 messages

ranier.vf@gmail.com

over 2 years ago

10 attachment(s)

Hi,

Yuya Watari presented a series of patches, with the objective of improving
the Bitmapset [1]/messages/by-id/CAJ2pMkZ0HfhfA0QNa_msknL=_-PavZmPHRWnW7yOb3_PWUoB+g@mail.gmail.com.
After reading through the patches, I saw a lot of good ideas and thought
I'd help.
Unfortunately, my suggestions were not well received.
Even so, I decided to work on these patches and see what could be improved.

Eventually it arrived at the attached patch, which I'm naming v7, because
of the sequence it had established.

Those who follow the other thread will see that the original patch actually
improves the overall performance of Bitmapset,
but I believe there is room for further improvement.

I ran the same tests provided by Yuya Watari and the results are attached.
Both on Windows and Linux Ubuntu, the performance of v7 outperforms head
and v4.
At Ubuntu 64 bits:
v7 outperforms v4 by 29% (Query A)
v7 outperforms v4 by 19% (Query B)

At Windows 64 bits:
v7 outperforms v4 by 22% (Query A)
v7 outperforms v4 by 33% (Query B)

I believe patch v7 leaves the Bitmapset in good shape and readable, well
written.

Questions arose regarding possible regression when using the backward loop,
but in all the tests I ran this version with backward performed better.
Possibly because of the smaller number of variables and the efficient test
(--i <= 0), which both the msvc and gcc compilers successfully optimize.

If the v7 version with loop forward performs worse on x86_64 cpus,
I don't see how it will perform better on other architectures, since the
vast majority of modern ones and with great cache support.

Another question is related to an alleged "spurius test", whose objective
is to avoid test and set instructions for each element of the array.
Again I ran the tests without the test and the performance was worse,
showing its value.

regards,
Ranier Vilela

[1]: /messages/by-id/CAJ2pMkZ0HfhfA0QNa_msknL=_-PavZmPHRWnW7yOb3_PWUoB+g@mail.gmail.com
/messages/by-id/CAJ2pMkZ0HfhfA0QNa_msknL=_-PavZmPHRWnW7yOb3_PWUoB+g@mail.gmail.com

Attachments:

bench_windows_bitmapset.txttext/plain; charset=US-ASCII; name=bench_windows_bitmapset.txtDownload

speeding-up-bitmapset-v7.patchapplication/octet-stream; name=speeding-up-bitmapset-v7.patchDownload

diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 7ba3cf635b..2e8258140c 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -5,8 +5,16 @@
  *
  * A bitmap set can represent any set of nonnegative integers, although
  * it is mainly intended for sets where the maximum value is not large,
- * say at most a few hundred.  By convention, we always represent the
- * empty set by a NULL pointer.
+ * say at most a few hundred.  By convention, we always represent a set with
+ * the minimum possible number of words, i.e, there are never any trailing
+ * zero words. Enforcing this requires that an empty set is represented as
+ * NULL. Because an empty Bitmapset is represented as NULL, a non-NULL
+ * Bitmapset always has at least 1 Bitmapword. We can exploit this fact to
+ * speedup various loops over the Bitmapset's words array by using "do while"
+ * loops instead of "for" loops. This means the code does not waste time
+ * checking the loop condition before the first iteration. For Bitmapsets
+ * containing only a single word (likely the majority of them) this reduces
+ * the loop condition tests by half.
  *
  *
  * Copyright (c) 2003-2023, PostgreSQL Global Development Group
@@ -64,8 +72,6 @@
 #error "invalid BITS_PER_BITMAPWORD"
 #endif
 
-static bool bms_is_empty_internal(const Bitmapset *a);
-
 
 /*
  * bms_copy - make a palloc'd copy of a bitmapset
@@ -85,18 +91,11 @@ bms_copy(const Bitmapset *a)
 }
 
 /*
- * bms_equal - are two bitmapsets equal?
- *
- * This is logical not physical equality; in particular, a NULL pointer will
- * be reported as equal to a palloc'd value containing no members.
+ * bms_equal - are two bitmapsets equal? or both NULL?
  */
 bool
 bms_equal(const Bitmapset *a, const Bitmapset *b)
 {
-	const Bitmapset *shorter;
-	const Bitmapset *longer;
-	int			shortlen;
-	int			longlen;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -108,30 +107,19 @@ bms_equal(const Bitmapset *a, const Bitmapset *b)
 	}
 	else if (b == NULL)
 		return false;
-	/* Identify shorter and longer input */
-	if (a->nwords <= b->nwords)
-	{
-		shorter = a;
-		longer = b;
-	}
-	else
-	{
-		shorter = b;
-		longer = a;
-	}
-	/* And process */
-	shortlen = shorter->nwords;
-	for (i = 0; i < shortlen; i++)
-	{
-		if (shorter->words[i] != longer->words[i])
-			return false;
-	}
-	longlen = longer->nwords;
-	for (; i < longlen; i++)
+
+	/* can't be equal if the word counts don't match */
+	if (a->nwords != b->nwords)
+		return false;
+
+	/* check each word matches */
+	i = a->nwords - 1;
+	do
 	{
-		if (longer->words[i] != 0)
+		if (a->words[i] != b->words[i])
 			return false;
-	}
+	} while (--i >= 0);
+
 	return true;
 }
 
@@ -146,7 +134,6 @@ bms_equal(const Bitmapset *a, const Bitmapset *b)
 int
 bms_compare(const Bitmapset *a, const Bitmapset *b)
 {
-	int			shortlen;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -154,28 +141,21 @@ bms_compare(const Bitmapset *a, const Bitmapset *b)
 		return (b == NULL) ? 0 : -1;
 	else if (b == NULL)
 		return +1;
-	/* Handle cases where one input is longer than the other */
-	shortlen = Min(a->nwords, b->nwords);
-	for (i = shortlen; i < a->nwords; i++)
-	{
-		if (a->words[i] != 0)
-			return +1;
-	}
-	for (i = shortlen; i < b->nwords; i++)
-	{
-		if (b->words[i] != 0)
-			return -1;
-	}
-	/* Process words in common */
-	i = shortlen;
-	while (--i >= 0)
+
+	/* the set with the most words must be greater */
+	if (a->nwords != b->nwords)
+		return (a->nwords - b->nwords);
+
+	i = a->nwords - 1;
+	do
 	{
 		bitmapword	aw = a->words[i];
 		bitmapword	bw = b->words[i];
 
 		if (aw != bw)
-			return (aw > bw) ? +1 : -1;
-	}
+			return (aw - bw);
+	} while (--i >= 0);
+
 	return 0;
 }
 
@@ -227,7 +207,6 @@ bms_union(const Bitmapset *a, const Bitmapset *b)
 {
 	Bitmapset  *result;
 	const Bitmapset *other;
-	int			otherlen;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -235,6 +214,7 @@ bms_union(const Bitmapset *a, const Bitmapset *b)
 		return bms_copy(b);
 	if (b == NULL)
 		return bms_copy(a);
+
 	/* Identify shorter and longer input; copy the longer one */
 	if (a->nwords <= b->nwords)
 	{
@@ -246,10 +226,14 @@ bms_union(const Bitmapset *a, const Bitmapset *b)
 		result = bms_copy(a);
 		other = b;
 	}
+
 	/* And union the shorter input into the result */
-	otherlen = other->nwords;
-	for (i = 0; i < otherlen; i++)
+	i = other->nwords - 1;
+	do
+	{
 		result->words[i] |= other->words[i];
+	} while (--i >= 0);
+
 	return result;
 }
 
@@ -261,12 +245,13 @@ bms_intersect(const Bitmapset *a, const Bitmapset *b)
 {
 	Bitmapset  *result;
 	const Bitmapset *other;
-	int			resultlen;
+	int			lastnonzero;
 	int			i;
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL || b == NULL)
 		return NULL;
+
 	/* Identify shorter and longer input; copy the shorter one */
 	if (a->nwords <= b->nwords)
 	{
@@ -278,16 +263,27 @@ bms_intersect(const Bitmapset *a, const Bitmapset *b)
 		result = bms_copy(b);
 		other = a;
 	}
+
 	/* And intersect the longer input with the result */
-	resultlen = result->nwords;
-	for (i = 0; i < resultlen; i++)
+	lastnonzero = -1;
+	i = result->nwords - 1;
+	do
+	{
 		result->words[i] &= other->words[i];
+
+		if (lastnonzero == -1 && result->words[i])
+			lastnonzero = i;
+	} while (--i >= 0);
+
 	/* If we computed an empty result, we must return NULL */
-	if (bms_is_empty_internal(result))
+	if (lastnonzero == -1)
 	{
 		pfree(result);
 		return NULL;
 	}
+
+	/* get rid of trailing zero words */
+	result->nwords = lastnonzero + 1;
 	return result;
 }
 
@@ -298,7 +294,6 @@ Bitmapset *
 bms_difference(const Bitmapset *a, const Bitmapset *b)
 {
 	Bitmapset  *result;
-	int			shortlen;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -317,10 +312,40 @@ bms_difference(const Bitmapset *a, const Bitmapset *b)
 
 	/* Copy the left input */
 	result = bms_copy(a);
+
 	/* And remove b's bits from result */
-	shortlen = Min(a->nwords, b->nwords);
-	for (i = 0; i < shortlen; i++)
-		result->words[i] &= ~b->words[i];
+	if (result->nwords > b->nwords)
+	{
+		/*
+		 * We'll never need to remove trailing zero words when 'a' has more
+		 * words than 'b' as the additional words must be non-zero.
+		 */
+		i = b->nwords - 1;
+		do
+		{
+			result->words[i] &= ~b->words[i];
+		} while (--i >= 0);
+	}
+	else
+	{
+		int			lastnonzero = -1;
+
+		/* we may need to remove trailing zero words from the result. */
+		i = result->nwords - 1;
+		do
+		{
+			result->words[i] &= ~b->words[i];
+
+			/* remember the last non-zero word */
+			if (lastnonzero == -1 && result->words[i])
+				lastnonzero = i;
+		} while (--i >= 0);
+
+		/* trim off trailing zero words */
+		result->nwords = lastnonzero + 1;
+	}
+	Assert(result->nwords != 0);
+
 	/* Need not check for empty result, since we handled that case above */
 	return result;
 }
@@ -331,8 +356,6 @@ bms_difference(const Bitmapset *a, const Bitmapset *b)
 bool
 bms_is_subset(const Bitmapset *a, const Bitmapset *b)
 {
-	int			shortlen;
-	int			longlen;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -340,23 +363,19 @@ bms_is_subset(const Bitmapset *a, const Bitmapset *b)
 		return true;			/* empty set is a subset of anything */
 	if (b == NULL)
 		return false;
-	/* Check common words */
-	shortlen = Min(a->nwords, b->nwords);
-	for (i = 0; i < shortlen; i++)
+
+	/* 'a' can't be a subset of 'b' if it contains more words */
+	if (a->nwords > b->nwords)
+		return false;
+
+	/* Check all 'a' members are set in 'b' */
+	i = a->nwords - 1;
+	do
 	{
 		if ((a->words[i] & ~b->words[i]) != 0)
 			return false;
-	}
-	/* Check extra words */
-	if (a->nwords > b->nwords)
-	{
-		longlen = a->nwords;
-		for (; i < longlen; i++)
-		{
-			if (a->words[i] != 0)
-				return false;
-		}
-	}
+	} while (--i >= 0);
+
 	return true;
 }
 
@@ -370,7 +389,6 @@ bms_subset_compare(const Bitmapset *a, const Bitmapset *b)
 {
 	BMS_Comparison result;
 	int			shortlen;
-	int			longlen;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -382,10 +400,12 @@ bms_subset_compare(const Bitmapset *a, const Bitmapset *b)
 	}
 	if (b == NULL)
 		return BMS_SUBSET2;
+
 	/* Check common words */
 	result = BMS_EQUAL;			/* status so far */
 	shortlen = Min(a->nwords, b->nwords);
-	for (i = 0; i < shortlen; i++)
+	i = 0;
+	do
 	{
 		bitmapword	aword = a->words[i];
 		bitmapword	bword = b->words[i];
@@ -404,35 +424,22 @@ bms_subset_compare(const Bitmapset *a, const Bitmapset *b)
 				return BMS_DIFFERENT;
 			result = BMS_SUBSET1;
 		}
-	}
+	} while (++i < shortlen);
+
 	/* Check extra words */
 	if (a->nwords > b->nwords)
 	{
-		longlen = a->nwords;
-		for (; i < longlen; i++)
-		{
-			if (a->words[i] != 0)
-			{
-				/* a is not a subset of b */
-				if (result == BMS_SUBSET1)
-					return BMS_DIFFERENT;
-				result = BMS_SUBSET2;
-			}
-		}
+		/* if a has more words then a is not a subset of b */
+		if (result == BMS_SUBSET1)
+			return BMS_DIFFERENT;
+		return BMS_SUBSET2;
 	}
 	else if (a->nwords < b->nwords)
 	{
-		longlen = b->nwords;
-		for (; i < longlen; i++)
-		{
-			if (b->words[i] != 0)
-			{
-				/* b is not a subset of a */
-				if (result == BMS_SUBSET2)
-					return BMS_DIFFERENT;
-				result = BMS_SUBSET1;
-			}
-		}
+		/* if b has more words then b is not a subset of a */
+		if (result == BMS_SUBSET2)
+			return BMS_DIFFERENT;
+		return BMS_SUBSET1;
 	}
 	return result;
 }
@@ -467,7 +474,7 @@ bms_is_member(int x, const Bitmapset *a)
  * Returns (-1) when x is not a member.
  */
 int
-bms_member_index(Bitmapset *a, int x)
+bms_member_index(const Bitmapset *a, int x)
 {
 	int			i;
 	int			bitnum;
@@ -488,7 +495,7 @@ bms_member_index(Bitmapset *a, int x)
 		bitmapword	w = a->words[i];
 
 		/* No need to count the bits in a zero word */
-		if (w != 0)
+		if (w)
 			result += bmw_popcount(w);
 	}
 
@@ -516,13 +523,16 @@ bms_overlap(const Bitmapset *a, const Bitmapset *b)
 	/* Handle cases where either input is NULL */
 	if (a == NULL || b == NULL)
 		return false;
+
 	/* Check words in common */
 	shortlen = Min(a->nwords, b->nwords);
-	for (i = 0; i < shortlen; i++)
+	i = shortlen - 1;
+	do
 	{
 		if ((a->words[i] & b->words[i]) != 0)
 			return true;
-	}
+	} while (--i >= 0);
+
 	return false;
 }
 
@@ -533,8 +543,6 @@ bool
 bms_overlap_list(const Bitmapset *a, const List *b)
 {
 	ListCell   *lc;
-	int			wordnum,
-				bitnum;
 
 	if (a == NULL || b == NIL)
 		return false;
@@ -542,6 +550,8 @@ bms_overlap_list(const Bitmapset *a, const List *b)
 	foreach(lc, b)
 	{
 		int			x = lfirst_int(lc);
+		int			wordnum,
+					bitnum;
 
 		if (x < 0)
 			elog(ERROR, "negative bitmapset member not allowed");
@@ -563,7 +573,6 @@ bms_overlap_list(const Bitmapset *a, const List *b)
 bool
 bms_nonempty_difference(const Bitmapset *a, const Bitmapset *b)
 {
-	int			shortlen;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -571,19 +580,19 @@ bms_nonempty_difference(const Bitmapset *a, const Bitmapset *b)
 		return false;
 	if (b == NULL)
 		return true;
-	/* Check words in common */
-	shortlen = Min(a->nwords, b->nwords);
-	for (i = 0; i < shortlen; i++)
+
+	/* if 'a' has more words then it must contain additional members */
+	if (a->nwords > b->nwords)
+		return true;
+
+	/* Check all 'a' members are set in 'b' */
+	i = a->nwords - 1;
+	do
 	{
 		if ((a->words[i] & ~b->words[i]) != 0)
 			return true;
-	}
-	/* Check extra words in a */
-	for (; i < a->nwords; i++)
-	{
-		if (a->words[i] != 0)
-			return true;
-	}
+	} while (--i >= 0);
+
 	return false;
 }
 
@@ -602,20 +611,22 @@ bms_singleton_member(const Bitmapset *a)
 	if (a == NULL)
 		elog(ERROR, "bitmapset is empty");
 	nwords = a->nwords;
-	for (wordnum = 0; wordnum < nwords; wordnum++)
+	wordnum = 0;
+	do
 	{
 		bitmapword	w = a->words[wordnum];
 
-		if (w != 0)
+		if (w)
 		{
 			if (result >= 0 || HAS_MULTIPLE_ONES(w))
 				elog(ERROR, "bitmapset has multiple members");
 			result = wordnum * BITS_PER_BITMAPWORD;
 			result += bmw_rightmost_one_pos(w);
 		}
-	}
-	if (result < 0)
-		elog(ERROR, "bitmapset is empty");
+	} while (++wordnum < nwords);
+
+	/* we don't expect non-NULL sets to be empty */
+	Assert(result >= 0);
 	return result;
 }
 
@@ -640,20 +651,22 @@ bms_get_singleton_member(const Bitmapset *a, int *member)
 	if (a == NULL)
 		return false;
 	nwords = a->nwords;
-	for (wordnum = 0; wordnum < nwords; wordnum++)
+	wordnum = 0;
+	do
 	{
 		bitmapword	w = a->words[wordnum];
 
-		if (w != 0)
+		if (w)
 		{
 			if (result >= 0 || HAS_MULTIPLE_ONES(w))
 				return false;
 			result = wordnum * BITS_PER_BITMAPWORD;
 			result += bmw_rightmost_one_pos(w);
 		}
-	}
-	if (result < 0)
-		return false;
+	} while (++wordnum < nwords);
+
+	/* we don't expect non-NULL sets to be empty */
+	Assert(result >= 0);
 	*member = result;
 	return true;
 }
@@ -665,20 +678,21 @@ int
 bms_num_members(const Bitmapset *a)
 {
 	int			result = 0;
-	int			nwords;
-	int			wordnum;
+	int			i;
 
 	if (a == NULL)
 		return 0;
-	nwords = a->nwords;
-	for (wordnum = 0; wordnum < nwords; wordnum++)
+
+	i = a->nwords - 1;
+	do
 	{
-		bitmapword	w = a->words[wordnum];
+		bitmapword	w = a->words[i];
 
 		/* No need to count the bits in a zero word */
-		if (w != 0)
+		if (w)
 			result += bmw_popcount(w);
-	}
+	} while (--i >= 0);
+
 	return result;
 }
 
@@ -690,49 +704,27 @@ bms_num_members(const Bitmapset *a)
 BMS_Membership
 bms_membership(const Bitmapset *a)
 {
-	BMS_Membership result = BMS_EMPTY_SET;
-	int			nwords;
-	int			wordnum;
+	BMS_Membership result;
+	int			i;
 
 	if (a == NULL)
 		return BMS_EMPTY_SET;
-	nwords = a->nwords;
-	for (wordnum = 0; wordnum < nwords; wordnum++)
+
+	result = BMS_EMPTY_SET;
+	i = a->nwords - 1;
+	do
 	{
-		bitmapword	w = a->words[wordnum];
+		bitmapword	w = a->words[i];
 
-		if (w != 0)
+		if (w)
 		{
 			if (result != BMS_EMPTY_SET || HAS_MULTIPLE_ONES(w))
 				return BMS_MULTIPLE;
 			result = BMS_SINGLETON;
 		}
-	}
-	return result;
-}
-
-/*
- * bms_is_empty_internal - is a set empty?
- *
- * This is now used only locally, to detect cases where a function has
- * computed an empty set that we must now get rid of.  Hence, we can
- * assume the input isn't NULL.
- */
-static bool
-bms_is_empty_internal(const Bitmapset *a)
-{
-	int			nwords;
-	int			wordnum;
+	} while (--i >= 0);
 
-	nwords = a->nwords;
-	for (wordnum = 0; wordnum < nwords; wordnum++)
-	{
-		bitmapword	w = a->words[wordnum];
-
-		if (w != 0)
-			return false;
-	}
-	return true;
+	return result;
 }
 
 
@@ -767,14 +759,11 @@ bms_add_member(Bitmapset *a, int x)
 	/* enlarge the set if necessary */
 	if (wordnum >= a->nwords)
 	{
-		int			oldnwords = a->nwords;
-		int			i;
-
-		a = (Bitmapset *) repalloc(a, BITMAPSET_SIZE(wordnum + 1));
+		size_t	oldsize;
+	
+		oldsize = BITMAPSET_SIZE(a->nwords);
+		a = (Bitmapset *) repalloc0(a, oldsize, BITMAPSET_SIZE(wordnum + 1));
 		a->nwords = wordnum + 1;
-		/* zero out the enlarged portion */
-		for (i = oldnwords; i < a->nwords; i++)
-			a->words[i] = 0;
 	}
 
 	a->words[wordnum] |= ((bitmapword) 1 << bitnum);
@@ -792,7 +781,8 @@ Bitmapset *
 bms_del_member(Bitmapset *a, int x)
 {
 	int			wordnum,
-				bitnum;
+				bitnum,
+				i;
 
 	if (x < 0)
 		elog(ERROR, "negative bitmapset member not allowed");
@@ -800,14 +790,32 @@ bms_del_member(Bitmapset *a, int x)
 		return NULL;
 	wordnum = WORDNUM(x);
 	bitnum = BITNUM(x);
-	if (wordnum < a->nwords)
-		a->words[wordnum] &= ~((bitmapword) 1 << bitnum);
-	/* If we computed an empty result, we must return NULL */
-	if (bms_is_empty_internal(a))
+
+	/* member can't exist.  Return 'a' unmodified */
+	if (unlikely(wordnum >= a->nwords))
+		return a;
+
+	a->words[wordnum] &= ~((bitmapword) 1 << bitnum);
+
+	/* when last word becomes empty, trim off all trailing empty words */
+	i = a->nwords - 1;
+	if (wordnum == i && a->words[i] == 0)
 	{
+		/* find the last non-empty word and make that the new final word */
+		do
+		{
+			if (a->words[i])
+			{
+				a->nwords = i + 1;
+				return a;
+			}
+		} while (--i >= 0);
+
+		/* the set is now empty */
 		pfree(a);
 		return NULL;
 	}
+
 	return a;
 }
 
@@ -819,7 +827,6 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
 {
 	Bitmapset  *result;
 	const Bitmapset *other;
-	int			otherlen;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -827,6 +834,7 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
 		return bms_copy(b);
 	if (b == NULL)
 		return a;
+
 	/* Identify shorter and longer input; copy the longer one if needed */
 	if (a->nwords < b->nwords)
 	{
@@ -838,10 +846,14 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
 		result = a;
 		other = b;
 	}
+
 	/* And union the shorter input into the result */
-	otherlen = other->nwords;
-	for (i = 0; i < otherlen; i++)
+	i = other->nwords - 1;
+	do
+	{
 		result->words[i] |= other->words[i];
+	} while (--i >= 0);
+
 	if (result != a)
 		pfree(a);
 	return result;
@@ -880,15 +892,12 @@ bms_add_range(Bitmapset *a, int lower, int upper)
 	}
 	else if (uwordnum >= a->nwords)
 	{
-		int			oldnwords = a->nwords;
-		int			i;
+		size_t	oldsize;
 
 		/* ensure we have enough words to store the upper bit */
-		a = (Bitmapset *) repalloc(a, BITMAPSET_SIZE(uwordnum + 1));
+		oldsize = BITMAPSET_SIZE(a->nwords);
+		a = (Bitmapset *) repalloc0(a, oldsize, BITMAPSET_SIZE(uwordnum + 1));
 		a->nwords = uwordnum + 1;
-		/* zero out the enlarged portion */
-		for (i = oldnwords; i < a->nwords; i++)
-			a->words[i] = 0;
 	}
 
 	wordnum = lwordnum = WORDNUM(lower);
@@ -927,6 +936,7 @@ bms_add_range(Bitmapset *a, int lower, int upper)
 Bitmapset *
 bms_int_members(Bitmapset *a, const Bitmapset *b)
 {
+	int			lastnonzero;
 	int			shortlen;
 	int			i;
 
@@ -938,18 +948,28 @@ bms_int_members(Bitmapset *a, const Bitmapset *b)
 		pfree(a);
 		return NULL;
 	}
+
 	/* Intersect b into a; we need never copy */
 	shortlen = Min(a->nwords, b->nwords);
-	for (i = 0; i < shortlen; i++)
+	lastnonzero = -1;
+	i = shortlen - 1;
+	do
+	{
 		a->words[i] &= b->words[i];
-	for (; i < a->nwords; i++)
-		a->words[i] = 0;
+
+		if (lastnonzero == -1 && a->words[i])
+			lastnonzero = i;
+	} while (--i >= 0);
+
 	/* If we computed an empty result, we must return NULL */
-	if (bms_is_empty_internal(a))
+	if (lastnonzero == -1)
 	{
 		pfree(a);
 		return NULL;
 	}
+
+	/* get rid of trailing zero words */
+	a->nwords = lastnonzero + 1;
 	return a;
 }
 
@@ -959,7 +979,6 @@ bms_int_members(Bitmapset *a, const Bitmapset *b)
 Bitmapset *
 bms_del_members(Bitmapset *a, const Bitmapset *b)
 {
-	int			shortlen;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -967,16 +986,46 @@ bms_del_members(Bitmapset *a, const Bitmapset *b)
 		return NULL;
 	if (b == NULL)
 		return a;
+
 	/* Remove b's bits from a; we need never copy */
-	shortlen = Min(a->nwords, b->nwords);
-	for (i = 0; i < shortlen; i++)
-		a->words[i] &= ~b->words[i];
-	/* If we computed an empty result, we must return NULL */
-	if (bms_is_empty_internal(a))
+	if (a->nwords > b->nwords)
 	{
-		pfree(a);
-		return NULL;
+		/*
+		 * We'll never need to remove trailing zero words when 'a' has more
+		 * words than 'b'.
+		 */
+		i = b->nwords - 1;
+		do
+		{
+			a->words[i] &= ~b->words[i];
+		} while (--i >= 0);
 	}
+	else
+	{
+		int			lastnonzero = -1;
+
+		/* we may need to remove trailing zero words from the result. */
+		i = a->nwords - 1;
+		do
+		{
+			a->words[i] &= ~b->words[i];
+
+			/* remember the last non-zero word */
+			if (lastnonzero == -1 && a->words[i])
+				lastnonzero = i;
+		} while (--i >= 0);
+
+		/* check if 'a' has become empty */
+		if (lastnonzero == -1)
+		{
+			pfree(a);
+			return NULL;
+		}
+
+		/* trim off any trailing zero words */
+		a->nwords = lastnonzero + 1;
+	}
+
 	return a;
 }
 
@@ -988,7 +1037,6 @@ bms_join(Bitmapset *a, Bitmapset *b)
 {
 	Bitmapset  *result;
 	Bitmapset  *other;
-	int			otherlen;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -996,6 +1044,7 @@ bms_join(Bitmapset *a, Bitmapset *b)
 		return b;
 	if (b == NULL)
 		return a;
+
 	/* Identify shorter and longer input; use longer one as result */
 	if (a->nwords < b->nwords)
 	{
@@ -1007,10 +1056,14 @@ bms_join(Bitmapset *a, Bitmapset *b)
 		result = a;
 		other = b;
 	}
+
 	/* And union the shorter input into the result */
-	otherlen = other->nwords;
-	for (i = 0; i < otherlen; i++)
+	i = other->nwords - 1;
+	do
+	{
 		result->words[i] |= other->words[i];
+	} while (--i >= 0);
+
 	if (other != result)		/* pure paranoia */
 		pfree(other);
 	return result;
@@ -1054,7 +1107,7 @@ bms_next_member(const Bitmapset *a, int prevbit)
 		/* ignore bits before prevbit */
 		w &= mask;
 
-		if (w != 0)
+		if (w)
 		{
 			int			result;
 
@@ -1123,7 +1176,7 @@ bms_prev_member(const Bitmapset *a, int prevbit)
 		/* mask out bits left of prevbit */
 		w &= mask;
 
-		if (w != 0)
+		if (w)
 		{
 			int			result;
 
@@ -1140,28 +1193,14 @@ bms_prev_member(const Bitmapset *a, int prevbit)
 
 /*
  * bms_hash_value - compute a hash key for a Bitmapset
- *
- * Note: we must ensure that any two bitmapsets that are bms_equal() will
- * hash to the same value; in practice this means that trailing all-zero
- * words must not affect the result.  Hence we strip those before applying
- * hash_any().
  */
 uint32
 bms_hash_value(const Bitmapset *a)
 {
-	int			lastword;
-
 	if (a == NULL)
 		return 0;				/* All empty sets hash to 0 */
-	for (lastword = a->nwords; --lastword >= 0;)
-	{
-		if (a->words[lastword] != 0)
-			break;
-	}
-	if (lastword < 0)
-		return 0;				/* All empty sets hash to 0 */
 	return DatumGetUInt32(hash_any((const unsigned char *) a->words,
-								   (lastword + 1) * sizeof(bitmapword)));
+								   a->nwords * sizeof(bitmapword)));
 }
 
 /*
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 14de6a9ff1..a4c2ba6712 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -90,7 +90,7 @@ extern Bitmapset *bms_difference(const Bitmapset *a, const Bitmapset *b);
 extern bool bms_is_subset(const Bitmapset *a, const Bitmapset *b);
 extern BMS_Comparison bms_subset_compare(const Bitmapset *a, const Bitmapset *b);
 extern bool bms_is_member(int x, const Bitmapset *a);
-extern int	bms_member_index(Bitmapset *a, int x);
+extern int	bms_member_index(const Bitmapset *a, int x);
 extern bool bms_overlap(const Bitmapset *a, const Bitmapset *b);
 extern bool bms_overlap_list(const Bitmapset *a, const struct List *b);
 extern bool bms_nonempty_difference(const Bitmapset *a, const Bitmapset *b);

create-tables-a.sqlapplication/octet-stream; name=create-tables-a.sqlDownload

bench_bitmapset_ubuntu.txttext/plain; charset=US-ASCII; name=bench_bitmapset_ubuntu.txtDownload

bench_bitmapset_ubuntu.odsapplication/vnd.oasis.opendocument.spreadsheet; name=bench_bitmapset_ubuntu.odsDownload

bench_bitmapset_windows.odsapplication/vnd.oasis.opendocument.spreadsheet; name=bench_bitmapset_windows.odsDownload

create-tables-b.sqlapplication/octet-stream; name=create-tables-b.sqlDownload

query-a.sqlapplication/octet-stream; name=query-a.sqlDownload

pgbench_prepared1.txttext/plain; charset=US-ASCII; name=pgbench_prepared1.txtDownload

query-b.sqlapplication/octet-stream; name=query-b.sqlDownload

David Rowley

dgrowleyml@gmail.com

over 2 years ago

In reply to: Ranier Vilela (#1)

Re: Speeding Up Bitmapset

On Mon, 26 Jun 2023 at 00:29, Ranier Vilela <ranier.vf@gmail.com> wrote:

Yuya Watari presented a series of patches, with the objective of improving the Bitmapset [1].
After reading through the patches, I saw a lot of good ideas and thought I'd help.
Unfortunately, my suggestions were not well received.

For the future, I recommend if you do have ideas that you wish to
convey and think a patch is the easiest way to do that, then it's best
to send those with an extension that won't cause the CFbot to pickup
your patch instead. You patch did and still does seem to have a load
of extra changes that are unjustified by your claims. There are quite
a lot of white space changes going on. There's no reason in the world
that those will speed up Bitmapsets, so why include them?

v7 outperforms v4 by 29% (Query A)

I tested v7 with query-a and I also see additional gains. However,
it's entirely down to your changes to bms_is_subset(). It seems, by
chance, with the given Bitmapsets that looping backwards for the given
sets is able to determine the result more quickly

Here's some results from "perf top"

query-a
v4

30.08% postgres [.] bms_is_subset
15.84% postgres [.] create_join_clause
13.54% postgres [.] bms_equal
11.03% postgres [.] get_eclass_for_sort_expr
8.53% postgres [.] generate_implied_equalities_for_column
3.11% postgres [.] generate_join_implied_equalities_normal
1.03% postgres [.] add_child_rel_equivalences
0.82% postgres [.] SearchCatCacheInternal
0.73% postgres [.] AllocSetAlloc
0.53% postgres [.] find_ec_member_matching_expr
0.40% postgres [.] hash_search_with_hash_value
0.36% postgres [.] palloc
0.36% postgres [.] palloc0

latency average = 452.480 ms

v7
20.51% postgres [.] create_join_clause
15.33% postgres [.] bms_equal
14.17% postgres [.] get_eclass_for_sort_expr
12.05% postgres [.] bms_is_subset
10.40% postgres [.] generate_implied_equalities_for_column
3.90% postgres [.] generate_join_implied_equalities_normal
1.34% postgres [.] add_child_rel_equivalences
1.06% postgres [.] AllocSetAlloc
1.00% postgres [.] SearchCatCacheInternal
0.72% postgres [.] find_ec_member_matching_expr
0.58% postgres [.] palloc0
0.49% postgres [.] palloc
0.47% postgres [.] hash_search_with_hash_value
0.44% libc.so.6 [.] __memmove_avx_unaligned_erms

latency average = 350.543 ms

modified v7's bms_is_subset to go forwards then I get latency average
= 445.987 ms.

If I add some debugging to bms_is_subset to have it record how many
words it checks, I see:

postgres=# select sum(nwords) from forward;
sum
-----------
181490660
(1 row)

postgres=# select sum(nwords) from backwards;
sum
----------
11322564
(1 row)

So, it took about 181 million loops in bms_is_member to plan query-a
when looping forward, but only 11 million when looping backwards.

I think unless you've got some reason that you're able to justify why
we're always more likely to have to perform fewer word checks in
bms_is_subset() by looping backwards instead of forwards then I think
the performance gains you're showing here only happen to show better
results due to the given workload. It's just as easy to imagine that
you'll apply the equivalent slowdown for some other workload where the
initial words differ but all the remaining words all match.

This seems equivalent to someone suggesting that looping backwards in
list_member() is better than looping forwards because it'll find the
member more quickly. I can't imagine us ever considering that would
be a good change to make and I think the same for bms_is_member().

David

Ranier Vilela

ranier.vf@gmail.com

over 2 years ago

In reply to: David Rowley (#2)

1 attachment(s)

Re: Speeding Up Bitmapset

Em dom., 25 de jun. de 2023 às 17:49, David Rowley <dgrowleyml@gmail.com>
escreveu:

There's no reason in the world
that those will speed up Bitmapsets, so why include them?

Of course optimization is the most important thing,
but since you're going to touch the source, why not make it more readable.

v7 outperforms v4 by 29% (Query A)

I tested v7 with query-a and I also see additional gains. However,
it's entirely down to your changes to bms_is_subset(). It seems, by
chance, with the given Bitmapsets that looping backwards for the given
sets is able to determine the result more quickly

I redid the tests and it seems that most of the difference comes from
bms_subset.

Here's some results from "perf top"

query-a
v4

30.08% postgres [.] bms_is_subset
15.84% postgres [.] create_join_clause
13.54% postgres [.] bms_equal
11.03% postgres [.] get_eclass_for_sort_expr
8.53% postgres [.] generate_implied_equalities_for_column
3.11% postgres [.] generate_join_implied_equalities_normal
1.03% postgres [.] add_child_rel_equivalences
0.82% postgres [.] SearchCatCacheInternal
0.73% postgres [.] AllocSetAlloc
0.53% postgres [.] find_ec_member_matching_expr
0.40% postgres [.] hash_search_with_hash_value
0.36% postgres [.] palloc
0.36% postgres [.] palloc0

latency average = 452.480 ms

v7
20.51% postgres [.] create_join_clause
15.33% postgres [.] bms_equal
14.17% postgres [.] get_eclass_for_sort_expr
12.05% postgres [.] bms_is_subset
10.40% postgres [.] generate_implied_equalities_for_column
3.90% postgres [.] generate_join_implied_equalities_normal
1.34% postgres [.] add_child_rel_equivalences
1.06% postgres [.] AllocSetAlloc
1.00% postgres [.] SearchCatCacheInternal
0.72% postgres [.] find_ec_member_matching_expr
0.58% postgres [.] palloc0
0.49% postgres [.] palloc
0.47% postgres [.] hash_search_with_hash_value
0.44% libc.so.6 [.] __memmove_avx_unaligned_erms

latency average = 350.543 ms

modified v7's bms_is_subset to go forwards then I get latency average
= 445.987 ms.

If I add some debugging to bms_is_subset to have it record how many
words it checks, I see:

postgres=# select sum(nwords) from forward;
sum
-----------
181490660
(1 row)

postgres=# select sum(nwords) from backwards;
sum
----------
11322564
(1 row)

So, it took about 181 million loops in bms_is_member to plan query-a
when looping forward, but only 11 million when looping backwards.

I think unless you've got some reason that you're able to justify why
we're always more likely to have to perform fewer word checks in
bms_is_subset() by looping backwards instead of forwards then I think
the performance gains you're showing here only happen to show better
results due to the given workload. It's just as easy to imagine that
you'll apply the equivalent slowdown for some other workload where the
initial words differ but all the remaining words all match.

Have you seen bms_compare?
For some reason someone thought it would be better to loop backward the
array.

Since bms_subset performed very well with backward, I think that's a good
reason to leave it as bms_compare.
As in most cases, the array size is small, in general in both modes the
performance will be equivalent.

Anyway, I made a new patch v8, based on v4 but with some changes that I
believe improve it.

Windows 64 bits
msvc 2019 64 bits

== Query A ==
psql -U postgres -f c:\postgres_bench\tmp\bitmapset\create-tables-a.sql
psql -U postgres -f c:\postgres_bench\tmp\bitmapset\query-a.sql
=============

patched v4:
Time: 2305,445 ms (00:02,305)
Time: 2185,972 ms (00:02,186)
Time: 2177,434 ms (00:02,177)
Time: 2169,883 ms (00:02,170)

patched v8:
Time: 2143,532 ms (00:02,144)
Time: 2140,313 ms (00:02,140)
Time: 2138,481 ms (00:02,138)
Time: 2130,290 ms (00:02,130)

== Query B ==
psql -U postgres -f c:\postgres_bench\tmp\bitmapset\create-tables-b.sql
psql -U postgres -f c:\postgres_bench\tmp\bitmapset\query-b.sql

patched v4:
Time: 2684,360 ms (00:02,684)
Time: 2482,571 ms (00:02,483)
Time: 2452,699 ms (00:02,453)
Time: 2465,223 ms (00:02,465)

patched v8:
Time: 2493,281 ms (00:02,493)
Time: 2490,090 ms (00:02,490)
Time: 2432,515 ms (00:02,433)
Time: 2426,860 ms (00:02,427)

linux Ubuntu 64 bit
gcc 64 bits

== Query A ==
/usr/local/pgsql/bin/psql -U postgres -f
/home/ranier/Documentos/benchmarks/bitmapset/create-tables-a.sql
/usr/local/pgsql/bin/psql -U postgres -f
/home/ranier/Documentos/benchmarks/bitmapset/query-a.sql
=============

patched v4:
Time: 933,181 ms
Time: 931,520 ms
Time: 906,496 ms
Time: 872,446 ms

patched v8:
Time: 937,079 ms
Time: 930,408 ms
Time: 865,548 ms
Time: 865,382 ms

== Query B ==
/usr/local/pgsql/bin/psql -U postgres -f
/home/ranier/Documentos/benchmarks/bitmapset/create-tables-b.sql
/usr/local/pgsql/bin/psql -U postgres -f
/home/ranier/Documentos/benchmarks/bitmapset/query-b.sql

patched v4:
Time: 1581,317 ms (00:01,581)
Time: 1568,371 ms (00:01,568)
Time: 1468,036 ms (00:01,468)
Time: 1445,698 ms (00:01,446)

patched v8:
Time: 1437,997 ms (00:01,438)
Time: 1437,435 ms (00:01,437)
Time: 1440,422 ms (00:01,440)
Time: 1436,112 ms (00:01,436)

regards,
Ranier Vilela

Attachments:

speeding-up-bitmapset-v8.patchapplication/octet-stream; name=speeding-up-bitmapset-v8.patchDownload

diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 7ba3cf635b..a41347b7c1 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -5,8 +5,16 @@
  *
  * A bitmap set can represent any set of nonnegative integers, although
  * it is mainly intended for sets where the maximum value is not large,
- * say at most a few hundred.  By convention, we always represent the
- * empty set by a NULL pointer.
+ * say at most a few hundred.  By convention, we always represent a set with
+ * the minimum possible number of words, i.e, there are never any trailing
+ * zero words.  Enforcing this requires that an empty set is represented as
+ * NULL.  Because an empty Bitmapset is represented as NULL, a non-NULL
+ * Bitmapset always has at least 1 Bitmapword.  We can exploit this fact to
+ * speedup various loops over the Bitmapset's words array by using "do while"
+ * loops instead of "for" loops.  This means the code does not waste time
+ * checking the loop condition before the first iteration.  For Bitmapsets
+ * containing only a single word (likely the majority of them) this reduces
+ * the loop condition tests by half.
  *
  *
  * Copyright (c) 2003-2023, PostgreSQL Global Development Group
@@ -64,8 +72,6 @@
 #error "invalid BITS_PER_BITMAPWORD"
 #endif
 
-static bool bms_is_empty_internal(const Bitmapset *a);
-
 
 /*
  * bms_copy - make a palloc'd copy of a bitmapset
@@ -85,18 +91,12 @@ bms_copy(const Bitmapset *a)
 }
 
 /*
- * bms_equal - are two bitmapsets equal?
- *
- * This is logical not physical equality; in particular, a NULL pointer will
- * be reported as equal to a palloc'd value containing no members.
+ * bms_equal - are two bitmapsets equal? or both NULL?
  */
 bool
 bms_equal(const Bitmapset *a, const Bitmapset *b)
 {
-	const Bitmapset *shorter;
-	const Bitmapset *longer;
-	int			shortlen;
-	int			longlen;
+	int 		nwords;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -108,30 +108,20 @@ bms_equal(const Bitmapset *a, const Bitmapset *b)
 	}
 	else if (b == NULL)
 		return false;
-	/* Identify shorter and longer input */
-	if (a->nwords <= b->nwords)
-	{
-		shorter = a;
-		longer = b;
-	}
-	else
-	{
-		shorter = b;
-		longer = a;
-	}
-	/* And process */
-	shortlen = shorter->nwords;
-	for (i = 0; i < shortlen; i++)
-	{
-		if (shorter->words[i] != longer->words[i])
-			return false;
-	}
-	longlen = longer->nwords;
-	for (; i < longlen; i++)
+
+	/* can't be equal if the word counts don't match */
+	if (a->nwords != b->nwords)
+		return false;
+
+	/* check each word matches */
+	nwords = a->nwords;
+	i = 0;
+	do
 	{
-		if (longer->words[i] != 0)
+		if (a->words[i] != b->words[i])
 			return false;
-	}
+	} while (++i < nwords);
+
 	return true;
 }
 
@@ -146,7 +136,6 @@ bms_equal(const Bitmapset *a, const Bitmapset *b)
 int
 bms_compare(const Bitmapset *a, const Bitmapset *b)
 {
-	int			shortlen;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -154,28 +143,20 @@ bms_compare(const Bitmapset *a, const Bitmapset *b)
 		return (b == NULL) ? 0 : -1;
 	else if (b == NULL)
 		return +1;
-	/* Handle cases where one input is longer than the other */
-	shortlen = Min(a->nwords, b->nwords);
-	for (i = shortlen; i < a->nwords; i++)
-	{
-		if (a->words[i] != 0)
-			return +1;
-	}
-	for (i = shortlen; i < b->nwords; i++)
-	{
-		if (b->words[i] != 0)
-			return -1;
-	}
-	/* Process words in common */
-	i = shortlen;
-	while (--i >= 0)
+
+	/* the set with the most words must be greater */
+	if (a->nwords != b->nwords)
+		return (a->nwords > b->nwords) ? +1 : -1;
+
+	i = a->nwords - 1;
+	do
 	{
 		bitmapword	aw = a->words[i];
 		bitmapword	bw = b->words[i];
 
 		if (aw != bw)
 			return (aw > bw) ? +1 : -1;
-	}
+	} while (--i >= 0);
 	return 0;
 }
 
@@ -248,8 +229,11 @@ bms_union(const Bitmapset *a, const Bitmapset *b)
 	}
 	/* And union the shorter input into the result */
 	otherlen = other->nwords;
-	for (i = 0; i < otherlen; i++)
+	i = 0;
+	do
+	{
 		result->words[i] |= other->words[i];
+	} while (++i < otherlen);
 	return result;
 }
 
@@ -261,6 +245,7 @@ bms_intersect(const Bitmapset *a, const Bitmapset *b)
 {
 	Bitmapset  *result;
 	const Bitmapset *other;
+	int			lastnonzero;
 	int			resultlen;
 	int			i;
 
@@ -280,14 +265,24 @@ bms_intersect(const Bitmapset *a, const Bitmapset *b)
 	}
 	/* And intersect the longer input with the result */
 	resultlen = result->nwords;
-	for (i = 0; i < resultlen; i++)
+	lastnonzero = -1;
+	i = 0;
+	do
+	{
 		result->words[i] &= other->words[i];
+
+		if (result->words[i])
+			lastnonzero = i;
+	} while (++i < resultlen);
 	/* If we computed an empty result, we must return NULL */
-	if (bms_is_empty_internal(result))
+	if (lastnonzero == -1)
 	{
 		pfree(result);
 		return NULL;
 	}
+
+	/* get rid of trailing zero words */
+	result->nwords = lastnonzero + 1;
 	return result;
 }
 
@@ -298,7 +293,7 @@ Bitmapset *
 bms_difference(const Bitmapset *a, const Bitmapset *b)
 {
 	Bitmapset  *result;
-	int			shortlen;
+	int 		nwords;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -317,10 +312,42 @@ bms_difference(const Bitmapset *a, const Bitmapset *b)
 
 	/* Copy the left input */
 	result = bms_copy(a);
+
 	/* And remove b's bits from result */
-	shortlen = Min(a->nwords, b->nwords);
-	for (i = 0; i < shortlen; i++)
-		result->words[i] &= ~b->words[i];
+	if (result->nwords > b->nwords)
+	{
+		/*
+		 * We'll never need to remove trailing zero words when 'a' has more
+		 * words than 'b' as the additional words must be non-zero.
+		 */
+		nwords = b->nwords;
+		i = 0;
+		do
+		{
+			result->words[i] &= ~b->words[i];
+		} while (++i < nwords);
+	}
+	else
+	{
+		int			lastnonzero = -1;
+
+		/* we may need to remove trailing zero words from the result. */
+		nwords = result->nwords;
+		i = 0;
+		do
+		{
+			result->words[i] &= ~b->words[i];
+
+			/* remember the last non-zero word */
+			if (result->words[i])
+				lastnonzero = i;
+		} while (++i < nwords);
+
+		/* trim off trailing zero words */
+		result->nwords = lastnonzero + 1;
+	}
+	Assert(result->nwords != 0);
+
 	/* Need not check for empty result, since we handled that case above */
 	return result;
 }
@@ -331,8 +358,7 @@ bms_difference(const Bitmapset *a, const Bitmapset *b)
 bool
 bms_is_subset(const Bitmapset *a, const Bitmapset *b)
 {
-	int			shortlen;
-	int			longlen;
+	int			nwords;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -340,23 +366,20 @@ bms_is_subset(const Bitmapset *a, const Bitmapset *b)
 		return true;			/* empty set is a subset of anything */
 	if (b == NULL)
 		return false;
-	/* Check common words */
-	shortlen = Min(a->nwords, b->nwords);
-	for (i = 0; i < shortlen; i++)
+
+	/* 'a' can't be a subset of 'b' if it contains more words */
+	if (a->nwords > b->nwords)
+		return false;
+
+	/* Check all 'a' members are set in 'b' */
+	nwords = a->nwords;
+	i = 0;
+	do
 	{
 		if ((a->words[i] & ~b->words[i]) != 0)
 			return false;
-	}
-	/* Check extra words */
-	if (a->nwords > b->nwords)
-	{
-		longlen = a->nwords;
-		for (; i < longlen; i++)
-		{
-			if (a->words[i] != 0)
-				return false;
-		}
-	}
+	} while (++i < nwords);
+
 	return true;
 }
 
@@ -370,7 +393,6 @@ bms_subset_compare(const Bitmapset *a, const Bitmapset *b)
 {
 	BMS_Comparison result;
 	int			shortlen;
-	int			longlen;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -385,7 +407,8 @@ bms_subset_compare(const Bitmapset *a, const Bitmapset *b)
 	/* Check common words */
 	result = BMS_EQUAL;			/* status so far */
 	shortlen = Min(a->nwords, b->nwords);
-	for (i = 0; i < shortlen; i++)
+	i = 0;
+	do
 	{
 		bitmapword	aword = a->words[i];
 		bitmapword	bword = b->words[i];
@@ -404,35 +427,21 @@ bms_subset_compare(const Bitmapset *a, const Bitmapset *b)
 				return BMS_DIFFERENT;
 			result = BMS_SUBSET1;
 		}
-	}
+	} while (++i < shortlen);
 	/* Check extra words */
 	if (a->nwords > b->nwords)
 	{
-		longlen = a->nwords;
-		for (; i < longlen; i++)
-		{
-			if (a->words[i] != 0)
-			{
-				/* a is not a subset of b */
-				if (result == BMS_SUBSET1)
-					return BMS_DIFFERENT;
-				result = BMS_SUBSET2;
-			}
-		}
+		/* if a has more words then a is not a subset of b */
+		if (result == BMS_SUBSET1)
+			return BMS_DIFFERENT;
+		return BMS_SUBSET2;
 	}
 	else if (a->nwords < b->nwords)
 	{
-		longlen = b->nwords;
-		for (; i < longlen; i++)
-		{
-			if (b->words[i] != 0)
-			{
-				/* b is not a subset of a */
-				if (result == BMS_SUBSET2)
-					return BMS_DIFFERENT;
-				result = BMS_SUBSET1;
-			}
-		}
+		/* if b has more words then b is not a subset of a */
+		if (result == BMS_SUBSET2)
+			return BMS_DIFFERENT;
+		return BMS_SUBSET1;
 	}
 	return result;
 }
@@ -467,7 +476,7 @@ bms_is_member(int x, const Bitmapset *a)
  * Returns (-1) when x is not a member.
  */
 int
-bms_member_index(Bitmapset *a, int x)
+bms_member_index(const Bitmapset *a, int x)
 {
 	int			i;
 	int			bitnum;
@@ -518,11 +527,12 @@ bms_overlap(const Bitmapset *a, const Bitmapset *b)
 		return false;
 	/* Check words in common */
 	shortlen = Min(a->nwords, b->nwords);
-	for (i = 0; i < shortlen; i++)
+	i = 0;
+	do
 	{
 		if ((a->words[i] & b->words[i]) != 0)
 			return true;
-	}
+	} while (++i < shortlen);
 	return false;
 }
 
@@ -563,7 +573,7 @@ bms_overlap_list(const Bitmapset *a, const List *b)
 bool
 bms_nonempty_difference(const Bitmapset *a, const Bitmapset *b)
 {
-	int			shortlen;
+	int			nwords;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -571,19 +581,17 @@ bms_nonempty_difference(const Bitmapset *a, const Bitmapset *b)
 		return false;
 	if (b == NULL)
 		return true;
-	/* Check words in common */
-	shortlen = Min(a->nwords, b->nwords);
-	for (i = 0; i < shortlen; i++)
+	/* if 'a' has more words then it must contain additional members */
+	if (a->nwords > b->nwords)
+		return true;
+	/* Check all 'a' members are set in 'b' */
+	nwords = a->nwords;
+	i = 0;
+	do
 	{
 		if ((a->words[i] & ~b->words[i]) != 0)
 			return true;
-	}
-	/* Check extra words in a */
-	for (; i < a->nwords; i++)
-	{
-		if (a->words[i] != 0)
-			return true;
-	}
+	} while (++i < nwords);
 	return false;
 }
 
@@ -602,7 +610,8 @@ bms_singleton_member(const Bitmapset *a)
 	if (a == NULL)
 		elog(ERROR, "bitmapset is empty");
 	nwords = a->nwords;
-	for (wordnum = 0; wordnum < nwords; wordnum++)
+	wordnum = 0;
+	do
 	{
 		bitmapword	w = a->words[wordnum];
 
@@ -613,9 +622,10 @@ bms_singleton_member(const Bitmapset *a)
 			result = wordnum * BITS_PER_BITMAPWORD;
 			result += bmw_rightmost_one_pos(w);
 		}
-	}
-	if (result < 0)
-		elog(ERROR, "bitmapset is empty");
+	} while (++wordnum < nwords);
+
+	/* we don't expect non-NULL sets to be empty */
+	Assert(result >= 0);
 	return result;
 }
 
@@ -640,7 +650,8 @@ bms_get_singleton_member(const Bitmapset *a, int *member)
 	if (a == NULL)
 		return false;
 	nwords = a->nwords;
-	for (wordnum = 0; wordnum < nwords; wordnum++)
+	wordnum = 0;
+	do
 	{
 		bitmapword	w = a->words[wordnum];
 
@@ -651,9 +662,10 @@ bms_get_singleton_member(const Bitmapset *a, int *member)
 			result = wordnum * BITS_PER_BITMAPWORD;
 			result += bmw_rightmost_one_pos(w);
 		}
-	}
-	if (result < 0)
-		return false;
+	} while (++wordnum < nwords);
+
+	/* we don't expect non-NULL sets to be empty */
+	Assert(result >= 0);
 	*member = result;
 	return true;
 }
@@ -671,14 +683,15 @@ bms_num_members(const Bitmapset *a)
 	if (a == NULL)
 		return 0;
 	nwords = a->nwords;
-	for (wordnum = 0; wordnum < nwords; wordnum++)
+	wordnum = 0;
+	do
 	{
 		bitmapword	w = a->words[wordnum];
 
 		/* No need to count the bits in a zero word */
 		if (w != 0)
 			result += bmw_popcount(w);
-	}
+	} while (++wordnum < nwords);
 	return result;
 }
 
@@ -697,7 +710,8 @@ bms_membership(const Bitmapset *a)
 	if (a == NULL)
 		return BMS_EMPTY_SET;
 	nwords = a->nwords;
-	for (wordnum = 0; wordnum < nwords; wordnum++)
+	wordnum = 0;
+	do
 	{
 		bitmapword	w = a->words[wordnum];
 
@@ -707,34 +721,10 @@ bms_membership(const Bitmapset *a)
 				return BMS_MULTIPLE;
 			result = BMS_SINGLETON;
 		}
-	}
+	} while (++wordnum < nwords);
 	return result;
 }
 
-/*
- * bms_is_empty_internal - is a set empty?
- *
- * This is now used only locally, to detect cases where a function has
- * computed an empty set that we must now get rid of.  Hence, we can
- * assume the input isn't NULL.
- */
-static bool
-bms_is_empty_internal(const Bitmapset *a)
-{
-	int			nwords;
-	int			wordnum;
-
-	nwords = a->nwords;
-	for (wordnum = 0; wordnum < nwords; wordnum++)
-	{
-		bitmapword	w = a->words[wordnum];
-
-		if (w != 0)
-			return false;
-	}
-	return true;
-}
-
 
 /*
  * These operations all "recycle" their non-const inputs, ie, either
@@ -767,14 +757,11 @@ bms_add_member(Bitmapset *a, int x)
 	/* enlarge the set if necessary */
 	if (wordnum >= a->nwords)
 	{
-		int			oldnwords = a->nwords;
-		int			i;
+		size_t	oldsize;
 
-		a = (Bitmapset *) repalloc(a, BITMAPSET_SIZE(wordnum + 1));
+		oldsize = BITMAPSET_SIZE(a->nwords);
+		a = (Bitmapset *) repalloc0(a, oldsize, BITMAPSET_SIZE(wordnum + 1));
 		a->nwords = wordnum + 1;
-		/* zero out the enlarged portion */
-		for (i = oldnwords; i < a->nwords; i++)
-			a->words[i] = 0;
 	}
 
 	a->words[wordnum] |= ((bitmapword) 1 << bitnum);
@@ -792,7 +779,8 @@ Bitmapset *
 bms_del_member(Bitmapset *a, int x)
 {
 	int			wordnum,
-				bitnum;
+				bitnum,
+				i;
 
 	if (x < 0)
 		elog(ERROR, "negative bitmapset member not allowed");
@@ -800,14 +788,32 @@ bms_del_member(Bitmapset *a, int x)
 		return NULL;
 	wordnum = WORDNUM(x);
 	bitnum = BITNUM(x);
-	if (wordnum < a->nwords)
-		a->words[wordnum] &= ~((bitmapword) 1 << bitnum);
-	/* If we computed an empty result, we must return NULL */
-	if (bms_is_empty_internal(a))
+
+	/* member can't exist.  Return 'a' unmodified */
+	if (unlikely(wordnum >= a->nwords))
+		return a;
+
+	a->words[wordnum] &= ~((bitmapword) 1 << bitnum);
+
+	/* when last word becomes empty, trim off all trailing empty words */
+	i = a->nwords - 1;
+	if (wordnum == i && a->words[i] == 0)
 	{
+		/* find the last non-empty word and make that the new final word */
+		while (--i >= 0)
+		{
+			if (a->words[i])
+			{
+				a->nwords = i + 1;
+				return a;
+			}
+		}
+
+		/* the set is now empty */
 		pfree(a);
 		return NULL;
 	}
+
 	return a;
 }
 
@@ -840,8 +846,11 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
 	}
 	/* And union the shorter input into the result */
 	otherlen = other->nwords;
-	for (i = 0; i < otherlen; i++)
+	i = 0;
+	do
+	{
 		result->words[i] |= other->words[i];
+	} while (++i < otherlen);
 	if (result != a)
 		pfree(a);
 	return result;
@@ -880,15 +889,12 @@ bms_add_range(Bitmapset *a, int lower, int upper)
 	}
 	else if (uwordnum >= a->nwords)
 	{
-		int			oldnwords = a->nwords;
-		int			i;
+		size_t	oldsize;
 
 		/* ensure we have enough words to store the upper bit */
-		a = (Bitmapset *) repalloc(a, BITMAPSET_SIZE(uwordnum + 1));
+		oldsize = BITMAPSET_SIZE(a->nwords);
+		a = (Bitmapset *) repalloc0(a, oldsize, BITMAPSET_SIZE(uwordnum + 1));
 		a->nwords = uwordnum + 1;
-		/* zero out the enlarged portion */
-		for (i = oldnwords; i < a->nwords; i++)
-			a->words[i] = 0;
 	}
 
 	wordnum = lwordnum = WORDNUM(lower);
@@ -927,6 +933,7 @@ bms_add_range(Bitmapset *a, int lower, int upper)
 Bitmapset *
 bms_int_members(Bitmapset *a, const Bitmapset *b)
 {
+	int			lastnonzero;
 	int			shortlen;
 	int			i;
 
@@ -940,16 +947,25 @@ bms_int_members(Bitmapset *a, const Bitmapset *b)
 	}
 	/* Intersect b into a; we need never copy */
 	shortlen = Min(a->nwords, b->nwords);
-	for (i = 0; i < shortlen; i++)
+	lastnonzero = -1;
+	i = 0;
+	do
+	{
 		a->words[i] &= b->words[i];
-	for (; i < a->nwords; i++)
-		a->words[i] = 0;
+
+		if (a->words[i])
+			lastnonzero = i;
+	} while (++i < shortlen);
+
 	/* If we computed an empty result, we must return NULL */
-	if (bms_is_empty_internal(a))
+	if (lastnonzero == -1)
 	{
 		pfree(a);
 		return NULL;
 	}
+
+	/* get rid of trailing zero words */
+	a->nwords = lastnonzero + 1;
 	return a;
 }
 
@@ -959,7 +975,7 @@ bms_int_members(Bitmapset *a, const Bitmapset *b)
 Bitmapset *
 bms_del_members(Bitmapset *a, const Bitmapset *b)
 {
-	int			shortlen;
+	int			nwords;
 	int			i;
 
 	/* Handle cases where either input is NULL */
@@ -968,15 +984,46 @@ bms_del_members(Bitmapset *a, const Bitmapset *b)
 	if (b == NULL)
 		return a;
 	/* Remove b's bits from a; we need never copy */
-	shortlen = Min(a->nwords, b->nwords);
-	for (i = 0; i < shortlen; i++)
-		a->words[i] &= ~b->words[i];
-	/* If we computed an empty result, we must return NULL */
-	if (bms_is_empty_internal(a))
+	if (a->nwords > b->nwords)
 	{
-		pfree(a);
-		return NULL;
+		/*
+		 * We'll never need to remove trailing zero words when 'a' has more
+		 * words than 'b'.
+		 */
+		nwords = b->nwords;
+		i = 0;
+		do
+		{
+			a->words[i] &= ~b->words[i];
+		} while (++i < nwords);
 	}
+	else
+	{
+		int			lastnonzero = -1;
+
+		/* we may need to remove trailing zero words from the result. */
+		nwords = a->nwords;
+		i = 0;
+		do
+		{
+			a->words[i] &= ~b->words[i];
+
+			/* remember the last non-zero word */
+			if (a->words[i])
+				lastnonzero = i;
+		} while (++i < nwords);
+
+		/* check if 'a' has become empty */
+		if (lastnonzero == -1)
+		{
+			pfree(a);
+			return NULL;
+		}
+
+		/* trim off any trailing zero words */
+		a->nwords = lastnonzero + 1;
+	}
+
 	return a;
 }
 
@@ -1009,8 +1056,11 @@ bms_join(Bitmapset *a, Bitmapset *b)
 	}
 	/* And union the shorter input into the result */
 	otherlen = other->nwords;
-	for (i = 0; i < otherlen; i++)
+	i = 0;
+	do
+	{
 		result->words[i] |= other->words[i];
+	} while (++i < otherlen);
 	if (other != result)		/* pure paranoia */
 		pfree(other);
 	return result;
@@ -1140,28 +1190,14 @@ bms_prev_member(const Bitmapset *a, int prevbit)
 
 /*
  * bms_hash_value - compute a hash key for a Bitmapset
- *
- * Note: we must ensure that any two bitmapsets that are bms_equal() will
- * hash to the same value; in practice this means that trailing all-zero
- * words must not affect the result.  Hence we strip those before applying
- * hash_any().
  */
 uint32
 bms_hash_value(const Bitmapset *a)
 {
-	int			lastword;
-
 	if (a == NULL)
 		return 0;				/* All empty sets hash to 0 */
-	for (lastword = a->nwords; --lastword >= 0;)
-	{
-		if (a->words[lastword] != 0)
-			break;
-	}
-	if (lastword < 0)
-		return 0;				/* All empty sets hash to 0 */
 	return DatumGetUInt32(hash_any((const unsigned char *) a->words,
-								   (lastword + 1) * sizeof(bitmapword)));
+								   a->nwords * sizeof(bitmapword)));
 }
 
 /*
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 14de6a9ff1..a4c2ba6712 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -90,7 +90,7 @@ extern Bitmapset *bms_difference(const Bitmapset *a, const Bitmapset *b);
 extern bool bms_is_subset(const Bitmapset *a, const Bitmapset *b);
 extern BMS_Comparison bms_subset_compare(const Bitmapset *a, const Bitmapset *b);
 extern bool bms_is_member(int x, const Bitmapset *a);
-extern int	bms_member_index(Bitmapset *a, int x);
+extern int	bms_member_index(const Bitmapset *a, int x);
 extern bool bms_overlap(const Bitmapset *a, const Bitmapset *b);
 extern bool bms_overlap_list(const Bitmapset *a, const struct List *b);
 extern bool bms_nonempty_difference(const Bitmapset *a, const Bitmapset *b);

David Rowley

dgrowleyml@gmail.com

over 2 years ago

In reply to: Ranier Vilela (#3)

Re: Speeding Up Bitmapset

On Mon, 26 Jun 2023 at 12:55, Ranier Vilela <ranier.vf@gmail.com> wrote:

Have you seen bms_compare?
For some reason someone thought it would be better to loop backward the array.

That's nothing to do with efficiency. It's related to behaviour. Have
a look at the function's header comment, it's trying to find the set
with the highest value member. It wouldn't make much sense to start
at the lowest-order words for that.

David

David Rowley

dgrowleyml@gmail.com

over 2 years ago

In reply to: Ranier Vilela (#3)

Re: Speeding Up Bitmapset

On Mon, 26 Jun 2023 at 12:55, Ranier Vilela <ranier.vf@gmail.com> wrote:

Em dom., 25 de jun. de 2023 às 17:49, David Rowley <dgrowleyml@gmail.com> escreveu:

There's no reason in the world
that those will speed up Bitmapsets, so why include them?

Of course optimization is the most important thing,
but since you're going to touch the source, why not make it more readable.

Have a look at [1]https://wiki.postgresql.org/wiki/Submitting_a_Patch. In particular "Reasons your patch might be
returned" and precisely "Reformatting lines that haven't changed"

That entire page is quite valuable and I'd recommend having a read of all of it.

David

[1]: https://wiki.postgresql.org/wiki/Submitting_a_Patch