Revise the Asserts added to bimapset manipulation functions

Started by Richard Guoabout 2 years ago11 messages
#1Richard Guo
guofenglinux@gmail.com
1 attachment(s)

The Asserts added to bitmapset.c by commits 71a3e8c43b and 7d58f2342b
contain some duplicates, such as in bms_difference, bms_is_subset,
bms_subset_compare, bms_int_members and bms_join. For instance,

@@ -953,6 +1033,15 @@ bms_int_members(Bitmapset *a, const Bitmapset *b)
        int                     lastnonzero;
        int                     shortlen;
        int                     i;
+#ifdef REALLOCATE_BITMAPSETS
+       Bitmapset  *tmp = a;
+#endif
+
+       Assert(a == NULL || IsA(a, Bitmapset));
+       Assert(b == NULL || IsA(b, Bitmapset));
+
+       Assert(a == NULL || IsA(a, Bitmapset));
+       Assert(b == NULL || IsA(b, Bitmapset));

Sorry that I failed to notice those duplicates when reviewing the
patchset, mainly because they were introduced in different patches.

While removing those duplicates, I think we can add checks in the new
Asserts to ensure that Bitmapsets should not contain trailing zero
words, as the old Asserts did. That makes the Asserts in the form of:

Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));

I think we can define a new macro for this form and use it to check that
a Bitmapset is valid.

In passing, I prefer to move the Asserts to the beginning of functions,
just for paranoia's sake.

Hence, proposed patch attached.

Thanks
Richard

Attachments:

v1-0001-Revise-the-Asserts-added-to-bimapset-manipulation-functions.patchapplication/octet-stream; name=v1-0001-Revise-the-Asserts-added-to-bimapset-manipulation-functions.patchDownload
From 7c1832135698abcc0218f6a86e0c1760d4496b8a Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 27 Dec 2023 15:37:41 +0800
Subject: [PATCH v1] Revise the Asserts added to bimapset manipulation
 functions

The Asserts added to bitmapset.c by commits 71a3e8c43b and 7d58f2342b
contain some duplicates, such as in bms_difference, bms_is_subset,
bms_subset_compare, bms_int_members and bms_join.  Sorry that I failed
to notice those duplicates when reviewing the patchset, mainly because
they were introduced in different patches.

While removing those duplicates, this patch adds checks in the new
Asserts to ensure that Bitmapsets should not contain trailing zero
words, as the old Asserts did.  This patch also defines a macro
BITMAPSET_IS_VALID to help check that a Bitmapset is valid.

In passing, this patch moves the Asserts to the beginning of functions,
just for paranoia's sake.
---
 src/backend/nodes/bitmapset.c | 108 ++++++++++++++++------------------
 1 file changed, 51 insertions(+), 57 deletions(-)

diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index e13ecaa155..13ad47ca89 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -38,6 +38,9 @@
 #define BITMAPSET_SIZE(nwords)	\
 	(offsetof(Bitmapset, words) + (nwords) * sizeof(bitmapword))
 
+#define BITMAPSET_IS_VALID(bms)	\
+	((bms) == NULL || (IsA((bms), Bitmapset) && (bms)->words[(bms)->nwords - 1] != 0))
+
 /*----------
  * This is a well-known cute trick for isolating the rightmost one-bit
  * in a word.  It assumes two's complement arithmetic.  Consider any
@@ -82,9 +85,10 @@ bms_copy(const Bitmapset *a)
 	Bitmapset  *result;
 	size_t		size;
 
+	Assert(BITMAPSET_IS_VALID(a));
+
 	if (a == NULL)
 		return NULL;
-	Assert(IsA(a, Bitmapset));
 	size = BITMAPSET_SIZE(a->nwords);
 	result = (Bitmapset *) palloc(size);
 	memcpy(result, a, size);
@@ -99,8 +103,8 @@ bms_equal(const Bitmapset *a, const Bitmapset *b)
 {
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(BITMAPSET_IS_VALID(a));
+	Assert(BITMAPSET_IS_VALID(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -140,8 +144,8 @@ bms_compare(const Bitmapset *a, const Bitmapset *b)
 {
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(BITMAPSET_IS_VALID(a));
+	Assert(BITMAPSET_IS_VALID(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -216,8 +220,8 @@ bms_union(const Bitmapset *a, const Bitmapset *b)
 	int			otherlen;
 	int			i;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(BITMAPSET_IS_VALID(a));
+	Assert(BITMAPSET_IS_VALID(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -257,8 +261,8 @@ bms_intersect(const Bitmapset *a, const Bitmapset *b)
 	int			resultlen;
 	int			i;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(BITMAPSET_IS_VALID(a));
+	Assert(BITMAPSET_IS_VALID(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL || b == NULL)
@@ -307,8 +311,8 @@ bms_difference(const Bitmapset *a, const Bitmapset *b)
 	Bitmapset  *result;
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(BITMAPSET_IS_VALID(a));
+	Assert(BITMAPSET_IS_VALID(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -316,8 +320,6 @@ bms_difference(const Bitmapset *a, const Bitmapset *b)
 	if (b == NULL)
 		return bms_copy(a);
 
-	Assert(IsA(a, Bitmapset) && IsA(b, Bitmapset));
-
 	/*
 	 * In Postgres' usage, an empty result is a very common case, so it's
 	 * worth optimizing for that by testing bms_nonempty_difference().  This
@@ -374,8 +376,8 @@ bms_is_subset(const Bitmapset *a, const Bitmapset *b)
 {
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(BITMAPSET_IS_VALID(a));
+	Assert(BITMAPSET_IS_VALID(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -383,8 +385,6 @@ bms_is_subset(const Bitmapset *a, const Bitmapset *b)
 	if (b == NULL)
 		return false;
 
-	Assert(IsA(a, Bitmapset) && IsA(b, Bitmapset));
-
 	/* 'a' can't be a subset of 'b' if it contains more words */
 	if (a->nwords > b->nwords)
 		return false;
@@ -411,8 +411,8 @@ bms_subset_compare(const Bitmapset *a, const Bitmapset *b)
 	int			shortlen;
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(BITMAPSET_IS_VALID(a));
+	Assert(BITMAPSET_IS_VALID(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -424,8 +424,6 @@ bms_subset_compare(const Bitmapset *a, const Bitmapset *b)
 	if (b == NULL)
 		return BMS_SUBSET2;
 
-	Assert(IsA(a, Bitmapset) && IsA(b, Bitmapset));
-
 	/* Check common words */
 	result = BMS_EQUAL;			/* status so far */
 	shortlen = Min(a->nwords, b->nwords);
@@ -477,14 +475,13 @@ bms_is_member(int x, const Bitmapset *a)
 	int			wordnum,
 				bitnum;
 
+	Assert(BITMAPSET_IS_VALID(a));
+
 	/* XXX better to just return false for x<0 ? */
 	if (x < 0)
 		elog(ERROR, "negative bitmapset member not allowed");
 	if (a == NULL)
 		return false;
-
-	Assert(IsA(a, Bitmapset));
-
 	wordnum = WORDNUM(x);
 	bitnum = BITNUM(x);
 	if (wordnum >= a->nwords)
@@ -509,12 +506,12 @@ bms_member_index(Bitmapset *a, int x)
 	int			result = 0;
 	bitmapword	mask;
 
+	Assert(BITMAPSET_IS_VALID(a));
+
 	/* return -1 if not a member of the bitmap */
 	if (!bms_is_member(x, a))
 		return -1;
 
-	Assert(IsA(a, Bitmapset));
-
 	wordnum = WORDNUM(x);
 	bitnum = BITNUM(x);
 
@@ -549,8 +546,8 @@ bms_overlap(const Bitmapset *a, const Bitmapset *b)
 	int			shortlen;
 	int			i;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(BITMAPSET_IS_VALID(a));
+	Assert(BITMAPSET_IS_VALID(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL || b == NULL)
@@ -576,7 +573,7 @@ bms_overlap_list(const Bitmapset *a, const List *b)
 	int			wordnum,
 				bitnum;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
+	Assert(BITMAPSET_IS_VALID(a));
 
 	if (a == NULL || b == NIL)
 		return false;
@@ -607,8 +604,8 @@ bms_nonempty_difference(const Bitmapset *a, const Bitmapset *b)
 {
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(BITMAPSET_IS_VALID(a));
+	Assert(BITMAPSET_IS_VALID(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -640,11 +637,10 @@ bms_singleton_member(const Bitmapset *a)
 	int			nwords;
 	int			wordnum;
 
+	Assert(BITMAPSET_IS_VALID(a));
+
 	if (a == NULL)
 		elog(ERROR, "bitmapset is empty");
-
-	Assert(IsA(a, Bitmapset));
-
 	nwords = a->nwords;
 	wordnum = 0;
 	do
@@ -683,9 +679,10 @@ bms_get_singleton_member(const Bitmapset *a, int *member)
 	int			nwords;
 	int			wordnum;
 
+	Assert(BITMAPSET_IS_VALID(a));
+
 	if (a == NULL)
 		return false;
-	Assert(IsA(a, Bitmapset));
 	nwords = a->nwords;
 	wordnum = 0;
 	do
@@ -717,9 +714,10 @@ bms_num_members(const Bitmapset *a)
 	int			nwords;
 	int			wordnum;
 
+	Assert(BITMAPSET_IS_VALID(a));
+
 	if (a == NULL)
 		return 0;
-	Assert(IsA(a, Bitmapset));
 	nwords = a->nwords;
 	wordnum = 0;
 	do
@@ -745,9 +743,10 @@ bms_membership(const Bitmapset *a)
 	int			nwords;
 	int			wordnum;
 
+	Assert(BITMAPSET_IS_VALID(a));
+
 	if (a == NULL)
 		return BMS_EMPTY_SET;
-	Assert(IsA(a, Bitmapset));
 	nwords = a->nwords;
 	wordnum = 0;
 	do
@@ -786,11 +785,12 @@ bms_add_member(Bitmapset *a, int x)
 	int			wordnum,
 				bitnum;
 
+	Assert(BITMAPSET_IS_VALID(a));
+
 	if (x < 0)
 		elog(ERROR, "negative bitmapset member not allowed");
 	if (a == NULL)
 		return bms_make_singleton(x);
-	Assert(IsA(a, Bitmapset));
 	wordnum = WORDNUM(x);
 	bitnum = BITNUM(x);
 
@@ -847,13 +847,13 @@ bms_del_member(Bitmapset *a, int x)
 	Bitmapset  *tmp = a;
 #endif
 
+	Assert(BITMAPSET_IS_VALID(a));
+
 	if (x < 0)
 		elog(ERROR, "negative bitmapset member not allowed");
 	if (a == NULL)
 		return NULL;
 
-	Assert(IsA(a, Bitmapset));
-
 	wordnum = WORDNUM(x);
 	bitnum = BITNUM(x);
 
@@ -900,8 +900,8 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
 	int			otherlen;
 	int			i;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(BITMAPSET_IS_VALID(a));
+	Assert(BITMAPSET_IS_VALID(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -955,7 +955,7 @@ bms_add_range(Bitmapset *a, int lower, int upper)
 				ushiftbits,
 				wordnum;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
+	Assert(BITMAPSET_IS_VALID(a));
 
 	/* do nothing if nothing is called for, without further checking */
 	if (upper < lower)
@@ -1037,11 +1037,8 @@ bms_int_members(Bitmapset *a, const Bitmapset *b)
 	Bitmapset  *tmp = a;
 #endif
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
-
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(BITMAPSET_IS_VALID(a));
+	Assert(BITMAPSET_IS_VALID(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -1093,8 +1090,8 @@ bms_del_members(Bitmapset *a, const Bitmapset *b)
 	Bitmapset  *tmp = a;
 #endif
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(BITMAPSET_IS_VALID(a));
+	Assert(BITMAPSET_IS_VALID(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -1164,11 +1161,8 @@ bms_join(Bitmapset *a, Bitmapset *b)
 	Bitmapset  *tmp = a;
 #endif
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
-
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(BITMAPSET_IS_VALID(a));
+	Assert(BITMAPSET_IS_VALID(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -1231,7 +1225,7 @@ bms_next_member(const Bitmapset *a, int prevbit)
 	int			wordnum;
 	bitmapword	mask;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
+	Assert(BITMAPSET_IS_VALID(a));
 
 	if (a == NULL)
 		return -2;
@@ -1292,7 +1286,7 @@ bms_prev_member(const Bitmapset *a, int prevbit)
 	int			ushiftbits;
 	bitmapword	mask;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
+	Assert(BITMAPSET_IS_VALID(a));
 
 	/*
 	 * If set is NULL or if there are no more bits to the right then we've
-- 
2.31.0

#2David Rowley
dgrowleyml@gmail.com
In reply to: Richard Guo (#1)
Re: Revise the Asserts added to bimapset manipulation functions

On Wed, 27 Dec 2023 at 22:30, Richard Guo <guofenglinux@gmail.com> wrote:

The Asserts added to bitmapset.c by commits 71a3e8c43b and 7d58f2342b
contain some duplicates, such as in bms_difference, bms_is_subset,
bms_subset_compare, bms_int_members and bms_join. For instance,

I'm just learning of these changes now. I wonder why that wasn't
done more like:

#ifdef REALLOCATE_BITMAPSETS
static Bitmapset *
bms_clone_and_free(Bitmapset *a)
{
Bitmapset *c = bms_copy(a);

bms_free(a);
return c;
}
#endif

then instead of having to do:

#ifdef REALLOCATE_BITMAPSETS
a = (Bitmapset *) palloc(BITMAPSET_SIZE(tmp->nwords));
memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
pfree(tmp);
#endif

all over the place. Couldn't we do:

#ifdef REALLOCATE_BITMAPSETS
return bms_clone_and_free(a);
#else
return a;
#endif

I think it's best to leave at least that and not hide the behaviour
inside a macro.

It would also be good if REALLOCATE_BITMAPSETS was documented in
bitmapset.c to offer some guidance to people modifying the code so
they know under what circumstances they need to return a copy. There
are no comments that offer any indication of what the intentions are
with this :-( What's written in pg_config_manual.h isn't going to
help anyone that's modifying bitmapset.c

While removing those duplicates, I think we can add checks in the new
Asserts to ensure that Bitmapsets should not contain trailing zero
words, as the old Asserts did. That makes the Asserts in the form of:

Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));

I think we can define a new macro for this form and use it to check that
a Bitmapset is valid.

I think that's an improvement. I did have a function for this in [1]/messages/by-id/CAApHDvr5O41MuUjw0DQKqmAnv7QqvmLqXReEd5o4nXTzWp8-+w@mail.gmail.com,
but per [2]/messages/by-id/2686153.1677881312@sss.pgh.pa.us, Tom wasn't a fan. I likely shouldn't have bothered with
the loop there. It seems fine just to ensure the final word isn't 0.

David

[1]: /messages/by-id/CAApHDvr5O41MuUjw0DQKqmAnv7QqvmLqXReEd5o4nXTzWp8-+w@mail.gmail.com
[2]: /messages/by-id/2686153.1677881312@sss.pgh.pa.us

#3David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#2)
1 attachment(s)
Re: Revise the Asserts added to bimapset manipulation functions

On Thu, 28 Dec 2023 at 23:21, David Rowley <dgrowleyml@gmail.com> wrote:

then instead of having to do:

#ifdef REALLOCATE_BITMAPSETS
a = (Bitmapset *) palloc(BITMAPSET_SIZE(tmp->nwords));
memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
pfree(tmp);
#endif

all over the place. Couldn't we do:

#ifdef REALLOCATE_BITMAPSETS
return bms_clone_and_free(a);
#else
return a;
#endif

I looked into this a bit more and I just can't see why the current
code feels like it must do the reallocation in functions such as
bms_del_members(). We don't repalloc the set there, ever, so why do
we need to do it when building with REALLOCATE_BITMAPSETS? It seems
to me the point of REALLOCATE_BITMAPSETS is to ensure that *if* we
possibly could end up reallocating the set that we *do* reallocate.

There's also a few cases where you could argue that the
REALLOCATE_BITMAPSETS code has introduced bugs. For example,
bms_del_members(), bms_join() and bms_int_members() are meant to
guarantee that their left input (both inputs in the case of
bms_join()) are recycled. Compiling with REALLOCATE_BITMAPSETS
invalidates that, so it seems about as likely that building with
REALLOCATE_BITMAPSETS could cause bugs rather than find bugs.

I've hacked a bit on this to fix these problems and also added some
documentation to try to offer future people modifying bitmapset.c some
guidance on if they need to care about REALLOCATE_BITMAPSETS.

I also don't think "hangling" is a word. So I've adjusted the comment
in pg_config_manual.h to fix that.

David

Attachments:

bitmapset_fixes_v2.patchtext/plain; charset=US-ASCII; name=bitmapset_fixes_v2.patchDownload
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index e13ecaa155..e0eeefa8bd 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -16,6 +16,15 @@
  * containing only a single word (likely the majority of them) this halves the
  * number of loop condition checks.
  *
+ * Because manipulation of a bitmapset can often be done without having to
+ * reallocate the memory for the set, we support REALLOCATE_BITMAPSETS as a
+ * debug option to have the code *always* reallocate the bitmapset when the
+ * set *could* be reallocated as a result of the modification.  This increases
+ * the likelihood of finding bugs in cases where there's more than one
+ * reference to a set and the additional ones are left pointing to freed
+ * memory.  Such bugs are unlikely to be otherwise found by our regression
+ * tests as the vast majority of bitmapsets will only ever contain a single
+ * bitmapword.
  *
  * Copyright (c) 2003-2023, PostgreSQL Global Development Group
  *
@@ -72,6 +81,45 @@
 #error "invalid BITS_PER_BITMAPWORD"
 #endif
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * bms_is_valid_set - for cassert builds to check for valid sets
+ */
+static bool
+bms_is_valid_set(const Bitmapset *a)
+{
+	/* NULL is the correct representation of an empty set */
+	if (a == NULL)
+		return true;
+
+	/* check the node tag is set correctly.  Freed pointer, maybe? */
+	if (!IsA(a, Bitmapset))
+		return false;
+
+	/* trailing zero words are not allowed */
+	if (a->words[a->nwords - 1] == 0)
+		return false;
+
+	return true;
+}
+#endif
+
+#ifdef REALLOCATE_BITMAPSETS
+/*
+ * bms_copy_and_free
+ *		Only required in REALLOCATE_BITMAPSETS builds.  Provide a simple way to
+ *		return a freshly allocated set similar to what would happen if the set
+ *		had to be enlarged to add additional words.
+ */
+static Bitmapset *
+bms_copy_and_free(Bitmapset *a)
+{
+	Bitmapset  *c = bms_copy(a);
+
+	bms_free(a);
+	return c;
+}
+#endif
 
 /*
  * bms_copy - make a palloc'd copy of a bitmapset
@@ -82,9 +130,11 @@ bms_copy(const Bitmapset *a)
 	Bitmapset  *result;
 	size_t		size;
 
+	Assert(bms_is_valid_set(a));
+
 	if (a == NULL)
 		return NULL;
-	Assert(IsA(a, Bitmapset));
+
 	size = BITMAPSET_SIZE(a->nwords);
 	result = (Bitmapset *) palloc(size);
 	memcpy(result, a, size);
@@ -99,8 +149,8 @@ bms_equal(const Bitmapset *a, const Bitmapset *b)
 {
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -140,8 +190,8 @@ bms_compare(const Bitmapset *a, const Bitmapset *b)
 {
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -216,8 +266,8 @@ bms_union(const Bitmapset *a, const Bitmapset *b)
 	int			otherlen;
 	int			i;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -257,8 +307,8 @@ bms_intersect(const Bitmapset *a, const Bitmapset *b)
 	int			resultlen;
 	int			i;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL || b == NULL)
@@ -307,8 +357,8 @@ bms_difference(const Bitmapset *a, const Bitmapset *b)
 	Bitmapset  *result;
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -316,8 +366,6 @@ bms_difference(const Bitmapset *a, const Bitmapset *b)
 	if (b == NULL)
 		return bms_copy(a);
 
-	Assert(IsA(a, Bitmapset) && IsA(b, Bitmapset));
-
 	/*
 	 * In Postgres' usage, an empty result is a very common case, so it's
 	 * worth optimizing for that by testing bms_nonempty_difference().  This
@@ -374,8 +422,8 @@ bms_is_subset(const Bitmapset *a, const Bitmapset *b)
 {
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -383,8 +431,6 @@ bms_is_subset(const Bitmapset *a, const Bitmapset *b)
 	if (b == NULL)
 		return false;
 
-	Assert(IsA(a, Bitmapset) && IsA(b, Bitmapset));
-
 	/* 'a' can't be a subset of 'b' if it contains more words */
 	if (a->nwords > b->nwords)
 		return false;
@@ -411,8 +457,8 @@ bms_subset_compare(const Bitmapset *a, const Bitmapset *b)
 	int			shortlen;
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -424,8 +470,6 @@ bms_subset_compare(const Bitmapset *a, const Bitmapset *b)
 	if (b == NULL)
 		return BMS_SUBSET2;
 
-	Assert(IsA(a, Bitmapset) && IsA(b, Bitmapset));
-
 	/* Check common words */
 	result = BMS_EQUAL;			/* status so far */
 	shortlen = Min(a->nwords, b->nwords);
@@ -477,14 +521,14 @@ bms_is_member(int x, const Bitmapset *a)
 	int			wordnum,
 				bitnum;
 
+	Assert(bms_is_valid_set(a));
+
 	/* XXX better to just return false for x<0 ? */
 	if (x < 0)
 		elog(ERROR, "negative bitmapset member not allowed");
 	if (a == NULL)
 		return false;
 
-	Assert(IsA(a, Bitmapset));
-
 	wordnum = WORDNUM(x);
 	bitnum = BITNUM(x);
 	if (wordnum >= a->nwords)
@@ -509,12 +553,12 @@ bms_member_index(Bitmapset *a, int x)
 	int			result = 0;
 	bitmapword	mask;
 
+	Assert(bms_is_valid_set(a));
+
 	/* return -1 if not a member of the bitmap */
 	if (!bms_is_member(x, a))
 		return -1;
 
-	Assert(IsA(a, Bitmapset));
-
 	wordnum = WORDNUM(x);
 	bitnum = BITNUM(x);
 
@@ -549,8 +593,8 @@ bms_overlap(const Bitmapset *a, const Bitmapset *b)
 	int			shortlen;
 	int			i;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL || b == NULL)
@@ -576,7 +620,7 @@ bms_overlap_list(const Bitmapset *a, const List *b)
 	int			wordnum,
 				bitnum;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
+	Assert(bms_is_valid_set(a));
 
 	if (a == NULL || b == NIL)
 		return false;
@@ -607,8 +651,8 @@ bms_nonempty_difference(const Bitmapset *a, const Bitmapset *b)
 {
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -640,11 +684,11 @@ bms_singleton_member(const Bitmapset *a)
 	int			nwords;
 	int			wordnum;
 
+	Assert(bms_is_valid_set(a));
+
 	if (a == NULL)
 		elog(ERROR, "bitmapset is empty");
 
-	Assert(IsA(a, Bitmapset));
-
 	nwords = a->nwords;
 	wordnum = 0;
 	do
@@ -683,9 +727,11 @@ bms_get_singleton_member(const Bitmapset *a, int *member)
 	int			nwords;
 	int			wordnum;
 
+	Assert(bms_is_valid_set(a));
+
 	if (a == NULL)
 		return false;
-	Assert(IsA(a, Bitmapset));
+
 	nwords = a->nwords;
 	wordnum = 0;
 	do
@@ -717,9 +763,11 @@ bms_num_members(const Bitmapset *a)
 	int			nwords;
 	int			wordnum;
 
+	Assert(bms_is_valid_set(a));
+
 	if (a == NULL)
 		return 0;
-	Assert(IsA(a, Bitmapset));
+
 	nwords = a->nwords;
 	wordnum = 0;
 	do
@@ -745,9 +793,11 @@ bms_membership(const Bitmapset *a)
 	int			nwords;
 	int			wordnum;
 
+	Assert(bms_is_valid_set(a));
+
 	if (a == NULL)
 		return BMS_EMPTY_SET;
-	Assert(IsA(a, Bitmapset));
+
 	nwords = a->nwords;
 	wordnum = 0;
 	do
@@ -786,11 +836,13 @@ bms_add_member(Bitmapset *a, int x)
 	int			wordnum,
 				bitnum;
 
+	Assert(bms_is_valid_set(a));
+
 	if (x < 0)
 		elog(ERROR, "negative bitmapset member not allowed");
 	if (a == NULL)
 		return bms_make_singleton(x);
-	Assert(IsA(a, Bitmapset));
+
 	wordnum = WORDNUM(x);
 	bitnum = BITNUM(x);
 
@@ -799,15 +851,8 @@ bms_add_member(Bitmapset *a, int x)
 	{
 		int			oldnwords = a->nwords;
 		int			i;
-#ifdef REALLOCATE_BITMAPSETS
-		Bitmapset  *tmp = a;
 
-		a = (Bitmapset *) palloc(BITMAPSET_SIZE(wordnum + 1));
-		memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
-		pfree(tmp);
-#else
 		a = (Bitmapset *) repalloc(a, BITMAPSET_SIZE(wordnum + 1));
-#endif
 		a->nwords = wordnum + 1;
 		/* zero out the enlarged portion */
 		i = oldnwords;
@@ -816,19 +861,14 @@ bms_add_member(Bitmapset *a, int x)
 			a->words[i] = 0;
 		} while (++i < a->nwords);
 	}
-#ifdef REALLOCATE_BITMAPSETS
-	else
-	{
-		Bitmapset  *tmp = a;
-
-		a = (Bitmapset *) palloc(BITMAPSET_SIZE(tmp->nwords));
-		memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
-		pfree(tmp);
-	}
-#endif
 
 	a->words[wordnum] |= ((bitmapword) 1 << bitnum);
+
+#ifdef REALLOCATE_BITMAPSETS
+	return bms_copy_and_free(a);
+#else
 	return a;
+#endif
 }
 
 /*
@@ -843,26 +883,17 @@ bms_del_member(Bitmapset *a, int x)
 {
 	int			wordnum,
 				bitnum;
-#ifdef REALLOCATE_BITMAPSETS
-	Bitmapset  *tmp = a;
-#endif
+
+	Assert(bms_is_valid_set(a));
 
 	if (x < 0)
 		elog(ERROR, "negative bitmapset member not allowed");
 	if (a == NULL)
 		return NULL;
 
-	Assert(IsA(a, Bitmapset));
-
 	wordnum = WORDNUM(x);
 	bitnum = BITNUM(x);
 
-#ifdef REALLOCATE_BITMAPSETS
-	a = (Bitmapset *) palloc(BITMAPSET_SIZE(tmp->nwords));
-	memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
-	pfree(tmp);
-#endif
-
 	/* member can't exist.  Return 'a' unmodified */
 	if (unlikely(wordnum >= a->nwords))
 		return a;
@@ -900,8 +931,8 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
 	int			otherlen;
 	int			i;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -916,13 +947,6 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
 	}
 	else
 	{
-#ifdef REALLOCATE_BITMAPSETS
-		Bitmapset  *tmp = a;
-
-		a = (Bitmapset *) palloc(BITMAPSET_SIZE(tmp->nwords));
-		memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
-		pfree(tmp);
-#endif
 		result = a;
 		other = b;
 	}
@@ -935,7 +959,12 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
 	} while (++i < otherlen);
 	if (result != a)
 		pfree(a);
+
+#ifdef REALLOCATE_BITMAPSETS
+	return bms_copy_and_free(result);
+#else
 	return result;
+#endif
 }
 
 /*
@@ -955,7 +984,7 @@ bms_add_range(Bitmapset *a, int lower, int upper)
 				ushiftbits,
 				wordnum;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
+	Assert(bms_is_valid_set(a));
 
 	/* do nothing if nothing is called for, without further checking */
 	if (upper < lower)
@@ -975,16 +1004,9 @@ bms_add_range(Bitmapset *a, int lower, int upper)
 	{
 		int			oldnwords = a->nwords;
 		int			i;
-#ifdef REALLOCATE_BITMAPSETS
-		Bitmapset  *tmp = a;
 
-		a = (Bitmapset *) palloc(BITMAPSET_SIZE(uwordnum + 1));
-		memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
-		pfree(tmp);
-#else
 		/* ensure we have enough words to store the upper bit */
 		a = (Bitmapset *) repalloc(a, BITMAPSET_SIZE(uwordnum + 1));
-#endif
 		a->nwords = uwordnum + 1;
 		/* zero out the enlarged portion */
 		i = oldnwords;
@@ -1021,7 +1043,11 @@ bms_add_range(Bitmapset *a, int lower, int upper)
 		a->words[uwordnum] |= (~(bitmapword) 0) >> ushiftbits;
 	}
 
+#ifdef REALLOCATE_BITMAPSETS
+	return bms_copy_and_free(a);
+#else
 	return a;
+#endif
 }
 
 /*
@@ -1033,15 +1059,9 @@ bms_int_members(Bitmapset *a, const Bitmapset *b)
 	int			lastnonzero;
 	int			shortlen;
 	int			i;
-#ifdef REALLOCATE_BITMAPSETS
-	Bitmapset  *tmp = a;
-#endif
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
-
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -1052,12 +1072,6 @@ bms_int_members(Bitmapset *a, const Bitmapset *b)
 		return NULL;
 	}
 
-#ifdef REALLOCATE_BITMAPSETS
-	a = (Bitmapset *) palloc(BITMAPSET_SIZE(tmp->nwords));
-	memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
-	pfree(tmp);
-#endif
-
 	/* Intersect b into a; we need never copy */
 	shortlen = Min(a->nwords, b->nwords);
 	lastnonzero = -1;
@@ -1089,12 +1103,9 @@ Bitmapset *
 bms_del_members(Bitmapset *a, const Bitmapset *b)
 {
 	int			i;
-#ifdef REALLOCATE_BITMAPSETS
-	Bitmapset  *tmp = a;
-#endif
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -1102,12 +1113,6 @@ bms_del_members(Bitmapset *a, const Bitmapset *b)
 	if (b == NULL)
 		return a;
 
-#ifdef REALLOCATE_BITMAPSETS
-	a = (Bitmapset *) palloc(BITMAPSET_SIZE(tmp->nwords));
-	memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
-	pfree(tmp);
-#endif
-
 	/* Remove b's bits from a; we need never copy */
 	if (a->nwords > b->nwords)
 	{
@@ -1160,15 +1165,9 @@ bms_join(Bitmapset *a, Bitmapset *b)
 	Bitmapset  *other;
 	int			otherlen;
 	int			i;
-#ifdef REALLOCATE_BITMAPSETS
-	Bitmapset  *tmp = a;
-#endif
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
-
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -1176,12 +1175,6 @@ bms_join(Bitmapset *a, Bitmapset *b)
 	if (b == NULL)
 		return a;
 
-#ifdef REALLOCATE_BITMAPSETS
-	a = (Bitmapset *) palloc(BITMAPSET_SIZE(tmp->nwords));
-	memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
-	pfree(tmp);
-#endif
-
 	/* Identify shorter and longer input; use longer one as result */
 	if (a->nwords < b->nwords)
 	{
@@ -1231,7 +1224,7 @@ bms_next_member(const Bitmapset *a, int prevbit)
 	int			wordnum;
 	bitmapword	mask;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
+	Assert(bms_is_valid_set(a));
 
 	if (a == NULL)
 		return -2;
@@ -1292,7 +1285,7 @@ bms_prev_member(const Bitmapset *a, int prevbit)
 	int			ushiftbits;
 	bitmapword	mask;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
+	Assert(bms_is_valid_set(a));
 
 	/*
 	 * If set is NULL or if there are no more bits to the right then we've
@@ -1337,6 +1330,8 @@ bms_prev_member(const Bitmapset *a, int prevbit)
 uint32
 bms_hash_value(const Bitmapset *a)
 {
+	Assert(bms_is_valid_set(a));
+
 	if (a == NULL)
 		return 0;				/* All empty sets hash to 0 */
 	return DatumGetUInt32(hash_any((const unsigned char *) a->words,
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 16c383ba7f..3920a535d2 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -336,8 +336,9 @@
 /* #define COPY_PARSE_PLAN_TREES */
 
 /*
- * Define this to force Bitmapset reallocation on each modification.  Helps
- * to find hangling pointers to Bitmapset's.
+ * Define this to force Bitmapset reallocation on modifications which *could*
+ * result in the set having to be reallocated.  This helps find dangling
+ * pointers to Bitmapset's.
  */
 /* #define REALLOCATE_BITMAPSETS */
 
#4Richard Guo
guofenglinux@gmail.com
In reply to: David Rowley (#3)
Re: Revise the Asserts added to bimapset manipulation functions

On Fri, Dec 29, 2023 at 9:15 AM David Rowley <dgrowleyml@gmail.com> wrote:

I looked into this a bit more and I just can't see why the current
code feels like it must do the reallocation in functions such as
bms_del_members(). We don't repalloc the set there, ever, so why do
we need to do it when building with REALLOCATE_BITMAPSETS? It seems
to me the point of REALLOCATE_BITMAPSETS is to ensure that *if* we
possibly could end up reallocating the set that we *do* reallocate.

I think the argument here is whether the REALLOCATE_BITMAPSETS option is
intended to force a reallocation for every modification of a bitmapset,
or only for modifications that could potentially require the set to be
reallocated.

IIUC, the v2 patch addresses the latter scenario. I agree that it can
help find bugs in cases where there's more than one reference to a set,
and a modification that could reallocate the bitmapset might leave the
other references being dangling pointers.

It seems to me that the former scenario also makes sense in some cases.
For instance, let's say there are two pointers in two structs, s1->p and
s2->p, both referencing the same bitmapset. If we modify the bitmapset
via s1->p (even if no reallocation could occur), s2 would see different
content via its pointer 'p'. Sometimes this is just wrong and could
cause problems. If we can force a reallocation for every modification
of the bitmapset, it'd be much easier to find such bugs.

Having said that, I think the codes checked in by 71a3e8c43b and
7d58f2342b are far from perfect. And I agree that the bms_copy_and_free
in v2 patch is a good idea, as well as the bms_is_valid_set.

Thanks
Richard

#5David Rowley
dgrowleyml@gmail.com
In reply to: Richard Guo (#4)
Re: Revise the Asserts added to bimapset manipulation functions

On Fri, 29 Dec 2023 at 21:07, Richard Guo <guofenglinux@gmail.com> wrote:

It seems to me that the former scenario also makes sense in some cases.
For instance, let's say there are two pointers in two structs, s1->p and
s2->p, both referencing the same bitmapset. If we modify the bitmapset
via s1->p (even if no reallocation could occur), s2 would see different
content via its pointer 'p'.

That statement simply isn't true. If there's no reallocation then
both pointers point to the same memory and, therefore have the same
content, not different content. In the absence of a reallocation,
then the only time s1->p and s2->p could differ is if s1->p became an
empty set as a result of the modification.

Sometimes this is just wrong and could
cause problems. If we can force a reallocation for every modification
of the bitmapset, it'd be much easier to find such bugs.

Whether it's intended or unintended, at least it's consistent,
therefore isn't going to behave differently if the number of
bitmapwords in the set changes. All REALLOCATE_BITMAPSETS does for
bms_int_members(), bms_join() and bms_del_members() is change one
consistent behaviour (we never reallocate) into some other consistent
behaviour (we always reallocate).

If we make REALLOCATE_BITMAPSETS work how I propose in my patch then
the reallocation is just limited to cases where the set *could* be
repalloced by a modification. That change gives us consistent
behaviour as the set is *always* reallocated when it *could* be
reallocated. Making it consistent to me, seems good as a debug
option. Swapping one consistent behaviour for another (as you propose)
seems pointless and more likely to force us to change code that works
perfectly fine today.

In any case, the comments in bms_int_members(), bms_join() and
bms_del_members() are now only true when REALLOCATE_BITMAPSETS is not
defined. Are you proposing we leave those comments outdated? or do you
propose that we mention that the left inputs are not recycled when
building with REALLOCATE_BITMAPSETS? In my view, neither of these is
acceptable as often the primary reason to use something like
bms_int_members() over bms_intersect() is exactly because the left
input is recycled. I don't want people to have to start contorting
code because bms_int_members()'s left input recycling cannot be relied
on.

David

#6Richard Guo
guofenglinux@gmail.com
In reply to: David Rowley (#5)
Re: Revise the Asserts added to bimapset manipulation functions

On Fri, Dec 29, 2023 at 5:22 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 29 Dec 2023 at 21:07, Richard Guo <guofenglinux@gmail.com> wrote:

It seems to me that the former scenario also makes sense in some cases.
For instance, let's say there are two pointers in two structs, s1->p and
s2->p, both referencing the same bitmapset. If we modify the bitmapset
via s1->p (even if no reallocation could occur), s2 would see different
content via its pointer 'p'.

That statement simply isn't true. If there's no reallocation then
both pointers point to the same memory and, therefore have the same
content, not different content. In the absence of a reallocation,
then the only time s1->p and s2->p could differ is if s1->p became an
empty set as a result of the modification.

Sorry I didn't make myself clear. By "different content via s2->p" I
mean different content than what came before, not than s1->p. There was
issue caused by such modifications reported before in [1]/messages/by-id/CAMbWs4_wJthNtYBL+SsebpgF-5L2r5zFFk6xYbS0A78GKOTFHw@mail.gmail.com. In that
case, the problematic query is

explain (costs off)
select * from t t1
inner join t t2 on t1.a = t2.a
left join t t3 on t1.b > 1 and t1.b < 2;

The 'required_relids' in the two RestrictInfos for 't1.b > 1' and 't1.b
< 2' both reference the same bitmapset. The content of this bitmapset
is {t1, t3}. However, as we have decided to remove the t1/t2 join by
eliminating t1, we need to substitute the Vars of t1 with the Vars of
t2. To achieve this, we remove each of the two RestrictInfos from the
joininfo lists it is in and perform the necessary replacements.

After applying this process to the first RestrictInfo, the bitmapset now
becomes {t2, t3}. Consequently, the second RestrictInfo also perceives
{t2, t3} as its required_relids. As a result, when attempting to remove
it from the joininfo lists, a problem arises, because it is not in t2's
joininfo list.

Just to clarify, I am not objecting to your v2 patch. I just want to
make sure what is our purpose in introducing REALLOCATE_BITMAPSETS. I'd
like to confirm whether our intention aligns with the former scenario or
the latter one that I mentioned upthread.

Also, here are some review comments for the v2 patch:

* There is no reallocation that happens in bms_add_members(), do we need
to call bms_copy_and_free() there?

* Do you think we can add Assert(bms_is_valid_set(a)) for bms_free()?

[1]: /messages/by-id/CAMbWs4_wJthNtYBL+SsebpgF-5L2r5zFFk6xYbS0A78GKOTFHw@mail.gmail.com
/messages/by-id/CAMbWs4_wJthNtYBL+SsebpgF-5L2r5zFFk6xYbS0A78GKOTFHw@mail.gmail.com

Thanks
Richard

#7David Rowley
dgrowleyml@gmail.com
In reply to: Richard Guo (#6)
Re: Revise the Asserts added to bimapset manipulation functions

On Fri, 29 Dec 2023 at 23:38, Richard Guo <guofenglinux@gmail.com> wrote:

After applying this process to the first RestrictInfo, the bitmapset now
becomes {t2, t3}. Consequently, the second RestrictInfo also perceives
{t2, t3} as its required_relids. As a result, when attempting to remove
it from the joininfo lists, a problem arises, because it is not in t2's
joininfo list.

Changing the relids so they reference t2 instead of t1 requires both
bms_add_member() and bms_del_member(). The bms_add_member() will
cause the set to be reallocated with my patch so I don't see why this
case isn't covered.

Also, here are some review comments for the v2 patch:

* There is no reallocation that happens in bms_add_members(), do we need
to call bms_copy_and_free() there?

The spec I put in the comment at the top of bitmapset.c says:

have the code *always* reallocate the bitmapset when the
* set *could* be reallocated as a result of the modification

Looking at bms_add_members(), it seems to me that the set *could* be
reallocated as a result of the modification, per:

if (a->nwords < b->nwords)
{
result = bms_copy(b);
other = a;
}

In my view, that meets the spec I outlined.

* Do you think we can add Assert(bms_is_valid_set(a)) for bms_free()?

I did briefly have the Assert in bms_free(), but I removed it as I
didn't think it was that useful to validate the set before freeing it.
Certainly, there'd be an argument to do that, but I ended up not
putting it there. I wondered if there could be a case where we do
something like bms_int_members() which results in an empty set and
after checking for and finding the set is now empty, we call
bms_free(). If we did that then we'd Assert fail. In reality, we use
pfree() instead of bms_free() as we already know the set is not NULL,
so it wouldn't cause a problem for that particular case. I wondered if
there might be another one that I didn't spot, so felt it was best not
to Assert there.

I imagine that if we found some case where the bms_free() Assert
failed, we'd likely just remove it rather than trying to make the set
valid before freeing it. So it seems pretty pointless if we'd opt to
remove the Assert() at the first sign of trouble.

David

#8Richard Guo
guofenglinux@gmail.com
In reply to: David Rowley (#7)
Re: Revise the Asserts added to bimapset manipulation functions

On Sun, Dec 31, 2023 at 6:44 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 29 Dec 2023 at 23:38, Richard Guo <guofenglinux@gmail.com> wrote:

After applying this process to the first RestrictInfo, the bitmapset now
becomes {t2, t3}. Consequently, the second RestrictInfo also perceives
{t2, t3} as its required_relids. As a result, when attempting to remove
it from the joininfo lists, a problem arises, because it is not in t2's
joininfo list.

Changing the relids so they reference t2 instead of t1 requires both
bms_add_member() and bms_del_member(). The bms_add_member() will
cause the set to be reallocated with my patch so I don't see why this
case isn't covered.

Hmm, you're right. This particular case is covered by your patch. I
wondered if there might be another case where a bitmapset with more than
one reference is modified without being potentially required to be
reallocated. I'm not sure if there is such case in reality (or could be
introduced in the future), but if there is, I think it would be great if
REALLOCATE_BITMAPSETS could also help handle it.

Also, here are some review comments for the v2 patch:

* There is no reallocation that happens in bms_add_members(), do we need
to call bms_copy_and_free() there?

The spec I put in the comment at the top of bitmapset.c says:

have the code *always* reallocate the bitmapset when the
* set *could* be reallocated as a result of the modification

Looking at bms_add_members(), it seems to me that the set *could* be
reallocated as a result of the modification, per:

if (a->nwords < b->nwords)
{
result = bms_copy(b);
other = a;
}

In my view, that meets the spec I outlined.

I think one purpose of introducing REALLOCATE_BITMAPSETS is to help find
dangling pointers to Bitmapset's. From this point of view, I agree that
we should call bms_copy_and_free() in bms_add_members(), because the
bitmapset 'a' might be recycled (in-place modified, or pfreed).

According to this criterion, it seems to me that we should also call
bms_copy_and_free() in bms_join(), since both inputs there might be
recycled. But maybe I'm not understanding it correctly.

* Do you think we can add Assert(bms_is_valid_set(a)) for bms_free()?

I did briefly have the Assert in bms_free(), but I removed it as I
didn't think it was that useful to validate the set before freeing it.
Certainly, there'd be an argument to do that, but I ended up not
putting it there. I wondered if there could be a case where we do
something like bms_int_members() which results in an empty set and
after checking for and finding the set is now empty, we call
bms_free(). If we did that then we'd Assert fail. In reality, we use
pfree() instead of bms_free() as we already know the set is not NULL,
so it wouldn't cause a problem for that particular case. I wondered if
there might be another one that I didn't spot, so felt it was best not
to Assert there.

I imagine that if we found some case where the bms_free() Assert
failed, we'd likely just remove it rather than trying to make the set
valid before freeing it. So it seems pretty pointless if we'd opt to
remove the Assert() at the first sign of trouble.

I think I understand your concern. In some bitmapset manipulation
functions, like bms_int_members(), the result might be computed as
empty. In such cases we need to free the input bitmap. If we use
bms_free(), the Assert would fail.

It seems to me that this scenario can only occur within the bitmapset
manipulation functions. Outside of bitmapset.c, I think it should be
quite safe to call bms_free() with this Assert. So I think it should
not have problem to have this Assert in bms_free() as long as we are
careful enough when calling bms_free() inside bitmapset.c.

Thanks
Richard

#9David Rowley
dgrowleyml@gmail.com
In reply to: Richard Guo (#8)
1 attachment(s)
Re: Revise the Asserts added to bimapset manipulation functions

On Tue, 2 Jan 2024 at 20:18, Richard Guo <guofenglinux@gmail.com> wrote:

I think one purpose of introducing REALLOCATE_BITMAPSETS is to help find
dangling pointers to Bitmapset's. From this point of view, I agree that
we should call bms_copy_and_free() in bms_add_members(), because the
bitmapset 'a' might be recycled (in-place modified, or pfreed).

I spoke to Andres about this in our weekly meeting and he explained it
in such I way that I now agree that it's useful to repalloc with all
modifications. I'd previously thought that when the comments mention
that some function "recycle their left input" that you could be
certain that the Bitmapset would be modified in-place, therefore any
other pointers pointing to this set should remain valid. As Andres
reminded me, that's not the case when the set becomes empty and
there's nothing particularly special about a set becoming empty so
making a clone of the set will help identify any pointers that don't
get updated as a result of the modification.

I've now adjusted the patch to have all modifications to Bitmapsets
return a newly allocated set. There are a few cases missing in master
that need to be fixed (items 6-10 below):

The patch now includes changes for the following:

1. Document what REALLOCATE_BITMAPSETS is for.
2. Provide a reusable function to check a set is valid and use it in
cassert builds and use it to validate sets (Richard)
3. Provide a reusable function to copy a set and pfree the original
and use that instead of duplicating that code all over bitmapset.c
4. Fix Assert duplication (Richard)
5. Make comments in bms_union, bms_intersect, bms_difference clear
that a new set is allocated. (This has annoyed me for a while)
6. Fix failure to duplicate the set in bms_add_members() when b == NULL.
7. Fix failure to duplicate the set in bms_add_range() when upper < lower
8. Fix failure to duplicate the set in bms_add_range() when the set
has enough words already.
9. Fix failure to duplicate the set in bms_del_members() when b == NULL
10. Fix failure to duplicate the set in bms_join() when a == NULL and
also fix the b == NULL case too
11. Fix hazard in bms_add_members(), bms_int_members(),
bms_del_members() and bms_join(), where the code added in 7d58f2342
could crash if a == b due to the REALLOCATE_BITMAPSETS code pfreeing
'a'. This happens in knapsack.c:93, although I propose we adjust that
code now that empty sets are always represented as NULL.

David

Show quoted text

According to this criterion, it seems to me that we should also call
bms_copy_and_free() in bms_join(), since both inputs there might be
recycled. But maybe I'm not understanding it correctly.

* Do you think we can add Assert(bms_is_valid_set(a)) for bms_free()?

I did briefly have the Assert in bms_free(), but I removed it as I
didn't think it was that useful to validate the set before freeing it.
Certainly, there'd be an argument to do that, but I ended up not
putting it there. I wondered if there could be a case where we do
something like bms_int_members() which results in an empty set and
after checking for and finding the set is now empty, we call
bms_free(). If we did that then we'd Assert fail. In reality, we use
pfree() instead of bms_free() as we already know the set is not NULL,
so it wouldn't cause a problem for that particular case. I wondered if
there might be another one that I didn't spot, so felt it was best not
to Assert there.

I imagine that if we found some case where the bms_free() Assert
failed, we'd likely just remove it rather than trying to make the set
valid before freeing it. So it seems pretty pointless if we'd opt to
remove the Assert() at the first sign of trouble.

I think I understand your concern. In some bitmapset manipulation
functions, like bms_int_members(), the result might be computed as
empty. In such cases we need to free the input bitmap. If we use
bms_free(), the Assert would fail.

It seems to me that this scenario can only occur within the bitmapset
manipulation functions. Outside of bitmapset.c, I think it should be
quite safe to call bms_free() with this Assert. So I think it should
not have problem to have this Assert in bms_free() as long as we are
careful enough when calling bms_free() inside bitmapset.c.

Thanks
Richard

Attachments:

v3-0001-Fix-REALLOCATE_BITMAPSETS-code.patchtext/plain; charset=US-ASCII; name=v3-0001-Fix-REALLOCATE_BITMAPSETS-code.patchDownload
From 2a8048beb4623f7b0d98763658252a74619cd887 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Fri, 5 Jan 2024 13:00:35 +1300
Subject: [PATCH v3] Fix REALLOCATE_BITMAPSETS code

7d58f2342 added a compile-time option to have bitmapset.c reallocate the
set before returning when a set is modified.  That commit failed to do
its job in various cases and returned the input set when it shouldn't
have in these cases.

This commit also adds some documentation about what
REALLOCATE_BITMAPSETS is for.  This is important as future functions
that go inside bitmapset.c need to know if they need to do anything
special when this compile-time option is defined.

Also, between 71a3e8c43 and 7d58f2342 some Asserts seem to have become
duplicated.  Here we tidy these up.  Rather than having the Assert check
each aspect of what makes a set invalid, this commit introduces a helper
function which returns false when a set is invalid.  Have the Asserts
use this instead.

Here we also make a pass on improving various comments in bitmapset.c.
Various comments talked about the input sets being "recycled" which
could be interpreted to mean that the output set will always point to
the same memory as the given input parameter.  Here we try to make it
clear that this must not be relied upon and that callers must ensure
that all references to a given set are updated on each modification.

In passing, improve comments for bms_union(), bms_intersect() and
bms_difference() to detail what they do.  I (David) have too often had to
remind myself by reading the code each time to find out if I need, for
example, to use bms_union() or bms_join().

Author: Richard Guo, David Rowley
Discussion: https://postgr.es/m/CAMbWs4-djy9qYux2gZrtmxA0StrYXJjvB-oqLxn-d7J88t=PQQ@mail.gmail.com
---
 src/backend/nodes/bitmapset.c | 313 ++++++++++++++++++++--------------
 1 file changed, 189 insertions(+), 124 deletions(-)

diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index f4b61085be..d9b9453270 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -16,6 +16,18 @@
  * containing only a single word (likely the majority of them) this halves the
  * number of loop condition checks.
  *
+ * Callers must ensure that the set returned by functions in this file which
+ * adjust the members of an existing set is assigned to all pointers pointing
+ * to that existing set.  No guarantees are made that we'll ever modify the
+ * existing set in-place and return it.
+ *
+ * To help find bugs caused by callers failing to record the return value of
+ * the function which manipulates an existing set, we support building with
+ * REALLOCATE_BITMAPSETS.  This results in the set being reallocated each time
+ * the set is altered and the existing being pfreed.  This is useful as if any
+ * references still exist to the old set, we're more likely to notice as
+ * any users of the old set will be accessing pfree'd memory.  This option is
+ * only intended to be used for debugging.
  *
  * Copyright (c) 2003-2024, PostgreSQL Global Development Group
  *
@@ -72,6 +84,49 @@
 #error "invalid BITS_PER_BITMAPWORD"
 #endif
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * bms_is_valid_set - for cassert builds to check for valid sets
+ */
+static bool
+bms_is_valid_set(const Bitmapset *a)
+{
+	/* NULL is the correct representation of an empty set */
+	if (a == NULL)
+		return true;
+
+	/* check the node tag is set correctly.  pfree'd pointer, maybe? */
+	if (!IsA(a, Bitmapset))
+		return false;
+
+	/* trailing zero words are not allowed */
+	if (a->words[a->nwords - 1] == 0)
+		return false;
+
+	return true;
+}
+#endif
+
+#ifdef REALLOCATE_BITMAPSETS
+/*
+ * bms_copy_and_free
+ *		Only required in REALLOCATE_BITMAPSETS builds.  Provide a simple way
+ *		to return a freshly allocated set and pfree the original.
+ *
+ * Note: callers which accept multiple sets must be careful when calling this
+ * function to clone one parameter as other parameters may point to the same
+ * set.  A good option is to call this just before returning the resulting
+ * set.
+ */
+static Bitmapset *
+bms_copy_and_free(Bitmapset *a)
+{
+	Bitmapset  *c = bms_copy(a);
+
+	bms_free(a);
+	return c;
+}
+#endif
 
 /*
  * bms_copy - make a palloc'd copy of a bitmapset
@@ -82,9 +137,11 @@ bms_copy(const Bitmapset *a)
 	Bitmapset  *result;
 	size_t		size;
 
+	Assert(bms_is_valid_set(a));
+
 	if (a == NULL)
 		return NULL;
-	Assert(IsA(a, Bitmapset));
+
 	size = BITMAPSET_SIZE(a->nwords);
 	result = (Bitmapset *) palloc(size);
 	memcpy(result, a, size);
@@ -99,8 +156,8 @@ bms_equal(const Bitmapset *a, const Bitmapset *b)
 {
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -140,8 +197,8 @@ bms_compare(const Bitmapset *a, const Bitmapset *b)
 {
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -206,7 +263,8 @@ bms_free(Bitmapset *a)
 
 
 /*
- * bms_union - set union
+ * bms_union - create and return a new set containing all members from both
+ * input sets.  Both inputs are left unmodified.
  */
 Bitmapset *
 bms_union(const Bitmapset *a, const Bitmapset *b)
@@ -216,8 +274,8 @@ bms_union(const Bitmapset *a, const Bitmapset *b)
 	int			otherlen;
 	int			i;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -246,7 +304,8 @@ bms_union(const Bitmapset *a, const Bitmapset *b)
 }
 
 /*
- * bms_intersect - set intersection
+ * bms_intersect - create and return a new set containing members which both
+ * input sets have in common.  Both inputs are left unmodified.
  */
 Bitmapset *
 bms_intersect(const Bitmapset *a, const Bitmapset *b)
@@ -257,8 +316,8 @@ bms_intersect(const Bitmapset *a, const Bitmapset *b)
 	int			resultlen;
 	int			i;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL || b == NULL)
@@ -299,7 +358,8 @@ bms_intersect(const Bitmapset *a, const Bitmapset *b)
 }
 
 /*
- * bms_difference - set difference (ie, A without members of B)
+ * bms_difference - create and return a new set containing all the members of
+ * 'a' without the members of 'b'.
  */
 Bitmapset *
 bms_difference(const Bitmapset *a, const Bitmapset *b)
@@ -307,8 +367,8 @@ bms_difference(const Bitmapset *a, const Bitmapset *b)
 	Bitmapset  *result;
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -316,8 +376,6 @@ bms_difference(const Bitmapset *a, const Bitmapset *b)
 	if (b == NULL)
 		return bms_copy(a);
 
-	Assert(IsA(a, Bitmapset) && IsA(b, Bitmapset));
-
 	/*
 	 * In Postgres' usage, an empty result is a very common case, so it's
 	 * worth optimizing for that by testing bms_nonempty_difference().  This
@@ -374,8 +432,8 @@ bms_is_subset(const Bitmapset *a, const Bitmapset *b)
 {
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -383,8 +441,6 @@ bms_is_subset(const Bitmapset *a, const Bitmapset *b)
 	if (b == NULL)
 		return false;
 
-	Assert(IsA(a, Bitmapset) && IsA(b, Bitmapset));
-
 	/* 'a' can't be a subset of 'b' if it contains more words */
 	if (a->nwords > b->nwords)
 		return false;
@@ -411,8 +467,8 @@ bms_subset_compare(const Bitmapset *a, const Bitmapset *b)
 	int			shortlen;
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -424,8 +480,6 @@ bms_subset_compare(const Bitmapset *a, const Bitmapset *b)
 	if (b == NULL)
 		return BMS_SUBSET2;
 
-	Assert(IsA(a, Bitmapset) && IsA(b, Bitmapset));
-
 	/* Check common words */
 	result = BMS_EQUAL;			/* status so far */
 	shortlen = Min(a->nwords, b->nwords);
@@ -477,14 +531,14 @@ bms_is_member(int x, const Bitmapset *a)
 	int			wordnum,
 				bitnum;
 
+	Assert(bms_is_valid_set(a));
+
 	/* XXX better to just return false for x<0 ? */
 	if (x < 0)
 		elog(ERROR, "negative bitmapset member not allowed");
 	if (a == NULL)
 		return false;
 
-	Assert(IsA(a, Bitmapset));
-
 	wordnum = WORDNUM(x);
 	bitnum = BITNUM(x);
 	if (wordnum >= a->nwords)
@@ -509,12 +563,12 @@ bms_member_index(Bitmapset *a, int x)
 	int			result = 0;
 	bitmapword	mask;
 
+	Assert(bms_is_valid_set(a));
+
 	/* return -1 if not a member of the bitmap */
 	if (!bms_is_member(x, a))
 		return -1;
 
-	Assert(IsA(a, Bitmapset));
-
 	wordnum = WORDNUM(x);
 	bitnum = BITNUM(x);
 
@@ -549,8 +603,8 @@ bms_overlap(const Bitmapset *a, const Bitmapset *b)
 	int			shortlen;
 	int			i;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL || b == NULL)
@@ -576,7 +630,7 @@ bms_overlap_list(const Bitmapset *a, const List *b)
 	int			wordnum,
 				bitnum;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
+	Assert(bms_is_valid_set(a));
 
 	if (a == NULL || b == NIL)
 		return false;
@@ -607,8 +661,8 @@ bms_nonempty_difference(const Bitmapset *a, const Bitmapset *b)
 {
 	int			i;
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -640,11 +694,11 @@ bms_singleton_member(const Bitmapset *a)
 	int			nwords;
 	int			wordnum;
 
+	Assert(bms_is_valid_set(a));
+
 	if (a == NULL)
 		elog(ERROR, "bitmapset is empty");
 
-	Assert(IsA(a, Bitmapset));
-
 	nwords = a->nwords;
 	wordnum = 0;
 	do
@@ -683,9 +737,11 @@ bms_get_singleton_member(const Bitmapset *a, int *member)
 	int			nwords;
 	int			wordnum;
 
+	Assert(bms_is_valid_set(a));
+
 	if (a == NULL)
 		return false;
-	Assert(IsA(a, Bitmapset));
+
 	nwords = a->nwords;
 	wordnum = 0;
 	do
@@ -717,9 +773,11 @@ bms_num_members(const Bitmapset *a)
 	int			nwords;
 	int			wordnum;
 
+	Assert(bms_is_valid_set(a));
+
 	if (a == NULL)
 		return 0;
-	Assert(IsA(a, Bitmapset));
+
 	nwords = a->nwords;
 	wordnum = 0;
 	do
@@ -745,9 +803,11 @@ bms_membership(const Bitmapset *a)
 	int			nwords;
 	int			wordnum;
 
+	Assert(bms_is_valid_set(a));
+
 	if (a == NULL)
 		return BMS_EMPTY_SET;
-	Assert(IsA(a, Bitmapset));
+
 	nwords = a->nwords;
 	wordnum = 0;
 	do
@@ -778,7 +838,7 @@ bms_membership(const Bitmapset *a)
 /*
  * bms_add_member - add a specified member to set
  *
- * Input set is modified or recycled!
+ * 'a' is recycled when possible.
  */
 Bitmapset *
 bms_add_member(Bitmapset *a, int x)
@@ -786,11 +846,13 @@ bms_add_member(Bitmapset *a, int x)
 	int			wordnum,
 				bitnum;
 
+	Assert(bms_is_valid_set(a));
+
 	if (x < 0)
 		elog(ERROR, "negative bitmapset member not allowed");
 	if (a == NULL)
 		return bms_make_singleton(x);
-	Assert(IsA(a, Bitmapset));
+
 	wordnum = WORDNUM(x);
 	bitnum = BITNUM(x);
 
@@ -799,15 +861,8 @@ bms_add_member(Bitmapset *a, int x)
 	{
 		int			oldnwords = a->nwords;
 		int			i;
-#ifdef REALLOCATE_BITMAPSETS
-		Bitmapset  *tmp = a;
 
-		a = (Bitmapset *) palloc(BITMAPSET_SIZE(wordnum + 1));
-		memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
-		pfree(tmp);
-#else
 		a = (Bitmapset *) repalloc(a, BITMAPSET_SIZE(wordnum + 1));
-#endif
 		a->nwords = wordnum + 1;
 		/* zero out the enlarged portion */
 		i = oldnwords;
@@ -816,18 +871,18 @@ bms_add_member(Bitmapset *a, int x)
 			a->words[i] = 0;
 		} while (++i < a->nwords);
 	}
+
+	a->words[wordnum] |= ((bitmapword) 1 << bitnum);
+
 #ifdef REALLOCATE_BITMAPSETS
-	else
-	{
-		Bitmapset  *tmp = a;
 
-		a = (Bitmapset *) palloc(BITMAPSET_SIZE(tmp->nwords));
-		memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
-		pfree(tmp);
-	}
+	/*
+	 * There's no guarantee that the repalloc returned a new pointer, so copy
+	 * and free unconditionally here.
+	 */
+	a = bms_copy_and_free(a);
 #endif
 
-	a->words[wordnum] |= ((bitmapword) 1 << bitnum);
 	return a;
 }
 
@@ -836,31 +891,26 @@ bms_add_member(Bitmapset *a, int x)
  *
  * No error if x is not currently a member of set
  *
- * Input set is modified in-place!
+ * 'a' is recycled when possible.
  */
 Bitmapset *
 bms_del_member(Bitmapset *a, int x)
 {
 	int			wordnum,
 				bitnum;
-#ifdef REALLOCATE_BITMAPSETS
-	Bitmapset  *tmp = a;
-#endif
+
+	Assert(bms_is_valid_set(a));
 
 	if (x < 0)
 		elog(ERROR, "negative bitmapset member not allowed");
 	if (a == NULL)
 		return NULL;
 
-	Assert(IsA(a, Bitmapset));
-
 	wordnum = WORDNUM(x);
 	bitnum = BITNUM(x);
 
 #ifdef REALLOCATE_BITMAPSETS
-	a = (Bitmapset *) palloc(BITMAPSET_SIZE(tmp->nwords));
-	memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
-	pfree(tmp);
+	a = bms_copy_and_free(a);
 #endif
 
 	/* member can't exist.  Return 'a' unmodified */
@@ -890,7 +940,7 @@ bms_del_member(Bitmapset *a, int x)
 }
 
 /*
- * bms_add_members - like bms_union, but left input is recycled
+ * bms_add_members - like bms_union, but left input is recycled when possible
  */
 Bitmapset *
 bms_add_members(Bitmapset *a, const Bitmapset *b)
@@ -900,14 +950,20 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
 	int			otherlen;
 	int			i;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
 		return bms_copy(b);
 	if (b == NULL)
+	{
+#ifdef REALLOCATE_BITMAPSETS
+		a = bms_copy_and_free(a);
+#endif
+
 		return a;
+	}
 	/* Identify shorter and longer input; copy the longer one if needed */
 	if (a->nwords < b->nwords)
 	{
@@ -916,13 +972,6 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
 	}
 	else
 	{
-#ifdef REALLOCATE_BITMAPSETS
-		Bitmapset  *tmp = a;
-
-		a = (Bitmapset *) palloc(BITMAPSET_SIZE(tmp->nwords));
-		memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
-		pfree(tmp);
-#endif
 		result = a;
 		other = b;
 	}
@@ -935,6 +984,11 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
 	} while (++i < otherlen);
 	if (result != a)
 		pfree(a);
+#ifdef REALLOCATE_BITMAPSETS
+	else
+		result = bms_copy_and_free(result);
+#endif
+
 	return result;
 }
 
@@ -955,11 +1009,17 @@ bms_add_range(Bitmapset *a, int lower, int upper)
 				ushiftbits,
 				wordnum;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
+	Assert(bms_is_valid_set(a));
 
 	/* do nothing if nothing is called for, without further checking */
 	if (upper < lower)
+	{
+#ifdef REALLOCATE_BITMAPSETS
+		a = bms_copy_and_free(a);
+#endif
+
 		return a;
+	}
 
 	if (lower < 0)
 		elog(ERROR, "negative bitmapset member not allowed");
@@ -975,16 +1035,9 @@ bms_add_range(Bitmapset *a, int lower, int upper)
 	{
 		int			oldnwords = a->nwords;
 		int			i;
-#ifdef REALLOCATE_BITMAPSETS
-		Bitmapset  *tmp = a;
 
-		a = (Bitmapset *) palloc(BITMAPSET_SIZE(uwordnum + 1));
-		memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
-		pfree(tmp);
-#else
 		/* ensure we have enough words to store the upper bit */
 		a = (Bitmapset *) repalloc(a, BITMAPSET_SIZE(uwordnum + 1));
-#endif
 		a->nwords = uwordnum + 1;
 		/* zero out the enlarged portion */
 		i = oldnwords;
@@ -1021,11 +1074,21 @@ bms_add_range(Bitmapset *a, int lower, int upper)
 		a->words[uwordnum] |= (~(bitmapword) 0) >> ushiftbits;
 	}
 
+#ifdef REALLOCATE_BITMAPSETS
+
+	/*
+	 * There's no guarantee that the repalloc returned a new pointer, so copy
+	 * and free unconditionally here.
+	 */
+	a = bms_copy_and_free(a);
+#endif
+
 	return a;
 }
 
 /*
- * bms_int_members - like bms_intersect, but left input is recycled
+ * bms_int_members - like bms_intersect, but left input is recycled when
+ * possible
  */
 Bitmapset *
 bms_int_members(Bitmapset *a, const Bitmapset *b)
@@ -1033,15 +1096,9 @@ bms_int_members(Bitmapset *a, const Bitmapset *b)
 	int			lastnonzero;
 	int			shortlen;
 	int			i;
-#ifdef REALLOCATE_BITMAPSETS
-	Bitmapset  *tmp = a;
-#endif
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
-
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
@@ -1052,12 +1109,6 @@ bms_int_members(Bitmapset *a, const Bitmapset *b)
 		return NULL;
 	}
 
-#ifdef REALLOCATE_BITMAPSETS
-	a = (Bitmapset *) palloc(BITMAPSET_SIZE(tmp->nwords));
-	memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
-	pfree(tmp);
-#endif
-
 	/* Intersect b into a; we need never copy */
 	shortlen = Min(a->nwords, b->nwords);
 	lastnonzero = -1;
@@ -1079,35 +1130,38 @@ bms_int_members(Bitmapset *a, const Bitmapset *b)
 
 	/* get rid of trailing zero words */
 	a->nwords = lastnonzero + 1;
+
+#ifdef REALLOCATE_BITMAPSETS
+	a = bms_copy_and_free(a);
+#endif
+
 	return a;
 }
 
 /*
- * bms_del_members - like bms_difference, but left input is recycled
+ * bms_del_members - delete members in 'a' that are set in 'b'.  'a' is
+ * recycled when possible.
  */
 Bitmapset *
 bms_del_members(Bitmapset *a, const Bitmapset *b)
 {
 	int			i;
-#ifdef REALLOCATE_BITMAPSETS
-	Bitmapset  *tmp = a;
-#endif
 
-	Assert(a == NULL || (IsA(a, Bitmapset) && a->words[a->nwords - 1] != 0));
-	Assert(b == NULL || (IsA(b, Bitmapset) && b->words[b->nwords - 1] != 0));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
 		return NULL;
 	if (b == NULL)
-		return a;
-
+	{
 #ifdef REALLOCATE_BITMAPSETS
-	a = (Bitmapset *) palloc(BITMAPSET_SIZE(tmp->nwords));
-	memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
-	pfree(tmp);
+		a = bms_copy_and_free(a);
 #endif
 
+		return a;
+	}
+
 	/* Remove b's bits from a; we need never copy */
 	if (a->nwords > b->nwords)
 	{
@@ -1147,11 +1201,15 @@ bms_del_members(Bitmapset *a, const Bitmapset *b)
 		a->nwords = lastnonzero + 1;
 	}
 
+#ifdef REALLOCATE_BITMAPSETS
+	a = bms_copy_and_free(a);
+#endif
+
 	return a;
 }
 
 /*
- * bms_join - like bms_union, but *both* inputs are recycled
+ * bms_join - like bms_union, but *either* input *may* be recycled
  */
 Bitmapset *
 bms_join(Bitmapset *a, Bitmapset *b)
@@ -1160,28 +1218,28 @@ bms_join(Bitmapset *a, Bitmapset *b)
 	Bitmapset  *other;
 	int			otherlen;
 	int			i;
-#ifdef REALLOCATE_BITMAPSETS
-	Bitmapset  *tmp = a;
-#endif
 
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
-
-	Assert(a == NULL || IsA(a, Bitmapset));
-	Assert(b == NULL || IsA(b, Bitmapset));
+	Assert(bms_is_valid_set(a));
+	Assert(bms_is_valid_set(b));
 
 	/* Handle cases where either input is NULL */
 	if (a == NULL)
+	{
+#ifdef REALLOCATE_BITMAPSETS
+		b = bms_copy_and_free(b);
+#endif
+
 		return b;
+	}
 	if (b == NULL)
-		return a;
-
+	{
 #ifdef REALLOCATE_BITMAPSETS
-	a = (Bitmapset *) palloc(BITMAPSET_SIZE(tmp->nwords));
-	memcpy(a, tmp, BITMAPSET_SIZE(tmp->nwords));
-	pfree(tmp);
+		a = bms_copy_and_free(a);
 #endif
 
+		return a;
+	}
+
 	/* Identify shorter and longer input; use longer one as result */
 	if (a->nwords < b->nwords)
 	{
@@ -1202,6 +1260,11 @@ bms_join(Bitmapset *a, Bitmapset *b)
 	} while (++i < otherlen);
 	if (other != result)		/* pure paranoia */
 		pfree(other);
+
+#ifdef REALLOCATE_BITMAPSETS
+	result = bms_copy_and_free(result);
+#endif
+
 	return result;
 }
 
@@ -1231,7 +1294,7 @@ bms_next_member(const Bitmapset *a, int prevbit)
 	int			wordnum;
 	bitmapword	mask;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
+	Assert(bms_is_valid_set(a));
 
 	if (a == NULL)
 		return -2;
@@ -1292,7 +1355,7 @@ bms_prev_member(const Bitmapset *a, int prevbit)
 	int			ushiftbits;
 	bitmapword	mask;
 
-	Assert(a == NULL || IsA(a, Bitmapset));
+	Assert(bms_is_valid_set(a));
 
 	/*
 	 * If set is NULL or if there are no more bits to the right then we've
@@ -1337,6 +1400,8 @@ bms_prev_member(const Bitmapset *a, int prevbit)
 uint32
 bms_hash_value(const Bitmapset *a)
 {
+	Assert(bms_is_valid_set(a));
+
 	if (a == NULL)
 		return 0;				/* All empty sets hash to 0 */
 	return DatumGetUInt32(hash_any((const unsigned char *) a->words,
-- 
2.40.1

#10Richard Guo
guofenglinux@gmail.com
In reply to: David Rowley (#9)
Re: Revise the Asserts added to bimapset manipulation functions

On Tue, Jan 16, 2024 at 11:08 AM David Rowley <dgrowleyml@gmail.com> wrote:

I've now adjusted the patch to have all modifications to Bitmapsets
return a newly allocated set. There are a few cases missing in master
that need to be fixed (items 6-10 below):

The patch now includes changes for the following:

1. Document what REALLOCATE_BITMAPSETS is for.
2. Provide a reusable function to check a set is valid and use it in
cassert builds and use it to validate sets (Richard)
3. Provide a reusable function to copy a set and pfree the original
and use that instead of duplicating that code all over bitmapset.c
4. Fix Assert duplication (Richard)
5. Make comments in bms_union, bms_intersect, bms_difference clear
that a new set is allocated. (This has annoyed me for a while)
6. Fix failure to duplicate the set in bms_add_members() when b == NULL.
7. Fix failure to duplicate the set in bms_add_range() when upper < lower
8. Fix failure to duplicate the set in bms_add_range() when the set
has enough words already.
9. Fix failure to duplicate the set in bms_del_members() when b == NULL
10. Fix failure to duplicate the set in bms_join() when a == NULL and
also fix the b == NULL case too
11. Fix hazard in bms_add_members(), bms_int_members(),
bms_del_members() and bms_join(), where the code added in 7d58f2342
could crash if a == b due to the REALLOCATE_BITMAPSETS code pfreeing
'a'. This happens in knapsack.c:93, although I propose we adjust that
code now that empty sets are always represented as NULL.

Thank you so much for all the work you have put into making this patch
perfect. I reviewed through the v3 patch and I do not have further
comments. I think it's in good shape now.

Thanks
Richard

#11David Rowley
dgrowleyml@gmail.com
In reply to: Richard Guo (#10)
Re: Revise the Asserts added to bimapset manipulation functions

On Tue, 16 Jan 2024 at 21:00, Richard Guo <guofenglinux@gmail.com> wrote:

Thank you so much for all the work you have put into making this patch
perfect. I reviewed through the v3 patch and I do not have further
comments. I think it's in good shape now.

Thanks for looking again. I pushed the patch after removing some
comments mentioning "these operations". I found these not to be very
useful and also misleading.

David