Microoptimization of Bitmapset usage in postgres_fdw
There are a couple of places in postgres_fdw where we check if the Bitmapset
has multiple members using bms_num_members(), without storing the returned
count. The attached patch instead use bms_membership() which is optimized for
just that usecase, and (IMO) makes for clearer code.
cheers ./daniel
Attachments:
postgres_fdw_bms_multiple.patchapplication/octet-stream; name=postgres_fdw_bms_multiple.patch; x-unix-mode=0644Download
From 88aa03c49bc5e7730f9ac5305e129b4017aa9451 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Tue, 29 May 2018 11:26:01 -0400
Subject: [PATCH] Use bms_membership() to identify multiples
When we are only looking to identify whether the Bitmapset has
multiple members we can use the optimized version bms_membership()
instead.
---
contrib/postgres_fdw/deparse.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index d272719ff4..8068e28184 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1076,7 +1076,7 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context)
/* Construct FROM clause */
appendStringInfoString(buf, " FROM ");
deparseFromExprForRel(buf, context->root, scanrel,
- (bms_num_members(scanrel->relids) > 1),
+ (bms_membership(scanrel->relids) == BMS_MULTIPLE),
(Index) 0, NULL, context->params_list);
/* Construct WHERE clause */
@@ -1262,7 +1262,7 @@ deparseLockingClause(deparse_expr_cxt *context)
}
/* Add the relation alias if we are here for a join relation */
- if (bms_num_members(rel->relids) > 1 &&
+ if (bms_membership(rel->relids) == BMS_MULTIPLE &&
rc->strength != LCS_NONE)
appendStringInfo(buf, " OF %s%d", REL_ALIAS_PREFIX, relid);
}
@@ -2328,7 +2328,7 @@ deparseVar(Var *node, deparse_expr_cxt *context)
int colno;
/* Qualify columns when multiple relations are involved. */
- bool qualify_col = (bms_num_members(relids) > 1);
+ bool qualify_col = (bms_membership(relids) == BMS_MULTIPLE);
/*
* If the Var belongs to the foreign relation that is deparsed as a
--
2.14.1.145.gb3622a4ee
On Tue, May 29, 2018 at 9:10 PM, Daniel Gustafsson <daniel@yesql.se> wrote:
There are a couple of places in postgres_fdw where we check if the Bitmapset
has multiple members using bms_num_members(), without storing the returned
count. The attached patch instead use bms_membership() which is optimized for
just that usecase, and (IMO) makes for clearer code.
+1 for that change. Some of those usages of bms_num_members() were
added by me. Sorry for that. It was mostly because I wasn't aware of
bms_membership() when I wrote that code. May be we should add a
comment in the prologue of bms_num_members() like "Note: if the
callers is interested only knowing whether the bitmapset has 0, 1 or
more members, it should call bms_membership().". I understand that
bms_membership() resides just below bms_num_members(), but 1.
bms_membership doesn't sound like it would tell me that 2.
bms_membership's prologue refers to bms_num_members, which should have
been the other way; we want the developers to use bms_membership
instead of bms_num_members(), when they land on bms_num_members. It's
less likely that somebody landing on bms_membership would want to use
bms_num_members().
I am not sure if this can b e squeezed into HEAD right now. It looks
safe to do so. But in case not, please add this to the next commitfest
so that it's not forgotten.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 30 May 2018, at 09:36, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
On Tue, May 29, 2018 at 9:10 PM, Daniel Gustafsson <daniel@yesql.se> wrote:
There are a couple of places in postgres_fdw where we check if the Bitmapset
has multiple members using bms_num_members(), without storing the returned
count. The attached patch instead use bms_membership() which is optimized for
just that usecase, and (IMO) makes for clearer code.+1 for that change. Some of those usages of bms_num_members() were
added by me. Sorry for that. It was mostly because I wasn't aware of
bms_membership() when I wrote that code. May be we should add a
comment in the prologue of bms_num_members() like "Note: if the
callers is interested only knowing whether the bitmapset has 0, 1 or
more members, it should call bms_membership().". I understand that
bms_membership() resides just below bms_num_members(), but 1.
bms_membership doesn't sound like it would tell me that 2.
bms_membership's prologue refers to bms_num_members, which should have
been the other way; we want the developers to use bms_membership
instead of bms_num_members(), when they land on bms_num_members. It's
less likely that somebody landing on bms_membership would want to use
bms_num_members().
That makes sense, I’ve added a second patch to this thread which expands the
comment on bms_num_members to make it clearer.
I am not sure if this can b e squeezed into HEAD right now. It looks
safe to do so. But in case not, please add this to the next commitfest
so that it's not forgotten.
Will do, thanks for reviewing.
cheers ./daniel
Attachments:
bms_num_members_comment-v2.patchapplication/octet-stream; name=bms_num_members_comment-v2.patch; x-unix-mode=0644Download
From eb1312ae84628be0ce91f001ab9dee34521ad7f5 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Wed, 30 May 2018 10:25:23 -0400
Subject: [PATCH 2/2] Clarify the comment on bms_num_members vs bms_membership
---
src/backend/nodes/bitmapset.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 9bf9a29d6b..6adf9b2895 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -666,6 +666,9 @@ bms_get_singleton_member(const Bitmapset *a, int *member)
/*
* bms_num_members - count members of set
+ *
+ * In situations where the exact count isn't required, bms_membership can be
+ * used to test if the set has 0, 1 or multiple members.
*/
int
bms_num_members(const Bitmapset *a)
--
2.14.1.145.gb3622a4ee
postgres_fdw_bms_multiple-v2.patchapplication/octet-stream; name=postgres_fdw_bms_multiple-v2.patch; x-unix-mode=0644Download
From 81b5b310d99f792b1b77af6bdf5c11bd84bf43d2 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 28 May 2018 19:15:33 +0200
Subject: [PATCH 1/2] Use optimized BMS function for testing membership
When all we need to know is if the Bitmapset has zer0, one or many
members bms_membership() is faster than counting the members with
bms_num_members().
---
contrib/postgres_fdw/deparse.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index d272719ff4..8068e28184 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1076,7 +1076,7 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context)
/* Construct FROM clause */
appendStringInfoString(buf, " FROM ");
deparseFromExprForRel(buf, context->root, scanrel,
- (bms_num_members(scanrel->relids) > 1),
+ (bms_membership(scanrel->relids) == BMS_MULTIPLE),
(Index) 0, NULL, context->params_list);
/* Construct WHERE clause */
@@ -1262,7 +1262,7 @@ deparseLockingClause(deparse_expr_cxt *context)
}
/* Add the relation alias if we are here for a join relation */
- if (bms_num_members(rel->relids) > 1 &&
+ if (bms_membership(rel->relids) == BMS_MULTIPLE &&
rc->strength != LCS_NONE)
appendStringInfo(buf, " OF %s%d", REL_ALIAS_PREFIX, relid);
}
@@ -2328,7 +2328,7 @@ deparseVar(Var *node, deparse_expr_cxt *context)
int colno;
/* Qualify columns when multiple relations are involved. */
- bool qualify_col = (bms_num_members(relids) > 1);
+ bool qualify_col = (bms_membership(relids) == BMS_MULTIPLE);
/*
* If the Var belongs to the foreign relation that is deparsed as a
--
2.14.1.145.gb3622a4ee
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested
Hello,
The v2 patches look good to me. However, I found a couple other
places where we might be able to use this micro-optimization.
1) dependencies_clauselist_selectivity() in dependencies.c
/*
* If there's not at least two distinct attnums then reject the whole list
* of clauses. We must return 1.0 so the calling function's selectivity is
* unaffected.
*/
if (bms_num_members(clauses_attnums) < 2)
{
pfree(list_attnums);
return 1.0;
}
2) BuildRelationExtStatistics() in extended_stats.c.
/* check allowed number of dimensions */
Assert(bms_num_members(stat->columns) >= 2 &&
bms_num_members(stat->columns) <= STATS_MAX_DIMENSIONS);
Nathan
The new status of this patch is: Waiting on Author
On 14 Jun 2018, at 16:56, Nathan Bossart <bossartn@amazon.com> wrote:
The v2 patches look good to me. However, I found a couple other
places where we might be able to use this micro-optimization.
Thanks a lot for your review!
1) dependencies_clauselist_selectivity() in dependencies.c
/*
* If there's not at least two distinct attnums then reject the whole list
* of clauses. We must return 1.0 so the calling function's selectivity is
* unaffected.
*/
if (bms_num_members(clauses_attnums) < 2)
{
pfree(list_attnums);
return 1.0;
}
I agree with this one, not sure why I missed that when grep’ing around while
writing the patch. Fixed in the attached v3 patchset (which are now awkwardly
named but I didn’t change that).
2) BuildRelationExtStatistics() in extended_stats.c.
/* check allowed number of dimensions */
Assert(bms_num_members(stat->columns) >= 2 &&
bms_num_members(stat->columns) <= STATS_MAX_DIMENSIONS);
Since this usage is in an assertion I don’t see the value in changing it as the
current programming is more optimized for readability.
cheers ./daniel
Attachments:
bms_num_members_comment-v3.patchapplication/octet-stream; name=bms_num_members_comment-v3.patch; x-unix-mode=0644Download
From d4a47088516615153e8eff6721fded4b072cce01 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Thu, 14 Jun 2018 21:17:37 +0200
Subject: [PATCH 2/2] Clarify the comment on bms_num_members vs bms_membership
---
src/backend/nodes/bitmapset.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 9bf9a29d6b..6adf9b2895 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -666,6 +666,9 @@ bms_get_singleton_member(const Bitmapset *a, int *member)
/*
* bms_num_members - count members of set
+ *
+ * In situations where the exact count isn't required, bms_membership can be
+ * used to test if the set has 0, 1 or multiple members.
*/
int
bms_num_members(const Bitmapset *a)
--
2.14.1.145.gb3622a4ee
postgres_fdw_bms_multiple-v3.patchapplication/octet-stream; name=postgres_fdw_bms_multiple-v3.patch; x-unix-mode=0644Download
From 04cfdd4d7b442256b31387a7536cb9d18d064fe8 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Thu, 14 Jun 2018 21:17:33 +0200
Subject: [PATCH 1/2] Use optimized BMS function for testing membership
When all we need to know is if the Bitmapset has zero, one or many
members bms_membership() is faster than obtaining an exact count of
the members with bms_num_members().
---
contrib/postgres_fdw/deparse.c | 6 +++---
src/backend/statistics/dependencies.c | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index d272719ff4..8068e28184 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1076,7 +1076,7 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context)
/* Construct FROM clause */
appendStringInfoString(buf, " FROM ");
deparseFromExprForRel(buf, context->root, scanrel,
- (bms_num_members(scanrel->relids) > 1),
+ (bms_membership(scanrel->relids) == BMS_MULTIPLE),
(Index) 0, NULL, context->params_list);
/* Construct WHERE clause */
@@ -1262,7 +1262,7 @@ deparseLockingClause(deparse_expr_cxt *context)
}
/* Add the relation alias if we are here for a join relation */
- if (bms_num_members(rel->relids) > 1 &&
+ if (bms_membership(rel->relids) == BMS_MULTIPLE &&
rc->strength != LCS_NONE)
appendStringInfo(buf, " OF %s%d", REL_ALIAS_PREFIX, relid);
}
@@ -2328,7 +2328,7 @@ deparseVar(Var *node, deparse_expr_cxt *context)
int colno;
/* Qualify columns when multiple relations are involved. */
- bool qualify_col = (bms_num_members(relids) > 1);
+ bool qualify_col = (bms_membership(relids) == BMS_MULTIPLE);
/*
* If the Var belongs to the foreign relation that is deparsed as a
diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index 3b4a09b1a7..1cc537b5b0 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -992,7 +992,7 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
* of clauses. We must return 1.0 so the calling function's selectivity is
* unaffected.
*/
- if (bms_num_members(clauses_attnums) < 2)
+ if (bms_membership(clauses_attnums) != BMS_MULTIPLE)
{
pfree(list_attnums);
return 1.0;
--
2.14.1.145.gb3622a4ee
Thanks for the updated patch set.
On 6/14/18, 2:34 PM, "Daniel Gustafsson" <daniel@yesql.se> wrote:
2) BuildRelationExtStatistics() in extended_stats.c.
/* check allowed number of dimensions */
Assert(bms_num_members(stat->columns) >= 2 &&
bms_num_members(stat->columns) <= STATS_MAX_DIMENSIONS);Since this usage is in an assertion I don’t see the value in changing it as the
current programming is more optimized for readability.
Agreed. I hesitated to even point this one out.
I'll go ahead and mark this as Ready for Committer.
Nathan
On Thu, Jun 14, 2018 at 08:14:54PM +0000, Bossart, Nathan wrote:
I'll go ahead and mark this as Ready for Committer.
Another thing not mentioned on this thread is that bms_membership is
faster than bms_num_members by design with many members, so this change
makes sense to shave a couple of cycles.
/*
* bms_num_members - count members of set
+ *
+ * In situations where the exact count isn't required, bms_membership can be
+ * used to test if the set has 0, 1 or multiple members.
*/
bms_membership is a couple of lines down, I am not sure I see much point
in duplicating what's already present.
- if (bms_num_members(clauses_attnums) < 2)
+ if (bms_membership(clauses_attnums) != BMS_MULTIPLE)
For this one, the comment above directly mentions that at least two
attnums need to be present, so it seems to me that the current coding is
easier to understand and intentional... So I would be incline to not
change it.
I think that this should not go into the tree until REL_11_STABLE gets
created though, so this will have to wait a bit.
--
Michael
On 17 Jun 2018, at 14:47, Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Jun 14, 2018 at 08:14:54PM +0000, Bossart, Nathan wrote:
I'll go ahead and mark this as Ready for Committer.
Another thing not mentioned on this thread is that bms_membership is
faster than bms_num_members by design with many members, so this change
makes sense to shave a couple of cycles./* * bms_num_members - count members of set + * + * In situations where the exact count isn't required, bms_membership can be + * used to test if the set has 0, 1 or multiple members. */ bms_membership is a couple of lines down, I am not sure I see much point in duplicating what's already present.
It is indeed a bit of a duplication, but the only reason this patch came to be
was that the original author of the instances in postgres_fdw had missed said
comment on bms_membership a few lines down.
- if (bms_num_members(clauses_attnums) < 2) + if (bms_membership(clauses_attnums) != BMS_MULTIPLE) For this one, the comment above directly mentions that at least two attnums need to be present, so it seems to me that the current coding is easier to understand and intentional... So I would be incline to not change it.
I don’t have any strong feelings either way, and will leave that call to the
committer who picks this up. I agree that the current coding is easy to
understand but I don’t see this being much harder.
I think that this should not go into the tree until REL_11_STABLE gets
created though, so this will have to wait a bit.
100% agree.
cheers ./daniel
On Sun, Jun 17, 2018 at 07:23:19PM +0200, Daniel Gustafsson wrote:
On 17 Jun 2018, at 14:47, Michael Paquier <michael@paquier.xyz> wrote:
- if (bms_num_members(clauses_attnums) < 2) + if (bms_membership(clauses_attnums) != BMS_MULTIPLE) For this one, the comment above directly mentions that at least two attnums need to be present, so it seems to me that the current coding is easier to understand and intentional... So I would be incline to not change it.I don’t have any strong feelings either way, and will leave that call to the
committer who picks this up. I agree that the current coding is easy to
understand but I don’t see this being much harder.
I have looked at that again, and pushed the portion for postgres_fdw as
the intention is clear, while leaving out the part from the statistics
per the comment close by. Thanks!
--
Michael