List of Bitmapset (was Re: ExecRTCheckPerms() and many prunable partitions)

Started by Tom Laneover 3 years ago6 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

[ I'm intentionally forking this off as a new thread, so as to
not confuse the cfbot about what's the live patchset on the
ExecRTCheckPerms thread. ]

Amit Langote <amitlangote09@gmail.com> writes:

On Sat, Nov 12, 2022 at 1:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

The main thing I was wondering about in connection with that
was whether to assume that there could be other future applications
of the logic to perform multi-bitmapset union, intersection,
etc. If so, then I'd be inclined to choose different naming and
put those functions in or near to bitmapset.c. It doesn't look
like Amit's code needs anything like that, but maybe somebody
has an idea about other applications?

Yes, simple storage of multiple Bitmapsets in a List somewhere in a
parse/plan tree sounded like that would have wider enough use to add
proper node support for. Assuming you mean trying to generalize
VarAttnoSet in your patch 0004 posted at [2], I wonder if you want to
somehow make its indexability by varno / RT index a part of the
interface of the generic code you're thinking for it?

For discussion's sake, here's my current version of that 0004 patch,
rewritten to use list-of-bitmapset as the data structure. (This
could actually be pulled out of the outer-join-vars patchset and
committed independently, just as a minor performance improvement.
It doesn't quite apply cleanly to HEAD, but pretty close.)

As it stands, the new functions are still in util/clauses.c, but
if we think they could be of general use it'd make sense to move them
either to nodes/bitmapset.c or to some new file under backend/nodes.

Some other thoughts:

* The multi_bms prefix is a bit wordy, so I was thinking of shortening
the function names to mbms_xxx. Maybe that's too brief.

* This is a pretty short list of functions so far. I'm not eager
to build out a bunch of dead code though. Is it OK to leave it
with just this much functionality until someone needs more?

* I'm a little hesitant about whether the API actually should be
List-of-Bitmapset, or some dedicated struct as I had in the previous
version of 0004. This version is way less invasive in prepjointree.c
than that was, but the reason is there's ambiguity about what the
forced_null_vars Lists actually contain, which feels error-prone.

Comments?

regards, tom lane

Attachments:

v7-0004-fix-antijoin-recognition.patchtext/x-diff; charset=us-ascii; name=v7-0004-fix-antijoin-recognition.patchDownload+144-25
#2Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#1)
Re: List of Bitmapset (was Re: ExecRTCheckPerms() and many prunable partitions)

On Mon, Nov 14, 2022 at 11:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

On Sat, Nov 12, 2022 at 1:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

The main thing I was wondering about in connection with that
was whether to assume that there could be other future applications
of the logic to perform multi-bitmapset union, intersection,
etc. If so, then I'd be inclined to choose different naming and
put those functions in or near to bitmapset.c. It doesn't look
like Amit's code needs anything like that, but maybe somebody
has an idea about other applications?

Yes, simple storage of multiple Bitmapsets in a List somewhere in a
parse/plan tree sounded like that would have wider enough use to add
proper node support for. Assuming you mean trying to generalize
VarAttnoSet in your patch 0004 posted at [2], I wonder if you want to
somehow make its indexability by varno / RT index a part of the
interface of the generic code you're thinking for it?

For discussion's sake, here's my current version of that 0004 patch,
rewritten to use list-of-bitmapset as the data structure. (This
could actually be pulled out of the outer-join-vars patchset and
committed independently, just as a minor performance improvement.
It doesn't quite apply cleanly to HEAD, but pretty close.)

As it stands, the new functions are still in util/clauses.c, but
if we think they could be of general use it'd make sense to move them
either to nodes/bitmapset.c or to some new file under backend/nodes.

These multi_bms_* functions sound generic enough to me, so +1 to put
them in nodes/bitmapset.c. Or even a new file if the API should
involve a dedicated struct enveloping the List as you write below.

Some other thoughts:

* The multi_bms prefix is a bit wordy, so I was thinking of shortening
the function names to mbms_xxx. Maybe that's too brief.

FWIW, multi_bms_* naming sounds fine to me.

* This is a pretty short list of functions so far. I'm not eager
to build out a bunch of dead code though. Is it OK to leave it
with just this much functionality until someone needs more?

+1

* I'm a little hesitant about whether the API actually should be
List-of-Bitmapset, or some dedicated struct as I had in the previous
version of 0004. This version is way less invasive in prepjointree.c
than that was, but the reason is there's ambiguity about what the
forced_null_vars Lists actually contain, which feels error-prone.

Are you thinking of something like a MultiBitmapset that wraps the
multi_bms List? That sounds fine to me. Another option is to make
the generic API be List-of-Bitmapset but keep VarAttnoSet in
prepjointree.c and put the List in it. IMHO, VarAttnoSet is
definitely more self-documenting for that patch's purposes.

+ * The new member is identified by the zero-based index of the List
+ * element it should go into, and the bit number to be set therein.
+ */
+List *
+multi_bms_add_member(List *mbms, int index1, int index2)

The comment sounds a bit ambiguous, especially the ", and the bit
number to be set therein." part. If you meant to describe the
arguments, how about mentioning their names too, as in:

The new member is identified by 'index1', the zero-based index of the
List element it should go into, and 'index2' specifies the bit number
to be set therein.

+   /* Add empty elements to a, as needed */
+   while (list_length(a) < list_length(b))
+       a = lappend(a, NULL);
+   /* forboth will stop at the end of the shorter list, which is fine */

Isn't this comment unnecessary given that the while loop makes both
lists be the same length?

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#1)
Re: List of Bitmapset (was Re: ExecRTCheckPerms() and many prunable partitions)

On 2022-Nov-14, Tom Lane wrote:

For discussion's sake, here's my current version of that 0004 patch,
rewritten to use list-of-bitmapset as the data structure.

I feel that there should be more commentary that explains what a
multi-bms is. Not sure where, maybe just put it near the function that
first appears in the file.

* The multi_bms prefix is a bit wordy, so I was thinking of shortening
the function names to mbms_xxx. Maybe that's too brief.

I don't think the "ulti_" bytes add a lot, and short names are better.
Either you know what a mbms is, or you don't. If the latter, then you
jump to one of these functions in order to find out what the data
structure is; after that, you can read the code and it should be clear
enough.

* This is a pretty short list of functions so far. I'm not eager
to build out a bunch of dead code though. Is it OK to leave it
with just this much functionality until someone needs more?

I agree with not adding dead code.

* I'm a little hesitant about whether the API actually should be
List-of-Bitmapset, or some dedicated struct as I had in the previous
version of 0004. This version is way less invasive in prepjointree.c
than that was, but the reason is there's ambiguity about what the
forced_null_vars Lists actually contain, which feels error-prone.

Hmm ... if somebody makes a mistake, does the functionality break in
obvious ways, or is it very hard to pinpoint what happened?

+/*
+ * multi_bms_add_member
+ *		Add a new member to a list of bitmapsets.
+ *
+ * This is like bms_add_member, but for lists of bitmapsets.
+ * The new member is identified by the zero-based index of the List
+ * element it should go into, and the bit number to be set therein.
+ */
+List *
+multi_bms_add_member(List *mbms, int index1, int index2)

Maybe s/index1/listidx/ or bitmapidx and s/index2/bitnr/ ?

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#3)
Re: List of Bitmapset (was Re: ExecRTCheckPerms() and many prunable partitions)

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2022-Nov-14, Tom Lane wrote:

For discussion's sake, here's my current version of that 0004 patch,
rewritten to use list-of-bitmapset as the data structure.

I feel that there should be more commentary that explains what a
multi-bms is. Not sure where, maybe just put it near the function that
first appears in the file.

Right. I split the new functions out to new files multibitmapset.h/.c,
so that the file headers can carry the overall explanation. (I duplicated
the text between .h and .c, which is also true of bitmapset.h/.c, but
maybe that's overkill.)

* The multi_bms prefix is a bit wordy, so I was thinking of shortening
the function names to mbms_xxx. Maybe that's too brief.

I don't think the "ulti_" bytes add a lot, and short names are better.

Yeah, after sleeping on it I like mbms.

* This is a pretty short list of functions so far. I'm not eager
to build out a bunch of dead code though. Is it OK to leave it
with just this much functionality until someone needs more?

I agree with not adding dead code.

I concluded that the only thing that makes this an odd set of functions
to start out with is the lack of mbms_is_member; it seems asymmetric
to have mbms_add_member but not mbms_is_member. So I added that.
I'm content to let the rest grow out as needed.

* I'm a little hesitant about whether the API actually should be
List-of-Bitmapset, or some dedicated struct as I had in the previous
version of 0004. This version is way less invasive in prepjointree.c
than that was, but the reason is there's ambiguity about what the
forced_null_vars Lists actually contain, which feels error-prone.

Hmm ... if somebody makes a mistake, does the functionality break in
obvious ways, or is it very hard to pinpoint what happened?

Now that Bitmapset is a full-fledged Node type, we can make use of
castNode checks to verify that the input Lists contain what we expect.
That seems probably sufficient to catch coding errors.

+multi_bms_add_member(List *mbms, int index1, int index2)

Maybe s/index1/listidx/ or bitmapidx and s/index2/bitnr/ ?

Right. I used listidx and bitidx.

regards, tom lane

Attachments:

invent-multibitmapsets-v1.patchtext/x-diff; charset=us-ascii; name=invent-multibitmapsets-v1.patchDownload+234-25
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#2)
Re: List of Bitmapset (was Re: ExecRTCheckPerms() and many prunable partitions)

Amit Langote <amitlangote09@gmail.com> writes:

On Mon, Nov 14, 2022 at 11:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

+ * The new member is identified by the zero-based index of the List
+ * element it should go into, and the bit number to be set therein.

The comment sounds a bit ambiguous, especially the ", and the bit
number to be set therein." part. If you meant to describe the
arguments, how about mentioning their names too, as in:

Done that way in the patch I just posted.

+ /* forboth will stop at the end of the shorter list, which is fine */

Isn't this comment unnecessary given that the while loop makes both
lists be the same length?

No, the while loop ensures that a is at least as long as b.
It could have started out longer, though.

regards, tom lane

#6Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#5)
Re: List of Bitmapset (was Re: ExecRTCheckPerms() and many prunable partitions)

On Wed, Nov 16, 2022 at 3:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

On Mon, Nov 14, 2022 at 11:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

+ * The new member is identified by the zero-based index of the List
+ * element it should go into, and the bit number to be set therein.

The comment sounds a bit ambiguous, especially the ", and the bit
number to be set therein." part. If you meant to describe the
arguments, how about mentioning their names too, as in:

Done that way in the patch I just posted.

Thanks.

+ /* forboth will stop at the end of the shorter list, which is fine */

Isn't this comment unnecessary given that the while loop makes both
lists be the same length?

No, the while loop ensures that a is at least as long as b.
It could have started out longer, though.

Oops, I missed that case.

The latest version looks pretty good to me.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com