Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'
While working on BUG #18187 [1]/messages/by-id/18187-831da249cbd2ff8e@postgresql.org, I noticed that we also have issues with
how SJE replaces join clauses involving the removed rel. As an example,
consider the query below, which would trigger an Assert.
create table t (a int primary key, b int);
explain (costs off)
select * from t t1
inner join t t2 on t1.a = t2.a
left join t t3 on t1.b > 1 and t1.b < 2;
server closed the connection unexpectedly
The Assert failure happens in remove_self_join_rel() when we're trying
to remove t1. The two join clauses of t1, 't1.b > 1' and 't1.b < 2',
share the same pointer of 'required_relids', which is {t1, t3} at first.
After we've performed replace_varno for the first clause, the
required_relids becomes {t2, t3}, which is no problem. However, the
second clause's required_relids also becomes {t2, t3}, because they are
actually the same pointer. So when we proceed with replace_varno on the
second clause, we'd trigger the Assert.
Off the top of my head I'm thinking that we can fix this kind of issue
by bms_copying the bitmapset first before we make a substitution in
replace_relid(), like attached.
Alternatively, we can revise distribute_qual_to_rels() as below so that
different RestrictInfos don't share the same pointer of required_relids.
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -2385,7 +2385,7 @@ distribute_qual_to_rels(PlannerInfo *root, Node
*clause,
* nonnullable-side rows failing the qual.
*/
Assert(ojscope);
- relids = ojscope;
+ relids = bms_copy(ojscope);
Assert(!pseudoconstant);
}
else
With this way, I'm worrying that there are other places where we should
avoid sharing the same pointer to Bitmapset structure. I'm not sure how
to discover all these places.
Any thoughts?
[1]: /messages/by-id/18187-831da249cbd2ff8e@postgresql.org
/messages/by-id/18187-831da249cbd2ff8e@postgresql.org
Thanks
Richard
Attachments:
v1-0001-Fix-how-SJE-replaces-join-clauses.patchapplication/octet-stream; name=v1-0001-Fix-how-SJE-replaces-join-clauses.patchDownload+28-3
Hi!
Thank you for spotting this and dealing with this.
On Tue, Nov 14, 2023 at 1:15 PM Richard Guo <guofenglinux@gmail.com> wrote:
While working on BUG #18187 [1], I noticed that we also have issues with
how SJE replaces join clauses involving the removed rel. As an example,
consider the query below, which would trigger an Assert.create table t (a int primary key, b int);
explain (costs off)
select * from t t1
inner join t t2 on t1.a = t2.a
left join t t3 on t1.b > 1 and t1.b < 2;
server closed the connection unexpectedlyThe Assert failure happens in remove_self_join_rel() when we're trying
to remove t1. The two join clauses of t1, 't1.b > 1' and 't1.b < 2',
share the same pointer of 'required_relids', which is {t1, t3} at first.
After we've performed replace_varno for the first clause, the
required_relids becomes {t2, t3}, which is no problem. However, the
second clause's required_relids also becomes {t2, t3}, because they are
actually the same pointer. So when we proceed with replace_varno on the
second clause, we'd trigger the Assert.Off the top of my head I'm thinking that we can fix this kind of issue
by bms_copying the bitmapset first before we make a substitution in
replace_relid(), like attached.
I remember, I've removed bms_copy() from here. Now I understand why
that was needed. But I'm still not particularly happy about it. The
reason is that logic of replace_relid() becomes cumbersome. In some
cases it performs modification in-place, while in other cases it
copies.
Alternatively, we can revise distribute_qual_to_rels() as below so that
different RestrictInfos don't share the same pointer of required_relids.--- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -2385,7 +2385,7 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause, * nonnullable-side rows failing the qual. */ Assert(ojscope); - relids = ojscope; + relids = bms_copy(ojscope); Assert(!pseudoconstant); } elseWith this way, I'm worrying that there are other places where we should
avoid sharing the same pointer to Bitmapset structure. I'm not sure how
to discover all these places.
This looks better to me. However, I'm not sure what the overhead
would be? How much would it increase the memory footprint?
It's possibly dumb option, but what about just removing the assert?
------
Regards,
Alexander Korotkov
Hi,
On 2023-11-14 19:14:57 +0800, Richard Guo wrote:
While working on BUG #18187 [1], I noticed that we also have issues with
how SJE replaces join clauses involving the removed rel. As an example,
consider the query below, which would trigger an Assert.create table t (a int primary key, b int);
explain (costs off)
select * from t t1
inner join t t2 on t1.a = t2.a
left join t t3 on t1.b > 1 and t1.b < 2;
server closed the connection unexpectedlyThe Assert failure happens in remove_self_join_rel() when we're trying
to remove t1. The two join clauses of t1, 't1.b > 1' and 't1.b < 2',
share the same pointer of 'required_relids', which is {t1, t3} at first.
After we've performed replace_varno for the first clause, the
required_relids becomes {t2, t3}, which is no problem. However, the
second clause's required_relids also becomes {t2, t3}, because they are
actually the same pointer. So when we proceed with replace_varno on the
second clause, we'd trigger the Assert.
Good catch.
Off the top of my head I'm thinking that we can fix this kind of issue
by bms_copying the bitmapset first before we make a substitution in
replace_relid(), like attached.Alternatively, we can revise distribute_qual_to_rels() as below so that
different RestrictInfos don't share the same pointer of required_relids.
--- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -2385,7 +2385,7 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause, * nonnullable-side rows failing the qual. */ Assert(ojscope); - relids = ojscope; + relids = bms_copy(ojscope); Assert(!pseudoconstant); } elseWith this way, I'm worrying that there are other places where we should
avoid sharing the same pointer to Bitmapset structure.
Indeed.
I'm not sure how to discover all these places. Any thoughts?
At the very least I think we should add a mode to bitmapset.c mode where
every modification of a bitmapset reallocates, rather than just when the size
actually changes. Because we only reallocte and free in relatively uncommon
cases, particularly on 64bit systems, it's very easy to not find spots that
continue to use the input pointer to one of the modifying bms functions.
A very hacky implementation of that indeed catches this bug with the existing
regression tests.
The tests do *not* pass with just the attached applied, as the "Delete relid
without substitution" path has the same issue. With that also copying and all
the "reusing" bms* functions always reallocating, the tests pass - kinda.
The kinda because there are callers to bms_(add|del)_members() that pass the
same bms as a and b, which only works if the reallocation happens "late".
Greetings,
Andres
Hi,
On 2023-11-14 14:42:13 +0200, Alexander Korotkov wrote:
It's possibly dumb option, but what about just removing the assert?
That's not at all an option - the in-place bms_* functions can free their
input. So a dangling pointer to the "old" version is a use-after-free waiting
to happen - you just need a query that actually gets to bitmapsets that are a
bit larger.
Greetings,
Andres
On Wed, Nov 15, 2023 at 8:04 AM Andres Freund <andres@anarazel.de> wrote:
On 2023-11-14 14:42:13 +0200, Alexander Korotkov wrote:
It's possibly dumb option, but what about just removing the assert?
That's not at all an option - the in-place bms_* functions can free their
input. So a dangling pointer to the "old" version is a use-after-free waiting
to happen - you just need a query that actually gets to bitmapsets that are a
bit larger.
Yeah, now I got it, thank you. I was under the wrong impression that
bitmapset has the level of indirection, so the pointer remains valid.
Now, I see that bitmapset manipulation functions can do free/repalloc
making the previous bitmapset pointer invalid.
------
Regards,
Alexander Korotkov
On Wed, Nov 15, 2023 at 8:02 AM Andres Freund <andres@anarazel.de> wrote:
On 2023-11-14 19:14:57 +0800, Richard Guo wrote:
While working on BUG #18187 [1], I noticed that we also have issues with
how SJE replaces join clauses involving the removed rel. As an example,
consider the query below, which would trigger an Assert.create table t (a int primary key, b int);
explain (costs off)
select * from t t1
inner join t t2 on t1.a = t2.a
left join t t3 on t1.b > 1 and t1.b < 2;
server closed the connection unexpectedlyThe Assert failure happens in remove_self_join_rel() when we're trying
to remove t1. The two join clauses of t1, 't1.b > 1' and 't1.b < 2',
share the same pointer of 'required_relids', which is {t1, t3} at first.
After we've performed replace_varno for the first clause, the
required_relids becomes {t2, t3}, which is no problem. However, the
second clause's required_relids also becomes {t2, t3}, because they are
actually the same pointer. So when we proceed with replace_varno on the
second clause, we'd trigger the Assert.Good catch.
Off the top of my head I'm thinking that we can fix this kind of issue
by bms_copying the bitmapset first before we make a substitution in
replace_relid(), like attached.Alternatively, we can revise distribute_qual_to_rels() as below so that
different RestrictInfos don't share the same pointer of required_relids.--- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -2385,7 +2385,7 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause, * nonnullable-side rows failing the qual. */ Assert(ojscope); - relids = ojscope; + relids = bms_copy(ojscope); Assert(!pseudoconstant); } elseWith this way, I'm worrying that there are other places where we should
avoid sharing the same pointer to Bitmapset structure.Indeed.
I'm not sure how to discover all these places. Any thoughts?
At the very least I think we should add a mode to bitmapset.c mode where
every modification of a bitmapset reallocates, rather than just when the size
actually changes. Because we only reallocte and free in relatively uncommon
cases, particularly on 64bit systems, it's very easy to not find spots that
continue to use the input pointer to one of the modifying bms functions.A very hacky implementation of that indeed catches this bug with the existing
regression tests.The tests do *not* pass with just the attached applied, as the "Delete relid
without substitution" path has the same issue. With that also copying and all
the "reusing" bms* functions always reallocating, the tests pass - kinda.The kinda because there are callers to bms_(add|del)_members() that pass the
same bms as a and b, which only works if the reallocation happens "late".
+1,
Neat idea. I'm willing to work on this. Will propose the patch soon.
------
Regards,
Alexander Korotkov
On Wed, Nov 15, 2023 at 5:07 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Wed, Nov 15, 2023 at 8:02 AM Andres Freund <andres@anarazel.de> wrote:
The kinda because there are callers to bms_(add|del)_members() that pass the
same bms as a and b, which only works if the reallocation happens "late".+1,
Neat idea. I'm willing to work on this. Will propose the patch soon.
It's here. New REALLOCATE_BITMAPSETS forces bitmapset reallocation on
each modification. I also find it useful to add assert to all
bitmapset functions on argument NodeTag. This allows you to find
access to hanging pointers earlier.
I had the feeling of falling into a rabbit hole while debugging all
the cases of failure with this new option. With the second patch
regressions tests pass.
Any thoughts?
------
Regards,
Alexander Korotkov
Attachments:
0001-REALLOCATE_BITMAPSETS-manual-compile-time-option-v1.patchapplication/octet-stream; name=0001-REALLOCATE_BITMAPSETS-manual-compile-time-option-v1.patchDownload+138-15
0002-Make-regression-tests-pass-with-REALLOCATE_BITMAP-v1.patchapplication/octet-stream; name=0002-Make-regression-tests-pass-with-REALLOCATE_BITMAP-v1.patchDownload+6-7
On Sun, Nov 19, 2023 at 6:48 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Wed, Nov 15, 2023 at 5:07 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Wed, Nov 15, 2023 at 8:02 AM Andres Freund <andres@anarazel.de> wrote:
The kinda because there are callers to bms_(add|del)_members() that pass the
same bms as a and b, which only works if the reallocation happens "late".+1,
Neat idea. I'm willing to work on this. Will propose the patch soon.It's here. New REALLOCATE_BITMAPSETS forces bitmapset reallocation on
each modification. I also find it useful to add assert to all
bitmapset functions on argument NodeTag. This allows you to find
access to hanging pointers earlier.
Creating separate patches for REALLOCATE_BITMAPSETs and
Assert(ISA(Bitmapset)) will be easier to review. We will be able to
check whether all the places that require either of the fixes have
been indeed fixed and correctly. I kept switching back and forth.
I had the feeling of falling into a rabbit hole while debugging all
the cases of failure with this new option. With the second patch
regressions tests pass.
I think this will increase memory consumption when planning queries
with partitioned tables (100s or 1000s of partitions). Have you tried
measuring the impact?
We should take hit on memory consumption when there is correctness
involved but not all these cases look correctness problems. For
example. RelOptInfo::left_relids or SpecialJoinInfo::syn_lefthand may
not get modified after they are set. But just because
RelOptInfo::relids of a lower relation was assigned somewhere which
got modified, these two get modified. bms_copy() in
make_specialjoininfo may not be necessary. I haven't tried that myself
so I may be wrong.
What might be useful is to mark a bitmap as "final" once it's know
that it can not change. e.g. RelOptInfo->relids once set never
changes. Each operation that modifies a Bitmapset throws an
error/Asserts if it's marked as "final", thus catching the places
where we expect a Bitmapset being modified when not intended. This
will catch shared bitmapsets as well. We could apply bms_copy in only
those cases then.
--
Best Wishes,
Ashutosh Bapat
On Sun, Nov 19, 2023 at 9:17 AM Alexander Korotkov <aekorotkov@gmail.com>
wrote:
It's here. New REALLOCATE_BITMAPSETS forces bitmapset reallocation on
each modification.
+1 to the idea of introducing a reallocation mode to Bitmapset.
I had the feeling of falling into a rabbit hole while debugging all
the cases of failure with this new option. With the second patch
regressions tests pass.
It seems to me that we have always had situations where we share the
same pointer to a Bitmapset structure across different places. I do not
think this is a problem as long as we do not modify the Bitmapsets in a
way that requires reallocation or impact the locations sharing the same
pointer.
So I'm wondering, instead of attempting to avoid sharing pointer to
Bitmapset in all locations that have problems, can we simply bms_copy
the original Bitmapset within replace_relid() before making any
modifications, as I proposed previously? Of course, as Andres pointed
out, we need to do so also for the "Delete relid without substitution"
path. Please see the attached.
Thanks
Richard
Attachments:
v2-0001-Fix-how-SJE-replaces-join-clauses.patchapplication/octet-stream; name=v2-0001-Fix-how-SJE-replaces-join-clauses.patchDownload+29-4
On Thu, Nov 23, 2023 at 4:33 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Sun, Nov 19, 2023 at 9:17 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
It's here. New REALLOCATE_BITMAPSETS forces bitmapset reallocation on
each modification.+1 to the idea of introducing a reallocation mode to Bitmapset.
I had the feeling of falling into a rabbit hole while debugging all
the cases of failure with this new option. With the second patch
regressions tests pass.It seems to me that we have always had situations where we share the
same pointer to a Bitmapset structure across different places. I do not
think this is a problem as long as we do not modify the Bitmapsets in a
way that requires reallocation or impact the locations sharing the same
pointer.So I'm wondering, instead of attempting to avoid sharing pointer to
Bitmapset in all locations that have problems, can we simply bms_copy
the original Bitmapset within replace_relid() before making any
modifications, as I proposed previously? Of course, as Andres pointed
out, we need to do so also for the "Delete relid without substitution"
path. Please see the attached.
Yes, this makes sense. Thank you for the patch. My initial point was
that replace_relid() should either do in-place in all cases or make a
copy in all cases. Now I see that it should make a copy in all cases.
Note, that without making a copy in delete case, regression tests fail
with REALLOCATE_BITMAPSETS on.
Please, find the revised patchset. As Ashutosh Bapat asked, asserts
are split into separate patch.
------
Regards,
Alexander Korotkov
Attachments:
0003-Make-replace_relid-leave-argument-unmodified-v2.patchapplication/octet-stream; name=0003-Make-replace_relid-leave-argument-unmodified-v2.patchDownload+29-4
0001-Add-asserts-to-bimapset-manipulation-functions-v2.patchapplication/octet-stream; name=0001-Add-asserts-to-bimapset-manipulation-functions-v2.patchDownload+63-15
0002-REALLOCATE_BITMAPSETS-manual-compile-time-option-v2.patchapplication/octet-stream; name=0002-REALLOCATE_BITMAPSETS-manual-compile-time-option-v2.patchDownload+83-1
On Fri, Nov 24, 2023 at 3:54 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Thu, Nov 23, 2023 at 4:33 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Sun, Nov 19, 2023 at 9:17 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
It's here. New REALLOCATE_BITMAPSETS forces bitmapset reallocation on
each modification.+1 to the idea of introducing a reallocation mode to Bitmapset.
I had the feeling of falling into a rabbit hole while debugging all
the cases of failure with this new option. With the second patch
regressions tests pass.It seems to me that we have always had situations where we share the
same pointer to a Bitmapset structure across different places. I do not
think this is a problem as long as we do not modify the Bitmapsets in a
way that requires reallocation or impact the locations sharing the same
pointer.So I'm wondering, instead of attempting to avoid sharing pointer to
Bitmapset in all locations that have problems, can we simply bms_copy
the original Bitmapset within replace_relid() before making any
modifications, as I proposed previously? Of course, as Andres pointed
out, we need to do so also for the "Delete relid without substitution"
path. Please see the attached.Yes, this makes sense. Thank you for the patch. My initial point was
that replace_relid() should either do in-place in all cases or make a
copy in all cases. Now I see that it should make a copy in all cases.
Note, that without making a copy in delete case, regression tests fail
with REALLOCATE_BITMAPSETS on.Please, find the revised patchset. As Ashutosh Bapat asked, asserts
are split into separate patch.
Any objections to pushing this?
------
Regards,
Alexander Korotkov
On Mon, Nov 27, 2023 at 6:35 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Fri, Nov 24, 2023 at 3:54 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Thu, Nov 23, 2023 at 4:33 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Sun, Nov 19, 2023 at 9:17 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
It's here. New REALLOCATE_BITMAPSETS forces bitmapset reallocation on
each modification.+1 to the idea of introducing a reallocation mode to Bitmapset.
I had the feeling of falling into a rabbit hole while debugging all
the cases of failure with this new option. With the second patch
regressions tests pass.It seems to me that we have always had situations where we share the
same pointer to a Bitmapset structure across different places. I do not
think this is a problem as long as we do not modify the Bitmapsets in a
way that requires reallocation or impact the locations sharing the same
pointer.So I'm wondering, instead of attempting to avoid sharing pointer to
Bitmapset in all locations that have problems, can we simply bms_copy
the original Bitmapset within replace_relid() before making any
modifications, as I proposed previously? Of course, as Andres pointed
out, we need to do so also for the "Delete relid without substitution"
path. Please see the attached.Yes, this makes sense. Thank you for the patch. My initial point was
that replace_relid() should either do in-place in all cases or make a
copy in all cases. Now I see that it should make a copy in all cases.
Note, that without making a copy in delete case, regression tests fail
with REALLOCATE_BITMAPSETS on.Please, find the revised patchset. As Ashutosh Bapat asked, asserts
are split into separate patch.Any objections to pushing this?
Did we at least measure the memory impact?
How do we ensure that we are not making unnecessary copies of Bitmapsets?
--
Best Wishes,
Ashutosh Bapat
On 2023-11-27 11:29:48 +0530, Ashutosh Bapat wrote:
How do we ensure that we are not making unnecessary copies of Bitmapsets?
We don't - but that's not specific to this patch. Bitmapsets typically aren't
very large, I doubt that it's a significant proportion of the memory
usage. Adding refcounts or such would likely add more overhead than it'd save,
both in time and memory.
I am a bit worried about the maintainability of remove_rel_from_query() et
al. Is there any infrastructure for detecting that some PlannerInfo field that
needs updating wasn't updated? There's not even a note in PlannerInfo that
documents that that needs to happen.
On Mon, Nov 27, 2023 at 8:07 PM Andres Freund <andres@anarazel.de> wrote:
On 2023-11-27 11:29:48 +0530, Ashutosh Bapat wrote:
How do we ensure that we are not making unnecessary copies of Bitmapsets?
We don't - but that's not specific to this patch. Bitmapsets typically aren't
very large, I doubt that it's a significant proportion of the memory
usage. Adding refcounts or such would likely add more overhead than it'd save,
both in time and memory.I am a bit worried about the maintainability of remove_rel_from_query() et
al. Is there any infrastructure for detecting that some PlannerInfo field that
needs updating wasn't updated? There's not even a note in PlannerInfo that
documents that that needs to happen.
That makes sense, thank you. We need at least a comment about this.
I'll write a patch adding this comment.
BTW, what do you think about the patches upthread [1].
Links
1. /messages/by-id/CAPpHfdtLgCryACcrmLv=Koq9rAB3=tr5y9D84dGgvUhSCvjzjg@mail.gmail.com
------
Regards,
Alexander Korotkov
On 28/11/2023 01:37, Alexander Korotkov wrote:
On Mon, Nov 27, 2023 at 8:07 PM Andres Freund <andres@anarazel.de> wrote:
Sorry for the late answer, I missed this thread because of vacation.
On 2023-11-27 11:29:48 +0530, Ashutosh Bapat wrote:
How do we ensure that we are not making unnecessary copies of Bitmapsets?
We don't - but that's not specific to this patch. Bitmapsets typically aren't
very large, I doubt that it's a significant proportion of the memory
usage. Adding refcounts or such would likely add more overhead than it'd save,
both in time and memory.
I'd already clashed with Tom on copying the required_relids field and
voluntarily made unnecessary copies in the project [1]Asymmetric partition-wise JOIN /messages/by-id/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ@mail.gmail.com.
And ... stuck into huge memory consumption. The reason was in Bitmapsets:
When we have 1E3-1E4 partitions and try to reparameterize a join, one
bitmapset field can have a size of about 1kB. Having bitmapset
referencing Relation with a large index value, we had a lot of (for
example, 1E4 * 1kB) copies on each reparametrization of such a field.
Alexander Pyhalov should remember that case.
I don't claim we will certainly catch such an issue here, but it is a
reason why we should look at this option carefully.
I am a bit worried about the maintainability of remove_rel_from_query() et
al. Is there any infrastructure for detecting that some PlannerInfo field that
needs updating wasn't updated? There's not even a note in PlannerInfo that
documents that that needs to happen.
Thanks you for highlighting this issue.> That makes sense, thank you.
We need at least a comment about this.
I'll write a patch adding this comment.
BTW, what do you think about the patches upthread [1].
Links
1. /messages/by-id/CAPpHfdtLgCryACcrmLv=Koq9rAB3=tr5y9D84dGgvUhSCvjzjg@mail.gmail.com
0001 - Looks good and can be applied.
0002 - I am afraid the problems with expanded range table entries are
likewise described above. The patch makes sense, but it requires time to
reproduce corner cases. Maybe we can do it separately from the current
hotfix?
0003 - I think it is really what we need right now: SJE is quite a rare
optimization and executes before the entries expansion procedure. So it
looks less risky.
[1]: Asymmetric partition-wise JOIN /messages/by-id/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ@mail.gmail.com
/messages/by-id/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ@mail.gmail.com
--
regards,
Andrei Lepikhov
Postgres Professional
Andrei Lepikhov писал(а) 2023-12-08 07:37:
On 28/11/2023 01:37, Alexander Korotkov wrote:
On Mon, Nov 27, 2023 at 8:07 PM Andres Freund <andres@anarazel.de>
wrote:Sorry for the late answer, I missed this thread because of vacation.
On 2023-11-27 11:29:48 +0530, Ashutosh Bapat wrote:
How do we ensure that we are not making unnecessary copies of
Bitmapsets?We don't - but that's not specific to this patch. Bitmapsets
typically aren't
very large, I doubt that it's a significant proportion of the memory
usage. Adding refcounts or such would likely add more overhead than
it'd save,
both in time and memory.I'd already clashed with Tom on copying the required_relids field and
voluntarily made unnecessary copies in the project [1].
And ... stuck into huge memory consumption. The reason was in
Bitmapsets:
When we have 1E3-1E4 partitions and try to reparameterize a join, one
bitmapset field can have a size of about 1kB. Having bitmapset
referencing Relation with a large index value, we had a lot of (for
example, 1E4 * 1kB) copies on each reparametrization of such a field.
Alexander Pyhalov should remember that case.
Yes. If it matters, this happened during reparametrization when 2
partitioned tables with 1000 partitions each were joined. Then
asymmetric pw join managed to eat lots of memory for bitmapsets (by
lots of memory I mean all available on the test VM).
[1] Asymmetric partition-wise JOIN
/messages/by-id/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ@mail.gmail.com
--
Best regards,
Alexander Pyhalov,
Postgres Professional
On Fri, Dec 8, 2023 at 12:43 PM Alexander Pyhalov
<a.pyhalov@postgrespro.ru> wrote:
Andrei Lepikhov писал(а) 2023-12-08 07:37:
On 28/11/2023 01:37, Alexander Korotkov wrote:
On Mon, Nov 27, 2023 at 8:07 PM Andres Freund <andres@anarazel.de>
wrote:Sorry for the late answer, I missed this thread because of vacation.
On 2023-11-27 11:29:48 +0530, Ashutosh Bapat wrote:
How do we ensure that we are not making unnecessary copies of
Bitmapsets?We don't - but that's not specific to this patch. Bitmapsets
typically aren't
very large, I doubt that it's a significant proportion of the memory
usage. Adding refcounts or such would likely add more overhead than
it'd save,
both in time and memory.I'd already clashed with Tom on copying the required_relids field and
voluntarily made unnecessary copies in the project [1].
And ... stuck into huge memory consumption. The reason was in
Bitmapsets:
When we have 1E3-1E4 partitions and try to reparameterize a join, one
bitmapset field can have a size of about 1kB. Having bitmapset
referencing Relation with a large index value, we had a lot of (for
example, 1E4 * 1kB) copies on each reparametrization of such a field.
Alexander Pyhalov should remember that case.Yes. If it matters, this happened during reparametrization when 2
partitioned tables with 1000 partitions each were joined. Then
asymmetric pw join managed to eat lots of memory for bitmapsets (by
lots of memory I mean all available on the test VM).
I did some analysis of memory consumption by bitmapsets in such cases.
[1]: https://docs.google.com/presentation/d/1S9BiAADhX-Fv9tDbx5R5Izq4blAofhZMhHcO1c-wzfI/edit?usp=sharing
crude and quite WIP. But they will give some idea.
[1]: https://docs.google.com/presentation/d/1S9BiAADhX-Fv9tDbx5R5Izq4blAofhZMhHcO1c-wzfI/edit?usp=sharing
--
Best Wishes,
Ashutosh Bapat
Hi, Ashutosh!
On Fri, Dec 8, 2023 at 3:28 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
I did some analysis of memory consumption by bitmapsets in such cases.
[1] contains slides with the result of this analysis. The slides are
crude and quite WIP. But they will give some idea.[1] https://docs.google.com/presentation/d/1S9BiAADhX-Fv9tDbx5R5Izq4blAofhZMhHcO1c-wzfI/edit?usp=sharing
Thank you for sharing your analysis. I understand that usage of a
plain bitmap becomes a problem with a large number of partitions. But
I wonder what does "post proposed fixes" mean? Is it the fixes posted
in [1]. If so it's very surprising for me they are reducing the
memory footprint size.
Links.
1. /messages/by-id/CAPpHfdtLgCryACcrmLv=Koq9rAB3=tr5y9D84dGgvUhSCvjzjg@mail.gmail.com
------
Regards,
Alexander Korotkov
On Fri, Dec 8, 2023 at 3:13 PM Alexander Pyhalov <a.pyhalov@postgrespro.ru>
wrote:
Andrei Lepikhov писал(а) 2023-12-08 07:37:
I'd already clashed with Tom on copying the required_relids field and
voluntarily made unnecessary copies in the project [1].
And ... stuck into huge memory consumption. The reason was in
Bitmapsets:
When we have 1E3-1E4 partitions and try to reparameterize a join, one
bitmapset field can have a size of about 1kB. Having bitmapset
referencing Relation with a large index value, we had a lot of (for
example, 1E4 * 1kB) copies on each reparametrization of such a field.
Alexander Pyhalov should remember that case.Yes. If it matters, this happened during reparametrization when 2
partitioned tables with 1000 partitions each were joined. Then
asymmetric pw join managed to eat lots of memory for bitmapsets (by
lots of memory I mean all available on the test VM).
By reparametrization did you mean the work done in
reparameterize_path_by_child()? If so maybe you'd be interested in the
patch [1]/messages/by-id/CAMbWs48PBwe1YadzgKGW_ES=V9BZhq00BaZTOTM6Oye8n_cDNg@mail.gmail.com which postpones reparameterization of paths until createplan.c
and thus can help avoid unnecessary reparametrization work.
[1]: /messages/by-id/CAMbWs48PBwe1YadzgKGW_ES=V9BZhq00BaZTOTM6Oye8n_cDNg@mail.gmail.com
/messages/by-id/CAMbWs48PBwe1YadzgKGW_ES=V9BZhq00BaZTOTM6Oye8n_cDNg@mail.gmail.com
Thanks
Richard
On 11/12/2023 09:31, Richard Guo wrote:
On Fri, Dec 8, 2023 at 3:13 PM Alexander Pyhalov
<a.pyhalov@postgrespro.ru <mailto:a.pyhalov@postgrespro.ru>> wrote:
Andrei Lepikhov писал(а) 2023-12-08 07:37:I'd already clashed with Tom on copying the required_relids field
and
voluntarily made unnecessary copies in the project [1].
And ... stuck into huge memory consumption. The reason was in
Bitmapsets:
When we have 1E3-1E4 partitions and try to reparameterize a join,one
bitmapset field can have a size of about 1kB. Having bitmapset
referencing Relation with a large index value, we had a lot of (for
example, 1E4 * 1kB) copies on each reparametrization of such afield.
Alexander Pyhalov should remember that case.
Yes. If it matters, this happened during reparametrization when 2
partitioned tables with 1000 partitions each were joined. Then
asymmetric pw join managed to eat lots of memory for bitmapsets (by
lots of memory I mean all available on the test VM).
By reparametrization did you mean the work done in
reparameterize_path_by_child()? If so maybe you'd be interested in the
patch [1] which postpones reparameterization of paths until createplan.c
and thus can help avoid unnecessary reparametrization work.
Yeah, I have discovered it already. It is a promising solution and only
needs a bit more review. But here, I embraced some corner cases with the
idea that we may not see other cases right now. And also, sometimes the
Bitmapset field is significant - it is not a corner case.
--
regards,
Andrei Lepikhov
Postgres Professional