Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

Started by Ashutosh Bapatover 2 years ago58 messages
Jump to latest
#1Ashutosh Bapat
ashutosh.bapat@enterprisedb.com

Hi All,

Following up on [1]/messages/by-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com ...

A restrictlist is a list of RestrictInfo nodes, each representing one
clause, applicable to a set of joins. When partitionwise join is used
as a strategy, the restrictlists for child join are obtained by
translating the restrictlists for the parent in
try_partitionwise_join(). That function is called once for every join
order. E.g. when computing join ABC, it will be called for planning
joins AB, BC, AC, (AB)C, A(BC), B(AC). Every time it is called it will
translate the given parent restrictlist. This means that a
RestrictInfo node applicable to given child relations will be
translated as many times as the join orders in which those child
relations appear in different joining relations.

For example, consider a query "select * from A, B, C where A.a = B.a
and B.a = C.a" where A, B and C are partitioned tables. A has
partitions A1, A2, ... An. B has partitions B1, B2, ... Bn and C has
partitions C1, C2, ... Cn. Partitions Ai, Bi and Ci are matching
partitions respectively for all i's. The join ABC is computed as
Append of (A1B1C1, A2B2C2, ... AnBnCn). The clause A.a = B.a is
translated to A1.a = B1.a thrice, when computing A1B1, A1(B1C1) and
B1(A1C1) respectively. Similarly other clauses are translated multiple
times. Some extra translations also happen in
reparameterize_path_by_child().

These translations consume memory which remains allocated till the
statement finishes. A ResrtictInfo should be translated only once per
parent-child pair, thus avoiding consuming extra memory.

There are two patches attached
0001 - to measure memory consumption during planning. This is the same
one as attached to [1]/messages/by-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com.
0002 - WIP patch to avoid repeated translations of RestrictInfo.
The WIP patch avoids repeated translations by tracking the child for
which a RestrictInfo is translated and reusing the same translation
every time it is requested. In order to track the translations,
RestrictInfo gets two new members.
1. parent_rinfo - In a child's RestrictInfo this points to the
RestrictInfo applicable to the topmost parent in partition hierarchy.
This is NULL in the topmost parent's RestrictInfo
2. child_rinfos - In a parent's RestrictInfo, this is a list that
contains all the translated child RestrictInfos. In child
RestrictInfos this is NULL.

Every translated RestrictInfo is stored in the top parent's
RestrictInfo child_rinfos. RestrictInfo::required_relids is used as a
key to search a given translation. I have intercepted
adjust_appendrel_attrs_mutator() to track translations as well as
avoid multiple translations. It first looks for an existing
translation when translating a RestrictInfo and creates a new one only
when one doesn't exist already.

Using this patch the memory consumption for the above query reduces as follows

Number of partitions: 1000

Number of tables | without patch | with patch | % reduction |
being joined | | | |
--------------------------------------------------------------
2 | 40.3 MiB | 37.7 MiB | 6.43% |
3 | 146.8 MiB | 133.0 MiB | 9.42% |
4 | 445.4 MiB | 389.5 MiB | 12.57% |
5 | 1563.2 MiB | 1243.2 MiB | 20.47% |

The number of times a RestrictInfo requires to be translated increases
exponentially with the number of tables joined. Thus we see more
memory saved as the number of tables joined increases. When two tables
are joined there's only a single join planned so no extra translations
happen in try_partitionwise_join(). The memory saved in case of 2
joining tables comes from avoiding extra translations happening during
reparameterization of paths (in reparameterize_path_by_child()).

The attached patch is to show how much memory can be saved if we avoid
extra translation. But I want to discuss the following things about
the approach.

1. The patch uses RestrictInfo::required_relids as the key for
searching child RelOptInfos. I am not sure which of the two viz.
required_relids and clause_relids is a better key. required_relids
seems to be a subset of clause_relids and from the description it
looks like that's the set that decides the applicability of a clause
in a join. But clause_relids is obtained from all the Vars that appear
in the clause, so may be that's the one that matters for the
translations. Can somebody guide me?

2. The patch adds two extra pointers per RestrictInfo. They will
remain unused when partitionwise join is not used. Right now, I do not
see any increase in memory consumed by planner because of those
pointers even in case of unpartitioned tables; maybe they are absorbed
in memory alignment. They may show up as extra memory in the future. I
am wondering whether we can instead save and track translations in
PlannerInfo as a hash table using <rinfo_serial, required_relids (or
whatever is the answer to above question) of parent and child
respectively> as key. That will just add one extra pointer in
PlannerInfo when partitionwise join is not used. Please let me know
your suggestions.

3. I have changed adjust_appendrel_attrs_mutator() to return a
translated RestrictInfo if it already exists. IOW, it won't always
return a deep copy of given RestrictInfo as it does today. This can be
fixed by writing wrappers around adjust_appendrel_attrs() to translate
RestrictInfo specifically. But maybe we don't always need deep copies.
Are there any cases when we need translated deep copies of
RestrictInfo? Those cases will require fixing callers of
adjust_appendrel_attrs() instead of the mutator.

4. IIRC, when partitionwise join was implemented we had discussed
creating child RestrictInfos using a login similar to
build_joinrel_restrictlist(). That might be another way to build
RestrictInfo only once and use it multiple times. But we felt that it
was much harder problem to solve since it's not known which partitions
from joining partitioned tables will match and will be joined till we
enter try_partitionwise_join(), so the required RestrictInfos may not
be available in RelOptInfo::joininfo. Let me know your thoughts on
this.

Comments/suggestions welcome.

references
[1]: /messages/by-id/CAExHW5stmOUobE55pMt83r8UxvfCph+Pvo5dNpdrVCsBgXEzDQ@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Attachments:

setup.sqlapplication/sql; name=setup.sqlDownload
queries.sqlapplication/sql; name=queries.sqlDownload
0001-Report-memory-used-for-planning-a-query-in--20230718.patchapplication/x-patch; name=0001-Report-memory-used-for-planning-a-query-in--20230718.patchDownload+38-5
0002-Avoid-translating-RestrictInfo-repeatedly-20230718.patchapplication/x-patch; name=0002-Avoid-translating-RestrictInfo-repeatedly-20230718.patchDownload+55-2
#2Richard Guo
guofenglinux@gmail.com
In reply to: Ashutosh Bapat (#1)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

On Thu, Jul 27, 2023 at 10:06 PM Ashutosh Bapat <
ashutosh.bapat.oss@gmail.com> wrote:

0002 - WIP patch to avoid repeated translations of RestrictInfo.
The WIP patch avoids repeated translations by tracking the child for
which a RestrictInfo is translated and reusing the same translation
every time it is requested. In order to track the translations,
RestrictInfo gets two new members.
1. parent_rinfo - In a child's RestrictInfo this points to the
RestrictInfo applicable to the topmost parent in partition hierarchy.
This is NULL in the topmost parent's RestrictInfo
2. child_rinfos - In a parent's RestrictInfo, this is a list that
contains all the translated child RestrictInfos. In child
RestrictInfos this is NULL.

I haven't looked into the details but with 0002 patch I came across a
crash while planning the query below.

regression=# set enable_partitionwise_join to on;
SET
regression=# EXPLAIN (COSTS OFF)
SELECT * FROM prt1 t1, prt2 t2
WHERE t1.a = t2.b AND t1.a < 450 AND t2.b > 250 AND t1.b = 0;
server closed the connection unexpectedly

Thanks
Richard

#3Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Richard Guo (#2)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

On Tue, Aug 8, 2023 at 4:08 PM Richard Guo <guofenglinux@gmail.com> wrote:

I haven't looked into the details but with 0002 patch I came across a
crash while planning the query below.

regression=# set enable_partitionwise_join to on;
SET
regression=# EXPLAIN (COSTS OFF)
SELECT * FROM prt1 t1, prt2 t2
WHERE t1.a = t2.b AND t1.a < 450 AND t2.b > 250 AND t1.b = 0;
server closed the connection unexpectedly

Thanks for the report. PFA the patches with this issue fixed. There is
another issue seen when running partition_join.sql "ERROR: index key
does not match expected index column". I will investigate that issue.

Right now I am looking at the 4th point in my first email in this
thread. [1]/messages/by-id/CAExHW5s=bCLMMq8n_bN6iU+Pjau0DS3z_6Dn6iLE69ESmsPMJQ@mail.gmail.com. I am trying to figure whether that approach would work.
If that approach doesn't work what's the best to track the
translations and also figure out answers to 1, 2, 3 there. Please let
me know your opinions if any.

--
Best Wishes,
Ashutosh Bapat

[1]: /messages/by-id/CAExHW5s=bCLMMq8n_bN6iU+Pjau0DS3z_6Dn6iLE69ESmsPMJQ@mail.gmail.com

Attachments:

0001-Report-memory-used-for-planning-a-query-in--20230808.patchtext/x-patch; charset=US-ASCII; name=0001-Report-memory-used-for-planning-a-query-in--20230808.patchDownload+49-9
0002-Avoid-translating-RestrictInfo-repeatedly-20230808.patchtext/x-patch; charset=US-ASCII; name=0002-Avoid-translating-RestrictInfo-repeatedly-20230808.patchDownload+51-2
#4David Rowley
dgrowleyml@gmail.com
In reply to: Ashutosh Bapat (#1)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

On Fri, 28 Jul 2023 at 02:06, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

0001 - to measure memory consumption during planning. This is the same
one as attached to [1].

I see you're recording the difference in the CurrentMemoryContext of
palloc'd memory before and after planning. That won't really alert us
to problems if the planner briefly allocates, say 12GBs of memory
during, say the join search then quickly pfree's it again. unless
it's an oversized chunk, aset.c won't free() any memory until
MemoryContextReset(). Chunks just go onto a freelist for reuse later.
So at the end of planning, the context may still have that 12GBs
malloc'd, yet your new EXPLAIN ANALYZE property might end up just
reporting a tiny fraction of that.

I wonder if it would be more useful to just have ExplainOneQuery()
create a new memory context and switch to that and just report the
context's mem_allocated at the end.

It's also slightly annoying that these planner-related summary outputs
are linked to EXPLAIN ANALYZE. We could be showing them in EXPLAIN
without ANALYZE. If we were to change that now, it might be a bit
annoying for the regression tests as we'd need to go and add SUMMARY
OFF in a load of places...

drowley@amd3990x:~/pg_src/src/test/regress/sql$ git grep -i "costs off" | wc -l
1592

hmm, that would cause a bit of churn... :-(

David

#5Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: David Rowley (#4)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

On Wed, Aug 9, 2023 at 10:12 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 28 Jul 2023 at 02:06, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

0001 - to measure memory consumption during planning. This is the same
one as attached to [1].

I have started a separate thread to discuss this patch. I am taking
this discussion to that thread.

--
Best Wishes,
Ashutosh Bapat

#6Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#1)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

I spent some time on 4th point below but also looked at other points.
Here's what I have found so far

On Thu, Jul 27, 2023 at 7:35 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

1. The patch uses RestrictInfo::required_relids as the key for
searching child RelOptInfos. I am not sure which of the two viz.
required_relids and clause_relids is a better key. required_relids
seems to be a subset of clause_relids and from the description it
looks like that's the set that decides the applicability of a clause
in a join. But clause_relids is obtained from all the Vars that appear
in the clause, so may be that's the one that matters for the
translations. Can somebody guide me?

I was wrong that required_relids is subset of clause_relids. The first
can contain OJ relids which the other can not. OJ relids do not have
any children, so they won't be translated. So clause_relids seems to
be a better key. I haven't made a decision yet.

2. The patch adds two extra pointers per RestrictInfo. They will
remain unused when partitionwise join is not used. Right now, I do not
see any increase in memory consumed by planner because of those
pointers even in case of unpartitioned tables; maybe they are absorbed
in memory alignment. They may show up as extra memory in the future. I
am wondering whether we can instead save and track translations in
PlannerInfo as a hash table using <rinfo_serial, required_relids (or
whatever is the answer to above question) of parent and child
respectively> as key. That will just add one extra pointer in
PlannerInfo when partitionwise join is not used. Please let me know
your suggestions.

I will go ahead with a pointer in PlannerInfo for now.

3. I have changed adjust_appendrel_attrs_mutator() to return a
translated RestrictInfo if it already exists. IOW, it won't always
return a deep copy of given RestrictInfo as it does today. This can be
fixed by writing wrappers around adjust_appendrel_attrs() to translate
RestrictInfo specifically. But maybe we don't always need deep copies.
Are there any cases when we need translated deep copies of
RestrictInfo? Those cases will require fixing callers of
adjust_appendrel_attrs() instead of the mutator.

I think it's better to handle the tracking logic outside
adjust_appendrel_attrs. That will be some code churn but it will be
cleaner and won't affect anything other that partitionwise joins.

4. IIRC, when partitionwise join was implemented we had discussed
creating child RestrictInfos using a login similar to
build_joinrel_restrictlist(). That might be another way to build
RestrictInfo only once and use it multiple times. But we felt that it
was much harder problem to solve since it's not known which partitions
from joining partitioned tables will match and will be joined till we
enter try_partitionwise_join(), so the required RestrictInfos may not
be available in RelOptInfo::joininfo. Let me know your thoughts on
this.

Here's some lengthy description of why I feel translations are better
compared to computing restrictlist and joininfo for a child join from
joining relation's joininfo
Consider a query "select * from p, q, r where p.c1 = q.c1 and q.c1 =
r.c1 and p.c2 + q.c2 < r.c2 and p.c3 != q.c3 and q.c3 != r.c3". The
query has following clauses
1. p.c1 = q.c1
2. q.c1 = r.c1
3. p.c2 + q.c2 < r.c2
4. p.c3 != q.c3
5. q.c3 != r.c3

The first two clauses are added to EC machinery and do not appear in
joininfo. They appear in restrictlist when we construct clauses in
restrictlist from ECs. Let's ignore them for now.

Assume that each table p, q, r has partitions (p1, p2, ...), (q1, q2,
...) and (r1, r2, ... ) respectively. Each triplet (pi, qi,ri) forms
the set of matching partitions from p, q, r respectively for all "i".
Consider join, p1q1r1. We will generate relations p1, q1, r1, p1q1,
p1r1, q1r1 and p1q1r1 while building the last join. Below is
description of how these clauses would look in each of these relations
and the list they appear in when computing that join. Please notice
the numeric suffixes carefully.

p1.
joininfo: p1.c2 + q.c2 < r.c2, p1.c3 != q.c3
restrictlist: <>

q1
joininfo: p.c2 + q1.c2 < r.c2, p.c3 != q1.c3, q1.c3 != r.c3
restrictlist: <>

r1
joininfo: p.c2 + q.c2 < r1.c2, q.c3 != r1.c3
restrictlist: <>

p1q1
joininfo: p1.c2 + q1.c2 < r.c2, q1.c3 != r.c3
restrictlist: p1.c3 != q1.c3

q1r1
joininfo: p.c2 + q1.c2 < r1.c2, p.c3 != q1.c3
restrictlist: q1.c3 != r1.c3

p1r1
joininfo: p1.c2 + q.c2 < r1.c2, p1.c3 != q.c3, q.c3 != r1.c3
restrictlist: <>

p1q1r1
joininfo: <>
restrictlist for (p1q1)r1: p1.c2 + q1.c2 < r1.c2, q1.c3 != r1.c3
restrictlist for (p1r1)q1: p1.c2 + q1.c2 < r1.c2, p1.c3 != q1.c3, q1.c3 != r1.c3
restrictlist for p1(q1r1): p1.c2 + q1.c2 < r1.c2, p1.c3 != q1.c3

If we translate the clauses when building join e.g. translate p1.c3 !=
q1.c3 when building p1q1 or p1q1r1, it would cause repeated
translations. So the translations need to be saved in lower relations
when we detect matching partitions and then use these translations.
Something I have done in the attached patches. But the problem is the
same clause reaches its final translation through different
intermediate translations as the join search advances. E.g. the
evolution of p.c2 + q.c2 < r.c2 to p1.c2 + q1.c2 < r1.c2 has three
different intemediate translations at second level of join. Each of
these intermediate translations conflict with each other and none of
them can be saved in any of the second level joins as a candidate for
the last stage translation. Extending the logic in the patches would
make those more complicated.

Another possibility is to avoid the same clause being translated
multiple times when building the join using
RestrictInfo::rinfo_serial. But simply that won't help avoiding
repeated translations caused by different join orders. E.g. we won't
be able to detect that p.c2 + q.c2 < r.c2 has been translated to p1.c2
+ q1.c2 < r1.c2 already when we computed (p1r1)q1 or p1(q1r1) or
(p1q1)r1 whichever was computed earlier. For that we need some
tracking outside the join relations themselves like I did in my first
patch.

Coming back to the problem of generating child restrictlist clauses
from equivalence classes, I think it's easier with some tweaks to pass
child relids down to the minions when dealing with child joins. It
seems to be working as is but I haven't tested it thoroughly.

Obtaining child clauses from parent clauses by translation and
tracking the translations is less complex and may be more efficient
too. I will post a patch on those lines soon.

--
Best Wishes,
Ashutosh Bapat

Attachments:

0003-Construct-child-join-s-joininfo-and-restric-20230811.patchtext/x-patch; charset=US-ASCII; name=0003-Construct-child-join-s-joininfo-and-restric-20230811.patchDownload+253-35
#7Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#6)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

Hi All,

On Fri, Aug 11, 2023 at 6:24 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Obtaining child clauses from parent clauses by translation and
tracking the translations is less complex and may be more efficient
too. I will post a patch on those lines soon.

PFA patch set to add infrastructure to track RestrictInfo translations
and reuse them.

PlannerInfo gets a new member "struct HTAB *child_rinfo_hash" which is
a hash table of hash tables keyed by RestrictInfo::rinfo_serial. Each
element in the array is a hash table of RestrictInfos keyed by
RestrictInfo::required_relids as explained in my previous email. When
building child clauses when a. building child join rels or b. when
reparameterizing paths, we first access the first level hash table
using RestrictInfo::rinfo_serial of the parent and search the required
translation by computing the child RestrictInfo::required_relids
obtained by translating RestrictInfo::required_relids of the parent
RestrictInfo. If the translation doesn't exist, we create one and add
to the hash table.

RestrictInfo::required_relids is same for a RestrictInfo and its
commuted RestrictInfo. The order of operands is important for
IndexClauses. Hence we track the commuted RestrictInfo in a new field
RestrictInfo::comm_rinfo. RestrictInfo::is_commuted differentiates
between a RestrictInfo and its commuted version. This is explained as
a comment in the patch. This scheme has a minor benefit of saving
memory when the same RestrictInfo is commuted multiple times.

Hash table of hash table is used instead of an array of hash tables
since a. not every rinfo_serial has a RestrictInfo associated with it
b. not every RestrictInfo has translations, c. I don't think the exact
size of this array is not known till the planning ends since we
continue to create new clauses as we create new RelOptInfos. Of
course, an array can be repalloc'ed and unused slots in the array may
not waste a lot of memory. I am open to change hash table to an array
which may be more efficient.

With these set of patches, the memory consumption stands as below
Number of tables | without patch | with patch | % reduction |
being joined | | | |
--------------------------------------------------------------
2 | 40.8 MiB | 37.4 MiB | 8.46% |
3 | 151.6 MiB | 135.0 MiB | 10.96% |
4 | 464.0 MiB | 403.6 MiB | 12.00% |
5 | 1663.9 MiB | 1329.1 MiB | 20.12% |

The patch set is thus
0001 - patch used to measure memory used during planning

0002 - Patch to free intermediate Relids computed by
adjust_child_relids_multilevel(). I didn't test memory consumption for
multi-level partitioning. But this is clear improvement. In that
function we free the AppendInfos array which as many pointers long as
the number of relids. So it doesn't make sense not to free the Relids
which can be {largest relid}/8 bytes long at least.

0003 - patch to save and reuse commuted RestrictInfo. This patch by
itself shows a small memory saving (3%) in the query below where the
same clause is commuted twice. The query does not contain any
partitioned tables.
create table t2 (a int primary key, b int, c int);
create index t2_a_b on t2(a, b);
select * from t2 where 10 = a
Memory used without patch: 13696 bytes
Memory used with patch: 13264 bytes

0004 - Patch which implements the hash table of hash table described
above and also code to avoid repeated RestrictInfo list translations.

I will add this patchset to next commitfest.

--
Best Wishes,
Ashutosh Bapat

Attachments:

0002-Free-intermediate-Relids-created-in-adjust_-20230914.patchtext/x-patch; charset=US-ASCII; name=0002-Free-intermediate-Relids-created-in-adjust_-20230914.patchDownload+12-6
0001-Report-memory-used-for-planning-a-query-in--20230914.patchtext/x-patch; charset=US-ASCII; name=0001-Report-memory-used-for-planning-a-query-in--20230914.patchDownload+49-9
0003-Save-commuted-RestrictInfo-for-later-use-20230914.patchtext/x-patch; charset=US-ASCII; name=0003-Save-commuted-RestrictInfo-for-later-use-20230914.patchDownload+36-2
0004-Avoid-translating-RestrictInfo-repeatedly-20230914.patchtext/x-patch; charset=US-ASCII; name=0004-Avoid-translating-RestrictInfo-repeatedly-20230914.patchDownload+347-22
#8Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#7)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

On Thu, Sep 14, 2023 at 4:39 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

The patch set is thus
0001 - patch used to measure memory used during planning

0002 - Patch to free intermediate Relids computed by
adjust_child_relids_multilevel(). I didn't test memory consumption for
multi-level partitioning. But this is clear improvement. In that
function we free the AppendInfos array which as many pointers long as
the number of relids. So it doesn't make sense not to free the Relids
which can be {largest relid}/8 bytes long at least.

0003 - patch to save and reuse commuted RestrictInfo. This patch by
itself shows a small memory saving (3%) in the query below where the
same clause is commuted twice. The query does not contain any
partitioned tables.
create table t2 (a int primary key, b int, c int);
create index t2_a_b on t2(a, b);
select * from t2 where 10 = a
Memory used without patch: 13696 bytes
Memory used with patch: 13264 bytes

0004 - Patch which implements the hash table of hash table described
above and also code to avoid repeated RestrictInfo list translations.

I will add this patchset to next commitfest.

--
Best Wishes,
Ashutosh Bapat

PFA rebased patches. Nothing changes in 0002, 0003 and 0004. 0001 is
the squashed version of the latest patch set at
/messages/by-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1cSrjcmyYj8Pduuw@mail.gmail.com.

--
Best Wishes,
Ashutosh Bapat

Attachments:

0002-Free-intermediate-Relids-created-in-adjust_-20231031.patchtext/x-patch; charset=US-ASCII; name=0002-Free-intermediate-Relids-created-in-adjust_-20231031.patchDownload+12-6
0003-Save-commuted-RestrictInfo-for-later-use-20231031.patchtext/x-patch; charset=US-ASCII; name=0003-Save-commuted-RestrictInfo-for-later-use-20231031.patchDownload+36-2
0001-Report-memory-used-for-planning-a-query-in--20231031.patchtext/x-patch; charset=US-ASCII; name=0001-Report-memory-used-for-planning-a-query-in--20231031.patchDownload+148-9
0004-Avoid-translating-RestrictInfo-repeatedly-20231031.patchtext/x-patch; charset=US-ASCII; name=0004-Avoid-translating-RestrictInfo-repeatedly-20231031.patchDownload+347-22
#9Alena Rybakina
lena.ribackina@yandex.ru
In reply to: Ashutosh Bapat (#8)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

Hi! Thank you for your work on the subject, I think it's a really useful
feature.

I've reviewed your patch and I have a few questions.

First of all, have you thought about creating a gun parameter to display
memory scheduling information? I agree that this is an important
feature, but I think only for debugging.

Secondly, I noticed that for the child_rinfo_hash key you use a counter
(int) and can it lead to collisions? Why didn't you generate a hash from
childRestrictInfo for this? For example, something like how it is formed
here [0]/messages/by-id/43ad8a48-b980-410d-a83c-5beebf82a4ed@postgrespro.ru.

[0]: /messages/by-id/43ad8a48-b980-410d-a83c-5beebf82a4ed@postgrespro.ru
/messages/by-id/43ad8a48-b980-410d-a83c-5beebf82a4ed@postgrespro.ru

--
Regards,
Alena Rybakina
Postgres Professional

#10Alena Rybakina
lena.ribackina@yandex.ru
In reply to: Alena Rybakina (#9)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

On 24.11.2023 13:20, Alena Rybakina wrote:

Hi! Thank you for your work on the subject, I think it's a really
useful feature.

I've reviewed your patch and I have a few questions.

First of all, have you thought about creating a gun parameter to
display memory scheduling information? I agree that this is an
important feature, but I think only for debugging.

Secondly, I noticed that for the child_rinfo_hash key you use a
counter (int) and can it lead to collisions? Why didn't you generate a
hash from childRestrictInfo for this? For example, something like how
it is formed here [0].

[0]
/messages/by-id/43ad8a48-b980-410d-a83c-5beebf82a4ed@postgrespro.ru

Sorry, my first question was not clear, I mean: have you thought about
creating a guc parameter to display memory planning information?

--
Regards,
Alena Rybakina

#11Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alena Rybakina (#10)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

On Fri, Nov 24, 2023 at 3:56 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote:

On 24.11.2023 13:20, Alena Rybakina wrote:

Hi! Thank you for your work on the subject, I think it's a really useful feature.

I've reviewed your patch and I have a few questions.

First of all, have you thought about creating a gun parameter to display memory scheduling information? I agree that this is an important feature, but I think only for debugging.

Not a GUC parameter but I have a proposal to use EXPLAIN for the same. [1]https://commitfest.postgresql.org/45/4492/

Secondly, I noticed that for the child_rinfo_hash key you use a counter (int) and can it lead to collisions? Why didn't you generate a hash from childRestrictInfo for this? For example, something like how it is formed here [0].

Not usually. But that's the only "key" we have to access a set of
sematically same RestrictInfos. Relids is another key to access the
exact RestrictInfo. A child RestrictInfo can not be used since there
will many child RestrictInfos. Similar parent RestrictInfo can not be
used since there will be multiple forms of the same RestrictInfo.

[1]: https://commitfest.postgresql.org/45/4492/

--
Best Wishes,
Ashutosh Bapat

#12vignesh C
vignesh21@gmail.com
In reply to: Ashutosh Bapat (#8)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

On Tue, 31 Oct 2023 at 10:48, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Thu, Sep 14, 2023 at 4:39 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

The patch set is thus
0001 - patch used to measure memory used during planning

0002 - Patch to free intermediate Relids computed by
adjust_child_relids_multilevel(). I didn't test memory consumption for
multi-level partitioning. But this is clear improvement. In that
function we free the AppendInfos array which as many pointers long as
the number of relids. So it doesn't make sense not to free the Relids
which can be {largest relid}/8 bytes long at least.

0003 - patch to save and reuse commuted RestrictInfo. This patch by
itself shows a small memory saving (3%) in the query below where the
same clause is commuted twice. The query does not contain any
partitioned tables.
create table t2 (a int primary key, b int, c int);
create index t2_a_b on t2(a, b);
select * from t2 where 10 = a
Memory used without patch: 13696 bytes
Memory used with patch: 13264 bytes

0004 - Patch which implements the hash table of hash table described
above and also code to avoid repeated RestrictInfo list translations.

I will add this patchset to next commitfest.

--
Best Wishes,
Ashutosh Bapat

PFA rebased patches. Nothing changes in 0002, 0003 and 0004. 0001 is
the squashed version of the latest patch set at
/messages/by-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1cSrjcmyYj8Pduuw@mail.gmail.com.

CFBot shows that the patch does not apply anymore as in [1]http://cfbot.cputube.org/patch_46_4564.log:
=== Applying patches on top of PostgreSQL commit ID
7014c9a4bba2d1b67d60687afb5b2091c1d07f73 ===
=== applying patch
./0001-Report-memory-used-for-planning-a-query-in--20231031.patch
...
patching file src/test/regress/expected/explain.out
Hunk #5 FAILED at 290.
Hunk #6 succeeded at 545 (offset 4 lines).
1 out of 6 hunks FAILED -- saving rejects to file
src/test/regress/expected/explain.out.rej
patching file src/tools/pgindent/typedefs.list
Hunk #1 succeeded at 1562 (offset 18 lines).

Please post an updated version for the same.

[1]: http://cfbot.cputube.org/patch_46_4564.log

Regards,
Vignesh

#13Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: vignesh C (#12)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

On Sat, Jan 27, 2024 at 8:26 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, 31 Oct 2023 at 10:48, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Thu, Sep 14, 2023 at 4:39 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

The patch set is thus
0001 - patch used to measure memory used during planning

0002 - Patch to free intermediate Relids computed by
adjust_child_relids_multilevel(). I didn't test memory consumption for
multi-level partitioning. But this is clear improvement. In that
function we free the AppendInfos array which as many pointers long as
the number of relids. So it doesn't make sense not to free the Relids
which can be {largest relid}/8 bytes long at least.

0003 - patch to save and reuse commuted RestrictInfo. This patch by
itself shows a small memory saving (3%) in the query below where the
same clause is commuted twice. The query does not contain any
partitioned tables.
create table t2 (a int primary key, b int, c int);
create index t2_a_b on t2(a, b);
select * from t2 where 10 = a
Memory used without patch: 13696 bytes
Memory used with patch: 13264 bytes

0004 - Patch which implements the hash table of hash table described
above and also code to avoid repeated RestrictInfo list translations.

I will add this patchset to next commitfest.

--
Best Wishes,
Ashutosh Bapat

PFA rebased patches. Nothing changes in 0002, 0003 and 0004. 0001 is
the squashed version of the latest patch set at
/messages/by-id/CAExHW5sCJX7696sF-OnugAiaXS=Ag95=-m1cSrjcmyYj8Pduuw@mail.gmail.com.

CFBot shows that the patch does not apply anymore as in [1]:
=== Applying patches on top of PostgreSQL commit ID
7014c9a4bba2d1b67d60687afb5b2091c1d07f73 ===
=== applying patch
./0001-Report-memory-used-for-planning-a-query-in--20231031.patch
...
patching file src/test/regress/expected/explain.out
Hunk #5 FAILED at 290.
Hunk #6 succeeded at 545 (offset 4 lines).
1 out of 6 hunks FAILED -- saving rejects to file
src/test/regress/expected/explain.out.rej
patching file src/tools/pgindent/typedefs.list
Hunk #1 succeeded at 1562 (offset 18 lines).

Please post an updated version for the same.

[1] - http://cfbot.cputube.org/patch_46_4564.log

Regards,
Vignesh

Thanks Vignesh for the notification. PFA rebased patches. 0001 in
earlier patch-set is now removed.

--
Best Wishes,
Ashutosh Bapat

Attachments:

0002-Save-commuted-RestrictInfo-for-later-use-20240130.patchtext/x-patch; charset=US-ASCII; name=0002-Save-commuted-RestrictInfo-for-later-use-20240130.patchDownload+36-2
0001-Free-intermediate-Relids-created-in-adjust_-20240130.patchtext/x-patch; charset=US-ASCII; name=0001-Free-intermediate-Relids-created-in-adjust_-20240130.patchDownload+12-6
0003-Avoid-translating-RestrictInfo-repeatedly-20240130.patchtext/x-patch; charset=US-ASCII; name=0003-Avoid-translating-RestrictInfo-repeatedly-20240130.patchDownload+347-22
#14Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Ashutosh Bapat (#13)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

Hi,

After taking a look at the patch optimizing SpecialJoinInfo allocations,
I decided to take a quick look at this one too. I don't have any
specific comments on the code yet, but it seems quite a bit more complex
than the other patch ... it's introducing a HTAB into the optimizer,
surely that has costs too.

I started by doing the same test as with the other patch, comparing
master to the two patches (applied independently and both). And I got
about this (in MB):

tables master sjinfo rinfo both
-----------------------------------------------
2 37 36 34 33
3 138 129 122 113
4 421 376 363 318
5 1495 1254 1172 931

Unlike the SpecialJoinInfo patch, I haven't seen any reduction in
planning time for this one.

The reduction in allocated memory is nice. I wonder what's allocating
the remaining memory, and we'd have to do to reduce it further.

However, this is a somewhat extreme example - it's joining 5 tables,
each with 1000 partitions, using a partition-wise join. It reduces the
amount of memory, but the planning time is still quite high (and
essentially the same as without the patch). So it's not like it'd make
them significantly more practical ... do we have any other ideas/plans
how to improve that?

AFAIK we don't expect this to improve "regular" cases with modest number
of tables / partitions, etc. But could it make them slower in some case?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Andrei Lepikhov
lepihov@gmail.com
In reply to: Tomas Vondra (#14)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

On 19/2/2024 06:05, Tomas Vondra wrote:

However, this is a somewhat extreme example - it's joining 5 tables,
each with 1000 partitions, using a partition-wise join. It reduces the
amount of memory, but the planning time is still quite high (and
essentially the same as without the patch). So it's not like it'd make
them significantly more practical ... do we have any other ideas/plans
how to improve that?

The planner principle of cleaning up all allocated structures after the
optimization stage simplifies development and code. But, if we want to
achieve horizontal scalability on many partitions, we should introduce
per-partition memory context and reset it in between. GEQO already has a
short-lived memory context, making designing extensions a bit more
challenging but nothing too painful.

--
regards,
Andrei Lepikhov
Postgres Professional

#16Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Tomas Vondra (#14)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

On Mon, Feb 19, 2024 at 4:35 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Hi,

After taking a look at the patch optimizing SpecialJoinInfo allocations,
I decided to take a quick look at this one too. I don't have any
specific comments on the code yet, but it seems quite a bit more complex
than the other patch ... it's introducing a HTAB into the optimizer,
surely that has costs too.

Thanks for looking into this too.

I started by doing the same test as with the other patch, comparing
master to the two patches (applied independently and both). And I got
about this (in MB):

tables master sjinfo rinfo both
-----------------------------------------------
2 37 36 34 33
3 138 129 122 113
4 421 376 363 318
5 1495 1254 1172 931

Unlike the SpecialJoinInfo patch, I haven't seen any reduction in
planning time for this one.

Yeah. That agreed with my observation as well.

The reduction in allocated memory is nice. I wonder what's allocating
the remaining memory, and we'd have to do to reduce it further.

Please see reply to SpecialJoinInfo thread. Other that the current
patches, we require memory efficient Relids implementation. I have
shared some ideas in the slides I shared in the other thread, but
haven't spent time experimenting with any ideas there.

However, this is a somewhat extreme example - it's joining 5 tables,
each with 1000 partitions, using a partition-wise join. It reduces the
amount of memory, but the planning time is still quite high (and
essentially the same as without the patch). So it's not like it'd make
them significantly more practical ... do we have any other ideas/plans
how to improve that?

Yuya has been working on reducing planning time [1]/messages/by-id/CAExHW5uVZ3E5RT9cXHaxQ_DEK7tasaMN=D6rPHcao5gcXanY5w@mail.gmail.com. Has some
significant improvements in that area based on my experiments. But
those patches are complex and still WIP.

AFAIK we don't expect this to improve "regular" cases with modest number
of tables / partitions, etc. But could it make them slower in some case?

AFAIR, my experiments did not show any degradation in regular cases
with modest number of tables/partitions. The variation in planning
time was with the usual planning time variations.

[1]: /messages/by-id/CAExHW5uVZ3E5RT9cXHaxQ_DEK7tasaMN=D6rPHcao5gcXanY5w@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

#17Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#16)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

On Mon, Feb 19, 2024 at 5:17 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
wrote:

On Mon, Feb 19, 2024 at 4:35 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Hi,

After taking a look at the patch optimizing SpecialJoinInfo allocations,
I decided to take a quick look at this one too. I don't have any
specific comments on the code yet, but it seems quite a bit more complex
than the other patch ... it's introducing a HTAB into the optimizer,
surely that has costs too.

Thanks for looking into this too.

I started by doing the same test as with the other patch, comparing
master to the two patches (applied independently and both). And I got
about this (in MB):

tables master sjinfo rinfo both
-----------------------------------------------
2 37 36 34 33
3 138 129 122 113
4 421 376 363 318
5 1495 1254 1172 931

Unlike the SpecialJoinInfo patch, I haven't seen any reduction in
planning time for this one.

Yeah. That agreed with my observation as well.

The reduction in allocated memory is nice. I wonder what's allocating
the remaining memory, and we'd have to do to reduce it further.

Please see reply to SpecialJoinInfo thread. Other that the current
patches, we require memory efficient Relids implementation. I have
shared some ideas in the slides I shared in the other thread, but
haven't spent time experimenting with any ideas there.

However, this is a somewhat extreme example - it's joining 5 tables,
each with 1000 partitions, using a partition-wise join. It reduces the
amount of memory, but the planning time is still quite high (and
essentially the same as without the patch). So it's not like it'd make
them significantly more practical ... do we have any other ideas/plans
how to improve that?

Yuya has been working on reducing planning time [1]. Has some
significant improvements in that area based on my experiments. But
those patches are complex and still WIP.

AFAIK we don't expect this to improve "regular" cases with modest number
of tables / partitions, etc. But could it make them slower in some case?

AFAIR, my experiments did not show any degradation in regular cases
with modest number of tables/partitions. The variation in planning
time was with the usual planning time variations.

Documenting some comments from todays' patch review session
1. Instead of a nested hash table, it might be better to use a flat hash
table to save more memory.
2. new comm_rinfo member in RestrictInfo may have problems when copying
RestrictInfo or translating it. Instead commuted versions may be tracked
outside RestrictInfo

Combining the above two, it feels like we need a single hash table with
(commuted, rinfo_serial, relids) as key to store all the translations of a
RestrictInfo.

--
Best Wishes,
Ashutosh Bapat

#18Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#17)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

Sorry for the delay in my response.

On Wed, May 29, 2024 at 9:34 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Documenting some comments from todays' patch review session

I forgot to mention back then that both of the suggestions below came
from Tom Lane.

1. Instead of a nested hash table, it might be better to use a flat hash table to save more memory.

Done. It indeed saves memory without impacting planning time.

2. new comm_rinfo member in RestrictInfo may have problems when copying RestrictInfo or translating it. Instead commuted versions may be tracked outside RestrictInfo

After commit 767c598954bbf72e0535f667e2e0667765604b2a,
repameterization of parent paths for child relation happen at the time
of creating the plan. This reduces the number of child paths produced
by reparameterization and also reduces the number of RestrictInfos
that get translated during reparameterization. During
reparameterization commuted parent RestrictInfos are required to be
translated to child RestrictInfos. Before the commit, this led to
translating the same commuted parent RestrictInfo multiple times.
After the commit, only one path gets reparameterized for a given
parent and child pair. Hence we do not produce multiple copies of the
same commuted child RestrictInfo. Hence we don't need to keep track of
commuted child RestrictInfos anymore. Removed that portion of code
from the patches.

I made detailed memory consumption measurements with this patch for
number of partitions changed from 0 (unpartitioned) to 1000 and for 2
to 5-way joins. They are available in the spreadsheet at [1]https://docs.google.com/spreadsheets/d/1CEjRWZ02vuR8fSwhYaNugewtX8f0kIm5pLoRA95f3s8/edit?usp=sharing. The raw
measurement data is in the first sheet named "pwj_mem_measurements raw
numbers". The averages over multiple runs are in second sheet named
"avg_numbers". Rest of the sheet represent the averages in more
consumable manner. Please note that the averages make sense only for
planning time since the memory consumption remains same with every
run. Also note that EXPLAIN now reports planning memory consumption in
kB. Any changes to memory consumption below 1kB are not reported and
hence not noticed. Here are some observations.

1. When partitionwise join is not enabled, no changes to planner's
memory consumption are observed. See sheet named "pwj disabled,
planning memory".

2. When partitionwise join is enabled, upto 17% (206MB) memory is
saved by the patch in case of 5-way self-join with 1000 partitions.
This is the maximum memory saving observed. The amount of memory saved
increases with the number of joins and number of partitions. See sheet
with name "pwj enabled, planning memory"

3. After commit 767c598954bbf72e0535f667e2e0667765604b2a, we do not
translate a parent RestrictInfo multiple times for the same
parent-child pair in case of a *2-way partitionwise join*. But we
still consume memory in saving the child RestrictInfo in the hash
table. Hence in case of 2-way join we see increased memory consumption
with the patch compared to master. The memory consumption increases by
13kb, 23kB, 76kB and 146kB for 10, 100, 500, 1000 partitions
respectively. This increase is smaller compared to the overall memory
saving. In order to avoid this memory increase, we will need to avoid
using hash table for 2-way join. We will need to know whether there
will be more than one partitionwise join before translating the
RestrictInfos for the first partitionwise join. This is hard to
achieve in all the cases since the decision to use partitionwise join
happens at the time of creating paths for a given join relation, which
itself is computed on the fly. We may choose some heuristics which
take into account the number of partitioned tables in the query, their
partition schemes, and the quals in the query to decide whether or not
to track the translated child RestrictInfos. But that will make the
code more complex, but more importantly the heuristics may not be able
to keep up if we start using partitionwise join as an optimization
strategy for more cases (e.g. asymmetric partitionwise join [2]/messages/by-id/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ@mail.gmail.com). The
attached patch looks like a good tradeoff to me. But other opinions
might vary. Suggestions are welcome.

4. There is no noticeable change in the planning time. I ran the same
experiment multiple times. The planning time variations from each
experiment do not show any noticeable pattern suggesting increase or
decrease in the planning time with the patch.

A note about the code: I have added all the structures and functions
dealing with the RestrictInfo hash table at the end of restrictinfo.c.
I have not come across a C file in PostgreSQL code base where private
structures are defined in the middle the file; usually they are
defined at the beginning of the file. But I have chosen it that way
here since it makes it easy to document the hash table and the
functions at one place at the beginning of this code section. I am
open to suggestions which make the documentation easy while placing
the structures at the beginning of the file.

[1]: https://docs.google.com/spreadsheets/d/1CEjRWZ02vuR8fSwhYaNugewtX8f0kIm5pLoRA95f3s8/edit?usp=sharing
[2]: /messages/by-id/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Attachments:

0001-Avoid-translating-RestrictInfo-repeatedly-20240819.patchapplication/x-patch; name=0001-Avoid-translating-RestrictInfo-repeatedly-20240819.patchDownload+400-21
#19Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#18)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

On Mon, Aug 19, 2024 at 6:43 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Sorry for the delay in my response.

On Wed, May 29, 2024 at 9:34 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Documenting some comments from todays' patch review session

I forgot to mention back then that both of the suggestions below came
from Tom Lane.

1. Instead of a nested hash table, it might be better to use a flat hash table to save more memory.

Done. It indeed saves memory without impacting planning time.

2. new comm_rinfo member in RestrictInfo may have problems when copying RestrictInfo or translating it. Instead commuted versions may be tracked outside RestrictInfo

After commit 767c598954bbf72e0535f667e2e0667765604b2a,
repameterization of parent paths for child relation happen at the time
of creating the plan. This reduces the number of child paths produced
by reparameterization and also reduces the number of RestrictInfos
that get translated during reparameterization. During
reparameterization commuted parent RestrictInfos are required to be
translated to child RestrictInfos. Before the commit, this led to
translating the same commuted parent RestrictInfo multiple times.
After the commit, only one path gets reparameterized for a given
parent and child pair. Hence we do not produce multiple copies of the
same commuted child RestrictInfo. Hence we don't need to keep track of
commuted child RestrictInfos anymore. Removed that portion of code
from the patches.

I made detailed memory consumption measurements with this patch for
number of partitions changed from 0 (unpartitioned) to 1000 and for 2
to 5-way joins. They are available in the spreadsheet at [1]. The raw
measurement data is in the first sheet named "pwj_mem_measurements raw
numbers". The averages over multiple runs are in second sheet named
"avg_numbers". Rest of the sheet represent the averages in more
consumable manner. Please note that the averages make sense only for
planning time since the memory consumption remains same with every
run. Also note that EXPLAIN now reports planning memory consumption in
kB. Any changes to memory consumption below 1kB are not reported and
hence not noticed. Here are some observations.

1. When partitionwise join is not enabled, no changes to planner's
memory consumption are observed. See sheet named "pwj disabled,
planning memory".

2. When partitionwise join is enabled, upto 17% (206MB) memory is
saved by the patch in case of 5-way self-join with 1000 partitions.
This is the maximum memory saving observed. The amount of memory saved
increases with the number of joins and number of partitions. See sheet
with name "pwj enabled, planning memory"

3. After commit 767c598954bbf72e0535f667e2e0667765604b2a, we do not
translate a parent RestrictInfo multiple times for the same
parent-child pair in case of a *2-way partitionwise join*. But we
still consume memory in saving the child RestrictInfo in the hash
table. Hence in case of 2-way join we see increased memory consumption
with the patch compared to master. The memory consumption increases by
13kb, 23kB, 76kB and 146kB for 10, 100, 500, 1000 partitions
respectively. This increase is smaller compared to the overall memory
saving. In order to avoid this memory increase, we will need to avoid
using hash table for 2-way join. We will need to know whether there
will be more than one partitionwise join before translating the
RestrictInfos for the first partitionwise join. This is hard to
achieve in all the cases since the decision to use partitionwise join
happens at the time of creating paths for a given join relation, which
itself is computed on the fly. We may choose some heuristics which
take into account the number of partitioned tables in the query, their
partition schemes, and the quals in the query to decide whether or not
to track the translated child RestrictInfos. But that will make the
code more complex, but more importantly the heuristics may not be able
to keep up if we start using partitionwise join as an optimization
strategy for more cases (e.g. asymmetric partitionwise join [2]). The
attached patch looks like a good tradeoff to me. But other opinions
might vary. Suggestions are welcome.

4. There is no noticeable change in the planning time. I ran the same
experiment multiple times. The planning time variations from each
experiment do not show any noticeable pattern suggesting increase or
decrease in the planning time with the patch.

A note about the code: I have added all the structures and functions
dealing with the RestrictInfo hash table at the end of restrictinfo.c.
I have not come across a C file in PostgreSQL code base where private
structures are defined in the middle the file; usually they are
defined at the beginning of the file. But I have chosen it that way
here since it makes it easy to document the hash table and the
functions at one place at the beginning of this code section. I am
open to suggestions which make the documentation easy while placing
the structures at the beginning of the file.

[1] https://docs.google.com/spreadsheets/d/1CEjRWZ02vuR8fSwhYaNugewtX8f0kIm5pLoRA95f3s8/edit?usp=sharing
[2] /messages/by-id/CAOP8fzaVL_2SCJayLL9kj5pCA46PJOXXjuei6-3aFUV45j4LJQ@mail.gmail.com

Noticed that Tomas's email address has changed. Adding his new email address.

--
Best Wishes,
Ashutosh Bapat

#20Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#19)
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

Hi hackers,

Here's next set of patch with following changes
1. Addressed some of Alvaro's comments which he gave offlist.
2. One of Alvaro's comments made me realise that there is opportunity
to save planning time. Patch 0003 added for the same.
3. Alvaro suggested to try simplehash.h instead of dynahash for
RestrictInfo hash table. That's patch 0005

Patches
=======
Commit messages in each of the patches describe the code changes in detail. Here
is brief description of each patch to facilitate the results discussion.

0001
----
Implements a hash table to store and retrieve child RestrictInfos. A
pointer to hash table is added in PlannerInfo. When examining the
results, I found a possible regression in planning time for lower
number of partitions and lower join order. To assess whether the
regression is related to an increase in the size of PlannerInfo, I
separated the interface and PlannerInfo addition it their own patch.
Having this as a separate patch might help us in case we decide to go
ahead with either of 0003 or 0004.

0002
----
The new member added by 0001 increases the size of PlannerInfo from
696 bytes to 704 bytes. We had seen a performance regression because
of increase in size of a structure in another thread [2]/messages/by-id/CAApHDvqPgFtwme2Zyf75BpMLwYr2mnUstDyPiP=EpudYuQTPPQ@mail.gmail.com. This patch
moves around the new member PlannerInfo::child_rinfo_hash and a
related existing member PlannerInfo::last_rinfo_serial so as to keep
the size of PlannerInfo same. If this change doesn't fix the possible
regression, it may be discarded. Otherwise it should be merged into
0001.

0003
----
Alvaro spotted an assymetry in the way my earlier patch handled child
RestrictInfo in create_join_clause. This led me to realize that
there's an optimization opportunity in that function.
create_join_clause() scans the ec_derives and ec_sources lists to find
any existing clause built using the given EC members. When there are
thousands of partitions ec_derives will have thousands of elements,
and will impact planning time adversely. This commit uses the
RestrictInfo hash introduced in patch 0001 to store the child derived
clauses, thus reducing the planning time when thousands of partitions
are involved. Detailed code level analysis of this change can be found
in the commit message of this patch. We will discuss the performance
impact a bit later in this email.

0004
----
This patch uses RestrictInfo hash table to store the translated child
RestrictInfos to avoid memory bloat due to repeated translations of
parent RestrictInfos for the same parent-child pair saving memory.
This is almost the same patch as my last email in this thread.

0005
----
Changes the code to use simplehash instead of dynahash.

Results
=======
I ran the same experiments as previously described [3]/messages/by-id/CAExHW5uBhV1wWrXm-V+aGPq_PBv-RbmixU=HeUj-+hSmVCFAQw@mail.gmail.com. I applied
patches 0001 to 0005 one by one cumulatively and collected planning
time and memory for queries which involved self-join of order 2 to 5,
with table having partitions 0 (unpartitioned), 10, 100, 500 and 1000
respectively.

The spreadsheet file with results can be found at [1]https://docs.google.com/spreadsheets/d/1uLtlkwdFYKLSAUn8-cmavxS_vY9gndNVHtVSPTaB-dw/edit?usp=sharing. The first sheet
"README" describes how to read this spreadsheet. It has sheets
containing the raw planning time and planning memory measurements,
their averages and standard deviation. But there are also sheets which
make it easy to compare the effects of the patches (explained below)
on planning time and memory. Those sheets are more useful than the raw
and average numbers.

1. With patches upto 0003, planning time improves by 10%-20%for higher
number of partitions and higher join orders. (rows 24 to 35 and
columns S, Z in planning time sheets). This improvement can be seen
with or without partitionwise join. The improvement increases with the
number of partitions and join order as expected. I have repeated the
experiments a few times and I could reproduce the improvement all the
time.

2. For lower number of partitions and lower join orders, the planning
time shows a regression. In case of unpartitioned tables, the planning
time shows improvement. If we repeat the experiments, some
combinations of number of partitions, join order show improvements and
some show regression. The only steady pattern I see is that with 100
partitions, we see regression most of the time. However, I am not sure
of the reason for this regression. Patch 0001 is playing some role
here.

3. With just patch 0001 applied, planning time usually shows
degradation (column Q and X in planning time sheets) with or without
PWJ enabled. I first thought that it might be because of the increased
size of PlannerInfo. We had seen a similar phenomenon when adding a
new member to WindowAggState [2]/messages/by-id/CAApHDvqPgFtwme2Zyf75BpMLwYr2mnUstDyPiP=EpudYuQTPPQ@mail.gmail.com. Hence I introduced patch 0002 which
moves two fields around to not increase the size of structure. But
that doesn't fix the regression in the planning time (columns R and
Y). Apart from increasing the PlannerInfo size and may be object file
size, 0002 does not have any other impact. But the regression seen
with just that patch is more than what we saw in [2]/messages/by-id/CAApHDvqPgFtwme2Zyf75BpMLwYr2mnUstDyPiP=EpudYuQTPPQ@mail.gmail.com. More investigate
is required to decide whether this regression is real or not and if
real, the root cause. Looking at the numbers, it seems that this
regression is causing the planning time regression in rest of the
patches. If we fix regression by 0001, we should not see much
regression in rest of the patches. I am looking for some guidance in
investigating this regression.

4. Patches upto 0003, increase the memory consumed by the planner
because of the hash table, but that increase in memory is minimal when
compared with the total memory used by the planner in case of a large
number of partitions.

5. With 0004, the memory used by the planner reduces drastically in
case of large number of partitions and higher join orders. These
numbers are similar to my previous observations [3]/messages/by-id/CAExHW5uBhV1wWrXm-V+aGPq_PBv-RbmixU=HeUj-+hSmVCFAQw@mail.gmail.com (Columns T and AA
in "planning memory with PWJ enabled" sheet)

6. Using simplehash (patch 0005) does not save any memory or doesn't
improve planning time. I think I have used the simplehash correct, but
someone with more experience with simplehash might find something
wrong with 0005. But otherwise, I will drop 0005 from the next set of
patches.

I believe both 0003 and 0004 are useful if we could fix the regression
in planning time for lower number of partitions and for lower join
orders.

[1]: https://docs.google.com/spreadsheets/d/1uLtlkwdFYKLSAUn8-cmavxS_vY9gndNVHtVSPTaB-dw/edit?usp=sharing
[2]: /messages/by-id/CAApHDvqPgFtwme2Zyf75BpMLwYr2mnUstDyPiP=EpudYuQTPPQ@mail.gmail.com
[3]: /messages/by-id/CAExHW5uBhV1wWrXm-V+aGPq_PBv-RbmixU=HeUj-+hSmVCFAQw@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Attachments:

0004-Avoid-translating-RestrictInfo-repeatedly-20241010.patchtext/x-patch; charset=US-ASCII; name=0004-Avoid-translating-RestrictInfo-repeatedly-20241010.patchDownload+242-18
0001-RestrictInfo-hash-table-interface-20241010.patchtext/x-patch; charset=US-ASCII; name=0001-RestrictInfo-hash-table-interface-20241010.patchDownload+150-2
0002-Compact-PlannerInfo-to-restore-its-previous-20241010.patchtext/x-patch; charset=US-ASCII; name=0002-Compact-PlannerInfo-to-restore-its-previous-20241010.patchDownload+6-7
0003-Use-RestrictInfo-hash-table-for-storing-EC--20241010.patchtext/x-patch; charset=US-ASCII; name=0003-Use-RestrictInfo-hash-table-for-storing-EC--20241010.patchDownload+115-26
0005-Use-simplehash-instead-of-dynamic-hash-20241010.patchtext/x-patch; charset=US-ASCII; name=0005-Use-simplehash-instead-of-dynamic-hash-20241010.patchDownload+26-34
#21Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Ashutosh Bapat (#20)
#22Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Dmitry Dolgov (#21)
#23Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#22)
#24Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#23)
#25Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#24)
#26Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#23)
#27Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#26)
#28Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Ashutosh Bapat (#22)
#29Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#27)
#30Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Ashutosh Bapat (#29)
#31Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Langote (#30)
#32Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Ashutosh Bapat (#31)
#33Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Langote (#32)
#34Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Ashutosh Bapat (#33)
#35Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Langote (#34)
#36Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Ashutosh Bapat (#35)
#37Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Langote (#36)
#38Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Ashutosh Bapat (#37)
#39Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Langote (#38)
#40Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Ashutosh Bapat (#39)
#41Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Langote (#40)
#42Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#41)
#43Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#42)
#44David Rowley
dgrowleyml@gmail.com
In reply to: Ashutosh Bapat (#43)
#45Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: David Rowley (#44)
#46Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#45)
#47Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Rowley (#44)
#48Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#47)
#49David Rowley
dgrowleyml@gmail.com
In reply to: Ashutosh Bapat (#46)
#50Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: David Rowley (#49)
#51Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Ashutosh Bapat (#50)
#52Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Langote (#51)
#53Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Ashutosh Bapat (#52)
#54Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Langote (#53)
#55Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Ashutosh Bapat (#54)
#56Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#55)
#57Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Langote (#56)
#58Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Ashutosh Bapat (#57)