[PoC] Reducing planning time when tables have many partitions

Started by Yuya Watari · 143 messages
#1 Yuya Watari
watari.yuya@gmail.com

Hello,

I found that planning takes too much time when tables have many child
partitions. According to my observation, the planning time grows in
the order of O(n^2), where n is the number of child partitions. I have
attached a patch that addresses this problem. Please note that this
patch is a PoC.

1. Problem Statement

The problem arises with the following simple query. This query is
modeled after a university's grade system, joining tables of students,
scores, and GPAs to output an academic record for each student.

=====
SELECT students.name, gpas.gpa AS gpa, sum(scores.score) AS total_score
FROM students, scores, gpas
WHERE students.id = scores.student_id AND students.id = gpas.student_id
GROUP BY students.id, gpas.student_id;
=====

Since many students are enrolled in the university, we partition each
table. The planning time of the above query then grows very rapidly as
the number of partitions increases.

I conducted an experiment by varying the number of partitions of three
tables (students, scores, and gpas) from 2 to 1024. The attached
figure illustrates the result. The blue line annotated with "master"
stands for the result on the master branch. Obviously, its
computational complexity is large.

I attached SQL files to this e-mail as "sample-queries.zip". You can
reproduce my experiment with the following steps:
=====
$ unzip sample-queries.zip
$ cd sample-queries
# Create tables and insert sample data ('n' denotes the number of partitions)
$ psql -f create-table-n.sql
# Measure planning time
$ psql -f query-n.sql
=====

2. Where Is It Slow?

In order to identify bottlenecks, I ran a performance profiler (perf).
The attached "perf-master.png" shows a call graph of the planning of
query-1024.sql.

From this figure, it can be seen that "bms_equal" and "bms_is_subset"
take up most of the running time. Most of these calls occur while
enumerating the EquivalenceMembers of an EquivalenceClass. These
enumerations live in src/backend/optimizer/path/equivclass.c and have
the following form.

=====
EquivalenceClass *ec = /* given */;

EquivalenceMember *em;
ListCell   *lc;

foreach(lc, ec->ec_members)
{
    em = (EquivalenceMember *) lfirst(lc);

    /* predicate is bms_equal or bms_is_subset, etc. */
    if (!predicate(em))
        continue;

    /* the predicate is satisfied */
    do something...;
}
=====

This foreach loop is a linear search, whose cost will become very high
when there are many EquivalenceMembers in ec_members. This is the case
when the number of partitions is large. Eliminating this heavy linear
search is a key to improving planning performance.

3. How to Solve?

In my patch, I made three different optimizations depending on the
predicate pattern.

3.1 When the predicate is "!em->em_is_child"

In equivclass.c, several operations are performed only on
EquivalenceMembers whose em_is_child is false. Even if a table has
many partitions, the number of non-child EquivalenceMembers is
limited. Therefore, it is useful to keep the non-child members in a
separate list in advance.

My patch adds the "ec_not_child_members" field to EquivalenceClass.
This field is a List containing non-child members. Taking advantage of
this, the previous loop can be rewritten as follows:

=====
foreach(lc, ec->ec_not_child_members)
{
    em = (EquivalenceMember *) lfirst(lc);
    Assert(!em->em_is_child);
    do something...;
}
=====

3.2 When the predicate is "bms_equal(em->em_relids, relids)"

"bms_equal" is another common predicate. In this case, an operation is
performed when "em_relids" matches a given Relids.

This type of loop can be handled quickly with a hash table. First,
group EquivalenceMembers that have the same Relids into a list. Then,
create an associative array whose key is the Relids and whose value is
that list. My patch adds the "ec_members_htab" field to
EquivalenceClass, which serves as this associative array.

Based on this idea, the previous loop is transformed as follows. Here,
the FindEcMembersMatchingRelids function looks up the hash table and
returns the corresponding value, which is a list.
=====
foreach(lc, FindEcMembersMatchingRelids(ec, relids))
{
    em = (EquivalenceMember *) lfirst(lc);
    Assert(bms_equal(em->em_relids, relids));
    do something...;
}
=====
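The FindEcMembersMatchingRelids() lookup above can be sketched outside
of PostgreSQL as a small chained hash table. The sketch below is
purely illustrative and is not the patch's code: Relids is modelled as
a plain uint64_t bitmask, and the struct names, hash function, and
bucket count are all invented for the example. The real patch would
key a PostgreSQL hash table on Bitmapset values instead.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64             /* power of two, so we can mask */

typedef struct Member
{
    uint64_t        relids;     /* stand-in for em->em_relids */
    int             id;         /* payload, e.g. a member index */
    struct Member  *next;       /* chain of members in the same bucket */
} Member;

static Member *buckets[NBUCKETS];

static unsigned
hash_relids(uint64_t relids)
{
    /* any decent integer mixer works; this is splitmix64's finalizer */
    relids ^= relids >> 30; relids *= UINT64_C(0xbf58476d1ce4e5b9);
    relids ^= relids >> 27; relids *= UINT64_C(0x94d049bb133111eb);
    relids ^= relids >> 31;
    return (unsigned) (relids & (NBUCKETS - 1));
}

static void
add_member(Member *m)
{
    unsigned    h = hash_relids(m->relids);

    m->next = buckets[h];
    buckets[h] = m;
}

/*
 * Analogue of FindEcMembersMatchingRelids(): return the next member whose
 * relids compare equal to the key, without scanning all members. Pass
 * NULL to start, or the previously returned member to continue.
 */
static Member *
find_members_matching_relids(uint64_t relids, Member *prev)
{
    Member     *m = prev ? prev->next : buckets[hash_relids(relids)];

    for (; m != NULL; m = m->next)
        if (m->relids == relids)    /* bms_equal() analogue */
            return m;
    return NULL;
}
```

Each bms_equal-style search then costs one hash probe plus the length
of one bucket chain, rather than a scan of every member of the class.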

3.3 When the predicate is "bms_is_subset(em->em_relids, relids)"

Several operations are performed on EquivalenceMembers whose em_relids
is a subset of a given "relids"; here the predicate is
"bms_is_subset". Optimizing this search is not as easy as with
bms_equal, but the hash-table technique above can still be applied.

If the "relids" has m elements, there are 2^m subsets to consider. The
key observation is that m is small in most cases. For example, m is at
most 3 in the sample query, so the number of subsets is at most
2^3 = 8. Therefore, we can enumerate all subsets in a realistic time,
and looking up the hash table with each subset as a key drastically
reduces unnecessary searches. My patch's optimization is based on this
observation.

This technique is illustrated by the following pseudo-code. The code
iterates over all subsets and looks up the corresponding
EquivalenceMembers in the hash table. The actual code is more
complicated for performance reasons.

=====
EquivalenceClass *ec = /* given */;
Relids      relids = /* given */;

int         num_members_in_relids = bms_num_members(relids);

for (int bit = 0; bit < (1 << num_members_in_relids); bit++)
{
    EquivalenceMember *em;
    ListCell   *lc;
    Relids      subset = construct subset from 'bit';

    foreach(lc, FindEcMembersMatchingRelids(ec, subset))
    {
        em = (EquivalenceMember *) lfirst(lc);
        Assert(bms_is_subset(em->em_relids, relids));
        do something...;
    }
}
=====
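The "construct subset from 'bit'" step can be avoided entirely by
walking the submasks of the relids directly with the standard
submask-enumeration trick. The sketch below is illustrative only (not
the patch's code) and again models Relids as a plain uint64_t; for a
mask with m bits set it visits exactly 2^m subsets, including the
empty set and the mask itself.

```c
#include <stdint.h>

/*
 * Call 'visit' once per subset of 'relids' (each subset would then be
 * used as a hash-table key); return how many subsets were visited.
 */
static int
for_each_subset(uint64_t relids,
                void (*visit) (uint64_t subset, void *arg),
                void *arg)
{
    uint64_t    sub = relids;
    int         count = 0;

    for (;;)
    {
        if (visit)
            visit(sub, arg);    /* e.g. probe the hash table with 'sub' */
        count++;
        if (sub == 0)
            break;
        /* standard step: the next smaller submask of 'relids' */
        sub = (sub - 1) & relids;
    }
    return count;
}
```

For example, a relids mask with 3 bits set yields 8 hash probes, which
matches the 2^3 = 8 figure above.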

4. Experimental Result

The red line in the attached figure is the planning time with my
patch. The chart indicates that planning performance has been greatly
improved. The exact values are shown below.

Planning time of "query-n.sql" (n = number of partitions):

    n | Master (s) | Patched (s) | Speed up
------+------------+-------------+----------
    2 |      0.003 |       0.003 |     0.9%
    4 |      0.004 |       0.004 |     1.0%
    8 |      0.006 |       0.006 |     4.6%
   16 |      0.011 |       0.010 |     5.3%
   32 |      0.017 |       0.016 |     4.7%
   64 |      0.032 |       0.030 |     8.0%
  128 |      0.073 |       0.060 |    17.7%
  256 |      0.216 |       0.142 |    34.2%
  384 |      0.504 |       0.272 |    46.1%
  512 |      0.933 |       0.462 |    50.4%
  640 |      1.529 |       0.678 |    55.7%
  768 |      2.316 |       1.006 |    56.6%
  896 |      3.280 |       1.363 |    58.5%
 1024 |      4.599 |       1.770 |    61.5%

With 1024 partitions, the planning time was reduced by 61.5%.
Moreover, even with 128 partitions, which is a realistic use case, the
performance improved by 17.7%.

5. Things to Be Discussed

5.1 Regressions

While my approach is effective for tables with a large number of
partitions, it may cause performance degradation otherwise. For small
cases, we need to fall back to the conventional algorithm, but the
appropriate threshold is not obvious.

5.2 Enumeration order

My patch may change the order in which members are enumerated. This
affects generated plans.

5.3 Code Quality

Source code quality should be improved.

=====

Again, I posted this patch as a PoC. I would appreciate it if you
would discuss the effectiveness of these optimizations with me.

Best regards,
Yuya Watari

Attachments:

* v1-reducing-planning-time-when-tables-have-many-partitions.patch (application/octet-stream)
* figure.png (image/png)
* sample-queries.zip (application/x-zip-compressed)
* perf-master.png (image/png)
* perf-patched.png (image/png)
#2 David Rowley
dgrowleyml@gmail.com
In reply to: Yuya Watari (#1)
Re: [PoC] Reducing planning time when tables have many partitions

On Fri, 18 Mar 2022 at 23:32, Yuya Watari <watari.yuya@gmail.com> wrote:

> I found a problem that planning takes too much time when the tables
> have many child partitions. According to my observation, the planning
> time increases in the order of O(n^2). Here, n is the number of child
> partitions. I attached the patch to solve this problem. Please be
> noted that this patch is a PoC.
>
> 3. How to Solve?

I think a better way to solve this would be just to have a single hash
table over all EquivalenceClasses that allows fast lookups of
EquivalenceMember->em_expr. I think there's no reason that a given
Expr should appear in more than one non-merged EquivalenceClass. The
EquivalenceClass a given Expr belongs to would need to be updated
during the merge process.

For functions such as get_eclass_for_sort_expr() and
process_equivalence(), that would become a fairly fast hashtable
lookup instead of having nested loops to find if an EquivalenceMember
already exists for the given Expr. We might not want to build the hash
table for all queries. Maybe we could just do it if we get to
something like ~16 EquivalenceMember in total.

As of now, we don't have any means to hash Exprs, so all that
infrastructure would need to be built first. Peter Eisentraut is
working on a patch [1] which is a step towards having this.

Here's a simple setup to show the pain of this problem:

create table lp (a int, b int) partition by list(a);
select 'create table lp'||x::text||' partition of lp for values in('||x::text||');'
from generate_series(0,4095) x;
\gexec
explain analyze select * from lp where a=b order by a;

Planning Time: 510.248 ms
Execution Time: 264.659 ms

David

[1]: https://commitfest.postgresql.org/37/3182/

#3 Yuya Watari
watari.yuya@gmail.com
In reply to: David Rowley (#2)
Re: [PoC] Reducing planning time when tables have many partitions

Dear David,

Thank you for your comments on my patch. I really apologize for my
late response.

On Thu, Mar 24, 2022 at 11:03 AM David Rowley <dgrowleyml@gmail.com> wrote:

> I think a better way to solve this would be just to have a single hash
> table over all EquivalenceClasses that allows fast lookups of
> EquivalenceMember->em_expr. I think there's no reason that a given
> Expr should appear in more than one non-merged EquivalenceClass. The
> EquivalenceClass a given Expr belongs to would need to be updated
> during the merge process.

Thank you for your idea. However, I think building a hash table whose
key is EquivalenceMember->em_expr does not work for this case.

What I am trying to optimize in this patch is the following code.

=====
EquivalenceClass *ec = /* given */;

EquivalenceMember *em;
ListCell   *lc;

foreach(lc, ec->ec_members)
{
    em = (EquivalenceMember *) lfirst(lc);

    /* predicate is bms_equal or bms_is_subset, etc. */
    if (!predicate(em))
        continue;

    /* the predicate is satisfied */
    do something...;
}
=====

From my observation, these predicates are false in most cases, so the
subsequent processing is not executed. My optimization is based on
this observation and uses hash tables to eliminate the predicate
calls.

If the predicate were "em->em_expr == something", a hash table keyed
by em_expr would be effective. However, the actual predicates are not
of that form; they are the following.

// Find EquivalenceMembers whose relids is equal to the given relids
(1) bms_equal(em->em_relids, relids)

// Find EquivalenceMembers whose relids is a subset of the given relids
(2) bms_is_subset(em->em_relids, relids)

Since these predicates match on em_relids rather than em_expr, we need
to build a hash table keyed by em_relids. With such a table, we can
drastically reduce the planning time for pattern (1). Furthermore, by
enumerating all subsets of relids, pattern (2) can also be optimized.
The detailed algorithm is described in my first email.

Here is an example of pattern (1). The following code is from
src/backend/optimizer/path/equivclass.c. As can be seen, the foreach
loop tries to find an EquivalenceMember whose cur_em->em_relids is
equal to rel->relids. If one is found, subsequent processing is
performed.

== Before patched ==
List *
generate_implied_equalities_for_column(PlannerInfo *root,
                                       RelOptInfo *rel,
                                       ec_matches_callback_type callback,
                                       void *callback_arg,
                                       Relids prohibited_rels)
{
    ...

    EquivalenceClass *cur_ec = (EquivalenceClass *)
        list_nth(root->eq_classes, i);
    EquivalenceMember *cur_em;
    ListCell   *lc2;

    cur_em = NULL;
    foreach(lc2, cur_ec->ec_members)
    {
        cur_em = (EquivalenceMember *) lfirst(lc2);
        if (bms_equal(cur_em->em_relids, rel->relids) &&
            callback(root, rel, cur_ec, cur_em, callback_arg))
            break;
        cur_em = NULL;
    }

    if (!cur_em)
        continue;

    ...
}
===

My patch modifies this code as follows. em_foreach_relids_equals is a
newly defined macro that iterates over the EquivalenceMembers
satisfying the bms_equal condition by looking up a hash table with
rel->relids as the key. This type of optimization cannot be achieved
without a hash table keyed by em->em_relids.

== After patched ==
List *
generate_implied_equalities_for_column(PlannerInfo *root,
                                       RelOptInfo *rel,
                                       ec_matches_callback_type callback,
                                       void *callback_arg,
                                       Relids prohibited_rels)
{
    ...

    EquivalenceClass *cur_ec = (EquivalenceClass *)
        list_nth(root->eq_classes, i);
    EquivalenceMember *cur_em;
    EquivalenceMember *other_em;

    cur_em = NULL;
    em_foreach_relids_equals(cur_em, cur_ec, rel->relids)
    {
        Assert(bms_equal(cur_em->em_relids, rel->relids));
        if (callback(root, rel, cur_ec, cur_em, callback_arg))
            break;
        cur_em = NULL;
    }

    if (!cur_em)
        continue;

    ...
}
===

> We might not want to build the hash table for all queries.

I agree with you. Building many hash tables would consume a lot of
memory. My idea for this problem is to let the hash table's key be a
pair of EquivalenceClass and Relids, although this approach may
increase the lookup time.

==========

I noticed that the previous patch does not work with the current HEAD.
I attached the modified one to this email.

Additionally, I added my patch to the current commit fest [1].
[1]: https://commitfest.postgresql.org/38/3701/

--
Best regards,
Yuya Watari

Attachments:

* v2-reducing-planning-time-when-tables-have-many-partitions.patch (application/x-patch)
#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Yuya Watari (#3)
Re: [PoC] Reducing planning time when tables have many partitions

Yuya Watari <watari.yuya@gmail.com> writes:

> On Thu, Mar 24, 2022 at 11:03 AM David Rowley <dgrowleyml@gmail.com> wrote:
>> I think a better way to solve this would be just to have a single hash
>> table over all EquivalenceClasses that allows fast lookups of
>> EquivalenceMember->em_expr.

> If the predicate were "em->em_expr == something", the hash table whose
> key is em_expr would be effective. However, the actual predicates are
> not of this type but the following.

> // Find EquivalenceMembers whose relids is equal to the given relids
> (1) bms_equal(em->em_relids, relids)

> // Find EquivalenceMembers whose relids is a subset of the given relids
> (2) bms_is_subset(em->em_relids, relids)

Yeah, that's a really interesting observation, and I agree that
David's suggestion doesn't address it. Maybe after we fix this
problem, matching of em_expr would be the next thing to look at,
but your results say it isn't the first thing.

I'm not real thrilled with trying to throw hashtables at the problem,
though. As David noted, they'd be counterproductive for simple
queries. Sure, we could address that with duplicate code paths,
but that's a messy and hard-to-tune approach. Also, I find the
idea of hashing on all subsets of relids to be outright scary.
"m is not so large in most cases" does not help when m *is* large.

For the bms_equal class of lookups, I wonder if we could get anywhere
by adding an additional List field to every RelOptInfo that chains
all EquivalenceMembers that match that RelOptInfo's relids.
The trick here would be to figure out when to build those lists.
The simple answer would be to do it lazily on-demand, but that
would mean a separate scan of all the EquivalenceMembers for each
RelOptInfo; I wonder if there's a way to do better?

Perhaps the bms_is_subset class could be handled in a similar
way, ie do a one-time pass to make a List of all EquivalenceMembers
that use a RelOptInfo.

regards, tom lane

#5 Yuya Watari
watari.yuya@gmail.com
In reply to: Tom Lane (#4)
Re: [PoC] Reducing planning time when tables have many partitions

Dear Tom,

Thank you for replying to my email.

On Mon, Jul 4, 2022 at 6:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I'm not real thrilled with trying to throw hashtables at the problem,
> though. As David noted, they'd be counterproductive for simple
> queries.

As you said, my approach using hash tables has some overhead, which
can degrade query planning time.

I measured this degradation in a brief experiment, using the simple
query shown below.

===
SELECT students.name, gpas.gpa AS gpa, sum(scores.score) AS total_score
FROM students, scores, gpas
WHERE students.id = scores.student_id AND students.id = gpas.student_id
GROUP BY students.id, gpas.student_id;
===

Here, the students, scores, and gpas tables have no partitions, i.e.,
they are regular tables. Therefore, my techniques do not apply to this
query and may instead cause some regression. I repeatedly issued this
query 1 million times and measured the planning time of each run.

The attached figure shows the distribution of planning times. It
indicates that my patch has no severe negative impact on planning
performance, although there seems to be a slight degradation.

The mean and median planning times are shown below. With my patch, the
planning time became 0.002-0.004 milliseconds slower. We have to deal
with this problem, but reducing the time complexity while keeping the
degradation at zero is significantly challenging.

Planning time (ms)
             | Mean  | Median
-------------+-------+--------
 Master      | 0.682 | 0.674
 Patched     | 0.686 | 0.676
-------------+-------+--------
 Degradation | 0.004 | 0.002

Of course, the attached result is just an example. Significant
regression might occur in other types of queries.

> For the bms_equal class of lookups, I wonder if we could get anywhere
> by adding an additional List field to every RelOptInfo that chains
> all EquivalenceMembers that match that RelOptInfo's relids.
> The trick here would be to figure out when to build those lists.
> The simple answer would be to do it lazily on-demand, but that
> would mean a separate scan of all the EquivalenceMembers for each
> RelOptInfo; I wonder if there's a way to do better?
>
> Perhaps the bms_is_subset class could be handled in a similar
> way, ie do a one-time pass to make a List of all EquivalenceMembers
> that use a RelOptInfo.

Thank you for giving your idea. I will try to polish up my algorithm
based on your suggestion.

--
Best regards,
Yuya Watari

Attachments:

* figure.png (image/png)
#6 Andrei Lepikhov
lepihov@gmail.com
In reply to: Yuya Watari (#5)
Re: [PoC] Reducing planning time when tables have many partitions

On 7/5/22 13:57, Yuya Watari wrote:

> On Mon, Jul 4, 2022 at 6:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Perhaps the bms_is_subset class could be handled in a similar
>> way, ie do a one-time pass to make a List of all EquivalenceMembers
>> that use a RelOptInfo.
>
> Thank you for giving your idea. I will try to polish up my algorithm
> based on your suggestion.

This work is of significant interest for highly partitioned
configurations. Are you still working on this patch? Given the current
state of the thread, it may be better to change the status to 'Waiting
on Author' until the next version.
Feel free to revert the status if you need more feedback.

--
Regards
Andrey Lepikhov
Postgres Professional

#7 Yuya Watari
watari.yuya@gmail.com
In reply to: Andrei Lepikhov (#6)
Re: [PoC] Reducing planning time when tables have many partitions

Dear Andrey Lepikhov,

Thank you for replying and being a reviewer for this patch. I really
appreciate it.

> Are you still working on this patch?

Yes, I’m working on improving this patch. It is not easy to address
the problems that this patch has, but I’m hoping to send a new version
of it in a few weeks.

--
Best regards,
Yuya Watari

#8 David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#4)
Re: [PoC] Reducing planning time when tables have many partitions

On Mon, 4 Jul 2022 at 09:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> For the bms_equal class of lookups, I wonder if we could get anywhere
> by adding an additional List field to every RelOptInfo that chains
> all EquivalenceMembers that match that RelOptInfo's relids.
> The trick here would be to figure out when to build those lists.
> The simple answer would be to do it lazily on-demand, but that
> would mean a separate scan of all the EquivalenceMembers for each
> RelOptInfo; I wonder if there's a way to do better?

How about, instead of EquivalenceClass having a List field named
ec_members, it has a Bitmapset field named ec_member_indexes and we
just keep a List of all EquivalenceMembers in PlannerInfo and mark
which ones are in the class by setting the bit in the class's
ec_member_indexes field.

That would be teamed up with a new eclass_member_indexes field in
RelOptInfo to store the index into PlannerInfo's List of
EquivalenceMembers that belong to the given RelOptInfo.

For searching:

* To get all EquivalenceMembers in an EquivalenceClass, do a
  bms_next_member loop over the EC's ec_member_indexes field.
* To get all EquivalenceMembers for a given RelOptInfo, do a
  bms_next_member loop over the RelOptInfo's eclass_member_indexes
  field.
* To get all EquivalenceMembers for a given EquivalenceClass and
  RelOptInfo, bms_intersect() the rel's eclass_member_indexes with the
  EC's ec_member_indexes.

I'm unsure if we'd want to bms_union the RelOptInfo's
ec_member_indexes field for join rels. Looking at
get_eclass_indexes_for_relids() we didn't do it that way for
eclass_indexes. Maybe that's because we're receiving RelIds in a few
places without a RelOptInfo.

Certainly, the CPU cache locality is not going to be as good as if we
had a List with all elements together, but for simple queries, there's
not going to be many EquivalenceClasses anyway, and for complex
queries, this should be a win.

David

#9 Yuya Watari
watari.yuya@gmail.com
In reply to: David Rowley (#8)
Re: [PoC] Reducing planning time when tables have many partitions

Dear David,

Thank you for sharing your new idea.

I agree that introducing a Bitmapset field may solve this problem. I
will try this approach in addition to previous ones.

Thank you again for helping me.

--
Best regards,
Yuya Watari

#10 David Rowley
dgrowleyml@gmail.com
In reply to: Yuya Watari (#9)
Re: [PoC] Reducing planning time when tables have many partitions

On Wed, 27 Jul 2022 at 18:07, Yuya Watari <watari.yuya@gmail.com> wrote:

> I agree that introducing a Bitmapset field may solve this problem. I
> will try this approach in addition to previous ones.

I've attached a very half-done patch that might help you get started
on this. There are still 2 failing regression tests which seem to be
due to plan changes. I didn't expend any effort looking into why these
plans changed.

The attached does not contain any actual optimizations to find the
minimal set of EMs to loop through by masking the Bitmapsets that I
mentioned in my post last night. I just quickly put it together to
see if there's some hole in the idea. I don't think there is.

I've not really considered all of the places that we'll want to do the
bit twiddling to get the minimal set of EquivalenceMember. I did see
there's a couple more functions in postgres_fdw.c that could be
optimized.

One thing I've only partially thought about is what if you want to
also find EquivalenceMembers with a constant value. If there's a
Const, then you'll lose the bit for that when you mask the ec's
ec_member_indexes with the RelOptInfos. If there are some places
where we need to keep those then I think we'll need to add another
field to EquivalenceClass to mark the index into PlannerInfo's
eq_members for the EquivalenceMember with the Const. That bit would
have to be bms_add_member()ed back into the Bitmapset of matching
EquivalenceMembers after masking out RelOptInfo's ec_member_indexes.

When adding the optimizations to find the minimal set of EM bits to
search through, you should likely add some functions similar to the
get_eclass_indexes_for_relids() and get_common_eclass_indexes()
functions to help you find the minimal set of bits. You can also
probably get some other inspiration from [1], in general.

David

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3373c715535

Attachments:

* eclass_member_speedup.patch (text/plain)
#11 Yuya Watari
watari.yuya@gmail.com
In reply to: David Rowley (#10)
Re: [PoC] Reducing planning time when tables have many partitions

Hello,

On Thu, Jul 28, 2022 at 6:35 AM David Rowley <dgrowleyml@gmail.com> wrote:

> I've attached a very half-done patch that might help you get started
> on this.

Thank you so much for creating the patch. I have implemented your
approach and attached a new version of the patch to this email.

If you have already applied David's patch, please start the 'git am'
command from 0002-Fix-bugs.patch. All regression tests passed with
this patch on my environment.

1. Optimizations

The new optimization techniques utilizing Bitmapsets are implemented
as the following functions in src/include/optimizer/paths.h.

* get_eclass_members_indexes_for_relids()
* get_eclass_members_indexes_for_not_children()
* get_eclass_members_indexes_for_relids_or_not_children()
* get_eclass_members_indexes_for_subsets_of_relids()
* get_eclass_members_indexes_for_subsets_of_relids_or_not_children()
// I think the names of these functions need to be reconsidered.

These functions intersect ec->ec_member_indexes and some Bitmapset and
return indexes of EquivalenceMembers that we want to get.

The implementation of the first three functions listed above is
simple. However, the remaining functions, which handle the
bms_is_subset() condition, are a bit more complicated. I have
optimized this case based on Tom's idea. The detailed steps are as
follows.

I. Intersect ec->ec_member_indexes and the Bitmapset in RelOptInfo.
This intersection set is a candidate for the EquivalenceMembers to be
retrieved.
II. Remove from the candidate set the members that do not satisfy the
bms_is_subset().
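Steps I and II above can be sketched as follows. This is an
illustrative stand-in, not the patch's code: Bitmapsets and Relids are
modelled as uint64_t, is_subset() plays the role of bms_is_subset(),
and __builtin_ctzll assumes a GCC/Clang compiler.

```c
#include <stdint.h>

/* Step II's predicate: a bms_is_subset(a, b) analogue for bitmasks. */
static int
is_subset(uint64_t a, uint64_t b)
{
    return (a & ~b) == 0;
}

/*
 * Return, as a bitmask of indexes, the EquivalenceMembers whose relids
 * are a subset of 'relids'. em_relids[i] holds the relids of member i.
 */
static uint64_t
matching_member_indexes(const uint64_t *em_relids,
                        uint64_t ec_member_indexes,
                        uint64_t rel_member_indexes,
                        uint64_t relids)
{
    /* Step I: intersect the EC's and the rel's member-index sets. */
    uint64_t    candidates = ec_member_indexes & rel_member_indexes;
    uint64_t    result = 0;

    /* Step II: keep only candidates that pass the exact subset test. */
    while (candidates != 0)
    {
        int         i = __builtin_ctzll(candidates);    /* lowest index */

        if (is_subset(em_relids[i], relids))
            result |= UINT64_C(1) << i;
        candidates &= candidates - 1;   /* clear the lowest set bit */
    }
    return result;
}
```

The intersection in step I is what keeps the step II predicate from
being evaluated against every member of the class.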

Optimization for EquivalenceMembers with a constant value is one of
the future works.

2. Experimental Results

I conducted an experiment by using the original query, which is
attached to this email. You can reproduce this experiment by the
following commands.

=====
psql -f create-tables.sql
psql -f query.sql
=====

The following table and the attached figure describe the experimental result.

Planning time of "query.sql" (n = the number of partitions)
----------------------------------------------------------------
    n | Master (ms) | Patched (ms) | Speedup (%) | Speedup (ms)
----------------------------------------------------------------
    1 |       0.809 |        0.760 |       6.09% |        0.049
    2 |       0.799 |        0.811 |      -1.53% |       -0.012
    4 |       1.022 |        0.989 |       3.20% |        0.033
    8 |       1.357 |        1.325 |       2.32% |        0.032
   16 |       2.149 |        2.026 |       5.69% |        0.122
   32 |       4.357 |        3.925 |       9.91% |        0.432
   64 |       9.543 |        7.543 |      20.96% |        2.000
  128 |      27.195 |       15.823 |      41.82% |       11.372
  256 |     130.207 |       52.664 |      59.55% |       77.542
  384 |     330.642 |      112.324 |      66.03% |      218.318
  512 |     632.009 |      197.957 |      68.68% |      434.052
  640 |    1057.193 |      306.861 |      70.97% |      750.333
  768 |    1709.914 |      463.628 |      72.89% |     1246.287
  896 |    2531.685 |      738.827 |      70.82% |     1792.858
 1024 |    3516.592 |      858.211 |      75.60% |     2658.381
----------------------------------------------------------------

-------------------------------------------------------
    n | Stddev of Master (ms) | Stddev of Patched (ms)
-------------------------------------------------------
    1 |                 0.085 |                  0.091
    2 |                 0.061 |                  0.091
    4 |                 0.153 |                  0.118
    8 |                 0.203 |                  0.107
   16 |                 0.150 |                  0.153
   32 |                 0.313 |                  0.242
   64 |                 0.411 |                  0.531
  128 |                 1.263 |                  1.109
  256 |                 5.592 |                  4.714
  384 |                17.423 |                  6.625
  512 |                20.172 |                  7.188
  640 |                40.964 |                 26.246
  768 |                61.924 |                 31.741
  896 |                66.481 |                 27.819
 1024 |                80.950 |                 49.162
-------------------------------------------------------

The speedup with the new patch was up to 75.6%, or 2.7 seconds. The
patch achieved a 21.0% improvement even with 64 partitions, which is a
realistic size. We can conclude that this optimization is very
effective for workloads with highly partitioned tables.

Performance degradation occurred only with 2 partitions, where it was
1.53%, or 12 microseconds. This figure is the difference between the
average planning times of 10,000 runs, and the standard deviations far
exceed the difference in the averages, so it is unclear whether this
degradation is real or just noise.

=====

I'm looking forward to your comments.

--
Best regards,
Yuya Watari

Attachments:

* 0002-Fix-bugs.patch (application/octet-stream)
* 0003-Implement-optimizations-based-on-Bitmapset.patch (application/octet-stream)
* 0001-EClass-member-speedup.patch (application/octet-stream)
* figure.png (image/png)
* query.sql (application/octet-stream)
* create-tables.sql (application/octet-stream)
#12 David Rowley
dgrowleyml@gmail.com
In reply to: Yuya Watari (#11)
Re: [PoC] Reducing planning time when tables have many partitions

On Mon, 8 Aug 2022 at 23:28, Yuya Watari <watari.yuya@gmail.com> wrote:

> If you have already applied David's patch, please start the 'git am'
> command from 0002-Fix-bugs.patch. All regression tests passed with
> this patch on my environment.

Thanks for fixing those scope bugs.

In regards to the 0002 patch, you have;

+ * TODO: "bms_add_members(ec1->ec_member_indexes, ec2->ec_member_indexes)"
+ * did not work to combine two EquivalenceClasses. This is probably because
+ * the order of the EquivalenceMembers is different from the previous
+ * implementation, which added the ec2's EquivalenceMembers to the end of
+ * the list.

as far as I can see, the reason the code that I wrote caused the
following regression test failure;

-         Index Cond: ((ff = '42'::bigint) AND (ff = '42'::bigint))
+         Index Cond: (ff = '42'::bigint)

was down to how generate_base_implied_equalities_const() marks the EC
as ec_broken = true without any regard to cleaning up the work it has
already partially completed.

Because the loop inside generate_base_implied_equalities_const() just
breaks as soon as we're unable to find a valid equality operator for
the two given types, with my version, since the EquivalenceMember's
order has effectively changed, we just discover the EC is broken
before we call process_implied_equality() ->
distribute_restrictinfo_to_rels(). In the code you've added, the
EquivalenceMembers are effectively still in the original order and the
process_implied_equality() -> distribute_restrictinfo_to_rels() gets
done before we discover the broken EC. The same qual is just added
again during generate_base_implied_equalities_broken(), which is why
the plan has a duplicate ff=42.

This is all just down to the order that the ECs are merged. If you'd
just swapped the order of the items in the query's WHERE clause to
become:

where ec1.ff = 42::int8 and ss1.x = ec1.f1 and ec1.ff = ec1.f1;

then my version would keep the duplicate qual. For what you've changed
the code to, the planner would not have produced the duplicate ff=42
qual if you'd written the WHERE clause as follows:

where ss1.x = ec1.f1 and ec1.ff = ec1.f1 and ec1.ff = 42::int8;

In short, I think the code I had for that was fine and it's just the
expected plan that you should be editing. If we wanted this
behaviour to be consistent then the fix should be to make
generate_base_implied_equalities_const() better at only distributing
the quals down to the relations after it has discovered that the EC is
not broken, or at least cleaning up the partial work that it's done if
it discovers a broken EC. The former seems better to me, but I doubt
that it matters too much as broken ECs should be pretty rare and it
does not seem worth spending too much effort making this work better.

I've not had a chance to look at the 0003 patch yet.

David

#13David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#12)
Re: [PoC] Reducing planning time when tables have many partitions

On Tue, 9 Aug 2022 at 19:10, David Rowley <dgrowleyml@gmail.com> wrote:

I've not had a chance to look at the 0003 patch yet.

I've looked at the 0003 patch now.

The performance numbers look quite impressive; however, there were a
few things about the patch where I struggled to figure out why they
were done the way you did:

+ root->eq_not_children_indexes = bms_add_member(root->eq_not_children_indexes,

Why is that in PlannerInfo rather than in the EquivalenceClass?

if (bms_equal(rel->relids, em->em_relids))
{
rel->eclass_member_indexes =
bms_add_member(rel->eclass_member_indexes, em_index);
}

Why are you only adding the eclass_member_index to the RelOptInfo when
the em_relids contain a singleton relation?

I ended up going and fixing the patch to be more how I imagined it.

I've ended up with 3 Bitmapset fields in EquivalenceClass;
ec_member_indexes, ec_nonchild_indexes, ec_norel_indexes. I also
trimmed the number of helper functions down for obtaining the minimal
set of matching EquivalenceMember indexes to just:

Bitmapset *
get_ecmember_indexes(PlannerInfo *root, EquivalenceClass *ec, Relids relids,
bool with_children, bool with_norel_members)

Bitmapset *
get_ecmember_indexes_strict(PlannerInfo *root, EquivalenceClass *ec,
Relids relids, bool with_children,
bool with_norel_members)

I'm not so much a fan of the bool parameters, but it seemed better
than having 8 different functions, one for each combination of the
bool parameters, instead of 2.

The "strict" version of the function takes the intersection of
eclass_member_indexes for each rel mentioned in relids, whereas the
non-strict version does a union of those. Each then intersects that
with all members in the 'ec', or just the non-child members when
'with_children' is false. They both then optionally bms_add_members()
the ec_norel_members if with_norel_members is true. I found it
difficult to figure out the best order to do the intersection. That
really depends on if the particular query has many EquivalenceClasses
with few EquivalenceMembers or few EquivalenceClasses with many
EquivalenceMembers. bms_int_members() always recycles the left input.
Ideally, that would always be the smallest Bitmapset. Maybe it's worth
inventing a new version of bms_int_members() that recycles the input
with the least nwords. That would give the subsequent
bms_next_member() calls an easier time. Right now they'll need to loop
over a bunch of 0 words at the end for many queries.

A few problems I ran into along the way:

1. generate_append_tlist() generates Vars with varno=0. That causes
problems when we add Exprs from those in add_eq_member() as there is
no element at root->simple_rel_array[0] to add eclass_member_indexes
to.
2. The existing comment for EquivalenceMember.em_relids claims "all
relids appearing in em_expr", but that's just not true when it comes
to em_is_child members.

So far, I fixed #1 by adding a hack to setup_simple_rel_arrays() to do
"root->simple_rel_array[0] = makeNode(RelOptInfo);" I'm not suggesting
that's the correct fix. It might be possible to set the varnos to the
varnos from the first Append child instead.

The fact that #2 is not true adds quite a bit of complexity to the
patch and I think the patch might even misbehave as a result. It seems
there are cases where a child em_relids can contain additional relids
that are not present in the em_expr. For example, when a UNION ALL
child has a Const in the targetlist, as explained in a comment in
add_child_rel_equivalences(). However, there also seem to be cases
where the opposite is true. I had to add the following code in
add_eq_member() to stop a regression test failing:

if (is_child)
expr_relids = bms_add_members(expr_relids, relids);

That's to make sure we add eclass_member_indexes to each RelOptInfo
mentioned in the em_expr.

After doing all that, I noticed that your benchmark was showing that
create_join_clause() was the new bottleneck. This was due to having to
loop so many times over the ec_sources to find an already built
RestrictInfo. I went off and added some new code to optimize the
lookup of those in a similar way by adding a new Bitmapset field in
RelOptInfo to index which ec_sources it mentioned, which meant having
to move ec_sources into PlannerInfo. I don't think this part of the
patch is quite right yet as the code I have relies on em_relids being
the same as the ones mentioned in the RestrictInfo. That seems not
true for em_is_child EMs, so I think we probably need to add a new
field to EquivalenceMember that truly is just pull_varnos from
em_expr, or else look into some way to make em_relids mean that (like
the comment claims).

Here are my results from running your benchmark on master (@f6c750d31)
with and without the attached patch.

 npart | master (ms) | patched (ms) | speedup
-------+-------------+--------------+----------
     2 |        0.28 |         0.29 |   95.92%
     4 |        0.37 |         0.38 |   96.75%
     8 |        0.53 |         0.56 |   94.43%
    16 |        0.92 |         0.91 |  100.36%
    32 |        1.82 |         1.70 |  107.57%
    64 |        4.05 |         3.26 |  124.32%
   128 |       10.83 |         6.69 |  161.89%
   256 |       42.63 |        19.46 |  219.12%
   512 |      194.31 |        42.60 |  456.14%
  1024 |     1104.02 |        98.37 | 1122.33%

This resulted in some good additional gains in planner performance.
The 1024 partition case is now about 11x faster on my machine instead
of 4x. The 2 partition case does regress slightly. There might be a few
things we can do about that, for example, move ec_collation up 1 to
shrink EquivalenceClass back down closer to the size it was before [1].

I've attached a draft patch with my revisions.

David

[1]: https://commitfest.postgresql.org/39/3810/

Attachments:

eclass_member_speedup_v3.patchtext/plain; charset=US-ASCII; name=eclass_member_speedup_v3.patchDownload+877-284
#14Yuya Watari
watari.yuya@gmail.com
In reply to: David Rowley (#13)
Re: [PoC] Reducing planning time when tables have many partitions

Dear David,

I really appreciate your reply and your modifying the patch. The
performance improvements are quite impressive. I believe these
improvements will help PostgreSQL users. Thank you again.

The 2 partition does regress slightly. There might be a few
things we can do about that

I tried to solve this regression problem. From here, I will refer to
the patch you sent on August 16th as the v3 patch. I will also call my
patch attached to this email the v4 patch. I will discuss the v4 patch
later.

Additionally, I give names to queries.
* Query A: The query we have been using in previous emails, which
joins students, scores, and gpas tables.
* Query B: The query which is attached to this email.

Query B is as follows:

===
SELECT *
FROM testtable_1, testtable_2, testtable_3, testtable_4, testtable_5,
testtable_6, testtable_7, testtable_8
WHERE testtable_1.x = testtable_2.x AND testtable_1.x = testtable_3.x
AND testtable_1.x = testtable_4.x AND testtable_1.x = testtable_5.x
AND testtable_1.x = testtable_6.x AND testtable_1.x = testtable_7.x
AND testtable_1.x = testtable_8.x;
===

Query A joins three tables, whereas Query B joins eight tables. Since
EquivalenceClass is used when handling chained join conditions, I
thought queries joining many tables, such as Query B, would have
greater performance impacts.

I have investigated the v3 patch with these queries. As a result, I
did not observe any regressions in Query A in my environment. However,
the v3 patch showed significant degradation in Query B.

The following table and Figures 1 and 2 describe the result. The v3
patch resulted in a regression of 8.7% for one partition and 4.8% for
two partitions. Figure 2 shows the distribution of planning times for
the 1-partition case, indicating that the 8.7% regression is not an
error.

Table 1: Planning time of Query B
(n: number of partitions)
(milliseconds)
----------------------------------------------------------------
n | Master | v3 | v4 | Master / v3 | Master / v4
----------------------------------------------------------------
1 | 54.926 | 60.178 | 55.275 | 91.3% | 99.4%
2 | 53.853 | 56.554 | 53.519 | 95.2% | 100.6%
4 | 57.115 | 57.829 | 55.648 | 98.8% | 102.6%
8 | 64.208 | 60.945 | 58.025 | 105.4% | 110.7%
16 | 79.818 | 65.526 | 63.365 | 121.8% | 126.0%
32 | 136.981 | 77.813 | 76.526 | 176.0% | 179.0%
64 | 371.991 | 108.058 | 110.202 | 344.2% | 337.6%
128 | 1449.063 | 173.326 | 181.302 | 836.0% | 799.3%
256 | 6245.577 | 333.480 | 354.961 | 1872.8% | 1759.5%
----------------------------------------------------------------

This performance degradation is due to the heavy processing of the
get_ec***_indexes***() functions. These functions are the core part of
the optimization we are working on in this thread, but they are
relatively heavy when the number of partitions is small.

I noticed that these functions were called repeatedly with the same
arguments. During planning Query B with one partition, the
get_ec_source_indexes_strict() function was called 2087 times with
exactly the same parameters. Such repeated calls occurred many times
in a single query.

To address this problem, I introduced a caching mechanism in the v4
patch. This patch caches the Bitmapset once it has been computed.
After that, we only have to read the cached value instead of
performing the same process. Of course, we cannot devote much time to
the caching itself. Hash tables would be a simple way to accomplish
this, but they are too expensive in the current case, where even
microsecond-level overhead matters. Therefore, my patch adopts
another approach. I will use the following function as an example to
explain it.

===
Bitmapset *get_ecmember_indexes(PlannerInfo *root,
EquivalenceClass *ec, Relids relids, bool with_children, bool
with_norel_members);
===

My idea is "caching the returned Bitmapset into Relids." If the Relids
has the result Bitmapset, we can access it quickly via the pointer. Of
course, I understand this description is not accurate. Relids is just
an alias of Bitmapset, so we cannot change the layout.

I will describe the precise mechanism. In the v4 patch, I changed the
signature of the get_ecmember_indexes() function as follows.

===
Bitmapset *get_ecmember_indexes(PlannerInfo *root,
EquivalenceClass *ec, Relids relids, bool with_children, bool
with_norel_members, ECIndexCache *cache);
===

ECIndexCache is storage for caching returned values. ECIndexCache has
a one-to-one relationship with Relids. This relationship is achieved
by placing the ECIndexCache just alongside the Relids. For example,
ECIndexCache corresponding to some RelOptInfo's relids exists in the
same RelOptInfo. When calling the get_ecmember_indexes() function with
a RelOptInfo, we pass RelOptInfo->ECIndexCache together. On the other
hand, since Relids appear in various places, it is sometimes difficult
to prepare a corresponding ECIndexCache. In such cases, we give up
caching and pass NULL.

Besides, one ECIndexCache can only map to one EquivalenceClass.
ECIndexCache only caches for the first EquivalenceClass it encounters
and does not cache for another EC.

My method abandons full caching to prevent overhead. However, it
overcame the regression problem for Query B. As can be seen from
Figure 2, the regression with the v4 patch is either non-existent or
negligible. Furthermore, the v4 patch is faster than the v3 patch when
the number of partitions is 32 or less.

In addition to Query B, the results with Query A are shown in Figure
3. I cannot recognize any regression from Figure 3. Please note
that these results were obtained on my machine and may differ in other
environments.

However, when the number of partitions was relatively large, my patch
was slightly slower than the v3 patch. This may be due to too frequent
memory allocation. ECIndexCache is a large struct containing 13
pointers. In the current implementation, ECIndexCache exists within
commonly used structs such as RelOptInfo. Therefore, ECIndexCache is
allocated even if no one uses it. When there were 256 partitions of
Query B, 88509 ECIndexCache instances were allocated, but only 2295
were actually used. This means that 95.4% were wasted. I think
on-demand allocation would solve this problem. Similar problems could
also occur with other workloads, including OLTP. I'm going to try this
approach soon.

I really apologize for not commenting on the rest of your reply. I
will continue to consider them.

--
Best regards,
Yuya Watari

Attachments:

v4-0001-Apply-eclass_member_speedup_v3.patch.patchapplication/octet-stream; name=v4-0001-Apply-eclass_member_speedup_v3.patch.patchDownload+877-285
v4-0002-Implement-ECIndexCache.patchapplication/octet-stream; name=v4-0002-Implement-ECIndexCache.patchDownload+243-45
figure-1.pngimage/png; name=figure-1.pngDownload+0-3
figure-2.pngimage/png; name=figure-2.pngDownload+8-1
figure-3.pngimage/png; name=figure-3.pngDownload+4-0
query-b-create-tables.sqlapplication/octet-stream; name=query-b-create-tables.sqlDownload
query-b-query.sqlapplication/octet-stream; name=query-b-query.sqlDownload
#15David Rowley
dgrowleyml@gmail.com
In reply to: Yuya Watari (#14)
Re: [PoC] Reducing planning time when tables have many partitions

On Fri, 26 Aug 2022 at 12:40, Yuya Watari <watari.yuya@gmail.com> wrote:

This performance degradation is due to the heavy processing of the
get_ec***_indexes***() functions. These functions are the core part of
the optimization we are working on in this thread, but they are
relatively heavy when the number of partitions is small.

I noticed that these functions were called repeatedly with the same
arguments. During planning Query B with one partition, the
get_ec_source_indexes_strict() function was called 2087 times with
exactly the same parameters. Such repeated calls occurred many times
in a single query.

Instead of doing the caching like this, why don't we code up some
iterators that we can loop over to fetch the required EMs?

I'll attempt to type out my thoughts here without actually trying to
see if this works:

typedef struct EquivalenceMemberIterator
{
EquivalenceClass *ec;
Relids relids;
Bitmapset *em_matches;
int position; /* last found index of em_matches or -1 */
bool use_index;
bool with_children;
bool with_norel_members;
} EquivalenceMemberIterator;

We'd then have functions like:

static void
get_ecmember_indexes_iterator(EquivalenceMemberIterator *it,
PlannerInfo *root, EquivalenceClass *ec, Relids relids, bool
with_children, bool with_norel_members)
{
it->ec = ec;
it->relids = relids;
it->position = -1;

it->use_index = (root->simple_rel_array_size > 32); /* or whatever
threshold is best */
it->with_children = with_children;
it->with_norel_members = with_norel_members;

if (it->use_index)
it->em_matches = get_ecmember_indexes(root, ec, relids,
with_children, with_norel_members);
else
it->em_matches = NULL;
}

static EquivalenceMember *
get_next_matching_member(PlannerInfo *root, EquivalenceMemberIterator *it)
{
if (it->use_index)
{
it->position = bms_next_member(it->em_matches, it->position);
if (it->position >= 0)
return list_nth(root->eq_members, it->position);
return NULL;
}
else
{
int i = it->position;
while ((i = bms_next_member(it->ec->ec_member_indexes, i)) >= 0)
{
/* filter out the EMs we don't want here "break" when
we find a match */
}
it->position = i;
if (i >= 0)
return list_nth(root->eq_members, i);
return NULL;
}
}

Then the consuming code will do something like:

EquivalenceMemberIterator it;
get_ecmember_indexes_iterator(&it, root, ec, relids, true, false);

while ((cur_em = get_next_matching_member(root, &it)) != NULL)
{
// do stuff
}

David

#16Yuya Watari
watari.yuya@gmail.com
In reply to: David Rowley (#15)
Re: [PoC] Reducing planning time when tables have many partitions

Dear David,

On Fri, Aug 26, 2022 at 12:18 PM David Rowley <dgrowleyml@gmail.com> wrote:

Instead of doing the caching like this, why don't we code up some
iterators that we can loop over to fetch the required EMs?

Thank you very much for your quick reply and for sharing your idea
with code. I also think introducing EquivalenceMemberIterator is one
good alternative solution. I will try to implement and test it.

Thank you again for helping me.

--
Best regards,
Yuya Watari

#17Yuya Watari
watari.yuya@gmail.com
In reply to: Yuya Watari (#16)
Re: [PoC] Reducing planning time when tables have many partitions

Hello,

On Fri, Aug 26, 2022 at 5:53 PM Yuya Watari <watari.yuya@gmail.com> wrote:

Thank you very much for your quick reply and for sharing your idea
with code. I also think introducing EquivalenceMemberIterator is one
good alternative solution. I will try to implement and test it.

I apologize for my late response. I have implemented several
approaches and tested them.

1. Changes

I will describe how I modified our codes. I tested five versions:

* v1: The first draft patch by David with bug fixes by me. This patch
does not perform any optimizations based on Bitmapset operations.
* v3: The past patch
* v5 (v3 with revert): The v3 with revert of one of our optimizations
* v6 (Iterator): An approach using iterators to enumerate over
EquivalenceMembers. This approach is David's suggestion in the
previous email.
* v7 (Cache): My approach to caching the result of get_ec***indexes***()

Please note that there is no direct parent-child relationship
between v6 and v7; they are both v5's children, i.e., siblings. I'm
sorry for the confusing versioning.

1.1. Revert one of our optimizations (v5)

As I mentioned in the comment in
v[5|6|7]-0002-Revert-one-of-the-optimizations.patch, I reverted one of
our optimizations. This code tries to find EquivalenceMembers that do
not satisfy the bms_overlap condition. We encounter such members early
in the loop, so a linear search is enough, and our optimization is
excessive here. As a result of experiments, I found this
optimization was a bottleneck, so I reverted it.

v6 (Iterator) and v7 (Cache) include this revert.

1.2. Iterator (v6)

I have implemented the iterator approach. The code is based on what
David advised, but I customized it a bit. I added the "bool
caller_needs_recheck" argument to get_ecmember_indexes_iterator() and
other similar functions. If this argument is true, the iterator
enumerates all EquivalenceMembers without checking conditions such as
bms_is_subset or bms_overlap.

This change is because callers of these iterators sometimes recheck
desired conditions after calling it. For example, if some caller wants
EquivalenceMembers whose Relids is equal to some value, it calls
get_ecmember_indexes(). However, since the result may have false
positives, the caller has to recheck the result by the bms_equal()
condition. In this case, if we are below the threshold and don't
perform our optimization, checking bms_overlap() in the iterator does
not make sense. We can solve this problem by passing true to the
"caller_needs_recheck" argument to skip redundant checking.

1.3. Cache (v7)

I have improved my caching approach. First, I introduced the on-demand
allocation approach I mentioned in the previous email. ECIndexCache is
allocated not together with RelOptInfo but when using it.

In addition to this, a new version of the patch can handle multiple
EquivalenceClasses. In the previous version, caching was only possible
for one EquivalenceClass. This limitation is to prevent overhead but
reduces caching opportunities. So, I have improved it so that it can
handle all EquivalenceClasses. I made this change on the advice of
Fujita-san. Thank you, Fujita-san.

2. Experimental Results

I conducted experiments to test these methods.

2.1. Query A

Figure 1 illustrates the planning times of Query A. Please see the
previous email for what Query A refers to. The performance of all
methods except master and v1 are almost the same. I cannot observe any
degradation from this figure.

2.2. Query B

Query B joins eight tables. In the previous email, I mentioned that
the v3 patch has significant degradation for this query.

Figure 2 and Table 1 show the results. The three approaches of v5, v6
(Iterator), and v7 (Cache) showed good overall performance. In
particular, v7 (Cache) performed best for the smaller number of
partitions.

Table 1: Planning Time of Query B (ms)
-------------------------------------
n | Master | v1 | v3
-------------------------------------
1 | 55.459 | 57.376 | 58.849
2 | 54.162 | 56.454 | 57.615
4 | 56.491 | 59.742 | 57.108
8 | 62.694 | 67.920 | 59.591
16 | 79.547 | 90.589 | 64.954
32 | 134.623 | 160.452 | 76.626
64 | 368.716 | 439.894 | 107.278
128 | 1374.000 | 1598.748 | 170.909
256 | 5955.762 | 6921.668 | 324.113
-------------------------------------
--------------------------------------------------------
n | v5 (v3 with revert) | v6 (Iterator) | v7 (Cache)
--------------------------------------------------------
1 | 56.268 | 57.520 | 56.703
2 | 55.511 | 55.212 | 54.395
4 | 55.643 | 55.025 | 54.996
8 | 57.770 | 57.519 | 57.114
16 | 63.075 | 63.117 | 63.161
32 | 74.788 | 74.369 | 75.801
64 | 104.027 | 104.787 | 105.450
128 | 169.473 | 169.019 | 174.919
256 | 321.450 | 322.739 | 342.601
--------------------------------------------------------

2.3. Join Order Benchmark

It is essential to test real workloads, so I used the Join Order
Benchmark [1]. This benchmark contains many complicated queries
joining a lot of tables. I partitioned fact tables by 'id' columns and
measured query planning times.

Figure 3 and Table 2 describe the results. The results showed that all
methods produced some degradations when there were not so many
partitions. However, the degradation of v7 (cache) was relatively
small. It was 0.8% with two partitions, while the other methods'
degradation was at least 1.6%.

Table 2: Speedup of Join Order Benchmark (higher is better)
-----------------------------------------------------------------
n | v3 | v5 (v3 with revert) | v6 (Iterator) | v7 (Cache)
-----------------------------------------------------------------
2 | 95.8% | 97.3% | 97.3% | 97.7%
4 | 96.9% | 98.4% | 98.0% | 99.2%
8 | 102.2% | 102.9% | 98.1% | 103.0%
16 | 107.6% | 109.5% | 110.1% | 109.4%
32 | 123.5% | 125.4% | 125.5% | 125.0%
64 | 165.2% | 165.9% | 164.6% | 165.9%
128 | 308.2% | 309.2% | 312.1% | 311.4%
256 | 770.1% | 772.3% | 776.6% | 773.2%
-----------------------------------------------------------------

2.4. pgbench

Our optimizations must not cause negative impacts on OLTP workloads. I
conducted pgbench, and Figure 4 and Table 3 show its result.

Table 3: The result of pgbench (tps)
------------------------------------------------------------------------
n | Master | v3 | v5 (v3 with revert) | v6 (Iterator) | v7 (Cache)
------------------------------------------------------------------------
1 | 7617 | 7510 | 7484 | 7599 | 7561
2 | 7613 | 7487 | 7503 | 7609 | 7560
4 | 7559 | 7497 | 7453 | 7560 | 7553
8 | 7506 | 7429 | 7405 | 7523 | 7503
16 | 7584 | 7481 | 7466 | 7558 | 7508
32 | 7556 | 7456 | 7448 | 7558 | 7521
64 | 7555 | 7452 | 7435 | 7541 | 7504
128 | 7542 | 7430 | 7442 | 7558 | 7517
------------------------------------------------------------------------
Avg | 7566 | 7468 | 7455 | 7563 | 7528
------------------------------------------------------------------------

This result indicates that v3 and v5 (v3 with revert) had a
significant negative impact on the pgbench workload. Their tps
decreased by 1.3% or more. On the other hand, degradations of v6
(Iterator) and v7 (Cache) are non-existent or negligible.

3. Causes of Degression

We could not avoid degradation with the Join Order Benchmark. The
leading cause of this problem is that Bitmapset operation, especially
bms_next_member(), is relatively slower than simple enumeration over
List.

It is easy to imagine that bms_next_member(), which has complex bit
operations, is a little heavier than List enumeration, which simply
advances a pointer. The fact that even v1, where we don't perform
any optimizations, slowed down supports this notion.

I think preventing this regression is very hard. To do so, we must
have both List and Bitmapset representations of EquivalenceMembers.
However, I don't prefer this solution because it is redundant and
leads to less code maintainability.

Reducing Bitmapset->nwords is another possible solution. I will try
it, but it will likely not solve the significant degradation in
pgbench for v3 and v5. This is because such degradation did not occur
with v6 and v7, which also use Bitmapset.

4. Which Method is The Best?

First of all, it is hard to adopt v3 and v5 (v3 with revert) because
they degrade performance on OLTP workloads. Therefore, v6 (Iterator)
and v7 (Cache) are possible candidates. Of these methods, I prefer v7
(Cache).

Actually, I don't think introducing thresholds is a good idea because
the best threshold is unclear. If we become conservative to avoid
degradation, we must increase the threshold, but that takes away the
opportunity for optimization. Lowering the threshold has the opposite
trade-off: more optimization opportunities, but a higher risk of
regressions.

In contrast, v7 (Cache) is an essential solution in terms of reducing
the cost of repeated function calls and does not require the
introduction of a threshold. Besides, it performs better on almost all
workloads, including the Join Order Benchmark. It also has no negative
impacts on OLTP.

In conclusion, I think v7 (Cache) is the most desirable. Of course,
the method may have some problems, but it is worth considering.

[1]: https://github.com/winkyao/join-order-benchmark

--
Best regards,
Yuya Watari

Attachments:

figure-1.pngimage/png; name=figure-1.pngDownload+2-1
figure-2.pngimage/png; name=figure-2.pngDownload+3-2
figure-3.pngimage/png; name=figure-3.pngDownload
figure-4.pngimage/png; name=figure-4.pngDownload
v5-0001-Apply-eclass_member_speedup_v3.patch.patchapplication/octet-stream; name=v5-0001-Apply-eclass_member_speedup_v3.patch.patchDownload+877-285
v5-0002-Revert-one-of-the-optimizations.patchapplication/octet-stream; name=v5-0002-Revert-one-of-the-optimizations.patchDownload+12-10
v6-0001-Apply-eclass_member_speedup_v3.patch.patchapplication/octet-stream; name=v6-0001-Apply-eclass_member_speedup_v3.patch.patchDownload+877-285
v6-0002-Revert-one-of-the-optimizations.patchapplication/octet-stream; name=v6-0002-Revert-one-of-the-optimizations.patchDownload+12-10
v6-0003-Implement-iterators.patchapplication/octet-stream; name=v6-0003-Implement-iterators.patchDownload+537-88
v7-0001-Apply-eclass_member_speedup_v3.patch.patchapplication/octet-stream; name=v7-0001-Apply-eclass_member_speedup_v3.patch.patchDownload+877-285
v7-0002-Revert-one-of-the-optimizations.patchapplication/octet-stream; name=v7-0002-Revert-one-of-the-optimizations.patchDownload+12-10
v7-0003-Implement-ECIndexCache.patchapplication/octet-stream; name=v7-0003-Implement-ECIndexCache.patchDownload+255-44
#18Yuya Watari
watari.yuya@gmail.com
In reply to: Yuya Watari (#17)
Re: [PoC] Reducing planning time when tables have many partitions

Hello,

On Wed, Sep 21, 2022 at 6:43 PM Yuya Watari <watari.yuya@gmail.com> wrote:

1.1. Revert one of our optimizations (v5)

As I mentioned in the comment in
v[5|6|7]-0002-Revert-one-of-the-optimizations.patch, I reverted one of
our optimizations. This code tries to find EquivalenceMembers that do
not satisfy the bms_overlap condition. We encounter such members early
in the loop, so a linear search is enough, and our optimization is
excessive here. As a result of experiments, I found this
optimization was a bottleneck, so I reverted it.

In the previous mail, I proposed a revert of one excessive
optimization. In addition, I found a new bottleneck and attached a new
version of the patch solving it to this email.

The new bottleneck exists in the select_outer_pathkeys_for_merge()
function. At the end of this function, we count EquivalenceMembers
that satisfy the specific condition. To count them, we have used
Bitmapset operations. Through experiments, I concluded that this
optimization is effective for larger cases but leads to some
degradation for the smaller number of partitions. The new patch
switches two algorithms depending on the problem sizes.

1. Experimental result

1.1. Join Order Benchmark

As in the previous email, I used the Join Order Benchmark to evaluate
the patches' performance. The correspondence between each version and
patches is as follows.

v3: v8-0001-*.patch
v5 (v3 with revert): v8-0001-*.patch + v8-0002-*.patch
v8 (v5 with revert): v8-0001-*.patch + v8-0002-*.patch + v8-0003-*.patch

I show the speed-up of each method compared with the master branch in
Table 1. When the number of partitions is 1, performance degradation
is kept to 1.1% in v8, while they are 4.2% and 1.8% in v3 and v5. This
result indicates that a newly introduced revert is effective.

Table 1: Speedup of Join Order Benchmark (higher is better)
(n = the number of partitions)
----------------------------------------------------------
n | v3 | v5 (v3 with revert) | v8 (v5 with revert)
----------------------------------------------------------
2 | 95.8% | 98.2% | 98.9%
4 | 97.2% | 99.7% | 99.3%
8 | 101.4% | 102.5% | 103.4%
16 | 108.7% | 111.4% | 110.2%
32 | 127.1% | 127.6% | 128.8%
64 | 169.5% | 172.1% | 172.4%
128 | 330.1% | 335.2% | 332.3%
256 | 815.1% | 826.4% | 821.8%
----------------------------------------------------------

1.2. pgbench

The following table describes the result of pgbench. The v5 and v8
performed clearly better than the v3 patch. The difference between v5
and v8 is not so significant, but v8's performance is close to the
master branch.

Table 2: The result of pgbench (tps)
-----------------------------------------------------------------
n | Master | v3 | v5 (v3 with revert) | v8 (v5 with revert)
-----------------------------------------------------------------
1 | 7550 | 7422 | 7474 | 7521
2 | 7594 | 7381 | 7536 | 7529
4 | 7518 | 7362 | 7461 | 7524
8 | 7459 | 7340 | 7424 | 7460
-----------------------------------------------------------------
Avg | 7531 | 7377 | 7474 | 7509
-----------------------------------------------------------------

2. Conclusion and future works

The revert in the v8-0003-*.patch is effective in preventing
performance degradation for the smaller number of partitions. However,
I don't think what I have done in the patch is the best or ideal
solution. As I mentioned in the comments in the patch, switching two
algorithms may be ugly because it introduces code duplication. We need
a wiser solution to this problem.

--
Best regards,
Yuya Watari

Attachments:

v8-0001-Apply-eclass_member_speedup_v3.patch.patchapplication/octet-stream; name=v8-0001-Apply-eclass_member_speedup_v3.patch.patchDownload+874-284
v8-0002-Revert-one-of-the-optimizations.patchapplication/octet-stream; name=v8-0002-Revert-one-of-the-optimizations.patchDownload+12-10
v8-0003-Use-conventional-algorithm-for-smaller-size-cases.patchapplication/octet-stream; name=v8-0003-Use-conventional-algorithm-for-smaller-size-cases.patchDownload+49-10
#19Yuya Watari
watari.yuya@gmail.com
In reply to: Yuya Watari (#18)
Re: [PoC] Reducing planning time when tables have many partitions

Hello,

I noticed that the previous patch does not apply to the current HEAD.
I attached the rebased version to this email.

--
Best regards,
Yuya Watari

Attachments:

v8-0001-Apply-eclass_member_speedup_v3.patch.patch (application/octet-stream, +876 -286)
v8-0002-Revert-one-of-the-optimizations.patch (application/octet-stream, +12 -10)
v8-0003-Use-conventional-algorithm-for-smaller-size-cases.patch (application/octet-stream, +49 -10)
#20Andrei Lepikhov
lepihov@gmail.com
In reply to: Yuya Watari (#19)
Re: [PoC] Reducing planning time when tables have many partitions

On 2/11/2022 15:27, Yuya Watari wrote:

Hello,

I noticed that the previous patch does not apply to the current HEAD.
I attached the rebased version to this email.

I'm still reviewing your patch. For the most part it seems OK, but do
you really need both the eq_sources and eq_derives lists now? As far
as I can see, every access to these lists goes through
eclass_source_indexes and eclass_derive_indexes, respectively. Maybe
merge them?
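The merge idea could be sketched like this. This is illustrative
Python, not PostgreSQL internals: the class and attribute names are
invented, and plain sets stand in for the Bitmapset position indexes
the patch actually uses.

```python
class ClauseStore:
    """One merged clause list, with two position-set 'indexes' that
    keep source and derived clauses distinguishable."""

    def __init__(self):
        self.clauses = []        # merged list (was eq_sources + eq_derives)
        self.source_idx = set()  # positions of source clauses
        self.derive_idx = set()  # positions of derived clauses

    def add(self, clause, derived=False):
        pos = len(self.clauses)
        self.clauses.append(clause)
        # Membership in one index set replaces membership in one of
        # the two separate lists.
        (self.derive_idx if derived else self.source_idx).add(pos)
        return pos

    def sources(self):
        return [self.clauses[p] for p in sorted(self.source_idx)]

    def derives(self):
        return [self.clauses[p] for p in sorted(self.derive_idx)]
```

Since all accesses already go through the indexes, the two lists
collapse into one without losing the source/derived distinction.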

--
regards,
Andrey Lepikhov
Postgres Professional

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrei Lepikhov (#20)
#22Zhang Mingli
zmlpostgres@gmail.com
In reply to: Tom Lane (#21)
#23Andrei Lepikhov
lepihov@gmail.com
In reply to: Yuya Watari (#19)
#24Thom Brown
thom@linux.com
In reply to: Zhang Mingli (#22)
#25Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Thom Brown (#24)
#26Thom Brown
thom@linux.com
In reply to: Alvaro Herrera (#25)
#27Thom Brown
thom@linux.com
In reply to: Thom Brown (#26)
#28Yuya Watari
watari.yuya@gmail.com
In reply to: Thom Brown (#27)
#29David Rowley
dgrowleyml@gmail.com
In reply to: Yuya Watari (#28)
#30Thom Brown
thom@linux.com
In reply to: David Rowley (#29)
#31David Rowley
dgrowleyml@gmail.com
In reply to: Thom Brown (#30)
#32Thom Brown
thom@linux.com
In reply to: David Rowley (#31)
#33Yuya Watari
watari.yuya@gmail.com
In reply to: Thom Brown (#32)
#34David Rowley
dgrowleyml@gmail.com
In reply to: Yuya Watari (#33)
#35Yuya Watari
watari.yuya@gmail.com
In reply to: David Rowley (#34)
#36Yuya Watari
watari.yuya@gmail.com
In reply to: Yuya Watari (#35)
#37David Rowley
dgrowleyml@gmail.com
In reply to: Yuya Watari (#36)
#38Yuya Watari
watari.yuya@gmail.com
In reply to: David Rowley (#37)
#39Andrei Lepikhov
lepihov@gmail.com
In reply to: Yuya Watari (#38)
#40Yuya Watari
watari.yuya@gmail.com
In reply to: Andrei Lepikhov (#39)
#41Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Yuya Watari (#40)
#42David Rowley
dgrowleyml@gmail.com
In reply to: Alvaro Herrera (#41)
#43Yuya Watari
watari.yuya@gmail.com
In reply to: David Rowley (#42)
#44Yuya Watari
watari.yuya@gmail.com
In reply to: Yuya Watari (#43)
#45Andrei Lepikhov
lepihov@gmail.com
In reply to: Yuya Watari (#44)
#46Andrei Lepikhov
lepihov@gmail.com
In reply to: Andrei Lepikhov (#45)
#47Yuya Watari
watari.yuya@gmail.com
In reply to: Andrei Lepikhov (#46)
#48Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Andrei Lepikhov (#46)
#49Yuya Watari
watari.yuya@gmail.com
In reply to: Ashutosh Bapat (#48)
#50Andrei Lepikhov
lepihov@gmail.com
In reply to: Yuya Watari (#49)
#51Yuya Watari
watari.yuya@gmail.com
In reply to: Andrei Lepikhov (#50)
#52Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Yuya Watari (#49)
#53Yuya Watari
watari.yuya@gmail.com
In reply to: Ashutosh Bapat (#52)
#54Andrei Lepikhov
lepihov@gmail.com
In reply to: Yuya Watari (#53)
#55Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Andrei Lepikhov (#54)
#56Andrei Lepikhov
lepihov@gmail.com
In reply to: Ashutosh Bapat (#55)
#57Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Andrei Lepikhov (#56)
#58Yuya Watari
watari.yuya@gmail.com
In reply to: Ashutosh Bapat (#57)
#59David Rowley
dgrowleyml@gmail.com
In reply to: Yuya Watari (#44)
#60David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#59)
#61David Rowley
dgrowleyml@gmail.com
In reply to: Yuya Watari (#58)
#62Yuya Watari
watari.yuya@gmail.com
In reply to: David Rowley (#61)
#63Yuya Watari
watari.yuya@gmail.com
In reply to: Yuya Watari (#62)
#64Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Yuya Watari (#63)
#65Andrei Lepikhov
lepihov@gmail.com
In reply to: Yuya Watari (#63)
#66Yuya Watari
watari.yuya@gmail.com
In reply to: Andrei Lepikhov (#65)
#67Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Yuya Watari (#66)
#68Andrei Lepikhov
lepihov@gmail.com
In reply to: Yuya Watari (#66)
#69Yuya Watari
watari.yuya@gmail.com
In reply to: Andrei Lepikhov (#68)
#70Alena Rybakina
lena.ribackina@yandex.ru
In reply to: Yuya Watari (#69)
#71John Naylor
john.naylor@enterprisedb.com
In reply to: Alena Rybakina (#70)
#72Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Naylor (#71)
#73Alena Rybakina
lena.ribackina@yandex.ru
In reply to: Tom Lane (#72)
#74Andrei Lepikhov
lepihov@gmail.com
In reply to: Yuya Watari (#69)
#75Yuya Watari
watari.yuya@gmail.com
In reply to: Andrei Lepikhov (#74)
#76Yuya Watari
watari.yuya@gmail.com
In reply to: Yuya Watari (#75)
#77Yuya Watari
watari.yuya@gmail.com
In reply to: Yuya Watari (#76)
#78Alena Rybakina
lena.ribackina@yandex.ru
In reply to: Yuya Watari (#77)
#79Yuya Watari
watari.yuya@gmail.com
In reply to: Alena Rybakina (#78)
#80Alena Rybakina
lena.ribackina@yandex.ru
In reply to: Yuya Watari (#79)
#81Yuya Watari
watari.yuya@gmail.com
In reply to: Alena Rybakina (#80)
#82Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Yuya Watari (#81)
#83Yuya Watari
watari.yuya@gmail.com
In reply to: Ashutosh Bapat (#82)
#84jian he
jian.universality@gmail.com
In reply to: Yuya Watari (#83)
#85Yuya Watari
watari.yuya@gmail.com
In reply to: jian he (#84)
#86Yuya Watari
watari.yuya@gmail.com
In reply to: Yuya Watari (#85)
#87Yuya Watari
watari.yuya@gmail.com
In reply to: Yuya Watari (#86)
#88Yuya Watari
watari.yuya@gmail.com
In reply to: Yuya Watari (#87)
#89Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Yuya Watari (#88)
#90Yuya Watari
watari.yuya@gmail.com
In reply to: Dmitry Dolgov (#89)
#91Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Yuya Watari (#90)
#92Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#91)
#93Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alvaro Herrera (#92)
#94Yuya Watari
watari.yuya@gmail.com
In reply to: Ashutosh Bapat (#93)
#95Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Yuya Watari (#94)
#96Yuya Watari
watari.yuya@gmail.com
In reply to: Alvaro Herrera (#95)
#97Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Yuya Watari (#96)
#98Yuya Watari
watari.yuya@gmail.com
In reply to: Alvaro Herrera (#97)
#99Yuya Watari
watari.yuya@gmail.com
In reply to: Yuya Watari (#98)
#100Yuya Watari
watari.yuya@gmail.com
In reply to: Yuya Watari (#99)
#101Yuya Watari
watari.yuya@gmail.com
In reply to: Yuya Watari (#100)
#102Yuya Watari
watari.yuya@gmail.com
In reply to: Yuya Watari (#101)
#103newtglobal postgresql_contributors
postgresql_contributors@newtglobalcorp.com
In reply to: Yuya Watari (#102)
#104David Rowley
dgrowleyml@gmail.com
In reply to: Yuya Watari (#102)
#105Yuya Watari
watari.yuya@gmail.com
In reply to: David Rowley (#104)
#106David Rowley
dgrowleyml@gmail.com
In reply to: Yuya Watari (#105)
#107Yuya Watari
watari.yuya@gmail.com
In reply to: David Rowley (#106)
#108David Rowley
dgrowleyml@gmail.com
In reply to: Yuya Watari (#107)
#109Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#108)
#110Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: David Rowley (#108)
#111Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#110)
#112Yuya Watari
watari.yuya@gmail.com
In reply to: David Rowley (#108)
#113Yuya Watari
watari.yuya@gmail.com
In reply to: Tom Lane (#109)
#114Yuya Watari
watari.yuya@gmail.com
In reply to: Ashutosh Bapat (#110)
#115David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#109)
#116David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#115)
#117Yuya Watari
watari.yuya@gmail.com
In reply to: David Rowley (#116)
#118Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#116)
#119Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: David Rowley (#116)
#120David Rowley
dgrowleyml@gmail.com
In reply to: Ashutosh Bapat (#119)
#121David Rowley
dgrowleyml@gmail.com
In reply to: Amit Langote (#118)
#122Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#121)
#123David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#122)
#124David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#123)
#125Yuya Watari
watari.yuya@gmail.com
In reply to: David Rowley (#124)
#126Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#124)
#127Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: David Rowley (#124)
#128David Rowley
dgrowleyml@gmail.com
In reply to: Ashutosh Bapat (#127)
#129Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#128)
#130Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: David Rowley (#128)
#131David Rowley
dgrowleyml@gmail.com
In reply to: Ashutosh Bapat (#130)
#132Yuya Watari
watari.yuya@gmail.com
In reply to: David Rowley (#131)
#133Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#131)
#134David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#133)
#135Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#134)
#136Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#135)
#137David Rowley
dgrowleyml@gmail.com
In reply to: Amit Langote (#135)
#138Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: David Rowley (#137)
#139Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#137)
#140David Rowley
dgrowleyml@gmail.com
In reply to: Amit Langote (#139)
#141Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#140)
#142David Rowley
dgrowleyml@gmail.com
In reply to: Amit Langote (#141)
#143Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#142)