MergeJoin beats HashJoin in the case of multiple hash clauses

lepihov@gmail.com

almost 3 years ago

In reply to: Andy Fan (#4)

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

On Mon, Sep 11, 2023, at 11:51 AM, Andy Fan wrote:

Hi,

On Thu, Jun 15, 2023 at 4:30 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

Hi, all.

Some of my clients use JOINs with three or four clauses. Quite
frequently, I see complaints about an unreasonable switch of the JOIN
algorithm to Merge Join instead of Hash Join. Quick research has shown
one weak place: the estimation of the average bucket size in
final_cost_hashjoin (see q2.sql in the attachment), which uses a very
conservative strategy.
Unlike the estimation of groups, here we use the smallest ndistinct value
across all buckets instead of multiplying them (or trying to do
multivariate analysis).
It works fine for the case of one clause. But if we have many clauses,
and if each has a high ndistinct value, we will overestimate the average
size of a bucket and, as a result, prefer Merge Join. As the
example in the attachment shows, this leads to a worse plan than
possible, sometimes drastically worse.
I assume this is done for fear of functional dependencies between hash
clause components. But in my opinion, we should go the same way here as
in the estimation of groups.
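The difference between the two strategies described above can be sketched in a few lines of C. This is not the planner's code: the function names are hypothetical, each per-clause bucket fraction is approximated as 1/ndistinct, and the number of buckets, MCVs and NULLs are ignored.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Conservative strategy: take the smallest per-clause bucket fraction.
 * Each fraction is approximated as 1/ndistinct (a simplification made
 * for this sketch only).
 */
double
bucket_fraction_min(const double *ndistinct, size_t nclauses)
{
    double      best = 1.0;
    size_t      i;

    assert(nclauses > 0);
    for (i = 0; i < nclauses; i++)
    {
        double      frac;

        assert(ndistinct[i] >= 1.0);
        frac = 1.0 / ndistinct[i];
        if (frac < best)
            best = frac;
    }
    return best;
}

/*
 * Independence assumption: multiply the per-clause ndistinct values, as
 * is done when estimating groups, and cap the result at the row count.
 */
double
bucket_fraction_product(const double *ndistinct, size_t nclauses, double rows)
{
    double      combined = 1.0;
    size_t      i;

    assert(nclauses > 0 && rows >= 1.0);
    for (i = 0; i < nclauses; i++)
    {
        assert(ndistinct[i] >= 1.0);
        combined *= ndistinct[i];
    }
    if (combined > rows)
        combined = rows;    /* cannot have more groups than rows */
    return 1.0 / combined;
}
```

With three clauses of 100 distinct values each and 1,000,000 inner rows, the first function yields a fraction of 0.01 and the second 0.000001. That is a factor of 10,000 in the estimated bucket size, which is the kind of gap that can tip the planner toward Merge Join.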

I can reproduce the situation you want to improve and verified that the
patch improves it as expected. I think this is the right thing to do.

The attached patch shows a sketch of the solution.

I understand that this is a sketch of the solution, but the changes
below still confuse me.
+ if (innerbucketsize > virtualbuckets)
+     innerbucketsize = 1.0 / virtualbuckets;
innerbucketsize is a fraction of the total rows, so it is between 0.0
and 1.0, and virtualbuckets is the total number of buckets (taking
multiple batches into account). How is it possible for
'innerbucketsize > virtualbuckets'? Am I missing something?

You are right, I made a mistake here. The changed diff is in the attachment.
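The objection is about units: a fraction in (0, 1] is compared with a bucket count that is at least 1, so the condition can never be true. The corrected diff is not shown in the thread, so the following is only a sketch of a comparison with consistent units, assuming the intent was to keep the fraction from dropping below 1/virtualbuckets. The function name is hypothetical.

```c
#include <assert.h>

/*
 * Keep a bucket fraction from falling below the share of a single bucket.
 * Both sides of the comparison are fractions, unlike in the quoted diff.
 */
double
clamp_bucket_fraction_sketch(double innerbucketsize, double virtualbuckets)
{
    double      floor_frac;

    assert(virtualbuckets >= 1.0);
    assert(innerbucketsize > 0.0 && innerbucketsize <= 1.0);

    floor_frac = 1.0 / virtualbuckets;
    if (innerbucketsize < floor_frac)
        innerbucketsize = floor_frac;
    return innerbucketsize;
}
```

With 1024 virtual buckets, a fraction of 0.000000001 is raised to 1/1024, while a fraction of 0.5 is returned unchanged.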

--
Regards,
Andrei Lepikhov

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 2 years ago

In reply to: Andrei Lepikhov (#5)

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

Yes, this analysis is correct - final_cost_hashjoin assumes the clauses
may be correlated (not necessarily by functional dependencies, just that
the overall ndistinct is not a simple product of per-column ndistincts).

And it even says so in the comment before calculating bucket size:

* Determine bucketsize fraction and MCV frequency for the inner
* relation. We use the smallest bucketsize or MCV frequency estimated
* for any individual hashclause; this is undoubtedly conservative.

I'm sure this may lead to inflated cost for "good" cases (where the
actual bucket size really is a product), which may push the optimizer to
use the less efficient/slower join method.

Unfortunately, AFAICS the patch simply assumes the extreme in the
opposite direction - it assumes each clause splits the bucket for each
distinct value in the column. Which works great when it's true, but
surely it'd have issues when the columns are correlated?

I think this deserves more discussion, i.e. what happens if the
assumptions do not hold? We know what happens for the conservative
approach, but what's the worst thing that would happen for the
optimistic one?

I doubt we can simply switch from the conservative approach to the
optimistic one. Yes, it'll make some queries faster, but for other
queries it likely causes problems and slowdowns.

IMHO the only principled way forward is to get a better ndistinct
estimate (which this implicitly does), perhaps by using extended
statistics. I haven't tried, but I guess it'd need to extract the
clauses for the inner side, and call estimate_num_groups() on it.
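The idea of preferring a measured multi-column ndistinct over either extreme can be sketched as follows. This is only an illustration: the struct and function names are hypothetical stand-ins, not PostgreSQL APIs, and the real estimate would come from estimate_num_groups() rather than from a plain field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a multivariate ndistinct statistic. */
typedef struct MvNdistinctSketch
{
    int         available;  /* nonzero if statistics cover the columns */
    double      ndistinct;  /* distinct value combinations observed */
} MvNdistinctSketch;

double
inner_bucket_fraction_sketch(const double *clause_frac, size_t nclauses,
                             const MvNdistinctSketch *mv, double nbuckets)
{
    double      frac = 1.0;
    size_t      i;

    assert(nclauses > 0 && nbuckets >= 1.0);

    /* Conservative baseline: smallest per-clause bucket fraction. */
    for (i = 0; i < nclauses; i++)
    {
        if (clause_frac[i] < frac)
            frac = clause_frac[i];
    }

    /* Refine only when a multi-column estimate is actually available. */
    if (mv != NULL && mv->available && mv->ndistinct >= 1.0)
    {
        double      groups = mv->ndistinct;
        double      mvfrac;

        if (groups > nbuckets)
            groups = nbuckets;  /* cannot spread over more buckets */
        mvfrac = 1.0 / groups;
        if (mvfrac < frac)
            frac = mvfrac;
    }
    return frac;
}
```

For two perfectly correlated columns with 1000 distinct values each, the combined ndistinct is also 1000, so the result stays at 0.001, where a blind product would have claimed 0.000001. For independent columns, the statistic reports 1,000,000 combinations and the fraction drops accordingly. Without statistics, the conservative value is kept.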

This however reminds me we don't use extended statistics for join
clauses at all. Which means that even with accurate extended statistics,
we can still get stuff like this for multiple join clauses:

Hash Join (cost=1317.00..2386.00 rows=200 width=24)
(actual time=85.781..8574.784 rows=8000000 loops=1)

This is unrelated to the issue discussed here, of course, as it won't
affect join method selection for that join. But it certainly will affect
all estimates/costs above that join, which can be pretty disastrous.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

lepihov@gmail.com

almost 2 years ago

In reply to: Tomas Vondra (#6)

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

On 3/11/2023 23:43, Tomas Vondra wrote:

On 9/11/23 10:04, Lepikhov Andrei wrote:
* Determine bucketsize fraction and MCV frequency for the inner
* relation. We use the smallest bucketsize or MCV frequency estimated
* for any individual hashclause; this is undoubtedly conservative.

I'm sure this may lead to inflated cost for "good" cases (where the
actual bucket size really is a product), which may push the optimizer to
use the less efficient/slower join method.

Yes, it was a contradictory idea, though.

IMHO the only principled way forward is to get a better ndistinct
estimate (which this implicitly does), perhaps by using extended
statistics. I haven't tried, but I guess it'd need to extract the
clauses for the inner side, and call estimate_num_groups() on it.

And I've done it. Sorry for such a late response. This patch employs
extended statistics to estimate the HashJoin bucket size. In
addition, I describe the idea in a more convenient form here [1].
Obviously, it needs only the ndistinct statistic to make a prediction,
which reduces the computational cost of this statistic.

[1]: https://open.substack.com/pub/danolivo/p/why-postgresql-prefers-mergejoin?r=34q1yy&utm_campaign=post&utm_medium=web

--
regards, Andrei Lepikhov

lepihov@gmail.com

over 1 year ago

In reply to: Andrei Lepikhov (#7)

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

Minor change to make cfbot happier.

--
regards, Andrei Lepikhov

aekorotkov@gmail.com

over 1 year ago

In reply to: Andrei Lepikhov (#8)

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

Hi, Andrei!

Thank you for your work on this subject. I agree with the general
direction. While everyone has used conservative estimates for a long
time, it's better to change them only when we're sure about it.
However, I'm still not sure I get the conservatism.

    if (innerbucketsize > thisbucketsize)
        innerbucketsize = thisbucketsize;
    if (innermcvfreq > thismcvfreq)
        innermcvfreq = thismcvfreq;

AFAICS, even in the worst case (all columns are totally correlated),
the overall bucket size should be the smallest bucket size among
clauses (not the largest). And the same is true of MCV. As a mental
experiment, we can add a new clause to hash join, which is always true
because columns on both sides have the same value. In fact, it would
have almost no influence except for the cost of extracting additional
columns and the cost of executing additional operators. But in the
current model, this additional clause would completely ruin
thisbucketsize and thismcvfreq, making hash join extremely
unappealing. Should we still revise this to calculate minimum instead
of maximum?

I've slightly revised the patch. I've run pg_indent and renamed
s/saveList/origin_rinfos/g for better readability.

Also, the patch badly needs regression test coverage. We can't
include costs in the expected outputs, but the tests could use some
plans that previously were reliably merge joins and now reliably
become hash joins.

------
Regards,
Alexander Korotkov
Supabase

#10

lepihov@gmail.com

over 1 year ago

In reply to: Alexander Korotkov (#9)

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

On 17/2/2025 01:34, Alexander Korotkov wrote:

Hi, Andrei!

On Tue, Oct 8, 2024 at 8:00 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
Thank you for your work on this subject. I agree with the general
direction. While everyone has used conservative estimates for a long
time, it's better to change them only when we're sure about it.
However, I'm still not sure I get the conservatism.

    if (innerbucketsize > thisbucketsize)
        innerbucketsize = thisbucketsize;
    if (innermcvfreq > thismcvfreq)
        innermcvfreq = thismcvfreq;

AFAICS, even in the worst case (all columns are totally correlated),
the overall bucket size should be the smallest bucket size among
clauses (not the largest). And the same is true of MCV. As a mental
experiment, we can add a new clause to hash join, which is always true
because columns on both sides have the same value. In fact, it would
have almost no influence except for the cost of extracting additional
columns and the cost of executing additional operators. But in the
current model, this additional clause would completely ruin
thisbucketsize and thismcvfreq, making hash join extremely
unappealing. Should we still revise this to calculate minimum instead
of maximum?

I agree with your point. But I think the code works precisely the way
you have described.
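The quoted comparisons can be read as a running minimum, which the following sketch spells out. The struct and function names are hypothetical, and the starting value of 1.0 for both accumulators is an assumption of this sketch. The always-true clause from the thought experiment is modelled as a clause whose bucket fraction and MCV frequency are both 1.0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-clause estimates, named after the quoted variables. */
typedef struct ClauseStatsSketch
{
    double      bucketsize;     /* thisbucketsize */
    double      mcvfreq;        /* thismcvfreq */
} ClauseStatsSketch;

void
combine_clause_stats(const ClauseStatsSketch *clauses, size_t nclauses,
                     double *innerbucketsize, double *innermcvfreq)
{
    size_t      i;

    assert(nclauses > 0);

    /* Start from the largest possible fraction (assumed for the sketch). */
    *innerbucketsize = 1.0;
    *innermcvfreq = 1.0;

    for (i = 0; i < nclauses; i++)
    {
        double      thisbucketsize = clauses[i].bucketsize;
        double      thismcvfreq = clauses[i].mcvfreq;

        /* Same comparisons as quoted: keep the smaller value each time. */
        if (*innerbucketsize > thisbucketsize)
            *innerbucketsize = thisbucketsize;
        if (*innermcvfreq > thismcvfreq)
            *innermcvfreq = thismcvfreq;
    }
}
```

Adding a clause with bucketsize 1.0 and mcvfreq 1.0 to a clause with 0.001 and 0.05 leaves the combined result at 0.001 and 0.05, so the extra clause does not make the hash join look worse.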

I've slightly revised the patch. I've run pg_indent and renamed
s/saveList/origin_rinfos/g for better readability.

Thank you!

Also, the patch badly needs regression test coverage. We can't
include costs in the expected outputs, but the tests could use some
plans that previously were reliably merge joins and now reliably
become hash joins.

I added one test here. Writing more tests for this feature is hard, but
feature [1] may provide us with additional tools to reveal extended
statistics internals. I have also thought about injection points, but
that seems an over-complication.

[1]: Showing applied extended statistics in explain Part 2, /messages/by-id/TYYPR01MB82310B308BA8770838F681619E5E2@TYYPR01MB8231.jpnprd01.prod.outlook.com

--
regards, Andrei Lepikhov

#11

aekorotkov@gmail.com

over 1 year ago

In reply to: Andrei Lepikhov (#10)

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

On Mon, Mar 3, 2025 at 10:24 AM Andrei Lepikhov <lepihov@gmail.com> wrote:

I agree with your point. But I think the code works precisely the way
you have described.

You're right. I just mixed up the sides of the comparison operator.

------
Regards,
Alexander Korotkov
Supabase

#12

aekorotkov@gmail.com

over 1 year ago

In reply to: Alexander Korotkov (#11)

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

I've revised commit message, comments, formatting etc.
I'm going to push this if no objections.

------
Regards,
Alexander Korotkov
Supabase

#13

Andres Freund

andres@anarazel.de

about 1 year ago

In reply to: Alexander Korotkov (#12)

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

Hi,

On 2025-03-09 14:13:52 +0200, Alexander Korotkov wrote:

I've revised commit message, comments, formatting etc.
I'm going to push this if no objections.

I'm rather confused as to why this is a thing to push at this point? This
doesn't seem to be a bugfix and it's post feature freeze.

Andres

#14

Matthias van de Meent

boekewurm+postgres@gmail.com

about 1 year ago

In reply to: Andres Freund (#13)

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

On Fri, 11 Apr 2025 at 00:27, Andres Freund <andres@anarazel.de> wrote:

I'm rather confused as to why this is a thing to push at this point? This
doesn't seem to be a bugfix and it's post feature freeze.

I think the patch from that mail got committed as 6bb6a62f about a
month ago, which was shortly after Alexander's message. Did you get
confused about the month of his message, or by the incorrect state of
the CF entry?

Kind regards,

Matthias van de Meent

#15

Andres Freund

andres@anarazel.de

about 1 year ago

In reply to: Matthias van de Meent (#14)

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

Hi,

On 2025-04-11 00:47:19 +0200, Matthias van de Meent wrote:

I think the patch from that mail got committed as 6bb6a62f about a
month ago, which was shortly after Alexander's message. Did you get
confused about the month of his message, or by the incorrect state of
the CF entry?

Sorry for that Alexander - for some reason the email just showed up as new in
my inbox and I only looked at the date, not the month :(

Greetings,

Andres Freund

#16

aekorotkov@gmail.com

about 1 year ago

In reply to: Andres Freund (#15)

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

On Fri, Apr 11, 2025 at 5:06 AM Andres Freund <andres@anarazel.de> wrote:

Sorry for that Alexander - for some reason the email just showed up as new in
my inbox and I only looked at the date, not the month :(

Not a problem at all!

------
Regards,
Alexander Korotkov
Supabase

#17

tndrwang@gmail.com

about 1 year ago

In reply to: Alexander Korotkov (#16)

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

Hi,

While debugging the hash join code, I found that in
estimate_multivariate_bucketsize() the list_copy(hashclauses) call below
is unnecessary if we have a single join clause.

List *clauses = list_copy(hashclauses);
...

I adjusted the placement of the list_copy() call in the attached patch.
This can save some overhead of function calls and memory copies.
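The shape of this optimization can be sketched outside PostgreSQL with a plain array in place of a List. Everything here is hypothetical (the function name, the int clause ids, the copies_made counter used only for instrumentation); it only illustrates deferring the copy until after the single-clause early exit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

int         copies_made = 0;    /* instrumentation for this example only */

/*
 * Returns how many clauses were examined for a multi-clause estimate.
 * The private working copy is made only after the early exit, so the
 * single-clause case allocates and copies nothing.
 */
size_t
examine_clauses_sketch(const int *hashclauses, size_t nclauses)
{
    int        *clauses;
    size_t      examined = 0;
    size_t      i;

    /* A single clause leaves nothing to combine: return before copying. */
    if (nclauses <= 1)
        return 0;

    clauses = malloc(nclauses * sizeof(int));
    assert(clauses != NULL);
    memcpy(clauses, hashclauses, nclauses * sizeof(int));
    copies_made++;

    /* Work destructively on the private copy, not on the caller's data. */
    for (i = 0; i < nclauses; i++)
    {
        clauses[i] = 0;
        examined++;
    }

    free(clauses);
    return examined;
}
```

Calling the function with one clause returns 0 and leaves copies_made at 0. Calling it with three clauses makes exactly one copy and leaves the caller's array untouched.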

Any thoughts?

--
Thanks, Tender Wang

#18

tndrwang@gmail.com

about 1 year ago

In reply to: Tender Wang (#17)

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

Hi Alexander,

In my last message, I proposed a minor optimization for the code in
estimate_multivariate_bucketsize(). If we move the list_copy() call away
from the start of estimate_multivariate_bucketsize(), we can avoid
unnecessarily creating a new list and copying memory when there is only
a single hash clause.

Do you think it's worth doing this?
--
Thanks,
Tender Wang

#19

tndrwang@gmail.com

12 months ago

In reply to: Tender Wang (#18)

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

Hi all,

I have added this patch to the commitfest [1]. I'm hoping someone can
review it for me.

[1]: https://commitfest.postgresql.org/patch/5704/

--
Thanks,
Tender Wang

#20

tndrwang@gmail.com

12 months ago

In reply to: Andrei Lepikhov (#5)

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

Andrei Lepikhov <lepihov@gmail.com> wrote on Wed, Jul 2, 2025 at 22:29:

On 30/6/2025 04:38, Tender Wang wrote:

Do you think it's worth doing this?

Hi all,

I have added this patch to commitfest[1]. I'm hoping someone can review
it for me.

It makes sense to apply. If you return the comment to its place, you may
reduce the patch size even more.

Thanks for reviewing. I returned the comment to its place. Please review
the attached patch.

--
Thanks,
Tender Wang

#21

lepihov@gmail.com

12 months ago

In reply to: Tender Wang (#20)

#22

tndrwang@gmail.com

12 months ago

In reply to: Andrei Lepikhov (#21)

#23

tndrwang@gmail.com

12 months ago

In reply to: Tender Wang (#22)

#24

Tom Lane

tgl@sss.pgh.pa.us

11 months ago

In reply to: Tender Wang (#23)

#25