Use merge-based matching for MCVs in eqjoinsel

Started by Ilia Evdokimov · 9 months ago · 38 messages · pgsql-hackers
#1 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com

Hi hackers,

While analyzing planner performance on JOB with
default_statistics_target = 1000, I noticed that a significant portion
of planning time is spent inside the eqjoinsel() function. According to
perf, in most JOB queries at default_statistics_target = 1000,
eqjoinsel() is the most expensive function during planning, accounting
for approximately 8% of total CPU time. At default_statistics_target =
10000, the planner spends up to 75% of its time inside eqjoinsel(),
making it one of the primary bottlenecks.

This overhead is caused by the O(N^2) nested-loop comparison of MCVs in
var1 = var2 clauses.

I propose an optimization: when the column datatype supports
ordering (i.e., has < and >), we can sort both MCV lists and apply a
merge-style algorithm to detect matches. This reduces runtime from O(N^2)
to O(N log N), where N is the number of MCV entries. The patch also
applies the same optimization to semi-join clauses, which show similar
performance behavior.

On JOB, these changes reduce planner time in most queries with complex
joins and large MCV lists, with no observable effect on plan quality. I’ve
also attached bar charts showing per-query planner time before and after
the patch for default_statistics_target = 100, 1000, 10000 along with
query numbers for reference.

Any feedback or suggestions are welcome!

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachments:

JOB_results.zip (application/zip)
v1-0001-Optimize-selectivity-estimation-for-Var-Var-clauses.patch (text/x-patch) +327-61
#2 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Ilia Evdokimov (#1)
Re: Use merge-based matching for MCVs in eqjoinsel

On 21.07.2025 16:55, Ilia Evdokimov wrote:

While analyzing planner performance on JOB with
default_statistics_target = 1000, I noticed that a significant portion
of planning time is spent inside the eqjoinsel() function. According
to perf, in most JOB queries at default_statistics_target = 1000,
eqjoinsel() is the most expensive function during planning, accounting
for approximately 8% of total CPU time. At default_statistics_target =
10000, the planner spends up to 75% of its time inside eqjoinsel(),
making it one of the primary bottlenecks.

This overhead is caused by the O(N^2) nested-loop comparison of MCVs
in var1 = var2 clauses.

I propose an optimization: when the column datatype supports
ordering (i.e., has < and >), we can sort both MCV lists and apply a
merge-style algorithm to detect matches. This reduces runtime from
O(N^2) to O(N log N), where N is the number of MCV entries. The patch
also applies the same optimization to semi-join clauses, which show
similar performance behavior.

Following up on my previous message about optimizing eqjoinsel() for
Var1 = Var2 and semijoin clauses, I’d like to share more detailed
performance results across different values of default_statistics_target
on the JOB benchmark.

The performance improvement grows as the number of MCV entries increases
(i.e., with higher default_statistics_target). The table below shows
total planner time summed over all 113 queries in JOB for each setting
of default_statistics_target, before and after applying the patch:

Total planner time across all JOB queries
=========================================
default_statistics_target | Before Patch (ms) | After Patch (ms)
--------------------------+-------------------+------------------
                      100 |          1828.433 |         1820.556
                     1000 |          2194.282 |         1963.110
                     2500 |          4606.705 |         2140.126
                     5000 |         16661.581 |         2616.109
                     7500 |         35988.569 |         3061.161
                    10000 |         66616.620 |         3504.144

For default_statistics_target < 1000, the planning time remains the same
before and after the patch. The optimization reduces planner time
substantially - by up to ~19x at default_statistics_target = 10000 -
while the generated plans and selectivity calculations remain
unchanged. To illustrate this, the table below shows the 10 slowest JOB
queries (by planning time), along with their planning times before and
after applying the patch.

Top 10 slowest queries at default_statistics_target = 10000
===========================================================
Query | Before Patch (ms) | After Patch (ms)
------+--------------------+-------------------
  29c |           1939.282 |           111.219
  29b |           1939.237 |           100.109
  29a |           1931.870 |           100.224
  31c |           1622.255 |            67.609
  30c |           1602.156 |            70.942
  28b |           1521.026 |            84.058
  30b |           1519.972 |            68.777
  30a |           1518.014 |            70.529
  28a |           1514.908 |            86.662
  31a |           1507.303 |            63.579

As shown, the total planner time for these top 10 queries drops
substantially with the optimization.

I’ve identified and fixed two issues in the original v1 patch: in
eqjoinsel_semi() the second MCV array was allocated with an incorrect
size, and the initialization of FunctionCallInfoData was moved outside
the comparator compare_mcv_items() to avoid repeated setup during
sorting. I've attached the updated v2 patch with these changes.

Any suggestions?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachments:

v2-0001-Optimize-selectivity-estimation-for-Var-Var-clauses.patch (text/x-patch) +312-61
#3 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Ilia Evdokimov (#2)
Re: Use merge-based matching for MCVs in eqjoinsel

Following up on my previous messages about optimizing eqjoinsel() and
eqjoinsel_semi() for Var1 = Var2 clauses, I’d like to share detailed
profiling results showing the effect of the patch on JOB for different
values of default_statistics_target.

The first table shows the total planner time (summed over all 113
queries) before and after applying the patch, along with the speedup
achieved:

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
                      100 |               1.00x |            1828.433 |           1820.556
                     1000 |               1.12x |            2194.282 |           1963.110
                     2500 |               2.15x |            4606.705 |           2140.126
                     5000 |               6.37x |           16661.581 |           2616.109
                     7500 |              11.76x |           35988.569 |           3061.161
                    10000 |              19.01x |           66616.620 |           3504.144

The second table shows the profiling of eqjoinsel() using *perf*,
demonstrating that the function, which dominates planning at high
statistics targets, becomes essentially negligible after the patch:

default_statistics_target | eqjoinsel() Before (perf) | eqjoinsel() After (perf)
--------------------------+---------------------------+-------------------------
                      100 |                     0.01% |                    0.04%
                     1000 |                     6.23% |                    0.06%
                     2500 |                    35.45% |                    0.23%
                     5000 |                    66.14% |                    0.53%
                     7500 |                    72.70% |                    0.97%
                    10000 |                    75.42% |                    1.25%

I’ve attached v3 of the patch. This version adds a check for NULL values
when comparing MCV entries, ensuring correctness in edge cases.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

Attachments:

v3-0001-Optimize-selectivity-estimation-for-Var-Var-clauses.patch (text/x-patch) +321-61
#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ilia Evdokimov (#3)
Re: Use merge-based matching for MCVs in eqjoinsel

Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> writes:

I’ve attached v3 of the patch. This version adds a check for NULL values
when comparing MCV entries, ensuring correctness in edge cases.

Um ... what edge cases would those be? We do not put NULL into
MCV arrays.

regards, tom lane

#5 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Tom Lane (#4)
Re: Use merge-based matching for MCVs in eqjoinsel

On 03.09.2025 23:26, Tom Lane wrote:

Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> writes:

I’ve attached v3 of the patch. This version adds a check for NULL values
when comparing MCV entries, ensuring correctness in edge cases.

Um ... what edge cases would those be? We do not put NULL into
MCV arrays.

You're right - MCV arrays never contain NULLs. However, comparing two
MCV values could theoretically return NULL, even though this is very
unlikely. This check existed even before my changes, and similar checks
are used in other selectivity-estimation functions in 'selfuncs.c'.

...
fcinfo->isnull = false;
fresult = FunctionCallInvoke(fcinfo);
if (!fcinfo->isnull && DatumGetBool(fresult))
...

By "edge cases" I was referring to this situation; I probably did not
choose the best wording.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

#6 David Geier
geidav.pg@gmail.com
In reply to: Ilia Evdokimov (#2)
Re: Use merge-based matching for MCVs in eqjoinsel

Hi Ilia!

On 29.07.2025 16:07, Ilia Evdokimov wrote:

On 21.07.2025 16:55, Ilia Evdokimov wrote:

While analyzing planner performance on JOB with
default_statistics_target = 1000, I noticed that a significant portion
of planning time is spent inside the eqjoinsel() function. According
to perf, in most JOB queries at default_statistics_target = 1000,
eqjoinsel() is the most expensive function during planning, accounting
for approximately 8% of total CPU time. At default_statistics_target =
10000, the planner spends up to 75% of its time inside eqjoinsel(),
making it one of the primary bottlenecks.

This overhead is caused by the O(N^2) nested-loop comparison of MCVs
in var1 = var2 clauses.

Thanks for working on this. I've wanted to submit a patch for the very
same issue for a while. I've come across this issue multiple times in
the field.

I propose an optimization: when the column datatype supports
ordering (i.e., has < and >), we can sort both MCV lists and apply a
merge-style algorithm to detect matches. This reduces runtime from
O(N^2) to O(N log N), where N is the number of MCV entries. The patch
also applies the same optimization to semi-join clauses, which show
similar performance behavior.

Why do you sort both lists and then merge instead of putting the smaller
list into a hash map and then doing hash lookups (if the type is hashable)?

There are more problems like this in the planner. For example col IN
(many values) is also quadratic because for every value in the IN list
all MCVs are checked. It would be great to fix this as well.

--
David Geier

#7 David Geier
geidav.pg@gmail.com
In reply to: Ilia Evdokimov (#3)
Re: Use merge-based matching for MCVs in eqjoinsel

Hi!

On 03.09.2025 18:53, Ilia Evdokimov wrote:

Following up on my previous messages about optimizing eqjoinsel() and
eqjoinsel_semi() for Var1 = Var2 clauses, I’d like to share detailed
profiling results showing the effect of the patch on JOB for different
values of default_statistics_target.

The first table shows the total planner time (summed over all 113
queries) before and after applying the patch, along with the speedup
achieved:

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
                      100 |               1.00x |            1828.433 |           1820.556
                     1000 |               1.12x |            2194.282 |           1963.110
                     2500 |               2.15x |            4606.705 |           2140.126
                     5000 |               6.37x |           16661.581 |           2616.109
                     7500 |              11.76x |           35988.569 |           3061.161
                    10000 |              19.01x |           66616.620 |           3504.144

It looks to me like these results are with optimizations disabled?

Can you share the SQL script you used for testing?

--
David Geier

#8 David Geier
geidav.pg@gmail.com
In reply to: David Geier (#6)
Re: Use merge-based matching for MCVs in eqjoinsel

Hi Ilia!

On 05.09.2025 16:03, David Geier wrote:

I propose an optimization: when the column datatype supports
ordering (i.e., has < and >), we can sort both MCV lists and apply a
merge-style algorithm to detect matches. This reduces runtime from
O(N^2) to O(N log N), where N is the number of MCV entries. The patch
also applies the same optimization to semi-join clauses, which show
similar performance behavior.

Why do you sort both lists and then merge instead of putting the smaller
list into a hash map and then doing hash lookups (if the type is hashable)?

I've tested your and my code with the following script:

CREATE TABLE foo(col TEXT);
CREATE TABLE bar(col TEXT);
SET default_statistics_target = 10000;

-- Generate MCV values. PostgreSQL doesn't store MCVs if the table has
-- only a single value or every value has exactly the same cardinality.
DO $$
BEGIN
    FOR i IN 1..10000 LOOP
        FOR j IN 1..least(i, 50) LOOP
            INSERT INTO foo VALUES ('aaaaaaaaaaaaaaaaaaaa' || i::TEXT);
            INSERT INTO bar VALUES ('aaaaaaaaaaaaaaaaaaaa' || (i + 100000)::TEXT);
        END LOOP;
    END LOOP;
END;
$$;

ANALYZE foo, bar;
\timing on
EXPLAIN SELECT * FROM foo f, bar b WHERE f.col = b.col;

Results are:

- master: 433 ms
- Order+Merge: 11 ms
- Hash map: 4 ms

I've attached my draft patch.

--
David Geier

Attachments:

0001-Optimize-eqoinsel_inner-with-hash-table.patch (text/x-patch) +159-22
#9 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: David Geier (#8)
Re: Use merge-based matching for MCVs in eqjoinsel

On 08.09.2025 13:08, David Geier wrote:

Hi Ilia!

On 05.09.2025 16:03, David Geier wrote:

I propose an optimization: when the column datatype supports
ordering (i.e., has < and >), we can sort both MCV lists and apply a
merge-style algorithm to detect matches. This reduces runtime from
O(N^2) to O(N log N), where N is the number of MCV entries. The patch
also applies the same optimization to semi-join clauses, which show
similar performance behavior.

Why do you sort both lists and then merge instead of putting the smaller
list into a hash map and then doing hash lookups (if the type is hashable)?

I've tested your and my code with the following script:

CREATE TABLE foo(col TEXT);
CREATE TABLE bar(col TEXT);
SET default_statistics_target = 10000;

-- Generate MCV values. PostgreSQL doesn't store MCVs if the table has
-- only a single value or every value has exactly the same cardinality.
DO $$
BEGIN
    FOR i IN 1..10000 LOOP
        FOR j IN 1..least(i, 50) LOOP
            INSERT INTO foo VALUES ('aaaaaaaaaaaaaaaaaaaa' || i::TEXT);
            INSERT INTO bar VALUES ('aaaaaaaaaaaaaaaaaaaa' || (i + 100000)::TEXT);
        END LOOP;
    END LOOP;
END;
$$;

ANALYZE foo, bar;
\timing on
EXPLAIN SELECT * FROM foo f, bar b WHERE f.col = b.col;

Results are:

- master: 433 ms
- Order+Merge: 11 ms
- Hash map: 4 ms

I've attached my draft patch.

--
David Geier

Hi David,

Thank you for reviewing.

I have read all the previous messages - and yes, you are right. I don’t
know why I didn’t consider using a hash table approach initially. Your
idea makes a lot of sense.

To evaluate it, I ran benchmarks on JOB with three variants:

$ ./benchmark.sh master
$ ./benchmark.sh merge
$ ./benchmark.sh hash

I compared total planning time across all 113 queries.

$ python3 planning_time.py master hash
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
                      100 |                1.00 |            1892.627 |           1886.969
                     1000 |                1.09 |            2286.922 |           2100.099
                     2500 |                1.94 |            4647.167 |           2400.711
                     5000 |                6.15 |           17964.779 |           2919.914
                     7500 |               10.58 |           38622.443 |           3650.375
                    10000 |               16.33 |           69538.085 |           4257.864

$ python3 planning_time.py master merge
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
                      100 |                1.00 |            1892.627 |           1898.622
                     1000 |                1.12 |            2286.922 |           2033.553
                     2500 |                1.92 |            4647.167 |           2423.552
                     5000 |                5.94 |           17964.779 |           3025.739
                     7500 |               10.48 |           38622.443 |           3684.262
                    10000 |               16.72 |           69538.085 |           4159.418

Based on these results, I’d prefer the hash lookup implementation, so I
think it makes sense to improve your patch further and bring it into
good shape. Shall I take care of that, or would you prefer to do it
yourself?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

Attachments:

benchmark.zip (application/zip)
#10 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Ilia Evdokimov (#9)
Re: Use merge-based matching for MCVs in eqjoinsel

On 08.09.2025 13:35, Ilia Evdokimov wrote:

Based on these results, I’d prefer the hash lookup implementation, so
I think it makes sense to improve your patch further and bring it into
good shape. Shall I take care of that, or would you prefer to do it
yourself?

I realized I mistakenly copied the wrong results for the hash-map
version in my previous draft. Sorry about that. Here are the correct
benchmark results:

Merge

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
                      100 |                1.00 |            1892.627 |           1898.622
                     1000 |                1.12 |            2286.922 |           2033.553
                     2500 |                1.92 |            4647.167 |           2423.552
                     5000 |                5.94 |           17964.779 |           3025.739
                     7500 |               10.48 |           38622.443 |           3684.262
                    10000 |               16.72 |           69538.085 |           4159.418

Hash-Map

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
                      100 |                1.00 |            1892.627 |           1886.969
                     1000 |                1.09 |            2286.922 |           2100.099
                     2500 |                1.94 |            4647.167 |           2400.711
                     5000 |                6.15 |           17964.779 |           2919.914
                     7500 |               10.58 |           38622.443 |           3650.375
                    10000 |               16.33 |           69538.085 |           4257.864

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

#11 David Geier
geidav.pg@gmail.com
In reply to: Ilia Evdokimov (#9)
Re: Use merge-based matching for MCVs in eqjoinsel

Hi Ilia!

I have read all the previous messages - and yes, you are right. I don’t
know why I didn’t consider using a hash table approach initially. Your
idea makes a lot of sense.

Your solution would be beneficial on top, for cases where the data type
is not hashable. But I think that's overkill for a v1. I would start
with the hash-based version.

To evaluate it, I ran benchmarks on JOB with three variants:

$ ./benchmark.sh master
$ ./benchmark.sh merge
$ ./benchmark.sh hash

I compared total planning time across all 113 queries.

Was this running with optimizations? How did you extract the planning time?

$ python3 planning_time.py master hash
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
                      100 |                1.00 |            1892.627 |           1886.969
                     1000 |                1.09 |            2286.922 |           2100.099
                     2500 |                1.94 |            4647.167 |           2400.711
                     5000 |                6.15 |           17964.779 |           2919.914
                     7500 |               10.58 |           38622.443 |           3650.375
                    10000 |               16.33 |           69538.085 |           4257.864

$ python3 planning_time.py master merge
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
                      100 |                1.00 |            1892.627 |           1898.622
                     1000 |                1.12 |            2286.922 |           2033.553
                     2500 |                1.92 |            4647.167 |           2423.552
                     5000 |                5.94 |           17964.779 |           3025.739
                     7500 |               10.48 |           38622.443 |           3684.262
                    10000 |               16.72 |           69538.085 |           4159.418

I would have expected the delta between the "merge" and "hash" variant
to be bigger, especially for default_statistics_target=10000. My small
test also showed that. Any idea why this is not showing in your results?

Based on these results, I’d prefer the hash lookup implementation, so I
think it makes sense to improve your patch further and bring it into
good shape. Shall I take care of that, or would you prefer to do it
yourself?

I think process-wise it's best if you review my code and I do the changes.

Could you, as part of your review, test tables with just a few MCVs to
make sure we're not regressing "small" cases? For now I'm only bailing
out if one of the two MCV lists has just a single value. I'm expecting
the gains from fine-tuning this threshold to not be measurable, but
let's double-check.

--
David Geier

#12 David Geier
geidav.pg@gmail.com
In reply to: Ilia Evdokimov (#10)
Re: Use merge-based matching for MCVs in eqjoinsel

On 08.09.2025 12:45, Ilia Evdokimov wrote:

I realized I mistakenly copied the wrong results for the hash-map
version in my previous draft. Sorry about that. Here are the correct
benchmark results:

Merge

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
                      100 |                1.00 |            1892.627 |           1898.622
                     1000 |                1.12 |            2286.922 |           2033.553
                     2500 |                1.92 |            4647.167 |           2423.552
                     5000 |                5.94 |           17964.779 |           3025.739
                     7500 |               10.48 |           38622.443 |           3684.262
                    10000 |               16.72 |           69538.085 |           4159.418

Hash-Map

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
                      100 |                1.00 |            1892.627 |           1886.969
                     1000 |                1.09 |            2286.922 |           2100.099
                     2500 |                1.94 |            4647.167 |           2400.711
                     5000 |                6.15 |           17964.779 |           2919.914
                     7500 |               10.58 |           38622.443 |           3650.375
                    10000 |               16.33 |           69538.085 |           4257.864

It still seems to me like something is fishy with the numbers, or
something in the benchmark adds a lot of overhead so that small
differences in eqjoinsel_inner() don't show here.

The delta between "hash" and "merge" for default_statistics_target=10000
should be the biggest but it's actually slower.

--
David Geier

#13 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: David Geier (#11)
Re: Use merge-based matching for MCVs in eqjoinsel

On 08.09.2025 13:56, David Geier wrote:

To evaluate it, I ran benchmarks on JOB with three variants:

$ ./benchmark.sh master
$ ./benchmark.sh merge
$ ./benchmark.sh hash

I compared total planning time across all 113 queries.

Was this running with optimizations? How did you extract the planning time?

I save all query plans using EXPLAIN (SUMMARY), then go through all the
plans, read the 'Planning Time' for each, and sum them up.

I would have expected the delta between the "merge" and "hash" variant
to be bigger, especially for default_statistics_target=10000. My small
test also showed that. Any idea why this is not showing in your results?

So would I. With default_statistics_target = 10000 and the selectivity
in the JOB queries being close to zero, the difference should be
noticeable. I can only explain the previous results by cache-related
effects on my machine.

I reran the benchmark on a clean cluster and collected the top slowest
JOB queries — now the effect is clearly visible.

Merge (sum of all JOB queries)
==================
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
                      100 |                1.00 |            1888.105 |           1879.431
                     1000 |                1.14 |            2282.239 |           2009.114
                     2500 |                2.10 |            5595.030 |           2668.530
                     5000 |                5.56 |           18544.933 |           3333.252
                     7500 |                9.17 |           37390.956 |           4076.390
                    10000 |               16.10 |           69319.479 |           4306.417

HashMap (sum of all JOB queries)
==================
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
                      100 |                1.03 |            1888.105 |           1828.088
                     1000 |                1.18 |            2282.239 |           1939.884
                     2500 |                2.64 |            5595.030 |           2117.872
                     5000 |                7.80 |           18544.933 |           2377.206
                     7500 |               13.80 |           37390.956 |           2709.973
                    10000 |               23.32 |           69319.479 |           2973.073

Top 10 slowest JOB queries (default_statistics_target = 10000)
Query | master (ms) | merge (ms) | Hash (ms)
------+-------------+------------+-----------
29c   | 1904.586    | 144.135    | 100.473
29b   | 1881.392    | 117.891    | 89.028
29a   | 1868.805    | 112.242    | 83.913
31c   | 1867.234    | 76.498     | 56.140
30c   | 1646.630    | 88.494     | 62.549
30b   | 1608.820    | 84.821     | 64.603
31a   | 1573.964    | 75.978     | 56.140
28a   | 1457.738    | 95.939     | 77.309
28b   | 1455.052    | 99.383     | 73.065
30a   | 1416.699    | 91.057     | 62.549

BTW, the hash map from your patch could also be applied to the
eqjoinsel_semi() function.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

#14 David Geier
geidav.pg@gmail.com
In reply to: Ilia Evdokimov (#13)
Re: Use merge-based matching for MCVs in eqjoinsel

Hi!

On 08.09.2025 15:45, Ilia Evdokimov wrote:

I reran the benchmark on a clean cluster and collected the top slowest
JOB queries — now the effect is clearly visible.

Merge (sum of all JOB queries)
==================
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
                      100 |                1.00 |            1888.105 |           1879.431
                     1000 |                1.14 |            2282.239 |           2009.114
                     2500 |                2.10 |            5595.030 |           2668.530
                     5000 |                5.56 |           18544.933 |           3333.252
                     7500 |                9.17 |           37390.956 |           4076.390
                    10000 |               16.10 |           69319.479 |           4306.417

HashMap (sum of all JOB queries)
==================
default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
                      100 |                1.03 |            1888.105 |           1828.088
                     1000 |                1.18 |            2282.239 |           1939.884
                     2500 |                2.64 |            5595.030 |           2117.872
                     5000 |                7.80 |           18544.933 |           2377.206
                     7500 |               13.80 |           37390.956 |           2709.973
                    10000 |               23.32 |           69319.479 |           2973.073

Top 10 slowest JOB queries (default_statistics_target = 10000)
Query | master (ms) | merge (ms) | Hash (ms)
------+-------------+------------+-----------
29c   | 1904.586    | 144.135    | 100.473
29b   | 1881.392    | 117.891    | 89.028
29a   | 1868.805    | 112.242    | 83.913
31c   | 1867.234    | 76.498     | 56.140
30c   | 1646.630    | 88.494     | 62.549
30b   | 1608.820    | 84.821     | 64.603
31a   | 1573.964    | 75.978     | 56.140
28a   | 1457.738    | 95.939     | 77.309
28b   | 1455.052    | 99.383     | 73.065
30a   | 1416.699    | 91.057     | 62.549

This looks much better. Very nice!

BTW, the hashmap from your patch could also be applied to
eqjoinsel_semi() function.

Yep. The inner loop only runs until clamped_nvalues2 and it doesn't
compute matchprodfreq. I'll try to modify the patch such that it
accounts for these differences without being too hard to read.
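As a rough illustration of that difference: the semi-join flavor only needs to know, for each outer-side MCV, whether it appears among the first clamped_nvalues2 inner-side MCVs, without accumulating a match-frequency product. The hypothetical sketch below uses plain int keys, and a sorted copy with bsearch() stands in for the hash lookup of the actual patch.

```c
#include <stdlib.h>
#include <string.h>

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Semi-join flavor: sum the frequencies of outer-side MCVs that appear
 * among the first clamped_nvalues2 inner-side MCVs.  Unlike the
 * inner-join case, no product of matched frequencies is computed. */
static double
mcv_semi_match_freq(const int *mcv1, const double *freq1, int n1,
                    const int *mcv2, int clamped_nvalues2)
{
    int        *sorted = malloc(clamped_nvalues2 * sizeof(int));
    double      matchfreq1 = 0.0;

    memcpy(sorted, mcv2, clamped_nvalues2 * sizeof(int));
    qsort(sorted, clamped_nvalues2, sizeof(int), cmp_int);

    for (int i = 0; i < n1; i++)
    {
        /* Fast membership test; a hash probe would do the same job. */
        if (bsearch(&mcv1[i], sorted, clamped_nvalues2,
                    sizeof(int), cmp_int) != NULL)
            matchfreq1 += freq1[i];
    }

    free(sorted);
    return matchfreq1;
}
```

The names mcv_semi_match_freq and the argument layout are invented for this sketch; the point is only that the probe loop and the accumulated quantity differ from the inner-join variant while the lookup structure can be shared.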

Do you think anything else needs changes in the patch? Did you have a
chance to check tables with just a few MCVs, or are there any queries in
JOB that regress with very small default_statistics_target?

--
David Geier

#15 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: David Geier (#14)
Re: Use merge-based matching for MCVs in eqjoinsel

Hi David,

On 08.09.2025 17:36, David Geier wrote:

Do you think anything else needs changes in the patch?

From an architectural perspective, I think the patch is already in good
shape. However, I have some suggestions regarding code style:

1. I would move McvHashEntry, McvHashContext, the new hash table
definition, hash_mcv and are_mcvs_equal to the top.
2. I’m not sure get_hash_func_oid() is needed at all – it seems we
could do without it.
3. It would be better to name the parameters matchProductFrequencies ->
matchprodfreq, nMatches -> nmatches, hasMatch1 -> hasmatch1,
hasMatch2 -> hasmatch2 in eqjoinsel_inner_with_hashtable().
4. As I wrote earlier, since we now have a dedicated function
eqjoinsel_inner_with_hashtable(), perhaps it could be used in both
eqjoinsel_inner() and eqjoinsel_semi(). And since the hash-based
search was factored out, maybe it would make sense to also factor
out the O(N^2) nested loop implementation?
5. I think it would be helpful to add a comment explaining that using a
hash table is not efficient when the MCV array length equals 1:

if (Min(statsSlot1->nvalues, statsSlot2->nvalues) == 1)
    return false;

Did you have a
chance to check tables with just few MCVs or are there any queries in
the JOB which regress with very small default_statistics_target?

Sure. I need some time to check this.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

#16David Geier
geidav.pg@gmail.com
In reply to: Ilia Evdokimov (#15)
Re: Use merge-based matching for MCVs in eqjoinsel

Hi!

Thanks for the review.

On 09.09.2025 09:29, Ilia Evdokimov wrote:

From an architectural perspective, I think the patch is already in good
shape. However, I have some suggestions regarding code style:

1. I would move McvHashEntry, McvHashContext, the new hash table
   definition, hash_mcv and are_mcvs_equal to the top.

Done. I've also moved up try_eqjoinsel_with_hashtable().

2. I’m not sure get_hash_func_oid() is needed at all – it seems we
   could do without it.

Removed.

3. It would be better to name the parameters matchProductFrequencies ->
   matchprodfreq, nMatches -> nmatches, hasMatch1 -> hasmatch1,
   hasMatch2 -> hasmatch2 in eqjoinsel_inner_with_hashtable().

Done.

4. As I wrote earlier, since we now have a dedicated function
   eqjoinsel_inner_with_hashtable(), perhaps it could be used in both
   eqjoinsel_inner() and eqjoinsel_semi(). And since the hash-based

Done.

The gains for SEMI join are even bigger because the function is called 3
times for a query like EXPLAIN SELECT * FROM foo f WHERE EXISTS (SELECT 1
FROM bar b WHERE f.col = b.col); For that query the planning time for me
goes from ~1300 ms to ~12 ms.

   search was factored out, maybe it would make sense to also factor
   out the O(N^2) nested loop implementation?

Generally agreed and while tempting, I've refrained from doing the
refactoring. Let's better do this in a separate patch to keep things simple.

5. I think it would be helpful to add a comment explaining that using a
   hash table is not efficient when the MCV array length equals 1:

if (Min(statsSlot1->nvalues, statsSlot2->nvalues) == 1)
    return false;

Done.

Did you have a
chance to check tables with just few MCVs or are there any queries in
the JOB which regress with very small default_statistics_target?

Sure. I need some time to check this.

Could you please do that with the latest attached patch so that we test
it once more?

Could you also run once more the JOB benchmark to get some test coverage
on the SEMI join code (assuming it also uses SEMI joins)?

Once we've concluded on above and there are no objections, I will
register this patch in the commit fest.

--
David Geier

Attachments:

0002-Optimize-eqjoinsel_inner-and-eqjoinsel_semi.patchtext/x-patch; charset=UTF-8; name=0002-Optimize-eqjoinsel_inner-and-eqjoinsel_semi.patchDownload+190-41
#17Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: David Geier (#16)
Re: Use merge-based matching for MCVs in eqjoinsel

Hi!

On 09.09.2025 12:22, David Geier wrote:

Could you please do that with the latest attached patch so that we test
it once more?

LGTM. Yes, I'll test this patch.

Could you also run once more the JOB benchmark to get some test coverage
on the SEMI join code (assuming it also uses SEMI joins)?

Unfortunately, the JOB benchmark does not contain semi join nodes.
However, TPC-DS does. I'll look for the queries with slowest planner
times there and check them.

I'll need some time to check both join and semi join cases with small
and large default_statistics_target. I'll share the results later.

Once we've concluded on above and there are no objections, I will
register this patch in the commit fest.

Sure. No problem.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

#18Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Ilia Evdokimov (#17)
Re: Use merge-based matching for MCVs in eqjoinsel

Hi hackers,

On 10.09.2025 16:56, Ilia Evdokimov wrote:

Unfortunately, the JOB benchmark does not contain semi join nodes.
However, TPC-DS does. I'll look for the queries with slowest planner
times there and check them.

I'll need some time to check both join and semi join cases with small
and large default_statistics_target. I'll share the results later.

JOIN
==============================

I’ve benchmarked the new implementation of eqjoinsel() with different
values of default_statistics_target. On small targets (1, 5, 10, 25, 50,
75, 100) the results are all within statistical noise, and I did not
observe any regressions. In my view, it’s reasonable to keep the current
condition that the hash table is not used for default_statistics_target
= 1. Raising that threshold does not seem useful.

Here are the results for JOB queries (where the effect of semi join is
not visible due to different data distributions):

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
1                         | 1.00                | 1846.643            | 1847.409
5                         | 1.00                | 1836.391            | 1828.318
10                        | 0.95                | 1841.750            | 1929.722
25                        | 0.99                | 1873.172            | 1890.741
50                        | 0.98                | 1869.897            | 1898.470
75                        | 1.02                | 1969.368            | 1929.521
100                       | 0.97                | 1857.890            | 1921.207
1000                      | 1.14                | 2279.700            | 1997.102
2500                      | 1.78                | 4682.658            | 2636.202
5000                      | 6.45                | 15943.696           | 2471.242
7500                      | 12.45               | 34350.855           | 2758.565
10000                     | 20.52               | 62519.342           | 3046.819

SEMI JOIN
==============================

Unfortunately, in TPC-DS it is not possible to clearly see improvements
for semi joins. To address this, I designed a synthetic example where
the data distribution forces the loop to run fully, without exiting
early, which makes the effect on semi joins more visible. In this setup,
I also ensured that the length of the MCV array is equal to the chosen
default_statistics_target.

CREATE TABLE t1 AS
SELECT CASE
         WHEN g <= 3000000 * 0.9 THEN (g % 10000) + 1
         ELSE (g % 1000000) + 10000
       END AS id
FROM generate_series(1, 3000000) g;

CREATE TABLE t2 AS
SELECT CASE
         WHEN g <= 3000000 * 0.9 THEN (g % 10000) + 10001
         ELSE (g % 1000000) + 20000
       END AS id
FROM generate_series(1, 3000000) g;

ANALYZE t1, t2;

The results of the query are:

SELECT * FROM t1
WHERE id IN (SELECT id FROM t2);

default_statistics_target | Planner Speedup (×) | Planner Before (ms) | Planner After (ms)
--------------------------+---------------------+---------------------+-------------------
1                         | 1.12                | 1.191               | 1.062
5                         | 1.02                | 0.493               | 0.481
10                        | 0.92                | 0.431               | 0.471
25                        | 1.27                | 0.393               | 0.309
50                        | 1.04                | 0.432               | 0.416
75                        | 0.96                | 0.398               | 0.415
100                       | 0.95                | 0.450               | 0.473
1000                      | 9.42                | 6.742               | 0.716
2500                      | 19.15               | 21.621              | 1.129
5000                      | 46.74               | 85.667              | 1.833
7500                      | 73.26               | 194.806             | 2.659
10000                     | 107.95              | 349.981             | 3.242

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

#19Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: David Geier (#16)
Re: Use merge-based matching for MCVs in eqjoinsel

Hi David,

In the v2 patch, when the join is reversed we pass the commutator operator
Oid to eqjoinsel_semi(), and inside that function we immediately call
get_opcode(<commutator operator Oid>). Did you mean for the function to
take an operator Oid instead of a function Oid here?

If that was unintentional, perhaps the cleanest fix is to add a new
'operator' parameter to eqjoinsel_semi() so we can keep passing
'opfuncoid' as before and avoid changing the behavior.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

#20Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Ilia Evdokimov (#19)
Re: Use merge-based matching for MCVs in eqjoinsel

On 17.09.2025 12:40, Ilia Evdokimov wrote:

Hi David,

In the v2 patch, when the join is reversed we pass the commutator operator
Oid to eqjoinsel_semi(), and inside that function we immediately call
get_opcode(<commutator operator Oid>). Did you mean for the function
to take an operator Oid instead of a function Oid here?

If that was unintentional, perhaps the cleanest fix is to add a new
'operator' parameter to eqjoinsel_semi() so we can keep passing
'opfuncoid' as before and avoid changing the behavior.

This v3 patch fixes the confusion between operator and function Oids in
eqjoinsel_semi(). This version restores the previous behavior by keeping
the function Oid as before and adds an explicit 'operator' parameter so
both values are available without extra behavior changes.

Do you have any further comments or suggestions on this version?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com

Attachments:

v3-0001-Optimize-eqjoinsel_inner-and-eqjoinsel_semi.patchtext/x-patch; charset=UTF-8; name=v3-0001-Optimize-eqjoinsel_inner-and-eqjoinsel_semi.patchDownload+193-41
#21David Geier
geidav.pg@gmail.com
In reply to: Ilia Evdokimov (#17)
#22Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: David Geier (#21)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ilia Evdokimov (#22)
#24Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Tom Lane (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ilia Evdokimov (#24)
#26David Geier
geidav.pg@gmail.com
In reply to: Ilia Evdokimov (#20)
#27Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Tom Lane (#25)
#28David Geier
geidav.pg@gmail.com
In reply to: Ilia Evdokimov (#20)
#29David Geier
geidav.pg@gmail.com
In reply to: Ilia Evdokimov (#18)
#30Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: David Geier (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ilia Evdokimov (#27)
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#31)
#33David Geier
geidav.pg@gmail.com
In reply to: Tom Lane (#32)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Geier (#33)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#34)
#36David Geier
geidav.pg@gmail.com
In reply to: Tom Lane (#35)
#37Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: David Geier (#36)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Geier (#36)