Add a greedy join search algorithm to handle large join problems

Started by Chengpeng Yan8 months ago42 messageshackers

chengpeng_yan@outlook.com

8 months ago

Hi hackers,

This patch implements GOO (Greedy Operator Ordering), a greedy
join-order search method for large join problems, based on Fegaras (DEXA
’98) [1]Leonidas Fegaras, “A New Heuristic for Optimizing Large Queries”, DEXA ’98. ACM Digital Library: https://dl.acm.org/doi/10.5555/648311.754892 A publicly accessible PostScript version is available from the author's page: https://lambda.uta.edu/order.ps. The algorithm repeatedly selects, among all legal joins, the
join pair with the lowest estimated total cost, merges them, and
continues until a single join remains. Patch attached.

To get an initial sense of performance, I reused the star join /
snowflake examples and the testing script from the thread in [2]Star/snowflake join thread and benchmarks: /messages/by-id/a22ec6e0-92ae-43e7-85c1-587df2a65f51@vondra.me. The
star-join GUC in that SQL workload was replaced with
`enable_goo_join_search`, so the same tests can run under DP (standard
dynamic programming) / GEQO(Genetic Query Optimizer) / GOO. For these
tests, geqo_threshold was set to 15 for DP, and to 5 for both GEQO and
GOO. Other planner settings, including join_collapse_limit, remained at
their defaults.

On my local machine, a single-client pgbench run produces the following
throughput (tps):

| DP | GEQO | GOO
--------------------+----------+----------+-----------
starjoin (inner) | 1762.52 | 192.13 | 6168.89
starjoin (outer) | 1683.92 | 173.90 | 5626.56
snowflake (inner) | 1829.04 | 133.40 | 3929.57
snowflake (outer) | 1397.93 | 99.65 | 3040.52

Open questions:
* The paper ranks joins by estimated result size, while this prototype
reuses the existing planner total_cost. This makes the implementation
straightforward, but it may be worth exploring row-based rule.
* The prototype allocates through multiple memory contexts, aiming to
reduce memory usage during candidate evaluation. Suggestions on
simplification are welcome.
* Planning performance vs GEQO: On the star/snowflake benchmarks above,
GOO reduces planning time significantly compared to GEQO. Broader
evaluation on more representative workloads (e.g., TPC-H / TPC-DS /
JOB) is TODO, covering both planning speed and plan quality.
* There is no tuning knob like `geqo_seed` for GEQO if GOO produces a
poor ordering.
* The long-term integration is open:
* fully replace GEQO,
* keep GEQO available and optionally try both,
* or use GOO as a starting point for GEQO.

Comments and review would be appreciated.

References:
[1]: Leonidas Fegaras, “A New Heuristic for Optimizing Large Queries”, DEXA ’98. ACM Digital Library: https://dl.acm.org/doi/10.5555/648311.754892 A publicly accessible PostScript version is available from the author's page: https://lambda.uta.edu/order.ps
DEXA ’98. ACM Digital Library:
https://dl.acm.org/doi/10.5555/648311.754892 A publicly accessible
PostScript version is available from the author's page:
https://lambda.uta.edu/order.ps
[2]: Star/snowflake join thread and benchmarks: /messages/by-id/a22ec6e0-92ae-43e7-85c1-587df2a65f51@vondra.me
/messages/by-id/a22ec6e0-92ae-43e7-85c1-587df2a65f51@vondra.me

--
Best regards,
Chengpeng Yan

Dilip Kumar

dilipbalaut@gmail.com

8 months ago

In reply to: Chengpeng Yan (#1)

Re: Add a greedy join search algorithm to handle large join problems

On Tue, Dec 2, 2025 at 9:18 AM Chengpeng Yan <chengpeng_yan@outlook.com> wrote:

Hi hackers,

This patch implements GOO (Greedy Operator Ordering), a greedy
join-order search method for large join problems, based on Fegaras (DEXA
’98) [1]. The algorithm repeatedly selects, among all legal joins, the
join pair with the lowest estimated total cost, merges them, and
continues until a single join remains. Patch attached.

Interesting.

To get an initial sense of performance, I reused the star join /
snowflake examples and the testing script from the thread in [2]. The
star-join GUC in that SQL workload was replaced with
`enable_goo_join_search`, so the same tests can run under DP (standard
dynamic programming) / GEQO(Genetic Query Optimizer) / GOO. For these
tests, geqo_threshold was set to 15 for DP, and to 5 for both GEQO and
GOO. Other planner settings, including join_collapse_limit, remained at
their defaults.

On my local machine, a single-client pgbench run produces the following
throughput (tps):

| DP | GEQO | GOO
--------------------+----------+----------+-----------
starjoin (inner) | 1762.52 | 192.13 | 6168.89
starjoin (outer) | 1683.92 | 173.90 | 5626.56
snowflake (inner) | 1829.04 | 133.40 | 3929.57
snowflake (outer) | 1397.93 | 99.65 | 3040.52

Is pgbench the right workload to test this, I mean what are we trying
to compare here the planning time taken by DP vs GEQO vs GOO or the
quality of the plans generated by different join ordering algorithms
or both? All pgbench queries are single table scans and there is no
involvement of the join search, so I am not sure how we can justify
these gains?

--
Regards,
Dilip Kumar
Google

Chengpeng Yan

chengpeng_yan@outlook.com

8 months ago

In reply to: Dilip Kumar (#2)

Re: Add a greedy join search algorithm to handle large join problems

Hi,

Thanks for taking a look.

On Dec 2, 2025, at 13:36, Dilip Kumar <dilipbalaut@gmail.com> wrote:

Is pgbench the right workload to test this, I mean what are we trying
to compare here the planning time taken by DP vs GEQO vs GOO or the
quality of the plans generated by different join ordering algorithms
or both? All pgbench queries are single table scans and there is no
involvement of the join search, so I am not sure how we can justify
these gains?

Just to clarify: as noted in the cover mail, the numbers are not from
default pgbench queries, but from the star-join / snowflake workloads in
thread [1]Star/snowflake join thread and benchmarks: /messages/by-id/a22ec6e0-92ae-43e7-85c1-587df2a65f51@vondra.me, using the benchmark included in the v5-0001 patch. These
workloads contain multi-table joins and do trigger join search; you can
reproduce them by configuring the GUCs as described in the cover mail.

The benchmark tables contain no data, so execution time is negligible;
the results mainly reflect planning time of the different join-ordering
methods, which is intentional for this microbenchmark.

A broader evaluation on TPC-H / TPC-DS / JOB is TODO, covering both
planning time and plan quality. That should provide a more
representative picture of GOO, beyond this synthetic setup.

References:
[1]: Star/snowflake join thread and benchmarks: /messages/by-id/a22ec6e0-92ae-43e7-85c1-587df2a65f51@vondra.me
/messages/by-id/a22ec6e0-92ae-43e7-85c1-587df2a65f51@vondra.me

--
Best regards,
Chengpeng Yan

Tomas Vondra

tomas.vondra@2ndquadrant.com

8 months ago

In reply to: Chengpeng Yan (#1)

Re: Add a greedy join search algorithm to handle large join problems

On 12/2/25 04:48, Chengpeng Yan wrote:

Hi hackers,

This patch implements GOO (Greedy Operator Ordering), a greedy
join-order search method for large join problems, based on Fegaras (DEXA
’98) [1]. The algorithm repeatedly selects, among all legal joins, the
join pair with the lowest estimated total cost, merges them, and
continues until a single join remains. Patch attached.

To get an initial sense of performance, I reused the star join /
snowflake examples and the testing script from the thread in [2]. The
star-join GUC in that SQL workload was replaced with
`enable_goo_join_search`, so the same tests can run under DP (standard
dynamic programming) / GEQO(Genetic Query Optimizer) / GOO. For these
tests, geqo_threshold was set to 15 for DP, and to 5 for both GEQO and
GOO. Other planner settings, including join_collapse_limit, remained at
their defaults.

On my local machine, a single-client pgbench run produces the following
throughput (tps):

                    |    DP    |   GEQO   |    GOO
--------------------+----------+----------+-----------
starjoin    (inner) | 1762.52 | 192.13 | 6168.89
starjoin    (outer) | 1683.92 | 173.90 | 5626.56
snowflake   (inner) | 1829.04 | 133.40 | 3929.57
snowflake   (outer) | 1397.93 |   99.65 | 3040.52

Seems interesting, and also much more ambitious than what I intended to
do in the starjoin thread (which is meant to be just a simplistic
heuristics on top of the regular join order planning).

I think a much broader evaluation will be needed, comparing not just the
planning time, but also the quality of the final plan. Which for the
starjoin tests does not really matter, as the plans are all equal in
this regard.

regards

--
Tomas Vondra

Chengpeng Yan

chengpeng_yan@outlook.com

8 months ago

In reply to: Tomas Vondra (#4)

Re: Add a greedy join search algorithm to handle large join problems

Hi,

On Dec 2, 2025, at 18:56, Tomas Vondra <tomas@vondra.me> wrote:

I think a much broader evaluation will be needed, comparing not just the
planning time, but also the quality of the final plan. Which for the
starjoin tests does not really matter, as the plans are all equal in
this regard.

Many thanks for your feedback.

You are absolutely right — plan quality is also very important. In my
initial email I only showed the improvements in planning time, but did
not provide results regarding plan quality. I will run tests on more
complex join scenarios, evaluating both planning time and plan quality.

Thanks again!
--
Best regards,
Chengpeng Yan

Tomas Vondra

tomas.vondra@2ndquadrant.com

8 months ago

In reply to: Chengpeng Yan (#5)

Re: Add a greedy join search algorithm to handle large join problems

On 12/2/25 14:04, Chengpeng Yan wrote:

Hi,

On Dec 2, 2025, at 18:56, Tomas Vondra <tomas@vondra.me> wrote:

I think a much broader evaluation will be needed, comparing not just the
planning time, but also the quality of the final plan. Which for the
starjoin tests does not really matter, as the plans are all equal in
this regard.

Many thanks for your feedback.

You are absolutely right — plan quality is also very important. In my
initial email I only showed the improvements in planning time, but did
not provide results regarding plan quality. I will run tests on more
complex join scenarios, evaluating both planning time and plan quality.

I was trying to do some simple experiments by comparing plans for TPC-DS
queries, but unfortunately I get a lot of crashes with the patch. All
the backtraces look very similar - see the attached example. The root
cause seems to be that sort_inner_and_outer() sees

inner_path = NULL

I haven't investigated this very much, but I suppose the GOO code should
be calling set_cheapest() from somewhere.

regards

--
Tomas Vondra

Tomas Vondra

tomas.vondra@2ndquadrant.com

8 months ago

In reply to: Tomas Vondra (#6)

Re: Add a greedy join search algorithm to handle large join problems

On 12/9/25 20:20, Tomas Vondra wrote:

On 12/2/25 14:04, Chengpeng Yan wrote:

Hi,

On Dec 2, 2025, at 18:56, Tomas Vondra <tomas@vondra.me> wrote:

I think a much broader evaluation will be needed, comparing not just the
planning time, but also the quality of the final plan. Which for the
starjoin tests does not really matter, as the plans are all equal in
this regard.

Many thanks for your feedback.

You are absolutely right — plan quality is also very important. In my
initial email I only showed the improvements in planning time, but did
not provide results regarding plan quality. I will run tests on more
complex join scenarios, evaluating both planning time and plan quality.

I was trying to do some simple experiments by comparing plans for TPC-DS
queries, but unfortunately I get a lot of crashes with the patch. All
the backtraces look very similar - see the attached example. The root
cause seems to be that sort_inner_and_outer() sees

inner_path = NULL

I haven't investigated this very much, but I suppose the GOO code should
be calling set_cheapest() from somewhere.

FWIW after looking at the failing queries for a bit, and a bit of
tweaking, it seems the issue is about aggregates in the select list. For
example this TPC-DS query fails (Q7):

select i_item_id,
avg(ss_quantity) agg1,
avg(ss_list_price) agg2,
avg(ss_coupon_amt) agg3,
avg(ss_sales_price) agg4
from store_sales, customer_demographics, date_dim, item, promotion
where ss_sold_date_sk = d_date_sk and
ss_item_sk = i_item_sk and
ss_cdemo_sk = cd_demo_sk and
ss_promo_sk = p_promo_sk and
cd_gender = 'F' and
cd_marital_status = 'W' and
cd_education_status = 'Primary' and
(p_channel_email = 'N' or p_channel_event = 'N') and
d_year = 1998
group by i_item_id
order by i_item_id
LIMIT 100;

but if I remove the aggregates, it plans just fine:

select i_item_id
from store_sales, customer_demographics, date_dim, item, promotion
where ss_sold_date_sk = d_date_sk and
ss_item_sk = i_item_sk and
ss_cdemo_sk = cd_demo_sk and
ss_promo_sk = p_promo_sk and
cd_gender = 'F' and
cd_marital_status = 'W' and
cd_education_status = 'Primary' and
(p_channel_email = 'N' or p_channel_event = 'N') and
d_year = 1998
group by i_item_id
order by i_item_id
LIMIT 100;

The backtrace matches the one I already posted, I'm not going to post
that again.

I looked at a couple more failing queries, and removing the aggregates
fixes them too. Maybe there are other issues/crashes, of course.

regards

--
Tomas Vondra

Chengpeng Yan

chengpeng_yan@outlook.com

8 months ago

In reply to: Tomas Vondra (#7)

Re: Add a greedy join search algorithm to handle large join problems

Hi,

On Dec 10, 2025, at 07:30, Tomas Vondra <tomas@vondra.me> wrote:

I looked at a couple more failing queries, and removing the aggregates
fixes them too. Maybe there are other issues/crashes, of course.

Thanks a lot for pointing this out. I also noticed the same issue when
testing TPC-H Q5. The root cause was that in the goo algorithm I forgot
to handle the case of eager aggregation. This has been fixed in the v2
patch (after the fix, the v2 version works correctly for all TPC-H
queries). I will continue testing it on TPC-DS as well.

Sorry that I didn’t push the related changes earlier. I was running more
experiments on the greedy strategy, and I still needed some time to
organize and analyze the results. During this process, I found that the
current greedy strategy may lead to suboptimal plan quality in some
cases. Because of that, I plan to first evaluate a few basic greedy
heuristics on TPC-H to understand their behavior and limitations. If
there are meaningful results or conclusions, I will share them for
discussion.

Based on some preliminary testing, I’m leaning toward keeping the greedy
strategy simple and explainable, and focusing on the following
indicators together with the planner’s cost estimates:
* join cardinality (number of output rows)
* selectivity (join_size / (left_size * right_size))
* estimated result size in bytes(joinrel->reltarget->width * joinrel->rows)
* cheapest total path cost (cheapest_total_path->total_cost)

At the moment, I’m inclined to prioritize join cardinality with input
size as a secondary factor, but I’d like to validate this step by step:
first by testing these simple heuristics on TPC-H (as a relatively
simple workload) and summarizing some initial conclusions there. After
that, I plan to run more comprehensive experiments on more complex
benchmarks such as JOB and TPC-DS and report the results.

Do you have any thoughts or suggestions on this direction?

Thanks again for your feedback and help.

--
Best regards,
Chengpeng Yan

Tomas Vondra

tomas.vondra@2ndquadrant.com

8 months ago

In reply to: Chengpeng Yan (#8)

Re: Add a greedy join search algorithm to handle large join problems

On 12/10/25 06:20, Chengpeng Yan wrote:

Hi,

On Dec 10, 2025, at 07:30, Tomas Vondra <tomas@vondra.me> wrote:

I looked at a couple more failing queries, and removing the aggregates
fixes them too. Maybe there are other issues/crashes, of course.

Thanks a lot for pointing this out. I also noticed the same issue when
testing TPC-H Q5. The root cause was that in the goo algorithm I forgot
to handle the case of eager aggregation. This has been fixed in the v2
patch (after the fix, the v2 version works correctly for all TPC-H
queries). I will continue testing it on TPC-DS as well.

I can confirm v2 makes it work for planning all 99 TPC-DS queries, i.e.
there are no more crashes during EXPLAIN.

Comparing the plans from master/geqo and master/goo, I see about 80% of
them changed. I haven't done any evaluation of how good the plans are,
really. I'll see if I can get some numbers next, but it'll take a while.

It's also tricky as plan choices depend on GUCs like random_page_cost,
and if those values are not good enough, the optimizer may still end up
with a bad plan. I'm not sure what's the best approach.

I did however notice an interesting thing - running EXPLAIN on the 99
queries (for 3 scales and 0/4 workers, so 6x 99) took this much time:

master: 8s
master/geqo: 20s
master/goo: 5s

Where master/geqo means

geqo_threshold=2

and master/goo means

geqo_threshold=2
enable_goo_join_search = on

It's nice that "goo" seems to be faster than "geqo" - assuming the plans
are comparable or better. But it surprised me switching to geqo makes it
slower than master. That goes against my intuition that geqo is meant to
be cheaper/faster join order planning. But maybe I'm missing something.

Sorry that I didn’t push the related changes earlier. I was running more
experiments on the greedy strategy, and I still needed some time to
organize and analyze the results. During this process, I found that the
current greedy strategy may lead to suboptimal plan quality in some
cases. Because of that, I plan to first evaluate a few basic greedy
heuristics on TPC-H to understand their behavior and limitations. If
there are meaningful results or conclusions, I will share them for
discussion.

Based on some preliminary testing, I’m leaning toward keeping the greedy
strategy simple and explainable, and focusing on the following
indicators together with the planner’s cost estimates:
* join cardinality (number of output rows)
* selectivity (join_size / (left_size * right_size))
* estimated result size in bytes(joinrel->reltarget->width * joinrel->rows)
* cheapest total path cost (cheapest_total_path->total_cost)

At the moment, I’m inclined to prioritize join cardinality with input
size as a secondary factor, but I’d like to validate this step by step:
first by testing these simple heuristics on TPC-H (as a relatively
simple workload) and summarizing some initial conclusions there. After
that, I plan to run more comprehensive experiments on more complex
benchmarks such as JOB and TPC-DS and report the results.

Do you have any thoughts or suggestions on this direction?

Thanks again for your feedback and help.

No opinion.

IIUC it's a simplified heuristics, replacing the "full" join planning
algorithm. So inevitably there will be cases when it produces plans that
are "worse" than the actual join order planning. I don't have a great
intuition what's the right trade off yet. Or am I missing something?

regards

--
Tomas Vondra

#10

John Naylor

john.naylor@enterprisedb.com

8 months ago

In reply to: Tomas Vondra (#9)

Re: Add a greedy join search algorithm to handle large join problems

On Wed, Dec 10, 2025 at 5:20 PM Tomas Vondra <tomas@vondra.me> wrote:

I did however notice an interesting thing - running EXPLAIN on the 99
queries (for 3 scales and 0/4 workers, so 6x 99) took this much time:

master: 8s
master/geqo: 20s
master/goo: 5s

It's nice that "goo" seems to be faster than "geqo" - assuming the plans
are comparable or better. But it surprised me switching to geqo makes it
slower than master. That goes against my intuition that geqo is meant to
be cheaper/faster join order planning. But maybe I'm missing something.

Yeah, that was surprising. It seems that geqo has a large overhead, so
it takes a larger join problem for the asymptotic behavior to win over
exhaustive search.

--
John Naylor
Amazon Web Services

#11

Pavel Stehule

pavel.stehule@gmail.com

8 months ago

In reply to: John Naylor (#10)

Re: Add a greedy join search algorithm to handle large join problems

čt 11. 12. 2025 v 3:53 odesílatel John Naylor <johncnaylorls@gmail.com>
napsal:

On Wed, Dec 10, 2025 at 5:20 PM Tomas Vondra <tomas@vondra.me> wrote:

I did however notice an interesting thing - running EXPLAIN on the 99
queries (for 3 scales and 0/4 workers, so 6x 99) took this much time:

master: 8s
master/geqo: 20s
master/goo: 5s

It's nice that "goo" seems to be faster than "geqo" - assuming the plans
are comparable or better. But it surprised me switching to geqo makes it
slower than master. That goes against my intuition that geqo is meant to
be cheaper/faster join order planning. But maybe I'm missing something.

Yeah, that was surprising. It seems that geqo has a large overhead, so
it takes a larger join problem for the asymptotic behavior to win over
exhaustive search.

If I understand correctly to design - geqo should be slower for any queries
with smaller complexity. The question is how many queries in the tested
model are really complex.

Show quoted text

--
John Naylor
Amazon Web Services

#12

Tomas Vondra

tomas.vondra@2ndquadrant.com

8 months ago

In reply to: Pavel Stehule (#11)

Re: Add a greedy join search algorithm to handle large join problems

On 12/11/25 07:12, Pavel Stehule wrote:

čt 11. 12. 2025 v 3:53 odesílatel John Naylor <johncnaylorls@gmail.com
<mailto:johncnaylorls@gmail.com>> napsal:

On Wed, Dec 10, 2025 at 5:20 PM Tomas Vondra <tomas@vondra.me
<mailto:tomas@vondra.me>> wrote:

I did however notice an interesting thing - running EXPLAIN on the 99
queries (for 3 scales and 0/4 workers, so 6x 99) took this much time:

master: 8s
master/geqo: 20s
master/goo: 5s

It's nice that "goo" seems to be faster than "geqo" - assuming the

plans

are comparable or better. But it surprised me switching to geqo

makes it

slower than master. That goes against my intuition that geqo is

meant to

be cheaper/faster join order planning. But maybe I'm missing

something.

Yeah, that was surprising. It seems that geqo has a large overhead, so
it takes a larger join problem for the asymptotic behavior to win over
exhaustive search.

If I understand correctly to design - geqo should be slower for any
queries with smaller complexity. The question is how many queries in the
tested model are really complex.

Depends on what you mean by "really complex". TPC-DS queries are not
trivial, but the complexity may not be in the number of joins.

Of course, setting geqo_threshold to 2 may be too aggressive. Not sure.

regards

--
Tomas Vondra

#13

Tomas Vondra

tomas.vondra@2ndquadrant.com

8 months ago

In reply to: Tomas Vondra (#9)

Re: Add a greedy join search algorithm to handle large join problems

Hi,

Here's a more complete set of results from a TPC-DS run. See the
run-queries-2.sh script for more details. There are also .sql files with
DDL to create the database, etc. It does not include the parts to
generate the data etc. (you'll need to the generator from TPC site).

The attached CSV has results for scales 1 and 10, with 0 and 4 parallel
workers. It runs three configurations:

- master (geqo=off, threshold=12)
- master-geqo (goo=off, threshold=2)
- master-goo (goo=on, threshold=2)

There's a couple more fields, e.g. whether it's cold/cached run, etc.

A very simple summary of the results is the total duration of the run,
for all 99 queries combined:

scale workers caching master master-geqo master-goo
===================================================================
1 0 cold 816 399 1124
warm 784 369 1097
4 cold 797 384 1085
warm 774 366 1069
-------------------------------------------------------------------
10 0 cold 2760 2653 2340
warm 2580 2470 2177
4 cold 2563 2423 1969
warm 2439 2325 1859

This is interesting, and also a bit funny.

The funny part is that geqo seems to do better than master - on scale 1
it's pretty clear, on scale 10 the difference is much smaller.

The interesting part is that "goo" is doing worse than master (or geqo)
on scale 1, and better on scale 10. I wonder how would it do on larger
scales, but I don't have such results.

There's a PDF with per-query results too.

This may be a little bit misleading because the statement timeout was
set to 300s, and there's a couple queries that did not complete before
this timeout. Maybe it'd be better to not include these queries. I
haven't tried, though.

It might be interesting to look at some of the queries that got worse,
and check why. Maybe that'd help you with picking the heuristics?

FWIW I still think no heuristics can be perfect, so getting slower plans
for some queries should not be treated as "hard failure". The other
thing is the quality of plans depends on GUCs like random_page_cost, and
I kept them at default values.

Anyway, I hope this is helpful input.

regards

--
Tomas Vondra

#14

Pavel Stehule

pavel.stehule@gmail.com

8 months ago

In reply to: Tomas Vondra (#12)

Re: Add a greedy join search algorithm to handle large join problems

čt 11. 12. 2025 v 18:07 odesílatel Tomas Vondra <tomas@vondra.me> napsal:

On 12/11/25 07:12, Pavel Stehule wrote:

čt 11. 12. 2025 v 3:53 odesílatel John Naylor <johncnaylorls@gmail.com
<mailto:johncnaylorls@gmail.com>> napsal:

On Wed, Dec 10, 2025 at 5:20 PM Tomas Vondra <tomas@vondra.me
<mailto:tomas@vondra.me>> wrote:

I did however notice an interesting thing - running EXPLAIN on the

99

queries (for 3 scales and 0/4 workers, so 6x 99) took this much

time:

master: 8s
master/geqo: 20s
master/goo: 5s

It's nice that "goo" seems to be faster than "geqo" - assuming the

plans

are comparable or better. But it surprised me switching to geqo

makes it

slower than master. That goes against my intuition that geqo is

meant to

be cheaper/faster join order planning. But maybe I'm missing

something.

Yeah, that was surprising. It seems that geqo has a large overhead,

so

it takes a larger join problem for the asymptotic behavior to win

over

exhaustive search.

If I understand correctly to design - geqo should be slower for any
queries with smaller complexity. The question is how many queries in the
tested model are really complex.

Depends on what you mean by "really complex". TPC-DS queries are not
trivial, but the complexity may not be in the number of joins.

Of course, setting geqo_threshold to 2 may be too aggressive. Not sure.

I checked the TPC-H queries and almost all queries are simple - 5 x JOIN --
2x nested subselect

Show quoted text

regards

--
Tomas Vondra

#15

Chengpeng Yan

chengpeng_yan@outlook.com

8 months ago

In reply to: Tomas Vondra (#9)

Re: Add a greedy join search algorithm to handle large join problems

On Dec 10, 2025, at 18:20, Tomas Vondra <tomas@vondra.me> wrote:

I can confirm v2 makes it work for planning all 99 TPC-DS queries, i.e.
there are no more crashes during EXPLAIN.

Thanks a lot for testing this — much appreciated.

It's also tricky as plan choices depend on GUCs like random_page_cost,
and if those values are not good enough, the optimizer may still end up
with a bad plan. I'm not sure what's the best approach.

I agree that many other cost-related parameters can influence the join
order. For now, I’m still running my tests with the default settings,
and I’m not entirely sure how large the impact of those cost parameters
will be. This is something I’ll probably consider at a later stage.

I did however notice an interesting thing - running EXPLAIN on the 99
queries (for 3 scales and 0/4 workers, so 6x 99) took this much time:

master: 8s
master/geqo: 20s
master/goo: 5s

It's nice that "goo" seems to be faster than "geqo" - assuming the plans
are comparable or better.

One additional advantage of goo is that its memory usage should be
better than DP/GEQO (though I haven’t done any measurements yet).
But I don’t think this is the main concern at the moment — I’m just
mentioning it briefly here. What matters more right now is the plan
quality itself.

On Dec 12, 2025, at 01:30, Tomas Vondra <tomas@vondra.me> wrote:

A very simple summary of the results is the total duration of the run,
for all 99 queries combined:

scale workers caching master master-geqo master-goo
===================================================================
1 0 cold 816 399 1124
warm 784 369 1097
4 cold 797 384 1085
warm 774 366 1069
-------------------------------------------------------------------
10 0 cold 2760 2653 2340
warm 2580 2470 2177
4 cold 2563 2423 1969
warm 2439 2325 1859

This is interesting, and also a bit funny.

The funny part is that geqo seems to do better than master - on scale 1
it's pretty clear, on scale 10 the difference is much smaller.

The interesting part is that "goo" is doing worse than master (or geqo)
on scale 1, and better on scale 10. I wonder how would it do on larger
scales, but I don't have such results.

It’s interesting to see that goo performs better at scale 10. I’ll try
to dig into the results and understand the reasons behind this in
follow-up work.

It might be interesting to look at some of the queries that got worse,
and check why. Maybe that'd help you with picking the heuristics?

FWIW I still think no heuristics can be perfect, so getting slower plans
for some queries should not be treated as "hard failure".

I agree that no set of heuristics can be perfect. The best we can do is
to look at existing workloads and understand why certain heuristics
don’t work there, and then evolve them toward strategies that are more
general and more aligned with the nature of joins themselves. There
doesn’t seem to be a silver bullet here — it really comes down to
analyzing specific SQL queries case by case.

Finally, thank you very much for the testing you did and for all the
insights you shared — it helped a lot.

--
Best regards,
Chengpeng Yan

#16

Chengpeng Yan

chengpeng_yan@outlook.com

7 months ago

In reply to: Chengpeng Yan (#15)

Re: Add a greedy join search algorithm to handle large join problems

Recently, I have been testing the TPC-H SF=1 dataset using four simple
greedy join-ordering strategies: join cardinality (estimated output
rows), selectivity, estimated result size in bytes, and cheapest total
path cost. These can be roughly seen as either output-oriented
heuristics (rows / selectivity / result size), which try to optimize the
shape of intermediate results, or a cost-oriented heuristic, which
prefers the locally cheapest join step.

The main goal of these experiments is to check whether the current
greedy rules show obvious structural weaknesses, and to use the observed
behavior as input for thinking about how a greedy rule might evolve.
While there is unlikely to be a perfect greedy strategy, I am hoping to
identify approaches that behave reasonably well across many cases and
avoid clear pathological behavior.

In the attached files, v3-0001 is identical to the previously submitted
v2-0001 patch and contains the core implementation of the GOO algorithm.
The v3-0002 patch adds testing-only code to evaluate different greedy
rules, including a GUC (goo_greedy_strategy) used only for switching
strategies during experiments.

All tests were performed on the TPC-H SF=1 dataset. After loading the
data, I ran the following commands before executing the benchmarks:

```
VACUUM FREEZE ANALYZE;
CHECKPOINT;

ALTER SYSTEM SET join_collapse_limit = 100;
ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
ALTER SYSTEM SET statement_timeout = 600000;
ALTER SYSTEM SET shared_buffers = ‘4GB’;
ALTER SYSTEM SET effective_cache_size = ‘8GB’;
ALTER SYSTEM SET work_mem = ‘1GB’;
SELECT pg_reload_conf();
```

The detailed benchmark results are summarized in tpch.pdf. Execution
times are reported as ratios, using the DP-based optimizer’s execution
time as the baseline (1.0).

The compressed archive tpch_tests_result.zip contains summary.csv, which
is the raw data used to generate tpch.pdf and was produced by the
run_job.sh script. It also includes files (xxx_plan.txt), which were
generated by the run_analysis.sh script and record the EXPLAIN ANALYZE
outputs for the same query under different join-ordering algorithms, to
make plan differences easier to compare.

Based on the TPC-H results, my high-level observations are:
* The threeoutput-oriented greedy rules (rows, selectivity, result size)
show noticeable regressions compared to DP overall, with a relatively
large number of outliers.
* Using total path cost as the greedy key produces results that are
generally closer to DP, but still shows some clear outliers.

To understand why these regressions occur, I mainly looked at Q20 and
Q7, which show particularly large and consistent regressions and expose
different failure modes.

In Q20, there is a join between partsupp and an aggregated lineitem
subquery. For this join, the planner’s rowcount estimate is wrong by
orders of magnitude (tens of rows estimated versus hundreds of thousands
actually produced). As a result, output-oriented greedy rules strongly
prefer this join very early, because it appears to be extremely
shrinking. In reality, it processes large inputs and produces a large
intermediate, and this early misordering can significantly amplify
downstream join costs. This makes Q20 a clear outlier for output-based
greedy rules when estimates are severely wrong.

Q7 exposes a different issue. The cost-based greedy rule tends to choose
a locally cheap join early, but that join creates an intermediate which
later joins become much more expensive to process. In this case, an
early commitment under relatively weak constraints leads to a
many-to-many intermediate that is only filtered after fact-table joins
are applied. This illustrates how a purely cost-driven greedy rule can
make locally reasonable decisions that turn out to be globally harmful.

Taken together, these outliers suggest that all four single-metric
greedy rules tested so far have structural limitations. Output-oriented
rules appear fragile when join rowcount estimates are badly wrong, while
cost-oriented greedy decisions can still lead to locally reasonable but
globally poor plans.

One question this naturally raises is whether making irreversible greedy
choices based only on a local ranking signal is sufficient, or whether
some mechanism is needed to make the approach more robust and to limit
the impact of such outliers.

As a next step, based on the current results, I plan to ignore
selectivity (which performs poorly in many cases), treat rows as largely
redundant with result_size, and move on to testing on the JOB benchmark.
I also plan to compare the behavior of DP, GEQO, and GOO on JOB, and to
use those results to better understand which signals are most useful for
guiding greedy decisions.

I would be very interested in hearing people’s thoughts on these
observations and on possible directions to explore next.

--
Best regards,
Chengpeng Yan

#17

Tomas Vondra

tomas.vondra@2ndquadrant.com

7 months ago

In reply to: Chengpeng Yan (#16)

Re: Add a greedy join search algorithm to handle large join problems

Hi,

On 12/16/25 15:38, Chengpeng Yan wrote:

Recently, I have been testing the TPC-H SF=1 dataset using four simple
greedy join-ordering strategies: join cardinality (estimated output
rows), selectivity, estimated result size in bytes, and cheapest total
path cost. These can be roughly seen as either output-oriented
heuristics (rows / selectivity / result size), which try to optimize the
shape of intermediate results, or a cost-oriented heuristic, which
prefers the locally cheapest join step.

The main goal of these experiments is to check whether the current
greedy rules show obvious structural weaknesses, and to use the observed
behavior as input for thinking about how a greedy rule might evolve.
While there is unlikely to be a perfect greedy strategy, I am hoping to
identify approaches that behave reasonably well across many cases and
avoid clear pathological behavior.

In the attached files, v3-0001 is identical to the previously submitted
v2-0001 patch and contains the core implementation of the GOO algorithm.
The v3-0002 patch adds testing-only code to evaluate different greedy
rules, including a GUC (goo_greedy_strategy) used only for switching
strategies during experiments.

All tests were performed on the TPC-H SF=1 dataset. After loading the
data, I ran the following commands before executing the benchmarks:

```
VACUUM FREEZE ANALYZE;
CHECKPOINT;

ALTER SYSTEM SET join_collapse_limit = 100;
ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
ALTER SYSTEM SET statement_timeout = 600000;
ALTER SYSTEM SET shared_buffers = ‘4GB’;
ALTER SYSTEM SET effective_cache_size = ‘8GB’;
ALTER SYSTEM SET work_mem = ‘1GB’;
SELECT pg_reload_conf();
```

You realize this does not actually change shared buffers size, right?
Because that requires a restart, not just pg_reload_conf. Are you sure
you're running with 4GB shared buffers?

The detailed benchmark results are summarized in tpch.pdf. Execution
times are reported as ratios, using the DP-based optimizer’s execution
time as the baseline (1.0).

The compressed archive tpch_tests_result.zip contains summary.csv, which
is the raw data used to generate tpch.pdf and was produced by the
run_job.sh script. It also includes files (xxx_plan.txt), which were
generated by the run_analysis.sh script and record the EXPLAIN ANALYZE
outputs for the same query under different join-ordering algorithms, to
make plan differences easier to compare.

I'm not sure SF=1 is sufficient for making any clear conclusions. No
matter what the shared_buffers value is, it's going to fit into RAM, the
runs are going to be fully cached.

Based on the TPC-H results, my high-level observations are:
* The threeoutput-oriented greedy rules (rows, selectivity, result size)
show noticeable regressions compared to DP overall, with a relatively
large number of outliers.
* Using total path cost as the greedy key produces results that are
generally closer to DP, but still shows some clear outliers.

To understand why these regressions occur, I mainly looked at Q20 and
Q7, which show particularly large and consistent regressions and expose
different failure modes.

In Q20, there is a join between partsupp and an aggregated lineitem
subquery. For this join, the planner’s rowcount estimate is wrong by
orders of magnitude (tens of rows estimated versus hundreds of thousands
actually produced). As a result, output-oriented greedy rules strongly
prefer this join very early, because it appears to be extremely
shrinking. In reality, it processes large inputs and produces a large
intermediate, and this early misordering can significantly amplify
downstream join costs. This makes Q20 a clear outlier for output-based
greedy rules when estimates are severely wrong.

I don't quite see how we could make good decisions if the estimates are
this wrong. Garbage in, garbage out. AFAICS this affects both the
regular DP planning, which heavily relies on the costing, but also the
approaches on heuristics, which still rely on estimates in some way.

If the aggregate subquery is 1000x off, why should any of the following
decisions be right? The approaches may have different tolerance for
estimation errors - some may be "defensive" and handle misestimates
better, but the trade off is that it may pick slower plans when the
estimates are exact.

I don't think there's an "optimal" approach picking the best plan in
both cases. If there was, join ordering wouldn't be such a challenge.

Q7 exposes a different issue. The cost-based greedy rule tends to choose
a locally cheap join early, but that join creates an intermediate which
later joins become much more expensive to process. In this case, an
early commitment under relatively weak constraints leads to a
many-to-many intermediate that is only filtered after fact-table joins
are applied. This illustrates how a purely cost-driven greedy rule can
make locally reasonable decisions that turn out to be globally harmful.

Taken together, these outliers suggest that all four single-metric
greedy rules tested so far have structural limitations. Output-oriented
rules appear fragile when join rowcount estimates are badly wrong, while
cost-oriented greedy decisions can still lead to locally reasonable but
globally poor plans.

I don't think Q20 is about greediness. Poor estimates are going to be an
issue for all approaches relying on them in some way.

The issue with Q7 seems pretty inherent to greedy algorithms. Picking
solutions that are optimal locally but not globally is the definition of
"greedy". I don't think it matters which exact metric is used, this flaw
is built-in. And I don't think it's solvable.

One question this naturally raises is whether making irreversible greedy
choices based only on a local ranking signal is sufficient, or whether
some mechanism is needed to make the approach more robust and to limit
the impact of such outliers.

Sufficient for what? Sufficient to replace DP or to replace GEQO?

I don't think we'd want to replace DP with such greedy algorithm. With
accurate estimates and modest number of joins, we should be able to find
the best join (more or less).

But this thread is not about replacing DP, I see GOO more as a GEQO
replacement, right? Or did the goal change?

As a next step, based on the current results, I plan to ignore
selectivity (which performs poorly in many cases), treat rows as largely
redundant with result_size, and move on to testing on the JOB benchmark.
I also plan to compare the behavior of DP, GEQO, and GOO on JOB, and to
use those results to better understand which signals are most useful for
guiding greedy decisions.

I'm not sure ignoring selectivity entirely is a good plan, though. Yes,
if the selectivity/estimate is significantly off, the plan may be poor.
But what exactly do you plan to use instead? Isn't a lot of the metrics
(output size, ...) derived from the selectivity/estimate anyway?

I agree the JOB benchmark may be a better fit for this. The TPC-H is
pretty simple, and even for TPC-DS the number of joins is not all that
high. Sorry if my initial feedback suggested these are the proper
benchmarks to evaluate this.

I suggest the evaluation should focus on cases where we expect GOO to
actually be used in practice - as a replacement for GEQO. Only when it
proves useful/acceptable for that use case (for sufficiently many joins
etc.), we can start comparing with DP to figure out if we need to adjust
the thresholds and so on.

Another suggestion is to also test with larger data sets. The problems
with SF=10 or SF=100 may be very different, even with TPC-H. Also,
consider testing with "cold" cases, as if there's nothing cached by
restarting the Postgres instance and drop page cache between runs.

I would be very interested in hearing people’s thoughts on these
observations and on possible directions to explore next.

I realized the paper mentioned at the beginning of this thread is from
1998. That doesn't make it wrong, but I was wondering if there are some
newer papers about join order search, with interesting
alternative/better approaches.

An interesting paper I found is this CIDR21 paper:

Simplicity Done Right for Join Ordering
Axel Hertzschuch, Claudio Hartmann, Dirk Habich, Wolfgang Lehner
https://vldb.org/cidrdb/2021/simplicity-done-right-for-join-ordering.html
https://vldb.org/cidrdb/papers/2021/cidr2021_paper01.pdfl

Which does something similar to our approach, although it seems to be
more a replacement for the DP and not just for GEQO. It's meant to be
cheaper, but also more resilient to poor join orders, as it cares about
"upper bound" (~worst case) for the join orders.

It goes beyond the scope of GOO, as the paper also talks about sampling
to determine some of the estimates. But maybe it'd be useful (better
than GEQO) even without that?

I find the focus on the "worst case" (and only trusting baserel
estimates) interesting. I wonder if it might help with the common
nestloop problem, where we opt to believe a very optimistic low
cardinality estimate, only to end with a sequence of nestloops that
never complete.

regards

--
Tomas Vondra

#18

Chengpeng Yan

chengpeng_yan@outlook.com

7 months ago

In reply to: Tomas Vondra (#17)

Re: Add a greedy join search algorithm to handle large join problems

On Dec 28, 2025, at 01:00, Tomas Vondra <tomas@vondra.me> wrote:

You realize this does not actually change shared buffers size, right?
Because that requires a restart, not just pg_reload_conf. Are you sure
you're running with 4GB shared buffers?

Good catch, and sorry for not mentioning that explicitly. I did restart
the instance and verified that the new shared_buffers setting was in
effect. I’ll make sure to call this out clearly in future descriptions.

I'm not sure SF=1 is sufficient for making any clear conclusions. No
matter what the shared_buffers value is, it's going to fit into RAM, the
runs are going to be fully cached.

Thanks for the comment. I agree that TPC-H at SF=1 is too small to
support strong conclusions. I mainly used SF=1 as a convenient starting
point to quickly validate ideas and iterate on the implementation. I’ve
already run experiments on JOB and plan to share those results next,
where the behavior should be more representative and the conclusions
more meaningful.

I don't quite see how we could make good decisions if the estimates are
this wrong. Garbage in, garbage out. AFAICS this affects both the
regular DP planning, which heavily relies on the costing, but also the
approaches on heuristics, which still rely on estimates in some way.

If the aggregate subquery is 1000x off, why should any of the following
decisions be right? The approaches may have different tolerance for
estimation errors - some may be "defensive" and handle misestimates
better, but the trade off is that it may pick slower plans when the
estimates are exact.

I don't think there's an "optimal" approach picking the best plan in
both cases. If there was, join ordering wouldn't be such a challenge.

I don't think Q20 is about greediness. Poor estimates are going to be an
issue for all approaches relying on them in some way.

The issue with Q7 seems pretty inherent to greedy algorithms. Picking
solutions that are optimal locally but not globally is the definition of
"greedy". I don't think it matters which exact metric is used, this flaw
is built-in. And I don't think it's solvable.

I agree that all approaches are affected by estimation errors, but as
you noted, they differ in how much error they can tolerate. In its
current form, GOO is still quite fragile in this respect and can exhibit
severe regressions when estimates are badly off. Based on additional
experiments, I also agree that a single greedy rule inevitably has cases
where it performs extremely poorly; as a result, my current direction is
no longer to further tune the greedy criterion itself, but to improve
the robustness of the existing greedy choices through additional
mechanisms.

Regarding Q20, I agree that this is not fundamentally a “greedy” issue.
If the rowcount estimates were reasonably accurate, the extreme behavior
observed there would likely be avoided. At the same time, when estimates
are wrong by orders of magnitude, the current GOO approach does not
handle this well, and improving robustness under such severe
misestimation is exactly one of the main aspects I am currently working
on.

I have already run additional experiments in this direction, using
result_size and cost as the base signals, and will share the results
separately.

Sufficient for what? Sufficient to replace DP or to replace GEQO?

I don't think we'd want to replace DP with such greedy algorithm. With
accurate estimates and modest number of joins, we should be able to find
the best join (more or less).

But this thread is not about replacing DP, I see GOO more as a GEQO
replacement, right? Or did the goal change?

The goal has always been to use GOO as a replacement for GEQO, not DP.

I'm not sure ignoring selectivity entirely is a good plan, though. Yes,
if the selectivity/estimate is significantly off, the plan may be poor.
But what exactly do you plan to use instead? Isn't a lot of the metrics
(output size, ...) derived from the selectivity/estimate anyway?

I agree the JOB benchmark may be a better fit for this. The TPC-H is
pretty simple, and even for TPC-DS the number of joins is not all that
high. Sorry if my initial feedback suggested these are the proper
benchmarks to evaluate this.

I suggest the evaluation should focus on cases where we expect GOO to
actually be used in practice - as a replacement for GEQO. Only when it
proves useful/acceptable for that use case (for sufficiently many joins
etc.), we can start comparing with DP to figure out if we need to adjust
the thresholds and so on.

Another suggestion is to also test with larger data sets. The problems
with SF=10 or SF=100 may be very different, even with TPC-H. Also,
consider testing with "cold" cases, as if there's nothing cached by
restarting the Postgres instance and drop page cache between runs.

Thanks for the feedback.

I’ve already run experiments on the JOB benchmark and will share the
results shortly, including a comparison against GEQO, with DP used as a
baseline. I agree that the primary goal at this stage is to show that
the approach is good enough for the intended use case first, before
looking into secondary aspects such as threshold tuning.

Larger scale factors and cold-cache runs are also part of my upcoming
evaluation plan, and I agree those are important dimensions to cover.

Regarding selectivity, I agree the situation is not entirely clear-cut.
As you noted, result_size is also derived from selectivity to some
extent, and many bad cases share similar characteristics. However, in my
initial TPC-H SF=1 experiments, selectivity-based greedy rules already
showed many poor cases, and I expect this to become worse in more
complex scenarios. For now, I therefore focused on cost and result_size:
cost because it is a first-class concept in PostgreSQL, and result_size
because it is commonly cited in the literature as a more robust signal
for greedy join ordering. If you think selectivity is still an important
signal to evaluate, I can certainly add further experiments for it, but
based on the above I have not pursued additional selectivity-focused
tests so far.

I realized the paper mentioned at the beginning of this thread is from
1998. That doesn't make it wrong, but I was wondering if there are some
newer papers about join order search, with interesting
alternative/better approaches.

An interesting paper I found is this CIDR21 paper:

Simplicity Done Right for Join Ordering
Axel Hertzschuch, Claudio Hartmann, Dirk Habich, Wolfgang Lehner
https://vldb.org/cidrdb/2021/simplicity-done-right-for-join-ordering.html
https://vldb.org/cidrdb/papers/2021/cidr2021_paper01.pdfl

Which does something similar to our approach, although it seems to be
more a replacement for the DP and not just for GEQO. It's meant to be
cheaper, but also more resilient to poor join orders, as it cares about
"upper bound" (~worst case) for the join orders.

It goes beyond the scope of GOO, as the paper also talks about sampling
to determine some of the estimates. But maybe it'd be useful (better
than GEQO) even without that?

I find the focus on the "worst case" (and only trusting baserel
estimates) interesting. I wonder if it might help with the common
nestloop problem, where we opt to believe a very optimistic low
cardinality estimate, only to end with a sequence of nestloops that
never complete.

Thanks for the pointer. I’ve also been looking at more recent work on
join ordering, and there are indeed several ideas that seem relevant to
the current direction, including the paper you mentioned. These are
definitely directions I plan to consider more seriously.

My current plan is to first establish a reasonably solid baseline using
relatively simple techniques—reducing extreme regressions and getting
overall plan quality closer to DP. Once that is in place, I’d like to
step back, summarize which recent papers and ideas seem most applicable,
and then experiment with more advanced approaches incrementally,
building on that foundation.

Thanks for all the detailed feedback and the many useful insights
throughout this discussion.

--
Best regards,
Chengpeng Yan

#19

Chengpeng Yan

chengpeng_yan@outlook.com

7 months ago

In reply to: Chengpeng Yan (#18)

Re: Add a greedy join search algorithm to handle large join problems

Hi,

Recently, I have been testing the JOB dataset using three greedy
strategies: result_size, cost, and combined. The result_size and cost
strategies are the same heuristics previously tested on TPC-H with SF =
1.

The combined strategy runs GOO twice, once greedy by cost and once by
result_size, producing two complete candidate plans, and then selects
the one with the lower final estimated total_cost. This allows GOO to
compare plans generated under different greedy criteria.

In the attached files, v4-0001 is identical to the previously submitted
v3-0001 patch and contains the core GOO implementation. All
experiment-related changes are in v4-0002, which adds simple, test-only
support for evaluating different greedy strategies and will be
optimized. The v4_tests archive contains scripts for generating
v4_job.pdf (run_job.sh) and for collecting EXPLAIN ANALYZE results under
different greedy strategies (run_analysis.sh).

The detailed results are included in v4_job.pdf. The results report both
the measured execution times and their ratios relative to the DP
execution time, with color coding used to highlight outliers (<0.5 dark
green, 0.5–0.8 green, 1.2x-2x light red, 2x-10x red, 10+ dark red).

Looking more closely at JOB, only a subset of queries exceeds
PostgreSQL’s default geqo_threshold of 12 relations and therefore
exercises GEQO. This includes the 17-table queries (29a, 29b, 29c), the
14-table queries (28a, 28b, 28c, 33a, 33b, 33c), and the 12-table
queries (24a, 24b, 26a, 26b, 26c, 27a, 27b, 27c, 30a, 30b, 30c), for a
total of about 20 queries.

I therefore also extracted results for this GEQO-relevant subset from
the full JOB runs. The detailed results are included in
v4_job_geqo_range.pdf, derived directly from v4_job.pdf. On this subset,
GOO variants tend to show more favorable behavior overall, particularly
in terms of tail robustness.

For reproducibility, JOB was run using the same configuration as in the
previous experiments. The following commands were executed after loading
the data and before running the benchmarks.

Settings requiring a restart (e.g., shared_buffers) were applied via a
server restart prior to benchmarking; other settings were applied using
ALTER SYSTEM followed by pg_reload_conf().

```sql
VACUUM FREEZE ANALYZE;
CHECKPOINT;

ALTER SYSTEM SET join_collapse_limit = 100;
ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
ALTER SYSTEM SET statement_timeout = 600000;
ALTER SYSTEM SET shared_buffers = '4GB';
ALTER SYSTEM SET effective_cache_size = '8GB';
ALTER SYSTEM SET work_mem = '1GB';
SELECT pg_reload_conf();
```

Full JOB results (ratio = algo / DP, lower is better):

Overall summary
algo n gmean p90 p95 max
---------------- --- ------ ------ ------ ------
GOO(combined) 113 0.953 1.94 3.15 8.68
GOO(result_size) 113 0.969 2.63 4.45 67.53
GEQO 113 1.066 1.50 2.06 16.81
GOO(cost) 113 1.496 5.80 16.20 431.32

Tail behavior
algo >=2x >=5x >=10x
---------------- ----- ----- ------
GOO(combined) 11 3 0
GOO(result_size) 18 5 2
GEQO 9 2 2
GOO(cost) 30 15 7

Workload sum (total execution time, ms)
algo total_ms ratio_to_dp
---------------- ------------ ------------
DP 115710.85 1.000
GOO(cost) 798600.41 6.902
GOO(result_size) 139100.42 1.202
GOO(combined) 154027.19 1.331
GEQO 173464.38 1.499

Observations:
* GOO(cost) shows severe tail behavior (max 431×, p99 327×), with many
bad cases.
* GOO(result_size) performs reasonably on average (gmean 0.969) but has
a poor tail (max 67×).
* GOO(combined) is the most robust variant: it achieves the best gmean
(0.953), shows no regressions ≥10×, and significantly reduces the worst
case (max 8.68×), trading some average workload time for improved tail
behavior.
* GEQO is generally slower on JOB (gmean 1.066) and also exhibits ≥10×
regressions (max 16.8×).

Conclusions:
1. All non-DP algorithms have cases that outperform DP, which typically
arise in situations where the cost estimates no longer reflect the
relative execution costs of alternative plans, often due to severe
cardinality misestimation. This patch set does not attempt to fix
cardinality estimation; instead, it focuses on reducing tail risk among
greedy plans under the existing cost model.

2. Based on the current JOB results, further tuning of a single greedy
heuristic is not the immediate priority: each single-strategy variant
still shows corner cases with large regressions. The next direction is
to reduce catastrophic plans (tail risk), rather than to optimize one
specific greedy criterion.

3. GOO(combined) improves robustness by reducing extreme regressions.
Increasing plan diversity via multiple greedy criteria appears
effective: different heuristics fail for different reasons, and the
combined variant selects the cheapest plan among candidates from
different greedy strategies, helping avoid extreme outliers with minimal
additional planning overhead.

Next, I include brief notes on a small set of representative JOB queries
to illustrate where greedy join ordering behaves relatively well or
poorly. Under severe cardinality misestimation, neither DP-based nor
heuristic approaches can reliably select good plans (“garbage in,
garbage out”); different algorithms simply exhibit different tolerances
to such errors. These examples document how specific join-ordering
choices interact with misestimation and help identify where the current
GOO implementation is more or less fragile.

For regressions, I focus on GOO(result_size) (12b, 3b, 11b) and
GOO(combined) (15a, 17b). I also include a few cases where
GOO(result_size) outperforms DP (29c, 31a). The full plan-level
analyses are in the attached v4_bad_case_analysis.txt and are optional
background; the summary points below are sufficient for the main
conclusions.

## GOO(result_size) bad case analysis

TL;DR:
* 12b: Estimation error misleads the greedy choice, causing a highly
selective predicate to be applied too late; with accurate estimates, the
regression can be avoided.
* 3b: An inherent weakness of result_size-based greediness, where small
estimated results hide expensive scans or repeated execution.
* 11b: Row count underestimation leads to a non-materialized NLJ with
repeated execution; accurate estimates or materialization / Hash Join
largely avoid the regression.

## GOO(combined) bad case analysis

GOO(combined) regressions mainly fall into two categories:
1. failures of the cost model itself (15a), and
2. cases where cost and result_size share the same structural weakness
(17b).

TL;DR:
* 15a: The selector chooses the cost-greedy plan due to lower estimated
cost, even though the result_size plan is much faster in reality. This
reflects cost model unreliability; in such cases, plan diversity
collapses back to a single fragile choice. This is an expected
limitation when both candidate plans depend on the same cost signal;
suggestions on improving plan diversity or adding lightweight safeguards
are welcome.
* 17b: Both greedy strategies converge to similarly bad plans (~3× DP),
indicating a structural limitation of greedy enumeration rather than a
selector issue.

## Representative cases where GOO(result_size) outperforms DP

In addition to regressions, I include a small number of cases where
greedy join ordering outperforms DP. These cases are not meant to
suggest that greedy ordering makes intrinsically better decisions, but
to illustrate different failure modes: avoiding fan-out amplification
under severe underestimation (29c, 31a).

TL;DR:
* 29c: Under severe cardinality underestimation, DP introduces early
fan-out, producing large intermediates and repeated inner execution.
GOO(result_size) starts from a highly selective predicate and avoids the
fan-out explosion.
* 31a: DP chooses an early multiplying join under severe
underestimation, causing row counts to explode. GOO(result_size) delays
the multiplying join until a more selective prefix is built.

## Discussion and next steps

On JOB, the combined variant behaves reasonably well overall: it reduces
extreme regressions compared to the single-heuristic variants, and looks
competitive with GEQO on aggregate measures. This does not show that it
is “good enough” in general, but it seems like a promising direction and
worth broader evaluation.

The results are clearly workload- and query-dependent. TPC-H and TPC-DS
provide limited coverage of scenarios where GEQO is exercised, while
JOB, as a commonly used and well-recognized benchmark, includes a subset
of queries that reflect cases where GEQO is actually used. That said,
JOB is still limited in overall join graph size. I therefore plan to
explore more complex workloads with larger join graphs and queries
involving more relations to better approximate real-world GEQO usage.
Suggestions for suitable workloads or query sets would be very welcome.

Evaluating plan quality in isolation is inherently difficult, since join
ordering is only one component of plan selection, alongside cardinality
estimation, operator costing, and other planner decisions. As seen on
JOB, even DP does not always produce the fastest plan, suggesting that
improvements in join ordering alone do not consistently translate into
end-to-end plan quality gains. This makes it less clear how such
improvements should be measured, or how average and tail behavior should
be weighed.

Planning time and memory usage, in contrast, follow more directly from
the algorithmic structure, and I plan to add explicit measurements for
them in future. Beyond plan quality, I am also interested in addressing
areas where GEQO is less satisfactory in practice, such as tunability
and graceful degradation (e.g., using DP up to a resource limit and
completing planning with a greedy step). Broader evaluation across
workloads should help clarify both aspects.

Given the current JOB results and the fact that this implementation
matches what seems to be a common baseline in industry systems and much
of the literature, my next step is to treat the current implementation
as a starting point and expand evaluation: more workloads and
configurations. I also plan to incorporate selectivity into the greedy
strategy and re-run the evaluations.

There is also a large body of academic work proposing additional
mechanisms to improve robustness. Many of these ideas do not seem to
have a clear “production” consensus, and few systems implement them, so
I would prefer to explore them incrementally based on evidence: first
establish behavior of the baseline across a wider range of tests, then
evaluate a small number of well-motivated improvments and measure their
impact.

--
Best regards,
Chengpeng Yan

#20

lakshmi

lakshmigcdac@gmail.com

5 months ago

In reply to: Pavel Stehule (#14)

Re: Add a greedy join search algorithm to handle large join problems

HI all,
I tested the latest GOO patch (v4) on a fresh build from the current
PostgreSQL master. The patch applied cleanly, the server built without
issues, and regression tests passed except for the expected EXPLAIN output
differences due to the new join ordering behavior.

As a quick sanity check, I compared DP, GEQO, and GOO on a small multi-join
query:

DP planning: ~0.66 ms
GEQO planning: ~2.28 ms
GOO planning: ~0.38 ms
Execution times were similar across all three (~1.5–1.7 ms) with no
correctness issues. Even on a small join, GEQO shows higher planning
overhead, while GOO plans faster with comparable execution cost.
I then evaluated scaling using synthetic 15-table and 20-table joins with
EXPLAIN (ANALYZE, TIMING OFF):
15 tables
DP: ~22.9 ms | ~23.4 ms
GEQO: ~46.7 ms | ~20.5 ms
GOO: ~1.8 ms | ~22.4 ms

20 tables
DP: ~48.1 ms | ~30.5 ms
GEQO: ~51.0 ms | ~26.7 ms
GOO: ~3.2 ms | ~29.0 ms

Planning time increases notably for DP and remains relatively high for
GEQO, while GOO stays very low even at 20 joins, indicating substantially
reduced planning overhead. Execution costs remain broadly comparable, with
no obvious regressions from GOO in this synthetic workload.

Although this uses a controlled synthetic join graph rather than JOB/TPC-H,
the scaling behavior appears consistent with GOO’s goal of significantly
cheaper planning than DP/GEQO while preserving similar plan quality.

I plan to continue testing with more realistic workloads and will share
further results if anything notable appears.

Thanks for the interesting work.

Regards,
Lakshmi

On Fri, Feb 13, 2026 at 12:33 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

Show quoted text

čt 11. 12. 2025 v 18:07 odesílatel Tomas Vondra <tomas@vondra.me> napsal:

On 12/11/25 07:12, Pavel Stehule wrote:

čt 11. 12. 2025 v 3:53 odesílatel John Naylor <johncnaylorls@gmail.com
<mailto:johncnaylorls@gmail.com>> napsal:

On Wed, Dec 10, 2025 at 5:20 PM Tomas Vondra <tomas@vondra.me
<mailto:tomas@vondra.me>> wrote:

I did however notice an interesting thing - running EXPLAIN on

the 99

queries (for 3 scales and 0/4 workers, so 6x 99) took this much

time:

master: 8s
master/geqo: 20s
master/goo: 5s

It's nice that "goo" seems to be faster than "geqo" - assuming the

plans

are comparable or better. But it surprised me switching to geqo

makes it

slower than master. That goes against my intuition that geqo is

meant to

be cheaper/faster join order planning. But maybe I'm missing

something.

Yeah, that was surprising. It seems that geqo has a large overhead,

so

it takes a larger join problem for the asymptotic behavior to win

over

exhaustive search.

If I understand correctly to design - geqo should be slower for any
queries with smaller complexity. The question is how many queries in the
tested model are really complex.

Depends on what you mean by "really complex". TPC-DS queries are not
trivial, but the complexity may not be in the number of joins.

Of course, setting geqo_threshold to 2 may be too aggressive. Not sure.

I checked the TPC-H queries and almost all queries are simple - 5 x JOIN
-- 2x nested subselect

regards

--
Tomas Vondra