should we have a fast-path planning for OLTP starjoins?
Hi,
While running benchmarks for my "performance over 20 years" talk [1],
I've also been looking for common cases that don't perform well
(and thus might be a good topic for optimization, with a significant
speedup helping a lot of deployments).
One such simple example that I ran into is the "OLTP starjoin". You're
probably familiar with the star schema from the DSS field [2]: a large fact
table with many small-ish dimensions. The OLTP variant is exactly the
same thing, but with selective WHERE conditions on the fact table.
So you can imagine it as a query of this shape:
SELECT * FROM fact_table f
JOIN dim1 ON (f.id1 = dim1.id)
JOIN dim2 ON (f.id2 = dim2.id)
JOIN dim3 ON (f.id3 = dim3.id)
...
WHERE f.id = 2398723;
This is a surprisingly common query pattern in OLTP applications, thanks
to normalization. For example the "fact" may be a table of transactions
with some basic common details, dimensions are additional "details" for
special types of transactions. When loading info about a transaction of
unknown type, this allows you to load everything at once.
Or maybe the fact table is "users" and the dimensions have all kinds of
info about the user (address, primary e-mail address, balance, ...).
Anyway, this pattern is quite common, yet it performs quite poorly.
Let's join a fact table with 10 dimensions - see the attached create
script to build such schema, and the test.sql script for pgbench.
On my new ryzen machine, this peaks at about ~16k tps with 16 clients.
The machine can easily do 1M tps in read-only pgbench, for example. And
if you increase the join_collapse_limit to 12 (because the default 8 is
not enough for the 10 dimensions), the throughput drops to ~2k tps.
That's not great.
AFAIK this is a consequence of the star joins allowing arbitrary join
order of the dimensions - those only have join conditions to the fact
relation, so it allows many join orders. So exploring them takes a lot
of time, of course.
But for starjoins, a lot of this is not really needed. In the simplest
case (no conditions on dimensions etc) it does not really matter in what
order we join those, and filters on dimensions make it only a little bit
more complicated (join the most selective first).
So I've been wondering how difficult it would be to have a special
fast-path mode for starjoins, completely skipping most of this. I
cobbled together a WIP/PoC patch (attached) on the way from FOSDEM, and
it seems to help quite a bit.
I definitely don't claim the patch is correct for all interesting cases,
just for the example query. And I'm sure there's plenty of things to fix
or improve (e.g. handling of outer joins, more complex joins, ...).
But these are the rough results for 1 and 16 clients:
build         1       16
--------------------------------------
master     1600    16000
patched    4400    46000
So that about triples the throughput. If you bump join_collapse_limit to
12, it gets even clearer
build         1       16
--------------------------------------
master      200     2000
patched    4500    48000
That's a 20x improvement - not bad. Sure, this is on a tiny dataset, and
with larger data sets it might need to do I/O, diminishing the benefits.
It's just an example to demonstrate the benefits.
If you want to try the patch, there's a new GUC enable_starjoin to
enable this optimization (off by default).
The patch does roughly this:
1) It tries to detect a "star join" before doing the full join order
search. It simply looks for the largest relation (not considering the
conditions) and assumes it's the fact table. And then it searches for
relations that only join to the fact - those are the dimensions.
2) With the relations found in (1) it just builds the join relations
directly (one per level), without exploring all the possibilities. This
is where the speedup comes from.
3) If there are additional relations, those are then left to the regular
join order search algorithm.
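For illustration, the three steps could be sketched roughly like this
(Python pseudocode, NOT the actual C patch - the relation / size /
join-edge representation here is entirely invented):

```python
# Illustrative sketch of the fast-path: detect a star join and emit a
# fixed join order, leaving any leftover relations to the regular
# join order search.

def detect_star(rels, sizes, join_edges):
    """rels: relation names; sizes: name -> estimated rows;
    join_edges: set of frozenset({a, b}) pairs with a join condition."""
    # (1) Assume the largest relation is the fact table, and collect
    # relations whose only join condition is against the fact.
    fact = max(rels, key=lambda r: sizes[r])
    dims = [r for r in rels
            if r != fact
            and all(e == frozenset({r, fact})
                    for e in join_edges if r in e)]
    return fact, dims

def starjoin_order(rels, sizes, join_edges):
    fact, dims = detect_star(rels, sizes, join_edges)
    # (2) Build the join directly, one dimension per level (here:
    # smallest first), skipping the exhaustive search for them.
    order = [fact] + sorted(dims, key=lambda d: sizes[d])
    # (3) Anything else is left to the regular join order search.
    leftover = [r for r in rels if r not in order]
    return order, leftover

sizes = {"fact": 10_000, "dim1": 100, "dim2": 50, "dim3": 200}
edges = {frozenset({"fact", d}) for d in ("dim1", "dim2", "dim3")}
order, leftover = starjoin_order(list(sizes), sizes, edges)
# order: fact first, then the dimensions by size; leftover: empty
```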
There's a lot of stuff that could / should be improved on the current
patch. For (1) we might add support for more complex cases with
snowflake schemas [3] or with multiple fact tables. At the same time (1)
needs to be very cheap, so that it does not regress every non-starjoin
query.
For (2) it might pick a particular order we join the dimensions (by
size, selectivity, ...), and it might consider whether to join them
before/after the other relations.
FWIW I suspect there's a fair amount of research papers looking at
starjoins and what is the optimal plan for such queries, but I didn't
have time to look at that yet. Pointers welcome!
But the bigger question is whether it makes sense to have such fast-path
modes for certain query shapes. The patch "hard-codes" the planning for
starjoin queries, but we clearly can't do that for every possible join
shape (because then why have dynamic join search at all?).
I do think starjoins might be sufficiently unique / special to justify
this, but maybe it would be possible to instead improve the regular join
order search to handle this case better? I don't have a very clear idea
what that would look like, though :-(
I did check what some other databases do, and they often have some
sort of "hint" to let the optimizer know this is a starjoin.
I also looked at what are the main bottlenecks with the simpler starjoin
planning enabled - see the attached flamegraphs. The optimizations seem
to break the stacktraces a bit, so there's a svg for "-O0 -ggdb3" too,
that doesn't have this issue (the shape is different, but the conclusions
are about the same).
In both cases about 40% of the time is spent in initial_cost_mergejoin,
which seems like a lot - and yes, disabling mergejoin doubles the
throughput. And most of the cost is in get_actual_variable_range,
looking up the range in the btrees. That seems like a lot, considering
the indexes are perfectly clean (we used to have problems with deleted
tuples, but that's not the case here). I wonder if maybe we could start
caching this kind of info somewhere.
regards
[1]: https://www.postgresql.eu/events/pgconfeu2024/schedule/session/5585-performance-archaeology/
[2]: https://en.wikipedia.org/wiki/Star_schema
[3]: https://en.wikipedia.org/wiki/Snowflake_schema
--
Tomas Vondra
On Tue, 2025-02-04 at 15:00 +0100, Tomas Vondra wrote:
> This is a surprisingly common query pattern in OLTP applications,
> thanks to normalization.
+1. Creating a small lookup table should be encouraged rather than
penalized.
Your test data includes a fact table with 10k rows and no index on the
filter condition. In OLTP applications the fact table might often fit
in memory, but I'd still expect it to have an index on the filter
condition. That might not change your overall point, but I'm curious
why you constructed the test that way?
> There's a lot of stuff that could / should be improved on the current
> patch. For (1) we might add support for more complex cases with
> snowflake schemas [3] or with multiple fact tables. At the same time (1)
> needs to be very cheap, so that it does not regress every non-starjoin
> query.
The patch only considers the largest table as the fact table, which is
a good heuristic of course. However, I'm curious if other approaches
might work. For instance, could we consider the table involved in the
most join conditions to be the fact table?
If you base it on the join conditions rather than the size of the
table, then detection of the star join would be based purely on the
query structure (not stats), which would be nice for predictability.
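Something like this, perhaps (an illustrative Python sketch; the
representation of the join conditions is made up):

```python
# Sketch: pick the relation appearing in the most join conditions as
# the "hub" (fact table), based purely on query structure, no stats.
from collections import Counter

def find_hub(join_edges):
    counts = Counter()
    for a, b in join_edges:
        counts[a] += 1
        counts[b] += 1
    hub, n = counts.most_common(1)[0]
    # In a clean star, the hub participates in every join condition.
    return hub if n == len(join_edges) else None

assert find_hub([("fact", "dim1"), ("fact", "dim2"), ("dim3", "fact")]) == "fact"
assert find_hub([("a", "b"), ("c", "d")]) is None  # no single hub
```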
> But the bigger question is whether it makes sense to have such
> fast-path modes for certain query shapes.
We should explore what kinds of surprising cases it might create, or
what maintenance headaches might come up with future planner changes.
But the performance numbers you posted suggest that we should do
something here.
Regards,
Jeff Davis
On 2/4/25 20:43, Jeff Davis wrote:
> On Tue, 2025-02-04 at 15:00 +0100, Tomas Vondra wrote:
>> This is a surprisingly common query pattern in OLTP applications,
>> thanks to normalization.
>
> +1. Creating a small lookup table should be encouraged rather than
> penalized.
>
> Your test data includes a fact table with 10k rows and no index on the
> filter condition. In OLTP applications the fact table might often fit
> in memory, but I'd still expect it to have an index on the filter
> condition. That might not change your overall point, but I'm curious
> why you constructed the test that way?
No particular reason. I think I intended to make it a lookup by PK
(which would match the use case examples), and I forgot about that. But
yeah, I would expect an index too.
>> There's a lot of stuff that could / should be improved on the current
>> patch. For (1) we might add support for more complex cases with
>> snowflake schemas [3] or with multiple fact tables. At the same time (1)
>> needs to be very cheap, so that it does not regress every non-starjoin
>> query.
>
> The patch only considers the largest table as the fact table, which is
> a good heuristic of course. However, I'm curious if other approaches
> might work. For instance, could we consider the table involved in the
> most join conditions to be the fact table?
>
> If you base it on the join conditions rather than the size of the
> table, then detection of the star join would be based purely on the
> query structure (not stats), which would be nice for predictability.
Right, there may be other (possibly better) ways to detect the star join
shape. I was thinking about also requiring foreign keys on the join
clauses - in DWH systems FKeys are sometimes omitted, which would break
the heuristics, but in OLTP it's common to still have them.
I think the cost of the heuristic will be an important metric - I don't
know if the number of join conditions is more expensive to determine
than what the patch does now, though.
>> But the bigger question is whether it makes sense to have such
>> fast-path modes for certain query shapes.
>
> We should explore what kinds of surprising cases it might create, or
> what maintenance headaches might come up with future planner changes.
> But the performance numbers you posted suggest that we should do
> something here.
Yes, it seems like an interesting opportunity for starjoin queries. It's
a pretty common query pattern, but it also happens to be very expensive
to plan because the dimensions can be reordered almost arbitrarily.
regards
--
Tomas Vondra
Tomas Vondra <tomas@vondra.me> writes:
> On 2/4/25 20:43, Jeff Davis wrote:
>> If you base it on the join conditions rather than the size of the
>> table, then detection of the star join would be based purely on the
>> query structure (not stats), which would be nice for predictability.

> Right, there may be other (possibly better) ways to detect the star join
> shape. I was thinking about also requiring foreign keys on the join
> clauses - in DWH systems FKeys are sometimes omitted, which would break
> the heuristics, but in OLTP it's common to still have them.
I think you need to insist on foreign keys. Otherwise you don't know
whether the joins will eliminate fact-table rows. If that's a
possibility then it's no longer sensible to ignore different join
orders.
I'm kind of imagining a planner rule like "if table X is joined to
using a match of a foreign-key column to its PK (so that the join
removes no rows from the other table) and there are not other
restriction conditions on table X, then force X to be joined last.
And if there are multiple such tables X, it doesn't matter what
order they are joined in as long as they're last."
The interesting thing about this is we pretty much have all the
infrastructure for detecting such FK-related join conditions
already. Possibly the join order forcing could be done with
existing infrastructure too (by manipulating the joinlist).
regards, tom lane
On 2/4/25 09:00, Tomas Vondra wrote:
There's a lot of stuff that could / should be improved on the current
patch. For (1) we might add support for more complex cases with
snowflake schemas [3] or with multiple fact tables. At the same time (1)
needs to be very cheap, so that it does not regress every non-starjoin
query.For (2) it might pick a particular order we join the dimensions (by
size, selectivity, ...), and it might consider whether to join them
before/after the other relations.FWIW I suspect there's a fair amount of research papers looking at
starjoins and what is the optimal plan for such queries, but I didn't
have time to look at that yet. Pointers welcome!But the bigger question is whether it makes sense to have such fast-path
modes for certain query shapes. The patch "hard-codes" the planning for
starjoin queries, but we clearly can't do that for every possible join
shape (because then why have dynamic join search at all?).
+	/*
+	 * Try simplified planning for starjoin. If it succeeds, we should
+	 * continue at level startlev.
+	 */
+	startlev = starjoin_join_search(root, initial_rels, 2);
(I should probably don a flame retardant suit, but...)
This sounds like an interesting idea, but it makes me wonder if we
should have a more generic mechanism here so that if "some pattern is
matched" then "use some simplified planning method" -- of which the
starjoin is the first and builtin example, but allowing for others to be
plugged in via extensions.
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On 2/4/25 21:34, Joe Conway wrote:
> On 2/4/25 09:00, Tomas Vondra wrote:
>> There's a lot of stuff that could / should be improved on the current
>> patch. For (1) we might add support for more complex cases with
>> snowflake schemas [3] or with multiple fact tables. At the same time (1)
>> needs to be very cheap, so that it does not regress every non-starjoin
>> query.
>>
>> For (2) it might pick a particular order we join the dimensions (by
>> size, selectivity, ...), and it might consider whether to join them
>> before/after the other relations.
>>
>> FWIW I suspect there's a fair amount of research papers looking at
>> starjoins and what is the optimal plan for such queries, but I didn't
>> have time to look at that yet. Pointers welcome!
>>
>> But the bigger question is whether it makes sense to have such fast-path
>> modes for certain query shapes. The patch "hard-codes" the planning for
>> starjoin queries, but we clearly can't do that for every possible join
>> shape (because then why have dynamic join search at all?).
>
> +	/*
> +	 * Try simplified planning for starjoin. If it succeeds, we should
> +	 * continue at level startlev.
> +	 */
> +	startlev = starjoin_join_search(root, initial_rels, 2);
>
> (I should probably don a flame retardant suit, but...)
>
> This sounds like an interesting idea, but it makes me wonder if we
> should have a more generic mechanism here so that if "some pattern is
> matched" then "use some simplified planning method" -- of which the
> starjoin is the first and builtin example, but allowing for others to be
> plugged in via extensions.
We already have join_search_hook_type. I haven't used that in the PoC,
because I wanted to use joinrels.c functions defined as static, etc.
The main challenge would be handling queries that have multiple of such
patterns. The current hook is expected to process the whole list, while
what we'd need is more like splitting the list into chunks (one chunk
per query pattern), and then calling the hooks to handle the chunks in
some order.
But I don't think the patch should be required to invent this. We don't
even have an example of a second pattern.
regards
--
Tomas Vondra
On 2/4/25 21:23, Tom Lane wrote:
> Tomas Vondra <tomas@vondra.me> writes:
>> On 2/4/25 20:43, Jeff Davis wrote:
>>> If you base it on the join conditions rather than the size of the
>>> table, then detection of the star join would be based purely on the
>>> query structure (not stats), which would be nice for predictability.
>>
>> Right, there may be other (possibly better) ways to detect the star join
>> shape. I was thinking about also requiring foreign keys on the join
>> clauses - in DWH systems FKeys are sometimes omitted, which would break
>> the heuristics, but in OLTP it's common to still have them.
>
> I think you need to insist on foreign keys. Otherwise you don't know
> whether the joins will eliminate fact-table rows. If that's a
> possibility then it's no longer sensible to ignore different join
> orders.
Hmmm, yeah. But that's only for the INNER JOIN case. But I've seen many
of these star join queries with LEFT JOIN too, and then the FKs are not
needed. All you need is a PK / unique index on the other side.
Perhaps requiring (INNER JOIN + FK) or (LEFT JOIN + PK) would be enough
to make this work for most cases, and then the rest would simply use the
regular join order algorithm.
I was thinking that if we allow the dimensions to eliminate rows in the
fact table, we'd simply join them starting from the most selective ones.
But that doesn't work if the joins might have different per-row costs
(e.g. because some dimensions are much larger etc). Doing something
smarter would likely end up fairly close to the regular join order
algorithm ...
> I'm kind of imagining a planner rule like "if table X is joined to
> using a match of a foreign-key column to its PK (so that the join
> removes no rows from the other table) and there are not other
> restriction conditions on table X, then force X to be joined last.
> And if there are multiple such tables X, it doesn't matter what
> order they are joined in as long as they're last."
I think it'd need to be a bit smarter, to handle (a) snowflake schemas
and (b) additional joins referencing the starjoin result.
The (a) shouldn't be too hard, except that it needs to check the
'secondary dimension' is also joined by FK and has no restrictions, and
then do that join later.
For (b), I don't have numbers but I've seen queries that first do a
starjoin and then add more data to that, e.g. by joining to a
combination of attributes from multiple dimensions (think region +
payment type). Or by joining to some "summary" table that does not have
an explicit FK. Still, we could leave at least some of the joins until
the very end, I guess. But even for the dimensions joined earlier the
order does not really matter.
I think (a) is something we should definitely handle. (b) is more a
DWH/BI thing, not really an OLTP query (which is what this thread is about).
> The interesting thing about this is we pretty much have all the
> infrastructure for detecting such FK-related join conditions
> already. Possibly the join order forcing could be done with
> existing infrastructure too (by manipulating the joinlist).
Maybe, interesting. I've ruled out relying on the FKeys early in the
coding, but I'm sure there's infrastructure the patch could use. It'd
still need to check the transitive FK relationships for snowflake joins
to work, ofc. Which is not something we need to consider right now.
What kind of "manipulation" of the joinlist do you have in mind?
regards
--
Tomas Vondra
Tomas Vondra <tomas@vondra.me> writes:
>> The interesting thing about this is we pretty much have all the
>> infrastructure for detecting such FK-related join conditions
>> already. Possibly the join order forcing could be done with
>> existing infrastructure too (by manipulating the joinlist).

> Maybe, interesting. I've ruled out relying on the FKeys early in the
> coding, but I'm sure there's infrastructure the patch could use.
It would be very sad to do that work twice in a patch that purports
to reduce planning time. If it's done too late to suit you now,
could we move it to happen earlier?
> What kind of "manipulation" of the joinlist do you have in mind?
Right now, if we have four tables to join, we have a joinlist
(A B C D). (Really they're integer relids, but let's use names here.)
If we decide to force C to be joined last, it should be sufficient to
convert this to ((A B D) C). If C and D both look like candidates for
this treatment, we can make it be (((A B) C) D) or (((A B) D) C).
This is pretty much the same thing that happens if you set
join_collapse_limit to 1 and use JOIN syntax to force a join order.
In fact, IIRC we start out with nested joinlists and there is some
code that normally flattens them until it decides it'd be creating
too large a sub-problem. I'm suggesting selectively reversing the
flattening.
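That restructuring could be sketched as follows (illustrative Python
over a nested list, not the planner's actual joinlist code; the real
items are integer relids, not names):

```python
# Sketch of the joinlist manipulation: given a flat joinlist and a set
# of relations to force last, nest the rest into a sub-problem and add
# the forced relations on top, one level at a time.

def force_last(joinlist, last):
    result = [r for r in joinlist if r not in last]  # the sub-problem
    for r in joinlist:
        if r in last:
            result = [result, r]  # join r after the sub-problem
    return result

# (A B C D) with C forced last becomes ((A B D) C)
assert force_last(["A", "B", "C", "D"], {"C"}) == [["A", "B", "D"], "C"]
# with both C and D forced last: (((A B) C) D)
assert force_last(["A", "B", "C", "D"], {"C", "D"}) == [[["A", "B"], "C"], "D"]
```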
regards, tom lane
On Tue, Feb 4, 2025 at 3:42 PM Tomas Vondra <tomas@vondra.me> wrote:
> On 2/4/25 21:23, Tom Lane wrote:
>> Tomas Vondra <tomas@vondra.me> writes:
>>> On 2/4/25 20:43, Jeff Davis wrote:
>>>> If you base it on the join conditions rather than the size of the
>>>> table, then detection of the star join would be based purely on the
>>>> query structure (not stats), which would be nice for predictability.
>>>
>>> Right, there may be other (possibly better) ways to detect the star join
>>> shape. I was thinking about also requiring foreign keys on the join
>>> clauses - in DWH systems FKeys are sometimes omitted, which would break
>>> the heuristics, but in OLTP it's common to still have them.
>>
>> I think you need to insist on foreign keys. Otherwise you don't know
>> whether the joins will eliminate fact-table rows. If that's a
>> possibility then it's no longer sensible to ignore different join
>> orders.
>
> Hmmm, yeah. But that's only for the INNER JOIN case. But I've seen many
> of these star join queries with LEFT JOIN too, and then the FKs are not
> needed. All you need is a PK / unique index on the other side.
>
> Perhaps requiring (INNER JOIN + FK) or (LEFT JOIN + PK) would be enough
> to make this work for most cases, and then the rest would simply use the
> regular join order algorithm.
>
> I was thinking that if we allow the dimensions to eliminate rows in the
> fact table, we'd simply join them starting from the most selective ones.
> But that doesn't work if the joins might have different per-row costs
> (e.g. because some dimensions are much larger etc). Doing something
> smarter would likely end up fairly close to the regular join order
> algorithm ...
As long as the join is still happening there doesn't appear to be a
correctness issue here, so I'm not sure mandating FKs makes sense.
The reason this matters is that highly concurrent FK checks can get VERY
expensive (due to the cost of creating multiXacts). While it'd be great to
fix that issue the reality today is it's not uncommon for people to remove
FKs because of the high performance penalty.
Jim Nasby <jnasby@upgrade.com> writes:
> On Tue, Feb 4, 2025 at 3:42 PM Tomas Vondra <tomas@vondra.me> wrote:
>> Perhaps requiring (INNER JOIN + FK) or (LEFT JOIN + PK) would be enough
>> to make this work for most cases, and then the rest would simply use the
>> regular join order algorithm.

> As long as the join is still happening there doesn't appear to be a
> correctness issue here, so I'm not sure mandating FKs makes sense.
> The reason this matters is that highly concurrent FK checks can get VERY
> expensive (due to the cost of creating multiXacts). While it'd be great to
> fix that issue the reality today is it's not uncommon for people to remove
> FKs because of the high performance penalty.
Meh. If we don't apply this optimization when there's no FK, we have
not made those folks' life any worse. If we apply it despite there
being no FK, we might choose a materially worse plan than before, and
that *will* make their lives worse.
regards, tom lane
On Wed, Feb 5, 2025 at 5:42 AM Tomas Vondra <tomas@vondra.me> wrote:
> Hmmm, yeah. But that's only for the INNER JOIN case. But I've seen many
> of these star join queries with LEFT JOIN too, and then the FKs are not
> needed. All you need is a PK / unique index on the other side.
>
> Perhaps requiring (INNER JOIN + FK) or (LEFT JOIN + PK) would be enough
> to make this work for most cases, and then the rest would simply use the
> regular join order algorithm.
>
> I was thinking that if we allow the dimensions to eliminate rows in the
> fact table, we'd simply join them starting from the most selective ones.
> But that doesn't work if the joins might have different per-row costs
> (e.g. because some dimensions are much larger etc). Doing something
> smarter would likely end up fairly close to the regular join order
> algorithm ...
Yeah, we need to ensure that the joins to the fact table don't affect
its rows; otherwise, the join order matters for the final query plan,
and we'd better run the regular join search algorithm in this case.
For inner joins, using the foreign key seems ideal for this. For left
joins, we might be able to leverage rel_is_distinct_for() to handle
that.
Thanks
Richard
On Wed, Feb 5, 2025 at 5:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Right now, if we have four tables to join, we have a joinlist
> (A B C D). (Really they're integer relids, but let's use names here.)
> If we decide to force C to be joined last, it should be sufficient to
> convert this to ((A B D) C). If C and D both look like candidates for
> this treatment, we can make it be (((A B) C) D) or (((A B) D) C).
> This is pretty much the same thing that happens if you set
> join_collapse_limit to 1 and use JOIN syntax to force a join order.
> In fact, IIRC we start out with nested joinlists and there is some
> code that normally flattens them until it decides it'd be creating
> too large a sub-problem. I'm suggesting selectively reversing the
> flattening.
This should not be too difficult to implement. Outer joins seem to
add some complexity, though. We need to ensure that the resulting
joins in each sub-list are legal given the query's join order
constraints. For example, if we make the joinlist be (((A B) C) D),
we need to ensure that the A/B join and the (A/B)/C join are legal.
Thanks
Richard
On 2/5/25 09:27, Richard Guo wrote:
> On Wed, Feb 5, 2025 at 5:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Right now, if we have four tables to join, we have a joinlist
>> (A B C D). (Really they're integer relids, but let's use names here.)
>> If we decide to force C to be joined last, it should be sufficient to
>> convert this to ((A B D) C). If C and D both look like candidates for
>> this treatment, we can make it be (((A B) C) D) or (((A B) D) C).
>> This is pretty much the same thing that happens if you set
>> join_collapse_limit to 1 and use JOIN syntax to force a join order.
>> In fact, IIRC we start out with nested joinlists and there is some
>> code that normally flattens them until it decides it'd be creating
>> too large a sub-problem. I'm suggesting selectively reversing the
>> flattening.
>
> This should not be too difficult to implement. Outer joins seem to
> add some complexity, though. We need to ensure that the resulting
> joins in each sub-list are legal given the query's join order
> constraints. For example, if we make the joinlist be (((A B) C) D),
> we need to ensure that the A/B join and the (A/B)/C join are legal.
If the requirement is that all "dimensions" only join to the fact table
(which in this example would be "A" I think) through a FK, then why
would these joins be illegal?
We'd also need to require either an outer (left) join, or "NOT NULL" on
the fact table side, right? IIRC we already do that when using the FKeys
for join estimates.
Essentially, this defines a "dimension" as a relation that is joined
through a PK, without any other restrictions, both of which seem fairly
simple to check, and it's a "local" feature. And we'd simply mark those
as "join at the very end, in arbitrary order". Easy enough, I guess.
I'm thinking about some more complex cases:
(a) Query with multiple starjoins (a special case of that is snowflake
schema) - but I guess this is not too difficult, it just needs to
consider the FKs as "transitive" (a bit like Dijkstra's algorithm). In
the worst case we might need to "split" the whole query into multiple
smaller subproblems.
(b) Joining additional stuff to the dimensions (not through a FK,
possibly to multiple dimensions, ...). Imagine a "diamond join" with
some summary table, etc. IMHO this is a fairly rare case / expensive
enough to make the planning part less important.
I'm also wondering how this should interact with join_collapse_limit. It
would seem ideal to do this optimization before splitting the join list
into subproblems (so that the "dimensions" do not even count against
the limit, pretty much). But that would mean join_collapse_limit can no
longer be used to enforce a join order like today ...
regards
--
Tomas Vondra
> Hmmm, yeah. But that's only for the INNER JOIN case. But I've seen many
> of these star join queries with LEFT JOIN too, and then the FKs are not
> needed. All you need is a PK / unique index on the other side.
Indeed, many installations specifically _remove_ foreign keys because of
the dreaded RI check on delete. Basically, if you delete one or more rows
in dim1 then the referencing fact1 must be scanned to ensure that it does
not contain a reference to the deleted row. Often the referencing field on
fact1 is not indexed, because the index is almost never useful in an actual
select query, so even if you did index it several unused index metrics will
identify it as a candidate for deletion. What you get is one sequential
scan of fact1 for every row deleted from dim1. Now, we could get around
this by changing how we do delete RI checks, either by moving to statement
level triggers or bypassing triggers entirely, but until we do so, it is
likely that many customers avoid otherwise useful FK references.
On Wed, Feb 5, 2025 at 4:23 AM Tomas Vondra <tomas@vondra.me> wrote:
> If the requirement is that all "dimensions" only join to the fact table
> (which in this example would be "A" I think) through a FK, then why
> would these joins be illegal?
>
> ...
>
> Essentially, this defines a "dimension" as a relation that is joined
> through a PK, without any other restrictions, both of which seem fairly
> simple to check, and it's a "local" feature. And we'd simply mark those
> as "join at the very end, in arbitrary order". Easy enough, I guess.
As I understand your proposal, you want to detect queries that join a
large number, N, of tables -- which means doing an exhaustive search
of all possible join orders is expensive -- where N - 1 of the tables
do not join to each other, but join only to the Nth table.
PostgreSQL already falls back on geqo when it hits some heuristic that
says exhaustive search is too expensive, but you're proposing an
additional, better heuristic.
Say we have F JOIN D1 JOIN D2 ... JOIN D(N-1). In the example you
gave, the single-table predicate on F makes it small enough, I think,
that F will be the "build" side of any Hash Join, right? You're
assuming, I think, that the cardinality |F| = 1, after applying the
filter to F. And so, |F JOIN Dk| will be approximately 1, for any 1 <=
k < N. So then the join order does not matter. I think this is what
you mean by "OLTP star join."
For *OLAP* star joins, Oracle's Star Transformation [1]
(https://blogs.oracle.com/optimizer/post/optimizer-transformations-star-transformation)
works reasonably well, where Oracle scans D1, ..., D(N-1) first,
constructs Bloom filters, etc., and then "pushes" the N-1 joins down
into the Seq Scan on F.
So, I suggest:
1. Add an *OLTP* optimization similar to what you described, but
instead of classifying the largest table as fact F, look for the "hub"
of the star and classify it as F. And then enable your optimization if
and only if the estimated nrows for F is very small.
2. For an *OLAP* optimization, do something like Oracle's Star Transformation.
Re "OLTP" vs. "OLAP": the join order does not matter for *OLTP* star
queries, because the fact table F is *small* (post-filtering). And
because F is small, it doesn't matter so much in what order you join
the dimension tables, because the result is "likely" to be small as
well.
Tom correctly points out that you really need foreign key constraints
to ensure the previous sentence's "likely," but since your
optimization is just intended to avoid considering unnecessary join
orders, you may be able to get away with asking the optimizer what it
thinks the cardinality |(... (F JOIN D1) ... JOIN Dk)| would be, and
just fall back on the existing join-search logic when the optimizer
thinks that Dk will create lots of rows (and so the join order
matters...).
So much for the OLTP case. For completeness, some discussion about the
OLAP case; fwiw, let me start by plugging my "credentials" [2].
The OLAP case is more complicated than the OLTP case, in that the bad
thing about *OLAP* star joins is that joins are pairwise. With OLAP
star joins, you assume that |F| is always much larger than |Dk|, and
by extension |(... (F JOIN D1) ... JOIN Dk)| is generally larger than
|D(k+1)|. And the problem for OLAP is that while every Dk potentially
filters rows out from F, you have to join to the Dk's one at a time,
so you never get as much filtering as you'd like.
For OLAP, you can take the Cartesian product of D1, ..., DN, and then
scan F to aggregate into the resulting cube; see [3]. (Link [2] is
related to this transformation.)
Or, you can scan D1, ..., DN first, without joining anything,
constructing Hash tables and Bloom filters from your scans; then push
the Bloom filters down to the scan of F; and finally join the
(Bloom-filtered) F back to D1, ..., DN. This is what link [1]
describes. Note that [1] came out before [3].
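The shape of that algorithm, very roughly, in Python (purely illustrative:
plain sets stand in for the Bloom filters, and the dict-of-hash-tables
layout is invented for the example; a real Bloom filter only changes the
memory/false-positive trade-off, not the shape):

```python
# Illustrative sketch of the star-transformation idea: scan the
# dimensions first, build one filter per fact join column, push the
# filters into the scan of F, and only then join back to the dimensions.

def star_transform_scan(fact_rows, dims):
    """dims maps a fact column name -> {join_key: dim_row} hash table."""
    # Phase 1: build one "filter" per dimension from its hash table.
    filters = {col: set(table) for col, table in dims.items()}
    # Phase 2: scan F once, dropping rows that cannot join anything.
    survivors = [row for row in fact_rows
                 if all(row[col] in filters[col] for col in filters)]
    # Phase 3: join the filtered fact rows back to the dimensions.
    return [(row, {col: dims[col][row[col]] for col in dims})
            for row in survivors]
```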
However... for OLAP, you see from the above discussion that it's not
compilation that takes too long, but rather execution. So the
optimizations require significant changes to the SQL executor.
What you're proposing, IIUC, is a nice optimization to compilation
times, which is why (I think) you're focused on the OLTP use case. In
that case, I suggest focusing on an OLTP-specific solution, maybe a
straw man like:
1. I see a query where N-1 relations join to the Nth relation, but not
to each other (except transitively, of course).
2. Estimated cardinality for F, after pushing down single table
predicates, is very small.
3. OK, let's start joining tables D1, ..., D(N-1) in order, since
we're assuming (thanks to (1) and (2)) that the join order won't
matter.
4. Continue joining tables in this fixed (arbitrary) order, unless we
come to a Dk where the optimizer thinks joining to Dk will generate a
significant number of rows.
5. Either we join all tables in order (fast compilation!); or we hit
the case in (4), so we just fall back on the existing join logic.
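The five steps above might be sketched like this (hypothetical interfaces,
not actual planner code; `estimate` stands for whatever cardinality
estimation hook the planner would expose, and the 10x blowup limit is an
arbitrary placeholder):

```python
# Illustrative sketch of the straw man: join the dimensions in a fixed,
# arbitrary order, and bail out to the regular join search as soon as
# one of them looks like it would multiply the row count.

def plan_star_join(fact, dims, estimate, blowup_limit=10.0):
    """Try the fixed-order fast path; return (join_order, used_fast_path)."""
    order, rows = [fact], estimate([fact])
    for d in dims:
        new_rows = estimate(order + [d])
        if new_rows > rows * blowup_limit:
            # Steps 4/5: this join would generate a significant number
            # of rows, so the join order may matter after all -- fall
            # back on the existing join-search logic.
            return None, False
        order.append(d)
        rows = new_rows
    return order, True  # Step 5: fast compilation
```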
Thanks,
James
[1]: https://blogs.oracle.com/optimizer/post/optimizer-transformations-star-transformation
[2]: https://patents.google.com/patent/US20150088856A1/en
[3]: https://docs.oracle.com/en/database/oracle/oracle-database/12.2/inmem/optimizing-in-memory-aggregation.html
On 2/7/25 20:11, James Hunter wrote:
On Wed, Feb 5, 2025 at 4:23 AM Tomas Vondra <tomas@vondra.me> wrote:
If the requirement is that all "dimensions" only join to the fact table
(which in this example would be "A" I think) through a FK, then why
would these joins be illegal?...
Essentially, this defines a "dimension" as a relation that is joined
through a PK, without any other restrictions, both of which seems fairly
simple to check, and it's a "local" feature. And we'd simply mark those
as "join at the very end, in arbitrary order". Easy enough, I guess.

As I understand your proposal, you want to detect queries that join a
large number, N, of tables -- which means doing an exhaustive search
of all possible join orders is expensive -- where N - 1 of the tables
do not join to each other, but join only to the Nth table.
Yes. Essentially, it reduces the size of the problem by ignoring joins
for which we know the optimal order. We know the dimensions can be
joined last, and it does not matter in which exact order we join them.
The starjoins are a bit of a "worst case" for our heuristics, because
there are no dependencies between the dimensions, and we end up
exploring the n! possible join orders, more or less. For other joins we
quickly prune the space.
PostgreSQL already falls back on geqo when it hits some heuristic that
says exhaustive search is too expensive, but you're proposing an
additional, better heuristic.
True, but most people will never actually hit GEQO, because the
default thresholds are set like this:
join_collapse_limit = 8
geqo_threshold = 12
So the planner will not "create" join search problems with more than 8
relations, but geqo only kicks in at 12. Most systems run with the
default values for these GUCs, so they don't really use GEQO.
FWIW I don't know a lot about the GEQO internals, but I heard it doesn't
work all that well for the join order problem anyway. Not sure.
Say we have F JOIN D1 JOIN D2 ... JOIN D(N-1). In the example you
gave, the single-table predicate on F makes it small enough, I think,
that F will be the "build" side of any Hash Join, right? You're
assuming, I think, that the cardinality |F| = 1, after applying the
filter to F. And so, |F JOIN Dk| will be approximately 1, for any 1 <=
k < N. So then the join order does not matter. I think this is what
you mean by "OLTP star join."
I don't think it matters very much on which side of the join the F will
end up (and it doesn't even have to be a hash join - it can easily be a
NL). It will definitely
be in the first join, though, because all other dimensions join to it
(assuming this is just a starjoin, with only fact + dimensions).
It also doesn't really matter what the exact cardinality of |F| is. The
example used a PK lookup, so that would be 1 row, but the point is that
this is (much) cheaper than the planning. E.g. the planning might take
3ms while the execution only takes 1ms, etc. In the OLAP cases this is
usually not the case, because the queries are processing a lot of data
from the fact table, and the planning is negligible.
For *OLAP* star joins, Oracle's Star Transformation [1] works
reasonably well, where Oracle scans D1, ..., D(N-1) first, constructs
Bloom filters, etc., and then "pushes" the N-1 joins down into the Seq
Scan on F.
I don't care about OLAP star joins, at least not in this patch. It's a
completely different / separate use case, and it affects very different
parts of the planner (and also the executor, which this patch does not
need to touch at all).
So, I suggest:
1. Add an *OLTP* optimization similar to what you described, but
instead of classifying the largest table as fact F, look for the "hub"
of the star and classify it as F. And then enable your optimization if
and only if the estimated nrows for F is very small.
Right. I believe this is mostly what looking for FKs (as suggested by
Tom) would end up doing. It doesn't need to consider the cardinality of
F at all.
2. For an *OLAP* optimization, do something like Oracle's Star
Transformation.
I consider that well outside the scope of this patch.
Re "OLTP" vs. "OLAP": the join order does not matter for *OLTP* star
queries, because the fact table F is *small* (post-filtering). And
because F is small, it doesn't matter so much in what order you join
the dimension tables, because the result is "likely" to be small as
well.
I don't think that's quite true. The order of dimension joins does not
matter because the joins do not affect the join size at all. The size of
|F| has nothing to do with that, I think. We'll do the same number of
lookups against the dimensions no matter in what order we join them. And
we know it's best to join them as late as possible, after all the joins
that reduce the size (and before joins that "add" rows, I think).
Tom correctly points out that you really need foreign key constraints
to ensure the previous sentence's "likely," but since your
optimization is just intended to avoid considering unnecessary join
orders, you may be able to get away with asking the optimizer what it
thinks the cardinality |(... (F JOIN D1) ... JOIN Dk)| would be, and
just fall back on the existing join-search logic when the optimizer
thinks that Dk will create lots of rows (and so the join order
matters...).
Possibly, but TBH the join cardinality estimates can get quite dubious
pretty easily. The FK is much more reliable (definitive) information.
So much for the OLTP case. For completeness, some discussion about the
OLAP case; fwiw, let me start by plugging my "credentials" [2].
Thanks ;-)
The OLAP case is more complicated than the OLTP case, in that the bad
thing about *OLAP* star joins is that joins are pairwise. With OLAP
star joins, you assume that |F| is always much larger than |Dk|, and
by extension |(... (F JOIN D1) ... JOIN Dk)| is generally larger than
|D(k+1)|. And the problem for OLAP is that while every Dk potentially
filters rows out from F, you have to join to the Dk's one at a time,
so you never get as much filtering as you'd like.

For OLAP, you can take the Cartesian product of D1, ..., DN, and then
scan F to aggregate into the resulting cube; see [3]. (Link [2] is
related to this transformation.)

Or, you can scan D1, ..., DN first, without joining anything,
constructing Hash tables and Bloom filters from your scans; then push
the Bloom filters down to the scan of F; and finally join the
(Bloom-filtered) F back to D1, ..., DN. This is what link [1]
describes. Note that [1] came out before [3].

However... for OLAP, you see from the above discussion that it's not
compilation that takes too long, but rather execution. So the
optimizations require significant changes to the SQL executor.
Agreed. I'm not against improving the OLAP case too, but it's not what
this thread was about. It seems it'll need changes in very different
places, etc.
What you're proposing, IIUC, is a nice optimization to compilation
times, which is why (I think) you're focused on the OLTP use case. In
that case, I suggest focusing on an OLTP-specific solution, maybe a
straw man like:
1. I see a query where N-1 relations join to the Nth relation, but not
to each other (except transitively, of course).
2. Estimated cardinality for F, after pushing down single table
predicates, is very small.
3. OK, let's start joining tables D1, ..., D(N-1) in order, since
we're assuming (thanks to (1) and (2)) that the join order won't
matter.
4. Continue joining tables in this fixed (arbitrary) order, unless we
come to a Dk where the optimizer thinks joining to Dk will generate a
significant number of rows.
5. Either we join all tables in order (fast compilation!); or we hit
the case in (4), so we just fall back on the existing join logic.
Yes, I think that's pretty much the idea. Except that I don't think we
need to look at the |F| at all - it will have more impact for small |F|,
of course, but it doesn't hurt for large |F|.
I think it'll probably need to consider which joins increase/decrease
the cardinality, and "inject" the dimension joins in between those.
regards
--
Tomas Vondra
On Fri, Feb 7, 2025 at 12:09 PM Tomas Vondra <tomas@vondra.me> wrote:
...
Yes, I think that's pretty much the idea. Except that I don't think we
need to look at the |F| at all - it will have more impact for small |F|,
of course, but it doesn't hurt for large |F|.

I think it'll probably need to consider which joins increase/decrease
the cardinality, and "inject" the dimension joins in between those.
YMMV, but I suspect you may find it much easier to look at |F|, |F
JOIN D1|, |(F JOIN D1) JOIN D2|, etc., than to consider |F JOIN D1| /
|F|, etc. (In other words, I suspect that considering absolute
cardinalities will end up easier/cleaner than considering ratios of
increases/decreases in cardinalities.) But I have not thought about
this much, so I am not putting too much weight on my suspicions.
James
On 2/7/25 23:43, James Hunter wrote:
On Fri, Feb 7, 2025 at 12:09 PM Tomas Vondra <tomas@vondra.me> wrote:
...
Yes, I think that's pretty much the idea. Except that I don't think we
need to look at the |F| at all - it will have more impact for small |F|,
of course, but it doesn't hurt for large |F|.

I think it'll probably need to consider which joins increase/decrease
the cardinality, and "inject" the dimension joins in between those.

YMMV, but I suspect you may find it much easier to look at |F|, |F
JOIN D1|, |(F JOIN D1) JOIN D2|, etc., than to consider |F JOIN D1| /
|F|, etc. (In other words, I suspect that considering absolute
cardinalities will end up easier/cleaner than considering ratios of
increases/decreases in cardinalities.) But I have not thought about
this much, so I am not putting too much weight on my suspicions.
That's not what I meant when I mentioned joins that increase/decrease
cardinality. That wasn't referring to the "dimension" joins, which we
expect to have FKs and thus should not affect the cardinality at all.
Instead, I was thinking about the "other" joins (if there are any), that
may add or remove rows. AFAIK we want to join the dimensions at the
place with the lowest cardinality - the discussion mostly assumed the
joins would only reduce the cardinality, in which case we'd just leave
the dimensions until the very end.
But ISTM that may not necessarily be true. Let's say there's a join that
"multiplies" each row. It'll probably be done at the end, and the
dimension joins should probably happen right before it ... not sure.
cheers
--
Tomas Vondra
Tomas Vondra <tomas@vondra.me> writes:
Instead, I was thinking about the "other" joins (if there are any), that
may add or remove rows. AFAIK we want to join the dimensions at the
place with the lowest cardinality - the discussion mostly assumed the
joins would only reduce the cardinality, in which case we'd just leave
the dimensions until the very end.
But ISTM that may not necessarily be true. Let's say there's a join that
"multiplies" each row. It'll probably be done at the end, and the
dimension joins should probably happen right before it ... not sure.
I thought the idea here was to get rid of as much join order searching
as we could. Insisting that we get the best possible plan anyway
seems counterproductive, not to mention very messy to implement.
So I'd just push all these joins to the end and be done with it.
regards, tom lane
On 2/8/25 01:23, Tom Lane wrote:
Tomas Vondra <tomas@vondra.me> writes:
Instead, I was thinking about the "other" joins (if there are any), that
may add or remove rows. AFAIK we want to join the dimensions at the
place with the lowest cardinality - the discussion mostly assumed the
joins would only reduce the cardinality, in which case we'd just leave
the dimensions until the very end.

But ISTM that may not necessarily be true. Let's say there's a join that
"multiplies" each row. It'll probably be done at the end, and the
dimension joins should probably happen right before it ... not sure.

I thought the idea here was to get rid of as much join order searching
as we could. Insisting that we get the best possible plan anyway
seems counterproductive, not to mention very messy to implement.
So I'd just push all these joins to the end and be done with it.
Right, cutting down on the join order searching is the point. But most
of the savings comes (I think) from not considering different ordering
of the dimensions, because those are all essentially the same.
Consider a join with 16 relations, 10 of which are dimensions. There are
10! possible orders of the dimensions, but all of them behave pretty
much exactly the same. In a way, this behaves almost like a join with 7
relations, one of which represents all the 10 dimensions.
I think this "join group" abstraction (a relation representing a bunch
of relations in a particular order) would make this reasonably clean to
implement. I haven't tried yet, though.
Yes, this means we'd explore more orderings, compared to just pushing
all the dimensions to the end. In the example above, that'd be 7!/6!, so
up to ~7x orderings.
I don't know if this is worth the extra complexity, of course.
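The arithmetic behind that 16-relation example, for anyone who wants to
check it (only an upper-bound illustration; the real search space is
smaller than these factorials thanks to pruning):

```python
import math

n_rels, n_dims = 16, 10
others = n_rels - n_dims          # 6 non-dimension relations

orders_among_dims = math.factorial(n_dims)   # 10! orderings of the dims
# Treating the 10 dimensions as one opaque "join group" leaves a
# 7-relation problem (6 others + 1 group):
grouped = math.factorial(others + 1)         # 7!
# Pinning all dimensions to the very end leaves just the 6 others:
pinned = math.factorial(others)              # 6!

# grouped / pinned is the "up to ~7x orderings" factor from above.
print(orders_among_dims, grouped, grouped // pinned)
```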
thanks
--
Tomas Vondra
On 2/5/25 09:27, Richard Guo wrote:
On Wed, Feb 5, 2025 at 5:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Right now, if we have four tables to join, we have a joinlist
(A B C D). (Really they're integer relids, but let's use names here.)
If we decide to force C to be joined last, it should be sufficient to
convert this to ((A B D) C). If C and D both look like candidates for
this treatment, we can make it be (((A B) C) D) or (((A B) D) C).
This is pretty much the same thing that happens if you set
join_collapse_limit to 1 and use JOIN syntax to force a join order.
In fact, IIRC we start out with nested joinlists and there is some
code that normally flattens them until it decides it'd be creating
too large a sub-problem. I'm suggesting selectively reversing the
flattening.

This should not be too difficult to implement. Outer joins seem to
add some complexity, though. We need to ensure that the resulting
joins in each sub-list are legal given the query's join order
constraints. For example, if we make the joinlist be (((A B) C) D),
we need to ensure that the A/B join and the (A/B)/C join are legal.
I've not done anything like this with joins before, but AFAICS the
interesting stuff happens in deconstruct_recurse(), especially close to
the end where we check join_collapse_limit and do
joinlist = list_make2(leftpart, rightpart);
So I guess one way to "reverse the flattening" could be to do something
in deconstruct_recurse(). But I don't think that'd work all that well,
because of the recursion. We don't want to add a "pipeline break" into
the join list, we want to move the "dimension" to the end - even if only
within the group defined by join_collapse_limit.
E.g. imagine we have a join of 8 relations, with F (fact), dimensions D1
and D2, and then some arbitrary tables T1, T2, T3, T4, T5. And let's say
deconstruct_recurse() sees them in this particular order
[F, T1, T2, D1, D2, T3, T4, T5]
AFAICS doing something in deconstruct_recurse() would likely split the
joinlist into four parts
[F, T1, T2] [D1] [D2] [T3, T4, T5]
which does treat the D1,D2 as if join_collapse_limit=1, but it also
splits the remaining relations into two groups, when we'd probably want
something more like this:
[F, T1, T2, T3, T4, T5] [D1] [D2]
Which should be legal, because a requirement is that D1/D2 don't have
any other join restrictions (I guess this could be relaxed a bit to only
restrictions within that particular group).
Which leads me to the conclusion that the best place to do this kind of
stuff is deconstruct_jointree(), once we have the "complete" joinlist.
We could walk it and reorder/split some of the joinlists again.
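In terms of plain nested lists (not the planner's actual joinlist nodes),
the "reverse the flattening" step I have in mind could look roughly like
this sketch, which rebuilds Tom's ((A B D) C)-style nesting:

```python
# Illustrative sketch: given a flat joinlist and the set of relations
# classified as dimensions, rebuild a nested joinlist that forces the
# dimensions to be joined last, one nesting level per dimension, while
# keeping all non-dimensions together in one flat sub-problem.

def push_dimensions_last(joinlist, dims):
    core = [r for r in joinlist if r not in dims]
    nested = core
    for d in (r for r in joinlist if r in dims):
        nested = [nested, d]      # one more level of nesting per dim
    return nested
```

So [F, T1, T2, D1, D2, T3, T4, T5] with dimensions {D1, D2} becomes
[[[F, T1, T2, T3, T4, T5], D1], D2], matching the desired split above.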
regards
--
Tomas Vondra
On Mon, Feb 10, 2025 at 9:32 AM Tomas Vondra <tomas@vondra.me> wrote:
E.g. imagine we have a join of 8 relations, with F (fact), dimensions D1
and D2, and then some arbitrary tables T1, T2, T3, T4, T5. And let's say
deconstruct_recurse() sees them in this particular order

[F, T1, T2, D1, D2, T3, T4, T5]
AFAICS doing something in deconstruct_recurse() would likely split the
joinlist into four parts

[F, T1, T2] [D1] [D2] [T3, T4, T5]
which does treat the D1,D2 as if join_collapse_limit=1, but it also
splits the remaining relations into two groups, when we'd probably want
something more like this:

[F, T1, T2, T3, T4, T5] [D1] [D2]
Which should be legal, because a requirement is that D1/D2 don't have
any other join restrictions (I guess this could be relaxed a bit to only
restrictions within that particular group).
Hmm, I'm still a little concerned about whether the resulting joins
are legal. Suppose we have a join pattern like the one below.
F left join
(D1 inner join T on true) on F.b = D1.b
left join D2 on F.c = D2.c;
For this query, the original joinlist is [F, D1, T, D2]. If we
reorder it to [[F, T], D1, D2], the sub-joinlist [F, T] would fail to
produce any joins, as the F/T join is not legal.
This may not be the pattern we are targeting. But if we intend to
support it, I think we need a way to ensure that the resulting joins
are legal.
Thanks
Richard
On 2/10/25 08:29, Richard Guo wrote:
On Mon, Feb 10, 2025 at 9:32 AM Tomas Vondra <tomas@vondra.me> wrote:
E.g. imagine we have a join of 8 relations, with F (fact), dimensions D1
and D2, and then some arbitrary tables T1, T2, T3, T4, T5. And let's say
deconstruct_recurse() sees them in this particular order

[F, T1, T2, D1, D2, T3, T4, T5]
AFAICS doing something in deconstruct_recurse() would likely split the
joinlist into four parts

[F, T1, T2] [D1] [D2] [T3, T4, T5]
which does treat the D1,D2 as if join_collapse_limit=1, but it also
splits the remaining relations into two groups, when we'd probably want
something more like this:

[F, T1, T2, T3, T4, T5] [D1] [D2]
Which should be legal, because a requirement is that D1/D2 don't have
any other join restrictions (I guess this could be relaxed a bit to only
restrictions within that particular group).

Hmm, I'm still a little concerned about whether the resulting joins
are legal. Suppose we have a join pattern like the one below.

F left join
(D1 inner join T on true) on F.b = D1.b
left join D2 on F.c = D2.c;

For this query, the original joinlist is [F, D1, T, D2]. If we
reorder it to [[F, T], D1, D2], the sub-joinlist [F, T] would fail to
produce any joins, as the F/T join is not legal.

This may not be the pattern we are targeting. But if we intend to
support it, I think we need a way to ensure that the resulting joins
are legal.
It's quite possible the PoC patch I posted fails to ensure this, but I
think the assumption is we'd not reorder joins for dimensions that have
any join order restrictions (except for the FK join).
regards
--
Tomas Vondra
On Fri, Feb 7, 2025 at 3:09 PM Tomas Vondra <tomas@vondra.me> wrote:
I don't think that's quite true. The order of dimension joins does not
matter because the joins do not affect the join size at all. The size of
|F| has nothing to do with that, I think. We'll do the same number of
lookups against the dimensions no matter in what order we join them. And
we know it's best to join them as late as possible, after all the joins
that reduce the size (and before joins that "add" rows, I think).
This is often not quite true, because there are often restriction
clauses on the fact tables that result in some rows being eliminated.
e.g. SELECT * FROM hackers h JOIN languages l ON h.language_id = l.id
JOIN countries c ON h.country_id = c.id WHERE c.name = 'Czechia';
However, I think that trying to somehow leverage the existence of
either FK or LJ+UNIQUE is still a pretty good idea. In a lot of cases,
many of the joins don't change the row count, so we don't really need
to explore all possible orderings of those joins. We might be able to
define some concept of "join that doesn't change the row count at all"
or maybe better "join that doesn't change the row count very much".
And then if we have a lot of such joins, we can consider them as a
group. Say we have 2 joins that do change the row count significantly,
and then 10 more that don't. The 10 that don't can be done before,
between, or after the two that do, but it doesn't seem necessary to
consider doing some of them at one point and some at another.
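A sketch of that grouping, purely illustrative (the `changes_rowcount`
classifier is the hard part and is simply assumed here as a caller-supplied
predicate):

```python
from itertools import permutations

# Illustrative sketch: permute only the row-count-changing joins, and
# slot the row-count-neutral joins in as one contiguous group at each
# position between them, instead of permuting every join individually.

def candidate_orders(joins, changes_rowcount):
    changing = [j for j in joins if changes_rowcount(j)]
    neutral = [j for j in joins if not changes_rowcount(j)]
    for perm in permutations(changing):
        for slot in range(len(perm) + 1):
            yield list(perm[:slot]) + neutral + list(perm[slot:])
```

With 2 row-count-changing joins and 10 neutral ones, this enumerates
2! * 3 = 6 candidate orders rather than 12!.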
Maybe that's not the right way to think about this problem; I haven't
read the academic literature on star-join optimization. But it has
always felt stupid to me that we spend a bunch of time considering
join orders that are not meaningfully different, and I think what
makes two join orders not meaningfully different is that we're
commuting joins that are not changing the row count.
(Also worth noting: even when joins of this general form change the row
count, they can only reduce it.)
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Feb 10, 2025 at 5:35 PM Tomas Vondra <tomas@vondra.me> wrote:
On 2/10/25 08:29, Richard Guo wrote:
Hmm, I'm still a little concerned about whether the resulting joins
are legal. Suppose we have a join pattern like the one below.

F left join
(D1 inner join T on true) on F.b = D1.b
left join D2 on F.c = D2.c;

For this query, the original joinlist is [F, D1, T, D2]. If we
reorder it to [[F, T], D1, D2], the sub-joinlist [F, T] would fail to
produce any joins, as the F/T join is not legal.

This may not be the pattern we are targeting. But if we intend to
support it, I think we need a way to ensure that the resulting joins
are legal.
It's quite possible the PoC patch I posted fails to ensure this, but I
think the assumption is we'd not reorder joins for dimensions that have
any join order restrictions (except for the FK join).
Then, we'll need a way to determine if a given relation has join-order
restrictions, which doesn't seem like a trivial task. We do have the
has_join_restriction() function, but it considers any relations
involved in an outer join as having join restrictions, and that makes
it unsuitable for our needs here.
Thanks
Richard
On 2/10/25 22:36, Robert Haas wrote:
On Fri, Feb 7, 2025 at 3:09 PM Tomas Vondra <tomas@vondra.me> wrote:
I don't think that's quite true. The order of dimension joins does not
matter because the joins do not affect the join size at all. The size of
|F| has nothing to do with that, I think. We'll do the same number of
lookups against the dimensions no matter in what order we join them. And
we know it's best to join them as late as possible, after all the joins
that reduce the size (and before joins that "add" rows, I think).

This is often not quite true, because there are often restriction
clauses on the fact tables that result in some rows being eliminated.

e.g. SELECT * FROM hackers h JOIN languages l ON h.language_id = l.id
JOIN countries c ON h.country_id = c.id WHERE c.name = 'Czechia';
True. I think this was discussed earlier in this thread - dimensions
with additional restrictions may affect the row count, and thus would be
exempt from this (and would instead go through the "regular" join order
search algorithm). So I assumed the "dimensions" don't have any such
restrictions in my message, I should have mentioned that.
However, I think that trying to somehow leverage the existence of
either FK or LJ+UNIQUE is still a pretty good idea. In a lot of cases,
many of the joins don't change the row count, so we don't really need
to explore all possible orderings of those joins. We might be able to
define some concept of "join that doesn't change the row count at all"
or maybe better "join that doesn't change the row count very much".
And then if we have a lot of such joins, we can consider them as a
group. Say we have 2 joins that do change the row count significantly,
and then 10 more that don't. The 10 that don't can be done before,
between, or after the two that do, but it doesn't seem necessary to
consider doing some of them at one point and some at another.

Maybe that's not the right way to think about this problem; I haven't
read the academic literature on star-join optimization. But it has
always felt stupid to me that we spend a bunch of time considering
join orders that are not meaningfully different, and I think what
makes two join orders not meaningfully different is that we're
commuting joins that are not changing the row count.

(Also worth noting: even when joins of this general form change the row
count, they can only reduce it.)
I searched for papers on star-joins, but pretty much everything I found
focuses on the OLAP case. Which is interesting; I think the OLTP
star-join I described seems quite different, and I'm not sure the
trade-offs are necessarily the same.
My gut feeling is that in the first "phase" we should focus on the case
with no restrictions - that makes it obvious what the optimal order is,
and it will help a significant fraction of cases.
And then in the next step we can try doing something for cases with
restrictions - be it some sort of greedy algorithm, something that
leverages knowledge of the selectivity to prune join orders early
(instead of actually exploring all N! join orders like today). Or
something else, not sure.
regards
--
Tomas Vondra
On 2/11/25 10:28, Richard Guo wrote:
On Mon, Feb 10, 2025 at 5:35 PM Tomas Vondra <tomas@vondra.me> wrote:
On 2/10/25 08:29, Richard Guo wrote:
Hmm, I'm still a little concerned about whether the resulting joins
are legal. Suppose we have a join pattern like the one below.

F left join
(D1 inner join T on true) on F.b = D1.b
left join D2 on F.c = D2.c;

For this query, the original joinlist is [F, D1, T, D2]. If we
reorder it to [[F, T], D1, D2], the sub-joinlist [F, T] would fail to
produce any joins, as the F/T join is not legal.

This may not be the pattern we are targeting. But if we intend to
support it, I think we need a way to ensure that the resulting joins
are legal.

It's quite possible the PoC patch I posted fails to ensure this, but I
think the assumption is we'd not reorder joins for dimensions that have
any join order restrictions (except for the FK join).

Then, we'll need a way to determine if a given relation has join-order
restrictions, which doesn't seem like a trivial task. We do have the
has_join_restriction() function, but it considers any relations
involved in an outer join as having join restrictions, and that makes
it unsuitable for our needs here.
I admit knowing next to nothing about join order planning :-( Could you
maybe explain why it would be non-trivial to determine if a relation has
join-order restrictions? Surely we already determine that, no? So what
would we need to do differently?
Or are you saying that because has_join_restriction() treats each
relation with an outer join as having a restriction, that makes it
unusable for the purpose of this optimization/patch? And we'd need to
invent something more elaborate?
I'm not sure that's quite true. The problem with joining the dimensions
(with inner joins) is *exactly* the lack of restrictions, which means
we explore all possible orders of those dimensions (all N! of them). With
the restrictions (e.g. from LEFT JOIN), that's no longer true - in a
way, this is similar to what the patch does. And in fact, replacing the
inner joins with LEFT JOINs makes the queries much faster. I've seen
this used as a workaround to cut down on planning time ...
So I don't think treating outer joins as "having restriction" is a
problem. It doesn't regress any queries, although it might lead to a
somewhat strange situation where "less restricted" joins are faster to plan.
regards
--
Tomas Vondra
On 2/4/25 22:55, Tom Lane wrote:
Tomas Vondra <tomas@vondra.me> writes:
The interesting thing about this is we pretty much have all the
infrastructure for detecting such FK-related join conditions
already. Possibly the join order forcing could be done with
existing infrastructure too (by manipulating the joinlist).

Maybe, interesting. I've ruled out relying on the FKeys early in the
coding, but I'm sure there's infrastructure the patch could use.

It would be very sad to do that work twice in a patch that purports
to reduce planning time. If it's done too late to suit you now,
could we move it to happen earlier?

What kind of "manipulation" of the joinlist do you have in mind?
Right now, if we have four tables to join, we have a joinlist
(A B C D). (Really they're integer relids, but let's use names here.)
If we decide to force C to be joined last, it should be sufficient to
convert this to ((A B D) C). If C and D both look like candidates for
this treatment, we can make it be (((A B) C) D) or (((A B) D) C).
This is pretty much the same thing that happens if you set
join_collapse_limit to 1 and use JOIN syntax to force a join order.
In fact, IIRC we start out with nested joinlists and there is some
code that normally flattens them until it decides it'd be creating
too large a sub-problem. I'm suggesting selectively reversing the
flattening.

regards, tom lane
Here's a patch trying to do it more like this - by manipulating the
lists describing the join problems, before they're passed to the actual
join search algorithm (which is where the PoC patch did this).
I wonder if this is roughly the place where you imagined this would be
done, or if you envision some other issue with this approach. The patch
is definitely incomplete, there's plenty of XXX comments about places
missing some code, etc.
I initially tried to manipulate the joinlist much earlier - pretty much
right at the end of deconstruct_jointree. But that turned out to be way
too early. To identify dimensions etc. we need to check stuff about
foreign keys, join clauses, ... and that's not available that early.
So I think this needs to happen much later in query_planner(), and the
patch does it right before the make_one_rel() call. Maybe that's too
late, but it needs to happen after match_foreign_keys_to_quals(), as it
relies on some of the FK info built by that call. Maybe we could call
match_foreign_keys_to_quals() earlier, but I don't quite see any
benefits of doing that ...
On 2/8/25 02:49, Tomas Vondra wrote:
On 2/8/25 01:23, Tom Lane wrote:
Tomas Vondra <tomas@vondra.me> writes:
Instead, I was thinking about the "other" joins (if there are any), that
may add or remove rows. AFAIK we want to join the dimensions at the
place with the lowest cardinality - the discussion mostly assumed the
joins would only reduce the cardinality, in which case we'd just leave
the dimensions until the very end.

But ISTM that may not necessarily be true. Let's say there's a join that
"multiplies" each row. It'll probably be done at the end, and the
dimension joins should probably happen right before it ... not sure.

I thought the idea here was to get rid of as much join order searching
as we could. Insisting that we get the best possible plan anyway
seems counterproductive, not to mention very messy to implement.
So I'd just push all these joins to the end and be done with it.

Right, cutting down on the join order searching is the point. But most
of the savings comes (I think) from not considering different orderings
of the dimensions, because those are all essentially the same.

Consider a join with 16 relations, 10 of which are dimensions. There are
10! possible orders of the dimensions, but all of them behave pretty
much exactly the same. In a way, this behaves almost like a join with 7
relations, one of which represents all the 10 dimensions.

I think this "join group" abstraction (a relation representing a bunch
of relations in a particular order) would make this reasonably clean to
implement. I haven't tried yet, though.

Yes, this means we'd explore more orderings, compared to just pushing
all the dimensions to the end. In the example above, that'd be 7!/6!, so
up to ~7x orderings.

I don't know if this is worth the extra complexity, of course.
I'm still concerned about regressions when we happen to postpone some
expensive dimension joins until after a join that increases the
cardinality. And the "join group" would address that. But it probably is
not a very common query pattern, and it'd require changes to
join_search_one_level.
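For intuition, here's a back-of-the-envelope count of the orderings involved (a rough sketch only; the real join search is dynamic programming over subsets, so these factorials merely bound the permutation space):

```python
import math

n_rels = 16                 # total relations in the join
n_dims = 10                 # relations identified as dimensions
n_other = n_rels - n_dims   # 6 remaining relations

# Orderings of all relations, vs. the two simplified schemes:
full = math.factorial(n_rels)             # consider every rel ordering
dims_last = math.factorial(n_other)       # dims pinned to the end: order only the rest
join_group = math.factorial(n_other + 1)  # dims collapsed into one movable "group" rel

print(dims_last, join_group, join_group // dims_last)
# 720 5040 7
```

That last ratio is the 7!/6! = 7x factor mentioned above: treating the dimension bundle as one movable relation explores up to 7x more orderings than pinning it at the end, but vastly fewer than the unrestricted search.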
I'm not sure what could / should count as 'dimension'. The patch doesn't
implement this yet, but I think it can mostly copy how we match the FK
to the join in get_foreign_key_join_selectivity. There's probably more
to think about, though. For example, don't we need to check NOT NULL /
nullfrac on the referencing side? Also, it probably interacts with
LEFT/OUTER joins. I plan to start strict and then relax and handle some
additional cases.
I'm however struggling with the concept of join order restrictions a
bit. I suspect we need to worry about that when walking the relation
list and figuring out what can be a dimension, but I've never worked
with this, so my mental model of how this works is not great.
Another question is if this planning shortcut (which for the dimensions
mostly picks a random join order) could have some unexpected impact on
the rest of the planning. For example, could we "miss" some join
producing tuples in an interesting order? Or could we fail to consider a
partition-wise join?
Could this "shortcut" restrict the rest of the plan in some undesirable
way? For example, if some of the tables are partitioned, could this mean
we no longer can do partition-wise joins with the (mostly arbitrary)
join order we picked?
There's also a "patch" directory, with some SQL scripts creating two
simple examples of schemas/queries that would benefit from this. The
"create-1/select-1" examples are the simple starjoins this thread
focuses on. It might be modified to do a "snowflake" join, which is
fundamentally a variant of this query type.
The second example (create-2/select-2) is quite different, in that it's
not a starjoin schema. It still joins one "main" table with multiple
"dimensions", but the FKs go in the other direction (to a single column
in the main table). But it has a very similar bottleneck - the order of
the joins is expensive, but it often does not matter very much, because
the query matches just one row anyway. And even if it returns more rows,
isn't the join order determined just by the selectivity of each join?
Maybe the starjoin optimization could be made to work for this type too?
regards
--
Tomas Vondra
Attachments:
[ sorry for ridiculously slow response to this ]
Tomas Vondra <tomas@vondra.me> writes:
Here's a patch trying to do it more like this - by manipulating the
lists describing the join problems, before it's passed to the actual
join search algorithm (which is where the PoC patch did this).
I wonder if this is roughly the place where you imagined this would be
done, or if you envision some other issue with this approach.
Cool. This is proof-of-concept that manipulating the joinlist can
do what we need done. So we can move on to what heuristics we need
to use.
I initially tried to manipulate the joinlist much earlier - pretty much
right at the end of deconstruct_jointree. But that turned out to be way
too early. To identify dimensions etc. we need to check stuff about
foreign keys, join clauses, ... and that's not available that early.
So I think this needs to happen much later in query_planner(), and the
patch does it right before the make_one_rel() call. Maybe that's too
late, but it needs to happen after match_foreign_keys_to_quals(), as it
relies on some of the FK info built by that call. Maybe we could call
match_foreign_keys_to_quals() earlier, but I don't quite see any
benefits of doing that ...
I don't have a problem with doing it where you did it, but the comment
should explain the placement. What you do have in the comment mostly
belongs with the code, too; it's not the business of the caller. So
in planmain.c something like
+ /*
+ * Try to simplify the join search problem for starjoin-like joins.
+ * This step relies on info about FK relationships, so we can't do it
+ * till after match_foreign_keys_to_quals().
+ */
would be more appropriate IMO. I'd be slightly inclined to put the
GUC test there, too:
+ if (enable_starjoin_join_search)
+ joinlist = starjoin_adjust_joins(root, joinlist);
I agree that you need to worry about join order restrictions,
and that it's not immediately clear how to do that. join_is_legal
would work if we could call it, but the problem is that at this
stage we'll only have RelOptInfos for base rels not join rels.
If we have a joinlist (A B C D) and we are considering treating
C as a dimension table, then the questions we have to ask are:
(a) is it okay to build the (A B D) join without C?
(b) is it okay to join (A B D) to C?
In this simple case, I think (b) must be true if (a) is, but
I'm not quite sure that that's so in more complex cases with
multiple candidates for dimension tables. In any case,
join_is_legal won't help us if we don't have join RelOptInfos.
I'm inclined to start by using has_join_restriction: if that
says "false" for a candidate dimension table, it should be safe
to postpone the join to the dimension table. We might be able
to refine that later.
The second example (create-2/select-2) is quite different, in that it's
not a starjoin schema. It still joins one "main" table with multiple
"dimensions", but the FKs go in the other direction (to a single column
in the main table). But it has a very similar bottleneck - the order of
the joins is expensive, but it often does not matter very much, because
the query matches just one row anyway. And even if it returns more rows,
isn't the join order determined just by the selectivity of each join?
Maybe the starjoin optimization could be made to work for this type too?
Yeah, I'm slightly itchy about relying on FKs in this heuristic at
all; it doesn't seem like quite the right thing. I think we do want
one side of the join to be joining to a PK or at least a unique index,
but I'm not sure we need to insist on there being an FK relationship.
A couple of minor coding notes:
* There's no point in doing anything (except recursing) if the joinlist
contains fewer than 3 items, and maybe as a further heuristic
this shouldn't kick in till later yet, like 5 or so items.
When there are just a few items, the possibility of missing the
best plan seems to outweigh the minimal savings in plan time.
* Joinlists never contain anything but RangeTblRefs and sub-lists.
See make_rel_from_joinlist.
* Your reconstructed joinlist is overly complicated. Instead of
+ newlist = list_make2(newlist, list_make1(lfirst(lc)));
you could just do
+ newlist = list_make2(newlist, lfirst(lc));
because a single-element subproblem is useless.
I notice that the patch doesn't apply cleanly anymore because of
the introduction of guc_parameters.dat. So here's a v3 that
rebases over that, and I took the liberty of fixing the joinlist
construction as above, but I didn't do anything else.
regards, tom lane
Attachments:
On 9/23/25 21:46, Tom Lane wrote:
[ sorry for ridiculously slow response to this ]
Tomas Vondra <tomas@vondra.me> writes:
Here's a patch trying to do it more like this - by manipulating the
lists describing the join problems, before it's passed to the actual
join search algorithm (which is where the PoC patch did this).
I wonder if this is roughly the place where you imagined this would be
done, or if you envision some other issue with this approach.

Cool. This is proof-of-concept that manipulating the joinlist can
do what we need done. So we can move on to what heuristics we need
to use.
Thanks. Good to hear this seems like a reasonable place.
I initially tried to manipulate the joinlist much earlier - pretty much
right at the end of deconstruct_jointree. But that turned out to be way
too early. To identify dimensions etc. we need to check stuff about
foreign keys, join clauses, ... and that's not available that early.

So I think this needs to happen much later in query_planner(), and the
patch does it right before the make_one_rel() call. Maybe that's too
late, but it needs to happen after match_foreign_keys_to_quals(), as it
relies on some of the FK info built by that call. Maybe we could call
match_foreign_keys_to_quals() earlier, but I don't quite see any
benefits of doing that ...

I don't have a problem with doing it where you did it, but the comment
should explain the placement. What you do have in the comment mostly
belongs with the code, too; it's not the business of the caller. So
in planmain.c something like

+ /*
+  * Try to simplify the join search problem for starjoin-like joins.
+  * This step relies on info about FK relationships, so we can't do it
+  * till after match_foreign_keys_to_quals().
+  */

would be more appropriate IMO.
I agree. I've adopted your wording, and moved the original comment to
starjoin_adjust_joins (with some changes).
I'd be slightly inclined to put the GUC test there, too:
+ if (enable_starjoin_join_search)
+     joinlist = starjoin_adjust_joins(root, joinlist);
I'm not sure I like this very much. No other call in query_planner()
does it like that. Most don't have such a GUC, but at least
remove_useless_self_joins does, and it still doesn't check it here.
I agree that you need to worry about join order restrictions,
and that it's not immediately clear how to do that. join_is_legal
would work if we could call it, but the problem is that at this
stage we'll only have RelOptInfos for base rels not join rels.
If we have a joinlist (A B C D) and we are considering treating
C as a dimension table, then the questions we have to ask are:
(a) is it okay to build the (A B D) join without C?
(b) is it okay to join (A B D) to C?

In this simple case, I think (b) must be true if (a) is, but
I'm not quite sure that that's so in more complex cases with
multiple candidates for dimension tables. In any case,
join_is_legal won't help us if we don't have join RelOptInfos.

I'm inclined to start by using has_join_restriction: if that
says "false" for a candidate dimension table, it should be safe
to postpone the join to the dimension table. We might be able
to refine that later.
Thanks. I agree has_join_restriction seems like a good start, I'll give
that a try in the next patch version.
When thinking about this, I realized the has_join_restriction() is only
ever used in join_search_one_level(), i.e. when dealing with each small
join order problem. Doesn't this mean the deconstructed jointree must
already consider the restrictions in some way? I don't see any explicit
mentions of such join order restrictions in deconstruct_recurse. It must
not violate any ordering restrictions by splitting the joins in a
"wrong" way, right? If I set join_collapse_limit=1 it still needs to
satisfy all the rules.
I was wondering if maybe we could piggy-back on that, somehow. But maybe
that's not very practical, and has_join_restriction() is the way to go.
It's been a while since I looked at this patch, so it's possible I
already concluded that wouldn't work, and forgot about it :-/
The second example (create-2/select-2) is quite different, in that it's
not a starjoin schema. It still joins one "main" table with multiple
"dimensions", but the FKs go in the other direction (to a single column
in the main table). But it has a very similar bottleneck - the order of
the joins is expensive, but it often does not matter very much, because
the query matches just one row anyway. And even if it returns more rows,
isn't the join order determined just by the selectivity of each join?
Maybe the starjoin optimization could be made to work for this type too?

Yeah, I'm slightly itchy about relying on FKs in this heuristic at
all; it doesn't seem like quite the right thing. I think we do want
one side of the join to be joining to a PK or at least a unique index,
but I'm not sure we need to insist on there being an FK relationship.
True, requiring the FK may be unnecessary in this case. We do need to
guarantee the cardinality does not change, but a UNIQUE + LEFT JOIN
seems to be enough for that.
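A toy illustration of that claim - a LEFT JOIN whose inner side is unique on the join key can never change the outer side's row count, with or without an FK (a hypothetical helper, not planner code):

```python
def left_join_unique(outer, inner_by_key, key):
    """LEFT JOIN where the inner side is unique on its key: each outer row
    matches at most one inner row (or is NULL-extended), so the output has
    exactly len(outer) rows."""
    return [(row, inner_by_key.get(row[key])) for row in outer]

fact = [{'id': 1, 'dim_id': 10}, {'id': 2, 'dim_id': 99}]
dim = {10: {'name': 'a'}}           # unique on its key; 99 has no match
joined = left_join_unique(fact, dim, 'dim_id')
print(len(joined) == len(fact))     # True: cardinality preserved
```

With an inner join the unmatched row (dim_id = 99) would disappear, which is why the inner-join case needs the FK (plus NOT NULL) to make the same guarantee.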
BTW the v3 lost the patch/ directory. I assume that wasn't intentional,
so I added it back into v4.
A couple of minor coding notes:
* There's no point in doing anything (except recursing) if the joinlist
contains fewer than 3 items, and maybe as a further heuristic
this shouldn't kick in till later yet, like 5 or so items.
When there are just a few items, the possibility of missing the
best plan seems to outweigh the minimal savings in plan time.

* Joinlists never contain anything but RangeTblRefs and sub-lists.
See make_rel_from_joinlist.

* Your reconstructed joinlist is overly complicated. Instead of
+ newlist = list_make2(newlist, list_make1(lfirst(lc)));
you could just do
+ newlist = list_make2(newlist, lfirst(lc));
because a single-element subproblem is useless.
I notice that the patch doesn't apply cleanly anymore because of
the introduction of guc_parameters.dat. So here's a v3 that
rebases over that, and I took the liberty of fixing the joinlist
construction as above, but I didn't do anything else.
Thanks. I agree with all of these comments, and updated v4 accordingly.
cfbot started complaining about guc_parameters.dat and a couple more
things, so v4 fixes that too.
regards
--
Tomas Vondra
Attachments:
[ Don't have time to read the v4 patch right now, but a couple
of quick responses: ]
Tomas Vondra <tomas@vondra.me> writes:
On 9/23/25 21:46, Tom Lane wrote:
I'd be slightly inclined to put the GUC test there, too:
+ if (enable_starjoin_join_search)
+     joinlist = starjoin_adjust_joins(root, joinlist);

I'm not sure I like this very much. No other call in query_planner()
does it like that. Most don't have such a GUC, but at least
remove_useless_self_joins does, and it still doesn't check it here.
Fair enough, it was just a suggestion.
When thinking about this, I realized the has_join_restriction() is only
ever used in join_search_one_level(), i.e. when dealing with each small
join order problem. Doesn't this mean the deconstructed jointree must
already consider the restrictions in some way? I don't see any explicit
mentions of such join order restrictions in deconstruct_recurse. It must
not violate any ordering restrictions by splitting the joins in a
"wrong" way, right? If I set join_collapse_limit=1 it still needs to
satisfy all the rules.
Performing outer joins in syntactic order is always OK by definition,
and setting join_collapse_limit to 1 just forces that to happen.
So I guess you could say that the original jointree "considers the
restrictions", and it's only after we flatten an outer join's two
sides into a joinlist (along with other rels) that we have to worry.
Or is that not what you meant?
regards, tom lane
On 11/8/25 21:36, Tom Lane wrote:
[ Don't have time to read the v4 patch right now, but a couple
of quick responses: ]

Tomas Vondra <tomas@vondra.me> writes:
On 9/23/25 21:46, Tom Lane wrote:
I'd be slightly inclined to put the GUC test there, too:
+ if (enable_starjoin_join_search)
+     joinlist = starjoin_adjust_joins(root, joinlist);

I'm not sure I like this very much. No other call in query_planner()
does it like that. Most don't have such a GUC, but at least
remove_useless_self_joins does, and it still doesn't check it here.

Fair enough, it was just a suggestion.
When thinking about this, I realized the has_join_restriction() is only
ever used in join_search_one_level(), i.e. when dealing with each small
join order problem. Doesn't this mean the deconstructed jointree must
already consider the restrictions in some way? I don't see any explicit
mentions of such join order restrictions in deconstruct_recurse. It must
not violate any ordering restrictions by splitting the joins in a
"wrong" way, right? If I set join_collapse_limit=1 it still needs to
satisfy all the rules.

Performing outer joins in syntactic order is always OK by definition,
and setting join_collapse_limit to 1 just forces that to happen.
So I guess you could say that the original jointree "considers the
restrictions", and it's only after we flatten an outer join's two
sides into a joinlist (along with other rels) that we have to worry.
Or is that not what you meant?
I'm not sure, but I wasn't talking just about outer joins.
AFAICS even queries with inner joins will get the jointree deconstructed
like this. Consider the query from select-1.sql:
select * from t
join dim1 on (dim1.id = id1)
join dim2 on (dim2.id = id2)
join dim3 on (dim3.id = id3)
join dim4 on (dim4.id = id4)
join dim5 on (dim5.id = id5)
join dim6 on (dim6.id = id6)
join dim7 on (dim7.id = id7);
If I set join_collapse_limit=1, then standard_join_search() only sees
problems of size 2, i.e. (list_length(initial_rels) == 2). And we only
look at has_join_restriction() *inside* these small problems, i.e. the
jointree must not be deconstructed in a way that would violate this.
Doesn't that mean deconstruct_jointree() has to somehow "consider" the
join restrictions (even if not explicitly)? If that's the case, could we
maybe leverage that, eliminating the need to call has_join_restriction?
It's just a hunch, though. Maybe splitting the jointree like this is
always legal, because deconstruct_jointree() does not try to "reorder"
the elements.
regards
--
Tomas Vondra
Tomas Vondra <tomas@vondra.me> writes:
If I set join_collapse_limit=1, then standard_join_search() only sees
problems of size 2, i.e. (list_length(initial_rels) == 2). And we only
look at has_join_restriction() *inside* these small problems, i.e. the
jointree must not be deconstructed in a way that would violate this.
Doesn't that mean deconstruct_jointree() has to somehow "consider" the
join restrictions (even if not explicitly)?
It mustn't build subproblems that don't have any legal solutions, sure.
But that is automatic given that it only folds up within the syntactic
structure --- it doesn't go combining rels from random places within
the jointree.
If that's the case, could we
maybe leverage that, eliminating the need to call has_join_restriction?
Don't see how. Once we've folded an outer join into an upper
subproblem, some but (usually) not all of the join orders within that
subproblem will be legal.
It might be that we could make has_join_restriction and friends
faster/simpler with some other representation of the join tree.
I've not really thought hard about alternatives.
regards, tom lane
On 11/9/25 19:42, Tom Lane wrote:
Tomas Vondra <tomas@vondra.me> writes:
If I set join_collapse_limit=1, then standard_join_search() only sees
problems of size 2, i.e. (list_length(initial_rels) == 2). And we only
look at has_join_restriction() *inside* these small problems, i.e. the
jointree must not be deconstructed in a way that would violate this.

Doesn't that mean deconstruct_jointree() has to somehow "consider" the
join restrictions (even if not explicitly)?

It mustn't build subproblems that don't have any legal solutions, sure.
But that is automatic given that it only folds up within the syntactic
structure --- it doesn't go combining rels from random places within
the jointree.
Ah, I see. I didn't realize it's driven purely by the syntactic
structure; I had convinced myself it was aware of more. But yeah, this
means it can't help the patch.
If that's the case, could we
maybe leverage that, eliminating the need to call has_join_restriction?

Don't see how. Once we've folded an outer join into an upper
subproblem, some but (usually) not all of the join orders within that
subproblem will be legal.

It might be that we could make has_join_restriction and friends
faster/simpler with some other representation of the join tree.
I've not really thought hard about alternatives.
No idea. I'm not familiar enough with this to have good ideas on how to
rework it, and I assume has_join_restriction will be cheap enough for
this patch.
regards
--
Tomas Vondra
Hi,
Here's a v5, addressing some (but not all) of the things discussed
earlier in this thread.
This does nothing about the "opposite" type of join, with many small
tables linking to a single "central" table. I'm not convinced it makes
sense to handle that together with starjoins. But I haven't tried yet.
The main improvements/changes in v5 are:
1) comment cleanup / clarification
----------------------------------
A lot of comments were stale, or raised open questions that had been
answered since the comment was written. So I cleaned that up, and
reworded/clarified various comments - a lot of places were talking about
the same thing, so I de-duplicated that (I hope).
2) starjoin_match_to_foreign_key: improved matching FK to join clauses
----------------------------------------------------------------------
The check is split into starjoin_foreign_key_matched_by_clauses and
starjoin_clauses_matched_by_foreign_key, each doing the check in one
direction. It might be possible to do both in a single function, but I
don't think it'd be more efficient and this seemed simpler.
I was a bit surprised it doesn't need to inspect the EC members; it's
enough to check if the FK matches the EC. At least I don't think it's
needed, and it seems to be working without it. Maybe there's some more
complex case where we actually need to look at the members?
The one remaining XXX is about ensuring the referencing table has the FK
columns marked as NOT NULL, to guarantee the join does not reduce
cardinality. But it's related to the question of OUTER joins, which is
discussed later.
3) has_join_restriction(), but allowing some join order restrictions
--------------------------------------------------------------------
starjoin_is_dimension() now calls has_join_restriction(), and for inner
joins this works fine. But as soon as there's an outer join (say, a left
join), it disables the simplified planning. I find that unfortunate, but
I think we can actually do better - IIUC it's OK to treat a relation
with join restrictions as a dimension, as long as we don't reorder
relations with restrictions (the sublist).
Consider a join list with 8 baserels, and an empty list of dimensions.
[F, E1, D2, E3, D4, E5, D6, E7] []
The "F" is the fact, "D" are dimensions, "E" are non-dimension tables.
We can simply move the "D" rels to the dimension list:
[F, E1, E3, E5, E7] [D2, D4, D6]
The v4 patch would have stopped here, but I think we can do better - we
can also move the "E" rels to the dimension list, as long as the only
reason preventing that was has_join_restriction. We move the rels to the
beginning of the dimensions list, to keep the syntactic join order. And
we stop once we find the first rel that can't be moved (due to not
having a matching FK, or having a WHERE filter).

starjoin_adjust_joins does that by walking the joinlist repeatedly, and
stopping when it finds no dimension. I now realize it won't actually
need more than two passes (plus one to find nothing), but that's a detail.
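The repeated walk can be sketched like this (a toy model: `toy_is_dimension` stands in for the patch's FK/restriction checks, and the within-pass stopping rule described above is omitted):

```python
def extract_dimensions(joinlist, is_dimension):
    """Repeatedly walk the joinlist, moving qualifying rels out; rels found
    in a later pass go to the *front* of dims, preserving the syntactic join
    order. Stops when a pass moves nothing."""
    dims = []
    while True:
        moved = [rel for rel in joinlist if is_dimension(rel, dims)]
        if not moved:
            return joinlist, dims
        joinlist = [rel for rel in joinlist if rel not in moved]
        dims = moved + dims

# Toy predicate: "D" rels are plain dimensions; "E" rels qualify only once
# some dimensions are already known (mimicking the relaxed restriction check).
def toy_is_dimension(rel, dims):
    return rel.startswith('D') or (rel.startswith('E') and bool(dims))

rels = ['F', 'E1', 'D2', 'E3', 'D4', 'E5', 'D6', 'E7']
print(extract_dimensions(rels, toy_is_dimension))
# (['F'], ['E1', 'E3', 'E5', 'E7', 'D2', 'D4', 'D6'])
```

The first pass pulls out D2/D4/D6, the second moves the E rels to the front of the dimension list, and the third finds nothing - matching the "no more than two productive passes" observation.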
This is related to the NOT NULL vs. outer join check mentioned in (2),
because the restrictions are usually due to outer joins, and outer joins
don't need the check (and probably even can't have that). But I'm not
sure what's the best way to check if the order restriction is due to
some kind of outer join, or something else. Not sure.
I kept this separated in 0002, for easier review.
snowflake joins
---------------
There's another possible improvement that I'd like to address in the
next version of the patch - handling snowflake schemas. Right now the
leaf dimensions will be handled fine, but the "inner" dimensions won't,
because they reference other tables (the leaves). That gets rejected by
starjoin_clauses_matched_by_foreign_key, because those join clauses are
not matched by the incoming FK.
I plan to modify starjoin_is_dimension() to allow join clauses pointing
to "known" dimensions, so that the next loop can add the "inner" ones.
some experiments
----------------
To verify if this is effective, I ran the two starjoin and snowflake
selects (select-1 and select-3) with inner/outer joins, on master and
with 0001 and 0002. There's a "run.sh" script in "patch/" directory.
The results look like this:
| master | 0001/off 0001/on | 0002/off 0002/on
------------------|---------|-------------------|------------------
starjoin inner | 2299 | 2295 15325 | 2299 15131
starjoin outer | 2270 | 2272 2257 | 2249 14312
snowflake inner | 2718 | 2667 7223 | 2654 7106
snowflake outer | 2282 | 2264 2254 | 2273 6210
This is throughput (tps) from a single pgbench run with a single client.
It's quite stable, but there's a bit of noise.
The master and "off" results are virtually the same (which gives you a
good idea of how much noise there is). 0001 helps with inner joins, but
not the outer joins (because of the restrictions). 0002 fixes that.
The improvement for snowflake joins is smaller because of the "inner"
dimensions. The current patches identify only the leaf ones.
regards
--
Tomas Vondra
Attachments:
On Wed, Nov 12, 2025 at 04:39:50PM +0100, Tomas Vondra wrote:
Hi,
Here's a v5, addressing some (but not all) of the things discussed
earlier in this thread.

This does nothing about the "opposite" type of join, with many small
tables linking to a single "central" table. I'm not convinced it makes
sense to handle that together with starjoins. But I haven't tried yet.
I read this thread and the patch. I have a few questions which might
have already been answered but they used terminology I might not have
understood. I want to explain what I think is happening and perhaps
someone can tell me if these ideas are new or are already covered.
So, assume a fact table with a primary-key first column, and ten more
columns, each with its own dimension table.
So, a star join would query the fact table with some filter, and then
join each of the ten columns to its own dimension table, e.g.,
fact.col2 joins to dim2.primary_key
fact.col3 joins to dim3.primary_key
...
and the problem is that the dimension tables don't join to each other,
but only to the fact table, and our existing optimizer considers join
orders of:
F to D2
F to D3
...
and then
F to D3
F to D4
...
and there are 10! possible combinations, and Tomas is saying that the
dimension tables are almost all the same in their effect on the row
count, so why bother to consider all 10! join orders. Also, some joins
might increase the row count, so a foreign key guarantees only one row
will be matched in the dimension.
I found this code comment in the recent patch which is helpful:
+ * The query may involve joins to additional (non-dimension) tables, and
+ * those may alter cardinality in either direction. In principle, it'd be
+ * best to first perform all the joins that reduce join size, then join all
+ * the dimensions, and finally perform joins that may increase the join
+ * size. Imagine a joinlist:
+ *
+ * (D1, D2, A, B, F)
+ *
+ * with fact F, dimensions D1 and D2, and non-dimensions A and B. If A
+ * increases cardinality, and B does not (or even reduces it), we could
+ * use this join tree:
+ *
+ * (A, (D2, (D1, (B, F))))
+ *
+ * For now we simply leave the dimension joins at the end, assuming
+ * that the earlier joins did not inflate the join too much.
And then there is the problem of detecting when this happens.
I think my big question is, when we join A->B->C->D, we do a lot of work
in the optimizer to figure out what order to use, but when we do A->B,
A->C, A->D, why are we spending the same amount of optimizer effort?
Could we just order B, C, D in order of which have restrictions, or
based on size? I assume we can't because we don't know if A->C or
another join would increase the number of rows, and this patch is saying
if there is a foreign key relationship, they can't increase the rows so
just short-circuit and order them simply. Is that accurate?
Crazy question, if we have A->B, A->C, and A->D, why can't we just sort
B,C,D based in increasing order of adding rows to the result, and just
use that ordering, without requiring foreign keys?
I am very glad someone is working on this problem.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
On 11/14/25 22:07, Bruce Momjian wrote:
On Wed, Nov 12, 2025 at 04:39:50PM +0100, Tomas Vondra wrote:
Hi,
Here's a v5, addressing some (but not all) of the things discussed
earlier in this thread.

This does nothing about the "opposite" type of join, with many small
tables linking to a single "central" table. I'm not convinced it makes
sense to handle that together with starjoins. But I haven't tried yet.

I read this thread and the patch. I have a few questions which might
have already been answered but they used terminology I might not have
understood. I want to explain what I think is happening and perhaps
someone can tell me if these ideas are new or are already covered.

So, assume a fact table with a primary-key first column, and ten more
columns, each with its own dimension table.

So, a star join would query the fact table with some filter, and then
join each of the ten columns to its own dimension table, e.g.,

fact.col2 joins to dim2.primary_key
fact.col3 joins to dim3.primary_key
...

and the problem is that the dimension tables don't join to each other,
but only to the fact table, and our existing optimizer considers join
orders of:

F to D2
F to D3
...

and then

F to D3
F to D4
...

and there are 10! possible combinations, and Tomas is saying that the
dimension tables are almost all the same in their effect on the row
count, so why bother to consider all 10! join orders. Also, some joins
might increase the row count, so a foreign key guarantees only one row
will be matched in the dimension.
Right. A join between F and a "dimension" has the same cardinality as F,
i.e. each row in F has one matching row in the dimension. It's entirely
irrelevant in which order we actually join the dimensions.
And then there may be some other joins that affect the cardinality, and
the question is what to do about these ...
I found this code comment in the recent patch which is helpful:
+ * The query may involve joins to additional (non-dimension) tables, and
+ * those may alter cardinality in either direction. In principle, it'd be
+ * best to first perform all the joins that reduce join size, then join all
+ * the dimensions, and finally perform joins that may increase the join
+ * size. Imagine a joinlist:
+ *
+ * (D1, D2, A, B, F)
+ *
+ * with fact F, dimensions D1 and D2, and non-dimensions A and B. If A
+ * increases cardinality, and B does not (or even reduces it), we could
+ * use this join tree:
+ *
+ * (A, (D2, (D1, (B, F))))
+ *
+ * For now we simply leave the dimension joins at the end, assuming
+ * that the earlier joins did not inflate the join too much.
And then there is the problem of detecting when this happens.
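The grouping described in that comment can be sketched in a few lines
(illustrative Python, not the patch's actual C code; the cardinality
"factors" are made-up estimates, not planner output):

```python
# Illustrative sketch (not the patch's code) of the ordering idea in the
# comment above: join cardinality-reducing relations first, then the
# FK-guaranteed dimensions (factor exactly 1.0), then the inflating ones.

def order_joins(rels):
    """rels: list of (name, factor), where factor is the estimated
    multiplier the join applies to the row count."""
    reducing = sorted((r for r in rels if r[1] < 1.0), key=lambda r: r[1])
    dims = [r for r in rels if r[1] == 1.0]
    increasing = sorted((r for r in rels if r[1] > 1.0), key=lambda r: r[1])
    return reducing + dims + increasing

# Joinlist (D1, D2, A, B, F): A inflates the join, B reduces it, and
# D1 and D2 are dimensions. This yields the (B, D1, D2, A) order,
# matching the (A, (D2, (D1, (B, F)))) join tree from the comment.
order = order_joins([("D1", 1.0), ("D2", 1.0), ("A", 3.0), ("B", 0.1)])
print([name for name, _ in order])  # -> ['B', 'D1', 'D2', 'A']
```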
I think my big question is, when we join A->B->C->D, we do a lot of work
in the optimizer to figure out what order to use, but when we do A->B,
A->C, A->D, why are we spending the same amount of optimizer effort?
I'm sorry, I don't quite understand what's the question here. What does
A->B->C->D mean, exactly?
The standard join algorithm is (intentionally) generic, it handles all
kinds of joins, and it simply doesn't have any special cases for joins
with a particular structure (like a starjoin).
Could we just order B, C, D in order of which have restrictions, or
based on size? I assume we can't because we don't know if A->C or
another join would increase the number of rows, and this patch is saying
if there is a foreign key relationship, they can't increase the rows so
just short-circuit and order them simply. Is that accurate?
Crazy question, if we have A->B, A->C, and A->D, why can't we just sort
B, C, D based on increasing order of adding rows to the result, and just
use that ordering, without requiring foreign keys?
Yeah, that's mostly the idea one of the comments in the patch suggests.
To do the joins that reduce the cardinality first, then the dimensions,
and finally the joins that increase the cardinality.
However, the examples make it look like we're joining pairs of tables,
but that's not necessarily true. The join may be between F and a relation
that is really a join itself. And now you need to know how this join
changes the cardinality, which is more expensive than when only looking
at joins of pairs of tables.
So I think we'd need to first identify these "independent join groups".
But that seems non-trivial, because we don't know which table to start
from (we don't know which one is the "F" table).
I'm sure there's an algorithm to decompose the join tree like this, but
I'm not sure how expensive / invasive it'd be. The premise of this patch
is that it's a cheap optimization that doesn't need to do this.
It simply "peels off" the dimensions from the join, based on things it
can prove locally, without having to decompose the whole jointree.
In any case, this is my current understanding of the problem. It's
entirely possible I'm missing some obvious solution, described in a
paper somewhere.
regards
--
Tomas Vondra
On Sat, Nov 15, 2025 at 01:41:04AM +0100, Tomas Vondra wrote:
And then there is the problem of detecting when this happens.
I think my big question is, when we join A->B->C->D, we do a lot of work
in the optimizer to figure out what order to use, but when we do A->B,
A->C, A->D, why are we spending the same amount of optimizer effort?
I'm sorry, I don't quite understand what's the question here. What does
A->B->C->D mean, exactly?
It means table A joins B, and B joins C, and C joins D. I can see that
as a much harder problem, and one we have code for in the optimizer,
than A joining to B, C, and D.
Could we just order B, C, D in order of which have restrictions, or
based on size? I assume we can't because we don't know if A->C or
another join would increase the number of rows, and this patch is saying
if there is a foreign key relationship, they can't increase the rows so
just short-circuit and order them simply. Is that accurate?
Crazy question, if we have A->B, A->C, and A->D, why can't we just sort
B, C, D based on increasing order of adding rows to the result, and just
use that ordering, without requiring foreign keys?
Yeah, that's mostly the idea one of the comments in the patch suggests.
To do the joins that reduce the cardinality first, then the dimensions,
and finally the joins that increase the cardinality.
Yes, I saw that.
However, the examples make it look like we're joining pairs of tables,
but that's not necessarily true. The join may be between F and a relation
that is really a join itself. And now you need to know how this join
changes the cardinality, which is more expensive than when only looking
at joins of pairs of tables.
Yes, if the join is from F to D, and D joins to something else,
especially if it is not a foreign key where we know there is only one
match, I think we have to give up and go back to the normal optimizer
process.
So I think we'd need to first identify these "independent join groups".
But that seems non-trivial, because we don't know which table to start
from (we don't know which one is the "F" table).
Do we easily know how many relations each relation joins to? Does that
help us?
I'm sure there's an algorithm to decompose the join tree like this, but
I'm not sure how expensive / invasive it'd be. The premise of this patch
is that it's a cheap optimization that doesn't need to do this.
Yeah, I can see expense being an issue, which explains, as you said, why
many other databases have star join "flags", which ideally we can avoid
since very few people are going to use the flag.
It simply "peels off" the dimensions from the join, based on things it
can prove locally, without having to decompose the whole jointree.
I see, and I think understand better now. Thanks.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
On Tue, Feb 04, 2025 at 03:00:49PM +0100, Tomas Vondra wrote:
I'm late to this party. I have apps that do star queries like this a
lot. These apps fall into this type:
Or maybe the fact table is "users" and the dimensions have all kinds of
info about the user (address, primary e-mail address, balance, ...).
In my case the schema is an EAV schema, but essentially it stores users,
groups, and many other such things -- the sorts you expect to find in a
directory service.
Some unsolicited advice:
- the table source with the longest index key prefix (or full key)
determined by WHERE clause terms is the table that should lead the
query plan
This is really important for the case where the query is a VIEW and
the WHERE clause terms can get pushed into it, then:
SELECT ...
FROM t0
JOIN t1 ON ...
JOIN t2 ON ...
..
JOIN tN ON ...
-- tX's PK is (a, b, c), or maybe (a, b, c, d), but (a, b, c) is a
-- very selective prefix of that PK index, so the query plan should
-- start with tX
WHERE tX.a = ... AND tX.b = ... AND tX.c = ...
- if there is no such table source then we're asking for "all the
data", and we might as well start with a full table scan of some
table source, but, which? my answer: the lexically first one in the
query! (why? because it gives the query author the power to pick
which table goes first in this case)
SELECT ...
FROM t0
JOIN ...
...; -- no WHERE clause or no terms that provide values for prefixes
-- of indices that we could use to find a good choice of
-- starting table, so start with t0
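That heuristic might look roughly like this (a hypothetical sketch; the
table names and index metadata are made up, and a real planner would of
course work from catalog data rather than strings):

```python
# Hypothetical sketch of the lead-table heuristic described above:
# prefer the table whose index key has the longest prefix covered by
# WHERE clause terms; with no usable prefix anywhere, fall back to the
# lexically first table in the query.

def pick_lead_table(tables, where_cols):
    """tables: list of (name, index_key_columns) in query order;
    where_cols: set of columns constrained by WHERE clause terms."""
    def prefix_len(key):
        n = 0
        for col in key:
            if col not in where_cols:
                break
            n += 1
        return n

    best = max(tables, key=lambda t: prefix_len(t[1]))
    if prefix_len(best[1]) == 0:
        return tables[0][0]  # no selective prefix: lexically first table
    return best[0]

# tX's PK is (a, b, c, d) and the WHERE clause pins (a, b, c), so tX leads.
tables = [("t0", ["x"]), ("tX", ["a", "b", "c", "d"]), ("t2", ["y"])]
print(pick_lead_table(tables, {"a", "b", "c"}))  # -> tX
print(pick_lead_table(tables, set()))            # -> t0
```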
Anyway, this pattern is quite common, yet it performs quite poorly.
Yes.
But for starjoins, a lot of this is not really needed. In the simplest
case (no conditions on dimensions etc) it does not really matter in what
order we join those, and filters on dimensions make it only a little bit
more complicated (join the most selective first).
But here you can just use the order that the SQL uses. It gives the
author some power.
Yes.
So I've been wondering how difficult would it be to have a special
fast-path mode for starjoins, completely skipping most of this. I
cobbled together a WIP/PoC patch (attached) on the way from FOSDEM, and
it seems to help quite a bit.
Yay! _Thank you_!
So that about triples the throughput. If you bump join_collapse_limit to
12, it gets even clearer:
build        1      16
--------------------------------------
master     200    2000
patched   4500   48000
That's a 20x improvement - not bad. Sure, this is on a tiny dataset, and
with larger data sets it might need to do I/O, diminishing the benefits.
It's just an example to demonstrate the benefits.
Fantastic! I can't wait to use this in prod.
But the bigger question is whether it makes sense to have such fast-path
modes for certain query shapes. The patch "hard-codes" the planning for
starjoin queries, but we clearly can't do that for every possible join
shape (because then why have dynamic join search at all?).
IMO: Yes for starjoins. The reason is:
I do think starjoins might be sufficiently unique / special to justify
this, but maybe it would be possible to instead improve the regular join
order search to handle this case better? I don't have a very clear idea
what that would look like, though :-(
If you're not pessimizing other cases and you're getting a 4x to 20x
improvement then the uniqueness of the starjoin case and the frequent
use of starjoin queries makes this fast-path worthwhile.
Moreover, I think in general if you can cheaply determine a "kind" of
query and then apply query plan searches / optimizations that are most
relevant to the query's "kind" then I think that's a good way to unlock
more useful optimizations.
I did check what some other databases do, and they often have some
sort of "hint" to let the optimizer know this is a starjoin.
I don't think a hint is needed here.
BTW, I like hints, but only out-of-band, not embedded in the SQL.
Unfortunately out-of-band hinting is not really viable because to
introduce it one would need new APIs (and possibly protocol work).
Nico
--
Nico Williams <nico@cryptonector.com> writes:
Some unsolicited advice:
...
But here you can just use the order that the SQL uses. It gives the
author some power.
If that's the approach you want, it's been possible for decades:
"set join_collapse_limit = 1" and away you go. I don't feel a
need to invent a different version of that for star schemas.
I do not think this patch should have ambitions beyond the stated
one of avoiding useless join-order search effort. If you try to
load more than that onto the plate you'll probably end in failure.
regards, tom lane
On Fri, Nov 14, 2025 at 09:52:41PM -0500, Bruce Momjian wrote:
On Sat, Nov 15, 2025 at 01:41:04AM +0100, Tomas Vondra wrote:
And then there is the problem of detecting when this happens.
I think my big question is, when we join A->B->C->D, we do a lot of work
in the optimizer to figure out what order to use, but when we do A->B,
A->C, A->D, why are we spending the same amount of optimizer effort?
I'm sorry, I don't quite understand what's the question here. What does
A->B->C->D mean, exactly?
It means table A joins B, and B joins C, and C joins D. I can see that
as a much harder problem, and one we have code for in the optimizer,
than A joining to B, C, and D.
I guess fundamentally it is the case of splitting a big problem into
smaller ones, e.g., 2 + 3 + 3 = 8, but 2! * 3! * 3! = 72 and 8! = 40320.
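The arithmetic spelled out (the group sizes 2, 3, and 3 are just the
example numbers above):

```python
# Splitting an 8-way join search into independent groups of 2, 3, and 3
# relations shrinks the number of join orders the optimizer considers.
from math import factorial

whole = factorial(8)                                 # 40320 orderings
split = factorial(2) * factorial(3) * factorial(3)   # 72 orderings
print(whole, split, whole // split)                  # 560x fewer
```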
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
On 11/15/25 16:57, Tom Lane wrote:
Nico Williams <nico@cryptonector.com> writes:
Some unsolicited advice:
...
But here you can just use the order that the SQL uses. It gives the
author some power.
If that's the approach you want, it's been possible for decades:
"set join_collapse_limit = 1" and away you go. I don't feel a
need to invent a different version of that for star schemas.
I do not think this patch should have ambitions beyond the stated
one of avoiding useless join-order search effort. If you try to
load more than that onto the plate you'll probably end in failure.
FWIW I certainly agree with this. The motivation is to speed up planning
with starjoin-like queries, and that's still the primary goal. If it
could handle more complex cases (snowflake, inverse starjoin), great.
But I have no ambition to make it somehow generic and much more complex.
The main thing I'm really unsure about is what to do about joins that
increase the cardinality. AFAICS that's the only possible regression if
we simply move joins with dimensions to the end. Not sure what to do
about that before the actual join search ...
regards
--
Tomas Vondra
I spent a little time staring at the v5 patches. Obviously there
are a bunch of minor details to be verified, which you've carefully
provided XXX comments about, and I didn't really go through those
yet. There are two big-picture questions that are bothering me:
1. I do not think I believe the premise that the dimension tables
typically won't have restriction clauses. ISTM that a typical
query might be like
select sum(o.total_price) from
orders o
join customers c on c.id = o.c_id
join products p on p.id = o.p_id
where c.customer_name = 'Wile E Coyote'
and p.product_name = 'Rocket Skates';
The only reason to join a dimension table that lacks a restriction
clause is if you need some of its fields in the output, which you
might but I'm not sure that's such a common case. (Have you got
evidence to the contrary?) So I feel like we're not going to be
getting all that much win if we are not willing to treat such tables
as dimension tables. We could do something simplistic like order
those dimensions by the selectivity of their baserestrict clauses,
joining the most-restricted ones first and any restriction-free ones
last.
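Ordering by baserestrict selectivity, as suggested, could be as simple
as a sort (the tables and selectivity numbers here are invented for
illustration, not planner estimates):

```python
# Sketch of the simplistic ordering suggested above: most selective
# baserestrict clause first, restriction-free dimensions (selectivity
# 1.0) last. All numbers are hypothetical.
dims = [
    ("customers", 0.0001),  # customer_name = 'Wile E Coyote'
    ("products", 0.001),    # product_name = 'Rocket Skates'
    ("regions", 1.0),       # no restriction clause at all
]
join_order = sorted(dims, key=lambda d: d[1])
print([name for name, _ in join_order])
# -> ['customers', 'products', 'regions']
```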
2. I'm pretty un-excited about the 0002 patch altogether. I'm having
a hard time visualizing cases where it helps, other than left joins
to dimension tables which I don't really think are common either.
I did a bit of poking around on the net and found that it seems to
be common to restrict star-join optimizations to equijoins (e.g.
SAP says explicitly that they only handle that case). I think we'd
be better off to focus on the allow-baserestrict-clauses extension
than the allow-join-order-restrictions extension.
regards, tom lane
On Fri, Nov 21, 2025 at 03:14:15PM -0500, Tom Lane wrote:
I spent a little time staring at the v5 patches. Obviously there
are a bunch of minor details to be verified, which you've carefully
provided XXX comments about, and I didn't really go through those
yet. There are two big-picture questions that are bothering me:
1. I do not think I believe the premise that the dimension tables
typically won't have restriction clauses. ISTM that a typical
query might be like
select sum(o.total_price) from
orders o
join customers c on c.id = o.c_id
join products p on p.id = o.p_id
where c.customer_name = 'Wile E Coyote'
and p.product_name = 'Rocket Skates';
Yes, I am sure it is typical because I have seen cartoons use exactly
those products. ;-)
The only reason to join a dimension table that lacks a restriction
clause is if you need some of its fields in the output, which you
might but I'm not sure that's such a common case. (Have you got
evidence to the contrary?) So I feel like we're not going to be
getting all that much win if we are not willing to treat such tables
as dimension tables. We could do something simplistic like order
those dimensions by the selectivity of their baserestrict clauses,
joining the most-restricted ones first and any restriction-free ones
last.
Oh, I thought the patch already did this, e.g., the patch was going to
make groups, e.g., foreign keys with restrictions, foreign keys without
restrictions, and no foreign key (might add rows). The first group was
going to be sorted by their selectivity, and the last group was going to
be sorted by how much they add rows, if that is possible.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
On 11/21/25 21:14, Tom Lane wrote:
I spent a little time staring at the v5 patches. Obviously there
are a bunch of minor details to be verified, which you've carefully
provided XXX comments about, and I didn't really go through those
yet. There are two big-picture questions that are bothering me:
1. I do not think I believe the premise that the dimension tables
typically won't have restriction clauses. ISTM that a typical
query might be like
select sum(o.total_price) from
orders o
join customers c on c.id = o.c_id
join products p on p.id = o.p_id
where c.customer_name = 'Wile E Coyote'
and p.product_name = 'Rocket Skates';
The only reason to join a dimension table that lacks a restriction
clause is if you need some of its fields in the output, which you
might but I'm not sure that's such a common case. (Have you got
evidence to the contrary?) So I feel like we're not going to be
getting all that much win if we are not willing to treat such tables
as dimension tables. We could do something simplistic like order
those dimensions by the selectivity of their baserestrict clauses,
joining the most-restricted ones first and any restriction-free ones
last.
Good question. I don't have great evidence that such joins to dimensions
(without additional restrictions) are a common case. It's partially a
guess and partially based on my past experience.
I have seen a lot of such joins in analytical workloads, where the join
is followed by an aggregation, with GROUP BY referencing attributes from
the dimensions. Of course, that may be an argument against worrying
about the planning too much, because with enough data the timing is
going to be dominated by the join/aggregation execution. However, it's
surprising how little data many analytical workloads actually access, so
it's not that clear.
The other use case I've been thinking about is poorly written queries,
joining more tables than needed. A traditional example is an ORM loading
more data than needed, to load the whole "object". I don't know how
prevalent this is today - it used to be a pretty common issue, and I
doubt it improved. I think it's not that different from the self-join
removal (the tradeoffs may be different, of course). I realize we try
not to add complexity for such cases, especially if it might hurt well
written queries.
Actually, I initially investigated the opposite example, i.e. all
dimensions joining to the fact.id, see create-2/select-2 scripts. And
then I realized starjoins have mostly the same issue. But it's true the
v5 patch does not actually help this original query.
2. I'm pretty un-excited about the 0002 patch altogether. I'm having
a hard time visualizing cases where it helps, other than left joins
to dimension tables which I don't really think are common either.
I did a bit of poking around on the net and found that it seems to
be common to restrict star-join optimizations to equijoins (e.g.
SAP says explicitly that they only handle that case). I think we'd
be better off to focus on the allow-baserestrict-clauses extension
than the allow-join-order-restrictions extension.
I recall seeing such queries (with LEFT joins) in analytics workloads, but
it's definitely less common than inner starjoins. So I agree focusing on
allowing baserestrict clauses is probably more useful/important.
FWIW I tried searching for more info too, but all the SAP pages
suggested by google return 404 to me :-(
regards
--
Tomas Vondra
On 11/21/25 21:47, Bruce Momjian wrote:
On Fri, Nov 21, 2025 at 03:14:15PM -0500, Tom Lane wrote:
I spent a little time staring at the v5 patches. Obviously there
are a bunch of minor details to be verified, which you've carefully
provided XXX comments about, and I didn't really go through those
yet. There are two big-picture questions that are bothering me:
1. I do not think I believe the premise that the dimension tables
typically won't have restriction clauses. ISTM that a typical
query might be like
select sum(o.total_price) from
orders o
join customers c on c.id = o.c_id
join products p on p.id = o.p_id
where c.customer_name = 'Wile E Coyote'
and p.product_name = 'Rocket Skates';
Yes, I am sure it is typical because I have seen cartoons use exactly
those products. ;-)
;-)
The only reason to join a dimension table that lacks a restriction
clause is if you need some of its fields in the output, which you
might but I'm not sure that's such a common case. (Have you got
evidence to the contrary?) So I feel like we're not going to be
getting all that much win if we are not willing to treat such tables
as dimension tables. We could do something simplistic like order
those dimensions by the selectivity of their baserestrict clauses,
joining the most-restricted ones first and any restriction-free ones
last.
Oh, I thought the patch already did this, e.g., the patch was going to
make groups, e.g., foreign keys with restrictions, foreign keys without
restrictions, and no foreign key (might add rows). The first group was
going to be sorted by their selectivity, and the last group was going to
be sorted by how much they add rows, if that is possible.
No, the patch never did that. The various XXX comments discuss that as a
future optimization. Aren't the comments clear enough?
I think it'd work about the way you described, except that joins without
foreign keys can both increase and decrease the cardinality, and those
that reduce cardinality would need to be moved to the first group.
Another question is what to do about snowflake joins, as a common
extension/generalization of starjoins. I think we'd need to identify the
groups of dimensions (for the branches), and treat those as a single
logical dimension.
regards
--
Tomas Vondra
On Sun, Nov 23, 2025 at 03:53:17PM +0100, Tomas Vondra wrote:
The only reason to join a dimension table that lacks a restriction
clause is if you need some of its fields in the output, which you
might but I'm not sure that's such a common case. (Have you got
evidence to the contrary?) So I feel like we're not going to be
getting all that much win if we are not willing to treat such tables
as dimension tables. We could do something simplistic like order
those dimensions by the selectivity of their baserestrict clauses,
joining the most-restricted ones first and any restriction-free ones
last.
Oh, I thought the patch already did this, e.g., the patch was going to
make groups, e.g., foreign keys with restrictions, foreign keys without
restrictions, and no foreign key (might add rows). The first group was
going to be sorted by their selectivity, and the last group was going to
be sorted by how much they add rows, if that is possible.
No, the patch never did that. The various XXX comments discuss that as a
future optimization. Aren't the comments clear enough?
I think my brain got lost in the patch --- I was happy I got as far as I
did in understanding it. :-)
I think it'd work about the way you described, except that joins without
foreign keys can both increase and decrease the cardinality, and those
that reduce cardinality would need to be moved to the first group.
I see, makes sense.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
On Sun, Nov 23, 2025 at 9:39 AM Tomas Vondra <tomas@vondra.me> wrote:
1. I do not think I believe the premise that the dimension tables
typically won't have restriction clauses. ISTM that a typical
query might be like
select sum(o.total_price) from
orders o
join customers c on c.id = o.c_id
join products p on p.id = o.p_id
where c.customer_name = 'Wile E Coyote'
and p.product_name = 'Rocket Skates';
Good question. I don't have great evidence that such joins to dimensions
(without additional restrictions) are a common case. It's partially a
guess and partially based on my past experience.
In my experience, restriction clauses on dimension tables are very common.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 11/24/25 21:55, Robert Haas wrote:
On Sun, Nov 23, 2025 at 9:39 AM Tomas Vondra <tomas@vondra.me> wrote:
1. I do not think I believe the premise that the dimension tables
typically won't have restriction clauses. ISTM that a typical
query might be like
select sum(o.total_price) from
orders o
join customers c on c.id = o.c_id
join products p on p.id = o.p_id
where c.customer_name = 'Wile E Coyote'
and p.product_name = 'Rocket Skates';
Good question. I don't have great evidence that such joins to dimensions
(without additional restrictions) are a common case. It's partially a
guess and partially based on my past experience.
In my experience, restriction clauses on dimension tables are very common.
Sure, but does that imply the inverse case (dimensions without non-join
restrictions) is not common? I'm not sure.
regards
--
Tomas Vondra
On Wed, Nov 26, 2025 at 1:30 PM Tomas Vondra <tomas@vondra.me> wrote:
In my experience, restriction clauses on dimension tables are very common.
Sure, but does that imply the inverse case (dimensions without non-join
restrictions) is not common? I'm not sure.
Obviously that depends on a lot of things, and I don't completely
understand what the patch does and doesn't do. But, I think it would
be sad to implement an optimization that falls over catastrophically
when such restriction clauses are present. For example, a long time
ago, I used to build web applications. Twenty, even thirty table joins
were common. There certainly wouldn't be a restriction clause on every
dimension table, but it would be an unusual situation if there were NO
restriction clauses on ANY dimension table. It's maybe also worth
mentioning that in those applications, it wasn't always a pure star
join: one central fact table would join to a bunch of codes tables,
but also very often to some other fact tables that had their own codes
tables. Point being that optimizations like this can be shown to have
a LOT of value in individual test cases even if the circumstances in
which they can be applied are very restricted, but lifting some of
those restrictions can enormously expand the number of real-world
cases to which they apply. My intuition is that a smaller gain on a
larger class of queries will win us more praise than the reverse.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 11/28/25 19:57, Robert Haas wrote:
On Wed, Nov 26, 2025 at 1:30 PM Tomas Vondra <tomas@vondra.me> wrote:
In my experience, restriction clauses on dimension tables are very common.
Sure, but does that imply the inverse case (dimensions without non-join
restrictions) is not common? I'm not sure.
Obviously that depends on a lot of things, and I don't completely
understand what the patch does and doesn't do. But, I think it would
be sad to implement an optimization that falls over catastrophically
when such restriction clauses are present. For example, a long time
ago, I used to build web applications. Twenty, even thirty table joins
were common. There certainly wouldn't be a restriction clause on every
dimension table, but it would be an unusual situation if there were NO
restriction clauses on ANY dimension table.
I think it depends on what you mean by "falls over catastrophically".
The patch identifies dimensions without restrictions, moves them aside,
and does regular join search on the rest of the relations (some of which
may be dimensions with restrictions).
So the presence of dimensions with restrictions does not disable the
optimization entirely, it just means the dimensions with restrictions
won't benefit from it. So in the worst case, it'll perform just like
before (assuming the optimization is kept cheap enough).
I'd love it if the optimization worked for all dimensions, even for those
with restrictions. But I don't know of such an optimization.
It's maybe also worth
mentioning that in those applications, it wasn't always a pure star
join: one central fact table would join to a bunch of codes tables,
but also very often to some other fact tables that had their own codes
tables.
I certainly don't claim this optimization works for all queries, but
it's also not restricted to "pure" starjoins. It simply finds which
relations can be considered dimensions (i.e. joined through a FK). It
does not match the whole plan shape, or anything like that.
Yes, that may reduce the possible benefit of this optimization, because
the patch works within the "groups" generated by join_collapse_limit (so
within 8 relations by default). If those groups have few dimensions, the
patch may not help all that much.
Point being that optimizations like this can be shown to have
a LOT of value in individual test cases even if the circumstances in
which they can be applied are very restricted, but lifting some of
those restrictions can enormously expand the number of real-world
cases to which they apply. My intuition is that a smaller gain on a
larger class of queries will win us more praise than the reverse.
I don't disagree, but isn't this mostly what we're discussing now? I'm
trying to figure out if enough queries would benefit from this
optimization, which only applies to dimensions without restrictions.
regards
--
Tomas Vondra
On Fri, Nov 28, 2025 at 2:21 PM Tomas Vondra <tomas@vondra.me> wrote:
The patch identifies dimensions without restrictions, moves them aside,
and does regular join search on the rest of the relations (some of which
may be dimensions with restrictions).
So the presence of dimensions with restrictions does not disable the
optimization entirely, it just means the dimensions with restrictions
won't benefit from it. So in the worst case, it'll perform just like
before (assuming the optimization is kept cheap enough).
I'm guessing that the reason you're limiting this to relations without
restrictions is so that you don't have to reason about the join
causing cardinality changes. But I don't quite understand how you
avoid having that problem anyway. For example, suppose that we have a
fact table F which is joined to relations S1, S2, D1, D2, I1, and I2.
The joins to S1 and S2 are joins on FKs to unconstrained dimension
tables; i.e. cardinality stays the same. The joins to D1 and D2 are
joins on FKs to constrained dimension tables, i.e. cardinality
decreases. The joins to I1 and I2 on average match more than once,
i.e. cardinality increases. The optimal join order is presumably
something like F-D1-D2-S1-S2-I1-I2, so that the number of rows flowing
through each level of the join tree is kept as small as possible, but
how do you achieve this if the joins to S1 and S2 are set aside? In
the presence of row-count-inflating joins, it's wrong to suppose that
S1 and S2 should be joined at the end.
What seems more likely to be true is that we should perform the join
to S1 and the join to S2 consecutively. It's very common in my
experience for complex reporting queries to have a bunch of joins the
results of which matter very little: each one changes the cardinality
very little or not at all, and so any ordering of those joins produces
essentially the same result, and probably it's best to do them all at
whatever point in the join sequence the cardinality of the other input
is at lowest. However, I'm not sure that even this is guaranteed. For
instance, suppose that in the above example there's no index on the
column of F that joins to S2. Could it be that the only reasonable way
to join to S2 is a merge join? If so, it might be best to start by
joining F to S2 using an index scan on each, and then continue by joining
to D1, D2, S1, I1, I2 in that order.
Every time I start to think about this kind of optimization, I find
myself getting hung up on corner cases like these which are, maybe,
not all that probable, but which I think people almost certainly do
rely on the planner to get right. I don't know what to do about that.
I certainly agree with the idea that we waste a lot of energy
searching through functionally identical join orders and that we
should find a way to avoid that, but I'm a little suspicious that
avoiding regressions, or even finding out whether we've introduced
any, will prove difficult.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 11/28/25 23:50, Robert Haas wrote:
On Fri, Nov 28, 2025 at 2:21 PM Tomas Vondra <tomas@vondra.me> wrote:
The patch identifies dimensions without restrictions, moves them aside,
and does regular join search on the rest of the relations (some of which
may be dimensions with restrictions).
So the presence of dimensions with restrictions does not disable the
optimization entirely, it just means the dimensions with restrictions
won't benefit from it. So in the worst case, it'll perform just like
before (assuming the optimization is kept cheap enough).
I'm guessing that the reason you're limiting this to relations without
restrictions is so that you don't have to reason about the join
causing cardinality changes. But I don't quite understand how you
avoid having that problem anyway. For example, suppose that we have a
fact table F which is joined to relations S1, S2, D1, D2, I1, and I2.
The joins to S1 and S2 are joins on FKs to unconstrained dimension
tables; i.e. cardinality stays the same. The joins to D1 and D2 are
joins on FKs to constrained dimension tables, i.e. cardinality
decreases. The joins to I1 and I2 on average match more than once,
i.e. cardinality increases. The optimal join order is presumably
something like F-D1-D2-S1-S2-I1-I2, so that the number of rows flowing
through each level of the join tree is kept as small as possible, but
how do you achieve this if the joins to S1 and S2 are set aside? In
the presence of row-count-inflating joins, it's wrong to suppose that
S1 and S2 should be joined at the end.
Yes, that's certainly possible. It was discussed at the very beginning
of this thread, and at that point there was a suggestion to keep it
simple and just push the joins to the end:
/messages/by-id/1751009.1738974197@sss.pgh.pa.us
I'm not claiming that's good enough, though. Maybe we should reconsider
and it needs a better solution.
What seems more likely to be true is that we should perform the join
to S1 and the join to S2 consecutively. It's very common in my
experience for complex reporting queries to have a bunch of joins the
results of which matter very little: each one changes the cardinality
very little or not at all, and so any ordering of those joins produces
essentially the same result, and probably it's best to do them all at
whatever point in the join sequence the cardinality of the other input
is at its lowest. However, I'm not sure that even this is guaranteed. For
instance, suppose that in the above example there's no index on the
column of F that joins to S2. Could it be that the only reasonable way
to join to S2 is a merge join? If so, it might be best to start by
joining F to S2 using an index scan on each, and then continue by joining
to D1, D2, S1, I1, I2 in that order.
Good points.
The first part reminds me of the approach I mentioned a couple of messages
back: we might preprocess the join lists, but instead of "removing the
dimensions" from the search, we'd group them together, and treat each
group as a single item in join_search_one_level. Then, whenever
join_search_one_level hits the group, it'd expand it into a sequence of
joins. I have not tried that yet, but it seems doable ...
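A rough sketch of that grouping idea, in Python rather than planner C code (all names here are invented for illustration; `join_search` stands in for the real search, and the expansion step would really build joinrels, not a flat list):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rel:
    name: str
    has_restriction: bool = False

@dataclass(frozen=True)
class DimGroup:
    members: tuple  # unrestricted dimensions; their mutual order is free

def join_search(items):
    # Stand-in for the ordinary join search: just keep the given order.
    return list(items)

def plan_star_join(rels):
    """Collapse unrestricted dimensions into one pseudo-item, search over
    the remaining relations plus that single item, then expand the group
    into a consecutive join sequence wherever the search placed it."""
    free = tuple(r for r in rels if not r.has_restriction)
    rest = [r for r in rels if r.has_restriction]
    order = join_search(rest + [DimGroup(free)])
    expanded = []
    for item in order:
        expanded.extend(item.members if isinstance(item, DimGroup) else (item,))
    return [r.name for r in expanded]

rels = [Rel("fact", True), Rel("dim1"), Rel("dim2"), Rel("dim3", True)]
print(plan_star_join(rels))  # ['fact', 'dim3', 'dim1', 'dim2']
```

The search only ever sees one item in place of the whole group, so its cost stays independent of how many unrestricted dimensions there are.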
However, I'm not sure about your second point - what if the join order
matters after all, e.g. because some join order can leverage the ordering
(pathkeys) of the inputs, or produce an ordering better for the later joins?
Every time I start to think about this kind of optimization, I find
myself getting hung up on corner cases like these which are, maybe,
not all that probable, but which I think people almost certainly do
rely on the planner to get right. I don't know what to do about that.
I certainly agree with the idea that we waste a lot of energy
searching through functionally identical join orders and that we
should find a way to avoid that, but I'm a little suspicious that
avoiding regressions, or even finding out whether we've introduced
any, will prove difficult.
True. I started working on this with two assumptions: (a) we can detect
cases when this is guaranteed to be the optimal join order, and (b) it
would be cheap to do so. Maybe one of those assumptions is not correct.
thanks
--
Tomas Vondra
On Fri, Nov 28, 2025 at 6:34 PM Tomas Vondra <tomas@vondra.me> wrote:
True. I started working on this with two assumptions: (a) we can detect
cases when this is guaranteed to be the optimal join order, and (b) it
would be cheap to do so. Maybe one of those assumptions is not correct.
Probably needs testing, at least.
I wonder if there's any way that we could rule out the presence of
row-count-inflating joins, or at least identify the joins that are
definitely not row-count-inflating. I have a general feeling that we
postpone some of the estimation work that the planner does for too
long, and that moving some of it earlier would allow better
decision-making. It's very common for a query to involve a driving
table and then a bunch of joins to side tables that are all
essentially independent of each other. That is, each of those joins
will multiply the row count from the other side of the join by a
constant that does not depend very much on the join order. I don't
really know exactly what we could do that would enable us to notice
that pattern and do something about it, but it feels like we're
duplicating a lot of work.
For instance, consider a 4-way join between A, B, C, and D, where A is
connected to the other three tables by join clauses, and those are the
only join clauses. Let's say that the join to B is going to multiply
the row count by 2, the join to C is going to multiply it by 1 (i.e.
no change), and the join to D is going to multiply it by 0.1 (i.e.
eliminate 90% of rows). Well, we're going to consider the A-B join and
discover that it has 2 times the cardinality of A, and then later
we'll consider the (A-C)-B join and discover that it has 2 times the
cardinality of A-C, and then later we'll consider the (A-D)-B join and
discover that it has two times the cardinality of A-D, and then later
we'll consider the (A-C-D)-B join and discover that it has two times
the cardinality of A-C-D. As the number of tables grows, the number of
times we rediscover the effect of joining to a certain table grows, I
believe, exponentially.
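To see why the rediscovery count blows up, consider the simplest star shape: a driving table A joined independently to n side tables. The dynamic-programming search builds a joinrel for A plus every subset of side tables, and re-estimates the join to a given side table B once for every subset of the other n-1 tables. A small brute-force count (invented example, not planner code):

```python
from itertools import combinations

def rediscovery_count(n):
    """How many times the join search re-derives the row-count effect of
    joining one particular side table B, in a star of n side tables:
    once per subset of the other n-1 side tables already joined to A."""
    others = range(n - 1)
    return sum(1 for k in range(n) for _ in combinations(others, k))

# Equals 2**(n-1): each extra side table doubles the rediscovery work.
print([rediscovery_count(n) for n in (3, 5, 10)])  # [4, 16, 512]
```

So a join whose multiplier never changes still gets re-estimated an exponential number of times, which is the duplicated work described above.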
Of course, in theory, there's no reason why the idea that any
particular join multiplies the row count by a constant that is
independent of the join order has to be correct. For instance, if the
A-B join multiplies the same rows that the A-D join eliminates, then
you can't set a fixed multiplier for either one. But in practice, even
when that's the case, we're not aware of it: when evaluating a join
column from, say, the output of the A-B join, we use the statistics
for the underlying column of A or B, without any modification
reflecting what the A-B join did to the distribution. So I think that
our estimates will behave like join-order-independent multipliers even
when the reality is otherwise. I'm not at all sure that I'm totally
right about this, and I sort of expect you or Tom to point out why I'm
totally weak in the head for thinking that it's the case, but I feel
like there might be the kernel of an idea here even if the details are
wrong.
Because, like, I think the general complexity of determining the
optimal join order is known to be some horrifically non-polynomial
time thing, but there are lots of real-world problems where I think a
human being would try to do it in linear time, by initially choosing
the driving table, and then what to join to first, second, third, etc.
And I think that's the kind of case that you're trying to pursue with
this optimization, so it feels intuitively logical that something
should be possible. That said, I don't know whether from a theoretical
perspective it really is possible to do something like this in a
reliable way, making it a pure optimization, or whether you would be
doomed to always get some cases wrong, making it more like an optional
planner mode or something. Either way, I can't shake the feeling that
determining essentially every important fact about every join only
when we perform the main join search makes it a lot harder to imagine
ever avoiding the main join search.
--
Robert Haas
EDB: http://www.enterprisedb.com