CROSS JOIN performance

Started by Andy Chambersabout 14 years ago2 messagesgeneral
Jump to latest
#1Andy Chambers
achambers@mcna.net

Hi,

In our porting of a big mysql app to postgres, we're finding lots of
queries like

select foo
from (foo f, bar b)
left join caz c on f.id = f.caz_id
where f.id = b.foo_id

I've seen the message where Tom explains why this is invalid in ANSI
SQL so I converted it to

select foo
from foo f CROSS JOIN bar b
left join caz c on f.id = f.caz_id
where f.id = b.foo_id

...and it works. However, sometimes quite slowly. When we've looked
into the slow ones, we've found that changing it again to

select foo
from foo f INNER JOIN bar b ON f.id = b.foo_id
left join caz c on f.id = f.caz_id

makes it perform much better.

Furthermore, we're starting to find that performance of the 3rd is
significantly better than the 2nd, *ONLY* when the CROSS JOINs are
followed by more joins (like in this case). If there are no more
tables being joined, changing to the 3rd version yields no performance
gain.

Are these three queries logically equivalent (well, at least the
latter two since the first isn't valid SQL)? If so, does it make
sense that the optimizer has difficulty with the second case.

Cheers,
Andy

--
Andy Chambers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andy Chambers (#1)
Re: CROSS JOIN performance

Andy Chambers <achambers@mcna.net> writes:

In our porting of a big mysql app to postgres, we're finding lots of
queries like

select foo
from (foo f, bar b)
left join caz c on f.id = f.caz_id
where f.id = b.foo_id

I've seen the message where Tom explains why this is invalid in ANSI
SQL so I converted it to

select foo
from foo f CROSS JOIN bar b
left join caz c on f.id = f.caz_id
where f.id = b.foo_id

...and it works. However, sometimes quite slowly. When we've looked
into the slow ones, we've found that changing it again to

select foo
from foo f INNER JOIN bar b ON f.id = b.foo_id
left join caz c on f.id = f.caz_id

makes it perform much better.

Those formulations look equivalent to me. Could you provide a concrete
test case, or at least some specific queries and their EXPLAIN ANALYZE
results? Which PG version are we talking about? Oh, and are the LEFT
JOIN conditions really not constraining table c at all?

regards, tom lane