Block nested loop join

Started by Bramandia Ramadhanaover 17 years ago8 messages
#1Bramandia Ramadhana
bramandia@gmail.com

Hi all,

I am new to postgresql. I am currently doing research to optimize the query
performance of RDBMS, specifically postgresql. Hence, I am currently reading
out the code to understand the implementation of various query evaluation
algorithm in postgresql.

Currently, I am investigating the nested loop join algorithm in
nodeNestloop.c. After reading the code, my understanding is that it performs
simple nested loop join (not block nested loop join). Is this true? Does
postgresql support block nested loop join? If it does, where is it? I might
miss some buffering/caching mechanism.

I appreciate any helps/advice.

Regards,

Bramandia R.

#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Bramandia Ramadhana (#1)
Re: Block nested loop join

Bramandia Ramadhana wrote:

Currently, I am investigating the nested loop join algorithm in
nodeNestloop.c. After reading the code, my understanding is that it performs
simple nested loop join (not block nested loop join). Is this true?

Yep.

Does postgresql support block nested loop join?

Nope.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3Gregory Stark
stark@enterprisedb.com
In reply to: Heikki Linnakangas (#2)
Re: Block nested loop join

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

Does postgresql support block nested loop join?

Nope.

We do support Hash Join though so I think the only difference is that we can't
use the hash join for cartesian joins.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!

#4Bramandia Ramadhana
bramandia@gmail.com
In reply to: Gregory Stark (#3)
Re: Block nested loop join

Thanks for the clarifications.

Just for curiosity, is there any reason of not having block nested-loop join
implementation? Is it rarely useful?

As far as I am aware of, in the case of cross product of two tables, block
nested-loop join is the most efficient algorithm.

Regards,

Bramandia R.

On Fri, Oct 10, 2008 at 3:30 PM, Gregory Stark <stark@enterprisedb.com>wrote:

Show quoted text

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

Does postgresql support block nested loop join?

Nope.

We do support Hash Join though so I think the only difference is that we
can't
use the hash join for cartesian joins.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!

#5Gregory Stark
stark@enterprisedb.com
In reply to: Bramandia Ramadhana (#4)
Re: Block nested loop join

"Bramandia Ramadhana" <bramandia@gmail.com> writes:

Thanks for the clarifications.

Just for curiosity, is there any reason of not having block nested-loop join
implementation? Is it rarely useful?

Oh, actually it occurs to me that we do implement something analogous to a
degenerate block nested loop where we materialize one side of the nested loop.

It looks "backward" since the materialized side is the "inner" side of the
loop but it's basically equivalent to a block nested loop with the entire
outer side in a single block.

So the use case of a real block nested loop would be doing a cartesian join of
two large tables where neither fits in RAM. That does seem like it might be
kind of narrow given how large the output would be.

But we won't know unless someone does the experiment.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#5)
Re: Block nested loop join

Gregory Stark <stark@enterprisedb.com> writes:

So the use case of a real block nested loop would be doing a cartesian join of
two large tables where neither fits in RAM. That does seem like it might be
kind of narrow given how large the output would be.

Yeah. If you have a hashable join condition then our existing batched
hash join code should be roughly equivalent to this method. So the use
case is joining a couple of large tables with an un-hashable,
un-indexable join condition (or none at all, ie cross product) and that
just isn't something we hear people wanting to do a lot. I can't really
see why we'd bother maintaining extra code for block nested loop.

regards, tom lane

#7Gregory Stark
stark@enterprisedb.com
In reply to: Tom Lane (#6)
Re: Block nested loop join

Tom Lane <tgl@sss.pgh.pa.us> writes:

Gregory Stark <stark@enterprisedb.com> writes:

So the use case of a real block nested loop would be doing a cartesian join of
two large tables where neither fits in RAM. That does seem like it might be
kind of narrow given how large the output would be.

Yeah. If you have a hashable join condition then our existing batched
hash join code should be roughly equivalent to this method. So the use
case is joining a couple of large tables with an un-hashable,
un-indexable join condition (or none at all, ie cross product) and that
just isn't something we hear people wanting to do a lot. I can't really
see why we'd bother maintaining extra code for block nested loop.

Hm, I hadn't thought of other non-hashable join conditions.

I wonder how much code it would be though if we just hacked hash join to
support returning the full cartesian product. Ie, instead of doing a hash
lookup do a full scan of the hash and recheck the join condition if any for
every combination.

That seems like it would be a pretty small change to HashJoin and would
effectively support precisely this join type.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning

#8Bramandia Ramadhana
bramandia@gmail.com
In reply to: Tom Lane (#6)
Re: Block nested loop join

Dear All,

I took a look at the source code for hash join this morning and I realized
that the block nested loop join is somewhat similar to that.

Thanks for the discussions.

Bramandia R.

On Fri, Oct 10, 2008 at 8:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Gregory Stark <stark@enterprisedb.com> writes:

So the use case of a real block nested loop would be doing a cartesian

join of

two large tables where neither fits in RAM. That does seem like it might

be

kind of narrow given how large the output would be.

Yeah. If you have a hashable join condition then our existing batched
hash join code should be roughly equivalent to this method. So the use
case is joining a couple of large tables with an un-hashable,
un-indexable join condition (or none at all, ie cross product) and that
just isn't something we hear people wanting to do a lot. I can't really
see why we'd bother maintaining extra code for block nested loop.

regards, tom lane