Hash function constant in the Hash join algorithm
Hi,
As part part of my project, I had to compare the time taken by the Hashjoin
algorithm to that of Nested loop algorithm for the inner join and Natural
join queries, under two cases. I used six tables to join each other each
containing 5000 rows each approximately.
The two cases are given below.
1)When the hash function of the of hash join algorithm is not made to
return a constant value, In this case the time taken by the hashjoin
algorithm for the inner join is 1.45 secs and for the natural join is 4.58
secs and the time taken by the nested loop algorithm for inner join is
11.49 secs, and for natural join 46.94 secs.
2) when the hash function of the hash join algorithm is made to return a
constant value. Here, in the program hashfunc.c, I made all the hash
functions return a constant value. In this case the time taken by the
hashjoin algorithm for inner join is 8.44 secs and for natural join is
29.18 secs, and the time taken by the Nested loop algorithm for inner join
is 11.66 secs and for natural join is 46.9 secs.
According to my understanding when a hashfunction is made to return a
constant, all the tuples are hashed into a single bucket, and therefore the
hashjoin algorithm is converted into a nestedloop algorithm.
From the above two cases, I am understanding that even even when the hash
function is made to return a constant, The hashjoin agorithm is taking
significantly lower time compared to nested loop.
Could anyone explain why does the hashjoin algorithm takes significantly
lower time compared to nested loop when the hash function is made to return
a constant value or have I done any mistake at any part of the code?
Note - I am using the postgresql9.4 version for coding. I know the post is
long, There might be some grammar mistakes, I regret if I caused any
convenience to anyone.
Thank you.
--
Regards,
K.Ravikiran
ᐧ
Ravi Kiran <ravi.kolanpaka@gmail.com> writes:
From the above two cases, I am understanding that even even when the hash
function is made to return a constant, The hashjoin agorithm is taking
significantly lower time compared to nested loop.
Could anyone explain why does the hashjoin algorithm takes significantly
lower time compared to nested loop when the hash function is made to return
a constant value or have I done any mistake at any part of the code?
Well, hashjoin would suck the inner relation into memory (into its
hashtable) whereas nestloop would rescan the inner relation each time,
implying more buffer-access overhead, tuple visibility checking, etc.
A closer comparison might be to a nestloop that has a Materialize node
above its inner relation.
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general