Range Merge Join v1
========
Example:
========
Find different people using the same website at the same time:
create table session(sessionid text, username text, during tstzrange);
SELECT s1.username, s2.username, s1.during * s2.during
FROM session s1, session s2
WHERE s1.during && s2.during AND s1.username < s2.username
=====================================
Brief summary of previous discussion:
=====================================
/messages/by-id/1334554850.10878.52.camel@jdavis
- can indexes solve it already (parameterized paths, etc.)?
- planner complexity (track path keys, mergejoinable equality, etc.)
- spatial join algorithm choice
- compare range merge join against existing indexes to see if any
indexing approach is viable
==========
Externals:
==========
No new syntax or other externals. Just improved execution strategy for
joining ranges using the overlaps operator (&&).
New syntax is possible, but I don't see a strong reason to support
special range join syntax at this time.
=============
Optimizer Design
=============
I took the path of least resistance, so if I am doing something wrong
please let me know. I hardwired it to look for the range overlaps
operator when checking if it could do a merge join, and then add
range_ops to restrictinfo->mergeopfamilies.
Then, I have to suppress adding it as an equivalence class, because
overlaps is not equality. It still adds single-member ECs to
restrictinfo->right_ec and left_ec, but (I think) that's OK.
Also, I have to prevent generating paths for right/full join, because
the range join algorithm can't (currently) handle those.
=============
Costing
=============
Needs more consideration. Seems to do reasonable things in the few
examples I tried.
=============
Executor Design
=============
See detailed comments in nodeMergejoin.c
=============
Performance
=============
Seems much better than index nest loop join when the join is not
selective. I will post detailed numbers soon.
If no index is available, range merge join is the only reasonable way
to execute a range join. For instance, if the inner is not a leaf in
the plan.
=============
Alternatives:
=============
It was suggested that I approach it as a general spatial-join
problem. That has merit, but I rejected it for now because the
algorithm that I think has the most promise is range-oriented
anyway. By that I mean we would need to extract all of the dimensions
into ranges first, and then perform the join. With that in mind, it
makes sense to implement range joins first; and then later provide the
APIs to get at the spatial dimensions so that we can implement other
spatial joins as range joins.
Another suggestion was to base it on hash join, but I never understood
that proposal and it didn't seem to map very well to a spatial join.
Yet another suggestion was to use some kind of temporary index. Some
brief experiments I did indicated that it would be fairly slow (though
more investigation might be useful here). Also, it doesn't provide any
alternative to the nestloop-with-inner-index we already offer at the
leaf level today.
Regards,
Jeff Davis
Attachments:
rangejoin-v1.patchtext/x-patch; charset=US-ASCII; name=rangejoin-v1.patchDownload+250-24
Version 2 attached. Fixed a few issues, expanded tests, added docs.
A simple performance test (script attached) shows about a 5X
improvement when comparing against a nested loop with an inner
index-only scan over a gist index.
Even better, this doesn't require an index, so it will work even if
the input relations are subqueries.
Regards,
Jeff Davis
Show quoted text
On Thu, Apr 6, 2017 at 1:43 AM, Jeff Davis <pgsql@j-davis.com> wrote:
========
Example:
========Find different people using the same website at the same time:
create table session(sessionid text, username text, during tstzrange);
SELECT s1.username, s2.username, s1.during * s2.during
FROM session s1, session s2
WHERE s1.during && s2.during AND s1.username < s2.username=====================================
Brief summary of previous discussion:
=====================================/messages/by-id/1334554850.10878.52.camel@jdavis
- can indexes solve it already (parameterized paths, etc.)?
- planner complexity (track path keys, mergejoinable equality, etc.)
- spatial join algorithm choice
- compare range merge join against existing indexes to see if any
indexing approach is viable==========
Externals:
==========No new syntax or other externals. Just improved execution strategy for
joining ranges using the overlaps operator (&&).New syntax is possible, but I don't see a strong reason to support
special range join syntax at this time.=============
Optimizer Design
=============I took the path of least resistance, so if I am doing something wrong
please let me know. I hardwired it to look for the range overlaps
operator when checking if it could do a merge join, and then add
range_ops to restrictinfo->mergeopfamilies.Then, I have to suppress adding it as an equivalence class, because
overlaps is not equality. It still adds single-member ECs to
restrictinfo->right_ec and left_ec, but (I think) that's OK.Also, I have to prevent generating paths for right/full join, because
the range join algorithm can't (currently) handle those.=============
Costing
=============Needs more consideration. Seems to do reasonable things in the few
examples I tried.=============
Executor Design
=============See detailed comments in nodeMergejoin.c
=============
Performance
=============Seems much better than index nest loop join when the join is not
selective. I will post detailed numbers soon.If no index is available, range merge join is the only reasonable way
to execute a range join. For instance, if the inner is not a leaf in
the plan.=============
Alternatives:
=============It was suggested that I approach it as a general spatial-join
problem. That has merit, but I rejected it for now because the
algorithm that I think has the most promise is range-oriented
anyway. By that I mean we would need to extract all of the dimensions
into ranges first, and then perform the join. With that in mind, it
makes sense to implement range joins first; and then later provide the
APIs to get at the spatial dimensions so that we can implement other
spatial joins as range joins.Another suggestion was to base it on hash join, but I never understood
that proposal and it didn't seem to map very well to a spatial join.Yet another suggestion was to use some kind of temporary index. Some
brief experiments I did indicated that it would be fairly slow (though
more investigation might be useful here). Also, it doesn't provide any
alternative to the nestloop-with-inner-index we already offer at the
leaf level today.Regards,
Jeff Davis
On Tue, Apr 11, 2017 at 12:17 AM, Jeff Davis <pgsql@j-davis.com> wrote:
Version 2 attached. Fixed a few issues, expanded tests, added docs.
It looks like the CF app only listed my perf test script. Re-attaching
rangejoin-v2.patch so that it appears in the CF app. Identical to
other rangejoin-v2.patch.
Regards,
Jeff Davis
Attachments:
rangejoin-v2.patchtext/x-patch; charset=US-ASCII; name=rangejoin-v2.patchDownload+963-12
On Thu, Apr 6, 2017 at 2:13 PM, Jeff Davis <pgsql@j-davis.com> wrote:
========
Example:
========Find different people using the same website at the same time:
create table session(sessionid text, username text, during tstzrange);
SELECT s1.username, s2.username, s1.during * s2.during
FROM session s1, session s2
WHERE s1.during && s2.during AND s1.username < s2.username=====================================
Brief summary of previous discussion:
=====================================/messages/by-id/1334554850.10878.52.camel@jdavis
- can indexes solve it already (parameterized paths, etc.)?
- planner complexity (track path keys, mergejoinable equality, etc.)
- spatial join algorithm choice
- compare range merge join against existing indexes to see if any
indexing approach is viable==========
Externals:
==========No new syntax or other externals. Just improved execution strategy for
joining ranges using the overlaps operator (&&).New syntax is possible, but I don't see a strong reason to support
special range join syntax at this time.=============
Optimizer Design
=============I took the path of least resistance, so if I am doing something wrong
please let me know. I hardwired it to look for the range overlaps
operator when checking if it could do a merge join, and then add
range_ops to restrictinfo->mergeopfamilies.Then, I have to suppress adding it as an equivalence class, because
overlaps is not equality. It still adds single-member ECs to
restrictinfo->right_ec and left_ec, but (I think) that's OK.Also, I have to prevent generating paths for right/full join, because
the range join algorithm can't (currently) handle those.=============
Costing
=============Needs more consideration. Seems to do reasonable things in the few
examples I tried.=============
Executor Design
=============See detailed comments in nodeMergejoin.c
=============
Performance
=============Seems much better than index nest loop join when the join is not
selective. I will post detailed numbers soon.If no index is available, range merge join is the only reasonable way
to execute a range join. For instance, if the inner is not a leaf in
the plan.=============
Alternatives:
=============It was suggested that I approach it as a general spatial-join
problem. That has merit, but I rejected it for now because the
algorithm that I think has the most promise is range-oriented
anyway. By that I mean we would need to extract all of the dimensions
into ranges first, and then perform the join. With that in mind, it
makes sense to implement range joins first; and then later provide the
APIs to get at the spatial dimensions so that we can implement other
spatial joins as range joins.Another suggestion was to base it on hash join, but I never understood
that proposal and it didn't seem to map very well to a spatial join.Yet another suggestion was to use some kind of temporary index. Some
brief experiments I did indicated that it would be fairly slow (though
more investigation might be useful here). Also, it doesn't provide any
alternative to the nestloop-with-inner-index we already offer at the
leaf level today.Regards,
Jeff Davis
Looks like an interesting idea, however, in an attempt to test this patch I
found following error when compiling,
selfuncs.c: In function ‘mergejoinscansel’:
selfuncs.c:2901:12: error: ‘op_strategy’ undeclared (first use in this
function)
&op_strategy,
--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
On Thu, Apr 20, 2017 at 2:05 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:
Looks like an interesting idea, however, in an attempt to test this patch I
found following error when compiling,
selfuncs.c: In function ‘mergejoinscansel’:
selfuncs.c:2901:12: error: ‘op_strategy’ undeclared (first use in this
function)
&op_strategy,
Looks like filterdiff destroyed my patch from git. Attaching unified
version against master 3820c63d.
Thanks!
Jeff Davis
Attachments:
rangejoin-v2.unified.patchtext/x-patch; charset=US-ASCII; name=rangejoin-v2.unified.patchDownload+1109-27
Hi, Jeff!
I'm planning to provide the review for this patch in future. I've done
some performance tesing (see attachment).
All in all, nothing important, everything works fine.
Tests were executed under current HEAD on Ubuntu 16 over MacBook Air.
I observe that:
1. Patch brings performance improvement for specified query
2. Performance improvement of Range Merge Join vs GiST Nested Loop
Join (on Index Only Scan) is between 4x and 6x
3. Performance improvement of RMJ with B-tree index vs GiST is between
4x and 10x
(due to lack of actually competing cases, here I omit T-tests, they
are not that important when speaking about 4x difference)
4. Range Merge Join is capable to consume expressional index with
exactly same effect as direct index
5. Other attributes residing in joined tables do not affect
performance improvement
I'll make code review some time later, probably next week. (I hope
there is no urge in this, since commitfest is in July, or is it
urgent?) BTW fixe typo in index specification of original performance
test (both indexes were for same table)
Best regards, Andrey Borodin.
Attachments:
Hi, Jeff!
Sorry for being late. Actually, I had several unsuccessful attempts to
find something wrong with the patch.
Here's my review.
in pathkey.c
ecs = (EquivalenceClass **) palloc(nClauses * sizeof(EquivalenceClass *));
scores = (int *) palloc(nClauses * sizeof(int));
range_ecs = palloc(nClauses * sizeof(bool));
Third assignment has no cast.
And I have few questions:
1. Are there any types, which could benefit from Range Merge and are
not covered by this patch?
2. Can Range Merge handle merge of different ranges? Like int4range()
&& int8range() ?
My perf test script from the previous message was broken, here's fixed
one in the attachment.
This patch implements feature, contains new tests and passes old
tests, is documented and spec compliant. I do not see any reason why
not mark it "Ready for committer".
Best regards, Andrey Borodin.
Attachments:
perf2_fixed.sqltext/plain; charset=US-ASCII; name=perf2_fixed.sqlDownload
On Tue, May 30, 2017 at 11:44 PM, Andrew Borodin <borodin@octonica.com> wrote:
Hi, Jeff!
Hi!
Sorry for being late. Actually, I had several unsuccessful attempts to
find something wrong with the patch.
Here's my review.in pathkey.c
ecs = (EquivalenceClass **) palloc(nClauses * sizeof(EquivalenceClass *));
scores = (int *) palloc(nClauses * sizeof(int));
range_ecs = palloc(nClauses * sizeof(bool));Third assignment has no cast.
Will fix.
And I have few questions:
1. Are there any types, which could benefit from Range Merge and are
not covered by this patch?
I thought about this for a while, and the only thing I can think of
are range joins that don't explicitly use range types.
2. Can Range Merge handle merge of different ranges? Like int4range()
&& int8range() ?
Right now, there aren't even casts between range types. I think the
best way to handle that at this point would be to add casts among the
numeric ranges. There may be advantages to supporting any two ranges
where the contained types are part of the same opfamily, but it seems
a little early to add that complication.
My perf test script from the previous message was broken, here's fixed
one in the attachment.This patch implements feature, contains new tests and passes old
tests, is documented and spec compliant. I do not see any reason why
not mark it "Ready for committer".
Great!
I think there are a couple more things that could be done if we want
to. Let me know if you think these things should be done now, or if
they should be a separate patch later when the need arises:
* Support for r1 @> r2 joins (join on "contains" rather than "overlaps").
* Better integration with the catalog so that users could add their
own types that support range merge join.
Thank you for the review.
Regards,
Jeff Davis
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2017-06-02 19:42 GMT+05:00 Jeff Davis <pgsql@j-davis.com>:
On Tue, May 30, 2017 at 11:44 PM, Andrew Borodin <borodin@octonica.com> wrote:
1. Are there any types, which could benefit from Range Merge and are
not covered by this patch?I thought about this for a while, and the only thing I can think of
are range joins that don't explicitly use range types.
Let me try to write && in SQL
select * from a join b where (a.min<=b.max and a.min>=b.min) or
(a.max<=b.max and a.max>=b.min);
Quite complicated. Here user knows that min <= max, but DB don't. If
we write it precisely we get hell lot of permutations.
Here's what I think:
1. For me, this feature seems hard to implement.
2. This feature also will be hard to commit solely, since it's use
case will be rare.
3. If this could yield unexpected performance for queries like
select * from t where x<y and b>c or a!=z or [other conditions]
and optimizer could think: "Aha! I'll sort it and do it fast" it'd be cool.
I do not think range joins that don't explicitly use range types are
possible right now...
2. Can Range Merge handle merge of different ranges? Like int4range()
&& int8range() ?Right now, there aren't even casts between range types. I think the
best way to handle that at this point would be to add casts among the
numeric ranges. There may be advantages to supporting any two ranges
where the contained types are part of the same opfamily, but it seems
a little early to add that complication.
Agree.
I think there are a couple more things that could be done if we want
to. Let me know if you think these things should be done now, or if
they should be a separate patch later when the need arises:* Support for r1 @> r2 joins (join on "contains" rather than "overlaps").
* Better integration with the catalog so that users could add their
own types that support range merge join.
1. I believe this changes can be incremental. They constitute value
for the end user, then they are committable. This patch is not overly
complicated, but it's easier to do this stuff in small pieces.
2. The commitfest is 3 months away. If you will have new versions of
the patch, I'll review again. Maybe will spot some new things :)
Best regards, Andrey Borodin, Octonica.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Jeff,
Fast range joins are very nice to have, thank you for working on this.
I've been doing a somewhat related thing recently, trying to teach the
merge join executor to perform full joins on comparison clauses [1]/messages/by-id/1d23ad41-a9d9-1d1e-1d79-83b002d6f776@postgrespro.ru. I
have some comments after reading your patch:
* At the moment, "mergejoinable clause" and "equality clause" mean the
same thing to the planner, and those clauses are used both to create
equivalence classes and to perform merge joins. If we start performing
merge joins for different kinds of clauses, such as comparison or range
intersection, it makes sense to separate these two meanings. I tried
this in my patch and it didn't require many changes. I use
RestrictInfo.equivopfamilies to build equivalence classes, and
RestrictInfo.mergeopfamilies for mergejoinable clauses.
* Semantically, MJCompare() is a function that determines whether you
should emit a join result or advance one of the pointers. This meaning
is not explicit in the code and is not well encapsulated: the function
returns and int and 'compareNoMatch' flag, and the calling code combines
them in various ways to derive the final result. This meaning can be
made explicit by making MJCompare return enum values {Join, NextInner,
NextOuter}, and putting inside it all the logic that decides whether we
join or advance. ExecMergeJoin looks intimidating already, and I think
this change would help readability. Again, you can find an illustration
in my patch.
I hope you find these points helpful.
[1]: /messages/by-id/1d23ad41-a9d9-1d1e-1d79-83b002d6f776@postgrespro.ru
/messages/by-id/1d23ad41-a9d9-1d1e-1d79-83b002d6f776@postgrespro.ru
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Aug 25, 2017 at 10:19 AM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:
Hi Jeff,
Hi,
Thank you for the review and suggestions!
* At the moment, "mergejoinable clause" and "equality clause" mean the same
thing to the planner, and those clauses are used both to create equivalence
classes and to perform merge joins. If we start performing merge joins for
different kinds of clauses, such as comparison or range intersection, it
makes sense to separate these two meanings. I tried this in my patch and it
didn't require many changes. I use RestrictInfo.equivopfamilies to build
equivalence classes, and RestrictInfo.mergeopfamilies for mergejoinable
clauses.
I like the idea. I will look in more detail and I can either change my
patch or piggyback on yours.
* Semantically, MJCompare() is a function that determines whether you should
emit a join result or advance one of the pointers. This meaning is not
explicit in the code and is not well encapsulated: the function returns and
int and 'compareNoMatch' flag, and the calling code combines them in various
ways to derive the final result. This meaning can be made explicit by making
MJCompare return enum values {Join, NextInner, NextOuter}, and putting
inside it all the logic that decides whether we join or advance.
ExecMergeJoin looks intimidating already, and I think this change would help
readability. Again, you can find an illustration in my patch.
I actually tried doing something similar in my patch, but there are
four cases to consider in EXEC_MJ_SKIP_TEST:
1. Overlaps: mark and then join the tuples
2. left-of: SKIPOUTER_ADVANCE
3. right-of: SKIPINNER_ADVANCE
4. None of the above: mark and NEXTINNER
However, you are right that the current code is ugly, and I should
refactor into 4 explicit cases. I don't think I will call them by the
same names as the join states, though, because there's not a direct
mapping.
Updated patch attached. Changelog:
* Rebased
* Changed MJCompare to return an enum as suggested, but it has 4
possible values rather than 3.
* Added support for joining on contains or contained by (@> or <@) and
updated tests.
Regards,
Jeff Davis
Attachments:
rangejoin-v3.difftext/plain; charset=US-ASCII; name=rangejoin-v3.diffDownload+1311-72
On Thu, Aug 31, 2017 at 1:52 AM, Jeff Davis <pgsql@j-davis.com> wrote:
Updated patch attached. Changelog:
* Rebased
* Changed MJCompare to return an enum as suggested, but it has 4
possible values rather than 3.
* Added support for joining on contains or contained by (@> or <@) and
updated tests.
The current patch does not work well with multiple keys, and I believe
it's important to solve because it will make it usable for
multi-dimension spatial joins.
The problem is this: the algorithm for a single key demands that the
input is sorted by (lower(r) NULLS FIRST, upper(r) NULLS LAST). That's
easy, because that's also the sort order for the range operator class,
so everything just works.
For multiple keys, the order of the input is:
lower(r1) NULLS FIRST, lower(r2) NULLS FIRST,
upper(r1) NULLS LAST, upper(r2) NULLS LAST
But that can't match up with any opclass, because an opclass can only
order one attribute at a time. In this case, the lower bound of r2 is
more significant than the upper bound of r1.
It's easy enough to adapt the execution to make this work as long as
the input is properly sorted. The challenge is making the optimizer
choose the sort keys properly.
I have tried a few approaches. The current approach that I'm using is:
* have a new, special range operator family with no operator classes
* in check_mergejoinable(), detect the && operator and set the
mergeopfamilies to contain only the special operator family
* in try_mergejoin_path(), it will convert the pathkeys using that
operator class into pathkeys over a "lower" expression over the same
EC; and then add on additional pathkeys for the "upper" expressions
onto the end of the pathkey list
Any comments or alternative suggestions welcome. This will probably
take a few days at least, so I put the patch in "waiting on author"
state.
Regards,
Jeff Davis
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 18, 2017 at 2:24 AM, Jeff Davis <pgsql@j-davis.com> wrote:
Any comments or alternative suggestions welcome. This will probably
take a few days at least, so I put the patch in "waiting on author"
state.
This did not receive an update for two months. I am marking it as
returned with feedback.
--
Michael