PoC: full merge join on comparison clause

Started by Alexander Kuzmenkovalmost 9 years ago29 messageshackers
Jump to latest
#1Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru

Hi hackers,

As you know, at this time Postgres cannot perform a full join on a
comparison clause. For example, if we have two tables with numeric
columns and run a query like 'select * from t1 full join t2 on t1.a >
t2.a', we get an error: "FULL JOIN is only supported with merge-joinable
or hash-joinable join conditions". Such queries are legitimate SQL and
sometimes arise when porting applications from different DBMS, so it
would be good to support them in Postgres. They can be rewritten as
union of right and left joins, but it requires manual work and runs
somewhat slower (see the benchmark at the end of the letter). This
proof-of-concept patch explores the possibility of performing such
queries as merge joins.

Consider the following example where outer and inner relations are in
ascending order, and we are asked to return outer tuples that are
greater than inner.
outer > inner
outer tuple - 6 4 - marked tuple
7 5
8 6 - inner tuple
8 7

The main difference from normal merge join is that we do not need to
advance the marked tuple. This behavior can be implemented with some
simple changes to the function that compares inner and outer tuples.
However, for the join clause 'outer < inner' we would have to advance
the marked tuple, which would require adding a new state to the merge
join executor node. We do not do this. Instead, at the path creation
stage, we make sure that the particular combination of sorting order and
join clause allows us to perform merge join the simple way.

The optimizer requires some other changes to support these joins.
Currently, it uses the same clauses to generate equivalence classes and
to perform merge joins. This patch has to separate these two uses.
Clauses that correspond to a btree equality operator are used to
construct equivalence classes; the operator families for these clauses
are recorded in the 'equivopfamilies' field of RestrictInfo struct.
Clauses that correspond to btree equality or comparison are used to
perform merge joins, and have their operator families recorded in the
'mergeopfamilies'.

The optimizer also has to check whether the particular join clause list
can be used for merge join, and ensure that it is compatible with
inner/outer path ordering. These checks are performed by
'can_sort_for_mergejoin()' and 'outer_sort_suitable_for_mergejoin()'.

There is an important unsolved problem in this patch. When generating
index paths for base relations, the optimizer tries to use only one scan
direction to limit the number of paths. This direction might not be
suitable for a given join clause, and the input path will have to be
sorted. We could generate paths for both directions, but this was
specifically removed for optimization (SHA 834ddc62 by Tom Lane,
10/27/2007 09:45 AM).

For inner joins one would expect the merge join to be slower than the
nested loop, because it has more complex code paths, and indeed this can
be seen on simple benchmarks (see the end of the letter). Costs should
be revised further to reflect this difference.

I would be glad to hear your opinion on this approach.

Some benchmarks:

===== Full join vs union of left and right joins
========================================

test1=# explain analyze select * from t4 right join t1 on t4.a < t1.a
union all select * from t4 left join t1 on t4.a < t1.a where t1.a is null;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Append (cost=809.69..70703.19 rows=3340000 width=8) (actual
time=8.336..1195.534 rows=5007546 loops=1)
-> Merge Left Join (cost=809.69..34230.49 rows=3333333 width=8)
(actual time=8.335..920.442 rows=5007537 loops=1)
Merge Cond: (t1.a > t4.a)
-> Index Only Scan using idx_t1_a on t1 (cost=0.28..35.27
rows=1000 width=4) (actual time=0.027..0.395 rows=1001 loops=1)
Heap Fetches: 97
-> Sort (cost=809.39..834.39 rows=10000 width=4) (actual
time=8.300..356.821 rows=5007538 loops=1)
Sort Key: t4.a
Sort Method: quicksort Memory: 931kB
-> Seq Scan on t4 (cost=0.00..145.00 rows=10000
width=4) (actual time=0.019..2.533 rows=10000 loops=1)
-> Nested Loop Anti Join (cost=0.28..3072.71 rows=6667 width=8)
(actual time=4.685..35.421 rows=9 loops=1)
-> Seq Scan on t4 t4_1 (cost=0.00..145.00 rows=10000
width=4) (actual time=0.010..0.656 rows=10000 loops=1)
-> Index Only Scan using idx_t1_a on t1 t1_1 (cost=0.28..6.10
rows=333 width=4) (actual time=0.003..0.003 rows=1 loops=10000)
Index Cond: (a > t4_1.a)
Heap Fetches: 971
Planning time: 1.414 ms
Execution time: 1324.985 ms
(16 rows)

test1=# explain analyze select * from t4 full join t1 on t4.a < t1.a;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Merge Full Join (cost=809.66..34230.49 rows=3333333 width=8) (actual
time=8.351..914.590 rows=5007546 loops=1)
Merge Cond: (t1.a > t4.a)
-> Index Only Scan using idx_t1_a on t1 (cost=0.28..35.27
rows=1000 width=4) (actual time=0.035..0.368 rows=1001 loops=1)
Heap Fetches: 97
-> Sort (cost=809.39..834.39 rows=10000 width=4) (actual
time=8.309..347.851 rows=5007546 loops=1)
Sort Key: t4.a
Sort Method: quicksort Memory: 931kB
-> Seq Scan on t4 (cost=0.00..145.00 rows=10000 width=4)
(actual time=0.020..2.563 rows=10000 loops=1)
Planning time: 1.083 ms
Execution time: 1044.869 ms
(10 rows)

=== Merge vs nested loop ===========================================

test1=# explain analyze select * from t5 join t1 on t5.a <= t1.a;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.28..944713.00 rows=33333333 width=8) (actual
time=0.055..8718.840 rows=50014145 loops=1)
-> Seq Scan on t5 (cost=0.00..1443.00 rows=100000 width=4) (actual
time=0.019..6.428 rows=100000 loops=1)
-> Index Only Scan using idx_t1_a on t1 (cost=0.28..6.10 rows=333
width=4) (actual time=0.003..0.050 rows=500 loops=100000)
Index Cond: (a >= t5.a)
Heap Fetches: 9147995
Planning time: 2.209 ms
Execution time: 9942.176 ms
(7 rows)

test1=# set enable_mergejoin TO on;
SET
test1=# explain analyze select * from t5 join t1 on t5.a <= t1.a;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=9748.54..343618.88 rows=33333333 width=8) (actual
time=35.491..9281.482 rows=50014145 loops=1)
Merge Cond: (t1.a >= t5.a)
-> Index Only Scan using idx_t1_a on t1 (cost=0.28..35.27
rows=1000 width=4) (actual time=0.027..0.769 rows=1001 loops=1)
Heap Fetches: 97
-> Sort (cost=9747.82..9997.82 rows=100000 width=4) (actual
time=35.458..3906.652 rows=50014145 loops=1)
Sort Key: t5.a
Sort Method: quicksort Memory: 8541kB
-> Seq Scan on t5 (cost=0.00..1443.00 rows=100000 width=4)
(actual time=0.013..8.570 rows=100000 loops=1)
Planning time: 2.368 ms
Execution time: 10530.356 ms
(10 rows)

--
Alexander Kuzmenkov
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Attachments:

full-merge-join-v1.patchtext/x-diff; name=full-merge-join-v1.patchDownload+558-133
#2Robert Haas
robertmhaas@gmail.com
In reply to: Alexander Kuzmenkov (#1)
Re: PoC: full merge join on comparison clause

On Fri, May 12, 2017 at 7:09 AM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:

As you know, at this time Postgres cannot perform a full join on a
comparison clause. For example, if we have two tables with numeric columns
and run a query like 'select * from t1 full join t2 on t1.a > t2.a', we get
an error: "FULL JOIN is only supported with merge-joinable or hash-joinable
join conditions". Such queries are legitimate SQL and sometimes arise when
porting applications from different DBMS, so it would be good to support
them in Postgres. They can be rewritten as union of right and left joins,
but it requires manual work and runs somewhat slower (see the benchmark at
the end of the letter). This proof-of-concept patch explores the possibility
of performing such queries as merge joins.

Interesting. I suggest adding this to the next CommitFest.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru
In reply to: Robert Haas (#2)
Re: PoC: full merge join on comparison clause

On 16.05.2017 18:57, Robert Haas wrote:

Interesting. I suggest adding this to the next CommitFest.

Thank you, added: https://commitfest.postgresql.org/14/1141/

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru
In reply to: Alexander Kuzmenkov (#1)
Re: PoC: full merge join on comparison clause

Here is a new version of the patch, rebased to 749c7c41 and with some
cosmetic changes.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

full-merge-join-v2.patchtext/x-patch; name=full-merge-join-v2.patchDownload+568-137
#5Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alexander Kuzmenkov (#4)
Re: PoC: full merge join on comparison clause

Hi Alexander,

On Fri, Aug 25, 2017 at 10:11 PM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:

Here is a new version of the patch, rebased to 749c7c41 and with some
cosmetic changes.

I looked at this patch briefly. This is a useful feature. This isn't a
design level review of the patch. I may get back to that later. But
here are some assorted comments

The patch applies cleanly, but there are some whitespace errors.
src/backend/executor/nodeMergejoin.c:231: trailing whitespace.
+               /*
src/backend/executor/nodeMergejoin.c:232: trailing whitespace.
+                * Determine whether we accept lesser and/or equal tuples
src/backend/optimizer/path/joinpath.c:499: trailing whitespace.
+ * a merge join. A merge join executor can only choose inner values that are
src/backend/optimizer/path/joinpath.c:501: trailing whitespace.
+ * have at most one non-equality clause.

The implementation may change, so fixing the white space errors may
not be priority now. The patch compiles cleanly.

You have renamed RestrictInfo member mergeopfamilies as
equivopfamilies. I don't think that's a good name; it doesn't convey
that these are opfamilies containing merge operators. The changes in
check_mergejoinable() suggest that an operator may act as equality
operator in few operator families and comparison operator in others.
That looks odd. Actually an operator family contains operators other
than equality operators, so you may want to retain this member and add
a new member to specify whether the clause is an equality clause or
not.

In mergejoinscansel() you have just removed Assert(op_strategy ==
BTEqualStrategyNumber); Probably this function is written considering
on equality operators. But now that we are using this for all other
operators, we will need more changes to this function. That may be the
reason why INNER join in your earlier example doesn't choose right
costing.

The comment change in final_cost_mergejoin() needs more work. n1, n2,
n3 are number of rows on inner side with values 1, 2, 3 resp. So n1 +
n2 + n3 + ... = size of inner relation is correct. In that context I
am not able to understand your change
+    * If the merge clauses contain inequality, (n1 + n2 + ...) ~=
+    * (size of inner relation)^2.
Some stylistic comments
+       switch (join_op_strategy)
+       {
+           case BTEqualStrategyNumber:
+               parent->mj_UseEqual[iClause] = true;
+               break;
+           case BTLessEqualStrategyNumber:
+               parent->mj_UseEqual[iClause] = true;
+               /* fall through */
+           case BTLessStrategyNumber:
+               parent->mj_UseLesser[iClause] = true;
+               break;
+           case BTGreaterEqualStrategyNumber:
+               parent->mj_UseEqual[iClause] = true;
+               /* fall through */
+           case BTGreaterStrategyNumber:
+               parent->mj_UseLesser[iClause] = true;
+               break;
+           default:
+               Assert(false);

Add blank lines between different cases and you may want to replace
Assert in default case into an elog(). See for example exprType() or
get_jointype_name().

+       if (sort_result < 0)
+       {
+           result = MJCR_NextOuter;
+       }

We usually do not add {} around a single statement block.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Daniel Gustafsson
daniel@yesql.se
In reply to: Ashutosh Bapat (#5)
Re: PoC: full merge join on comparison clause

On 19 Sep 2017, at 15:18, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

On Fri, Aug 25, 2017 at 10:11 PM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:

Here is a new version of the patch, rebased to 749c7c41 and with some
cosmetic changes.

I looked at this patch briefly. This is a useful feature. This isn't a
design level review of the patch. I may get back to that later. But
here are some assorted comments

..

Looking forward to further review on this patch, but based on this feedback I’m
moving this to Waiting for author.

cheers ./daniel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru
In reply to: Ashutosh Bapat (#5)
Re: PoC: full merge join on comparison clause

Hi Ashutosh,

Thanks for the review.

*Jeff*, I'm copying you because this is relevant to our discussion about
what to do with mergeopfamilies when adding new merge join types.

You have renamed RestrictInfo member mergeopfamilies as
equivopfamilies. I don't think that's a good name; it doesn't convey
that these are opfamilies containing merge operators. The changes in
check_mergejoinable() suggest that an operator may act as equality
operator in few operator families and comparison operator in others.
That looks odd. Actually an operator family contains operators other
than equality operators, so you may want to retain this member and add
a new member to specify whether the clause is an equality clause or
not.

For mergeopfamilies, I'm not sure what is the best thing to do. I'll try
to explain my understanding of the situation, please correct me if I'm
wrong.

Before the patch, mergeopfamilies was used for two things: creating
equivalence classes and performing merge joins.

For equivalence classes: we look at the restriction clauses, and if they
have mergeopfamilies set, it means that these clause are based on an
equality operator, and the left and right variables must be equal. To
record this fact, we create an equivalence class. The variables might be
equal for one equality operator and not equal for another, so we record
the particular operator families to which our equality operator belongs.

For merge join: we look at the join clauses, and if they have
mergeopfamilies set, it means that these clauses are based on an
equality operator, and we can try performing this particular join as
merge join. These opfamilies are also used beforehand to create the
equivalence classes for left and right variables. The equivalence
classes are used to match the join clauses to pathkeys describing the
ordering of join inputs.

So, if we want to start doing merge joins for operators other than
equality, we still need to record their opfamilies, but only use them
for the second case and not the first. I chose to put these opfamilies
to different variables, and
name the one used for equivalence classes 'equivopfamilies' and the one
used for merge joins 'mergeopfamilies'. The equality operators are used
for both cases, so we put their opfamilies into both of these variables.

I agree this might look confusing. Indeed, we could keep a single
variable for opfamilies, and add separate flags that show how they can
be used, be that for equivalence classes, merge joins, range joins or
some combination of them. This is similar to what Jeff did in his range
merge join patch [1]/messages/by-id/CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg@mail.gmail.com. I will think more about this and try to produce an
updated patch.

In mergejoinscansel() you have just removed Assert(op_strategy ==
BTEqualStrategyNumber); Probably this function is written considering
on equality operators. But now that we are using this for all other
operators, we will need more changes to this function. That may be the
reason why INNER join in your earlier example doesn't choose right
costing.

I changed mergejoinscansel() slightly to reflect the fact that the inner
relation is scanned from the beginning if we have an inequality merge
clause.

The comment change in final_cost_mergejoin() needs more work. n1, n2,
n3 are number of rows on inner side with values 1, 2, 3 resp. So n1 +
n2 + n3 + ... = size of inner relation is correct. In that context I
am not able to understand your change
+    * If the merge clauses contain inequality, (n1 + n2 + ...) ~=
+    * (size of inner relation)^2.

I extended the comment in final_cost_mergejoin(). Not sure if that
approximation makes any sense, but this is the best I could think of.

Style problems are fixed.

Attached please find the new version of the patch that addresses all the
review comments except mergeopfamilies.

The current commitfest is ending, but I'd like to continue working on
this patch, so I am moving it to the next one.

[1]: /messages/by-id/CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg@mail.gmail.com
/messages/by-id/CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg@mail.gmail.com

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

full-merge-join-v3.patchtext/x-patch; name=full-merge-join-v3.patchDownload+581-148
#8Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alexander Kuzmenkov (#7)
Re: PoC: full merge join on comparison clause

On Thu, Sep 28, 2017 at 8:57 PM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:

Hi Ashutosh,

Thanks for the review.

Jeff, I'm copying you because this is relevant to our discussion about what
to do with mergeopfamilies when adding new merge join types.

You have renamed RestrictInfo member mergeopfamilies as
equivopfamilies. I don't think that's a good name; it doesn't convey
that these are opfamilies containing merge operators. The changes in
check_mergejoinable() suggest that an operator may act as equality
operator in few operator families and comparison operator in others.
That looks odd. Actually an operator family contains operators other
than equality operators, so you may want to retain this member and add
a new member to specify whether the clause is an equality clause or
not.

For mergeopfamilies, I'm not sure what is the best thing to do. I'll try to
explain my understanding of the situation, please correct me if I'm wrong.

Before the patch, mergeopfamilies was used for two things: creating
equivalence classes and performing merge joins.

For equivalence classes: we look at the restriction clauses, and if they
have mergeopfamilies set, it means that these clause are based on an
equality operator, and the left and right variables must be equal. To record
this fact, we create an equivalence class. The variables might be equal for
one equality operator and not equal for another, so we record the particular
operator families to which our equality operator belongs.

For merge join: we look at the join clauses, and if they have
mergeopfamilies set, it means that these clauses are based on an equality
operator, and we can try performing this particular join as merge join.
These opfamilies are also used beforehand to create the equivalence classes
for left and right variables. The equivalence classes are used to match the
join clauses to pathkeys describing the ordering of join inputs.

So, if we want to start doing merge joins for operators other than equality,
we still need to record their opfamilies, but only use them for the second
case and not the first. I chose to put these opfamilies to different
variables, and
name the one used for equivalence classes 'equivopfamilies' and the one used
for merge joins 'mergeopfamilies'. The equality operators are used for both
cases, so we put their opfamilies into both of these variables.

I agree this might look confusing. Indeed, we could keep a single variable
for opfamilies, and add separate flags that show how they can be used, be
that for equivalence classes, merge joins, range joins or some combination
of them. This is similar to what Jeff did in his range merge join patch [1].
I will think more about this and try to produce an updated patch.

I think we have (ab?)used mergeopfamilies to indicate equality
condition, which needs some changes. May be these two patches are
where we can do those changes.

In mergejoinscansel() you have just removed Assert(op_strategy ==
BTEqualStrategyNumber); Probably this function is written considering
on equality operators. But now that we are using this for all other
operators, we will need more changes to this function. That may be the
reason why INNER join in your earlier example doesn't choose right
costing.

I changed mergejoinscansel() slightly to reflect the fact that the inner
relation is scanned from the beginning if we have an inequality merge
clause.

The comment change in final_cost_mergejoin() needs more work. n1, n2,
n3 are number of rows on inner side with values 1, 2, 3 resp. So n1 +
n2 + n3 + ... = size of inner relation is correct. In that context I
am not able to understand your change
+    * If the merge clauses contain inequality, (n1 + n2 + ...) ~=
+    * (size of inner relation)^2.

I extended the comment in final_cost_mergejoin(). Not sure if that
approximation makes any sense, but this is the best I could think of.

Style problems are fixed.

Attached please find the new version of the patch that addresses all the
review comments except mergeopfamilies.

The current commitfest is ending, but I'd like to continue working on this
patch, so I am moving it to the next one.

Thanks for working on the comments. I am interested to continue
reviewing it in the next commitfest.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru
In reply to: Ashutosh Bapat (#8)
Re: PoC: full merge join on comparison clause

As discussed earlier, I changed the way we work with mergeopfamilies. I
use the "is_equality" flag to indicate whether the clause is an equality
one, and fill mergeopfamilies for both equality and inequality operators.
The updated patch is attached (rebased to 20b6552242).

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

full-merge-join-v4.patchtext/x-patch; name=full-merge-join-v4.patchDownload+554-121
#10Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alexander Kuzmenkov (#9)
Re: PoC: full merge join on comparison clause

Hi Alexander,
Commit c7a9fa399 has added another test on mergeopfamilies. I think
your patch will need to take care of that test.

On Wed, Oct 4, 2017 at 6:38 PM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:

As discussed earlier, I changed the way we work with mergeopfamilies. I use
the "is_equality" flag to indicate whether the clause is an equality one,
and fill mergeopfamilies for both equality and inequality operators.
The updated patch is attached (rebased to 20b6552242).

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru
In reply to: Ashutosh Bapat (#10)
Re: PoC: full merge join on comparison clause

Hi,

I am attaching the updated patch, rebased to 820c03.

On 09.10.2017 13:47, Ashutosh Bapat wrote:

Hi Alexander,
Commit c7a9fa399 has added another test on mergeopfamilies. I think
your patch will need to take care of that test.

On Wed, Oct 4, 2017 at 6:38 PM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:

As discussed earlier, I changed the way we work with mergeopfamilies. I use
the "is_equality" flag to indicate whether the clause is an equality one,
and fill mergeopfamilies for both equality and inequality operators.
The updated patch is attached (rebased to 20b6552242).

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

full-merge-join-v5.patchtext/x-patch; name=full-merge-join-v5.patchDownload+565-123
#12Michael Paquier
michael@paquier.xyz
In reply to: Alexander Kuzmenkov (#11)
Re: [HACKERS] PoC: full merge join on comparison clause

On Mon, Oct 30, 2017 at 9:25 PM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:

I am attaching the updated patch, rebased to 820c03.

(Please avoid top-posting)
This patch has rotten and conflicts with recent changes in joinrels.c.
This did not get any reviews, so I am moving it to next CF with
"waiting on author" as status.
--
Michael

#13Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru
In reply to: Michael Paquier (#12)
Re: [HACKERS] PoC: full merge join on comparison clause

Here is the patch rebased to a852cfe9.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

full-merge-join-v6.patchtext/x-patch; name=full-merge-join-v6.patchDownload+565-123
#14Stephen Frost
sfrost@snowman.net
In reply to: Alexander Kuzmenkov (#13)
Re: [HACKERS] PoC: full merge join on comparison clause

Greetings Alexander,

* Alexander Kuzmenkov (a.kuzmenkov@postgrespro.ru) wrote:

Here is the patch rebased to a852cfe9.

Thanks for updating it. This would definitely be nice to have.
Ashutosh, thanks for your previous review, would you have a chance to
look at it again? Would be great to at least get this to ready for
committer before the end of this CF.

Thanks!

Stephen

#15Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Stephen Frost (#14)
Re: [HACKERS] PoC: full merge join on comparison clause

On Tue, Jan 23, 2018 at 10:01 PM, Stephen Frost <sfrost@snowman.net> wrote:

Greetings Alexander,

* Alexander Kuzmenkov (a.kuzmenkov@postgrespro.ru) wrote:

Here is the patch rebased to a852cfe9.

Thanks for updating it. This would definitely be nice to have.
Ashutosh, thanks for your previous review, would you have a chance to
look at it again? Would be great to at least get this to ready for
committer before the end of this CF.

The patch contains new code and also refactors some existing code. May
be it's better to separate these two into separate patches so that
it's easy to review patches. There's lot of executor code, which I
don't understand myself. So, I won't be able to complete the review in
this CF. Sorry.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#16Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru
In reply to: Ashutosh Bapat (#15)
Re: [HACKERS] PoC: full merge join on comparison clause

On 29.01.2018 08:40, Ashutosh Bapat wrote:

Maybe it's better to separate these two into separate patches so that
it's easy to review patches.

OK, I'll try doing this. For now, moving the patch entry to the next
commitfest.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#17Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru
In reply to: Alexander Kuzmenkov (#16)
Re: [HACKERS] PoC: full merge join on comparison clause

Here are some updates on this patch.

I split it into two parts. The preparatory part contains some mechanical
changes to prepare for the main part. Most importantly, a new field is
added, `RestrictInfo.is_mj_equality`. It is a marker of mergejoinable
equality clauses, and `RestrictInfo.mergeopfamilies` is a more general
marker of clauses that are mergejoinable but not necessarily equality.
The usages are changed accordingly.

The main part consists of executor and planner changes required to
support inequality merge joins.

The executor changes are as described in the original post.

The planner part has changed significantly since the last version. It
used to apply some shady hacks to ensure we have the required sort
orders of inner and outer paths. Now I think I found a reasonable way to
generate the pathkeys we need. When we sort outer relation in
`sort_inner_and_outer()`, the pathkeys are generated by
`select_outer_pathkeys_for_merge()`. When we use the pathkeys we already
have for the outer relation in `match_unsorted_outer()`, mergeclauses
are selected by `find_mergeclauses_for_pathkeys()`. I changed these
functions to select the right pathkey direction for merge clauses, and
also ensure that we only have a single inequality merge clause and it is
the last one. Also, to use the index paths, I changed
`pathkeys_useful_for_merging()` to keep both pathkey directions for
inequality merge clauses.

Some basic joins work, but I couldn't properly test all the corner cases
with different orderings, because they depend on a bug in vanilla merge
joins [1].

To sum up, the preparatory and executor changes are stable, and the
planner part is WIP.

1.
/messages/by-id/5dad9160-4632-0e47-e120-8e2082000c01@postgrespro.ru

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

ineq-merge-join-v7-01-main.patchtext/x-patch; name=ineq-merge-join-v7-01-main.patchDownload+621-90
ineq-merge-join-v7-00-prep.patchtext/x-patch; name=ineq-merge-join-v7-00-prep.patchDownload+106-84
#18Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru
In reply to: Alexander Kuzmenkov (#17)
Re: [HACKERS] PoC: full merge join on comparison clause

On 22.02.2018 21:42, Alexander Kuzmenkov wrote:

Some basic joins work, but I couldn't properly test all the corner
cases with different orderings, because they depend on a bug in
vanilla merge joins [1].

The bug was fixed, so here is the rebased patch. The planner part of the
patch is stable now and can be reviewed, too.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

ineq-merge-join-v8-01-main.patchtext/x-patch; name=ineq-merge-join-v8-01-main.patchDownload+701-87
ineq-merge-join-v8-00-prep.patchtext/x-patch; name=ineq-merge-join-v8-00-prep.patchDownload+106-84
#19Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alexander Kuzmenkov (#18)
Re: [HACKERS] PoC: full merge join on comparison clause

On Fri, Mar 2, 2018 at 8:02 PM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:

On 22.02.2018 21:42, Alexander Kuzmenkov wrote:

Some basic joins work, but I couldn't properly test all the corner cases
with different orderings, because they depend on a bug in vanilla merge
joins [1].

The bug was fixed, so here is the rebased patch. The planner part of the
patch is stable now and can be reviewed, too.

Both the patches are named 01. Their names tell the order in which
they need to be applied, so it's ok for these patches. But creating
such patches using git format-patch (with -v as some suggest) really
helps in general. All you need to do is prepare commits in your
repository, one per patch, including changes in each patch in separate
commits and then run git format-patch on that repository. I use git
format-patch @{upstream}, but there are other ways also. Then you can
use git rebase to rebase your patches periodically. If you are already
doing that, sorry for the noise.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#20Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru
In reply to: Ashutosh Bapat (#19)
Re: [HACKERS] PoC: full merge join on comparison clause

On 05.03.2018 08:30, Ashutosh Bapat wrote:

But creating such patches using git format-patch (with -v as some suggest) really
helps in general.

Thanks for the advice. I heard about this workflow, but never used it
myself. Perhaps it's time to try it.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#21Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alexander Kuzmenkov (#20)
#22Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#21)
#23Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru
In reply to: Ashutosh Bapat (#22)
#24Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alexander Kuzmenkov (#23)
#25Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru
In reply to: Ashutosh Bapat (#21)
#26Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Alexander Kuzmenkov (#25)
#27Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru
In reply to: Ashutosh Bapat (#26)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Kuzmenkov (#27)
#29Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru
In reply to: Tom Lane (#28)