d25ea01275 and partitionwise join

Started by Amit Langote over 6 years ago, 28 messages

Amit Langote

amitlangote09@gmail.com

over 6 years ago

2 attachment(s)

Hi Tom,

I think an assumption of d25ea01275 breaks partitionwise join. Sorry
it took me a while to report it.

In /messages/by-id/8168.1560446056@sss.pgh.pa.us, Tom wrote:

> I poked into this and found the cause. For the sample query, we have
> an EquivalenceClass containing the expression
>     COALESCE(COALESCE(Var_1_1, Var_2_1), Var_3_1)
> where each of the Vars belongs to an appendrel parent.
> add_child_rel_equivalences() needs to add expressions representing the
> transform of that to each child relation. That is, if the children
> of table 1 are A1 and A2, of table 2 are B1 and B2, and of table 3
> are C1 and C2, what we'd like to add are the expressions
>     COALESCE(COALESCE(Var_A1_1, Var_2_1), Var_3_1)
>     COALESCE(COALESCE(Var_A2_1, Var_2_1), Var_3_1)
>     COALESCE(COALESCE(Var_1_1, Var_B1_1), Var_3_1)
>     COALESCE(COALESCE(Var_1_1, Var_B2_1), Var_3_1)
>     COALESCE(COALESCE(Var_1_1, Var_2_1), Var_C1_1)
>     COALESCE(COALESCE(Var_1_1, Var_2_1), Var_C2_1)
> However, what it's actually producing is additional combinations for
> each appendrel after the first, because each call also mutates the
> previously-added child expressions. So in this example we also get
>     COALESCE(COALESCE(Var_A1_1, Var_B1_1), Var_3_1)
>     COALESCE(COALESCE(Var_A2_1, Var_B2_1), Var_3_1)
>     COALESCE(COALESCE(Var_A1_1, Var_2_1), Var_C1_1)
>     COALESCE(COALESCE(Var_A2_1, Var_2_1), Var_C2_1)
>     COALESCE(COALESCE(Var_A1_1, Var_B1_1), Var_C1_1)
>     COALESCE(COALESCE(Var_A2_1, Var_B2_1), Var_C2_1)
> With two appendrels involved, that's O(N^2) expressions; with
> three appendrels, more like O(N^3).
>
> This is by no means specific to FULL JOINs; you could get the same
> behavior with join clauses like "WHERE t1.a + t2.b + t3.c = t4.d".
>
> These extra expressions don't have any use, since we're not
> going to join the children directly to each other.

...unless partitionwise join thinks they can be joined. Partitionwise
join can't handle 3-way full joins today, but only because it is itself
broken when trying to match a full join clause to the partition key
when one side of the clause is a COALESCE expression. Consider this
example query:

-- p is defined as:
-- create table p (a int) partition by list (a);
-- create table p1 partition of p for values in (1);
-- create table p2 partition of p for values in (2);
explain select * from p t1 full outer join p t2 using (a) full outer
join p t3 using (a) full outer join p t4 using (a) order by 1;
QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Sort  (cost=16416733.32..16628145.85 rows=84565012 width=4)
   Sort Key: (COALESCE(COALESCE(COALESCE(t1.a, t2.a), t3.a), t4.a))
   ->  Merge Full Join  (cost=536957.40..1813748.77 rows=84565012 width=4)
         Merge Cond: (t4.a = (COALESCE(COALESCE(t1.a, t2.a), t3.a)))
         ->  Sort  (cost=410.57..423.32 rows=5100 width=4)
               Sort Key: t4.a
               ->  Append  (cost=0.00..96.50 rows=5100 width=4)
                     ->  Seq Scan on p1 t4  (cost=0.00..35.50 rows=2550 width=4)
                     ->  Seq Scan on p2 t4_1  (cost=0.00..35.50 rows=2550 width=4)
         ->  Materialize  (cost=536546.83..553128.21 rows=3316275 width=12)
               ->  Sort  (cost=536546.83..544837.52 rows=3316275 width=12)
                     Sort Key: (COALESCE(COALESCE(t1.a, t2.a), t3.a))
                     ->  Merge Full Join  (cost=14254.85..64024.48 rows=3316275 width=12)
                           Merge Cond: (t3.a = (COALESCE(t1.a, t2.a)))
                           ->  Sort  (cost=410.57..423.32 rows=5100 width=4)
                                 Sort Key: t3.a
                                 ->  Append  (cost=0.00..96.50 rows=5100 width=4)
                                       ->  Seq Scan on p1 t3  (cost=0.00..35.50 rows=2550 width=4)
                                       ->  Seq Scan on p2 t3_1  (cost=0.00..35.50 rows=2550 width=4)
                           ->  Sort  (cost=13844.29..14169.41 rows=130050 width=8)
                                 Sort Key: (COALESCE(t1.a, t2.a))
                                 ->  Merge Full Join  (cost=821.13..2797.38 rows=130050 width=8)
                                       Merge Cond: (t1.a = t2.a)
                                       ->  Sort  (cost=410.57..423.32 rows=5100 width=4)
                                             Sort Key: t1.a
                                             ->  Append  (cost=0.00..96.50 rows=5100 width=4)
                                                   ->  Seq Scan on p1 t1  (cost=0.00..35.50 rows=2550 width=4)
                                                   ->  Seq Scan on p2 t1_1  (cost=0.00..35.50 rows=2550 width=4)
                                       ->  Sort  (cost=410.57..423.32 rows=5100 width=4)
                                             Sort Key: t2.a
                                             ->  Append  (cost=0.00..96.50 rows=5100 width=4)
                                                   ->  Seq Scan on p1 t2  (cost=0.00..35.50 rows=2550 width=4)
                                                   ->  Seq Scan on p2 t2_1  (cost=0.00..35.50 rows=2550 width=4)

-- turn on enable_partitionwise_join
set enable_partitionwise_join to on;
explain select * from p t1 full outer join p t2 using (a) full outer
join p t3 using (a) full outer join p t4 using (a) order by 1;
QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Sort  (cost=16385259.94..16596672.47 rows=84565012 width=4)
   Sort Key: (COALESCE(COALESCE(COALESCE(t1.a, t2.a), t3.a), t4.a))
   ->  Merge Full Join  (cost=505484.02..1782275.39 rows=84565012 width=4)
         Merge Cond: (t4.a = (COALESCE(COALESCE(t1.a, t2.a), t3.a)))
         ->  Sort  (cost=410.57..423.32 rows=5100 width=4)
               Sort Key: t4.a
               ->  Append  (cost=0.00..96.50 rows=5100 width=4)
                     ->  Seq Scan on p1 t4  (cost=0.00..35.50 rows=2550 width=4)
                     ->  Seq Scan on p2 t4_1  (cost=0.00..35.50 rows=2550 width=4)
         ->  Materialize  (cost=505073.45..521654.83 rows=3316275 width=12)
               ->  Sort  (cost=505073.45..513364.14 rows=3316275 width=12)
                     Sort Key: (COALESCE(COALESCE(t1.a, t2.a), t3.a))
                     ->  Merge Full Join  (cost=7653.92..32551.10 rows=3316275 width=12)
                           Merge Cond: (t3.a = (COALESCE(t1.a, t2.a)))
                           ->  Sort  (cost=410.57..423.32 rows=5100 width=4)
                                 Sort Key: t3.a
                                 ->  Append  (cost=0.00..96.50 rows=5100 width=4)
                                       ->  Seq Scan on p1 t3  (cost=0.00..35.50 rows=2550 width=4)
                                       ->  Seq Scan on p2 t3_1  (cost=0.00..35.50 rows=2550 width=4)
                           ->  Sort  (cost=7243.35..7405.91 rows=65024 width=8)
                                 Sort Key: (COALESCE(t1.a, t2.a))
                                 ->  Result  (cost=359.57..2045.11 rows=65024 width=8)
                                       ->  Append  (cost=359.57..2045.11 rows=65024 width=8)
                                             ->  Merge Full Join  (cost=359.57..860.00 rows=32512 width=8)
                                                   Merge Cond: (t1.a = t2.a)
                                                   ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
                                                         Sort Key: t1.a
                                                         ->  Seq Scan on p1 t1  (cost=0.00..35.50 rows=2550 width=4)
                                                   ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
                                                         Sort Key: t2.a
                                                         ->  Seq Scan on p1 t2  (cost=0.00..35.50 rows=2550 width=4)
                                             ->  Merge Full Join  (cost=359.57..860.00 rows=32512 width=8)
                                                   Merge Cond: (t1_1.a = t2_1.a)
                                                   ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
                                                         Sort Key: t1_1.a
                                                         ->  Seq Scan on p2 t1_1  (cost=0.00..35.50 rows=2550 width=4)
                                                   ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
                                                         Sort Key: t2_1.a
                                                         ->  Seq Scan on p2 t2_1  (cost=0.00..35.50 rows=2550 width=4)

Note that it only manages to use partitionwise join for the 2-way join
and gives up at the 3-way join and higher, because the join condition
looks like this: t3.a = (COALESCE(t1.a, t2.a)). When building the join
relation (t1, t2, t3) between (t3) and (t1, t2), it fails to see that
COALESCE(t1.a, t2.a) actually matches the partition key of (t1, t2).
When I fix the code that does the matching and run with merge joins
disabled, I can get a plan where the whole 4-way join is partitioned:

explain select * from p t1 full outer join p t2 using (a) full outer
join p t3 using (a) full outer join p t4 using (a) order by 1;
QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────
 Gather Merge  (cost=831480.11..1859235.87 rows=8808720 width=4)
   Workers Planned: 2
   ->  Sort  (cost=830480.09..841490.99 rows=4404360 width=4)
         Sort Key: (COALESCE(COALESCE(COALESCE(t1.a, t2.a), t3.a), t4.a))
         ->  Parallel Append  (cost=202.12..224012.93 rows=4404360 width=4)
               ->  Hash Full Join  (cost=202.12..201991.13 rows=5285232 width=4)
                     Hash Cond: (COALESCE(COALESCE(t1.a, t2.a), t3.a) = t4.a)
                     ->  Hash Full Join  (cost=134.75..15904.32 rows=414528 width=12)
                           Hash Cond: (COALESCE(t1.a, t2.a) = t3.a)
                           ->  Hash Full Join  (cost=67.38..1247.18 rows=32512 width=8)
                                 Hash Cond: (t1.a = t2.a)
                                 ->  Seq Scan on p1 t1  (cost=0.00..35.50 rows=2550 width=4)
                                 ->  Hash  (cost=35.50..35.50 rows=2550 width=4)
                                       ->  Seq Scan on p1 t2  (cost=0.00..35.50 rows=2550 width=4)
                           ->  Hash  (cost=35.50..35.50 rows=2550 width=4)
                                 ->  Seq Scan on p1 t3  (cost=0.00..35.50 rows=2550 width=4)
                     ->  Hash  (cost=35.50..35.50 rows=2550 width=4)
                           ->  Seq Scan on p1 t4  (cost=0.00..35.50 rows=2550 width=4)
               ->  Hash Full Join  (cost=202.12..201991.13 rows=5285232 width=4)
                     Hash Cond: (COALESCE(COALESCE(t1_1.a, t2_1.a), t3_1.a) = t4_1.a)
                     ->  Hash Full Join  (cost=134.75..15904.32 rows=414528 width=12)
                           Hash Cond: (COALESCE(t1_1.a, t2_1.a) = t3_1.a)
                           ->  Hash Full Join  (cost=67.38..1247.18 rows=32512 width=8)
                                 Hash Cond: (t1_1.a = t2_1.a)
                                 ->  Seq Scan on p2 t1_1  (cost=0.00..35.50 rows=2550 width=4)
                                 ->  Hash  (cost=35.50..35.50 rows=2550 width=4)
                                       ->  Seq Scan on p2 t2_1  (cost=0.00..35.50 rows=2550 width=4)
                           ->  Hash  (cost=35.50..35.50 rows=2550 width=4)
                                 ->  Seq Scan on p2 t3_1  (cost=0.00..35.50 rows=2550 width=4)
                     ->  Hash  (cost=35.50..35.50 rows=2550 width=4)
                           ->  Seq Scan on p2 t4_1  (cost=0.00..35.50 rows=2550 width=4)
(31 rows)
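
For reference, the matching step that makes the plan above possible
(recognizing COALESCE(t1.a, t2.a) as a reference to the partition key of
(t1, t2)) can be illustrated with a small standalone sketch. This is not
the planner code: ToyExpr, toy_collect_leaves() and
toy_matches_nullable_key() are hypothetical names, expressions are
reduced to (relid, attno) pairs, and unlike extract_coalesce_args() in
the 0001 patch the sketch flattens nested COALESCEs recursively. Like
the patch, it treats any overlap between the flattened arguments and
the nullable keys as a match.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ARGS   2
#define TOY_MAX_LEAVES 16

/* Hypothetical stand-ins for the planner's Var and CoalesceExpr nodes. */
typedef enum { TOY_VAR, TOY_COALESCE } ToyKind;

typedef struct ToyExpr
{
    ToyKind     kind;
    int         relid;      /* TOY_VAR only */
    int         attno;      /* TOY_VAR only */
    int         nargs;      /* TOY_COALESCE only */
    const struct ToyExpr *args[TOY_MAX_ARGS];
} ToyExpr;

/* Append every non-COALESCE leaf of expr to out[]; returns the new count. */
int
toy_collect_leaves(const ToyExpr *expr, const ToyExpr **out, int n)
{
    if (expr->kind != TOY_COALESCE)
    {
        if (n < TOY_MAX_LEAVES)
            out[n++] = expr;
        return n;
    }
    for (int i = 0; i < expr->nargs; i++)
        n = toy_collect_leaves(expr->args[i], out, n);
    return n;
}

/* True if any leaf of expr is one of the given nullable partition keys. */
bool
toy_matches_nullable_key(const ToyExpr *expr,
                         const ToyExpr *const *keys, int nkeys)
{
    const ToyExpr *leaves[TOY_MAX_LEAVES];
    int         nleaves = toy_collect_leaves(expr, leaves, 0);

    for (int i = 0; i < nleaves; i++)
        for (int k = 0; k < nkeys; k++)
            if (leaves[i]->relid == keys[k]->relid &&
                leaves[i]->attno == keys[k]->attno)
                return true;
    return false;
}
```

With the nullable keys of the full join (t1, t2) being {t1.a, t2.a},
the expression COALESCE(t1.a, t2.a) flattens to exactly those two Vars
and therefore matches, whereas a bare t3.a does not.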

But with merge joins enabled:

explain select * from p t1 full outer join p t2 using (a) full outer
join p t3 using (a) full outer join p t4 using (a) order by 1;
ERROR: could not find pathkey item to sort

That's because there's no child COALESCE(t1_1.a, t2_1.a) expression
in the EC that contains COALESCE(t1.a, t2.a), where t1_1 and t2_1
represent the 1st partition of t1 and t2, respectively. The problem is
that add_child_rel_equivalences(), as of d25ea01275, only adds the
following child expressions of COALESCE(t1.a, t2.a):

-- when translating t1
COALESCE(t1_1.a, t2.a)
COALESCE(t1_2.a, t2.a)
-- when translating t2
COALESCE(t1.a, t2_1.a)
COALESCE(t1.a, t2_2.a)

whereas previously, the following would be added too when translating t2:

COALESCE(t1_1.a, t2_1.a)
COALESCE(t1_1.a, t2_2.a)
COALESCE(t1_2.a, t2_1.a)
COALESCE(t1_2.a, t2_2.a)

Note that of those, only COALESCE(t1_1.a, t2_1.a) and COALESCE(t1_2.a,
t2_2.a) are interesting, because partitionwise join will only ever
consider pairs (t1_1, t2_1) and (t1_2, t2_2) to be joined.
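
To make the two member counts above concrete, here is a toy simulation
of the translation loop. It is only a sketch under stated assumptions:
ToyMember and toy_count_child_members() are hypothetical, each EC member
is reduced to an array recording which parent or child every slot
references, and the behavior as of d25ea01275 is modeled simply as "do
not re-translate a member that is already a child translation".

```c
#include <stdbool.h>

#define TOY_MAX_RELS    4
#define TOY_MAX_MEMBERS 1024

/*
 * One toy EC member: slot[r] is 0 if the expression still references
 * parent rel r, or c (1..nparts) if it references that parent's c'th child.
 */
typedef struct ToyMember
{
    int         slot[TOY_MAX_RELS];
    bool        is_child;
} ToyMember;

/*
 * Returns the number of child members added to an EC that starts with a
 * single expression over nrels parents, or -1 if the toy limits are hit.
 */
int
toy_count_child_members(int nrels, int nparts, bool skip_child_members)
{
    static ToyMember members[TOY_MAX_MEMBERS];
    int         nmembers = 1;

    if (nrels < 1 || nrels > TOY_MAX_RELS || nparts < 1)
        return -1;

    /* members[0] is the parent expression: every slot references a parent */
    for (int r = 0; r < TOY_MAX_RELS; r++)
        members[0].slot[r] = 0;
    members[0].is_child = false;

    for (int r = 0; r < nrels; r++)
    {
        for (int c = 1; c <= nparts; c++)
        {
            int         nbefore = nmembers; /* scan pre-existing members only */

            for (int m = 0; m < nbefore; m++)
            {
                if (members[m].slot[r] != 0)
                    continue;   /* does not reference parent r */
                if (skip_child_members && members[m].is_child)
                    continue;   /* do not re-translate a child member */
                if (nmembers >= TOY_MAX_MEMBERS)
                    return -1;
                members[nmembers] = members[m];
                members[nmembers].slot[r] = c;
                members[nmembers].is_child = true;
                nmembers++;
            }
        }
    }
    return nmembers - 1;
}
```

For two parents with two partitions each, the call with
skip_child_members = true yields the 4 single-relation translations
listed first, and the call with skip_child_members = false yields 8,
that is, those 4 plus the 4 cross-product members. Only 2 of the latter
are of any use to partitionwise join.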

We can get the needed child expressions and still avoid the
combinatorial explosion in the size of the resulting EC members list if
we teach add_child_rel_equivalences() to only translate ECs that the
input parent relation is capable of producing. So, COALESCE(t1.a,
t2.a) will not be translated if the input relation is only (t1) or
(t2), that is, when called from set_append_rel_size(). Instead, it
will be translated when the function is passed the joinrel (t1, t2).
IOW, teach build_child_join_rel() to call add_child_rel_equivalences(),
which I've tried to implement in the attached.
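
One possible reading of "capable of producing" is a subset test on
relids, sketched below. This is an illustration only and not
necessarily what the 0002 patch does: ToyRelids, toy_is_subset() and
toy_should_translate() are hypothetical names, and relid sets are
modeled as plain bitmasks rather than the planner's Bitmapset.

```c
#include <stdbool.h>

/* Toy relid sets as bitmasks: bit i set means rel i is in the set. */
typedef unsigned int ToyRelids;

/* True if every rel in 'sub' is also in 'super'. */
bool
toy_is_subset(ToyRelids sub, ToyRelids super)
{
    return (sub & ~super) == 0;
}

/*
 * Hypothetical rule: translate an EC member for a parent rel (base or
 * join) only if that rel covers every relation the member references.
 */
bool
toy_should_translate(ToyRelids em_relids, ToyRelids parent_relids)
{
    return em_relids != 0 && toy_is_subset(em_relids, parent_relids);
}
```

Under this rule, COALESCE(t1.a, t2.a), whose relids are {t1, t2}, is
skipped when the parent is the base rel (t1) or (t2) and is translated
only when the parent is the joinrel (t1, t2), at which point both Vars
are replaced together by those of the matching child join.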

I have attached two patches.

0001 - fix partitionwise join to work correctly with n-way joins of
which some are full joins (+ cosmetic improvements around the code
that was touched)
0002 - fix to translate multi-relation EC members correctly

Thanks,
Amit

Attachments:

0001-Fix-partitionwise-join-code-to-handle-FULL-OUTER-JOI.patch (application/octet-stream)

From 1ce7db976f10cf5b2d515ffd726fac8936e55b4a Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 25 Jun 2019 10:18:43 +0900
Subject: [PATCH 1/2] Fix partitionwise join code to handle FULL OUTER JOIN
 correctly

---
 src/backend/optimizer/path/joinrels.c        | 108 +++++++++++----
 src/backend/optimizer/util/plancat.c         |  20 +--
 src/backend/optimizer/util/relnode.c         |  92 +++++++-----
 src/include/nodes/pathnodes.h                |  36 +++--
 src/test/regress/expected/partition_join.out | 200 +++++++++++++++++++++++++++
 src/test/regress/sql/partition_join.sql      |  29 ++++
 6 files changed, 408 insertions(+), 77 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 43c3b7ea48..0b9e61a5cd 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -45,8 +45,9 @@ static void try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1,
 static SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root,
 												SpecialJoinInfo *parent_sjinfo,
 												Relids left_relids, Relids right_relids);
-static int	match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
+static int	match_join_arg_to_partition_keys(Expr *expr, RelOptInfo *rel,
 										 bool strict_op);
+static List *extract_coalesce_args(Expr *expr);
 
 
 /*
@@ -1557,8 +1558,10 @@ build_child_join_sjinfo(PlannerInfo *root, SpecialJoinInfo *parent_sjinfo,
 }
 
 /*
- * Returns true if there exists an equi-join condition for each pair of
- * partition keys from given relations being joined.
+ * have_partkey_equi_join
+ *		Returns true if there exist equi-join conditions involving pairs
+ *		of matching partition keys of the relations being joined for all
+ *		partition keys
  */
 bool
 have_partkey_equi_join(RelOptInfo *joinrel,
@@ -1631,10 +1634,10 @@ have_partkey_equi_join(RelOptInfo *joinrel,
 		 * Only clauses referencing the partition keys are useful for
 		 * partitionwise join.
 		 */
-		ipk1 = match_expr_to_partition_keys(expr1, rel1, strict_op);
+		ipk1 = match_join_arg_to_partition_keys(expr1, rel1, strict_op);
 		if (ipk1 < 0)
 			continue;
-		ipk2 = match_expr_to_partition_keys(expr2, rel2, strict_op);
+		ipk2 = match_join_arg_to_partition_keys(expr2, rel2, strict_op);
 		if (ipk2 < 0)
 			continue;
 
@@ -1674,13 +1677,19 @@ have_partkey_equi_join(RelOptInfo *joinrel,
 }
 
 /*
- * Find the partition key from the given relation matching the given
- * expression. If found, return the index of the partition key, else return -1.
+ * match_join_arg_to_partition_keys
+ *		Tries to match a join clause argument expression to one of the nullable
+ *		or non-nullable partition keys and if a match is found, returns the
+ *		matched key's ordinal position; -1 is returned if the expression
+ *		doesn't match any of the keys or if strict_op being false prevents
+ *		nullable keys from being matched
  */
 static int
-match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
+match_join_arg_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
 {
 	int			cnt;
+	int			matched = -1;
+	List	   *nullable_exprs;
 
 	/* This function should be called only for partitioned relations. */
 	Assert(rel->part_scheme);
@@ -1689,34 +1698,85 @@ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
 	while (IsA(expr, RelabelType))
 		expr = (Expr *) (castNode(RelabelType, expr))->arg;
 
+	/*
+	 * Extract the arguments from possibly nested COALESCE expressions.  Each
+	 * of these arguments could be null when joining, so these expressions are
+	 * called as such and are to be matched only with the nullable partition
+	 * keys.
+	 */
+	if (IsA(expr, CoalesceExpr))
+		nullable_exprs = extract_coalesce_args(expr);
+	else
+		/*
+		 * expr may or may not be nullable but add to the list anyway to
+		 * simplify the coding below.
+		 */
+		nullable_exprs = list_make1(expr);
+
 	for (cnt = 0; cnt < rel->part_scheme->partnatts; cnt++)
 	{
-		ListCell   *lc;
-
 		Assert(rel->partexprs);
-		foreach(lc, rel->partexprs[cnt])
+
+		/* Is the expression one of the non-nullable partition keys? */
+		if (list_member(rel->partexprs[cnt], expr))
 		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
+			matched = cnt;
+			break;
 		}
 
+		/*
+		 * Nope, so check if it is one of the nullable keys.  Allowing
+		 * nullable keys won't work if the join operator is not strict,
+		 * because null partition keys may then join with rows from other
+		 * partitions.  XXX - would that ever be true if the operator is
+		 * already determined to be mergejoin- and hashjoin-able?
+		 */
 		if (!strict_op)
 			continue;
 
-		/*
-		 * If it's a strict equi-join a NULL partition key on one side will
-		 * not join a NULL partition key on the other side. So, rows with NULL
-		 * partition key from a partition on one side can not join with those
-		 * from a non-matching partition on the other side. So, search the
-		 * nullable partition keys as well.
-		 */
+		/* OK to match with nullable keys. */
 		Assert(rel->nullable_partexprs);
-		foreach(lc, rel->nullable_partexprs[cnt])
+		if (list_intersection(rel->nullable_partexprs[cnt],
+							  nullable_exprs) != NIL)
 		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
+			matched = cnt;
+			break;
 		}
 	}
 
-	return -1;
+	Assert(list_length(nullable_exprs) >= 1);
+	list_free(nullable_exprs);
+
+	return matched;
+}
+
+/*
+ * extract_coalesce_args
+ *		Extract all arguments from arbitrarily nested CoalesceExpr's
+ *
+ * Note: caller should free the List structure when done using it.
+ */
+static List *
+extract_coalesce_args(Expr *expr)
+{
+	List   *coalesce_args = NIL;
+
+	while (expr && IsA(expr, CoalesceExpr))
+	{
+		CoalesceExpr *cexpr = (CoalesceExpr *) expr;
+		ListCell *lc;
+
+		expr = NULL;
+		foreach(lc, cexpr->args)
+		{
+			if (IsA(lfirst(lc), CoalesceExpr))
+				expr = lfirst(lc);
+			else
+				coalesce_args = lappend(coalesce_args, lfirst(lc));
+		}
+
+		Assert(expr == NULL || IsA(expr, CoalesceExpr));
+	}
+
+	return coalesce_args;
 }
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 40f497660d..e58bace542 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -2254,9 +2254,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
 /*
  * set_baserel_partition_key_exprs
  *
- * Builds partition key expressions for the given base relation and sets them
- * in given RelOptInfo.  Any single column partition keys are converted to Var
- * nodes.  All Var nodes are restamped with the relid of given relation.
+ * Builds partition key expressions for the given base relation and sets
+ * rel->partexprs.
  */
 static void
 set_baserel_partition_key_exprs(Relation relation,
@@ -2304,16 +2303,19 @@ set_baserel_partition_key_exprs(Relation relation,
 			lc = lnext(lc);
 		}
 
+		/* Base relations have a single expression per key. */
 		partexprs[cnt] = list_make1(partexpr);
 	}
 
+	/*
+	 * For base relations, we assume that the partition keys are non-nullable,
+	 * although they are nullable in principle; list and hash partitioned
+	 * tables may contain nulls in the partition key(s), for example.
+	 * Assuming non-nullability is okay for the considerations of partition
+	 * pruning, because pruning is never performed with non-strict operators.
+	 */
 	rel->partexprs = partexprs;
 
-	/*
-	 * A base relation can not have nullable partition key expressions. We
-	 * still allocate array of empty expressions lists to keep partition key
-	 * expression handling code simple. See build_joinrel_partition_info() and
-	 * match_expr_to_partition_keys().
-	 */
+	/* Assigning NIL for each key means there are no nullable keys. */
 	rel->nullable_partexprs = (List **) palloc0(sizeof(List *) * partnatts);
 }
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 6054bd2b53..80de20f13d 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -58,6 +58,9 @@ static void add_join_rel(PlannerInfo *root, RelOptInfo *joinrel);
 static void build_joinrel_partition_info(RelOptInfo *joinrel,
 										 RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 										 List *restrictlist, JoinType jointype);
+static void set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
+								RelOptInfo *outer_rel, RelOptInfo *inner_rel,
+								JoinType jointype);
 static void build_child_join_reltarget(PlannerInfo *root,
 									   RelOptInfo *parentrel,
 									   RelOptInfo *childrel,
@@ -1591,18 +1594,18 @@ find_param_path_info(RelOptInfo *rel, Relids required_outer)
 
 /*
  * build_joinrel_partition_info
- *		If the two relations have same partitioning scheme, their join may be
- *		partitioned and will follow the same partitioning scheme as the joining
- *		relations. Set the partition scheme and partition key expressions in
- *		the join relation.
+ *		Checks if the two relations being joined can use partitionwise join
+ *		and if yes, initialize partitioning information of the resulting
+ *		partitioned relation
+ *
+ * This will set part_scheme and partition key expressions (partexprs and
+ * nullable_partexprs) if required.
  */
 static void
 build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 							 RelOptInfo *inner_rel, List *restrictlist,
 							 JoinType jointype)
 {
-	int			partnatts;
-	int			cnt;
 	PartitionScheme part_scheme;
 
 	/* Nothing to do if partitionwise join technique is disabled. */
@@ -1669,11 +1672,8 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 	 */
 	joinrel->part_scheme = part_scheme;
 	joinrel->boundinfo = outer_rel->boundinfo;
-	partnatts = joinrel->part_scheme->partnatts;
-	joinrel->partexprs = (List **) palloc0(sizeof(List *) * partnatts);
-	joinrel->nullable_partexprs =
-		(List **) palloc0(sizeof(List *) * partnatts);
 	joinrel->nparts = outer_rel->nparts;
+	set_joinrel_partition_key_exprs(joinrel, outer_rel, inner_rel, jointype);
 	joinrel->part_rels =
 		(RelOptInfo **) palloc0(sizeof(RelOptInfo *) * joinrel->nparts);
 
@@ -1683,32 +1683,31 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 	Assert(outer_rel->consider_partitionwise_join);
 	Assert(inner_rel->consider_partitionwise_join);
 	joinrel->consider_partitionwise_join = true;
+}
+
+/*
+ * set_joinrel_partition_key_exprs
+ *		Initialize partition key expressions
+ */
+static void
+set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
+								RelOptInfo *outer_rel, RelOptInfo *inner_rel,
+								JoinType jointype)
+{
+	int		partnatts;
+	int		cnt;
+
+	Assert(joinrel->part_scheme != NULL);
+
+	partnatts = joinrel->part_scheme->partnatts;
+	joinrel->partexprs = (List **) palloc0(sizeof(List *) * partnatts);
+	joinrel->nullable_partexprs =
+		(List **) palloc0(sizeof(List *) * partnatts);
 
 	/*
-	 * Construct partition keys for the join.
-	 *
-	 * An INNER join between two partitioned relations can be regarded as
-	 * partitioned by either key expression.  For example, A INNER JOIN B ON
-	 * A.a = B.b can be regarded as partitioned on A.a or on B.b; they are
-	 * equivalent.
-	 *
-	 * For a SEMI or ANTI join, the result can only be regarded as being
-	 * partitioned in the same manner as the outer side, since the inner
-	 * columns are not retained.
-	 *
-	 * An OUTER join like (A LEFT JOIN B ON A.a = B.b) may produce rows with
-	 * B.b NULL. These rows may not fit the partitioning conditions imposed on
-	 * B.b. Hence, strictly speaking, the join is not partitioned by B.b and
-	 * thus partition keys of an OUTER join should include partition key
-	 * expressions from the OUTER side only.  However, because all
-	 * commonly-used comparison operators are strict, the presence of nulls on
-	 * the outer side doesn't cause any problem; they can't match anything at
-	 * future join levels anyway.  Therefore, we track two sets of
-	 * expressions: those that authentically partition the relation
-	 * (partexprs) and those that partition the relation with the exception
-	 * that extra nulls may be present (nullable_partexprs).  When the
-	 * comparison operator is strict, the latter is just as good as the
-	 * former.
+	 * Join type determines which partition keys are assumed by the resulting
+	 * join relation.  Note that these keys are to be considered when checking
+	 * if any further joins involving this joinrel may be partitioned.
 	 */
 	for (cnt = 0; cnt < partnatts; cnt++)
 	{
@@ -1726,18 +1725,37 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 
 		switch (jointype)
 		{
+			/*
+			 * Join relation resulting from an INNER join may be regarded as
+			 * partitioned by either of inner and outer relation keys.  For
+			 * example, A INNER JOIN B ON A.a = B.b can be regarded as
+			 * partitioned on either A.a or B.b.
+			 */
 			case JOIN_INNER:
 				partexpr = list_concat(outer_expr, inner_expr);
 				nullable_partexpr = list_concat(outer_null_expr,
 												inner_null_expr);
 				break;
 
+			/*
+			 * Join relation resulting from a SEMI or ANTI join may be
+			 * regarded as partitioned on the outer relation keys, since the
+			 * inner columns are omitted from the output.
+			 */
 			case JOIN_SEMI:
 			case JOIN_ANTI:
 				partexpr = outer_expr;
 				nullable_partexpr = outer_null_expr;
 				break;
 
+			/*
+			 * Join relation resulting from a LEFT OUTER JOIN likewise may be
+			 * regarded as partitioned on the (non-nullable) outer relation
+			 * keys.  The nullability of inner relation keys prevents them from
+			 * being considered partition keys of the join relation in all cases,
+			 * but they are okay as partition keys for further joins that
+			 * involve strict join operators.
+			 */
 			case JOIN_LEFT:
 				partexpr = outer_expr;
 				nullable_partexpr = list_concat(inner_expr,
@@ -1746,6 +1764,12 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 												inner_null_expr);
 				break;
 
+			/*
+			 * For FULL OUTER JOINs, both relations are nullable, so the
+			 * resulting join relation may be regarded as partitioned on
+			 * either of inner and outer relation keys, but only for joins
+			 * that involve strict join operators.
+			 */
 			case JOIN_FULL:
 				nullable_partexpr = list_concat(outer_expr,
 												inner_expr);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 441e64eca9..3648fc8d3c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -582,16 +582,32 @@ typedef struct PartitionSchemeData *PartitionScheme;
  *								 this relation that are partitioned tables
  *								 themselves, in hierarchical order
  *
- * Note: A base relation always has only one set of partition keys, but a join
- * relation may have as many sets of partition keys as the number of relations
- * being joined. partexprs and nullable_partexprs are arrays containing
- * part_scheme->partnatts elements each. Each of these elements is a list of
- * partition key expressions.  For a base relation each list in partexprs
- * contains only one expression and nullable_partexprs is not populated. For a
- * join relation, partexprs and nullable_partexprs contain partition key
- * expressions from non-nullable and nullable relations resp. Lists at any
- * given position in those arrays together contain as many elements as the
- * number of joining relations.
+ * Notes on partition key expressions (partexprs and nullable_partexprs):
+ *
+ * Partition key expressions will be used to spot references to the partition
+ * keys of the relation in the expressions of a given query so as to apply
+ * various partitioning-based optimizations to certain query constructs.  For
+ * example, pruning unnecessary partitions of a table using baserestrictinfo
+ * clauses that contain partition keys, converting a join between two
+ * partitioned relations into a series of joins between pairs of their
+ * constituent partitions if the joined rows follow the same partitioning
+ * as the relations being joined.
+ *
+ * The partexprs and nullable_partexprs arrays each contain
+ * part_scheme->partnatts elements.  Each of the elements is a list of
+ * partition key expressions.  For partitioned *base* relations, there is one
+ * expression in every list, whereas for partitioned *join* relations, there
+ * can be as many as the number of component relations.
+ *
+ * nullable_partexprs are populated only in partitioned *join* relations,
+ * that is, if any of their component relations are nullable due to OUTER JOIN
+ * considerations.  It contains only the expressions of the nullable component
+ * relations, while those of the non-nullable relations are present in the
+ * partexprs.  For the considerations of partitionwise join, nullable partition
+ * keys can be considered to partition the underlying relation in the same
+ * manner as the non-nullable partition keys do, as long as the join operator
+ * is strict, because those null-valued keys can't be joined further, thus
+ * preserving the partitioning.
  *----------
  */
 typedef enum RelOptKind
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 1296edcdae..885f754f10 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -2003,3 +2003,203 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b =
                            Filter: (b = 0)
 (16 rows)
 
+-- N-way join consisting of 2 or more full joins
+DROP TABLE prt1_n_p2;
+CREATE TABLE prt1_n_p2 PARTITION OF prt1_n FOR VALUES FROM ('0250') TO ('0500') PARTITION BY RANGE (c);
+CREATE TABLE prt1_n_p2_1 PARTITION OF prt1_n_p2 FOR VALUES FROM ('0250') TO ('0350');
+CREATE TABLE prt1_n_p2_2 PARTITION OF prt1_n_p2 FOR VALUES FROM ('0350') TO ('0500');
+INSERT INTO prt1_n SELECT i, i, to_char(i, 'FM0000') FROM generate_series(250, 499, 2) i;
+ANALYZE prt1_n;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+                                              QUERY PLAN                                              
+------------------------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(COALESCE(t1.c, t2.c), t3.c), t4.c))
+         ->  Append
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(COALESCE(t1.c, t2.c), t3.c))::text = (t4.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(t1.c, t2.c))::text = (t3.c)::text)
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1.c)::text = (t2.c)::text)
+                                 ->  Seq Scan on prt1_n_p1 t1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p1 t2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p1 t3
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(COALESCE(t1_1.c, t2_1.c), t3_1.c))::text = (t4_1.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(t1_1.c, t2_1.c))::text = (t3_1.c)::text)
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_1 t1_1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_1 t2_1
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(COALESCE(t1_2.c, t2_2.c), t3_2.c))::text = (t4_2.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(t1_2.c, t2_2.c))::text = (t3_2.c)::text)
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_2 t1_2
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_2 t2_2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(43 rows)
+
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(t1.c, t3.c), t4.c))
+         ->  Append
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1.c, t3.c))::text = (t4.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1.c)::text = (t3.c)::text)
+                           ->  Hash Left Join
+                                 Hash Cond: ((t1.c)::text = (t2.c)::text)
+                                 ->  Seq Scan on prt1_n_p1 t1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p1 t2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p1 t3
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1_1.c, t3_1.c))::text = (t4_1.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1_1.c)::text = (t3_1.c)::text)
+                           ->  Hash Left Join
+                                 Hash Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_1 t1_1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_1 t2_1
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1_2.c, t3_2.c))::text = (t4_2.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1_2.c)::text = (t3_2.c)::text)
+                           ->  Hash Left Join
+                                 Hash Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_2 t1_2
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_2 t2_2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(43 rows)
+
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(t1.c, t3.c), t4.c))
+         ->  Append
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1.c, t3.c))::text = (t4.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1.c)::text = (t3.c)::text)
+                           ->  Hash Join
+                                 Hash Cond: ((t1.c)::text = (t2.c)::text)
+                                 ->  Seq Scan on prt1_n_p1 t1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p1 t2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p1 t3
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1_1.c, t3_1.c))::text = (t4_1.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1_1.c)::text = (t3_1.c)::text)
+                           ->  Hash Join
+                                 Hash Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_1 t1_1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_1 t2_1
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1_2.c, t3_2.c))::text = (t4_2.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1_2.c)::text = (t3_2.c)::text)
+                           ->  Hash Join
+                                 Hash Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_2 t1_2
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_2 t2_2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(43 rows)
+
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
+SET enable_hashjoin TO off;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index db9a6b4a96..97ec983cec 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -270,6 +270,7 @@ EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
 
+
 -- test default partition behavior for range
 ALTER TABLE prt1 DETACH PARTITION prt1_p3;
 ALTER TABLE prt1 ATTACH PARTITION prt1_p3 DEFAULT;
@@ -435,3 +436,31 @@ ANALYZE prt2;
 
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b = 0 ORDER BY t1.a, t2.b;
+
+-- N-way join consisting of 2 or more full joins
+DROP TABLE prt1_n_p2;
+CREATE TABLE prt1_n_p2 PARTITION OF prt1_n FOR VALUES FROM ('0250') TO ('0500') PARTITION BY RANGE (c);
+CREATE TABLE prt1_n_p2_1 PARTITION OF prt1_n_p2 FOR VALUES FROM ('0250') TO ('0350');
+CREATE TABLE prt1_n_p2_2 PARTITION OF prt1_n_p2 FOR VALUES FROM ('0350') TO ('0500');
+INSERT INTO prt1_n SELECT i, i, to_char(i, 'FM0000') FROM generate_series(250, 499, 2) i;
+ANALYZE prt1_n;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SET enable_hashjoin TO off;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-- 
2.11.0

Attachment: 0002-Translate-multi-relation-EC-members-in-a-separate-pa.patch (application/octet-stream)

From 9984fc3398e8e27626c356cbd9dc13d9d63a13e2 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 27 Jun 2019 14:23:42 +0900
Subject: [PATCH 2/2] Translate multi-relation EC members in a separate pass

It's not correct to translate a given multi-relation EC member when
not all parent-child relation pairs (in the form AppendRelInfos) are
available for translation.

For example, EC expressions that can only be emitted by joinrels
should only be translated if their corresponding child joinrels
are present, that is, only if using partitionwise join.
---
 src/backend/optimizer/path/allpaths.c        |   2 +-
 src/backend/optimizer/path/equivclass.c      |  51 ++++--
 src/backend/optimizer/util/relnode.c         |  14 +-
 src/include/optimizer/paths.h                |   3 +-
 src/test/regress/expected/partition_join.out | 243 ++++++++++++++++++++++++++-
 5 files changed, 285 insertions(+), 28 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b7723481b0..0a95e7221d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1067,7 +1067,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		 * paths that produce those sort orderings).
 		 */
 		if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
-			add_child_rel_equivalences(root, appinfo, rel, childrel);
+			add_child_rel_equivalences(root, &appinfo, 1, rel, childrel);
 		childrel->has_eclass_joins = rel->has_eclass_joins;
 
 		/*
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 688d9b0707..bbeaf5b17e 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2109,12 +2109,15 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
  * Note that this function won't be called at all unless we have at least some
  * reason to believe that the EC members it generates will be useful.
  *
- * parent_rel and child_rel could be derived from appinfo, but since the
+ * parent_rel and child_rel could be derived from appinfos, but since the
  * caller has already computed them, we might as well just pass them in.
+ * Note that parent_rel and child_rel are either BASEREL and OTHER_MEMBER_REL,
+ * respectively, or JOINREL and OTHER_JOINREL.
  */
 void
 add_child_rel_equivalences(PlannerInfo *root,
-						   AppendRelInfo *appinfo,
+						   AppendRelInfo **appinfos,
+						   int		nappinfos,
 						   RelOptInfo *parent_rel,
 						   RelOptInfo *child_rel)
 {
@@ -2123,6 +2126,7 @@ add_child_rel_equivalences(PlannerInfo *root,
 	foreach(lc1, root->eq_classes)
 	{
 		EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
+		EquivalenceMember *first_em;
 		ListCell   *lc2;
 
 		/*
@@ -2133,11 +2137,23 @@ add_child_rel_equivalences(PlannerInfo *root,
 		if (cur_ec->ec_has_volatile)
 			continue;
 
+		first_em = (EquivalenceMember *) linitial(cur_ec->ec_members);
+
+		/*
+		 * Single-relation EC members would already have been translated
+		 * by the time we begin looking for multi-relation EC members, so
+		 * no need to consider those.  Note that looking at the first
+		 * element is enough, because the rest should look the same.
+		 */
+		if (bms_membership(parent_rel->relids) == BMS_MULTIPLE &&
+			bms_membership(first_em->em_relids) == BMS_SINGLETON)
+			continue;
+
 		/*
 		 * No point in searching if child's topmost parent rel is not
 		 * mentioned in eclass.
 		 */
-		if (!bms_is_subset(child_rel->top_parent_relids, cur_ec->ec_relids))
+		if (!bms_overlap(child_rel->top_parent_relids, cur_ec->ec_relids))
 			continue;
 
 		foreach(lc2, cur_ec->ec_members)
@@ -2147,34 +2163,31 @@ add_child_rel_equivalences(PlannerInfo *root,
 			if (cur_em->em_is_const)
 				continue;		/* ignore consts here */
 
-			/*
-			 * We consider only original EC members here, not
-			 * already-transformed child members.  Otherwise, if some original
-			 * member expression references more than one appendrel, we'd get
-			 * an O(N^2) explosion of useless derived expressions for
-			 * combinations of children.
-			 */
-			if (cur_em->em_is_child)
-				continue;		/* ignore children here */
-
 			/* Does this member reference child's topmost parent rel? */
-			if (bms_overlap(cur_em->em_relids, child_rel->top_parent_relids))
+			if (bms_is_subset(cur_em->em_relids, child_rel->top_parent_relids))
 			{
 				/* Yes, generate transformed child version */
 				Expr	   *child_expr;
 				Relids		new_relids;
 				Relids		new_nullable_relids;
 
-				if (parent_rel->reloptkind == RELOPT_BASEREL)
+				/*
+				 * If the parent_rel is itself the topmost parent rel, transform
+				 * directly.
+				 */
+				if (parent_rel->reloptkind == RELOPT_BASEREL ||
+					parent_rel->reloptkind == RELOPT_JOINREL)
 				{
 					/* Simple single-level transformation */
 					child_expr = (Expr *)
 						adjust_appendrel_attrs(root,
 											   (Node *) cur_em->em_expr,
-											   1, &appinfo);
+											   nappinfos, appinfos);
 				}
 				else
 				{
+					Assert(parent_rel->reloptkind == RELOPT_OTHER_MEMBER_REL ||
+						   parent_rel->reloptkind == RELOPT_OTHER_JOINREL);
 					/* Must do multi-level transformation */
 					child_expr = (Expr *)
 						adjust_appendrel_attrs_multilevel(root,
@@ -2210,6 +2223,12 @@ add_child_rel_equivalences(PlannerInfo *root,
 				(void) add_eq_member(cur_ec, child_expr,
 									 new_relids, new_nullable_relids,
 									 true, cur_em->em_datatype);
+
+				/*
+				 * There aren't going to be more expressions to translate in
+				 * the same EC.
+				 */
+				break;
 			}
 		}
 	}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 80de20f13d..ac914f3ae1 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -851,6 +851,16 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 														(Node *) parent_joinrel->joininfo,
 														nappinfos,
 														appinfos);
+
+	/*
+	 * If the parent joinrel has pending equivalence classes, so does the
+	 * child.
+	 */
+	if (parent_joinrel->has_eclass_joins ||
+		has_useful_pathkeys(root, parent_joinrel))
+			add_child_rel_equivalences(root, appinfos, nappinfos,
+									   parent_joinrel, joinrel);
+
 	pfree(appinfos);
 
 	/*
@@ -863,10 +873,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->direct_lateral_relids = (Relids) bms_copy(parent_joinrel->direct_lateral_relids);
 	joinrel->lateral_relids = (Relids) bms_copy(parent_joinrel->lateral_relids);
 
-	/*
-	 * If the parent joinrel has pending equivalence classes, so does the
-	 * child.
-	 */
 	joinrel->has_eclass_joins = parent_joinrel->has_eclass_joins;
 
 	/* Is the join between partitions itself partitioned? */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7345137d1d..832cc84bb4 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -150,7 +150,8 @@ extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
 														   ForeignKeyOptInfo *fkinfo,
 														   int colno);
 extern void add_child_rel_equivalences(PlannerInfo *root,
-									   AppendRelInfo *appinfo,
+									   AppendRelInfo **appinfos,
+									   int		nappinfos,
 									   RelOptInfo *parent_rel,
 									   RelOptInfo *child_rel);
 extern List *generate_implied_equalities_for_column(PlannerInfo *root,
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 885f754f10..a952d059c5 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -2190,16 +2190,247 @@ SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING
 SET enable_hashjoin TO off;
 EXPLAIN (COSTS OFF)
 SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+                                               QUERY PLAN                                                
+---------------------------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(COALESCE(t1.c, t2.c), t3.c), t4.c))
+         ->  Append
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(COALESCE(t1.c, t2.c), t3.c))::text) = (t4.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(COALESCE(t1.c, t2.c), t3.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: (((COALESCE(t1.c, t2.c))::text) = (t3.c)::text)
+                                 ->  Sort
+                                       Sort Key: ((COALESCE(t1.c, t2.c))::text)
+                                       ->  Merge Full Join
+                                             Merge Cond: ((t1.c)::text = (t2.c)::text)
+                                             ->  Sort
+                                                   Sort Key: t1.c
+                                                   ->  Seq Scan on prt1_n_p1 t1
+                                             ->  Sort
+                                                   Sort Key: t2.c
+                                                   ->  Seq Scan on prt1_n_p1 t2
+                                 ->  Sort
+                                       Sort Key: t3.c
+                                       ->  Seq Scan on prt1_n_p1 t3
+                     ->  Sort
+                           Sort Key: t4.c
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(COALESCE(t1_1.c, t2_1.c), t3_1.c))::text) = (t4_1.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(COALESCE(t1_1.c, t2_1.c), t3_1.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: (((COALESCE(t1_1.c, t2_1.c))::text) = (t3_1.c)::text)
+                                 ->  Sort
+                                       Sort Key: ((COALESCE(t1_1.c, t2_1.c))::text)
+                                       ->  Merge Full Join
+                                             Merge Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                             ->  Sort
+                                                   Sort Key: t1_1.c
+                                                   ->  Seq Scan on prt1_n_p2_1 t1_1
+                                             ->  Sort
+                                                   Sort Key: t2_1.c
+                                                   ->  Seq Scan on prt1_n_p2_1 t2_1
+                                 ->  Sort
+                                       Sort Key: t3_1.c
+                                       ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Sort
+                           Sort Key: t4_1.c
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(COALESCE(t1_2.c, t2_2.c), t3_2.c))::text) = (t4_2.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(COALESCE(t1_2.c, t2_2.c), t3_2.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: (((COALESCE(t1_2.c, t2_2.c))::text) = (t3_2.c)::text)
+                                 ->  Sort
+                                       Sort Key: ((COALESCE(t1_2.c, t2_2.c))::text)
+                                       ->  Merge Full Join
+                                             Merge Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                             ->  Sort
+                                                   Sort Key: t1_2.c
+                                                   ->  Seq Scan on prt1_n_p2_2 t1_2
+                                             ->  Sort
+                                                   Sort Key: t2_2.c
+                                                   ->  Seq Scan on prt1_n_p2_2 t2_2
+                                 ->  Sort
+                                       Sort Key: t3_2.c
+                                       ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Sort
+                           Sort Key: t4_2.c
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(70 rows)
+
 SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
 EXPLAIN (COSTS OFF)
 SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(t1.c, t3.c), t4.c))
+         ->  Append
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1.c, t3.c))::text) = (t4.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1.c, t3.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1.c)::text = (t3.c)::text)
+                                 ->  Merge Left Join
+                                       Merge Cond: ((t1.c)::text = (t2.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1.c
+                                             ->  Seq Scan on prt1_n_p1 t1
+                                       ->  Sort
+                                             Sort Key: t2.c
+                                             ->  Seq Scan on prt1_n_p1 t2
+                                 ->  Sort
+                                       Sort Key: t3.c
+                                       ->  Seq Scan on prt1_n_p1 t3
+                     ->  Sort
+                           Sort Key: t4.c
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1_1.c, t3_1.c))::text) = (t4_1.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1_1.c, t3_1.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1_1.c)::text = (t3_1.c)::text)
+                                 ->  Merge Left Join
+                                       Merge Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1_1.c
+                                             ->  Seq Scan on prt1_n_p2_1 t1_1
+                                       ->  Sort
+                                             Sort Key: t2_1.c
+                                             ->  Seq Scan on prt1_n_p2_1 t2_1
+                                 ->  Sort
+                                       Sort Key: t3_1.c
+                                       ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Sort
+                           Sort Key: t4_1.c
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1_2.c, t3_2.c))::text) = (t4_2.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1_2.c, t3_2.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1_2.c)::text = (t3_2.c)::text)
+                                 ->  Merge Left Join
+                                       Merge Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1_2.c
+                                             ->  Seq Scan on prt1_n_p2_2 t1_2
+                                       ->  Sort
+                                             Sort Key: t2_2.c
+                                             ->  Seq Scan on prt1_n_p2_2 t2_2
+                                 ->  Sort
+                                       Sort Key: t3_2.c
+                                       ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Sort
+                           Sort Key: t4_2.c
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(64 rows)
+
 SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
 EXPLAIN (COSTS OFF)
 SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(t1.c, t3.c), t4.c))
+         ->  Append
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1.c, t3.c))::text) = (t4.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1.c, t3.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1.c)::text = (t3.c)::text)
+                                 ->  Merge Join
+                                       Merge Cond: ((t1.c)::text = (t2.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1.c
+                                             ->  Seq Scan on prt1_n_p1 t1
+                                       ->  Sort
+                                             Sort Key: t2.c
+                                             ->  Seq Scan on prt1_n_p1 t2
+                                 ->  Sort
+                                       Sort Key: t3.c
+                                       ->  Seq Scan on prt1_n_p1 t3
+                     ->  Sort
+                           Sort Key: t4.c
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1_1.c, t3_1.c))::text) = (t4_1.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1_1.c, t3_1.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1_1.c)::text = (t3_1.c)::text)
+                                 ->  Merge Join
+                                       Merge Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1_1.c
+                                             ->  Seq Scan on prt1_n_p2_1 t1_1
+                                       ->  Sort
+                                             Sort Key: t2_1.c
+                                             ->  Seq Scan on prt1_n_p2_1 t2_1
+                                 ->  Sort
+                                       Sort Key: t3_1.c
+                                       ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Sort
+                           Sort Key: t4_1.c
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1_2.c, t3_2.c))::text) = (t4_2.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1_2.c, t3_2.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1_2.c)::text = (t3_2.c)::text)
+                                 ->  Merge Join
+                                       Merge Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1_2.c
+                                             ->  Seq Scan on prt1_n_p2_2 t1_2
+                                       ->  Sort
+                                             Sort Key: t2_2.c
+                                             ->  Seq Scan on prt1_n_p2_2 t2_2
+                                 ->  Sort
+                                       Sort Key: t3_2.c
+                                       ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Sort
+                           Sort Key: t4_2.c
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(64 rows)
+
 SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
-- 
2.11.0

Etsuro Fujita

etsuro.fujita@gmail.com

over 6 years ago

In reply to: Amit Langote (#1)

Re: d25ea01275 and partitionwise join

On Tue, Jul 2, 2019 at 6:29 PM Amit Langote <amitlangote09@gmail.com> wrote:

0001 - fix partitionwise join to work correctly with n-way joins of
which some are full joins (+ cosmetic improvements around the code
that was touched)

Here are my comments about the cosmetic improvements: they seem pretty
large to me, so I'd make a separate patch for that. In addition, I'd
move have_partkey_equi_join() and match_expr_to_partition_keys() to
relnode.c, because these functions are only used in that file.

Best regards,
Etsuro Fujita

Amit Langote

amitlangote09@gmail.com

over 6 years ago

In reply to: Etsuro Fujita (#2)

3 attachment(s)

Re: d25ea01275 and partitionwise join

Fujita-san,

Thanks for looking at this.

On Tue, Jul 16, 2019 at 8:22 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Tue, Jul 2, 2019 at 6:29 PM Amit Langote <amitlangote09@gmail.com> wrote:

0001 - fix partitionwise join to work correctly with n-way joins of
which some are full joins (+ cosmetic improvements around the code
that was touched)

Here are my comments about the cosmetic improvements: they seem pretty
large to me, so I'd make a separate patch for that.

OK, my bad for adding so many cosmetic changes to a patch that is
meant to fix the main issue. To explain, I'm proposing these
cosmetic improvements to make the terminological separation between
nullable and non-nullable partition keys clearer, as I found it a bit
hard to understand as is.

I've broken the patch into two: 0001 contains only cosmetic changes
and 0002 the fix for handling full joins properly. Would you rather
that be reversed?

In addition, I'd
move have_partkey_equi_join() and match_expr_to_partition_keys() to
relnode.c, because these functions are only used in that file.

I hadn't noticed that. It makes sense to move them to relnode.c; 0001
now does that.

Thanks,
Amit
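
For readers following the thread, the idea behind the full-join fix in 0002 (flatten the arguments of possibly nested COALESCE expressions, then accept the join argument if any of them is a nullable partition key) can be sketched roughly as below. This is only an illustration under stated assumptions: SketchExpr, sketch_flatten_coalesce() and sketch_matches_nullable_key() are hypothetical stand-ins that use strings instead of planner nodes, and the recursion visits every nested COALESCE, which is not necessarily how the patch's extract_coalesce_args() walks the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy expression node; not PostgreSQL's Expr or CoalesceExpr. */
typedef enum
{
	SK_VAR,
	SK_COALESCE
} SketchTag;

typedef struct SketchExpr
{
	SketchTag	tag;
	const char *name;					/* used when tag == SK_VAR */
	const struct SketchExpr *args[4];	/* used when tag == SK_COALESCE */
	int			nargs;
} SketchExpr;

/*
 * Append the leaf (non-COALESCE) arguments of a possibly nested COALESCE
 * to out[], left to right.  Returns the new number of entries in out[].
 */
int
sketch_flatten_coalesce(const SketchExpr *e, const char **out, int n, int cap)
{
	if (e->tag == SK_VAR)
	{
		assert(n < cap);
		out[n++] = e->name;
		return n;
	}

	for (int i = 0; i < e->nargs; i++)
		n = sketch_flatten_coalesce(e->args[i], out, n, cap);

	return n;
}

/*
 * Returns 1 if any flattened argument equals one of the given nullable
 * partition keys (compared by name in this sketch), else 0.
 */
int
sketch_matches_nullable_key(const SketchExpr *e,
							const char *const *keys, int nkeys)
{
	const char *leaves[16];
	int			nleaves = sketch_flatten_coalesce(e, leaves, 0, 16);

	for (int i = 0; i < nleaves; i++)
	{
		for (int j = 0; j < nkeys; j++)
		{
			if (strcmp(leaves[i], keys[j]) == 0)
				return 1;
		}
	}

	return 0;
}
```

With an expression shaped like COALESCE(COALESCE(t1.c, t2.c), t3.c), the flattened arguments are t1.c, t2.c and t3.c, so the expression matches as soon as one of those is among the nullable keys of the join relation.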

Attachments:

v2-0001-Some-cosmetic-improvements-to-partitionwise-join-.patch (application/octet-stream)

From fa3a9abb4c911bbfa6c61cc65bcbe1547171a075 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 18 Jul 2019 10:22:31 +0900
Subject: [PATCH v2 1/3] Some cosmetic improvements to partitionwise join code

Among other changes, this moves a couple of functions from joinrels.c
to relnode.c.
---
 src/backend/optimizer/path/joinrels.c | 167 ---------------------
 src/backend/optimizer/util/plancat.c  |  20 +--
 src/backend/optimizer/util/relnode.c  | 268 +++++++++++++++++++++++++++++-----
 src/include/nodes/pathnodes.h         |  36 +++--
 src/include/optimizer/paths.h         |   3 -
 5 files changed, 271 insertions(+), 223 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 6a480ab764..fa68059c3f 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -46,8 +46,6 @@ static void try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1,
 static SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root,
 												SpecialJoinInfo *parent_sjinfo,
 												Relids left_relids, Relids right_relids);
-static int	match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
-										 bool strict_op);
 
 
 /*
@@ -1573,168 +1571,3 @@ build_child_join_sjinfo(PlannerInfo *root, SpecialJoinInfo *parent_sjinfo,
 
 	return sjinfo;
 }
-
-/*
- * Returns true if there exists an equi-join condition for each pair of
- * partition keys from given relations being joined.
- */
-bool
-have_partkey_equi_join(RelOptInfo *joinrel,
-					   RelOptInfo *rel1, RelOptInfo *rel2,
-					   JoinType jointype, List *restrictlist)
-{
-	PartitionScheme part_scheme = rel1->part_scheme;
-	ListCell   *lc;
-	int			cnt_pks;
-	bool		pk_has_clause[PARTITION_MAX_KEYS];
-	bool		strict_op;
-
-	/*
-	 * This function should be called when the joining relations have same
-	 * partitioning scheme.
-	 */
-	Assert(rel1->part_scheme == rel2->part_scheme);
-	Assert(part_scheme);
-
-	memset(pk_has_clause, 0, sizeof(pk_has_clause));
-	foreach(lc, restrictlist)
-	{
-		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
-		OpExpr	   *opexpr;
-		Expr	   *expr1;
-		Expr	   *expr2;
-		int			ipk1;
-		int			ipk2;
-
-		/* If processing an outer join, only use its own join clauses. */
-		if (IS_OUTER_JOIN(jointype) &&
-			RINFO_IS_PUSHED_DOWN(rinfo, joinrel->relids))
-			continue;
-
-		/* Skip clauses which can not be used for a join. */
-		if (!rinfo->can_join)
-			continue;
-
-		/* Skip clauses which are not equality conditions. */
-		if (!rinfo->mergeopfamilies && !OidIsValid(rinfo->hashjoinoperator))
-			continue;
-
-		opexpr = castNode(OpExpr, rinfo->clause);
-
-		/*
-		 * The equi-join between partition keys is strict if equi-join between
-		 * at least one partition key is using a strict operator. See
-		 * explanation about outer join reordering identity 3 in
-		 * optimizer/README
-		 */
-		strict_op = op_strict(opexpr->opno);
-
-		/* Match the operands to the relation. */
-		if (bms_is_subset(rinfo->left_relids, rel1->relids) &&
-			bms_is_subset(rinfo->right_relids, rel2->relids))
-		{
-			expr1 = linitial(opexpr->args);
-			expr2 = lsecond(opexpr->args);
-		}
-		else if (bms_is_subset(rinfo->left_relids, rel2->relids) &&
-				 bms_is_subset(rinfo->right_relids, rel1->relids))
-		{
-			expr1 = lsecond(opexpr->args);
-			expr2 = linitial(opexpr->args);
-		}
-		else
-			continue;
-
-		/*
-		 * Only clauses referencing the partition keys are useful for
-		 * partitionwise join.
-		 */
-		ipk1 = match_expr_to_partition_keys(expr1, rel1, strict_op);
-		if (ipk1 < 0)
-			continue;
-		ipk2 = match_expr_to_partition_keys(expr2, rel2, strict_op);
-		if (ipk2 < 0)
-			continue;
-
-		/*
-		 * If the clause refers to keys at different ordinal positions, it can
-		 * not be used for partitionwise join.
-		 */
-		if (ipk1 != ipk2)
-			continue;
-
-		/*
-		 * The clause allows partitionwise join if only it uses the same
-		 * operator family as that specified by the partition key.
-		 */
-		if (rel1->part_scheme->strategy == PARTITION_STRATEGY_HASH)
-		{
-			if (!op_in_opfamily(rinfo->hashjoinoperator,
-								part_scheme->partopfamily[ipk1]))
-				continue;
-		}
-		else if (!list_member_oid(rinfo->mergeopfamilies,
-								  part_scheme->partopfamily[ipk1]))
-			continue;
-
-		/* Mark the partition key as having an equi-join clause. */
-		pk_has_clause[ipk1] = true;
-	}
-
-	/* Check whether every partition key has an equi-join condition. */
-	for (cnt_pks = 0; cnt_pks < part_scheme->partnatts; cnt_pks++)
-	{
-		if (!pk_has_clause[cnt_pks])
-			return false;
-	}
-
-	return true;
-}
-
-/*
- * Find the partition key from the given relation matching the given
- * expression. If found, return the index of the partition key, else return -1.
- */
-static int
-match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
-{
-	int			cnt;
-
-	/* This function should be called only for partitioned relations. */
-	Assert(rel->part_scheme);
-
-	/* Remove any relabel decorations. */
-	while (IsA(expr, RelabelType))
-		expr = (Expr *) (castNode(RelabelType, expr))->arg;
-
-	for (cnt = 0; cnt < rel->part_scheme->partnatts; cnt++)
-	{
-		ListCell   *lc;
-
-		Assert(rel->partexprs);
-		foreach(lc, rel->partexprs[cnt])
-		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
-		}
-
-		if (!strict_op)
-			continue;
-
-		/*
-		 * If it's a strict equi-join a NULL partition key on one side will
-		 * not join a NULL partition key on the other side. So, rows with NULL
-		 * partition key from a partition on one side can not join with those
-		 * from a non-matching partition on the other side. So, search the
-		 * nullable partition keys as well.
-		 */
-		Assert(rel->nullable_partexprs);
-		foreach(lc, rel->nullable_partexprs[cnt])
-		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
-		}
-	}
-
-	return -1;
-}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 98e99481c6..86075449fa 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -2261,9 +2261,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
 /*
  * set_baserel_partition_key_exprs
  *
- * Builds partition key expressions for the given base relation and sets them
- * in given RelOptInfo.  Any single column partition keys are converted to Var
- * nodes.  All Var nodes are restamped with the relid of given relation.
+ * Builds partition key expressions for the given base relation and sets
+ * rel->partexprs.
  */
 static void
 set_baserel_partition_key_exprs(Relation relation,
@@ -2311,16 +2310,19 @@ set_baserel_partition_key_exprs(Relation relation,
 			lc = lnext(partkey->partexprs, lc);
 		}
 
+		/* Base relations have a single expression per key. */
 		partexprs[cnt] = list_make1(partexpr);
 	}
 
+	/*
+	 * For base relations, we assume that the partition keys are non-nullable,
+	 * although they are nullable in principle; list and hash partitioned
+	 * tables may contain nulls in the partition key(s), for example.
+	 * Assuming non-nullability is okay for the considerations of partition
+	 * pruning, because pruning is never performed with non-strict operators.
+	 */
 	rel->partexprs = partexprs;
 
-	/*
-	 * A base relation can not have nullable partition key expressions. We
-	 * still allocate array of empty expressions lists to keep partition key
-	 * expression handling code simple. See build_joinrel_partition_info() and
-	 * match_expr_to_partition_keys().
-	 */
+	/* Assigning NIL for each key means there are no nullable keys. */
 	rel->nullable_partexprs = (List **) palloc0(sizeof(List *) * partnatts);
 }
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 6054bd2b53..f21ec9bdfc 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -29,6 +29,7 @@
 #include "optimizer/tlist.h"
 #include "partitioning/partbounds.h"
 #include "utils/hsearch.h"
+#include "utils/lsyscache.h"
 
 
 typedef struct JoinHashEntry
@@ -58,6 +59,14 @@ static void add_join_rel(PlannerInfo *root, RelOptInfo *joinrel);
 static void build_joinrel_partition_info(RelOptInfo *joinrel,
 										 RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 										 List *restrictlist, JoinType jointype);
+static bool have_partkey_equi_join(RelOptInfo *joinrel,
+					   RelOptInfo *rel1, RelOptInfo *rel2,
+					   JoinType jointype, List *restrictlist);
+static int match_join_arg_to_partition_keys(Expr *expr, RelOptInfo *rel,
+					bool strict_op);
+static void set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
+								RelOptInfo *outer_rel, RelOptInfo *inner_rel,
+								JoinType jointype);
 static void build_child_join_reltarget(PlannerInfo *root,
 									   RelOptInfo *parentrel,
 									   RelOptInfo *childrel,
@@ -1591,18 +1600,18 @@ find_param_path_info(RelOptInfo *rel, Relids required_outer)
 
 /*
  * build_joinrel_partition_info
- *		If the two relations have same partitioning scheme, their join may be
- *		partitioned and will follow the same partitioning scheme as the joining
- *		relations. Set the partition scheme and partition key expressions in
- *		the join relation.
+ *		Checks if the two relations being joined can use partitionwise join
+ *		and if yes, initialize partitioning information of the resulting
+ *		partitioned relation
+ *
+ * This will set part_scheme and partition key expressions (partexprs and
+ * nullable_partexprs) if required.
  */
 static void
 build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 							 RelOptInfo *inner_rel, List *restrictlist,
 							 JoinType jointype)
 {
-	int			partnatts;
-	int			cnt;
 	PartitionScheme part_scheme;
 
 	/* Nothing to do if partitionwise join technique is disabled. */
@@ -1669,11 +1678,8 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 	 */
 	joinrel->part_scheme = part_scheme;
 	joinrel->boundinfo = outer_rel->boundinfo;
-	partnatts = joinrel->part_scheme->partnatts;
-	joinrel->partexprs = (List **) palloc0(sizeof(List *) * partnatts);
-	joinrel->nullable_partexprs =
-		(List **) palloc0(sizeof(List *) * partnatts);
 	joinrel->nparts = outer_rel->nparts;
+	set_joinrel_partition_key_exprs(joinrel, outer_rel, inner_rel, jointype);
 	joinrel->part_rels =
 		(RelOptInfo **) palloc0(sizeof(RelOptInfo *) * joinrel->nparts);
 
@@ -1683,32 +1689,201 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 	Assert(outer_rel->consider_partitionwise_join);
 	Assert(inner_rel->consider_partitionwise_join);
 	joinrel->consider_partitionwise_join = true;
+}
+
+/*
+ * have_partkey_equi_join
+ *		Returns true if there exist equi-join conditions involving pairs
+ *		of matching partition keys of the relations being joined for all
+ *		partition keys
+ */
+static bool
+have_partkey_equi_join(RelOptInfo *joinrel,
+					   RelOptInfo *rel1, RelOptInfo *rel2,
+					   JoinType jointype, List *restrictlist)
+{
+	PartitionScheme part_scheme = rel1->part_scheme;
+	ListCell   *lc;
+	int			cnt_pks;
+	bool		pk_has_clause[PARTITION_MAX_KEYS];
+	bool		strict_op;
 
 	/*
-	 * Construct partition keys for the join.
-	 *
-	 * An INNER join between two partitioned relations can be regarded as
-	 * partitioned by either key expression.  For example, A INNER JOIN B ON
-	 * A.a = B.b can be regarded as partitioned on A.a or on B.b; they are
-	 * equivalent.
-	 *
-	 * For a SEMI or ANTI join, the result can only be regarded as being
-	 * partitioned in the same manner as the outer side, since the inner
-	 * columns are not retained.
-	 *
-	 * An OUTER join like (A LEFT JOIN B ON A.a = B.b) may produce rows with
-	 * B.b NULL. These rows may not fit the partitioning conditions imposed on
-	 * B.b. Hence, strictly speaking, the join is not partitioned by B.b and
-	 * thus partition keys of an OUTER join should include partition key
-	 * expressions from the OUTER side only.  However, because all
-	 * commonly-used comparison operators are strict, the presence of nulls on
-	 * the outer side doesn't cause any problem; they can't match anything at
-	 * future join levels anyway.  Therefore, we track two sets of
-	 * expressions: those that authentically partition the relation
-	 * (partexprs) and those that partition the relation with the exception
-	 * that extra nulls may be present (nullable_partexprs).  When the
-	 * comparison operator is strict, the latter is just as good as the
-	 * former.
+	 * This function should be called when the joining relations have same
+	 * partitioning scheme.
+	 */
+	Assert(rel1->part_scheme == rel2->part_scheme);
+	Assert(part_scheme);
+
+	memset(pk_has_clause, 0, sizeof(pk_has_clause));
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+		OpExpr	   *opexpr;
+		Expr	   *expr1;
+		Expr	   *expr2;
+		int			ipk1;
+		int			ipk2;
+
+		/* If processing an outer join, only use its own join clauses. */
+		if (IS_OUTER_JOIN(jointype) &&
+			RINFO_IS_PUSHED_DOWN(rinfo, joinrel->relids))
+			continue;
+
+		/* Skip clauses which can not be used for a join. */
+		if (!rinfo->can_join)
+			continue;
+
+		/* Skip clauses which are not equality conditions. */
+		if (!rinfo->mergeopfamilies && !OidIsValid(rinfo->hashjoinoperator))
+			continue;
+
+		opexpr = castNode(OpExpr, rinfo->clause);
+
+		/*
+		 * The equi-join between partition keys is strict if equi-join between
+		 * at least one partition key is using a strict operator. See
+		 * explanation about outer join reordering identity 3 in
+		 * optimizer/README
+		 */
+		strict_op = op_strict(opexpr->opno);
+
+		/* Match the operands to the relation. */
+		if (bms_is_subset(rinfo->left_relids, rel1->relids) &&
+			bms_is_subset(rinfo->right_relids, rel2->relids))
+		{
+			expr1 = linitial(opexpr->args);
+			expr2 = lsecond(opexpr->args);
+		}
+		else if (bms_is_subset(rinfo->left_relids, rel2->relids) &&
+				 bms_is_subset(rinfo->right_relids, rel1->relids))
+		{
+			expr1 = lsecond(opexpr->args);
+			expr2 = linitial(opexpr->args);
+		}
+		else
+			continue;
+
+		/*
+		 * Only clauses referencing the partition keys are useful for
+		 * partitionwise join.
+		 */
+		ipk1 = match_join_arg_to_partition_keys(expr1, rel1, strict_op);
+		if (ipk1 < 0)
+			continue;
+		ipk2 = match_join_arg_to_partition_keys(expr2, rel2, strict_op);
+		if (ipk2 < 0)
+			continue;
+
+		/*
+		 * If the clause refers to keys at different ordinal positions, it can
+		 * not be used for partitionwise join.
+		 */
+		if (ipk1 != ipk2)
+			continue;
+
+		/*
+		 * The clause allows partitionwise join if only it uses the same
+		 * operator family as that specified by the partition key.
+		 */
+		if (rel1->part_scheme->strategy == PARTITION_STRATEGY_HASH)
+		{
+			if (!op_in_opfamily(rinfo->hashjoinoperator,
+								part_scheme->partopfamily[ipk1]))
+				continue;
+		}
+		else if (!list_member_oid(rinfo->mergeopfamilies,
+								  part_scheme->partopfamily[ipk1]))
+			continue;
+
+		/* Mark the partition key as having an equi-join clause. */
+		pk_has_clause[ipk1] = true;
+	}
+
+	/* Check whether every partition key has an equi-join condition. */
+	for (cnt_pks = 0; cnt_pks < part_scheme->partnatts; cnt_pks++)
+	{
+		if (!pk_has_clause[cnt_pks])
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * match_join_arg_to_partition_keys
+ *		Tries to match a join clause argument expression to one of the nullable
+ *		or non-nullable partition keys and if a match is found, returns the
+ *		matched	key's ordinal position or -1 if the expression could not be
+ *		matched to any of the keys
+ */
+static int
+match_join_arg_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
+{
+	int			cnt;
+
+	/* This function should be called only for partitioned relations. */
+	Assert(rel->part_scheme);
+
+	/* Remove any relabel decorations. */
+	while (IsA(expr, RelabelType))
+		expr = (Expr *) (castNode(RelabelType, expr))->arg;
+
+	for (cnt = 0; cnt < rel->part_scheme->partnatts; cnt++)
+	{
+		ListCell   *lc;
+
+		Assert(rel->partexprs);
+		foreach(lc, rel->partexprs[cnt])
+		{
+			if (equal(lfirst(lc), expr))
+				return cnt;
+		}
+
+		if (!strict_op)
+			continue;
+
+		/*
+		 * If it's a strict equi-join a NULL partition key on one side will
+		 * not join a NULL partition key on the other side. So, rows with NULL
+		 * partition key from a partition on one side can not join with those
+		 * from a non-matching partition on the other side. So, search the
+		 * nullable partition keys as well.
+		 */
+		Assert(rel->nullable_partexprs);
+		foreach(lc, rel->nullable_partexprs[cnt])
+		{
+			if (equal(lfirst(lc), expr))
+				return cnt;
+		}
+	}
+
+	return -1;
+}
+
+/*
+ * set_joinrel_partition_key_exprs
+ *		Initialize partition key expressions
+ */
+static void
+set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
+								RelOptInfo *outer_rel, RelOptInfo *inner_rel,
+								JoinType jointype)
+{
+	int		partnatts;
+	int		cnt;
+
+	Assert(joinrel->part_scheme != NULL);
+
+	partnatts = joinrel->part_scheme->partnatts;
+	joinrel->partexprs = (List **) palloc0(sizeof(List *) * partnatts);
+	joinrel->nullable_partexprs =
+		(List **) palloc0(sizeof(List *) * partnatts);
+
+	/*
+	 * Join type determines which partition keys are assumed by the resulting
+	 * join relation.  Note that these keys are to be considered when checking
+	 * if any further joins involving this joinrel may be partitioned.
 	 */
 	for (cnt = 0; cnt < partnatts; cnt++)
 	{
@@ -1726,18 +1901,37 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 
 		switch (jointype)
 		{
+			/*
+			 * Join relation resulting from an INNER join may be regarded as
+			 * partitioned by either of inner and outer relation keys.  For
+			 * example, A INNER JOIN B ON A.a = B.b can be regarded as
+			 * partitioned on either A.a or B.b.
+			 */
 			case JOIN_INNER:
 				partexpr = list_concat(outer_expr, inner_expr);
 				nullable_partexpr = list_concat(outer_null_expr,
 												inner_null_expr);
 				break;
 
+			/*
+			 * Join relation resulting from a SEMI or ANTI join may be
+			 * regarded as partitioned on the outer relation keys, since the
+			 * inner columns are omitted from the output.
+			 */
 			case JOIN_SEMI:
 			case JOIN_ANTI:
 				partexpr = outer_expr;
 				nullable_partexpr = outer_null_expr;
 				break;
 
+			/*
+			 * Join relation resulting from a LEFT OUTER JOIN likewise may be
+			 * regarded as partitioned on the (non-nullable) outer relation
+			 * keys.  The nullability of inner relation keys prevents them from
+			 * being considered partition keys of the join relation in all cases,
+			 * but they are okay as partition keys for further joins that
+			 * involve strict join operators.
+			 */
 			case JOIN_LEFT:
 				partexpr = outer_expr;
 				nullable_partexpr = list_concat(inner_expr,
@@ -1746,6 +1940,12 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 												inner_null_expr);
 				break;
 
+			/*
+			 * For FULL OUTER JOINs, both relations are nullable, so the
+			 * resulting join relation may be regarded as partitioned on
+			 * either of inner and outer relation keys, but only for joins
+			 * that involve strict join operators.
+			 */
 			case JOIN_FULL:
 				nullable_partexpr = list_concat(outer_expr,
 												inner_expr);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 441e64eca9..3648fc8d3c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -582,16 +582,32 @@ typedef struct PartitionSchemeData *PartitionScheme;
  *								 this relation that are partitioned tables
  *								 themselves, in hierarchical order
  *
- * Note: A base relation always has only one set of partition keys, but a join
- * relation may have as many sets of partition keys as the number of relations
- * being joined. partexprs and nullable_partexprs are arrays containing
- * part_scheme->partnatts elements each. Each of these elements is a list of
- * partition key expressions.  For a base relation each list in partexprs
- * contains only one expression and nullable_partexprs is not populated. For a
- * join relation, partexprs and nullable_partexprs contain partition key
- * expressions from non-nullable and nullable relations resp. Lists at any
- * given position in those arrays together contain as many elements as the
- * number of joining relations.
+ * Notes on partition key expressions (partexprs and nullable_partexprs):
+ *
+ * Partition key expressions will be used to spot references to the partition
+ * keys of the relation in the expressions of a given query so as to apply
+ * various partitioning-based optimizations to certain query constructs.  For
+ * example, pruning unnecessary partitions of a table using baserestrictinfo
+ * clauses that contain partition keys, converting a join between two
+ * partitioned relations into a series of joins between pairs of their
+ * constituent partitions if the joined rows follow the same partitioning
+ * as the relations being joined.
+ *
+ * The partexprs and nullable_partexprs arrays each contain
+ * part_scheme->partnatts elements.  Each of the elements is a list of
+ * partition key expressions.  For partitioned *base* relations, there is one
+ * expression in every list, whereas for partitioned *join* relations, there
+ * can be as many as the number of component relations.
+ *
+ * nullable_partexprs are populated only in partitioned *join* relations,
+ * that is, if any of their component relations are nullable due to OUTER JOIN
+ * considerations.  It contains only the expressions of the nullable component
+ * relations, while those of the non-nullable relations are present in the
+ * partexprs.  For the considerations of partitionwise join, nullable partition
+ * keys can be considered to partition the underlying relation in the same
+ * manner as the non-nullable partition keys do, as long as the join operator
+ * is strict, because those null-valued keys can't be joined further, thus
+ * preserving the partitioning.
  *----------
  */
 typedef enum RelOptKind
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7345137d1d..54610b8656 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -106,9 +106,6 @@ extern bool have_join_order_restriction(PlannerInfo *root,
 extern bool have_dangerous_phv(PlannerInfo *root,
 							   Relids outer_relids, Relids inner_params);
 extern void mark_dummy_rel(RelOptInfo *rel);
-extern bool have_partkey_equi_join(RelOptInfo *joinrel,
-								   RelOptInfo *rel1, RelOptInfo *rel2,
-								   JoinType jointype, List *restrictlist);
 
 /*
  * equivclass.c
-- 
2.11.0
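
The per-join-type comments added in set_joinrel_partition_key_exprs() above boil down to a small classification rule, which the following sketch tries to restate in standalone form. It is a simplified model under stated assumptions, not the planner code: SketchRel, SketchExprList, SketchJoinType, sketch_concat() and sketch_join_keys() are hypothetical names, expressions are plain strings, and only a single partition key position is tracked.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's JoinType or List. */
typedef enum
{
	SK_JOIN_INNER,
	SK_JOIN_LEFT,
	SK_JOIN_FULL,
	SK_JOIN_SEMI,
	SK_JOIN_ANTI
} SketchJoinType;

#define SK_MAX_EXPRS 16

/* Expressions known for ONE partition key position, as plain strings. */
typedef struct
{
	const char *exprs[SK_MAX_EXPRS];
	int			nexprs;
} SketchExprList;

typedef struct
{
	SketchExprList partexprs;			/* non-nullable keys */
	SketchExprList nullable_partexprs;	/* keys that may carry extra nulls */
} SketchRel;

/* Append every expression in src to dst. */
void
sketch_concat(SketchExprList *dst, const SketchExprList *src)
{
	for (int i = 0; i < src->nexprs; i++)
	{
		assert(dst->nexprs < SK_MAX_EXPRS);
		dst->exprs[dst->nexprs++] = src->exprs[i];
	}
}

/* Classify the keys of a join relation according to the join type. */
SketchRel
sketch_join_keys(const SketchRel *outer, const SketchRel *inner,
				 SketchJoinType jointype)
{
	SketchRel	join;

	memset(&join, 0, sizeof(join));

	switch (jointype)
	{
		case SK_JOIN_INNER:
			/* Either side's keys partition the result. */
			sketch_concat(&join.partexprs, &outer->partexprs);
			sketch_concat(&join.partexprs, &inner->partexprs);
			sketch_concat(&join.nullable_partexprs, &outer->nullable_partexprs);
			sketch_concat(&join.nullable_partexprs, &inner->nullable_partexprs);
			break;

		case SK_JOIN_SEMI:
		case SK_JOIN_ANTI:
			/* Inner columns are not output, so only outer keys survive. */
			sketch_concat(&join.partexprs, &outer->partexprs);
			sketch_concat(&join.nullable_partexprs, &outer->nullable_partexprs);
			break;

		case SK_JOIN_LEFT:
			/* Inner keys may be null-extended, so they become nullable. */
			sketch_concat(&join.partexprs, &outer->partexprs);
			sketch_concat(&join.nullable_partexprs, &inner->partexprs);
			sketch_concat(&join.nullable_partexprs, &outer->nullable_partexprs);
			sketch_concat(&join.nullable_partexprs, &inner->nullable_partexprs);
			break;

		case SK_JOIN_FULL:
			/* Both sides may be null-extended, so every key is nullable. */
			sketch_concat(&join.nullable_partexprs, &outer->partexprs);
			sketch_concat(&join.nullable_partexprs, &inner->partexprs);
			sketch_concat(&join.nullable_partexprs, &outer->nullable_partexprs);
			sketch_concat(&join.nullable_partexprs, &inner->nullable_partexprs);
			break;
	}

	return join;
}
```

For example, joining A (key A.a) with B (key B.b) as a LEFT join leaves A.a in partexprs and moves B.b to nullable_partexprs, while a FULL join puts both under nullable_partexprs, which is why later joins on top of it can only be partitionwise with strict operators.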

v2-0002-Fix-partitionwise-join-to-handle-FULL-JOINs-corre.patch (application/octet-stream)

From 2d069323ed8c3e5283093a5ed91069df467487f3 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 18 Jul 2019 10:33:20 +0900
Subject: [PATCH v2 2/3] Fix partitionwise join to handle FULL JOINs correctly

---
 src/backend/optimizer/util/relnode.c         |  86 +++++++++---
 src/test/regress/expected/partition_join.out | 200 +++++++++++++++++++++++++++
 src/test/regress/sql/partition_join.sql      |  29 ++++
 3 files changed, 299 insertions(+), 16 deletions(-)

diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index f21ec9bdfc..9ec1bacaff 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -64,6 +64,7 @@ static bool have_partkey_equi_join(RelOptInfo *joinrel,
 					   JoinType jointype, List *restrictlist);
 static int match_join_arg_to_partition_keys(Expr *expr, RelOptInfo *rel,
 					bool strict_op);
+static List *extract_coalesce_args(Expr *expr);
 static void set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
 								RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 								JoinType jointype);
@@ -1821,6 +1822,8 @@ static int
 match_join_arg_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
 {
 	int			cnt;
+	int			matched = -1;
+	List	   *nullable_exprs;
 
 	/* This function should be called only for partitioned relations. */
 	Assert(rel->part_scheme);
@@ -1829,36 +1832,87 @@ match_join_arg_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
 	while (IsA(expr, RelabelType))
 		expr = (Expr *) (castNode(RelabelType, expr))->arg;
 
+	/*
+	 * Extract the arguments from possibly nested COALESCE expressions.  Each
+	 * of these arguments could be null when joining, so they are treated as
+	 * nullable expressions and are to be matched only with the nullable
+	 * partition keys.
+	 */
+	if (IsA(expr, CoalesceExpr))
+		nullable_exprs = extract_coalesce_args(expr);
+	else
+		/*
+		 * expr may or may not be nullable but add to the list anyway to
+		 * simplify the coding below.
+		 */
+		nullable_exprs = list_make1(expr);
+
 	for (cnt = 0; cnt < rel->part_scheme->partnatts; cnt++)
 	{
-		ListCell   *lc;
-
 		Assert(rel->partexprs);
-		foreach(lc, rel->partexprs[cnt])
+
+		/* Is the expression one of the non-nullable partition keys? */
+		if (list_member(rel->partexprs[cnt], expr))
 		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
+			matched = cnt;
+			break;
 		}
 
+		/*
+		 * Nope, so check if it is one of the nullable keys.  Allowing
+		 * nullable keys won't work if the join operator is not strict,
+		 * because null partition keys may then join with rows from other
+		 * partitions.  XXX - would that ever be true if the operator is
+		 * already determined to be mergejoin- and hashjoin-able?
+		 */
 		if (!strict_op)
 			continue;
 
-		/*
-		 * If it's a strict equi-join a NULL partition key on one side will
-		 * not join a NULL partition key on the other side. So, rows with NULL
-		 * partition key from a partition on one side can not join with those
-		 * from a non-matching partition on the other side. So, search the
-		 * nullable partition keys as well.
-		 */
+		/* OK to match with nullable keys. */
 		Assert(rel->nullable_partexprs);
-		foreach(lc, rel->nullable_partexprs[cnt])
+		if (list_intersection(rel->nullable_partexprs[cnt],
+							  nullable_exprs) != NIL)
 		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
+			matched = cnt;
+			break;
 		}
 	}
 
-	return -1;
+	Assert(list_length(nullable_exprs) >= 1);
+	list_free(nullable_exprs);
+
+	return matched;
+}
+
+/*
+ * extract_coalesce_args
+ *		Extract all arguments from arbitrarily nested CoalesceExpr's
+ *
+ * Note: caller should free the List structure when done using it.
+ */
+static List *
+extract_coalesce_args(Expr *expr)
+{
+	List   *coalesce_args = NIL;
+
+	while (expr && IsA(expr, CoalesceExpr))
+	{
+		CoalesceExpr *cexpr = (CoalesceExpr *) expr;
+		ListCell *lc;
+
+		expr = NULL;
+		foreach(lc, cexpr->args)
+		{
+			if (IsA(lfirst(lc), CoalesceExpr))
+				expr = lfirst(lc);
+			else
+				coalesce_args = lappend(coalesce_args, lfirst(lc));
+		}
+
+		Assert(expr == NULL || IsA(expr, CoalesceExpr));
+	}
+
+	return coalesce_args;
 }
 
 /*
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 1296edcdae..885f754f10 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -2003,3 +2003,203 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b =
                            Filter: (b = 0)
 (16 rows)
 
+-- N-way join consisting of 2 or more full joins
+DROP TABLE prt1_n_p2;
+CREATE TABLE prt1_n_p2 PARTITION OF prt1_n FOR VALUES FROM ('0250') TO ('0500') PARTITION BY RANGE (c);
+CREATE TABLE prt1_n_p2_1 PARTITION OF prt1_n_p2 FOR VALUES FROM ('0250') TO ('0350');
+CREATE TABLE prt1_n_p2_2 PARTITION OF prt1_n_p2 FOR VALUES FROM ('0350') TO ('0500');
+INSERT INTO prt1_n SELECT i, i, to_char(i, 'FM0000') FROM generate_series(250, 499, 2) i;
+ANALYZE prt1_n;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+                                              QUERY PLAN                                              
+------------------------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(COALESCE(t1.c, t2.c), t3.c), t4.c))
+         ->  Append
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(COALESCE(t1.c, t2.c), t3.c))::text = (t4.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(t1.c, t2.c))::text = (t3.c)::text)
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1.c)::text = (t2.c)::text)
+                                 ->  Seq Scan on prt1_n_p1 t1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p1 t2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p1 t3
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(COALESCE(t1_1.c, t2_1.c), t3_1.c))::text = (t4_1.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(t1_1.c, t2_1.c))::text = (t3_1.c)::text)
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_1 t1_1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_1 t2_1
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(COALESCE(t1_2.c, t2_2.c), t3_2.c))::text = (t4_2.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(t1_2.c, t2_2.c))::text = (t3_2.c)::text)
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_2 t1_2
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_2 t2_2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(43 rows)
+
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(t1.c, t3.c), t4.c))
+         ->  Append
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1.c, t3.c))::text = (t4.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1.c)::text = (t3.c)::text)
+                           ->  Hash Left Join
+                                 Hash Cond: ((t1.c)::text = (t2.c)::text)
+                                 ->  Seq Scan on prt1_n_p1 t1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p1 t2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p1 t3
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1_1.c, t3_1.c))::text = (t4_1.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1_1.c)::text = (t3_1.c)::text)
+                           ->  Hash Left Join
+                                 Hash Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_1 t1_1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_1 t2_1
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1_2.c, t3_2.c))::text = (t4_2.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1_2.c)::text = (t3_2.c)::text)
+                           ->  Hash Left Join
+                                 Hash Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_2 t1_2
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_2 t2_2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(43 rows)
+
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(t1.c, t3.c), t4.c))
+         ->  Append
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1.c, t3.c))::text = (t4.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1.c)::text = (t3.c)::text)
+                           ->  Hash Join
+                                 Hash Cond: ((t1.c)::text = (t2.c)::text)
+                                 ->  Seq Scan on prt1_n_p1 t1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p1 t2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p1 t3
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1_1.c, t3_1.c))::text = (t4_1.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1_1.c)::text = (t3_1.c)::text)
+                           ->  Hash Join
+                                 Hash Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_1 t1_1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_1 t2_1
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1_2.c, t3_2.c))::text = (t4_2.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1_2.c)::text = (t3_2.c)::text)
+                           ->  Hash Join
+                                 Hash Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_2 t1_2
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_2 t2_2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(43 rows)
+
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
+SET enable_hashjoin TO off;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index db9a6b4a96..97ec983cec 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -270,6 +270,7 @@ EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
 
+
 -- test default partition behavior for range
 ALTER TABLE prt1 DETACH PARTITION prt1_p3;
 ALTER TABLE prt1 ATTACH PARTITION prt1_p3 DEFAULT;
@@ -435,3 +436,31 @@ ANALYZE prt2;
 
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b = 0 ORDER BY t1.a, t2.b;
+
+-- N-way join consisting of 2 or more full joins
+DROP TABLE prt1_n_p2;
+CREATE TABLE prt1_n_p2 PARTITION OF prt1_n FOR VALUES FROM ('0250') TO ('0500') PARTITION BY RANGE (c);
+CREATE TABLE prt1_n_p2_1 PARTITION OF prt1_n_p2 FOR VALUES FROM ('0250') TO ('0350');
+CREATE TABLE prt1_n_p2_2 PARTITION OF prt1_n_p2 FOR VALUES FROM ('0350') TO ('0500');
+INSERT INTO prt1_n SELECT i, i, to_char(i, 'FM0000') FROM generate_series(250, 499, 2) i;
+ANALYZE prt1_n;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SET enable_hashjoin TO off;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-- 
2.11.0
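
The next attachment (v2-0003) adds a relation-set filter to add_child_rel_equivalences(). The following is only a rough, standalone sketch of that filter, not the planner code: plain bitmasks stand in for Relids, and every toy_* name is hypothetical, loosely mirroring bms_membership(), bms_overlap() and bms_is_subset(). It also assumes a single-level hierarchy, where the parent rel is itself the topmost parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for Relids: bit i set means relation i is in the set. */
typedef uint32_t ToyRelids;

typedef enum
{
	TOY_EMPTY_SET,
	TOY_SINGLETON,
	TOY_MULTIPLE
} ToyMembership;

/* Classify a set as empty, single-member or multi-member. */
static ToyMembership
toy_membership(ToyRelids s)
{
	if (s == 0)
		return TOY_EMPTY_SET;
	return ((s & (s - 1)) == 0) ? TOY_SINGLETON : TOY_MULTIPLE;
}

/* True if the two sets share at least one relation. */
static bool
toy_overlap(ToyRelids a, ToyRelids b)
{
	return (a & b) != 0;
}

/* True if every relation in a is also in b. */
static bool
toy_is_subset(ToyRelids a, ToyRelids b)
{
	return (a & ~b) == 0;
}

/*
 * Would an EC member be translated for a child of the given parent?
 *
 * parent_relids:   relids of the parent rel (assumed to be the topmost parent)
 * ec_relids:       union of the relids of all members of the EC
 * first_em_relids: relids of the first EC member
 * em_relids:       relids of the member being considered
 */
static bool
toy_should_translate(ToyRelids parent_relids, ToyRelids ec_relids,
					 ToyRelids first_em_relids, ToyRelids em_relids)
{
	assert(parent_relids != 0);

	/* baserel parents take single-rel ECs, joinrel parents multi-rel ECs */
	if (toy_membership(parent_relids) != toy_membership(first_em_relids))
		return false;

	/* the EC must mention the parent at all */
	if (!toy_overlap(parent_relids, ec_relids))
		return false;

	/* the member must be computable from the parent's relations alone */
	return toy_is_subset(em_relids, parent_relids);
}
```

With t1, t2, t3 as bits 1, 2, 3, a joinrel parent {t1, t2} (0x6) accepts a member such as COALESCE(t1.c, t2.c) with relids 0x6, but rejects COALESCE(COALESCE(t1.c, t2.c), t3.c) with relids 0xE. In this sketch that member would only be translated once a child of the {t1, t2, t3} joinrel is built, which is the behaviour the commit message below describes.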

v2-0003-Add-multi-relation-EC-child-members-in-a-separate.patch (application/octet-stream)

From aeb03898522a9ccc85435d2dfeae688134f11501 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 27 Jun 2019 14:23:42 +0900
Subject: [PATCH v2 3/3] Add multi-relation EC child members in a separate pass

A given multi-relation EC member should be translated when building
the child joinrel that will be able to supply the necessary child
expression.
---
 src/backend/optimizer/path/allpaths.c        |   2 +-
 src/backend/optimizer/path/equivclass.c      |  53 ++++--
 src/backend/optimizer/util/relnode.c         |  14 +-
 src/include/optimizer/paths.h                |   3 +-
 src/test/regress/expected/partition_join.out | 243 ++++++++++++++++++++++++++-
 5 files changed, 287 insertions(+), 28 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index e9ee32b7f4..4c0e6592d9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1067,7 +1067,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		 * paths that produce those sort orderings).
 		 */
 		if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
-			add_child_rel_equivalences(root, appinfo, rel, childrel);
+			add_child_rel_equivalences(root, &appinfo, 1, rel, childrel);
 		childrel->has_eclass_joins = rel->has_eclass_joins;
 
 		/*
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 78d076b13c..908a924672 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2095,12 +2095,15 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
  * Note that this function won't be called at all unless we have at least some
  * reason to believe that the EC members it generates will be useful.
  *
- * parent_rel and child_rel could be derived from appinfo, but since the
+ * parent_rel and child_rel could be derived from appinfos, but since the
  * caller has already computed them, we might as well just pass them in.
+ * Note that parent_rel and child_rel are either BASEREL and OTHER_MEMBER_REL,
+ * respectively, or JOINREL and OTHER_JOINREL.
  */
 void
 add_child_rel_equivalences(PlannerInfo *root,
-						   AppendRelInfo *appinfo,
+						   AppendRelInfo **appinfos,
+						   int		nappinfos,
 						   RelOptInfo *parent_rel,
 						   RelOptInfo *child_rel)
 {
@@ -2110,6 +2113,7 @@ add_child_rel_equivalences(PlannerInfo *root,
 	{
 		EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
 		int			num_members;
+		EquivalenceMember *first_em;
 
 		/*
 		 * If this EC contains a volatile expression, then generating child
@@ -2119,11 +2123,25 @@ add_child_rel_equivalences(PlannerInfo *root,
 		if (cur_ec->ec_has_volatile)
 			continue;
 
+		first_em = (EquivalenceMember *) linitial(cur_ec->ec_members);
+
+		/*
+		 * Only translate ECs whose member expressions could possibly match
+		 * the parent relation.  That is, for "baserel" parent and child
+		 * relations only consider ECs that contain single-relation members,
+		 * whereas, for "joinrel" parent and child relations, only consider ECs
+		 * that contain multi-relation members.  Note that looking at the first
+		 * EC member is enough, because others should look the same.
+		 */
+		if (bms_membership(parent_rel->relids) !=
+			bms_membership(first_em->em_relids))
+			continue;
+
 		/*
 		 * No point in searching if child's topmost parent rel is not
 		 * mentioned in eclass.
 		 */
-		if (!bms_is_subset(child_rel->top_parent_relids, cur_ec->ec_relids))
+		if (!bms_overlap(child_rel->top_parent_relids, cur_ec->ec_relids))
 			continue;
 
 		/*
@@ -2139,34 +2157,31 @@ add_child_rel_equivalences(PlannerInfo *root,
 			if (cur_em->em_is_const)
 				continue;		/* ignore consts here */
 
-			/*
-			 * We consider only original EC members here, not
-			 * already-transformed child members.  Otherwise, if some original
-			 * member expression references more than one appendrel, we'd get
-			 * an O(N^2) explosion of useless derived expressions for
-			 * combinations of children.
-			 */
-			if (cur_em->em_is_child)
-				continue;		/* ignore children here */
-
 			/* Does this member reference child's topmost parent rel? */
-			if (bms_overlap(cur_em->em_relids, child_rel->top_parent_relids))
+			if (bms_is_subset(cur_em->em_relids, child_rel->top_parent_relids))
 			{
 				/* Yes, generate transformed child version */
 				Expr	   *child_expr;
 				Relids		new_relids;
 				Relids		new_nullable_relids;
 
-				if (parent_rel->reloptkind == RELOPT_BASEREL)
+				/*
+				 * If the parent_rel is itself the topmost parent rel, transform
+				 * directly.
+				 */
+				if (parent_rel->reloptkind == RELOPT_BASEREL ||
+					parent_rel->reloptkind == RELOPT_JOINREL)
 				{
 					/* Simple single-level transformation */
 					child_expr = (Expr *)
 						adjust_appendrel_attrs(root,
 											   (Node *) cur_em->em_expr,
-											   1, &appinfo);
+											   nappinfos, appinfos);
 				}
 				else
 				{
+					Assert(parent_rel->reloptkind == RELOPT_OTHER_MEMBER_REL ||
+						   parent_rel->reloptkind == RELOPT_OTHER_JOINREL);
 					/* Must do multi-level transformation */
 					child_expr = (Expr *)
 						adjust_appendrel_attrs_multilevel(root,
@@ -2202,6 +2217,12 @@ add_child_rel_equivalences(PlannerInfo *root,
 				(void) add_eq_member(cur_ec, child_expr,
 									 new_relids, new_nullable_relids,
 									 true, cur_em->em_datatype);
+
+				/*
+				 * There aren't going to be more expressions to translate in
+				 * the same EC.
+				 */
+				break;
 			}
 		}
 	}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 9ec1bacaff..7a26f950d0 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -858,6 +858,16 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 														(Node *) parent_joinrel->joininfo,
 														nappinfos,
 														appinfos);
+
+	/*
+	 * If the parent joinrel has pending equivalence classes, so does the
+	 * child.
+	 */
+	if (parent_joinrel->has_eclass_joins ||
+		has_useful_pathkeys(root, parent_joinrel))
+		add_child_rel_equivalences(root, appinfos, nappinfos,
+								   parent_joinrel, joinrel);
+
 	pfree(appinfos);
 
 	/*
@@ -870,10 +880,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->direct_lateral_relids = (Relids) bms_copy(parent_joinrel->direct_lateral_relids);
 	joinrel->lateral_relids = (Relids) bms_copy(parent_joinrel->lateral_relids);
 
-	/*
-	 * If the parent joinrel has pending equivalence classes, so does the
-	 * child.
-	 */
 	joinrel->has_eclass_joins = parent_joinrel->has_eclass_joins;
 
 	/* Is the join between partitions itself partitioned? */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 54610b8656..ca507f3ee7 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -147,7 +147,8 @@ extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
 														   ForeignKeyOptInfo *fkinfo,
 														   int colno);
 extern void add_child_rel_equivalences(PlannerInfo *root,
-									   AppendRelInfo *appinfo,
+									   AppendRelInfo **appinfos,
+									   int		nappinfos,
 									   RelOptInfo *parent_rel,
 									   RelOptInfo *child_rel);
 extern List *generate_implied_equalities_for_column(PlannerInfo *root,
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 885f754f10..a952d059c5 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -2190,16 +2190,247 @@ SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING
 SET enable_hashjoin TO off;
 EXPLAIN (COSTS OFF)
 SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+                                               QUERY PLAN                                                
+---------------------------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(COALESCE(t1.c, t2.c), t3.c), t4.c))
+         ->  Append
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(COALESCE(t1.c, t2.c), t3.c))::text) = (t4.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(COALESCE(t1.c, t2.c), t3.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: (((COALESCE(t1.c, t2.c))::text) = (t3.c)::text)
+                                 ->  Sort
+                                       Sort Key: ((COALESCE(t1.c, t2.c))::text)
+                                       ->  Merge Full Join
+                                             Merge Cond: ((t1.c)::text = (t2.c)::text)
+                                             ->  Sort
+                                                   Sort Key: t1.c
+                                                   ->  Seq Scan on prt1_n_p1 t1
+                                             ->  Sort
+                                                   Sort Key: t2.c
+                                                   ->  Seq Scan on prt1_n_p1 t2
+                                 ->  Sort
+                                       Sort Key: t3.c
+                                       ->  Seq Scan on prt1_n_p1 t3
+                     ->  Sort
+                           Sort Key: t4.c
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(COALESCE(t1_1.c, t2_1.c), t3_1.c))::text) = (t4_1.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(COALESCE(t1_1.c, t2_1.c), t3_1.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: (((COALESCE(t1_1.c, t2_1.c))::text) = (t3_1.c)::text)
+                                 ->  Sort
+                                       Sort Key: ((COALESCE(t1_1.c, t2_1.c))::text)
+                                       ->  Merge Full Join
+                                             Merge Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                             ->  Sort
+                                                   Sort Key: t1_1.c
+                                                   ->  Seq Scan on prt1_n_p2_1 t1_1
+                                             ->  Sort
+                                                   Sort Key: t2_1.c
+                                                   ->  Seq Scan on prt1_n_p2_1 t2_1
+                                 ->  Sort
+                                       Sort Key: t3_1.c
+                                       ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Sort
+                           Sort Key: t4_1.c
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(COALESCE(t1_2.c, t2_2.c), t3_2.c))::text) = (t4_2.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(COALESCE(t1_2.c, t2_2.c), t3_2.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: (((COALESCE(t1_2.c, t2_2.c))::text) = (t3_2.c)::text)
+                                 ->  Sort
+                                       Sort Key: ((COALESCE(t1_2.c, t2_2.c))::text)
+                                       ->  Merge Full Join
+                                             Merge Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                             ->  Sort
+                                                   Sort Key: t1_2.c
+                                                   ->  Seq Scan on prt1_n_p2_2 t1_2
+                                             ->  Sort
+                                                   Sort Key: t2_2.c
+                                                   ->  Seq Scan on prt1_n_p2_2 t2_2
+                                 ->  Sort
+                                       Sort Key: t3_2.c
+                                       ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Sort
+                           Sort Key: t4_2.c
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(70 rows)
+
 SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
 EXPLAIN (COSTS OFF)
 SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(t1.c, t3.c), t4.c))
+         ->  Append
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1.c, t3.c))::text) = (t4.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1.c, t3.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1.c)::text = (t3.c)::text)
+                                 ->  Merge Left Join
+                                       Merge Cond: ((t1.c)::text = (t2.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1.c
+                                             ->  Seq Scan on prt1_n_p1 t1
+                                       ->  Sort
+                                             Sort Key: t2.c
+                                             ->  Seq Scan on prt1_n_p1 t2
+                                 ->  Sort
+                                       Sort Key: t3.c
+                                       ->  Seq Scan on prt1_n_p1 t3
+                     ->  Sort
+                           Sort Key: t4.c
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1_1.c, t3_1.c))::text) = (t4_1.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1_1.c, t3_1.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1_1.c)::text = (t3_1.c)::text)
+                                 ->  Merge Left Join
+                                       Merge Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1_1.c
+                                             ->  Seq Scan on prt1_n_p2_1 t1_1
+                                       ->  Sort
+                                             Sort Key: t2_1.c
+                                             ->  Seq Scan on prt1_n_p2_1 t2_1
+                                 ->  Sort
+                                       Sort Key: t3_1.c
+                                       ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Sort
+                           Sort Key: t4_1.c
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1_2.c, t3_2.c))::text) = (t4_2.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1_2.c, t3_2.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1_2.c)::text = (t3_2.c)::text)
+                                 ->  Merge Left Join
+                                       Merge Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1_2.c
+                                             ->  Seq Scan on prt1_n_p2_2 t1_2
+                                       ->  Sort
+                                             Sort Key: t2_2.c
+                                             ->  Seq Scan on prt1_n_p2_2 t2_2
+                                 ->  Sort
+                                       Sort Key: t3_2.c
+                                       ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Sort
+                           Sort Key: t4_2.c
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(64 rows)
+
 SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
 EXPLAIN (COSTS OFF)
 SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(t1.c, t3.c), t4.c))
+         ->  Append
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1.c, t3.c))::text) = (t4.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1.c, t3.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1.c)::text = (t3.c)::text)
+                                 ->  Merge Join
+                                       Merge Cond: ((t1.c)::text = (t2.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1.c
+                                             ->  Seq Scan on prt1_n_p1 t1
+                                       ->  Sort
+                                             Sort Key: t2.c
+                                             ->  Seq Scan on prt1_n_p1 t2
+                                 ->  Sort
+                                       Sort Key: t3.c
+                                       ->  Seq Scan on prt1_n_p1 t3
+                     ->  Sort
+                           Sort Key: t4.c
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1_1.c, t3_1.c))::text) = (t4_1.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1_1.c, t3_1.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1_1.c)::text = (t3_1.c)::text)
+                                 ->  Merge Join
+                                       Merge Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1_1.c
+                                             ->  Seq Scan on prt1_n_p2_1 t1_1
+                                       ->  Sort
+                                             Sort Key: t2_1.c
+                                             ->  Seq Scan on prt1_n_p2_1 t2_1
+                                 ->  Sort
+                                       Sort Key: t3_1.c
+                                       ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Sort
+                           Sort Key: t4_1.c
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1_2.c, t3_2.c))::text) = (t4_2.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1_2.c, t3_2.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1_2.c)::text = (t3_2.c)::text)
+                                 ->  Merge Join
+                                       Merge Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1_2.c
+                                             ->  Seq Scan on prt1_n_p2_2 t1_2
+                                       ->  Sort
+                                             Sort Key: t2_2.c
+                                             ->  Seq Scan on prt1_n_p2_2 t2_2
+                                 ->  Sort
+                                       Sort Key: t3_2.c
+                                       ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Sort
+                           Sort Key: t4_2.c
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(64 rows)
+
 SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
-- 
2.11.0

Etsuro Fujita

etsuro.fujita@gmail.com

over 6 years ago

In reply to: Amit Langote (#3)

Re: d25ea01275 and partitionwise join

On Thu, Jul 18, 2019 at 11:18 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Tue, Jul 16, 2019 at 8:22 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:

On Tue, Jul 2, 2019 at 6:29 PM Amit Langote <amitlangote09@gmail.com> wrote:

0001 - fix partitionwise join to work correctly with n-way joins of
which some are full joins (+ cosmetic improvements around the code
that was touched)

Here are my comments about the cosmetic improvements: they seem pretty
large to me, so I'd make a separate patch for that.

OK, my bad that I added so many cosmetic changes into a patch that is
meant to fix the main issue. Just to clarify, I'm proposing these
cosmetic improvements to better clarify the terminological separation
between nullable and non-nullable partition keys, which I found a bit
hard to understand as is.

OK, thanks for the explanation!

I've broken the patch into two: 0001 contains only cosmetic changes
and 0002 the fix for handling full joins properly. Would you rather
that be reversed?

I like this order.

In addition, I'd
move have_partkey_equi_join() and match_expr_to_partition_keys() to
relnode.c, because these functions are only used in that file.

I hadn't noticed that. Makes sense to move them to relnode.c, which
is implemented in 0001.

Thanks for including that! Will review.

Best regards,
Etsuro Fujita

Amit Langote

amitlangote09@gmail.com

over 6 years ago

In reply to: Etsuro Fujita (#4)

Re: d25ea01275 and partitionwise join

Fujita-san,

To avoid losing track of this, I've added this to November CF.

https://commitfest.postgresql.org/25/2278/

I know there is one more patch beside the partitionwise join fix, but
I've set the title to suggest that this is related mainly to
partitionwise joins.

Thanks,
Amit

Richard Guo

riguo@pivotal.io

over 6 years ago

In reply to: Amit Langote (#5)

Re: d25ea01275 and partitionwise join

Hi Amit,

On Wed, Sep 4, 2019 at 10:01 AM Amit Langote <amitlangote09@gmail.com>
wrote:

Fujita-san,

To avoid losing track of this, I've added this to November CF.

https://commitfest.postgresql.org/25/2278/

I know there is one more patch beside the partitionwise join fix, but
I've set the title to suggest that this is related mainly to
partitionwise joins.

Thank you for working on this. Currently partitionwise join does not
take COALESCE expr into consideration when matching to partition keys.
This is a problem.

BTW, a rebase is needed for the patch set.

Thanks
Richard

Richard Guo

riguo@pivotal.io

over 6 years ago

In reply to: Richard Guo (#6)

Re: d25ea01275 and partitionwise join

Hi Amit,

I'm reviewing v2-0002 and I have a concern about how a COALESCE expr is
processed in match_join_arg_to_partition_keys().

If a COALESCE expr has a non-partition-key expr as its first arg and a
partition key as its second arg, the patch would match it to the
partition key, which can produce wrong results in some cases.

For instance, consider the partitioned table below:

create table p (k int, val int) partition by range(k);
create table p_1 partition of p for values from (1) to (10);
create table p_2 partition of p for values from (10) to (100);

So with patch v2-0002, the following query will be planned with
partitionwise join.

# explain (costs off)
select * from (p as t1 full join p as t2 on t1.k = t2.k) as t12(k1,val1,k2,val2)
full join p as t3 on COALESCE(t12.val1, t12.k1) = t3.k;
                        QUERY PLAN
----------------------------------------------------------
 Append
   ->  Hash Full Join
         Hash Cond: (COALESCE(t1.val, t1.k) = t3.k)
         ->  Hash Full Join
               Hash Cond: (t1.k = t2.k)
               ->  Seq Scan on p_1 t1
               ->  Hash
                     ->  Seq Scan on p_1 t2
         ->  Hash
               ->  Seq Scan on p_1 t3
   ->  Hash Full Join
         Hash Cond: (COALESCE(t1_1.val, t1_1.k) = t3_1.k)
         ->  Hash Full Join
               Hash Cond: (t1_1.k = t2_1.k)
               ->  Seq Scan on p_2 t1_1
               ->  Hash
                     ->  Seq Scan on p_2 t2_1
         ->  Hash
               ->  Seq Scan on p_2 t3_1
(19 rows)

But since t1.val is not a partition key, we cannot actually use
partitionwise join here.

If we insert the data below into the table, the query above returns
wrong results.

insert into p select 5,15;
insert into p select 15,5;
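
To make the failure concrete, here is a small standalone C sketch (not PostgreSQL code) that counts the rows of the outer full join for the two rows above, once over the whole table and once partition by partition. It assumes k is unique, so the inner self-join t12 has exactly the rows of p, and that val is never NULL, so COALESCE(val1, k1) is simply val1. Row, JoinCount, partition_of, full_join and partitionwise_full_join are names made up for this illustration.

```c
#include <assert.h>

#define MAX_ROWS 16

/* One row of table p(k, val); neither column is NULL in this sketch. */
typedef struct { int k; int val; } Row;

/* Total output rows of a full join, and how many of them are real matches. */
typedef struct { int rows; int matched; } JoinCount;

/* Hypothetical helper: 0 for p_1 = [1, 10), 1 for p_2 = [10, 100). */
static int partition_of(int k)
{
    return k < 10 ? 0 : 1;
}

/*
 * Count the rows of
 *   left FULL JOIN right ON COALESCE(left.val, left.k) = right.k
 * Since val is assumed non-NULL, the COALESCE always yields left.val.
 */
static JoinCount full_join(const Row *left, int nleft,
                           const Row *right, int nright)
{
    JoinCount out = {0, 0};
    int right_hit[MAX_ROWS] = {0};

    assert(nleft <= MAX_ROWS && nright <= MAX_ROWS);

    for (int i = 0; i < nleft; i++)
    {
        int key = left[i].val;  /* COALESCE(val, k) with val non-NULL */
        int hit = 0;

        for (int j = 0; j < nright; j++)
        {
            if (right[j].k == key)
            {
                out.rows++;
                out.matched++;
                hit = 1;
                right_hit[j] = 1;
            }
        }
        if (!hit)
            out.rows++;         /* left row, null-extended on the right */
    }
    for (int j = 0; j < nright; j++)
        if (!right_hit[j])
            out.rows++;         /* right row, null-extended on the left */

    return out;
}

/* Same join, but performed separately inside each partition of p. */
static JoinCount partitionwise_full_join(const Row *p, int n)
{
    JoinCount total = {0, 0};

    assert(n <= MAX_ROWS);

    for (int part = 0; part < 2; part++)
    {
        Row slice[MAX_ROWS];
        int m = 0;

        for (int i = 0; i < n; i++)
            if (partition_of(p[i].k) == part)
                slice[m++] = p[i];

        JoinCount c = full_join(slice, m, slice, m);
        total.rows += c.rows;
        total.matched += c.matched;
    }
    return total;
}
```

With p = {(5,15), (15,5)}, full_join(p, 2, p, 2) finds 2 matched rows, whereas partitionwise_full_join(p, 2) finds no match at all and emits 4 null-extended rows, because each row's val points into the other partition.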

Thanks
Richard

Amit Langote

amitlangote09@gmail.com

over 6 years ago

In reply to: Richard Guo (#6)

Re: d25ea01275 and partitionwise join

Hello Richard,

On Wed, Sep 4, 2019 at 4:30 PM Richard Guo <riguo@pivotal.io> wrote:

Hi Amit,

On Wed, Sep 4, 2019 at 10:01 AM Amit Langote <amitlangote09@gmail.com> wrote:

Fujita-san,

To avoid losing track of this, I've added this to November CF.

https://commitfest.postgresql.org/25/2278/

I know there is one more patch beside the partitionwise join fix, but
I've set the title to suggest that this is related mainly to
partitionwise joins.

Thank you for working on this. Currently partitionwise join does not
take COALESCE expr into consideration when matching to partition keys.
This is a problem.

BTW, a rebase is needed for the patch set.

Thanks a lot for looking at this.

I tried rebasing today and found that adapting this patch to the
following recent commit to the equivalence processing code would take
some time that I don't currently have.

commit 3373c7155350cf6fcd51dd090f29e1332901e329
Author: David Rowley <drowley@postgresql.org>
Date: Sun Jul 21 17:30:58 2019 +1200

Speed up finding EquivalenceClasses for a given set of rels

I will come back to this in a couple of weeks, along with addressing
your other comments.

Thanks,
Amit

Amit Langote

amitlangote09@gmail.com

over 6 years ago

In reply to: Richard Guo (#7)

3 attachment(s)

Re: d25ea01275 and partitionwise join

Hi Richard,

Thanks a lot for taking a close look at the patch and sorry about the delay.

On Wed, Sep 4, 2019 at 5:29 PM Richard Guo <riguo@pivotal.io> wrote:

On Wed, Sep 4, 2019 at 10:01 AM Amit Langote <amitlangote09@gmail.com> wrote:

I'm reviewing v2-0002 and I have a concern about how a COALESCE expr is
processed in match_join_arg_to_partition_keys().

If a COALESCE expr has a non-partition-key expr as its first arg and a
partition key as its second arg, the patch would match it to the
partition key, which can produce wrong results in some cases.

For instance, consider the partitioned table below:

create table p (k int, val int) partition by range(k);
create table p_1 partition of p for values from (1) to (10);
create table p_2 partition of p for values from (10) to (100);

So with patch v2-0002, the following query will be planned with
partitionwise join.

# explain (costs off)
select * from (p as t1 full join p as t2 on t1.k = t2.k) as t12(k1,val1,k2,val2)
full join p as t3 on COALESCE(t12.val1, t12.k1) = t3.k;
                        QUERY PLAN
----------------------------------------------------------
 Append
   ->  Hash Full Join
         Hash Cond: (COALESCE(t1.val, t1.k) = t3.k)
         ->  Hash Full Join
               Hash Cond: (t1.k = t2.k)
               ->  Seq Scan on p_1 t1
               ->  Hash
                     ->  Seq Scan on p_1 t2
         ->  Hash
               ->  Seq Scan on p_1 t3
   ->  Hash Full Join
         Hash Cond: (COALESCE(t1_1.val, t1_1.k) = t3_1.k)
         ->  Hash Full Join
               Hash Cond: (t1_1.k = t2_1.k)
               ->  Seq Scan on p_2 t1_1
               ->  Hash
                     ->  Seq Scan on p_2 t2_1
         ->  Hash
               ->  Seq Scan on p_2 t3_1
(19 rows)

But since t1.val is not a partition key, we cannot actually use
partitionwise join here.

If we insert the data below into the table, the query above returns
wrong results.

insert into p select 5,15;
insert into p select 15,5;

Good catch! It's quite wrong to use COALESCE(t12.val1, t12.k1) = t3.k
for partitionwise join, as the COALESCE expression may well output
the value of val1, which doesn't conform to the partitioning.

I've fixed match_join_arg_to_partition_keys() to catch that case and
fail. Added a test case as well.
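
For readers following along, the rule being enforced can be sketched in standalone C as below. This is not the code in the attached patch: ToyExpr, ToyKind and matches_partition_key are hypothetical names, and the toy matcher compares column names only, ignoring which relation a column comes from. The point it illustrates is that a COALESCE may be treated as a partition key only if every one of its arguments is itself one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy expression tree: either a plain column or a COALESCE of sub-expressions. */
typedef enum { TOY_COLUMN, TOY_COALESCE } ToyKind;

typedef struct ToyExpr
{
    ToyKind     kind;
    const char *column;             /* used when kind == TOY_COLUMN */
    const struct ToyExpr *args[2];  /* used when kind == TOY_COALESCE */
    int         nargs;
} ToyExpr;

/*
 * Return true only if the expression can stand in for the partition key:
 * a column must be the key itself, and a COALESCE must have at least one
 * argument, all of which match recursively.
 */
static bool matches_partition_key(const ToyExpr *e, const char *partkey)
{
    assert(e != NULL && partkey != NULL);

    if (e->kind == TOY_COLUMN)
        return strcmp(e->column, partkey) == 0;

    if (e->nargs == 0)
        return false;

    for (int i = 0; i < e->nargs; i++)
    {
        /* A single non-key argument (e.g. val) disqualifies the COALESCE. */
        if (!matches_partition_key(e->args[i], partkey))
            return false;
    }
    return true;
}
```

Under this rule COALESCE(k, k) is accepted for partition key k, while COALESCE(val, k) from the query above is rejected, so partitionwise join would not be considered for it.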

Please find attached updated patches.

Thanks,
Amit

Attachments:

v3-0003-Add-multi-relation-EC-child-members-in-a-separate.patch (application/octet-stream)

From 0c8647a7ad5280068a79ef99d5a350fd7e6ee1ed Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 27 Jun 2019 14:23:42 +0900
Subject: [PATCH v3 3/3] Add multi-relation EC child members in a separate pass

A given multi-relation EC member should be translated when building
the child joinrel that will be able to supply the necessary child
expression.
---
 src/backend/optimizer/path/allpaths.c        |   2 +-
 src/backend/optimizer/path/equivclass.c      |  68 ++++++--
 src/backend/optimizer/util/relnode.c         |  20 ++-
 src/include/optimizer/paths.h                |   3 +-
 src/test/regress/expected/partition_join.out | 243 ++++++++++++++++++++++++++-
 5 files changed, 304 insertions(+), 32 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index db3a68a51d..70d0be691e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1067,7 +1067,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		 * paths that produce those sort orderings).
 		 */
 		if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
-			add_child_rel_equivalences(root, appinfo, rel, childrel);
+			add_child_rel_equivalences(root, &appinfo, 1, rel, childrel);
 		childrel->has_eclass_joins = rel->has_eclass_joins;
 
 		/*
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index ccc07ba9f0..67756ab671 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1193,6 +1193,20 @@ generate_join_implied_equalities(PlannerInfo *root,
 		result = list_concat(result, sublist);
 	}
 
+	/*
+	 * Remember the EC indexes that were found to refer to this joinrel.  This
+	 * is useful in add_child_rel_equivalences() where we need to add child
+	 * versions of the expressions in each of these ECs.
+	 */
+	if (!bms_is_empty(matching_ecs))
+	{
+		RelOptInfo *joinrel = find_join_rel(root, join_relids);
+
+		if (joinrel)
+			joinrel->eclass_indexes = bms_add_members(joinrel->eclass_indexes,
+													  matching_ecs);
+	}
+
 	return result;
 }
 
@@ -2215,12 +2229,15 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
  * Note that this function won't be called at all unless we have at least some
  * reason to believe that the EC members it generates will be useful.
  *
- * parent_rel and child_rel could be derived from appinfo, but since the
+ * parent_rel and child_rel could be derived from appinfos, but since the
  * caller has already computed them, we might as well just pass them in.
+ * Note that parent_rel and child_rel are either BASEREL and OTHER_MEMBER_REL,
+ * respectively, or JOINREL and OTHER_JOINREL.
  */
 void
 add_child_rel_equivalences(PlannerInfo *root,
-						   AppendRelInfo *appinfo,
+						   AppendRelInfo **appinfos,
+						   int		nappinfos,
 						   RelOptInfo *parent_rel,
 						   RelOptInfo *child_rel)
 {
@@ -2231,13 +2248,14 @@ add_child_rel_equivalences(PlannerInfo *root,
 	 * eclass_indexes to avoid searching all of root->eq_classes.
 	 */
 	Assert(root->ec_merging_done);
-	Assert(IS_SIMPLE_REL(parent_rel));
+	Assert(IS_SIMPLE_REL(parent_rel) || IS_JOIN_REL(parent_rel));
 
 	i = -1;
 	while ((i = bms_next_member(parent_rel->eclass_indexes, i)) >= 0)
 	{
 		EquivalenceClass *cur_ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
 		int			num_members;
+		EquivalenceMember *first_em;
 
 		/*
 		 * If this EC contains a volatile expression, then generating child
@@ -2247,8 +2265,21 @@ add_child_rel_equivalences(PlannerInfo *root,
 		if (cur_ec->ec_has_volatile)
 			continue;
 
+		first_em = (EquivalenceMember *) linitial(cur_ec->ec_members);
+
+		/*
+		 * Only translate ECs whose members expressions could possibly match
+		 * the parent relation.  That is, for "baserel" parent and child
+		 * relations only consider ECs that contain single-relation members,
+		 * whereas, for "joinrel" parent and child relations, only consider ECs
+		 * that contain multi-relation members.  Note that looking at the first
+		 * EC member is enough, because others should look the same.
+		 */
+		if (bms_membership(parent_rel->relids) != bms_membership(first_em->em_relids))
+			continue;
+
 		/* Sanity check eclass_indexes only contain ECs for parent_rel */
-		Assert(bms_is_subset(child_rel->top_parent_relids, cur_ec->ec_relids));
+		Assert(bms_overlap(child_rel->top_parent_relids, cur_ec->ec_relids));
 
 		/*
 		 * We don't use foreach() here because there's no point in scanning
@@ -2263,34 +2294,31 @@ add_child_rel_equivalences(PlannerInfo *root,
 			if (cur_em->em_is_const)
 				continue;		/* ignore consts here */
 
-			/*
-			 * We consider only original EC members here, not
-			 * already-transformed child members.  Otherwise, if some original
-			 * member expression references more than one appendrel, we'd get
-			 * an O(N^2) explosion of useless derived expressions for
-			 * combinations of children.
-			 */
-			if (cur_em->em_is_child)
-				continue;		/* ignore children here */
-
 			/* Does this member reference child's topmost parent rel? */
-			if (bms_overlap(cur_em->em_relids, child_rel->top_parent_relids))
+			if (bms_is_subset(cur_em->em_relids, child_rel->top_parent_relids))
 			{
 				/* Yes, generate transformed child version */
 				Expr	   *child_expr;
 				Relids		new_relids;
 				Relids		new_nullable_relids;
 
-				if (parent_rel->reloptkind == RELOPT_BASEREL)
+				/*
+				 * If the parent_rel is itself the topmost parent rel, transform
+				 * directly.
+				 */
+				if (parent_rel->reloptkind == RELOPT_BASEREL ||
+					parent_rel->reloptkind == RELOPT_JOINREL)
 				{
 					/* Simple single-level transformation */
 					child_expr = (Expr *)
 						adjust_appendrel_attrs(root,
 											   (Node *) cur_em->em_expr,
-											   1, &appinfo);
+											   nappinfos, appinfos);
 				}
 				else
 				{
+					Assert(parent_rel->reloptkind == RELOPT_OTHER_MEMBER_REL ||
+						   parent_rel->reloptkind == RELOPT_OTHER_JOINREL);
 					/* Must do multi-level transformation */
 					child_expr = (Expr *)
 						adjust_appendrel_attrs_multilevel(root,
@@ -2329,6 +2357,12 @@ add_child_rel_equivalences(PlannerInfo *root,
 
 				/* Record this EC index for the child rel */
 				child_rel->eclass_indexes = bms_add_member(child_rel->eclass_indexes, i);
+
+				/*
+				 * There aren't going to be more expressions to translate in
+				 * the same EC.
+				 */
+				break;
 			}
 		}
 	}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 3417438108..7cbd9ebe94 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -671,6 +671,9 @@ build_join_rel(PlannerInfo *root,
 	joinrel->nullable_partexprs = NULL;
 	joinrel->partitioned_child_rels = NIL;
 
+	/* Add the joinrel to the PlannerInfo. */
+	add_join_rel(root, joinrel);
+
 	/* Compute information relevant to the foreign relations. */
 	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
 
@@ -744,9 +747,6 @@ build_join_rel(PlannerInfo *root,
 		is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
 		joinrel->consider_parallel = true;
 
-	/* Add the joinrel to the PlannerInfo. */
-	add_join_rel(root, joinrel);
-
 	/*
 	 * Also, if dynamic-programming join search is active, add the new joinrel
 	 * to the appropriate sublist.  Note: you might think the Assert on number
@@ -864,6 +864,16 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 														(Node *) parent_joinrel->joininfo,
 														nappinfos,
 														appinfos);
+
+	/*
+	 * If the parent joinrel has pending equivalence classes, so does the
+	 * child.
+	 */
+	if (parent_joinrel->has_eclass_joins ||
+		has_useful_pathkeys(root, parent_joinrel))
+			add_child_rel_equivalences(root, appinfos, nappinfos,
+									   parent_joinrel, joinrel);
+
 	pfree(appinfos);
 
 	/*
@@ -873,10 +883,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
 	joinrel->direct_lateral_relids = (Relids) bms_copy(parent_joinrel->direct_lateral_relids);
 	joinrel->lateral_relids = (Relids) bms_copy(parent_joinrel->lateral_relids);
 
-	/*
-	 * If the parent joinrel has pending equivalence classes, so does the
-	 * child.
-	 */
 	joinrel->has_eclass_joins = parent_joinrel->has_eclass_joins;
 
 	/* Is the join between partitions itself partitioned? */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 54610b8656..ca507f3ee7 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -147,7 +147,8 @@ extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
 														   ForeignKeyOptInfo *fkinfo,
 														   int colno);
 extern void add_child_rel_equivalences(PlannerInfo *root,
-									   AppendRelInfo *appinfo,
+									   AppendRelInfo **appinfos,
+									   int		nappinfos,
 									   RelOptInfo *parent_rel,
 									   RelOptInfo *child_rel);
 extern List *generate_implied_equalities_for_column(PlannerInfo *root,
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 3260a345ff..875a90fd62 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -2190,19 +2190,250 @@ SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING
 SET enable_hashjoin TO off;
 EXPLAIN (COSTS OFF)
 SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+                                               QUERY PLAN                                                
+---------------------------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(COALESCE(t1.c, t2.c), t3.c), t4.c))
+         ->  Append
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(COALESCE(t1.c, t2.c), t3.c))::text) = (t4.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(COALESCE(t1.c, t2.c), t3.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: (((COALESCE(t1.c, t2.c))::text) = (t3.c)::text)
+                                 ->  Sort
+                                       Sort Key: ((COALESCE(t1.c, t2.c))::text)
+                                       ->  Merge Full Join
+                                             Merge Cond: ((t1.c)::text = (t2.c)::text)
+                                             ->  Sort
+                                                   Sort Key: t1.c
+                                                   ->  Seq Scan on prt1_n_p1 t1
+                                             ->  Sort
+                                                   Sort Key: t2.c
+                                                   ->  Seq Scan on prt1_n_p1 t2
+                                 ->  Sort
+                                       Sort Key: t3.c
+                                       ->  Seq Scan on prt1_n_p1 t3
+                     ->  Sort
+                           Sort Key: t4.c
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(COALESCE(t1_1.c, t2_1.c), t3_1.c))::text) = (t4_1.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(COALESCE(t1_1.c, t2_1.c), t3_1.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: (((COALESCE(t1_1.c, t2_1.c))::text) = (t3_1.c)::text)
+                                 ->  Sort
+                                       Sort Key: ((COALESCE(t1_1.c, t2_1.c))::text)
+                                       ->  Merge Full Join
+                                             Merge Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                             ->  Sort
+                                                   Sort Key: t1_1.c
+                                                   ->  Seq Scan on prt1_n_p2_1 t1_1
+                                             ->  Sort
+                                                   Sort Key: t2_1.c
+                                                   ->  Seq Scan on prt1_n_p2_1 t2_1
+                                 ->  Sort
+                                       Sort Key: t3_1.c
+                                       ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Sort
+                           Sort Key: t4_1.c
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(COALESCE(t1_2.c, t2_2.c), t3_2.c))::text) = (t4_2.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(COALESCE(t1_2.c, t2_2.c), t3_2.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: (((COALESCE(t1_2.c, t2_2.c))::text) = (t3_2.c)::text)
+                                 ->  Sort
+                                       Sort Key: ((COALESCE(t1_2.c, t2_2.c))::text)
+                                       ->  Merge Full Join
+                                             Merge Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                             ->  Sort
+                                                   Sort Key: t1_2.c
+                                                   ->  Seq Scan on prt1_n_p2_2 t1_2
+                                             ->  Sort
+                                                   Sort Key: t2_2.c
+                                                   ->  Seq Scan on prt1_n_p2_2 t2_2
+                                 ->  Sort
+                                       Sort Key: t3_2.c
+                                       ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Sort
+                           Sort Key: t4_2.c
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(70 rows)
+
 SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
 EXPLAIN (COSTS OFF)
 SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(t1.c, t3.c), t4.c))
+         ->  Append
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1.c, t3.c))::text) = (t4.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1.c, t3.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1.c)::text = (t3.c)::text)
+                                 ->  Merge Left Join
+                                       Merge Cond: ((t1.c)::text = (t2.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1.c
+                                             ->  Seq Scan on prt1_n_p1 t1
+                                       ->  Sort
+                                             Sort Key: t2.c
+                                             ->  Seq Scan on prt1_n_p1 t2
+                                 ->  Sort
+                                       Sort Key: t3.c
+                                       ->  Seq Scan on prt1_n_p1 t3
+                     ->  Sort
+                           Sort Key: t4.c
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1_1.c, t3_1.c))::text) = (t4_1.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1_1.c, t3_1.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1_1.c)::text = (t3_1.c)::text)
+                                 ->  Merge Left Join
+                                       Merge Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1_1.c
+                                             ->  Seq Scan on prt1_n_p2_1 t1_1
+                                       ->  Sort
+                                             Sort Key: t2_1.c
+                                             ->  Seq Scan on prt1_n_p2_1 t2_1
+                                 ->  Sort
+                                       Sort Key: t3_1.c
+                                       ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Sort
+                           Sort Key: t4_1.c
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1_2.c, t3_2.c))::text) = (t4_2.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1_2.c, t3_2.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1_2.c)::text = (t3_2.c)::text)
+                                 ->  Merge Left Join
+                                       Merge Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1_2.c
+                                             ->  Seq Scan on prt1_n_p2_2 t1_2
+                                       ->  Sort
+                                             Sort Key: t2_2.c
+                                             ->  Seq Scan on prt1_n_p2_2 t2_2
+                                 ->  Sort
+                                       Sort Key: t3_2.c
+                                       ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Sort
+                           Sort Key: t4_2.c
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(64 rows)
+
 SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
 EXPLAIN (COSTS OFF)
 SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(t1.c, t3.c), t4.c))
+         ->  Append
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1.c, t3.c))::text) = (t4.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1.c, t3.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1.c)::text = (t3.c)::text)
+                                 ->  Merge Join
+                                       Merge Cond: ((t1.c)::text = (t2.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1.c
+                                             ->  Seq Scan on prt1_n_p1 t1
+                                       ->  Sort
+                                             Sort Key: t2.c
+                                             ->  Seq Scan on prt1_n_p1 t2
+                                 ->  Sort
+                                       Sort Key: t3.c
+                                       ->  Seq Scan on prt1_n_p1 t3
+                     ->  Sort
+                           Sort Key: t4.c
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1_1.c, t3_1.c))::text) = (t4_1.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1_1.c, t3_1.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1_1.c)::text = (t3_1.c)::text)
+                                 ->  Merge Join
+                                       Merge Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1_1.c
+                                             ->  Seq Scan on prt1_n_p2_1 t1_1
+                                       ->  Sort
+                                             Sort Key: t2_1.c
+                                             ->  Seq Scan on prt1_n_p2_1 t2_1
+                                 ->  Sort
+                                       Sort Key: t3_1.c
+                                       ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Sort
+                           Sort Key: t4_1.c
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Merge Full Join
+                     Merge Cond: (((COALESCE(t1_2.c, t3_2.c))::text) = (t4_2.c)::text)
+                     ->  Sort
+                           Sort Key: ((COALESCE(t1_2.c, t3_2.c))::text)
+                           ->  Merge Full Join
+                                 Merge Cond: ((t1_2.c)::text = (t3_2.c)::text)
+                                 ->  Merge Join
+                                       Merge Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1_2.c
+                                             ->  Seq Scan on prt1_n_p2_2 t1_2
+                                       ->  Sort
+                                             Sort Key: t2_2.c
+                                             ->  Seq Scan on prt1_n_p2_2 t2_2
+                                 ->  Sort
+                                       Sort Key: t3_2.c
+                                       ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Sort
+                           Sort Key: t4_2.c
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(64 rows)
+
 SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
-ERROR:  could not find pathkey item to sort
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
 -- Beware of non-key join columns sneaking in the manually written COALESCE
 -- expressions; can't use partitionwise join in that case.
 ALTER TABLE prt1_n ADD d varchar DEFAULT '0002';
-- 
2.11.0
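
The test comment just above warns that a non-key column appearing in a hand-written COALESCE must rule out partitionwise join. A minimal sketch of that rule follows, assuming a toy expression tree in which a leaf is a column name and an inner node is a COALESCE. All names here (SketchExpr, sketch_flatten_coalesce, sketch_matches_nullable_keys) are hypothetical and are not planner code. The sketch flattens nested COALESCEs recursively, which is a simplification chosen for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_ARGS 16

/* Hypothetical toy expression: a leaf has a name, a COALESCE node has none. */
typedef struct SketchExpr
{
	const char *name;			/* column name for a leaf, NULL for COALESCE */
	const struct SketchExpr *args[2];	/* COALESCE arguments */
	int			nargs;			/* number of valid entries in args */
} SketchExpr;

/*
 * Collect the leaf names of arbitrarily nested COALESCEs into out[].
 * Returns the new count, or -1 if more than SKETCH_MAX_ARGS leaves exist.
 */
static int
sketch_flatten_coalesce(const SketchExpr *e, const char **out, int n)
{
	if (n < 0)
		return n;
	if (e->name != NULL)
	{
		if (n >= SKETCH_MAX_ARGS)
			return -1;
		out[n++] = e->name;
		return n;
	}
	for (int i = 0; i < e->nargs; i++)
		n = sketch_flatten_coalesce(e->args[i], out, n);
	return n;
}

/*
 * True only if the expression has at least one flattened argument and every
 * flattened argument is one of the given nullable partition keys.
 */
static bool
sketch_matches_nullable_keys(const SketchExpr *e,
							 const char **keys, int nkeys)
{
	const char *args[SKETCH_MAX_ARGS];
	int			nargs = sketch_flatten_coalesce(e, args, 0);

	if (nargs <= 0)
		return false;
	for (int i = 0; i < nargs; i++)
	{
		bool		found = false;

		for (int k = 0; k < nkeys; k++)
		{
			if (strcmp(args[i], keys[k]) == 0)
				found = true;
		}
		if (!found)
			return false;		/* a non-key column sneaked in */
	}
	return true;
}
```

With keys t1.c, t2.c and t3.c, COALESCE(COALESCE(t1.c, t2.c), t3.c) matches, whereas COALESCE(t1.d, t2.c) does not, because t1.d is not a partition key.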

v3-0001-Some-cosmetic-improvements-to-partitionwise-join-.patch (application/octet-stream)

From c118bc5a249f58733f8bb6e0eaaf430e71754c10 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 18 Jul 2019 10:22:31 +0900
Subject: [PATCH v3 1/3] Some cosmetic improvements to partitionwise join code

Among other changes, this moves a couple of functions from joinrels.c
to relnode.c.
---
 src/backend/optimizer/path/joinrels.c | 167 ---------------------
 src/backend/optimizer/util/plancat.c  |  20 +--
 src/backend/optimizer/util/relnode.c  | 268 +++++++++++++++++++++++++++++-----
 src/include/nodes/pathnodes.h         |  36 +++--
 src/include/optimizer/paths.h         |   3 -
 5 files changed, 271 insertions(+), 223 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 6a480ab764..fa68059c3f 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -46,8 +46,6 @@ static void try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1,
 static SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root,
 												SpecialJoinInfo *parent_sjinfo,
 												Relids left_relids, Relids right_relids);
-static int	match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
-										 bool strict_op);
 
 
 /*
@@ -1573,168 +1571,3 @@ build_child_join_sjinfo(PlannerInfo *root, SpecialJoinInfo *parent_sjinfo,
 
 	return sjinfo;
 }
-
-/*
- * Returns true if there exists an equi-join condition for each pair of
- * partition keys from given relations being joined.
- */
-bool
-have_partkey_equi_join(RelOptInfo *joinrel,
-					   RelOptInfo *rel1, RelOptInfo *rel2,
-					   JoinType jointype, List *restrictlist)
-{
-	PartitionScheme part_scheme = rel1->part_scheme;
-	ListCell   *lc;
-	int			cnt_pks;
-	bool		pk_has_clause[PARTITION_MAX_KEYS];
-	bool		strict_op;
-
-	/*
-	 * This function should be called when the joining relations have same
-	 * partitioning scheme.
-	 */
-	Assert(rel1->part_scheme == rel2->part_scheme);
-	Assert(part_scheme);
-
-	memset(pk_has_clause, 0, sizeof(pk_has_clause));
-	foreach(lc, restrictlist)
-	{
-		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
-		OpExpr	   *opexpr;
-		Expr	   *expr1;
-		Expr	   *expr2;
-		int			ipk1;
-		int			ipk2;
-
-		/* If processing an outer join, only use its own join clauses. */
-		if (IS_OUTER_JOIN(jointype) &&
-			RINFO_IS_PUSHED_DOWN(rinfo, joinrel->relids))
-			continue;
-
-		/* Skip clauses which can not be used for a join. */
-		if (!rinfo->can_join)
-			continue;
-
-		/* Skip clauses which are not equality conditions. */
-		if (!rinfo->mergeopfamilies && !OidIsValid(rinfo->hashjoinoperator))
-			continue;
-
-		opexpr = castNode(OpExpr, rinfo->clause);
-
-		/*
-		 * The equi-join between partition keys is strict if equi-join between
-		 * at least one partition key is using a strict operator. See
-		 * explanation about outer join reordering identity 3 in
-		 * optimizer/README
-		 */
-		strict_op = op_strict(opexpr->opno);
-
-		/* Match the operands to the relation. */
-		if (bms_is_subset(rinfo->left_relids, rel1->relids) &&
-			bms_is_subset(rinfo->right_relids, rel2->relids))
-		{
-			expr1 = linitial(opexpr->args);
-			expr2 = lsecond(opexpr->args);
-		}
-		else if (bms_is_subset(rinfo->left_relids, rel2->relids) &&
-				 bms_is_subset(rinfo->right_relids, rel1->relids))
-		{
-			expr1 = lsecond(opexpr->args);
-			expr2 = linitial(opexpr->args);
-		}
-		else
-			continue;
-
-		/*
-		 * Only clauses referencing the partition keys are useful for
-		 * partitionwise join.
-		 */
-		ipk1 = match_expr_to_partition_keys(expr1, rel1, strict_op);
-		if (ipk1 < 0)
-			continue;
-		ipk2 = match_expr_to_partition_keys(expr2, rel2, strict_op);
-		if (ipk2 < 0)
-			continue;
-
-		/*
-		 * If the clause refers to keys at different ordinal positions, it can
-		 * not be used for partitionwise join.
-		 */
-		if (ipk1 != ipk2)
-			continue;
-
-		/*
-		 * The clause allows partitionwise join if only it uses the same
-		 * operator family as that specified by the partition key.
-		 */
-		if (rel1->part_scheme->strategy == PARTITION_STRATEGY_HASH)
-		{
-			if (!op_in_opfamily(rinfo->hashjoinoperator,
-								part_scheme->partopfamily[ipk1]))
-				continue;
-		}
-		else if (!list_member_oid(rinfo->mergeopfamilies,
-								  part_scheme->partopfamily[ipk1]))
-			continue;
-
-		/* Mark the partition key as having an equi-join clause. */
-		pk_has_clause[ipk1] = true;
-	}
-
-	/* Check whether every partition key has an equi-join condition. */
-	for (cnt_pks = 0; cnt_pks < part_scheme->partnatts; cnt_pks++)
-	{
-		if (!pk_has_clause[cnt_pks])
-			return false;
-	}
-
-	return true;
-}
-
-/*
- * Find the partition key from the given relation matching the given
- * expression. If found, return the index of the partition key, else return -1.
- */
-static int
-match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
-{
-	int			cnt;
-
-	/* This function should be called only for partitioned relations. */
-	Assert(rel->part_scheme);
-
-	/* Remove any relabel decorations. */
-	while (IsA(expr, RelabelType))
-		expr = (Expr *) (castNode(RelabelType, expr))->arg;
-
-	for (cnt = 0; cnt < rel->part_scheme->partnatts; cnt++)
-	{
-		ListCell   *lc;
-
-		Assert(rel->partexprs);
-		foreach(lc, rel->partexprs[cnt])
-		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
-		}
-
-		if (!strict_op)
-			continue;
-
-		/*
-		 * If it's a strict equi-join a NULL partition key on one side will
-		 * not join a NULL partition key on the other side. So, rows with NULL
-		 * partition key from a partition on one side can not join with those
-		 * from a non-matching partition on the other side. So, search the
-		 * nullable partition keys as well.
-		 */
-		Assert(rel->nullable_partexprs);
-		foreach(lc, rel->nullable_partexprs[cnt])
-		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
-		}
-	}
-
-	return -1;
-}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index cf1761401d..7f4195e061 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -2248,9 +2248,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
 /*
  * set_baserel_partition_key_exprs
  *
- * Builds partition key expressions for the given base relation and sets them
- * in given RelOptInfo.  Any single column partition keys are converted to Var
- * nodes.  All Var nodes are restamped with the relid of given relation.
+ * Builds partition key expressions for the given base relation and sets
+ * rel->partexprs.
  */
 static void
 set_baserel_partition_key_exprs(Relation relation,
@@ -2298,17 +2297,20 @@ set_baserel_partition_key_exprs(Relation relation,
 			lc = lnext(partkey->partexprs, lc);
 		}
 
+		/* Base relations have a single expression per key. */
 		partexprs[cnt] = list_make1(partexpr);
 	}
 
+	/*
+	 * For base relations, we assume that the partition keys are non-nullable,
+	 * although they are nullable in principle; list and hash partitioned
+	 * tables may contain nulls in the partition key(s), for example.
+	 * Assuming non-nullability is okay for the considerations of partition
+	 * pruning, because pruning is never performed with non-strict operators.
+	 */
 	rel->partexprs = partexprs;
 
-	/*
-	 * A base relation can not have nullable partition key expressions. We
-	 * still allocate array of empty expressions lists to keep partition key
-	 * expression handling code simple. See build_joinrel_partition_info() and
-	 * match_expr_to_partition_keys().
-	 */
+	/* Assigning NIL for each key means there are no nullable keys. */
 	rel->nullable_partexprs = (List **) palloc0(sizeof(List *) * partnatts);
 }
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 85415381fb..d5b5baaf2e 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -29,6 +29,7 @@
 #include "optimizer/tlist.h"
 #include "partitioning/partbounds.h"
 #include "utils/hsearch.h"
+#include "utils/lsyscache.h"
 
 
 typedef struct JoinHashEntry
@@ -58,6 +59,14 @@ static void add_join_rel(PlannerInfo *root, RelOptInfo *joinrel);
 static void build_joinrel_partition_info(RelOptInfo *joinrel,
 										 RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 										 List *restrictlist, JoinType jointype);
+static bool have_partkey_equi_join(RelOptInfo *joinrel,
+					   RelOptInfo *rel1, RelOptInfo *rel2,
+					   JoinType jointype, List *restrictlist);
+static int match_join_arg_to_partition_keys(Expr *expr, RelOptInfo *rel,
+					bool strict_op);
+static void set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
+								RelOptInfo *outer_rel, RelOptInfo *inner_rel,
+								JoinType jointype);
 static void build_child_join_reltarget(PlannerInfo *root,
 									   RelOptInfo *parentrel,
 									   RelOptInfo *childrel,
@@ -1594,18 +1603,18 @@ find_param_path_info(RelOptInfo *rel, Relids required_outer)
 
 /*
  * build_joinrel_partition_info
- *		If the two relations have same partitioning scheme, their join may be
- *		partitioned and will follow the same partitioning scheme as the joining
- *		relations. Set the partition scheme and partition key expressions in
- *		the join relation.
+ *		Checks whether the two relations being joined can use partitionwise
+ *		join and, if so, initializes the partitioning information of the
+ *		resulting partitioned relation.
+ *
+ * This will set part_scheme and partition key expressions (partexprs and
+ * nullable_partexprs) if required.
  */
 static void
 build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 							 RelOptInfo *inner_rel, List *restrictlist,
 							 JoinType jointype)
 {
-	int			partnatts;
-	int			cnt;
 	PartitionScheme part_scheme;
 
 	/* Nothing to do if partitionwise join technique is disabled. */
@@ -1672,11 +1681,8 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 	 */
 	joinrel->part_scheme = part_scheme;
 	joinrel->boundinfo = outer_rel->boundinfo;
-	partnatts = joinrel->part_scheme->partnatts;
-	joinrel->partexprs = (List **) palloc0(sizeof(List *) * partnatts);
-	joinrel->nullable_partexprs =
-		(List **) palloc0(sizeof(List *) * partnatts);
 	joinrel->nparts = outer_rel->nparts;
+	set_joinrel_partition_key_exprs(joinrel, outer_rel, inner_rel, jointype);
 	joinrel->part_rels =
 		(RelOptInfo **) palloc0(sizeof(RelOptInfo *) * joinrel->nparts);
 
@@ -1686,32 +1692,201 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 	Assert(outer_rel->consider_partitionwise_join);
 	Assert(inner_rel->consider_partitionwise_join);
 	joinrel->consider_partitionwise_join = true;
+}
+
+/*
+ * have_partkey_equi_join
+ *		Returns true if there exist equi-join conditions involving pairs
+ *		of matching partition keys of the relations being joined for all
+ *		partition keys
+ */
+static bool
+have_partkey_equi_join(RelOptInfo *joinrel,
+					   RelOptInfo *rel1, RelOptInfo *rel2,
+					   JoinType jointype, List *restrictlist)
+{
+	PartitionScheme part_scheme = rel1->part_scheme;
+	ListCell   *lc;
+	int			cnt_pks;
+	bool		pk_has_clause[PARTITION_MAX_KEYS];
+	bool		strict_op;
 
 	/*
-	 * Construct partition keys for the join.
-	 *
-	 * An INNER join between two partitioned relations can be regarded as
-	 * partitioned by either key expression.  For example, A INNER JOIN B ON
-	 * A.a = B.b can be regarded as partitioned on A.a or on B.b; they are
-	 * equivalent.
-	 *
-	 * For a SEMI or ANTI join, the result can only be regarded as being
-	 * partitioned in the same manner as the outer side, since the inner
-	 * columns are not retained.
-	 *
-	 * An OUTER join like (A LEFT JOIN B ON A.a = B.b) may produce rows with
-	 * B.b NULL. These rows may not fit the partitioning conditions imposed on
-	 * B.b. Hence, strictly speaking, the join is not partitioned by B.b and
-	 * thus partition keys of an OUTER join should include partition key
-	 * expressions from the OUTER side only.  However, because all
-	 * commonly-used comparison operators are strict, the presence of nulls on
-	 * the outer side doesn't cause any problem; they can't match anything at
-	 * future join levels anyway.  Therefore, we track two sets of
-	 * expressions: those that authentically partition the relation
-	 * (partexprs) and those that partition the relation with the exception
-	 * that extra nulls may be present (nullable_partexprs).  When the
-	 * comparison operator is strict, the latter is just as good as the
-	 * former.
+	 * This function should be called when the joining relations have same
+	 * partitioning scheme.
+	 */
+	Assert(rel1->part_scheme == rel2->part_scheme);
+	Assert(part_scheme);
+
+	memset(pk_has_clause, 0, sizeof(pk_has_clause));
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+		OpExpr	   *opexpr;
+		Expr	   *expr1;
+		Expr	   *expr2;
+		int			ipk1;
+		int			ipk2;
+
+		/* If processing an outer join, only use its own join clauses. */
+		if (IS_OUTER_JOIN(jointype) &&
+			RINFO_IS_PUSHED_DOWN(rinfo, joinrel->relids))
+			continue;
+
+		/* Skip clauses which can not be used for a join. */
+		if (!rinfo->can_join)
+			continue;
+
+		/* Skip clauses which are not equality conditions. */
+		if (!rinfo->mergeopfamilies && !OidIsValid(rinfo->hashjoinoperator))
+			continue;
+
+		opexpr = castNode(OpExpr, rinfo->clause);
+
+		/*
+		 * The equi-join between partition keys is strict if equi-join between
+		 * at least one partition key is using a strict operator. See
+		 * explanation about outer join reordering identity 3 in
+		 * optimizer/README
+		 */
+		strict_op = op_strict(opexpr->opno);
+
+		/* Match the operands to the relation. */
+		if (bms_is_subset(rinfo->left_relids, rel1->relids) &&
+			bms_is_subset(rinfo->right_relids, rel2->relids))
+		{
+			expr1 = linitial(opexpr->args);
+			expr2 = lsecond(opexpr->args);
+		}
+		else if (bms_is_subset(rinfo->left_relids, rel2->relids) &&
+				 bms_is_subset(rinfo->right_relids, rel1->relids))
+		{
+			expr1 = lsecond(opexpr->args);
+			expr2 = linitial(opexpr->args);
+		}
+		else
+			continue;
+
+		/*
+		 * Only clauses referencing the partition keys are useful for
+		 * partitionwise join.
+		 */
+		ipk1 = match_join_arg_to_partition_keys(expr1, rel1, strict_op);
+		if (ipk1 < 0)
+			continue;
+		ipk2 = match_join_arg_to_partition_keys(expr2, rel2, strict_op);
+		if (ipk2 < 0)
+			continue;
+
+		/*
+		 * If the clause refers to keys at different ordinal positions, it can
+		 * not be used for partitionwise join.
+		 */
+		if (ipk1 != ipk2)
+			continue;
+
+		/*
+		 * The clause allows partitionwise join only if it uses the same
+		 * operator family as that specified by the partition key.
+		 */
+		if (rel1->part_scheme->strategy == PARTITION_STRATEGY_HASH)
+		{
+			if (!op_in_opfamily(rinfo->hashjoinoperator,
+								part_scheme->partopfamily[ipk1]))
+				continue;
+		}
+		else if (!list_member_oid(rinfo->mergeopfamilies,
+								  part_scheme->partopfamily[ipk1]))
+			continue;
+
+		/* Mark the partition key as having an equi-join clause. */
+		pk_has_clause[ipk1] = true;
+	}
+
+	/* Check whether every partition key has an equi-join condition. */
+	for (cnt_pks = 0; cnt_pks < part_scheme->partnatts; cnt_pks++)
+	{
+		if (!pk_has_clause[cnt_pks])
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * match_join_arg_to_partition_keys
+ *		Tries to match a join clause argument expression to one of the nullable
+ *		or non-nullable partition keys.  Returns the matched key's ordinal
+ *		position, or -1 if the expression could not be matched to any of the
+ *		keys.
+ */
+static int
+match_join_arg_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
+{
+	int			cnt;
+
+	/* This function should be called only for partitioned relations. */
+	Assert(rel->part_scheme);
+
+	/* Remove any relabel decorations. */
+	while (IsA(expr, RelabelType))
+		expr = (Expr *) (castNode(RelabelType, expr))->arg;
+
+	for (cnt = 0; cnt < rel->part_scheme->partnatts; cnt++)
+	{
+		ListCell   *lc;
+
+		Assert(rel->partexprs);
+		foreach(lc, rel->partexprs[cnt])
+		{
+			if (equal(lfirst(lc), expr))
+				return cnt;
+		}
+
+		if (!strict_op)
+			continue;
+
+		/*
+		 * If it's a strict equi-join a NULL partition key on one side will
+		 * not join a NULL partition key on the other side. So, rows with NULL
+		 * partition key from a partition on one side can not join with those
+		 * from a non-matching partition on the other side. So, search the
+		 * nullable partition keys as well.
+		 */
+		Assert(rel->nullable_partexprs);
+		foreach(lc, rel->nullable_partexprs[cnt])
+		{
+			if (equal(lfirst(lc), expr))
+				return cnt;
+		}
+	}
+
+	return -1;
+}
+
+/*
+ * set_joinrel_partition_key_exprs
+ *		Initialize partition key expressions
+ */
+static void
+set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
+								RelOptInfo *outer_rel, RelOptInfo *inner_rel,
+								JoinType jointype)
+{
+	int		partnatts;
+	int		cnt;
+
+	Assert(joinrel->part_scheme != NULL);
+
+	partnatts = joinrel->part_scheme->partnatts;
+	joinrel->partexprs = (List **) palloc0(sizeof(List *) * partnatts);
+	joinrel->nullable_partexprs =
+		(List **) palloc0(sizeof(List *) * partnatts);
+
+	/*
+	 * Join type determines which partition keys are assumed by the resulting
+	 * join relation.  Note that these keys are to be considered when checking
+	 * if any further joins involving this joinrel may be partitioned.
 	 */
 	for (cnt = 0; cnt < partnatts; cnt++)
 	{
@@ -1725,18 +1900,37 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 
 		switch (jointype)
 		{
+			/*
+			 * Join relation resulting from an INNER join may be regarded as
+			 * partitioned by either of inner and outer relation keys.  For
+			 * example, A INNER JOIN B ON A.a = B.b can be regarded as
+			 * partitioned on either A.a or B.b.
+			 */
 			case JOIN_INNER:
 				partexpr = list_concat_copy(outer_expr, inner_expr);
 				nullable_partexpr = list_concat_copy(outer_null_expr,
 													 inner_null_expr);
 				break;
 
+			/*
+			 * Join relation resulting from a SEMI or ANTI join may be
+			 * regarded as partitioned on the outer relation keys, since the
+			 * inner columns are omitted from the output.
+			 */
 			case JOIN_SEMI:
 			case JOIN_ANTI:
 				partexpr = list_copy(outer_expr);
 				nullable_partexpr = list_copy(outer_null_expr);
 				break;
 
+			/*
+			 * Join relation resulting from a LEFT OUTER JOIN likewise may be
+			 * regarded as partitioned on the (non-nullable) outer relation
+			 * keys.  The nullability of inner relation keys prevents them from
+			 * being considered partition keys of the join relation in all
+			 * cases, but they are okay as partition keys for further joins that
+			 * involve strict join operators.
+			 */
 			case JOIN_LEFT:
 				partexpr = list_copy(outer_expr);
 				nullable_partexpr = list_concat_copy(inner_expr,
@@ -1745,6 +1939,12 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 												inner_null_expr);
 				break;
 
+			/*
+			 * For FULL OUTER JOINs, both relations are nullable, so the
+			 * resulting join relation may be regarded as partitioned on
+			 * either of inner and outer relation keys, but only for joins
+			 * that involve strict join operators.
+			 */
 			case JOIN_FULL:
 				nullable_partexpr = list_concat_copy(outer_expr,
 													 inner_expr);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 23a06d718e..80a5cb77f4 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -584,16 +584,32 @@ typedef struct PartitionSchemeData *PartitionScheme;
  *								 this relation that are partitioned tables
  *								 themselves, in hierarchical order
  *
- * Note: A base relation always has only one set of partition keys, but a join
- * relation may have as many sets of partition keys as the number of relations
- * being joined. partexprs and nullable_partexprs are arrays containing
- * part_scheme->partnatts elements each. Each of these elements is a list of
- * partition key expressions.  For a base relation each list in partexprs
- * contains only one expression and nullable_partexprs is not populated. For a
- * join relation, partexprs and nullable_partexprs contain partition key
- * expressions from non-nullable and nullable relations resp. Lists at any
- * given position in those arrays together contain as many elements as the
- * number of joining relations.
+ * Notes on partition key expressions (partexprs and nullable_partexprs):
+ *
+ * Partition key expressions will be used to spot references to the partition
+ * keys of the relation in the expressions of a given query so as to apply
+ * various partitioning-based optimizations to certain query constructs.  For
+ * example, pruning unnecessary partitions of a table using baserestrictinfo
+ * clauses that contain partition keys, converting a join between two
+ * partitioned relations into a series of joins between pairs of their
+ * constituent partitions if the joined rows follow the same partitioning
+ * as the relations being joined.
+ *
+ * The partexprs and nullable_partexprs arrays each contain
+ * part_scheme->partnatts elements.  Each of the elements is a list of
+ * partition key expressions.  For partitioned *base* relations, there is one
+ * expression in every list, whereas for partitioned *join* relations, there
+ * can be as many as the number of component relations.
+ *
+ * nullable_partexprs are populated only in partitioned *join* relations,
+ * that is, if any of their component relations are nullable due to OUTER JOIN
+ * considerations.  It contains only the expressions of the nullable component
+ * relations, while those of the non-nullable relations are present in the
+ * partexprs.  For the considerations of partitionwise join, nullable partition
+ * keys can be considered to partition the underlying relation in the same
+ * manner as the non-nullable partition keys do, as long as the join operator
+ * is strict, because those null-valued keys can't be joined further, thus
+ * preserving the partitioning.
  *----------
  */
 typedef enum RelOptKind
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7345137d1d..54610b8656 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -106,9 +106,6 @@ extern bool have_join_order_restriction(PlannerInfo *root,
 extern bool have_dangerous_phv(PlannerInfo *root,
 							   Relids outer_relids, Relids inner_params);
 extern void mark_dummy_rel(RelOptInfo *rel);
-extern bool have_partkey_equi_join(RelOptInfo *joinrel,
-								   RelOptInfo *rel1, RelOptInfo *rel2,
-								   JoinType jointype, List *restrictlist);
 
 /*
  * equivclass.c
-- 
2.11.0
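
As a reading aid for the join-type comments in the patch above, here is a minimal sketch of how the join type decides which partition keys land in partexprs versus nullable_partexprs, and how operator strictness gates the use of the nullable ones. It models each key expression as one bit in a mask. Every name prefixed with Sketch, SKETCH_ or sketch_ is hypothetical, and none of this is planner code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the planner's join types. */
typedef enum
{
	SKETCH_JOIN_INNER,
	SKETCH_JOIN_LEFT,
	SKETCH_JOIN_FULL,
	SKETCH_JOIN_SEMI,
	SKETCH_JOIN_ANTI
} SketchJoinType;

/* One bit per key expression; a toy replacement for the List arrays. */
typedef struct
{
	unsigned	keys;			/* models partexprs */
	unsigned	nullable_keys;	/* models nullable_partexprs */
} SketchKeys;

/* Derive the key sets of a join from those of its inputs. */
static SketchKeys
sketch_join_keys(SketchKeys outer, SketchKeys inner, SketchJoinType jointype)
{
	SketchKeys	result = {0u, 0u};

	switch (jointype)
	{
		case SKETCH_JOIN_INNER:
			/* Either side's keys partition the result. */
			result.keys = outer.keys | inner.keys;
			result.nullable_keys = outer.nullable_keys | inner.nullable_keys;
			break;
		case SKETCH_JOIN_SEMI:
		case SKETCH_JOIN_ANTI:
			/* Inner columns are not part of the output. */
			result.keys = outer.keys;
			result.nullable_keys = outer.nullable_keys;
			break;
		case SKETCH_JOIN_LEFT:
			/* Inner keys may be null-extended, so they become nullable. */
			result.keys = outer.keys;
			result.nullable_keys = inner.keys |
				outer.nullable_keys | inner.nullable_keys;
			break;
		case SKETCH_JOIN_FULL:
			/* Both sides may be null-extended. */
			result.keys = 0u;
			result.nullable_keys = outer.keys | inner.keys |
				outer.nullable_keys | inner.nullable_keys;
			break;
	}
	return result;
}

/* A nullable key is usable only when the join operator is strict. */
static bool
sketch_key_usable(SketchKeys rel, unsigned expr_bit, bool strict_op)
{
	if ((rel.keys & expr_bit) != 0)
		return true;
	return strict_op && (rel.nullable_keys & expr_bit) != 0;
}
```

For example, with A.a as bit 0x1 and B.b as bit 0x2, a LEFT join keeps A.a as a regular key and demotes B.b to a nullable key, while a FULL join leaves only nullable keys, so any later join on them needs a strict operator.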

v3-0002-Fix-partitionwise-join-to-handle-FULL-JOINs-corre.patch (application/octet-stream)

From 30d3ab6db13b780ada0667c4b1d9266b2e7fcf41 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 18 Jul 2019 10:33:20 +0900
Subject: [PATCH v3 2/3] Fix partitionwise join to handle FULL JOINs correctly

---
 src/backend/optimizer/util/relnode.c         |  92 ++++++++--
 src/test/regress/expected/partition_join.out | 258 +++++++++++++++++++++++++++
 src/test/regress/sql/partition_join.sql      |  36 ++++
 3 files changed, 370 insertions(+), 16 deletions(-)

diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index d5b5baaf2e..3417438108 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -64,6 +64,7 @@ static bool have_partkey_equi_join(RelOptInfo *joinrel,
 					   JoinType jointype, List *restrictlist);
 static int match_join_arg_to_partition_keys(Expr *expr, RelOptInfo *rel,
 					bool strict_op);
+static List *extract_coalesce_args(Expr *expr);
 static void set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
 								RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 								JoinType jointype);
@@ -1824,6 +1825,8 @@ static int
 match_join_arg_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
 {
 	int			cnt;
+	int			matched = -1;
+	List	   *nullable_exprs;
 
 	/* This function should be called only for partitioned relations. */
 	Assert(rel->part_scheme);
@@ -1832,36 +1835,93 @@ match_join_arg_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
 	while (IsA(expr, RelabelType))
 		expr = (Expr *) (castNode(RelabelType, expr))->arg;
 
+	/*
+	 * Extract the arguments from possibly nested COALESCE expressions.  Each
+	 * of these arguments could be null when joining, so they are treated as
+	 * nullable expressions and matched only against the nullable partition
+	 * keys.
+	 */
+	if (IsA(expr, CoalesceExpr))
+		nullable_exprs = extract_coalesce_args(expr);
+	else
+		/*
+		 * expr may or may not be nullable but add to the list anyway to
+		 * simplify the coding below.
+		 */
+		nullable_exprs = list_make1(expr);
+
 	for (cnt = 0; cnt < rel->part_scheme->partnatts; cnt++)
 	{
-		ListCell   *lc;
-
 		Assert(rel->partexprs);
-		foreach(lc, rel->partexprs[cnt])
+
+		/* Is the expression one of the non-nullable partition keys? */
+		if (list_member(rel->partexprs[cnt], expr))
 		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
+			matched = cnt;
+			break;
 		}
 
+		/*
+		 * Nope, so check if it is one of the nullable keys.  Allowing
+		 * nullable keys won't work if the join operator is not strict,
+		 * because null partition keys may then join with rows from other
+		 * partitions.  XXX - would that ever be true if the operator is
+		 * already determined to be mergejoin- and hashjoin-able?
+		 */
 		if (!strict_op)
 			continue;
 
-		/*
-		 * If it's a strict equi-join a NULL partition key on one side will
-		 * not join a NULL partition key on the other side. So, rows with NULL
-		 * partition key from a partition on one side can not join with those
-		 * from a non-matching partition on the other side. So, search the
-		 * nullable partition keys as well.
-		 */
+		/* OK to match with nullable keys. */
 		Assert(rel->nullable_partexprs);
-		foreach(lc, rel->nullable_partexprs[cnt])
+
+		/* First rule out nullable_exprs containing non-key expressions. */
+		if (list_difference(nullable_exprs,
+							rel->nullable_partexprs[cnt]) != NIL)
+			continue;
+
+		if (list_intersection(rel->nullable_partexprs[cnt],
+							  nullable_exprs) != NIL)
 		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
+			matched = cnt;
+			break;
 		}
 	}
 
-	return -1;
+	Assert(list_length(nullable_exprs) >= 1);
+	list_free(nullable_exprs);
+
+	return matched;
+}
+
+/*
+ * extract_coalesce_args
+ *		Extract all arguments from arbitrarily nested CoalesceExpr's
+ *
+ * Note: caller should free the List structure when done using it.
+ */
+static List *
+extract_coalesce_args(Expr *expr)
+{
+	List   *coalesce_args = NIL;
+
+	while (expr && IsA(expr, CoalesceExpr))
+	{
+		CoalesceExpr *cexpr = (CoalesceExpr *) expr;
+		ListCell *lc;
+
+		expr = NULL;
+		foreach(lc, cexpr->args)
+		{
+			if (IsA(lfirst(lc), CoalesceExpr))
+				expr = lfirst(lc);
+			else
+				coalesce_args = lappend(coalesce_args, lfirst(lc));
+		}
+
+		Assert(expr == NULL || IsA(expr, CoalesceExpr));
+	}
+
+	return coalesce_args;
 }
 
 /*
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index cad8dd591a..3260a345ff 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -2003,3 +2003,261 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b =
                            Filter: (b = 0)
 (16 rows)
 
+-- N-way join consisting of 2 or more full joins
+DROP TABLE prt1_n_p2;
+CREATE TABLE prt1_n_p2 PARTITION OF prt1_n FOR VALUES FROM ('0250') TO ('0500') PARTITION BY RANGE (c);
+CREATE TABLE prt1_n_p2_1 PARTITION OF prt1_n_p2 FOR VALUES FROM ('0250') TO ('0350');
+CREATE TABLE prt1_n_p2_2 PARTITION OF prt1_n_p2 FOR VALUES FROM ('0350') TO ('0500');
+INSERT INTO prt1_n SELECT i, i, to_char(i, 'FM0000') FROM generate_series(250, 499, 2) i;
+ANALYZE prt1_n;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+                                              QUERY PLAN                                              
+------------------------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(COALESCE(t1.c, t2.c), t3.c), t4.c))
+         ->  Append
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(COALESCE(t1.c, t2.c), t3.c))::text = (t4.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(t1.c, t2.c))::text = (t3.c)::text)
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1.c)::text = (t2.c)::text)
+                                 ->  Seq Scan on prt1_n_p1 t1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p1 t2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p1 t3
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(COALESCE(t1_1.c, t2_1.c), t3_1.c))::text = (t4_1.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(t1_1.c, t2_1.c))::text = (t3_1.c)::text)
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_1 t1_1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_1 t2_1
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(COALESCE(t1_2.c, t2_2.c), t3_2.c))::text = (t4_2.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(t1_2.c, t2_2.c))::text = (t3_2.c)::text)
+                           ->  Hash Full Join
+                                 Hash Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_2 t1_2
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_2 t2_2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(43 rows)
+
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(t1.c, t3.c), t4.c))
+         ->  Append
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1.c, t3.c))::text = (t4.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1.c)::text = (t3.c)::text)
+                           ->  Hash Left Join
+                                 Hash Cond: ((t1.c)::text = (t2.c)::text)
+                                 ->  Seq Scan on prt1_n_p1 t1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p1 t2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p1 t3
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1_1.c, t3_1.c))::text = (t4_1.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1_1.c)::text = (t3_1.c)::text)
+                           ->  Hash Left Join
+                                 Hash Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_1 t1_1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_1 t2_1
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1_2.c, t3_2.c))::text = (t4_2.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1_2.c)::text = (t3_2.c)::text)
+                           ->  Hash Left Join
+                                 Hash Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_2 t1_2
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_2 t2_2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(43 rows)
+
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: (COALESCE(COALESCE(t1.c, t3.c), t4.c))
+         ->  Append
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1.c, t3.c))::text = (t4.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1.c)::text = (t3.c)::text)
+                           ->  Hash Join
+                                 Hash Cond: ((t1.c)::text = (t2.c)::text)
+                                 ->  Seq Scan on prt1_n_p1 t1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p1 t2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p1 t3
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p1 t4
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1_1.c, t3_1.c))::text = (t4_1.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1_1.c)::text = (t3_1.c)::text)
+                           ->  Hash Join
+                                 Hash Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_1 t1_1
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_1 t2_1
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_1 t3_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_1 t4_1
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(t1_2.c, t3_2.c))::text = (t4_2.c)::text)
+                     ->  Hash Full Join
+                           Hash Cond: ((t1_2.c)::text = (t3_2.c)::text)
+                           ->  Hash Join
+                                 Hash Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                 ->  Seq Scan on prt1_n_p2_2 t1_2
+                                 ->  Hash
+                                       ->  Seq Scan on prt1_n_p2_2 t2_2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_n_p2_2 t3_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_n_p2_2 t4_2
+(43 rows)
+
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+  c   | a | b | a | b | a | b | a | b 
+------+---+---+---+---+---+---+---+---
+ 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
+ 0002 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2
+ 0004 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
+ 0006 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
+ 0008 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8
+(5 rows)
+
+SET enable_hashjoin TO off;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+ERROR:  could not find pathkey item to sort
+-- Beware of non-key join columns sneaking in the manually written COALESCE
+-- expressions; can't use partitionwise join in that case.
+ALTER TABLE prt1_n ADD d varchar DEFAULT '0002';
+EXPLAIN (COSTS OFF)
+SELECT * FROM (prt1_n t1 FULL JOIN prt1_n t2 ON t1.c = t2.c) AS t12(a1, b1, c1, d1, a2, b2, c2, d2) FULL JOIN prt1_n t3 ON COALESCE(t12.d1, t12.c1) = t3.c ORDER BY t3.c LIMIT 5;
+                                     QUERY PLAN                                      
+-------------------------------------------------------------------------------------
+ Limit
+   ->  Sort
+         Sort Key: t3.c
+         ->  Merge Full Join
+               Merge Cond: (((COALESCE(t1.d, t1.c))::text) = (t3.c)::text)
+               ->  Sort
+                     Sort Key: ((COALESCE(t1.d, t1.c))::text)
+                     ->  Result
+                           ->  Append
+                                 ->  Merge Full Join
+                                       Merge Cond: ((t1.c)::text = (t2.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1.c
+                                             ->  Seq Scan on prt1_n_p1 t1
+                                       ->  Sort
+                                             Sort Key: t2.c
+                                             ->  Seq Scan on prt1_n_p1 t2
+                                 ->  Merge Full Join
+                                       Merge Cond: ((t1_1.c)::text = (t2_1.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1_1.c
+                                             ->  Seq Scan on prt1_n_p2_1 t1_1
+                                       ->  Sort
+                                             Sort Key: t2_1.c
+                                             ->  Seq Scan on prt1_n_p2_1 t2_1
+                                 ->  Merge Full Join
+                                       Merge Cond: ((t1_2.c)::text = (t2_2.c)::text)
+                                       ->  Sort
+                                             Sort Key: t1_2.c
+                                             ->  Seq Scan on prt1_n_p2_2 t1_2
+                                       ->  Sort
+                                             Sort Key: t2_2.c
+                                             ->  Seq Scan on prt1_n_p2_2 t2_2
+               ->  Sort
+                     Sort Key: t3.c
+                     ->  Append
+                           ->  Seq Scan on prt1_n_p1 t3
+                           ->  Seq Scan on prt1_n_p2_1 t3_1
+                           ->  Seq Scan on prt1_n_p2_2 t3_2
+(39 rows)
+
+SELECT * FROM (prt1_n t1 FULL JOIN prt1_n t2 ON t1.c = t2.c) AS t12(a1, b1, c1, d1, a2, b2, c2, d2) FULL JOIN prt1_n t3 ON COALESCE(t12.d1, t12.c1) = t3.c ORDER BY t3.c LIMIT 5;
+ a1 | b1 |  c1  |  d1  | a2 | b2 |  c2  |  d2  | a | b |  c   |  d   
+----+----+------+------+----+----+------+------+---+---+------+------
+    |    |      |      |    |    |      |      | 0 | 0 | 0000 | 0002
+  2 |  2 | 0002 | 0002 |  2 |  2 | 0002 | 0002 | 2 | 2 | 0002 | 0002
+  4 |  4 | 0004 | 0002 |  4 |  4 | 0004 | 0002 | 2 | 2 | 0002 | 0002
+  6 |  6 | 0006 | 0002 |  6 |  6 | 0006 | 0002 | 2 | 2 | 0002 | 0002
+  0 |  0 | 0000 | 0002 |  0 |  0 | 0000 | 0002 | 2 | 2 | 0002 | 0002
+(5 rows)
+
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index fb3ba18a26..3e4661c977 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -270,6 +270,7 @@ EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
 
+
 -- test default partition behavior for range
 ALTER TABLE prt1 DETACH PARTITION prt1_p3;
 ALTER TABLE prt1 ATTACH PARTITION prt1_p3 DEFAULT;
@@ -435,3 +436,38 @@ ANALYZE prt2;
 
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b = 0 ORDER BY t1.a, t2.b;
+
+-- N-way join consisting of 2 or more full joins
+DROP TABLE prt1_n_p2;
+CREATE TABLE prt1_n_p2 PARTITION OF prt1_n FOR VALUES FROM ('0250') TO ('0500') PARTITION BY RANGE (c);
+CREATE TABLE prt1_n_p2_1 PARTITION OF prt1_n_p2 FOR VALUES FROM ('0250') TO ('0350');
+CREATE TABLE prt1_n_p2_2 PARTITION OF prt1_n_p2 FOR VALUES FROM ('0350') TO ('0500');
+INSERT INTO prt1_n SELECT i, i, to_char(i, 'FM0000') FROM generate_series(250, 499, 2) i;
+ANALYZE prt1_n;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SET enable_hashjoin TO off;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 FULL JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 LEFT JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+EXPLAIN (COSTS OFF)
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+SELECT * FROM prt1_n t1 INNER JOIN prt1_n t2 USING (c) FULL JOIN prt1_n t3 USING (c) FULL JOIN prt1_n t4 USING (c) ORDER BY c LIMIT 5;
+
+-- Beware of non-key join columns sneaking in the manually written COALESCE
+-- expressions; can't use partitionwise join in that case.
+ALTER TABLE prt1_n ADD d varchar DEFAULT '0002';
+EXPLAIN (COSTS OFF)
+SELECT * FROM (prt1_n t1 FULL JOIN prt1_n t2 ON t1.c = t2.c) AS t12(a1, b1, c1, d1, a2, b2, c2, d2) FULL JOIN prt1_n t3 ON COALESCE(t12.d1, t12.c1) = t3.c ORDER BY t3.c LIMIT 5;
+SELECT * FROM (prt1_n t1 FULL JOIN prt1_n t2 ON t1.c = t2.c) AS t12(a1, b1, c1, d1, a2, b2, c2, d2) FULL JOIN prt1_n t3 ON COALESCE(t12.d1, t12.c1) = t3.c ORDER BY t3.c LIMIT 5;
-- 
2.11.0

#10

Richard Guo

riguo@pivotal.io

over 6 years ago

In reply to: Amit Langote (#9)

Re: d25ea01275 and partitionwise join

On Thu, Sep 19, 2019 at 4:15 PM Amit Langote <amitlangote09@gmail.com>
wrote:

Hi Richard,

Thanks a lot for taking a close look at the patch and sorry about the
delay.

On Wed, Sep 4, 2019 at 5:29 PM Richard Guo <riguo@pivotal.io> wrote:

On Wed, Sep 4, 2019 at 10:01 AM Amit Langote <amitlangote09@gmail.com>

wrote:

I'm reviewing v2-0002 and I have a concern about how a COALESCE expr is
processed in match_join_arg_to_partition_keys().

If there is a COALESCE expr whose first arg is a non-partition-key expr
and whose second arg is a partition key, the patch would match it to the
partition key, which may produce wrong results in some cases.

For instance, consider the partitioned table below:

create table p (k int, val int) partition by range(k);
create table p_1 partition of p for values from (1) to (10);
create table p_2 partition of p for values from (10) to (100);

So with patch v2-0002, the following query will be planned with
partitionwise join.

# explain (costs off)
select * from (p as t1 full join p as t2 on t1.k = t2.k) as t12(k1,val1,k2,val2)
  full join p as t3 on COALESCE(t12.val1, t12.k1) = t3.k;

                        QUERY PLAN
----------------------------------------------------------
 Append
   ->  Hash Full Join
         Hash Cond: (COALESCE(t1.val, t1.k) = t3.k)
         ->  Hash Full Join
               Hash Cond: (t1.k = t2.k)
               ->  Seq Scan on p_1 t1
               ->  Hash
                     ->  Seq Scan on p_1 t2
         ->  Hash
               ->  Seq Scan on p_1 t3
   ->  Hash Full Join
         Hash Cond: (COALESCE(t1_1.val, t1_1.k) = t3_1.k)
         ->  Hash Full Join
               Hash Cond: (t1_1.k = t2_1.k)
               ->  Seq Scan on p_2 t1_1
               ->  Hash
                     ->  Seq Scan on p_2 t2_1
         ->  Hash
               ->  Seq Scan on p_2 t3_1
(19 rows)

But since t1.val is not a partition key, we cannot actually use
partitionwise join here.

If we insert the data below into the table, we get wrong results for
the query above.

insert into p select 5,15;
insert into p select 15,5;

Good catch! It's quite wrong to use COALESCE(t12.val1, t12.k1) = t3.k
for partitionwise join, as the COALESCE expression might just as well
output the value of val1, which doesn't conform to the partitioning.
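
To make the failure concrete, here is a minimal self-contained C sketch of
the example above. It is not planner code: Row, partition_of(),
coalesce_val_k() and count_matches() are hypothetical names invented for
illustration, and the left side of the outer join is modeled by t1's row
alone, on the assumption that k is unique so the inner full join on
t1.k = t2.k pairs every row with itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One row of p (k int, val int); val_is_null models a NULL in val. */
typedef struct Row
{
    int  k;
    int  val;
    bool val_is_null;
} Row;

/* Bounds from the example: p_1 holds [1, 10), p_2 holds [10, 100). */
static int
partition_of(int k)
{
    if (k >= 1 && k < 10)
        return 0;
    if (k >= 10 && k < 100)
        return 1;
    return -1;              /* no partition accepts this key */
}

/* COALESCE(val, k): val unless it is NULL, otherwise k. */
static int
coalesce_val_k(const Row *r)
{
    return r->val_is_null ? r->k : r->val;
}

/*
 * Count pairs satisfying COALESCE(l.val, l.k) = r.k.  With same_part_only
 * set, only pairs whose rows live in the same partition are compared,
 * which is all a partitionwise join ever looks at.
 */
int
count_matches(const Row *rows, size_t n, bool same_part_only)
{
    int matches = 0;

    assert(rows != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        for (size_t j = 0; j < n; j++)
        {
            if (same_part_only &&
                partition_of(rows[i].k) != partition_of(rows[j].k))
                continue;
            if (coalesce_val_k(&rows[i]) == rows[j].k)
                matches++;
        }
    }
    return matches;
}
```

With the two rows from the example, (5, 15) and (15, 5),
count_matches(rows, 2, false) returns 2 and count_matches(rows, 2, true)
returns 0: the row stored in p_1 produces the join value 15, whose only
match lives in p_2. When val is NULL in both rows, COALESCE falls back to
k and both calls return 2.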

I've fixed match_join_arg_to_partition_keys() to catch that case and
fail. Added a test case as well.
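
The rule being enforced can be sketched outside the planner roughly as
follows. This is only an illustration under stated assumptions: expressions
are reduced to integer ids, and ExprId, expr_in_set() and
coalesce_args_match_key() are hypothetical names, not functions from the
patch. The idea is that a COALESCE matches a nullable partition key only if
it has at least one argument and every argument is itself one of that key's
expressions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an expression node: a plain integer id. */
typedef int ExprId;

/* Linear membership test; stands in for an equal()-based list lookup. */
static bool
expr_in_set(const ExprId *set, size_t n, ExprId e)
{
    for (size_t i = 0; i < n; i++)
    {
        if (set[i] == e)
            return true;
    }
    return false;
}

/*
 * True only if args is non-empty and every COALESCE argument is one of
 * the nullable partition key expressions in keys.
 */
bool
coalesce_args_match_key(const ExprId *args, size_t nargs,
                        const ExprId *keys, size_t nkeys)
{
    assert(keys != NULL || nkeys == 0);

    if (nargs == 0)
        return false;
    for (size_t i = 0; i < nargs; i++)
    {
        if (!expr_in_set(keys, nkeys, args[i]))
            return false;   /* a non-key argument rules the match out */
    }
    return true;
}
```

If t1.k and t2.k are given ids 1 and 2 and t1.val id 3, then with keys
{1, 2} the arguments {1, 2} are accepted, while {3, 1}, the shape of
COALESCE(t12.val1, t12.k1), is rejected.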

Please find attached updated patches.

Thank you for the fix. Will review.

Thanks
Richard

#11

Justin Pryzby

pryzby@telsasoft.com

about 6 years ago

In reply to: Amit Langote (#9)

Re: d25ea01275 and partitionwise join

On Thu, Sep 19, 2019 at 05:15:37PM +0900, Amit Langote wrote:

Please find attached updated patches.

Tom pointed me to this thread, since we hit it in 12.0
/messages/by-id/16802.1570989962@sss.pgh.pa.us

I can't say much about the patch; there's a little typo:
"The nullability of inner relation keys prevents them to"
..should say "prevent them from".

In order to compile it against REL12, I tried to cherry-pick this one:
3373c715: Speed up finding EquivalenceClasses for a given set of rels

But then it crashes in check-world (possibly due to misapplied hunks).

--
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581

#12

Justin Pryzby

pryzby@telsasoft.com

about 6 years ago

In reply to: Justin Pryzby (#11)

Re: d25ea01275 and partitionwise join

On Sun, Oct 13, 2019 at 03:02:17PM -0500, Justin Pryzby wrote:

On Thu, Sep 19, 2019 at 05:15:37PM +0900, Amit Langote wrote:

Please find attached updated patches.

Tom pointed me to this thread, since we hit it in 12.0
/messages/by-id/16802.1570989962@sss.pgh.pa.us

I can't say much about the patch; there's a little typo:
"The nullability of inner relation keys prevents them to"
..should say "prevent them from".

In order to compile it against REL12, I tried to cherry-pick this one:
3373c715: Speed up finding EquivalenceClasses for a given set of rels

But then it crashes in check-world (possibly due to misapplied hunks).

I did it again paying more attention and got it to pass.

The PWJ + FULL JOIN query seems ok now.

But I'll leave PWJ disabled until I can look more closely.

$ PGOPTIONS='-c max_parallel_workers_per_gather=0 -c enable_mergejoin=off -c enable_hashagg=off -c enable_partitionwise_join=on' psql postgres -f tmp/sql-2019-10-11.1
SET
Nested Loop  (cost=80106964.13..131163200.28 rows=2226681567 width=6)
  Join Filter: ((s.site_location = ''::text) OR ((s.site_office)::integer = ((COALESCE(t1.site_id, t2.site_id))::integer)))
  ->  Group  (cost=80106964.13..80837945.46 rows=22491733 width=12)
        Group Key: (COALESCE(t1.start_time, t2.start_time)), ((COALESCE(t1.site_id, t2.site_id))::integer)
        ->  Merge Append  (cost=80106964.13..80613028.13 rows=22491733 width=12)
              Sort Key: (COALESCE(t1.start_time, t2.start_time)), ((COALESCE(t1.site_id, t2.site_id))::integer)
              ->  Group  (cost=25494496.54..25633699.28 rows=11136219 width=12)
                    Group Key: (COALESCE(t1.start_time, t2.start_time)), ((COALESCE(t1.site_id, t2.site_id))::integer)
                    ->  Sort  (cost=25494496.54..25522337.09 rows=11136219 width=12)
                          Sort Key: (COALESCE(t1.start_time, t2.start_time)), ((COALESCE(t1.site_id, t2.site_id))::integer)
                          ->  Hash Full Join  (cost=28608.75..24191071.36 rows=11136219 width=12)
                                Hash Cond: ((t1.start_time = t2.start_time) AND (t1.site_id = t2.site_id))
                                Filter: ((COALESCE(t1.start_time, t2.start_time) >= '2019-10-01 00:00:00'::timestamp without time zone) AND (COALESCE(t1.start_time, t2.start_time) < '2019-10-01 01:00:00'::timestamp without time zone))
                                ->  Seq Scan on t1  (cost=0.00..14495.10 rows=940910 width=10)
                                ->  Hash  (cost=14495.10..14495.10 rows=940910 width=10)
                                      ->  Seq Scan on t1 t2  (cost=0.00..14495.10 rows=940910 width=10)
              ->  Group  (cost=54612467.58..54754411.51 rows=11355514 width=12)
                    Group Key: (COALESCE(t1_1.start_time, t2_1.start_time)), ((COALESCE(t1_1.site_id, t2_1.site_id))::integer)
                    ->  Sort  (cost=54612467.58..54640856.37 rows=11355514 width=12)
                          Sort Key: (COALESCE(t1_1.start_time, t2_1.start_time)), ((COALESCE(t1_1.site_id, t2_1.site_id))::integer)
                          ->  Hash Full Join  (cost=28608.75..53281777.94 rows=11355514 width=12)
                                Hash Cond: ((t1_1.start_time = t2_1.start_time) AND (t1_1.site_id = t2_1.site_id))
                                Filter: ((COALESCE(t1_1.start_time, t2_1.start_time) >= '2019-10-01 00:00:00'::timestamp without time zone) AND (COALESCE(t1_1.start_time, t2_1.start_time) < '2019-10-01 01:00:00'::timestamp without time zone))
                                ->  Seq Scan on t2 t1_1  (cost=0.00..14495.10 rows=940910 width=10)
                                ->  Hash  (cost=14495.10..14495.10 rows=940910 width=10)
                                      ->  Seq Scan on t2 t2_1  (cost=0.00..14495.10 rows=940910 width=10)
  ->  Materialize  (cost=0.00..2.48 rows=99 width=6)
        ->  Seq Scan on s  (cost=0.00..1.99 rows=99 width=6)

--
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581

#13

Amit Langote

amitlangote09@gmail.com

about 6 years ago

In reply to: Justin Pryzby (#11)

Re: d25ea01275 and partitionwise join

Hi Justin,

On Mon, Oct 14, 2019 at 5:02 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Thu, Sep 19, 2019 at 05:15:37PM +0900, Amit Langote wrote:

Please find attached updated patches.

Tom pointed me to this thread, since we hit it in 12.0
/messages/by-id/16802.1570989962@sss.pgh.pa.us

I can't say much about the patch; there's a little typo:
"The nullability of inner relation keys prevents them to"
..should say "prevent them from".

Thanks, will fix.

Regards,
Amit

#14

Tom Lane

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: Amit Langote (#13)

Re: d25ea01275 and partitionwise join

Amit Langote <amitlangote09@gmail.com> writes:

On Mon, Oct 14, 2019 at 5:02 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

I can't say much about the patch; there's a little typo:
"The nullability of inner relation keys prevents them to"
..should say "prevent them from".

Thanks, will fix.

Just to leave a breadcrumb in this thread --- the planner failure
induced by d25ea01275 has been fixed in 529ebb20a. The difficulty
with multiway full joins that Amit started this thread with remains
open, but I imagine the posted patches will need rebasing over
529ebb20a.

regards, tom lane

#15

Amit Langote

amitlangote09@gmail.com

about 6 years ago

In reply to: Tom Lane (#14)

3 attachment(s)

Re: d25ea01275 and partitionwise join

On Wed, Nov 6, 2019 at 2:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

On Mon, Oct 14, 2019 at 5:02 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

I can't say much about the patch; there's a little typo:
"The nullability of inner relation keys prevents them to"
..should say "prevent them from".

Thanks, will fix.

Just to leave a breadcrumb in this thread --- the planner failure
induced by d25ea01275 has been fixed in 529ebb20a. The difficulty
with multiway full joins that Amit started this thread with remains
open, but I imagine the posted patches will need rebasing over
529ebb20a.

Here are the rebased patches.

I've divided the patch containing only cosmetic improvements into two
patches, of which the latter only moves around code. Patch 0003
implements the actual fix to the problem with multiway joins.

Thanks,
Amit

Attachments:

0001-Some-cosmetic-improvements-to-partitionwise-join-cod.patch (application/octet-stream)

From 1a9de062aad7b3d909c291d420edd1a2e45db461 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 18 Jul 2019 10:22:31 +0900
Subject: [PATCH 1/4] Some cosmetic improvements to partitionwise join code

---
 src/backend/optimizer/path/joinrels.c | 18 +++++--
 src/backend/optimizer/util/plancat.c  | 20 ++++----
 src/backend/optimizer/util/relnode.c  | 92 ++++++++++++++++++++++-------------
 src/include/nodes/pathnodes.h         | 36 ++++++++++----
 4 files changed, 109 insertions(+), 57 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 6a480ab764..6c0904b695 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1575,8 +1575,11 @@ build_child_join_sjinfo(PlannerInfo *root, SpecialJoinInfo *parent_sjinfo,
 }
 
 /*
- * Returns true if there exists an equi-join condition for each pair of
- * partition keys from given relations being joined.
+ * have_partkey_equi_join
+ *
+ * Returns true if there exist equi-join conditions involving pairs
+ * of matching partition keys of the relations being joined for all
+ * partition keys.
  */
 bool
 have_partkey_equi_join(RelOptInfo *joinrel,
@@ -1692,8 +1695,15 @@ have_partkey_equi_join(RelOptInfo *joinrel,
 }
 
 /*
- * Find the partition key from the given relation matching the given
- * expression. If found, return the index of the partition key, else return -1.
+ * match_expr_to_partition_keys
+ *
+ * Tries to match an expression to one of the nullable or non-nullable
+ * partition keys and if a match is found, returns the matched	key's
+ * ordinal position or -1 if the expression could not be matched to any
+ * of the keys.
+ *
+ * strict_op must be true if the expression will be compared with the
+ * partition key using a strict operator.
  */
 static int
 match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index e5f9e04d65..c85d321202 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -2248,9 +2248,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
 /*
  * set_baserel_partition_key_exprs
  *
- * Builds partition key expressions for the given base relation and sets them
- * in given RelOptInfo.  Any single column partition keys are converted to Var
- * nodes.  All Var nodes are restamped with the relid of given relation.
+ * Builds partition key expressions for the given base relation and sets
+ * rel->partexprs.
  */
 static void
 set_baserel_partition_key_exprs(Relation relation,
@@ -2298,17 +2297,20 @@ set_baserel_partition_key_exprs(Relation relation,
 			lc = lnext(partkey->partexprs, lc);
 		}
 
+		/* Base relations have a single expression per key. */
 		partexprs[cnt] = list_make1(partexpr);
 	}
 
+	/*
+	 * For base relations, we assume that the partition keys are non-nullable,
+	 * although they are nullable in principle; list and hash partitioned
+	 * tables may contain nulls in the partition key(s), for example.
+	 * Assuming non-nullability is okay for the considerations of partition
+	 * pruning, because pruning is never performed with non-strict operators.
+	 */
 	rel->partexprs = partexprs;
 
-	/*
-	 * A base relation can not have nullable partition key expressions. We
-	 * still allocate array of empty expressions lists to keep partition key
-	 * expression handling code simple. See build_joinrel_partition_info() and
-	 * match_expr_to_partition_keys().
-	 */
+	/* Assigning NIL for each key means there are no nullable keys. */
 	rel->nullable_partexprs = (List **) palloc0(sizeof(List *) * partnatts);
 }
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 03e02423b2..e30aa692d7 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -29,6 +29,7 @@
 #include "optimizer/tlist.h"
 #include "partitioning/partbounds.h"
 #include "utils/hsearch.h"
+#include "utils/lsyscache.h"
 
 
 typedef struct JoinHashEntry
@@ -58,6 +59,9 @@ static void add_join_rel(PlannerInfo *root, RelOptInfo *joinrel);
 static void build_joinrel_partition_info(RelOptInfo *joinrel,
 										 RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 										 List *restrictlist, JoinType jointype);
+static void set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
+								RelOptInfo *outer_rel, RelOptInfo *inner_rel,
+								JoinType jointype);
 static void build_child_join_reltarget(PlannerInfo *root,
 									   RelOptInfo *parentrel,
 									   RelOptInfo *childrel,
@@ -1607,18 +1611,18 @@ find_param_path_info(RelOptInfo *rel, Relids required_outer)
 
 /*
  * build_joinrel_partition_info
- *		If the two relations have same partitioning scheme, their join may be
- *		partitioned and will follow the same partitioning scheme as the joining
- *		relations. Set the partition scheme and partition key expressions in
- *		the join relation.
+ *		Checks if the two relations being joined can use partitionwise join
+ *		and, if so, initializes the partitioning information of the
+ *		resulting partitioned join relation.
+ *
+ * This will set part_scheme and partition key expressions (partexprs and
+ * nullable_partexprs) if required.
  */
 static void
 build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 							 RelOptInfo *inner_rel, List *restrictlist,
 							 JoinType jointype)
 {
-	int			partnatts;
-	int			cnt;
 	PartitionScheme part_scheme;
 
 	/* Nothing to do if partitionwise join technique is disabled. */
@@ -1685,11 +1689,8 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 	 */
 	joinrel->part_scheme = part_scheme;
 	joinrel->boundinfo = outer_rel->boundinfo;
-	partnatts = joinrel->part_scheme->partnatts;
-	joinrel->partexprs = (List **) palloc0(sizeof(List *) * partnatts);
-	joinrel->nullable_partexprs =
-		(List **) palloc0(sizeof(List *) * partnatts);
 	joinrel->nparts = outer_rel->nparts;
+	set_joinrel_partition_key_exprs(joinrel, outer_rel, inner_rel, jointype);
 	joinrel->part_rels =
 		(RelOptInfo **) palloc0(sizeof(RelOptInfo *) * joinrel->nparts);
 
@@ -1699,32 +1700,31 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 	Assert(outer_rel->consider_partitionwise_join);
 	Assert(inner_rel->consider_partitionwise_join);
 	joinrel->consider_partitionwise_join = true;
+}
+
+/*
+ * set_joinrel_partition_key_exprs
+ *		Initialize partition key expressions
+ */
+static void
+set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
+								RelOptInfo *outer_rel, RelOptInfo *inner_rel,
+								JoinType jointype)
+{
+	int		partnatts;
+	int		cnt;
+
+	Assert(joinrel->part_scheme != NULL);
+
+	partnatts = joinrel->part_scheme->partnatts;
+	joinrel->partexprs = (List **) palloc0(sizeof(List *) * partnatts);
+	joinrel->nullable_partexprs =
+		(List **) palloc0(sizeof(List *) * partnatts);
 
 	/*
-	 * Construct partition keys for the join.
-	 *
-	 * An INNER join between two partitioned relations can be regarded as
-	 * partitioned by either key expression.  For example, A INNER JOIN B ON
-	 * A.a = B.b can be regarded as partitioned on A.a or on B.b; they are
-	 * equivalent.
-	 *
-	 * For a SEMI or ANTI join, the result can only be regarded as being
-	 * partitioned in the same manner as the outer side, since the inner
-	 * columns are not retained.
-	 *
-	 * An OUTER join like (A LEFT JOIN B ON A.a = B.b) may produce rows with
-	 * B.b NULL. These rows may not fit the partitioning conditions imposed on
-	 * B.b. Hence, strictly speaking, the join is not partitioned by B.b and
-	 * thus partition keys of an OUTER join should include partition key
-	 * expressions from the OUTER side only.  However, because all
-	 * commonly-used comparison operators are strict, the presence of nulls on
-	 * the outer side doesn't cause any problem; they can't match anything at
-	 * future join levels anyway.  Therefore, we track two sets of
-	 * expressions: those that authentically partition the relation
-	 * (partexprs) and those that partition the relation with the exception
-	 * that extra nulls may be present (nullable_partexprs).  When the
-	 * comparison operator is strict, the latter is just as good as the
-	 * former.
+	 * Join type determines which partition keys are assumed by the resulting
+	 * join relation.  Note that these keys are to be considered when checking
+	 * if any further joins involving this joinrel may be partitioned.
 	 */
 	for (cnt = 0; cnt < partnatts; cnt++)
 	{
@@ -1738,18 +1738,36 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 
 		switch (jointype)
 		{
+			/*
+			 * Join relation resulting from an INNER join may be regarded as
+			 * partitioned by the keys of either the inner or the outer
+			 * relation.  For example, A INNER JOIN B ON A.a = B.b can be
+			 * regarded as partitioned on either A.a or B.b.
+			 */
 			case JOIN_INNER:
 				partexpr = list_concat_copy(outer_expr, inner_expr);
 				nullable_partexpr = list_concat_copy(outer_null_expr,
 													 inner_null_expr);
 				break;
 
+			/*
+			 * Join relation resulting from a SEMI or ANTI join may be
+			 * regarded as partitioned on the outer relation keys, since the
+			 * inner columns are omitted from the output.
+			 */
 			case JOIN_SEMI:
 			case JOIN_ANTI:
 				partexpr = list_copy(outer_expr);
 				nullable_partexpr = list_copy(outer_null_expr);
 				break;
 
+			/*
+			 * Join relation resulting from a LEFT OUTER JOIN likewise may be
+			 * regarded as partitioned on the (non-nullable) outer relation
+			 * keys.  The inner (nullable) relation keys are okay as partition
+			 * keys for further joins as long as they involve strict join
+			 * operators.
+			 */
 			case JOIN_LEFT:
 				partexpr = list_copy(outer_expr);
 				nullable_partexpr = list_concat_copy(inner_expr,
@@ -1758,6 +1776,12 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 												inner_null_expr);
 				break;
 
+			/*
+			 * For FULL OUTER JOINs, both relations are nullable, so the
+			 * resulting join relation may be regarded as partitioned on the
+			 * keys of either the inner or the outer relation, but only for
+			 * joins that involve strict join operators.
+			 */
 			case JOIN_FULL:
 				nullable_partexpr = list_concat_copy(outer_expr,
 													 inner_expr);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 23a06d718e..80a5cb77f4 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -584,16 +584,32 @@ typedef struct PartitionSchemeData *PartitionScheme;
  *								 this relation that are partitioned tables
  *								 themselves, in hierarchical order
  *
- * Note: A base relation always has only one set of partition keys, but a join
- * relation may have as many sets of partition keys as the number of relations
- * being joined. partexprs and nullable_partexprs are arrays containing
- * part_scheme->partnatts elements each. Each of these elements is a list of
- * partition key expressions.  For a base relation each list in partexprs
- * contains only one expression and nullable_partexprs is not populated. For a
- * join relation, partexprs and nullable_partexprs contain partition key
- * expressions from non-nullable and nullable relations resp. Lists at any
- * given position in those arrays together contain as many elements as the
- * number of joining relations.
+ * Notes on partition key expressions (partexprs and nullable_partexprs):
+ *
+ * Partition key expressions will be used to spot references to the partition
+ * keys of the relation in the expressions of a given query so as to apply
+ * various partitioning-based optimizations to certain query constructs.  For
+ * example, pruning unnecessary partitions of a table using baserestrictinfo
+ * clauses that contain partition keys, converting a join between two
+ * partitioned relations into a series of joins between pairs of their
+ * constituent partitions if the joined rows follow the same partitioning
+ * as the relations being joined.
+ *
+ * The partexprs and nullable_partexprs arrays each contain
+ * part_scheme->partnatts elements.  Each of the elements is a list of
+ * partition key expressions.  For partitioned *base* relations, there is one
+ * expression in every list, whereas for partitioned *join* relations, there
+ * can be as many as the number of component relations.
+ *
+ * nullable_partexprs is populated only for partitioned *join* relations,
+ * that is, if any of their component relations are nullable due to OUTER JOIN
+ * considerations.  It contains only the expressions of the nullable component
+ * relations, while those of the non-nullable relations are present in
+ * partexprs.  For the considerations of partitionwise join, nullable partition
+ * keys can be considered to partition the underlying relation in the same
+ * manner as the non-nullable partition keys do, as long as the join operator
+ * is strict, because those null-valued keys can't be joined further, thus
+ * preserving the partitioning.
  *----------
  */
 typedef enum RelOptKind
-- 
2.11.0
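
The join-type rules that set_joinrel_partition_key_exprs() applies in the patch above can be sketched in isolation.  This is only a toy model, not planner code: the ToyJoinType, ToySide and ToyKeyClass enums and toy_joinrel_key_class() are hypothetical names invented for illustration.  Given a key expression that one input relation holds in partexprs or nullable_partexprs, the function reports which of the joinrel's two lists it would end up in.

```c
#include <assert.h>

/* Hypothetical stand-ins for the planner's JoinType values. */
typedef enum
{
	TOY_JOIN_INNER, TOY_JOIN_LEFT, TOY_JOIN_FULL, TOY_JOIN_SEMI, TOY_JOIN_ANTI
} ToyJoinType;

/* Which input relation the key expression comes from. */
typedef enum { TOY_OUTER_SIDE, TOY_INNER_SIDE } ToySide;

/* Where a key lives: in partexprs, in nullable_partexprs, or nowhere. */
typedef enum { TOY_KEY_DROPPED, TOY_KEY_PARTEXPR, TOY_KEY_NULLABLE } ToyKeyClass;

/*
 * Given how an input relation classifies one of its key expressions,
 * return how the join relation would classify it.
 */
static ToyKeyClass
toy_joinrel_key_class(ToyJoinType jointype, ToySide side, ToyKeyClass input)
{
	assert(input != TOY_KEY_DROPPED);

	switch (jointype)
	{
		case TOY_JOIN_INNER:
			/* Keys of both sides keep their classification. */
			return input;

		case TOY_JOIN_SEMI:
		case TOY_JOIN_ANTI:
			/* Inner columns are not part of the output. */
			return (side == TOY_OUTER_SIDE) ? input : TOY_KEY_DROPPED;

		case TOY_JOIN_LEFT:
			/* The inner side is nullable, so all its keys become nullable. */
			return (side == TOY_OUTER_SIDE) ? input : TOY_KEY_NULLABLE;

		case TOY_JOIN_FULL:
			/* Both sides are nullable. */
			return TOY_KEY_NULLABLE;
	}

	return TOY_KEY_DROPPED;		/* not reached for the join types above */
}
```

For A LEFT JOIN B ON A.a = B.b, for instance, A.a stays in partexprs while B.b moves to nullable_partexprs, so a later join can use B.b only through a strict operator.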

0002-Move-some-code-from-joinrel.c-to-relnode.c.patch (application/octet-stream)

From 30c74413e5b0b138048dbd0796ed83fd7027854a Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 6 Nov 2019 11:00:56 +0900
Subject: [PATCH 2/4] Move some code from joinrels.c to relnode.c

---
 src/backend/optimizer/path/joinrels.c | 177 ---------------------------------
 src/backend/optimizer/util/relnode.c  | 180 ++++++++++++++++++++++++++++++++++
 src/include/optimizer/paths.h         |   3 -
 3 files changed, 180 insertions(+), 180 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 6c0904b695..fa68059c3f 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -46,8 +46,6 @@ static void try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1,
 static SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root,
 												SpecialJoinInfo *parent_sjinfo,
 												Relids left_relids, Relids right_relids);
-static int	match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
-										 bool strict_op);
 
 
 /*
@@ -1573,178 +1571,3 @@ build_child_join_sjinfo(PlannerInfo *root, SpecialJoinInfo *parent_sjinfo,
 
 	return sjinfo;
 }
-
-/*
- * have_partkey_equi_join
- *
- * Returns true if there exist equi-join conditions involving pairs
- * of matching partition keys of the relations being joined for all
- * partition keys.
- */
-bool
-have_partkey_equi_join(RelOptInfo *joinrel,
-					   RelOptInfo *rel1, RelOptInfo *rel2,
-					   JoinType jointype, List *restrictlist)
-{
-	PartitionScheme part_scheme = rel1->part_scheme;
-	ListCell   *lc;
-	int			cnt_pks;
-	bool		pk_has_clause[PARTITION_MAX_KEYS];
-	bool		strict_op;
-
-	/*
-	 * This function should be called when the joining relations have same
-	 * partitioning scheme.
-	 */
-	Assert(rel1->part_scheme == rel2->part_scheme);
-	Assert(part_scheme);
-
-	memset(pk_has_clause, 0, sizeof(pk_has_clause));
-	foreach(lc, restrictlist)
-	{
-		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
-		OpExpr	   *opexpr;
-		Expr	   *expr1;
-		Expr	   *expr2;
-		int			ipk1;
-		int			ipk2;
-
-		/* If processing an outer join, only use its own join clauses. */
-		if (IS_OUTER_JOIN(jointype) &&
-			RINFO_IS_PUSHED_DOWN(rinfo, joinrel->relids))
-			continue;
-
-		/* Skip clauses which can not be used for a join. */
-		if (!rinfo->can_join)
-			continue;
-
-		/* Skip clauses which are not equality conditions. */
-		if (!rinfo->mergeopfamilies && !OidIsValid(rinfo->hashjoinoperator))
-			continue;
-
-		opexpr = castNode(OpExpr, rinfo->clause);
-
-		/*
-		 * The equi-join between partition keys is strict if equi-join between
-		 * at least one partition key is using a strict operator. See
-		 * explanation about outer join reordering identity 3 in
-		 * optimizer/README
-		 */
-		strict_op = op_strict(opexpr->opno);
-
-		/* Match the operands to the relation. */
-		if (bms_is_subset(rinfo->left_relids, rel1->relids) &&
-			bms_is_subset(rinfo->right_relids, rel2->relids))
-		{
-			expr1 = linitial(opexpr->args);
-			expr2 = lsecond(opexpr->args);
-		}
-		else if (bms_is_subset(rinfo->left_relids, rel2->relids) &&
-				 bms_is_subset(rinfo->right_relids, rel1->relids))
-		{
-			expr1 = lsecond(opexpr->args);
-			expr2 = linitial(opexpr->args);
-		}
-		else
-			continue;
-
-		/*
-		 * Only clauses referencing the partition keys are useful for
-		 * partitionwise join.
-		 */
-		ipk1 = match_expr_to_partition_keys(expr1, rel1, strict_op);
-		if (ipk1 < 0)
-			continue;
-		ipk2 = match_expr_to_partition_keys(expr2, rel2, strict_op);
-		if (ipk2 < 0)
-			continue;
-
-		/*
-		 * If the clause refers to keys at different ordinal positions, it can
-		 * not be used for partitionwise join.
-		 */
-		if (ipk1 != ipk2)
-			continue;
-
-		/*
-		 * The clause allows partitionwise join if only it uses the same
-		 * operator family as that specified by the partition key.
-		 */
-		if (rel1->part_scheme->strategy == PARTITION_STRATEGY_HASH)
-		{
-			if (!op_in_opfamily(rinfo->hashjoinoperator,
-								part_scheme->partopfamily[ipk1]))
-				continue;
-		}
-		else if (!list_member_oid(rinfo->mergeopfamilies,
-								  part_scheme->partopfamily[ipk1]))
-			continue;
-
-		/* Mark the partition key as having an equi-join clause. */
-		pk_has_clause[ipk1] = true;
-	}
-
-	/* Check whether every partition key has an equi-join condition. */
-	for (cnt_pks = 0; cnt_pks < part_scheme->partnatts; cnt_pks++)
-	{
-		if (!pk_has_clause[cnt_pks])
-			return false;
-	}
-
-	return true;
-}
-
-/*
- * match_expr_to_partition_keys
- *
- * Tries to match an expression to one of the nullable or non-nullable
- * partition keys and if a match is found, returns the matched	key's
- * ordinal position or -1 if the expression could not be matched to any
- * of the keys.
- *
- * strict_op must be true if the expression will be compared with the
- * partition key using a strict operator.
- */
-static int
-match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
-{
-	int			cnt;
-
-	/* This function should be called only for partitioned relations. */
-	Assert(rel->part_scheme);
-
-	/* Remove any relabel decorations. */
-	while (IsA(expr, RelabelType))
-		expr = (Expr *) (castNode(RelabelType, expr))->arg;
-
-	for (cnt = 0; cnt < rel->part_scheme->partnatts; cnt++)
-	{
-		ListCell   *lc;
-
-		Assert(rel->partexprs);
-		foreach(lc, rel->partexprs[cnt])
-		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
-		}
-
-		if (!strict_op)
-			continue;
-
-		/*
-		 * If it's a strict equi-join a NULL partition key on one side will
-		 * not join a NULL partition key on the other side. So, rows with NULL
-		 * partition key from a partition on one side can not join with those
-		 * from a non-matching partition on the other side. So, search the
-		 * nullable partition keys as well.
-		 */
-		Assert(rel->nullable_partexprs);
-		foreach(lc, rel->nullable_partexprs[cnt])
-		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
-		}
-	}
-
-	return -1;
-}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e30aa692d7..07ece2f870 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -67,6 +67,11 @@ static void build_child_join_reltarget(PlannerInfo *root,
 									   RelOptInfo *childrel,
 									   int nappinfos,
 									   AppendRelInfo **appinfos);
+static bool have_partkey_equi_join(RelOptInfo *joinrel,
+								   RelOptInfo *rel1, RelOptInfo *rel2,
+								   JoinType jointype, List *restrictlist);
+static int match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
+										bool strict_op);
 
 
 /*
@@ -1823,3 +1828,178 @@ build_child_join_reltarget(PlannerInfo *root,
 	childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
 	childrel->reltarget->width = parentrel->reltarget->width;
 }
+
+/*
+ * have_partkey_equi_join
+ *
+ * Returns true if there exist equi-join conditions involving pairs
+ * of matching partition keys of the relations being joined for all
+ * partition keys.
+ */
+static bool
+have_partkey_equi_join(RelOptInfo *joinrel,
+					   RelOptInfo *rel1, RelOptInfo *rel2,
+					   JoinType jointype, List *restrictlist)
+{
+	PartitionScheme part_scheme = rel1->part_scheme;
+	ListCell   *lc;
+	int			cnt_pks;
+	bool		pk_has_clause[PARTITION_MAX_KEYS];
+	bool		strict_op;
+
+	/*
+	 * This function should be called when the joining relations have same
+	 * partitioning scheme.
+	 */
+	Assert(rel1->part_scheme == rel2->part_scheme);
+	Assert(part_scheme);
+
+	memset(pk_has_clause, 0, sizeof(pk_has_clause));
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+		OpExpr	   *opexpr;
+		Expr	   *expr1;
+		Expr	   *expr2;
+		int			ipk1;
+		int			ipk2;
+
+		/* If processing an outer join, only use its own join clauses. */
+		if (IS_OUTER_JOIN(jointype) &&
+			RINFO_IS_PUSHED_DOWN(rinfo, joinrel->relids))
+			continue;
+
+		/* Skip clauses which can not be used for a join. */
+		if (!rinfo->can_join)
+			continue;
+
+		/* Skip clauses which are not equality conditions. */
+		if (!rinfo->mergeopfamilies && !OidIsValid(rinfo->hashjoinoperator))
+			continue;
+
+		opexpr = castNode(OpExpr, rinfo->clause);
+
+		/*
+		 * The equi-join between partition keys is strict if equi-join between
+		 * at least one partition key is using a strict operator. See
+		 * explanation about outer join reordering identity 3 in
+		 * optimizer/README
+		 */
+		strict_op = op_strict(opexpr->opno);
+
+		/* Match the operands to the relation. */
+		if (bms_is_subset(rinfo->left_relids, rel1->relids) &&
+			bms_is_subset(rinfo->right_relids, rel2->relids))
+		{
+			expr1 = linitial(opexpr->args);
+			expr2 = lsecond(opexpr->args);
+		}
+		else if (bms_is_subset(rinfo->left_relids, rel2->relids) &&
+				 bms_is_subset(rinfo->right_relids, rel1->relids))
+		{
+			expr1 = lsecond(opexpr->args);
+			expr2 = linitial(opexpr->args);
+		}
+		else
+			continue;
+
+		/*
+		 * Only clauses referencing the partition keys are useful for
+		 * partitionwise join.
+		 */
+		ipk1 = match_expr_to_partition_keys(expr1, rel1, strict_op);
+		if (ipk1 < 0)
+			continue;
+		ipk2 = match_expr_to_partition_keys(expr2, rel2, strict_op);
+		if (ipk2 < 0)
+			continue;
+
+		/*
+		 * If the clause refers to keys at different ordinal positions, it can
+		 * not be used for partitionwise join.
+		 */
+		if (ipk1 != ipk2)
+			continue;
+
+		/*
+		 * The clause allows partitionwise join only if it uses the same
+		 * operator family as that specified by the partition key.
+		 */
+		if (rel1->part_scheme->strategy == PARTITION_STRATEGY_HASH)
+		{
+			if (!op_in_opfamily(rinfo->hashjoinoperator,
+								part_scheme->partopfamily[ipk1]))
+				continue;
+		}
+		else if (!list_member_oid(rinfo->mergeopfamilies,
+								  part_scheme->partopfamily[ipk1]))
+			continue;
+
+		/* Mark the partition key as having an equi-join clause. */
+		pk_has_clause[ipk1] = true;
+	}
+
+	/* Check whether every partition key has an equi-join condition. */
+	for (cnt_pks = 0; cnt_pks < part_scheme->partnatts; cnt_pks++)
+	{
+		if (!pk_has_clause[cnt_pks])
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * match_expr_to_partition_keys
+ *
+ * Tries to match an expression to one of the nullable or non-nullable
+ * partition keys.  Returns the matched key's ordinal position if a match
+ * is found, or -1 if the expression could not be matched to any of the
+ * keys.
+ *
+ * strict_op must be true if the expression will be compared with the
+ * partition key using a strict operator.
+ */
+static int
+match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
+{
+	int			cnt;
+
+	/* This function should be called only for partitioned relations. */
+	Assert(rel->part_scheme);
+
+	/* Remove any relabel decorations. */
+	while (IsA(expr, RelabelType))
+		expr = (Expr *) (castNode(RelabelType, expr))->arg;
+
+	for (cnt = 0; cnt < rel->part_scheme->partnatts; cnt++)
+	{
+		ListCell   *lc;
+
+		Assert(rel->partexprs);
+		foreach(lc, rel->partexprs[cnt])
+		{
+			if (equal(lfirst(lc), expr))
+				return cnt;
+		}
+
+		if (!strict_op)
+			continue;
+
+		/*
+		 * If it's a strict equi-join a NULL partition key on one side will
+		 * not join a NULL partition key on the other side. So, rows with NULL
+		 * partition key from a partition on one side can not join with those
+		 * from a non-matching partition on the other side. So, search the
+		 * nullable partition keys as well.
+		 */
+		Assert(rel->nullable_partexprs);
+		foreach(lc, rel->nullable_partexprs[cnt])
+		{
+			if (equal(lfirst(lc), expr))
+				return cnt;
+		}
+	}
+
+	return -1;
+}
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index c6c34630c2..1d74faddb8 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -106,9 +106,6 @@ extern bool have_join_order_restriction(PlannerInfo *root,
 extern bool have_dangerous_phv(PlannerInfo *root,
 							   Relids outer_relids, Relids inner_params);
 extern void mark_dummy_rel(RelOptInfo *rel);
-extern bool have_partkey_equi_join(RelOptInfo *joinrel,
-								   RelOptInfo *rel1, RelOptInfo *rel2,
-								   JoinType jointype, List *restrictlist);
 
 /*
  * equivclass.c
-- 
2.11.0
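
How match_expr_to_partition_keys(), moved by the patch above, treats the strict_op flag can be shown with a toy model.  This is a sketch under simplifying assumptions: expressions are plain strings compared with strcmp() rather than Node trees compared with equal(), and ToyRel, toy_list_member(), toy_match_expr_to_partition_keys() and toy_left_join_rel are hypothetical names, not planner APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_KEYS	4
#define TOY_MAX_EXPRS	4

/* Toy relation: per-key lists of expressions, each terminated by NULL. */
typedef struct ToyRel
{
	int			partnatts;
	const char *partexprs[TOY_MAX_KEYS][TOY_MAX_EXPRS];
	const char *nullable_partexprs[TOY_MAX_KEYS][TOY_MAX_EXPRS];
} ToyRel;

static bool
toy_list_member(const char *const *list, const char *expr)
{
	for (int i = 0; i < TOY_MAX_EXPRS && list[i] != NULL; i++)
	{
		if (strcmp(list[i], expr) == 0)
			return true;
	}
	return false;
}

/* Returns the ordinal position of the matching key, or -1 if none. */
static int
toy_match_expr_to_partition_keys(const char *expr, const ToyRel *rel,
								 bool strict_op)
{
	for (int cnt = 0; cnt < rel->partnatts; cnt++)
	{
		/* Non-nullable keys match regardless of operator strictness. */
		if (toy_list_member(rel->partexprs[cnt], expr))
			return cnt;

		/* Nullable keys are usable only with a strict operator. */
		if (!strict_op)
			continue;

		if (toy_list_member(rel->nullable_partexprs[cnt], expr))
			return cnt;
	}

	return -1;
}

/* Keys of (A LEFT JOIN B ON A.a = B.b): A.a is non-nullable, B.b nullable. */
static const ToyRel toy_left_join_rel = {
	1,
	{{"A.a"}},
	{{"B.b"}}
};
```

With this data, B.b matches key 0 only when strict_op is true, whereas A.a matches either way.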

0003-Fix-partitionwise-join-to-handle-FULL-JOINs-correctl.patch (application/octet-stream)

From a46d0b0d4dbf4f8474cbb8a5047ff955ceca5759 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 18 Jul 2019 10:33:20 +0900
Subject: [PATCH 3/4] Fix partitionwise join to handle FULL JOINs correctly

---
 src/backend/optimizer/util/relnode.c         | 104 +++++++++++++++++----
 src/test/regress/expected/partition_join.out | 129 +++++++++++++++++++++++++++
 src/test/regress/sql/partition_join.sql      |  24 +++++
 3 files changed, 241 insertions(+), 16 deletions(-)
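
The matching rule this patch adds for COALESCE expressions can be sketched as follows: flatten the (possibly nested) COALESCE into its leaf arguments, then accept the expression only if every leaf is one of the nullable partition keys.  This is a toy model rather than the patch's code: ToyExpr, toy_flatten_coalesce(), toy_matches_nullable_keys() and the toy_* example data are hypothetical, and leaves are compared as strings instead of with equal().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ARGS	4
#define TOY_MAX_LEAVES	16

/* Toy expression: a named column, or a COALESCE over sub-expressions. */
typedef struct ToyExpr
{
	const char *name;			/* non-NULL for a plain column */
	const struct ToyExpr *args[TOY_MAX_ARGS];	/* COALESCE arguments */
	int			nargs;
} ToyExpr;

/* Append all leaf column names under expr to leaves[]; return new count. */
static int
toy_flatten_coalesce(const ToyExpr *expr, const char **leaves, int n)
{
	if (expr->name != NULL)
	{
		assert(n < TOY_MAX_LEAVES);
		leaves[n++] = expr->name;
		return n;
	}

	for (int i = 0; i < expr->nargs; i++)
		n = toy_flatten_coalesce(expr->args[i], leaves, n);

	return n;
}

/* Returns 1 if every leaf of expr is among the nullable keys, else 0. */
static int
toy_matches_nullable_keys(const ToyExpr *expr,
						  const char *const *keys, int nkeys)
{
	const char *leaves[TOY_MAX_LEAVES];
	int			nleaves = toy_flatten_coalesce(expr, leaves, 0);

	if (nleaves == 0)
		return 0;

	for (int i = 0; i < nleaves; i++)
	{
		int			found = 0;

		for (int j = 0; j < nkeys; j++)
		{
			if (strcmp(leaves[i], keys[j]) == 0)
				found = 1;
		}
		if (!found)
			return 0;			/* a non-key argument rules the match out */
	}

	return 1;
}

/* Example data: COALESCE(COALESCE(t1.a, t2.a), t3.a), COALESCE(t2.x, t3.a) */
static const ToyExpr toy_t1a = {"t1.a", {NULL}, 0};
static const ToyExpr toy_t2a = {"t2.a", {NULL}, 0};
static const ToyExpr toy_t3a = {"t3.a", {NULL}, 0};
static const ToyExpr toy_t2x = {"t2.x", {NULL}, 0};
static const ToyExpr toy_inner = {NULL, {&toy_t1a, &toy_t2a}, 2};
static const ToyExpr toy_nested = {NULL, {&toy_inner, &toy_t3a}, 2};
static const ToyExpr toy_mixed = {NULL, {&toy_t2x, &toy_t3a}, 2};
static const char *const toy_keys[] = {"t1.a", "t2.a", "t3.a"};
```

Here toy_nested matches because all three leaves are keys, while toy_mixed is rejected because t2.x is not a key; the second case is analogous to the manually written COALESCE in the regression test added below.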

diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 07ece2f870..ac34aed0e0 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -72,6 +72,7 @@ static bool have_partkey_equi_join(RelOptInfo *joinrel,
 								   JoinType jointype, List *restrictlist);
 static int match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
 										bool strict_op);
+static List *extract_coalesce_args(Expr *expr);
 
 
 /*
@@ -1964,6 +1965,8 @@ static int
 match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
 {
 	int			cnt;
+	int			matched = -1;
+	List	   *nullable_exprs;
 
 	/* This function should be called only for partitioned relations. */
 	Assert(rel->part_scheme);
@@ -1972,34 +1975,103 @@ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
 	while (IsA(expr, RelabelType))
 		expr = (Expr *) (castNode(RelabelType, expr))->arg;
 
+	/* For PlaceHolderVars, refer to contained expression. */
+	if (IsA(expr, PlaceHolderVar))
+		expr = (castNode(PlaceHolderVar, expr))->phexpr;
+
+	/*
+	 * Extract the arguments from possibly nested COALESCE expressions.  Each
+	 * of these arguments could be null when joining, so they are treated as
+	 * nullable expressions and are matched only against the nullable
+	 * partition keys.
+	 */
+	if (IsA(expr, CoalesceExpr))
+		nullable_exprs = extract_coalesce_args(expr);
+	else
+		/*
+		 * expr may or may not be nullable but add to the list anyway to
+		 * simplify the coding below.
+		 */
+		nullable_exprs = list_make1(expr);
+
 	for (cnt = 0; cnt < rel->part_scheme->partnatts; cnt++)
 	{
-		ListCell   *lc;
-
 		Assert(rel->partexprs);
-		foreach(lc, rel->partexprs[cnt])
+
+		/* Is the expression one of the non-nullable partition keys? */
+		if (list_member(rel->partexprs[cnt], expr))
 		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
+			matched = cnt;
+			break;
 		}
 
+		/*
+		 * Nope, so check if it is one of the nullable keys.  Allowing
+		 * nullable keys won't work if the join operator is not strict,
+		 * because null partition keys may then join with rows from other
+		 * partitions.  XXX - would that ever be true if the operator is
+		 * already determined to be mergejoin- and hashjoin-able?
+		 */
 		if (!strict_op)
 			continue;
 
-		/*
-		 * If it's a strict equi-join a NULL partition key on one side will
-		 * not join a NULL partition key on the other side. So, rows with NULL
-		 * partition key from a partition on one side can not join with those
-		 * from a non-matching partition on the other side. So, search the
-		 * nullable partition keys as well.
-		 */
+		/* OK to match with nullable keys. */
 		Assert(rel->nullable_partexprs);
-		foreach(lc, rel->nullable_partexprs[cnt])
+
+		/* First rule out nullable_exprs containing non-key expressions. */
+		if (list_difference(nullable_exprs,
+							rel->nullable_partexprs[cnt]) != NIL)
+			continue;
+
+		if (list_intersection(rel->nullable_partexprs[cnt],
+							  nullable_exprs) != NIL)
 		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
+			matched = cnt;
+			break;
 		}
 	}
 
-	return -1;
+	Assert(list_length(nullable_exprs) >= 1);
+	list_free(nullable_exprs);
+
+	return matched;
+}
+
+/*
+ * extract_coalesce_args
+ *		Extract all arguments from arbitrarily nested CoalesceExpr's
+ *
+ * Note: caller should free the List structure when done using it.
+ */
+static List *
+extract_coalesce_args(Expr *expr)
+{
+	List   *coalesce_args = NIL;
+
+	while (expr && IsA(expr, CoalesceExpr))
+	{
+		CoalesceExpr *cexpr = (CoalesceExpr *) expr;
+		ListCell *lc;
+
+		expr = NULL;
+		foreach(lc, cexpr->args)
+		{
+			Expr   *arg = lfirst(lc);
+
+			/* Remove any relabel decorations. */
+			while (IsA(arg, RelabelType))
+				arg = (Expr *) (castNode(RelabelType, arg))->arg;
+			/* For PlaceHolderVars, refer to contained expression. */
+			if (IsA(arg, PlaceHolderVar))
+				arg = (castNode(PlaceHolderVar, arg))->phexpr;
+			if (IsA(arg, CoalesceExpr))
+				expr = arg;		/* descend into it on the next pass */
+			else
+				coalesce_args = lappend(coalesce_args, arg);
+		}
+
+		Assert(expr == NULL || IsA(expr, CoalesceExpr));
+	}
+
+	return coalesce_args;
 }
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 975bf6765c..e8388eedb6 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -750,6 +750,135 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
  550 | 0550 |     |      |     1100 | 0
 (12 rows)
 
+-- FULL JOIN with COALESCE expression
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+                                                                      QUERY PLAN                                                                       
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Group
+   Group Key: (COALESCE(COALESCE(prt1_p1.a, p2.a), p3.a)), (COALESCE(COALESCE(prt1_p1.b, p2.b), p3.b))
+   ->  Merge Append
+         Sort Key: (COALESCE(COALESCE(prt1_p1.a, p2.a), p3.a)), (COALESCE(COALESCE(prt1_p1.b, p2.b), p3.b))
+         ->  Group
+               Group Key: (COALESCE(COALESCE(prt1_p1.a, p2.a), p3.a)), (COALESCE(COALESCE(prt1_p1.b, p2.b), p3.b))
+               ->  Sort
+                     Sort Key: (COALESCE(COALESCE(prt1_p1.a, p2.a), p3.a)), (COALESCE(COALESCE(prt1_p1.b, p2.b), p3.b))
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(prt1_p1.a, p2.a) = p3.a) AND (COALESCE(prt1_p1.b, p2.b) = p3.b))
+                           Filter: ((COALESCE(COALESCE(prt1_p1.a, p2.a), p3.a) >= 490) AND (COALESCE(COALESCE(prt1_p1.a, p2.a), p3.a) <= 510))
+                           ->  Hash Full Join
+                                 Hash Cond: ((prt1_p1.a = p2.a) AND (prt1_p1.b = p2.b))
+                                 ->  Seq Scan on prt1_p1
+                                 ->  Hash
+                                       ->  Seq Scan on prt2_p1 p2
+                           ->  Hash
+                                 ->  Seq Scan on prt2_p1 p3
+         ->  Group
+               Group Key: (COALESCE(COALESCE(prt1_p2.a, p2_1.a), p3_1.a)), (COALESCE(COALESCE(prt1_p2.b, p2_1.b), p3_1.b))
+               ->  Sort
+                     Sort Key: (COALESCE(COALESCE(prt1_p2.a, p2_1.a), p3_1.a)), (COALESCE(COALESCE(prt1_p2.b, p2_1.b), p3_1.b))
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(prt1_p2.a, p2_1.a) = p3_1.a) AND (COALESCE(prt1_p2.b, p2_1.b) = p3_1.b))
+                           Filter: ((COALESCE(COALESCE(prt1_p2.a, p2_1.a), p3_1.a) >= 490) AND (COALESCE(COALESCE(prt1_p2.a, p2_1.a), p3_1.a) <= 510))
+                           ->  Hash Full Join
+                                 Hash Cond: ((prt1_p2.a = p2_1.a) AND (prt1_p2.b = p2_1.b))
+                                 ->  Seq Scan on prt1_p2
+                                 ->  Hash
+                                       ->  Seq Scan on prt2_p2 p2_1
+                           ->  Hash
+                                 ->  Seq Scan on prt2_p2 p3_1
+         ->  Group
+               Group Key: (COALESCE(COALESCE(prt1_p3.a, p2_2.a), p3_2.a)), (COALESCE(COALESCE(prt1_p3.b, p2_2.b), p3_2.b))
+               ->  Sort
+                     Sort Key: (COALESCE(COALESCE(prt1_p3.a, p2_2.a), p3_2.a)), (COALESCE(COALESCE(prt1_p3.b, p2_2.b), p3_2.b))
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(prt1_p3.a, p2_2.a) = p3_2.a) AND (COALESCE(prt1_p3.b, p2_2.b) = p3_2.b))
+                           Filter: ((COALESCE(COALESCE(prt1_p3.a, p2_2.a), p3_2.a) >= 490) AND (COALESCE(COALESCE(prt1_p3.a, p2_2.a), p3_2.a) <= 510))
+                           ->  Hash Full Join
+                                 Hash Cond: ((prt1_p3.a = p2_2.a) AND (prt1_p3.b = p2_2.b))
+                                 ->  Seq Scan on prt1_p3
+                                 ->  Hash
+                                       ->  Seq Scan on prt2_p3 p2_2
+                           ->  Hash
+                                 ->  Seq Scan on prt2_p3 p3_2
+(46 rows)
+
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+  a  | b  
+-----+----
+ 490 | 15
+ 492 | 17
+ 494 | 19
+ 495 | 20
+ 496 | 21
+ 498 | 23
+ 500 |  0
+ 501 |  1
+ 502 |  2
+ 504 |  4
+ 506 |  6
+ 507 |  7
+ 508 |  8
+ 510 | 10
+(14 rows)
+
+-- Manually written COALESCE expression containing non-key expression
+EXPLAIN (COSTS OFF)
+SELECT p1.a, p1.b FROM prt1 p1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) ON COALESCE(p2.b, p3.a) = p3.a
+  WHERE p1.a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
+ Group
+   Group Key: p1.a, p1.b
+   ->  Sort
+         Sort Key: p1.a, p1.b
+         ->  Nested Loop Left Join
+               Join Filter: (COALESCE(p2.b, p3.a) = p3.a)
+               ->  Append
+                     ->  Hash Right Join
+                           Hash Cond: ((p2.a = p1.a) AND (p2.b = p1.b))
+                           ->  Seq Scan on prt2_p2 p2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_p2 p1
+                                       Filter: ((a >= 490) AND (a <= 510))
+                     ->  Hash Right Join
+                           Hash Cond: ((p2_1.a = p1_1.a) AND (p2_1.b = p1_1.b))
+                           ->  Seq Scan on prt2_p3 p2_1
+                           ->  Hash
+                                 ->  Seq Scan on prt1_p3 p1_1
+                                       Filter: ((a >= 490) AND (a <= 510))
+               ->  Materialize
+                     ->  Append
+                           ->  Seq Scan on prt2_p1 p3
+                           ->  Seq Scan on prt2_p2 p3_1
+                           ->  Seq Scan on prt2_p3 p3_2
+(24 rows)
+
+SELECT p1.a, p1.b FROM prt1 p1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) ON COALESCE(p2.b, p3.a) = p3.a
+  WHERE p1.a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+  a  | b  
+-----+----
+ 490 | 15
+ 492 | 17
+ 494 | 19
+ 496 | 21
+ 498 | 23
+ 500 |  0
+ 502 |  2
+ 504 |  4
+ 506 |  6
+ 508 |  8
+ 510 | 10
+(11 rows)
+
+RESET enable_partitionwise_aggregate;
 -- Cases with non-nullable expressions in subquery results;
 -- make sure these go to null as expected
 EXPLAIN (COSTS OFF)
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index 92994b479b..9f68c5074c 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -145,6 +145,29 @@ EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
 
+-- FULL JOIN with COALESCE expression
+
+SET enable_partitionwise_aggregate TO true;
+
+EXPLAIN (COSTS OFF)
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+
+-- Manually written COALESCE expression containing non-key expression
+EXPLAIN (COSTS OFF)
+SELECT p1.a, p1.b FROM prt1 p1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) ON COALESCE(p2.b, p3.a) = p3.a
+  WHERE p1.a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+SELECT p1.a, p1.b FROM prt1 p1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) ON COALESCE(p2.b, p3.a) = p3.a
+  WHERE p1.a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+
+RESET enable_partitionwise_aggregate;
+
 -- Cases with non-nullable expressions in subquery results;
 -- make sure these go to null as expected
 EXPLAIN (COSTS OFF)
@@ -285,6 +308,7 @@ EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
 
+
 -- test default partition behavior for range
 ALTER TABLE prt1 DETACH PARTITION prt1_p3;
 ALTER TABLE prt1 ATTACH PARTITION prt1_p3 DEFAULT;
-- 
2.11.0

#16

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Amit Langote (#15)

Re: d25ea01275 and partitionwise join

Amit Langote <amitlangote09@gmail.com> writes:

On Wed, Nov 6, 2019 at 2:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Just to leave a breadcrumb in this thread --- the planner failure
induced by d25ea01275 has been fixed in 529ebb20a. The difficulty
with multiway full joins that Amit started this thread with remains
open, but I imagine the posted patches will need rebasing over
529ebb20a.

Here are the rebased patches.

The cfbot shows these patches as failing regression tests. I think
it is just cosmetic fallout from 6ef77cf46 having changed EXPLAIN's
choices of table alias names; but please look closer to confirm,
and post updated patches.

regards, tom lane

#17

Amit Langote

amitlangote09@gmail.com

almost 6 years ago

In reply to: Tom Lane (#16)

3 attachment(s)

Re: d25ea01275 and partitionwise join

On Sat, Feb 29, 2020 at 8:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

On Wed, Nov 6, 2019 at 2:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Just to leave a breadcrumb in this thread --- the planner failure
induced by d25ea01275 has been fixed in 529ebb20a. The difficulty
with multiway full joins that Amit started this thread with remains
open, but I imagine the posted patches will need rebasing over
529ebb20a.

Here are the rebased patches.

The cfbot shows these patches as failing regression tests. I think
it is just cosmetic fallout from 6ef77cf46 having changed EXPLAIN's
choices of table alias names; but please look closer to confirm,
and post updated patches.

Thanks for notifying.

I checked, and fallout from 6ef77cf46 does indeed seem to be the reason a
test is failing.

Updated patches attached.

Thanks,
Amit

Attachments:

v5-0002-Move-some-code-from-joinrel.c-to-relnode.c.patch (application/octet-stream)

From 6524921316331f0ee506254c0e38e32950520884 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 6 Nov 2019 11:00:56 +0900
Subject: [PATCH v5 2/3] Move some code from joinrel.c to relnode.c

---
 src/backend/optimizer/path/joinrels.c | 177 -------------------------
 src/backend/optimizer/util/relnode.c  | 180 ++++++++++++++++++++++++++
 src/include/optimizer/paths.h         |   3 -
 3 files changed, 180 insertions(+), 180 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index b896e3e474..983ee8b139 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -46,8 +46,6 @@ static void try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1,
 static SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root,
 												SpecialJoinInfo *parent_sjinfo,
 												Relids left_relids, Relids right_relids);
-static int	match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
-										 bool strict_op);
 
 
 /*
@@ -1573,178 +1571,3 @@ build_child_join_sjinfo(PlannerInfo *root, SpecialJoinInfo *parent_sjinfo,
 
 	return sjinfo;
 }
-
-/*
- * have_partkey_equi_join
- *
- * Returns true if there exist equi-join conditions involving pairs
- * of matching partition keys of the relations being joined for all
- * partition keys.
- */
-bool
-have_partkey_equi_join(RelOptInfo *joinrel,
-					   RelOptInfo *rel1, RelOptInfo *rel2,
-					   JoinType jointype, List *restrictlist)
-{
-	PartitionScheme part_scheme = rel1->part_scheme;
-	ListCell   *lc;
-	int			cnt_pks;
-	bool		pk_has_clause[PARTITION_MAX_KEYS];
-	bool		strict_op;
-
-	/*
-	 * This function should be called when the joining relations have same
-	 * partitioning scheme.
-	 */
-	Assert(rel1->part_scheme == rel2->part_scheme);
-	Assert(part_scheme);
-
-	memset(pk_has_clause, 0, sizeof(pk_has_clause));
-	foreach(lc, restrictlist)
-	{
-		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
-		OpExpr	   *opexpr;
-		Expr	   *expr1;
-		Expr	   *expr2;
-		int			ipk1;
-		int			ipk2;
-
-		/* If processing an outer join, only use its own join clauses. */
-		if (IS_OUTER_JOIN(jointype) &&
-			RINFO_IS_PUSHED_DOWN(rinfo, joinrel->relids))
-			continue;
-
-		/* Skip clauses which can not be used for a join. */
-		if (!rinfo->can_join)
-			continue;
-
-		/* Skip clauses which are not equality conditions. */
-		if (!rinfo->mergeopfamilies && !OidIsValid(rinfo->hashjoinoperator))
-			continue;
-
-		opexpr = castNode(OpExpr, rinfo->clause);
-
-		/*
-		 * The equi-join between partition keys is strict if equi-join between
-		 * at least one partition key is using a strict operator. See
-		 * explanation about outer join reordering identity 3 in
-		 * optimizer/README
-		 */
-		strict_op = op_strict(opexpr->opno);
-
-		/* Match the operands to the relation. */
-		if (bms_is_subset(rinfo->left_relids, rel1->relids) &&
-			bms_is_subset(rinfo->right_relids, rel2->relids))
-		{
-			expr1 = linitial(opexpr->args);
-			expr2 = lsecond(opexpr->args);
-		}
-		else if (bms_is_subset(rinfo->left_relids, rel2->relids) &&
-				 bms_is_subset(rinfo->right_relids, rel1->relids))
-		{
-			expr1 = lsecond(opexpr->args);
-			expr2 = linitial(opexpr->args);
-		}
-		else
-			continue;
-
-		/*
-		 * Only clauses referencing the partition keys are useful for
-		 * partitionwise join.
-		 */
-		ipk1 = match_expr_to_partition_keys(expr1, rel1, strict_op);
-		if (ipk1 < 0)
-			continue;
-		ipk2 = match_expr_to_partition_keys(expr2, rel2, strict_op);
-		if (ipk2 < 0)
-			continue;
-
-		/*
-		 * If the clause refers to keys at different ordinal positions, it can
-		 * not be used for partitionwise join.
-		 */
-		if (ipk1 != ipk2)
-			continue;
-
-		/*
-		 * The clause allows partitionwise join if only it uses the same
-		 * operator family as that specified by the partition key.
-		 */
-		if (rel1->part_scheme->strategy == PARTITION_STRATEGY_HASH)
-		{
-			if (!op_in_opfamily(rinfo->hashjoinoperator,
-								part_scheme->partopfamily[ipk1]))
-				continue;
-		}
-		else if (!list_member_oid(rinfo->mergeopfamilies,
-								  part_scheme->partopfamily[ipk1]))
-			continue;
-
-		/* Mark the partition key as having an equi-join clause. */
-		pk_has_clause[ipk1] = true;
-	}
-
-	/* Check whether every partition key has an equi-join condition. */
-	for (cnt_pks = 0; cnt_pks < part_scheme->partnatts; cnt_pks++)
-	{
-		if (!pk_has_clause[cnt_pks])
-			return false;
-	}
-
-	return true;
-}
-
-/*
- * match_expr_to_partition_keys
- *
- * Tries to match an expression to one of the nullable or non-nullable
- * partition keys and if a match is found, returns the matched	key's
- * ordinal position or -1 if the expression could not be matched to any
- * of the keys.
- *
- * strict_op must be true if the expression will be compared with the
- * partition key using a strict operator.
- */
-static int
-match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
-{
-	int			cnt;
-
-	/* This function should be called only for partitioned relations. */
-	Assert(rel->part_scheme);
-
-	/* Remove any relabel decorations. */
-	while (IsA(expr, RelabelType))
-		expr = (Expr *) (castNode(RelabelType, expr))->arg;
-
-	for (cnt = 0; cnt < rel->part_scheme->partnatts; cnt++)
-	{
-		ListCell   *lc;
-
-		Assert(rel->partexprs);
-		foreach(lc, rel->partexprs[cnt])
-		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
-		}
-
-		if (!strict_op)
-			continue;
-
-		/*
-		 * If it's a strict equi-join a NULL partition key on one side will
-		 * not join a NULL partition key on the other side. So, rows with NULL
-		 * partition key from a partition on one side can not join with those
-		 * from a non-matching partition on the other side. So, search the
-		 * nullable partition keys as well.
-		 */
-		Assert(rel->nullable_partexprs);
-		foreach(lc, rel->nullable_partexprs[cnt])
-		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
-		}
-	}
-
-	return -1;
-}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 81ec600ecb..8281bef317 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -67,6 +67,11 @@ static void build_child_join_reltarget(PlannerInfo *root,
 									   RelOptInfo *childrel,
 									   int nappinfos,
 									   AppendRelInfo **appinfos);
+static bool have_partkey_equi_join(RelOptInfo *joinrel,
+								   RelOptInfo *rel1, RelOptInfo *rel2,
+								   JoinType jointype, List *restrictlist);
+static int match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
+										bool strict_op);
 
 
 /*
@@ -1823,3 +1828,178 @@ build_child_join_reltarget(PlannerInfo *root,
 	childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
 	childrel->reltarget->width = parentrel->reltarget->width;
 }
+
+/*
+ * have_partkey_equi_join
+ *
+ * Returns true if there exist equi-join conditions involving pairs
+ * of matching partition keys of the relations being joined for all
+ * partition keys.
+ */
+bool
+have_partkey_equi_join(RelOptInfo *joinrel,
+					   RelOptInfo *rel1, RelOptInfo *rel2,
+					   JoinType jointype, List *restrictlist)
+{
+	PartitionScheme part_scheme = rel1->part_scheme;
+	ListCell   *lc;
+	int			cnt_pks;
+	bool		pk_has_clause[PARTITION_MAX_KEYS];
+	bool		strict_op;
+
+	/*
+	 * This function should be called when the joining relations have same
+	 * partitioning scheme.
+	 */
+	Assert(rel1->part_scheme == rel2->part_scheme);
+	Assert(part_scheme);
+
+	memset(pk_has_clause, 0, sizeof(pk_has_clause));
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+		OpExpr	   *opexpr;
+		Expr	   *expr1;
+		Expr	   *expr2;
+		int			ipk1;
+		int			ipk2;
+
+		/* If processing an outer join, only use its own join clauses. */
+		if (IS_OUTER_JOIN(jointype) &&
+			RINFO_IS_PUSHED_DOWN(rinfo, joinrel->relids))
+			continue;
+
+		/* Skip clauses which can not be used for a join. */
+		if (!rinfo->can_join)
+			continue;
+
+		/* Skip clauses which are not equality conditions. */
+		if (!rinfo->mergeopfamilies && !OidIsValid(rinfo->hashjoinoperator))
+			continue;
+
+		opexpr = castNode(OpExpr, rinfo->clause);
+
+		/*
+		 * The equi-join between partition keys is strict if equi-join between
+		 * at least one partition key is using a strict operator. See
+		 * explanation about outer join reordering identity 3 in
+		 * optimizer/README
+		 */
+		strict_op = op_strict(opexpr->opno);
+
+		/* Match the operands to the relation. */
+		if (bms_is_subset(rinfo->left_relids, rel1->relids) &&
+			bms_is_subset(rinfo->right_relids, rel2->relids))
+		{
+			expr1 = linitial(opexpr->args);
+			expr2 = lsecond(opexpr->args);
+		}
+		else if (bms_is_subset(rinfo->left_relids, rel2->relids) &&
+				 bms_is_subset(rinfo->right_relids, rel1->relids))
+		{
+			expr1 = lsecond(opexpr->args);
+			expr2 = linitial(opexpr->args);
+		}
+		else
+			continue;
+
+		/*
+		 * Only clauses referencing the partition keys are useful for
+		 * partitionwise join.
+		 */
+		ipk1 = match_expr_to_partition_keys(expr1, rel1, strict_op);
+		if (ipk1 < 0)
+			continue;
+		ipk2 = match_expr_to_partition_keys(expr2, rel2, strict_op);
+		if (ipk2 < 0)
+			continue;
+
+		/*
+		 * If the clause refers to keys at different ordinal positions, it can
+		 * not be used for partitionwise join.
+		 */
+		if (ipk1 != ipk2)
+			continue;
+
+		/*
+		 * The clause allows partitionwise join if only it uses the same
+		 * operator family as that specified by the partition key.
+		 */
+		if (rel1->part_scheme->strategy == PARTITION_STRATEGY_HASH)
+		{
+			if (!op_in_opfamily(rinfo->hashjoinoperator,
+								part_scheme->partopfamily[ipk1]))
+				continue;
+		}
+		else if (!list_member_oid(rinfo->mergeopfamilies,
+								  part_scheme->partopfamily[ipk1]))
+			continue;
+
+		/* Mark the partition key as having an equi-join clause. */
+		pk_has_clause[ipk1] = true;
+	}
+
+	/* Check whether every partition key has an equi-join condition. */
+	for (cnt_pks = 0; cnt_pks < part_scheme->partnatts; cnt_pks++)
+	{
+		if (!pk_has_clause[cnt_pks])
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * match_expr_to_partition_keys
+ *
+ * Tries to match an expression to one of the nullable or non-nullable
+ * partition keys and if a match is found, returns the matched	key's
+ * ordinal position or -1 if the expression could not be matched to any
+ * of the keys.
+ *
+ * strict_op must be true if the expression will be compared with the
+ * partition key using a strict operator.
+ */
+static int
+match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
+{
+	int			cnt;
+
+	/* This function should be called only for partitioned relations. */
+	Assert(rel->part_scheme);
+
+	/* Remove any relabel decorations. */
+	while (IsA(expr, RelabelType))
+		expr = (Expr *) (castNode(RelabelType, expr))->arg;
+
+	for (cnt = 0; cnt < rel->part_scheme->partnatts; cnt++)
+	{
+		ListCell   *lc;
+
+		Assert(rel->partexprs);
+		foreach(lc, rel->partexprs[cnt])
+		{
+			if (equal(lfirst(lc), expr))
+				return cnt;
+		}
+
+		if (!strict_op)
+			continue;
+
+		/*
+		 * If it's a strict equi-join a NULL partition key on one side will
+		 * not join a NULL partition key on the other side. So, rows with NULL
+		 * partition key from a partition on one side can not join with those
+		 * from a non-matching partition on the other side. So, search the
+		 * nullable partition keys as well.
+		 */
+		Assert(rel->nullable_partexprs);
+		foreach(lc, rel->nullable_partexprs[cnt])
+		{
+			if (equal(lfirst(lc), expr))
+				return cnt;
+		}
+	}
+
+	return -1;
+}
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..c689fe8e26 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -106,9 +106,6 @@ extern bool have_join_order_restriction(PlannerInfo *root,
 extern bool have_dangerous_phv(PlannerInfo *root,
 							   Relids outer_relids, Relids inner_params);
 extern void mark_dummy_rel(RelOptInfo *rel);
-extern bool have_partkey_equi_join(RelOptInfo *joinrel,
-								   RelOptInfo *rel1, RelOptInfo *rel2,
-								   JoinType jointype, List *restrictlist);
 
 /*
  * equivclass.c
-- 
2.20.1 (Apple Git-117)
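
As a reading aid for the function moved above (not part of the posted patch), the lookup rule in match_expr_to_partition_keys() can be sketched as a standalone toy. Everything here is an assumption made for illustration: expressions are plain strings rather than Expr nodes, ToyRel stands in for RelOptInfo, and the MAX_* caps and toy_* names are hypothetical. The point it shows is that the non-nullable keys are always searched, while the nullable keys are consulted only when the operator is strict.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEYS  4		/* hypothetical cap on partition keys */
#define MAX_EXPRS 4		/* hypothetical cap on expressions per key */

/* Toy stand-in for RelOptInfo; expressions are modelled as strings. */
typedef struct ToyRel
{
	int			partnatts;
	/* per-key expression lists, NULL-terminated unless completely full */
	const char *partexprs[MAX_KEYS][MAX_EXPRS];
	const char *nullable_partexprs[MAX_KEYS][MAX_EXPRS];
} ToyRel;

/* Returns true if expr appears in the given per-key list. */
static bool
toy_list_member(const char *const *list, const char *expr)
{
	for (int i = 0; i < MAX_EXPRS && list[i] != NULL; i++)
	{
		if (strcmp(list[i], expr) == 0)
			return true;
	}
	return false;
}

/*
 * Returns the ordinal position of the key that expr matches, or -1.
 * Nullable keys are only searched when strict_op is true.
 */
int
toy_match_expr_to_partition_keys(const char *expr, const ToyRel *rel,
								 bool strict_op)
{
	assert(rel->partnatts > 0 && rel->partnatts <= MAX_KEYS);

	for (int cnt = 0; cnt < rel->partnatts; cnt++)
	{
		/* Non-nullable keys match regardless of operator strictness. */
		if (toy_list_member(rel->partexprs[cnt], expr))
			return cnt;

		if (!strict_op)
			continue;

		/* With a strict operator, nullable keys are acceptable too. */
		if (toy_list_member(rel->nullable_partexprs[cnt], expr))
			return cnt;
	}

	return -1;
}
```

For example, with "t2.a" recorded only as a nullable key at position 0, the toy lookup returns 0 when strict_op is true and -1 when it is false.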

v5-0001-Some-cosmetic-improvements-to-partitionwise-join-.patch (application/octet-stream)

From 7081c00c4e8ca0c386d9a594d130492876f18ad5 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 18 Jul 2019 10:22:31 +0900
Subject: [PATCH v5 1/3] Some cosmetic improvements to partitionwise join code

---
 src/backend/optimizer/path/joinrels.c | 18 ++++--
 src/backend/optimizer/util/plancat.c  | 20 +++---
 src/backend/optimizer/util/relnode.c  | 92 +++++++++++++++++----------
 src/include/nodes/pathnodes.h         | 36 ++++++++---
 4 files changed, 109 insertions(+), 57 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index a21c295b99..b896e3e474 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1575,8 +1575,11 @@ build_child_join_sjinfo(PlannerInfo *root, SpecialJoinInfo *parent_sjinfo,
 }
 
 /*
- * Returns true if there exists an equi-join condition for each pair of
- * partition keys from given relations being joined.
+ * have_partkey_equi_join
+ *
+ * Returns true if there exist equi-join conditions involving pairs
+ * of matching partition keys of the relations being joined for all
+ * partition keys.
  */
 bool
 have_partkey_equi_join(RelOptInfo *joinrel,
@@ -1692,8 +1695,15 @@ have_partkey_equi_join(RelOptInfo *joinrel,
 }
 
 /*
- * Find the partition key from the given relation matching the given
- * expression. If found, return the index of the partition key, else return -1.
+ * match_expr_to_partition_keys
+ *
+ * Tries to match an expression to one of the nullable or non-nullable
+ * partition keys and if a match is found, returns the matched	key's
+ * ordinal position or -1 if the expression could not be matched to any
+ * of the keys.
+ *
+ * strict_op must be true if the expression will be compared with the
+ * partition key using a strict operator.
  */
 static int
 match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d82fc5ab8b..980dc6499b 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -2247,9 +2247,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
 /*
  * set_baserel_partition_key_exprs
  *
- * Builds partition key expressions for the given base relation and sets them
- * in given RelOptInfo.  Any single column partition keys are converted to Var
- * nodes.  All Var nodes are restamped with the relid of given relation.
+ * Builds partition key expressions for the given base relation and sets
+ * rel->partexprs.
  */
 static void
 set_baserel_partition_key_exprs(Relation relation,
@@ -2297,17 +2296,20 @@ set_baserel_partition_key_exprs(Relation relation,
 			lc = lnext(partkey->partexprs, lc);
 		}
 
+		/* Base relations have a single expression per key. */
 		partexprs[cnt] = list_make1(partexpr);
 	}
 
+	/*
+	 * For base relations, we assume that the partition keys are non-nullable,
+	 * although they are nullable in principle; list and hash partitioned
+	 * tables may contain nulls in the partition key(s), for example.
+	 * Assuming non-nullability is okay for the considerations of partition
+	 * pruning, because pruning is never performed with non-strict operators.
+	 */
 	rel->partexprs = partexprs;
 
-	/*
-	 * A base relation can not have nullable partition key expressions. We
-	 * still allocate array of empty expressions lists to keep partition key
-	 * expression handling code simple. See build_joinrel_partition_info() and
-	 * match_expr_to_partition_keys().
-	 */
+	/* Assigning NIL for each key means there are no nullable keys. */
 	rel->nullable_partexprs = (List **) palloc0(sizeof(List *) * partnatts);
 }
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 374f93890b..81ec600ecb 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -29,6 +29,7 @@
 #include "optimizer/tlist.h"
 #include "partitioning/partbounds.h"
 #include "utils/hsearch.h"
+#include "utils/lsyscache.h"
 
 
 typedef struct JoinHashEntry
@@ -58,6 +59,9 @@ static void add_join_rel(PlannerInfo *root, RelOptInfo *joinrel);
 static void build_joinrel_partition_info(RelOptInfo *joinrel,
 										 RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 										 List *restrictlist, JoinType jointype);
+static void set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
+								RelOptInfo *outer_rel, RelOptInfo *inner_rel,
+								JoinType jointype);
 static void build_child_join_reltarget(PlannerInfo *root,
 									   RelOptInfo *parentrel,
 									   RelOptInfo *childrel,
@@ -1607,18 +1611,18 @@ find_param_path_info(RelOptInfo *rel, Relids required_outer)
 
 /*
  * build_joinrel_partition_info
- *		If the two relations have same partitioning scheme, their join may be
- *		partitioned and will follow the same partitioning scheme as the joining
- *		relations. Set the partition scheme and partition key expressions in
- *		the join relation.
+ *		Checks if the two relations being joined can use partitionwise join
+ *		and, if so, initializes the partitioning information of the resulting
+ *		partitioned relation.
+ *
+ * This will set part_scheme and partition key expressions (partexprs and
+ * nullable_partexprs) if required.
  */
 static void
 build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 							 RelOptInfo *inner_rel, List *restrictlist,
 							 JoinType jointype)
 {
-	int			partnatts;
-	int			cnt;
 	PartitionScheme part_scheme;
 
 	/* Nothing to do if partitionwise join technique is disabled. */
@@ -1685,11 +1689,8 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 	 */
 	joinrel->part_scheme = part_scheme;
 	joinrel->boundinfo = outer_rel->boundinfo;
-	partnatts = joinrel->part_scheme->partnatts;
-	joinrel->partexprs = (List **) palloc0(sizeof(List *) * partnatts);
-	joinrel->nullable_partexprs =
-		(List **) palloc0(sizeof(List *) * partnatts);
 	joinrel->nparts = outer_rel->nparts;
+	set_joinrel_partition_key_exprs(joinrel, outer_rel, inner_rel, jointype);
 	joinrel->part_rels =
 		(RelOptInfo **) palloc0(sizeof(RelOptInfo *) * joinrel->nparts);
 
@@ -1699,32 +1700,31 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 	Assert(outer_rel->consider_partitionwise_join);
 	Assert(inner_rel->consider_partitionwise_join);
 	joinrel->consider_partitionwise_join = true;
+}
+
+/*
+ * set_joinrel_partition_key_exprs
+ *		Initialize partition key expressions
+ */
+static void
+set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
+								RelOptInfo *outer_rel, RelOptInfo *inner_rel,
+								JoinType jointype)
+{
+	int		partnatts;
+	int		cnt;
+
+	Assert(joinrel->part_scheme != NULL);
+
+	partnatts = joinrel->part_scheme->partnatts;
+	joinrel->partexprs = (List **) palloc0(sizeof(List *) * partnatts);
+	joinrel->nullable_partexprs =
+		(List **) palloc0(sizeof(List *) * partnatts);
 
 	/*
-	 * Construct partition keys for the join.
-	 *
-	 * An INNER join between two partitioned relations can be regarded as
-	 * partitioned by either key expression.  For example, A INNER JOIN B ON
-	 * A.a = B.b can be regarded as partitioned on A.a or on B.b; they are
-	 * equivalent.
-	 *
-	 * For a SEMI or ANTI join, the result can only be regarded as being
-	 * partitioned in the same manner as the outer side, since the inner
-	 * columns are not retained.
-	 *
-	 * An OUTER join like (A LEFT JOIN B ON A.a = B.b) may produce rows with
-	 * B.b NULL. These rows may not fit the partitioning conditions imposed on
-	 * B.b. Hence, strictly speaking, the join is not partitioned by B.b and
-	 * thus partition keys of an OUTER join should include partition key
-	 * expressions from the OUTER side only.  However, because all
-	 * commonly-used comparison operators are strict, the presence of nulls on
-	 * the outer side doesn't cause any problem; they can't match anything at
-	 * future join levels anyway.  Therefore, we track two sets of
-	 * expressions: those that authentically partition the relation
-	 * (partexprs) and those that partition the relation with the exception
-	 * that extra nulls may be present (nullable_partexprs).  When the
-	 * comparison operator is strict, the latter is just as good as the
-	 * former.
+	 * Join type determines which partition keys are assumed by the resulting
+	 * join relation.  Note that these keys are to be considered when checking
+	 * if any further joins involving this joinrel may be partitioned.
 	 */
 	for (cnt = 0; cnt < partnatts; cnt++)
 	{
@@ -1738,18 +1738,36 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 
 		switch (jointype)
 		{
+			/*
+			 * Join relation resulting from an INNER join may be regarded as
+			 * partitioned by either of inner and outer relation keys.  For
+			 * example, A INNER JOIN B ON A.a = B.b can be regarded as
+			 * partitioned on either A.a or B.b.
+			 */
 			case JOIN_INNER:
 				partexpr = list_concat_copy(outer_expr, inner_expr);
 				nullable_partexpr = list_concat_copy(outer_null_expr,
 													 inner_null_expr);
 				break;
 
+			/*
+			 * Join relation resulting from a SEMI or ANTI join may be
+			 * regarded as partitioned on the outer relation keys, since the
+			 * inner columns are omitted from the output.
+			 */
 			case JOIN_SEMI:
 			case JOIN_ANTI:
 				partexpr = list_copy(outer_expr);
 				nullable_partexpr = list_copy(outer_null_expr);
 				break;
 
+			/*
+			 * Join relation resulting from a LEFT OUTER JOIN likewise may be
+			 * regarded as partitioned on the (non-nullable) outer relation
+			 * keys.  The inner (nullable) relation keys are okay as partition
+			 * keys for further joins as long as they involve strict join
+			 * operators.
+			 */
 			case JOIN_LEFT:
 				partexpr = list_copy(outer_expr);
 				nullable_partexpr = list_concat_copy(inner_expr,
@@ -1758,6 +1776,12 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 												inner_null_expr);
 				break;
 
+			/*
+			 * For FULL OUTER JOINs, both relations are nullable, so the
+			 * resulting join relation may be regarded as partitioned on
+			 * either of inner and outer relation keys, but only for joins
+			 * that involve strict join operators.
+			 */
 			case JOIN_FULL:
 				nullable_partexpr = list_concat_copy(outer_expr,
 													 inner_expr);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ceb809644..213bc41420 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -586,16 +586,32 @@ typedef struct PartitionSchemeData *PartitionScheme;
  *								 this relation that are partitioned tables
  *								 themselves, in hierarchical order
  *
- * Note: A base relation always has only one set of partition keys, but a join
- * relation may have as many sets of partition keys as the number of relations
- * being joined. partexprs and nullable_partexprs are arrays containing
- * part_scheme->partnatts elements each. Each of these elements is a list of
- * partition key expressions.  For a base relation each list in partexprs
- * contains only one expression and nullable_partexprs is not populated. For a
- * join relation, partexprs and nullable_partexprs contain partition key
- * expressions from non-nullable and nullable relations resp. Lists at any
- * given position in those arrays together contain as many elements as the
- * number of joining relations.
+ * Notes on partition key expressions (partexprs and nullable_partexprs):
+ *
+ * Partition key expressions will be used to spot references to the partition
+ * keys of the relation in the expressions of a given query so as to apply
+ * various partitioning-based optimizations to certain query constructs.  For
+ * example, pruning unnecessary partitions of a table using baserestrictinfo
+ * clauses that contain partition keys, converting a join between two
+ * partitioned relations into a series of joins between pairs of their
+ * constituent partitions if the joined rows follow the same partitioning
+ * as the relations being joined.
+ *
+ * The partexprs and nullable_partexprs arrays each contain
+ * part_scheme->partnatts elements.  Each of the elements is a list of
+ * partition key expressions.  For partitioned *base* relations, there is one
+ * expression in every list, whereas for partitioned *join* relations, there
+ * can be as many as the number of component relations.
+ *
+ * nullable_partexprs are populated only in partitioned *join* relations,
+ * that is, if any of their component relations are nullable due to OUTER JOIN
+ * considerations.  It contains only the expressions of the nullable component
+ * relations, while those of the non-nullable relations are present in the
+ * partexprs.  For the considerations of partitionwise join, nullable partition
+ * keys can be considered to partition the underlying relation in the same
+ * manner as the non-nullable partition keys do, as long as the join operator
+ * is strict, because those null-valued keys can't be joined further, thus
+ * preserving the partitioning.
  *----------
  */
 typedef enum RelOptKind
-- 
2.20.1 (Apple Git-117)
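
As a reading aid for the per-join-type comments added above (not part of the posted patch), the rule for where each input relation's non-nullable keys end up in the join relation can be summarized in a small standalone sketch. The enum and function names are hypothetical and exist only for illustration; the real code builds List values from partexprs and nullable_partexprs rather than returning a tag.

```c
#include <assert.h>

/* Hypothetical stand-ins for the JoinType values handled in the switch. */
typedef enum ToyJoinType
{
	TOY_JOIN_INNER,
	TOY_JOIN_LEFT,
	TOY_JOIN_FULL,
	TOY_JOIN_SEMI,
	TOY_JOIN_ANTI
} ToyJoinType;

/* Which list of the join relation receives a given input's keys. */
typedef enum ToyKeyList
{
	TOY_DROPPED,				/* keys are not carried into the joinrel */
	TOY_PARTEXPRS,				/* non-nullable partition keys */
	TOY_NULLABLE_PARTEXPRS		/* usable only with strict join operators */
} ToyKeyList;

/* Destination of the outer relation's non-nullable keys. */
ToyKeyList
toy_outer_keys_go_to(ToyJoinType jointype)
{
	switch (jointype)
	{
		case TOY_JOIN_INNER:
		case TOY_JOIN_SEMI:
		case TOY_JOIN_ANTI:
		case TOY_JOIN_LEFT:
			return TOY_PARTEXPRS;
		case TOY_JOIN_FULL:
			/* both sides are nullable under a FULL join */
			return TOY_NULLABLE_PARTEXPRS;
	}
	assert(0 && "unrecognized join type");
	return TOY_DROPPED;
}

/* Destination of the inner relation's non-nullable keys. */
ToyKeyList
toy_inner_keys_go_to(ToyJoinType jointype)
{
	switch (jointype)
	{
		case TOY_JOIN_INNER:
			return TOY_PARTEXPRS;
		case TOY_JOIN_SEMI:
		case TOY_JOIN_ANTI:
			/* inner columns are omitted from the output */
			return TOY_DROPPED;
		case TOY_JOIN_LEFT:
		case TOY_JOIN_FULL:
			/* the inner side is nullable */
			return TOY_NULLABLE_PARTEXPRS;
	}
	assert(0 && "unrecognized join type");
	return TOY_DROPPED;
}
```

Keys that were already nullable in an input relation are not modelled here; in the patch they stay in nullable_partexprs wherever that input's keys are retained at all.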

v5-0003-Fix-partitionwise-join-to-handle-FULL-JOINs-corre.patch (application/octet-stream)

From c3d0ce68d2c53d1da5120193e6715ed6f6eebf4d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 18 Jul 2019 10:33:20 +0900
Subject: [PATCH v5 3/3] Fix partitionwise join to handle FULL JOINs correctly

---
 src/backend/optimizer/util/relnode.c         | 104 ++++++++++++---
 src/test/regress/expected/partition_join.out | 129 +++++++++++++++++++
 src/test/regress/sql/partition_join.sql      |  24 ++++
 3 files changed, 241 insertions(+), 16 deletions(-)

diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8281bef317..cdfb3412e1 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -72,6 +72,7 @@ static bool have_partkey_equi_join(RelOptInfo *joinrel,
 								   JoinType jointype, List *restrictlist);
 static int match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
 										bool strict_op);
+static List *extract_coalesce_args(Expr *expr);
 
 
 /*
@@ -1964,6 +1965,8 @@ static int
 match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
 {
 	int			cnt;
+	int			matched = -1;
+	List	   *nullable_exprs;
 
 	/* This function should be called only for partitioned relations. */
 	Assert(rel->part_scheme);
@@ -1972,34 +1975,103 @@ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
 	while (IsA(expr, RelabelType))
 		expr = (Expr *) (castNode(RelabelType, expr))->arg;
 
+	/* For PlaceHolderVars, refer to contained expression. */
+	if (IsA(expr, PlaceHolderVar))
+		expr = (castNode(PlaceHolderVar, expr))->phexpr;
+
+	/*
+	 * Extract the arguments from possibly nested COALESCE expressions.  Each
+	 * of these arguments could be null when joining, so these expressions are
+	 * called as such and are to be matched only with the nullable partition
+	 * keys.
+	 */
+	if (IsA(expr, CoalesceExpr))
+		nullable_exprs = extract_coalesce_args(expr);
+	else
+		/*
+		 * expr may or may not be nullable but add to the list anyway to
+		 * simplify the coding below.
+		 */
+		nullable_exprs = list_make1(expr);
+
 	for (cnt = 0; cnt < rel->part_scheme->partnatts; cnt++)
 	{
-		ListCell   *lc;
-
 		Assert(rel->partexprs);
-		foreach(lc, rel->partexprs[cnt])
+
+		/* Is the expression one of the non-nullable partition keys? */
+		if (list_member(rel->partexprs[cnt], expr))
 		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
+			matched = cnt;
+			break;
 		}
 
+		/*
+		 * Nope, so check if it is one of the nullable keys.  Allowing
+		 * nullable keys won't work if the join operator is not strict,
+		 * because null partition keys may then join with rows from other
+		 * partitions.  XXX - would that ever be true if the operator is
+		 * already determined to be mergejoin- and hashjoin-able?
+		 */
 		if (!strict_op)
 			continue;
 
-		/*
-		 * If it's a strict equi-join a NULL partition key on one side will
-		 * not join a NULL partition key on the other side. So, rows with NULL
-		 * partition key from a partition on one side can not join with those
-		 * from a non-matching partition on the other side. So, search the
-		 * nullable partition keys as well.
-		 */
+		/* OK to match with nullable keys. */
 		Assert(rel->nullable_partexprs);
-		foreach(lc, rel->nullable_partexprs[cnt])
+
+		/* First rule out nullable_exprs containing non-key expressions. */
+		if (list_difference(nullable_exprs,
+							rel->nullable_partexprs[cnt]) != NIL)
+			continue;
+
+		if (list_intersection(rel->nullable_partexprs[cnt],
+							  nullable_exprs) != NIL)
 		{
-			if (equal(lfirst(lc), expr))
-				return cnt;
+			matched = cnt;
+			break;
 		}
 	}
 
-	return -1;
+	Assert(list_length(nullable_exprs) >= 1);
+	list_free(nullable_exprs);
+
+	return matched;
+}
+
+/*
+ * extract_coalesce_args
+ *		Extract all arguments from arbitrarily nested CoalesceExpr's
+ *
+ * Note: caller should free the List structure when done using it.
+ */
+static List *
+extract_coalesce_args(Expr *expr)
+{
+	List   *coalesce_args = NIL;
+
+	while (expr && IsA(expr, CoalesceExpr))
+	{
+		CoalesceExpr *cexpr = (CoalesceExpr *) expr;
+		ListCell *lc;
+
+		expr = NULL;
+		foreach(lc, cexpr->args)
+		{
+			Expr   *expr = lfirst(lc);
+
+			/* Remove any relabel decorations. */
+			while (IsA(expr, RelabelType))
+				expr = (Expr *) (castNode(RelabelType, expr))->arg;
+
+			/* For PlaceHolderVars, refer to contained expression. */
+			if (IsA(expr, PlaceHolderVar))
+				expr = (castNode(PlaceHolderVar, expr))->phexpr;
+
+			if (!IsA(expr, CoalesceExpr))
+				coalesce_args = lappend(coalesce_args, expr);
+		}
+
+		Assert(expr == NULL || IsA(expr, CoalesceExpr));
+	}
+
+	return coalesce_args;
 }
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index b3fbe47bde..a7bbedf97d 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -750,6 +750,135 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
  550 | 0550 |     |      |     1100 | 0
 (12 rows)
 
+-- FULL JOIN with COALESCE expression
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+                                                                     QUERY PLAN                                                                      
+-----------------------------------------------------------------------------------------------------------------------------------------------------
+ Group
+   Group Key: (COALESCE(COALESCE(prt1.a, p2.a), p3.a)), (COALESCE(COALESCE(prt1.b, p2.b), p3.b))
+   ->  Merge Append
+         Sort Key: (COALESCE(COALESCE(prt1.a, p2.a), p3.a)), (COALESCE(COALESCE(prt1.b, p2.b), p3.b))
+         ->  Group
+               Group Key: (COALESCE(COALESCE(prt1.a, p2.a), p3.a)), (COALESCE(COALESCE(prt1.b, p2.b), p3.b))
+               ->  Sort
+                     Sort Key: (COALESCE(COALESCE(prt1.a, p2.a), p3.a)), (COALESCE(COALESCE(prt1.b, p2.b), p3.b))
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(prt1.a, p2.a) = p3.a) AND (COALESCE(prt1.b, p2.b) = p3.b))
+                           Filter: ((COALESCE(COALESCE(prt1.a, p2.a), p3.a) >= 490) AND (COALESCE(COALESCE(prt1.a, p2.a), p3.a) <= 510))
+                           ->  Hash Full Join
+                                 Hash Cond: ((prt1.a = p2.a) AND (prt1.b = p2.b))
+                                 ->  Seq Scan on prt1_p1 prt1
+                                 ->  Hash
+                                       ->  Seq Scan on prt2_p1 p2
+                           ->  Hash
+                                 ->  Seq Scan on prt2_p1 p3
+         ->  Group
+               Group Key: (COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a)), (COALESCE(COALESCE(prt1_1.b, p2_1.b), p3_1.b))
+               ->  Sort
+                     Sort Key: (COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a)), (COALESCE(COALESCE(prt1_1.b, p2_1.b), p3_1.b))
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(prt1_1.a, p2_1.a) = p3_1.a) AND (COALESCE(prt1_1.b, p2_1.b) = p3_1.b))
+                           Filter: ((COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a) >= 490) AND (COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a) <= 510))
+                           ->  Hash Full Join
+                                 Hash Cond: ((prt1_1.a = p2_1.a) AND (prt1_1.b = p2_1.b))
+                                 ->  Seq Scan on prt1_p2 prt1_1
+                                 ->  Hash
+                                       ->  Seq Scan on prt2_p2 p2_1
+                           ->  Hash
+                                 ->  Seq Scan on prt2_p2 p3_1
+         ->  Group
+               Group Key: (COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a)), (COALESCE(COALESCE(prt1_2.b, p2_2.b), p3_2.b))
+               ->  Sort
+                     Sort Key: (COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a)), (COALESCE(COALESCE(prt1_2.b, p2_2.b), p3_2.b))
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(prt1_2.a, p2_2.a) = p3_2.a) AND (COALESCE(prt1_2.b, p2_2.b) = p3_2.b))
+                           Filter: ((COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a) >= 490) AND (COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a) <= 510))
+                           ->  Hash Full Join
+                                 Hash Cond: ((prt1_2.a = p2_2.a) AND (prt1_2.b = p2_2.b))
+                                 ->  Seq Scan on prt1_p3 prt1_2
+                                 ->  Hash
+                                       ->  Seq Scan on prt2_p3 p2_2
+                           ->  Hash
+                                 ->  Seq Scan on prt2_p3 p3_2
+(46 rows)
+
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+  a  | b  
+-----+----
+ 490 | 15
+ 492 | 17
+ 494 | 19
+ 495 | 20
+ 496 | 21
+ 498 | 23
+ 500 |  0
+ 501 |  1
+ 502 |  2
+ 504 |  4
+ 506 |  6
+ 507 |  7
+ 508 |  8
+ 510 | 10
+(14 rows)
+
+-- Manually written COALESCE expression containing non-key expression
+EXPLAIN (COSTS OFF)
+SELECT p1.a, p1.b FROM prt1 p1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) ON COALESCE(p2.b, p3.a) = p3.a
+  WHERE p1.a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
+ Group
+   Group Key: p1.a, p1.b
+   ->  Sort
+         Sort Key: p1.a, p1.b
+         ->  Nested Loop Left Join
+               Join Filter: (COALESCE(p2.b, p3.a) = p3.a)
+               ->  Append
+                     ->  Hash Right Join
+                           Hash Cond: ((p2_1.a = p1_1.a) AND (p2_1.b = p1_1.b))
+                           ->  Seq Scan on prt2_p2 p2_1
+                           ->  Hash
+                                 ->  Seq Scan on prt1_p2 p1_1
+                                       Filter: ((a >= 490) AND (a <= 510))
+                     ->  Hash Right Join
+                           Hash Cond: ((p2_2.a = p1_2.a) AND (p2_2.b = p1_2.b))
+                           ->  Seq Scan on prt2_p3 p2_2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_p3 p1_2
+                                       Filter: ((a >= 490) AND (a <= 510))
+               ->  Materialize
+                     ->  Append
+                           ->  Seq Scan on prt2_p1 p3_1
+                           ->  Seq Scan on prt2_p2 p3_2
+                           ->  Seq Scan on prt2_p3 p3_3
+(24 rows)
+
+SELECT p1.a, p1.b FROM prt1 p1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) ON COALESCE(p2.b, p3.a) = p3.a
+  WHERE p1.a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+  a  | b  
+-----+----
+ 490 | 15
+ 492 | 17
+ 494 | 19
+ 496 | 21
+ 498 | 23
+ 500 |  0
+ 502 |  2
+ 504 |  4
+ 506 |  6
+ 508 |  8
+ 510 | 10
+(11 rows)
+
+RESET enable_partitionwise_aggregate;
 -- Cases with non-nullable expressions in subquery results;
 -- make sure these go to null as expected
 EXPLAIN (COSTS OFF)
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index 575ba7b8d4..c7bf168bb4 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -145,6 +145,29 @@ EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
 
+-- FULL JOIN with COALESCE expression
+
+SET enable_partitionwise_aggregate TO true;
+
+EXPLAIN (COSTS OFF)
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+
+-- Manually written COALESCE expression containing non-key expression
+EXPLAIN (COSTS OFF)
+SELECT p1.a, p1.b FROM prt1 p1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) ON COALESCE(p2.b, p3.a) = p3.a
+  WHERE p1.a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+SELECT p1.a, p1.b FROM prt1 p1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) ON COALESCE(p2.b, p3.a) = p3.a
+  WHERE p1.a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+
+RESET enable_partitionwise_aggregate;
+
 -- Cases with non-nullable expressions in subquery results;
 -- make sure these go to null as expected
 EXPLAIN (COSTS OFF)
@@ -285,6 +308,7 @@ EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
 
+
 -- test default partition behavior for range
 ALTER TABLE prt1 DETACH PARTITION prt1_p3;
 ALTER TABLE prt1 ATTACH PARTITION prt1_p3 DEFAULT;
-- 
2.20.1 (Apple Git-117)

#18

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Amit Langote (#17)

Re: d25ea01275 and partitionwise join

Amit Langote <amitlangote09@gmail.com> writes:

> Updated patches attached.

I looked through these and committed 0001+0002, with some further
comment-polishing. However, I have no faith at all in 0003. It is
blithely digging through COALESCE expressions with no concern for
whether they came from full joins or not, or whether the other values
being coalesced to might completely change the semantics. Digging
through PlaceHolderVars scares me even more; what's that got to do
with the problem, anyway? So while this might fix the complained-of
issue of failing to use a partitionwise join, I think it wouldn't be
hard to create examples that it would incorrectly turn into
partitionwise joins.

I wonder whether it'd be feasible to fix the problem by going in the
other direction; that is, while constructing the nullable_partexprs
lists for a full join, add synthesized COALESCE() expressions for the
output columns (by wrapping COALESCE around copies of the input rels'
partition expressions), and then not need to do anything special in
match_expr_to_partition_keys. We'd still need to convince ourselves
that this did the right thing and not any of the wrong things, but
I think it might be easier to prove it that way.

regards, tom lane
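
As a rough illustration of the proposal above (not PostgreSQL code): if the join relation's nullable key list already holds a COALESCE of the same shape the parser builds for a FULL JOIN ... USING column, then plain structural equality is enough to match a join clause operand, and the matcher never has to look inside a COALESCE. The ToyExpr type and the toy_* functions below are hypothetical stand-ins for the planner's expression nodes and equal(); only the shape of the idea is taken from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy expression node: a column reference or a two-argument COALESCE. */
typedef enum { TOY_VAR, TOY_COALESCE } ToyKind;

typedef struct ToyExpr
{
    ToyKind     kind;
    int         relid;              /* TOY_VAR: which relation */
    int         attno;              /* TOY_VAR: which column */
    const struct ToyExpr *larg;     /* TOY_COALESCE: first argument */
    const struct ToyExpr *rarg;     /* TOY_COALESCE: second argument */
} ToyExpr;

/* Build a column reference (hypothetical helper). */
ToyExpr
toy_var(int relid, int attno)
{
    ToyExpr     e = {TOY_VAR, relid, attno, NULL, NULL};

    return e;
}

/* Build COALESCE(l, r); the arguments are referenced, not copied. */
ToyExpr
toy_coalesce(const ToyExpr *l, const ToyExpr *r)
{
    ToyExpr     e = {TOY_COALESCE, 0, 0, l, r};

    assert(l != NULL && r != NULL);
    return e;
}

/* Structural equality, standing in for the planner's equal(). */
bool
toy_equal(const ToyExpr *a, const ToyExpr *b)
{
    if (a->kind != b->kind)
        return false;
    if (a->kind == TOY_VAR)
        return a->relid == b->relid && a->attno == b->attno;
    return toy_equal(a->larg, b->larg) && toy_equal(a->rarg, b->rarg);
}

/* True if expr is structurally equal to some member of keys[]. */
bool
toy_key_member(const ToyExpr *const *keys, size_t nkeys, const ToyExpr *expr)
{
    for (size_t i = 0; i < nkeys; i++)
    {
        if (toy_equal(keys[i], expr))
            return true;
    }
    return false;
}
```

With a key list holding t1.a, t2.a and COALESCE(t1.a, t2.a), a clause operand COALESCE(t1.a, t2.a) matches, while COALESCE(t1.a, t3.a) does not, because no key of that exact shape was synthesized. That is the conservative behavior the proposal aims for: a COALESCE matches only when it was deliberately placed in the key list.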

#19

Amit Langote

amitlangote09@gmail.com

almost 6 years ago

In reply to: Tom Lane (#18)

1 attachment(s)

Re: d25ea01275 and partitionwise join

On Sat, Apr 4, 2020 at 6:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Amit Langote <amitlangote09@gmail.com> writes:
> > Updated patches attached.
>
> I looked through these and committed 0001+0002, with some further
> comment-polishing. However, I have no faith at all in 0003.

Thanks for the review.

> It is
> blithely digging through COALESCE expressions with no concern for
> whether they came from full joins or not, or whether the other values
> being coalesced to might completely change the semantics. Digging
> through PlaceHolderVars scares me even more; what's that got to do
> with the problem, anyway? So while this might fix the complained-of
> issue of failing to use a partitionwise join, I think it wouldn't be
> hard to create examples that it would incorrectly turn into
> partitionwise joins.
>
> I wonder whether it'd be feasible to fix the problem by going in the
> other direction; that is, while constructing the nullable_partexprs
> lists for a full join, add synthesized COALESCE() expressions for the
> output columns (by wrapping COALESCE around copies of the input rels'
> partition expressions), and then not need to do anything special in
> match_expr_to_partition_keys. We'd still need to convince ourselves
> that this did the right thing and not any of the wrong things, but
> I think it might be easier to prove it that way.

Okay, I tried that in the updated patch. I didn't try to come up with
examples that might break it though.

--
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v6-0001-Fix-partitionwise-join-to-handle-FULL-JOINs-corre.patch (application/octet-stream)

From 4a3b4f080b113ed68c7272ba1ec776ec00516a61 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 18 Jul 2019 10:33:20 +0900
Subject: [PATCH v6] Fix partitionwise join to handle FULL JOINs correctly

---
 src/backend/nodes/makefuncs.c                |  17 ++++
 src/backend/optimizer/util/relnode.c         |  43 +++++++--
 src/backend/parser/parse_clause.c            |  20 ++---
 src/include/nodes/makefuncs.h                |   2 +
 src/test/regress/expected/partition_join.out | 129 +++++++++++++++++++++++++++
 src/test/regress/sql/partition_join.sql      |  23 +++++
 6 files changed, 214 insertions(+), 20 deletions(-)

diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index b442b5a..3e58fa9 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -812,3 +812,20 @@ makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols)
 	v->va_cols = va_cols;
 	return v;
 }
+
+/*
+ * makeCoalesceExpr
+ */
+CoalesceExpr *
+makeCoalesceExpr(Oid typid, Oid collid, Node *l_node, Node *r_node,
+				 int location)
+{
+	CoalesceExpr *c = makeNode(CoalesceExpr);
+
+	c->coalescetype = typid;
+	c->coalescecollid = collid;
+	c->args = list_make2(l_node, r_node);
+	c->location = location;
+
+	return c;
+}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index af1fb48..a211028 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -17,6 +17,7 @@
 #include <limits.h>
 
 #include "miscadmin.h"
+#include "nodes/makefuncs.h"
 #include "optimizer/appendinfo.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
@@ -1890,7 +1891,8 @@ set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
 								RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 								JoinType jointype)
 {
-	int			partnatts = joinrel->part_scheme->partnatts;
+	PartitionScheme part_scheme = joinrel->part_scheme;
+	int			partnatts = part_scheme->partnatts;
 
 	joinrel->partexprs = (List **) palloc0(sizeof(List *) * partnatts);
 	joinrel->nullable_partexprs =
@@ -1963,12 +1965,39 @@ set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
 				 * that involve strict join operators.
 				 */
 			case JOIN_FULL:
-				nullable_partexpr = list_concat_copy(outer_expr,
-													 inner_expr);
-				nullable_partexpr = list_concat(nullable_partexpr,
-												outer_null_expr);
-				nullable_partexpr = list_concat(nullable_partexpr,
-												inner_null_expr);
+				{
+					Oid		coltype = part_scheme->partopcintype[cnt],
+							colcoll = part_scheme->partcollation[cnt];
+					Node   *larg;
+					ListCell *rarg;
+					CoalesceExpr *coalesce = NULL;
+
+					nullable_partexpr = list_concat_copy(outer_expr,
+														 inner_expr);
+					nullable_partexpr = list_concat(nullable_partexpr,
+													outer_null_expr);
+					nullable_partexpr = list_concat(nullable_partexpr,
+													inner_null_expr);
+
+					/*
+					 * Add a CoalesceExpr wrapping all of the collected
+					 * nullable expressions, because clauses for a full join
+					 * written with USING () would not be matched to the
+					 * joinrel's partition keys without this.
+					 */
+					larg = (Node *) linitial(nullable_partexpr);
+					rarg = list_second_cell(nullable_partexpr);
+					while (rarg)
+					{
+						coalesce = makeCoalesceExpr(coltype, colcoll, larg,
+													(Node *) lfirst(rarg), -1);
+						larg = (Node *) coalesce;
+						rarg = lnext(nullable_partexpr, rarg);
+					}
+					Assert(coalesce != NULL);
+					nullable_partexpr = list_concat(nullable_partexpr,
+													list_make1(coalesce));
+				}
 				break;
 
 			default:
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 36a3eff..bb9890d 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -1643,20 +1643,14 @@ buildMergedJoinVar(ParseState *pstate, JoinType jointype,
 			res_node = r_node;
 			break;
 		case JOIN_FULL:
-			{
-				/*
-				 * Here we must build a COALESCE expression to ensure that the
-				 * join output is non-null if either input is.
-				 */
-				CoalesceExpr *c = makeNode(CoalesceExpr);
+			/*
+			 * Here we must build a COALESCE expression to ensure that the
+			 * join output is non-null if either input is.
+			 */
+			res_node = (Node *) makeCoalesceExpr(outcoltype, InvalidOid,
+												 l_node, r_node, -1);
+			break;
 
-				c->coalescetype = outcoltype;
-				/* coalescecollid will get set below */
-				c->args = list_make2(l_node, r_node);
-				c->location = -1;
-				res_node = (Node *) c;
-				break;
-			}
 		default:
 			elog(ERROR, "unrecognized join type: %d", (int) jointype);
 			res_node = NULL;	/* keep compiler quiet */
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 31d9aed..bb26ef5 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -104,5 +104,7 @@ extern DefElem *makeDefElemExtended(char *nameSpace, char *name, Node *arg,
 extern GroupingSet *makeGroupingSet(GroupingSetKind kind, List *content, int location);
 
 extern VacuumRelation *makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols);
+extern CoalesceExpr *makeCoalesceExpr(Oid typid, Oid collid, Node *l_node, Node *r_node,
+				 int location);
 
 #endif							/* MAKEFUNC_H */
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index b3fbe47..e928767 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -750,6 +750,135 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
  550 | 0550 |     |      |     1100 | 0
 (12 rows)
 
+-- 3-way FULL JOIN
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+                                                                     QUERY PLAN                                                                      
+-----------------------------------------------------------------------------------------------------------------------------------------------------
+ Group
+   Group Key: (COALESCE(COALESCE(prt1.a, p2.a), p3.a)), (COALESCE(COALESCE(prt1.b, p2.b), p3.b))
+   ->  Merge Append
+         Sort Key: (COALESCE(COALESCE(prt1.a, p2.a), p3.a)), (COALESCE(COALESCE(prt1.b, p2.b), p3.b))
+         ->  Group
+               Group Key: (COALESCE(COALESCE(prt1.a, p2.a), p3.a)), (COALESCE(COALESCE(prt1.b, p2.b), p3.b))
+               ->  Sort
+                     Sort Key: (COALESCE(COALESCE(prt1.a, p2.a), p3.a)), (COALESCE(COALESCE(prt1.b, p2.b), p3.b))
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(prt1.a, p2.a) = p3.a) AND (COALESCE(prt1.b, p2.b) = p3.b))
+                           Filter: ((COALESCE(COALESCE(prt1.a, p2.a), p3.a) >= 490) AND (COALESCE(COALESCE(prt1.a, p2.a), p3.a) <= 510))
+                           ->  Hash Full Join
+                                 Hash Cond: ((prt1.a = p2.a) AND (prt1.b = p2.b))
+                                 ->  Seq Scan on prt1_p1 prt1
+                                 ->  Hash
+                                       ->  Seq Scan on prt2_p1 p2
+                           ->  Hash
+                                 ->  Seq Scan on prt2_p1 p3
+         ->  Group
+               Group Key: (COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a)), (COALESCE(COALESCE(prt1_1.b, p2_1.b), p3_1.b))
+               ->  Sort
+                     Sort Key: (COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a)), (COALESCE(COALESCE(prt1_1.b, p2_1.b), p3_1.b))
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(prt1_1.a, p2_1.a) = p3_1.a) AND (COALESCE(prt1_1.b, p2_1.b) = p3_1.b))
+                           Filter: ((COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a) >= 490) AND (COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a) <= 510))
+                           ->  Hash Full Join
+                                 Hash Cond: ((prt1_1.a = p2_1.a) AND (prt1_1.b = p2_1.b))
+                                 ->  Seq Scan on prt1_p2 prt1_1
+                                 ->  Hash
+                                       ->  Seq Scan on prt2_p2 p2_1
+                           ->  Hash
+                                 ->  Seq Scan on prt2_p2 p3_1
+         ->  Group
+               Group Key: (COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a)), (COALESCE(COALESCE(prt1_2.b, p2_2.b), p3_2.b))
+               ->  Sort
+                     Sort Key: (COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a)), (COALESCE(COALESCE(prt1_2.b, p2_2.b), p3_2.b))
+                     ->  Hash Full Join
+                           Hash Cond: ((COALESCE(prt1_2.a, p2_2.a) = p3_2.a) AND (COALESCE(prt1_2.b, p2_2.b) = p3_2.b))
+                           Filter: ((COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a) >= 490) AND (COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a) <= 510))
+                           ->  Hash Full Join
+                                 Hash Cond: ((prt1_2.a = p2_2.a) AND (prt1_2.b = p2_2.b))
+                                 ->  Seq Scan on prt1_p3 prt1_2
+                                 ->  Hash
+                                       ->  Seq Scan on prt2_p3 p2_2
+                           ->  Hash
+                                 ->  Seq Scan on prt2_p3 p3_2
+(46 rows)
+
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+  a  | b  
+-----+----
+ 490 | 15
+ 492 | 17
+ 494 | 19
+ 495 | 20
+ 496 | 21
+ 498 | 23
+ 500 |  0
+ 501 |  1
+ 502 |  2
+ 504 |  4
+ 506 |  6
+ 507 |  7
+ 508 |  8
+ 510 | 10
+(14 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT p1.a, p1.b FROM prt1 p1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) ON (p2.a = p3.a AND p2.b = p3.b)
+  WHERE p1.a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
+ Group
+   Group Key: p1.a, p1.b
+   ->  Sort
+         Sort Key: p1.a, p1.b
+         ->  Append
+               ->  Nested Loop Left Join
+                     ->  Hash Right Join
+                           Hash Cond: ((p2_1.a = p1_1.a) AND (p2_1.b = p1_1.b))
+                           ->  Seq Scan on prt2_p2 p2_1
+                           ->  Hash
+                                 ->  Seq Scan on prt1_p2 p1_1
+                                       Filter: ((a >= 490) AND (a <= 510))
+                     ->  Index Scan using iprt2_p2_b on prt2_p2 p3_1
+                           Index Cond: (a = p2_1.a)
+                           Filter: (p2_1.b = b)
+               ->  Nested Loop Left Join
+                     ->  Hash Right Join
+                           Hash Cond: ((p2_2.a = p1_2.a) AND (p2_2.b = p1_2.b))
+                           ->  Seq Scan on prt2_p3 p2_2
+                           ->  Hash
+                                 ->  Seq Scan on prt1_p3 p1_2
+                                       Filter: ((a >= 490) AND (a <= 510))
+                     ->  Index Scan using iprt2_p3_b on prt2_p3 p3_2
+                           Index Cond: (a = p2_2.a)
+                           Filter: (p2_2.b = b)
+(25 rows)
+
+SELECT p1.a, p1.b FROM prt1 p1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) ON (p2.a = p3.a AND p2.b = p3.b)
+  WHERE p1.a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+  a  | b  
+-----+----
+ 490 | 15
+ 492 | 17
+ 494 | 19
+ 496 | 21
+ 498 | 23
+ 500 |  0
+ 502 |  2
+ 504 |  4
+ 506 |  6
+ 508 |  8
+ 510 | 10
+(11 rows)
+
+RESET enable_partitionwise_aggregate;
 -- Cases with non-nullable expressions in subquery results;
 -- make sure these go to null as expected
 EXPLAIN (COSTS OFF)
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index 575ba7b..d28abf7 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -145,6 +145,28 @@ EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
 
+-- 3-way FULL JOIN
+
+SET enable_partitionwise_aggregate TO true;
+
+EXPLAIN (COSTS OFF)
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+
+EXPLAIN (COSTS OFF)
+SELECT p1.a, p1.b FROM prt1 p1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) ON (p2.a = p3.a AND p2.b = p3.b)
+  WHERE p1.a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+SELECT p1.a, p1.b FROM prt1 p1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) ON (p2.a = p3.a AND p2.b = p3.b)
+  WHERE p1.a BETWEEN 490 AND 510
+  GROUP BY 1, 2 ORDER BY 1, 2;
+
+RESET enable_partitionwise_aggregate;
+
 -- Cases with non-nullable expressions in subquery results;
 -- make sure these go to null as expected
 EXPLAIN (COSTS OFF)
@@ -285,6 +307,7 @@ EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
 
+
 -- test default partition behavior for range
 ALTER TABLE prt1 DETACH PARTITION prt1_p3;
 ALTER TABLE prt1 ATTACH PARTITION prt1_p3 DEFAULT;
-- 
1.8.3.1

#20

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Amit Langote (#19)

1 attachment(s)

Re: d25ea01275 and partitionwise join

Amit Langote <amitlangote09@gmail.com> writes:

> Okay, I tried that in the updated patch. I didn't try to come up with
> examples that might break it though.

I looked through this.

* Wasn't excited about inventing makeCoalesceExpr(); the fact that it only
had two potential call sites seemed to make it not worth the trouble.
Plus, as defined, it could not handle the general case of COALESCE, which
can have N arguments; so that seemed like a recipe for confusion.

* I think your logic for building the coalesce combinations was just
wrong. We need combinations of left-hand inputs with right-hand inputs,
not left-hand with left-hand or right-hand with right-hand. Also, the
nesting should already have happened in the inputs, so we don't need to
try to generate it here. The looping code was pretty messy as well.

* I don't think using partopcintype is necessarily right; that could be
a polymorphic type, for instance. Safer to copy the type of the input
expressions. Likely we could have got away with using partcollation,
but for consistency I copied that as well.

* You really need to update the data structure definitional comments
when you make a change like this.

* I did not like putting a test case that requires
enable_partitionwise_aggregate in the partition_join test; that seems
misplaced. But it's not necessary to the test, is it?

* I did not follow the point of your second test case. The WHERE
constraint on p1.a allows the planner to strength-reduce the joins,
which is why there's no full join in that explain result, but then
we aren't going to get to this code at all.

I repaired the above in the attached.

I'm actually sort of pleasantly surprised that this worked; I was
not sure that building COALESCEs like this would provide the result
we needed. But it seems okay -- it does fix the behavior in this
3-way test case, as well as the 4-way join you showed at the top
of the thread. It's fairly dependent on the fact that the planner
won't rearrange full joins; otherwise, the COALESCE structures we
build here might not match those made at parse time. But that's
not likely to change anytime soon; and this is hardly the only
place that would break, so I'm not sweating about it. (I have
some vague ideas about getting rid of the COALESCEs as part of
the Var redefinition I've been muttering about, anyway; there
might be a cleaner fix for this if that happens.)

Anyway, I think this is probably OK for now. Given that the
nullable_partexprs lists are only used in one place, it's pretty
hard to see how it would break anything.
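
The pairing described above (every left-hand input coalesced with every right-hand input, with nesting inherited from the inputs) can be sketched outside the planner. This is a toy model only: expressions are plain strings rather than Node trees, and `build_coalesce_pairs` and `EXPR_LEN` are hypothetical names that do not exist in PostgreSQL.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EXPR_LEN 128			/* hypothetical cap on a rendered expression */

/*
 * Toy stand-in for the FULL JOIN case: write "COALESCE(l, r)" for every
 * left/right pair into out[] and return the number of entries written
 * (never more than max_out).  No left/left or right/right pairs are made.
 */
static int
build_coalesce_pairs(const char **left, int nleft,
					 const char **right, int nright,
					 char out[][EXPR_LEN], int max_out)
{
	int			n = 0;

	assert(left != NULL && right != NULL && out != NULL);

	for (int i = 0; i < nleft; i++)
	{
		for (int j = 0; j < nright; j++)
		{
			if (n >= max_out)
				return n;
			snprintf(out[n], EXPR_LEN, "COALESCE(%s, %s)",
					 left[i], right[j]);
			n++;
		}
	}
	return n;
}
```

For the 3-way test case, the left side would already carry `prt1.a`, `p2.a` and `COALESCE(prt1.a, p2.a)` from the lower full join, so pairing it with `p3.a` yields the nested `COALESCE(COALESCE(prt1.a, p2.a), p3.a)` without any nesting logic at this level.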

regards, tom lane

Attachments:

v7-0001-Fix-partitionwise-join-to-handle-FULL-JOINs-corre.patch (text/x-diff; charset=us-ascii)

diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index af1fb48..e1cc11c 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -17,6 +17,7 @@
 #include <limits.h>
 
 #include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/appendinfo.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
@@ -1890,7 +1891,8 @@ set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
 								RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 								JoinType jointype)
 {
-	int			partnatts = joinrel->part_scheme->partnatts;
+	PartitionScheme part_scheme = joinrel->part_scheme;
+	int			partnatts = part_scheme->partnatts;
 
 	joinrel->partexprs = (List **) palloc0(sizeof(List *) * partnatts);
 	joinrel->nullable_partexprs =
@@ -1899,7 +1901,8 @@ set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
 	/*
 	 * The joinrel's partition expressions are the same as those of the input
 	 * rels, but we must properly classify them as nullable or not in the
-	 * joinrel's output.
+	 * joinrel's output.  (Also, we add some more partition expressions if
+	 * it's a FULL JOIN.)
 	 */
 	for (int cnt = 0; cnt < partnatts; cnt++)
 	{
@@ -1910,6 +1913,7 @@ set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
 		const List *inner_null_expr = inner_rel->nullable_partexprs[cnt];
 		List	   *partexpr = NIL;
 		List	   *nullable_partexpr = NIL;
+		ListCell   *lc;
 
 		switch (jointype)
 		{
@@ -1969,6 +1973,31 @@ set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
 												outer_null_expr);
 				nullable_partexpr = list_concat(nullable_partexpr,
 												inner_null_expr);
+
+				/*
+				 * Also add CoalesceExprs corresponding to each possible
+				 * full-join output variable (that is, left side coalesced to
+				 * right side), so that we can match equijoin expressions
+				 * using those variables.  We assume no type coercions are
+				 * needed to make the join outputs.
+				 */
+				foreach(lc, list_concat_copy(outer_expr, outer_null_expr))
+				{
+					Node	   *larg = (Node *) lfirst(lc);
+					ListCell   *lc2;
+
+					foreach(lc2, list_concat_copy(inner_expr, inner_null_expr))
+					{
+						Node	   *rarg = (Node *) lfirst(lc2);
+						CoalesceExpr *c = makeNode(CoalesceExpr);
+
+						c->coalescetype = exprType(larg);
+						c->coalescecollid = exprCollation(larg);
+						c->args = list_make2(larg, rarg);
+						c->location = -1;
+						nullable_partexpr = lappend(nullable_partexpr, c);
+					}
+				}
 				break;
 
 			default:
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 469c686..39c7b2f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -613,6 +613,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
  * that expression goes in the partexprs[i] list if the base relation
  * is not nullable by this join or any lower outer join, or in the
  * nullable_partexprs[i] list if the base relation is nullable.
+ * Furthermore, FULL JOINs add extra nullable_partexprs expressions
+ * corresponding to COALESCE expressions of the left and right join columns,
+ * to simplify matching join clauses to those lists.
  *----------
  */
 typedef enum RelOptKind
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index b3fbe47..cd60b6a 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -750,6 +750,55 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
  550 | 0550 |     |      |     1100 | 0
 (12 rows)
 
+--
+-- 3-way full join
+--
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510;
+                                                               QUERY PLAN                                                                
+-----------------------------------------------------------------------------------------------------------------------------------------
+ Aggregate
+   ->  Append
+         ->  Hash Full Join
+               Hash Cond: ((COALESCE(prt1_1.a, p2_1.a) = p3_1.a) AND (COALESCE(prt1_1.b, p2_1.b) = p3_1.b))
+               Filter: ((COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a) >= 490) AND (COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a) <= 510))
+               ->  Hash Full Join
+                     Hash Cond: ((prt1_1.a = p2_1.a) AND (prt1_1.b = p2_1.b))
+                     ->  Seq Scan on prt1_p1 prt1_1
+                     ->  Hash
+                           ->  Seq Scan on prt2_p1 p2_1
+               ->  Hash
+                     ->  Seq Scan on prt2_p1 p3_1
+         ->  Hash Full Join
+               Hash Cond: ((COALESCE(prt1_2.a, p2_2.a) = p3_2.a) AND (COALESCE(prt1_2.b, p2_2.b) = p3_2.b))
+               Filter: ((COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a) >= 490) AND (COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a) <= 510))
+               ->  Hash Full Join
+                     Hash Cond: ((prt1_2.a = p2_2.a) AND (prt1_2.b = p2_2.b))
+                     ->  Seq Scan on prt1_p2 prt1_2
+                     ->  Hash
+                           ->  Seq Scan on prt2_p2 p2_2
+               ->  Hash
+                     ->  Seq Scan on prt2_p2 p3_2
+         ->  Hash Full Join
+               Hash Cond: ((COALESCE(prt1_3.a, p2_3.a) = p3_3.a) AND (COALESCE(prt1_3.b, p2_3.b) = p3_3.b))
+               Filter: ((COALESCE(COALESCE(prt1_3.a, p2_3.a), p3_3.a) >= 490) AND (COALESCE(COALESCE(prt1_3.a, p2_3.a), p3_3.a) <= 510))
+               ->  Hash Full Join
+                     Hash Cond: ((prt1_3.a = p2_3.a) AND (prt1_3.b = p2_3.b))
+                     ->  Seq Scan on prt1_p3 prt1_3
+                     ->  Hash
+                           ->  Seq Scan on prt2_p3 p2_3
+               ->  Hash
+                     ->  Seq Scan on prt2_p3 p3_3
+(32 rows)
+
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510;
+ count 
+-------
+    14
+(1 row)
+
 -- Cases with non-nullable expressions in subquery results;
 -- make sure these go to null as expected
 EXPLAIN (COSTS OFF)
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index 575ba7b..6184bbd 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -145,6 +145,15 @@ EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
 
+--
+-- 3-way full join
+--
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510;
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510;
+
 -- Cases with non-nullable expressions in subquery results;
 -- make sure these go to null as expected
 EXPLAIN (COSTS OFF)

#21

Amit Langote

amitlangote09@gmail.com

almost 6 years ago

In reply to: Tom Lane (#20)

Re: d25ea01275 and partitionwise join

On Mon, Apr 6, 2020 at 7:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

Okay, I tried that in the updated patch. I didn't try to come up with
examples that might break it though.

I looked through this.

Thank you.

* I think your logic for building the coalesce combinations was just
wrong. We need combinations of left-hand inputs with right-hand inputs,
not left-hand with left-hand or right-hand with right-hand. Also the
nesting should already have happened in the inputs, we don't need to
try to generate it here. The looping code was pretty messy as well.

It didn't occur to me that that many Coalesce combinations would be
necessary given the component rel combinations possible.

* I don't think using partopcintype is necessarily right; that could be
a polymorphic type, for instance. Safer to copy the type of the input
expressions. Likely we could have got away with using partcollation,
but for consistency I copied that as well.

Ah, seeing set_baserel_partition_key_exprs(), I suppose they will come
from parttypid and parttypcoll of the base partitioned tables, which
seems fine.

* You really need to update the data structure definitional comments
when you make a change like this.

Sorry, I should have.

* I did not like putting a test case that requires
enable_partitionwise_aggregate in the partition_join test; that seems
misplaced. But it's not necessary to the test, is it?

Earlier in the discussion (which turned into a separate discussion),
there were test cases where partition-level grouping would fail with
errors in setrefs.c, but I think that was fixed last year by
529ebb20aaa5. Agree that it has nothing to do with the problem being
solved here.

* I did not follow the point of your second test case. The WHERE
constraint on p1.a allows the planner to strength-reduce the joins,
which is why there's no full join in that explain result, but then
we aren't going to get to this code at all.

Oops, I thought I copy-pasted 4-way full join test not this one, but
evidently didn't.

I repaired the above in the attached.

I'm actually sort of pleasantly surprised that this worked; I was
not sure that building COALESCEs like this would provide the result
we needed. But it seems okay -- it does fix the behavior in this
3-way test case, as well as the 4-way join you showed at the top
of the thread. It's fairly dependent on the fact that the planner
won't rearrange full joins; otherwise, the COALESCE structures we
build here might not match those made at parse time. But that's
not likely to change anytime soon; and this is hardly the only
place that would break, so I'm not sweating about it. (I have
some vague ideas about getting rid of the COALESCEs as part of
the Var redefinition I've been muttering about, anyway; there
might be a cleaner fix for this if that happens.)

Anyway, I think this is probably OK for now. Given that the
nullable_partexprs lists are only used in one place, it's pretty
hard to see how it would break anything.

Makes sense.

--
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com

#22

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Amit Langote (#21)

Re: d25ea01275 and partitionwise join

Amit Langote <amitlangote09@gmail.com> writes:

On Mon, Apr 6, 2020 at 7:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

* I think your logic for building the coalesce combinations was just
wrong. We need combinations of left-hand inputs with right-hand inputs,
not left-hand with left-hand or right-hand with right-hand. Also the
nesting should already have happened in the inputs, we don't need to
try to generate it here. The looping code was pretty messy as well.

It didn't occur to me that that many Coalesce combinations would be
necessary given the component rel combinations possible.

Well, we don't of course: we only need the one pair that corresponds to
the COALESCE structures the parser would have generated. But we aren't
sure which one that is. I thought about looking through the full join
RTE's joinaliasvars list for COALESCE items instead of doing it like this,
but there is a problem: I don't believe that that data structure gets
maintained after flatten_join_alias_vars(). So it might contain
out-of-date expressions that don't match what we need them to match.

Possibly there will be a cleaner answer here if I succeed in redesigning
the Var data structure to account for outer joins better.
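
To illustrate why the extra pairs are harmless and only the parser-shaped one ever matters, here is a minimal sketch of structural matching against a candidate list. The `Expr` type, `expr_equal` and `find_matching_partexpr` are hypothetical stand-ins invented for this sketch; the real planner compares Node trees, and this code makes no claim about how it does so.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { EXPR_VAR, EXPR_COALESCE } ExprKind;

/* Hypothetical two-kind expression tree: a column reference or COALESCE(l, r) */
typedef struct Expr
{
	ExprKind	kind;
	int			rel;			/* EXPR_VAR: relation index */
	int			col;			/* EXPR_VAR: column number */
	const struct Expr *larg;	/* EXPR_COALESCE: first argument */
	const struct Expr *rarg;	/* EXPR_COALESCE: second argument */
} Expr;

/* Structural equality; argument order inside COALESCE is significant */
static bool
expr_equal(const Expr *a, const Expr *b)
{
	if (a == NULL || b == NULL)
		return a == b;
	if (a->kind != b->kind)
		return false;
	if (a->kind == EXPR_VAR)
		return a->rel == b->rel && a->col == b->col;
	return expr_equal(a->larg, b->larg) && expr_equal(a->rarg, b->rarg);
}

/* Return the index of the first candidate equal to operand, or -1 if none */
static int
find_matching_partexpr(const Expr *operand,
					   const Expr **candidates, int ncandidates)
{
	for (int i = 0; i < ncandidates; i++)
	{
		if (expr_equal(operand, candidates[i]))
			return i;
	}
	return -1;
}
```

In this model an operand shaped like COALESCE(v1, v2) finds its entry, while COALESCE(v2, v1) does not. That is the same sensitivity to argument order that makes the approach depend on full joins not being rearranged.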

* I did not follow the point of your second test case. The WHERE
constraint on p1.a allows the planner to strength-reduce the joins,
which is why there's no full join in that explain result, but then
we aren't going to get to this code at all.

Oops, I thought I copy-pasted 4-way full join test not this one, but
evidently didn't.

Have you got such a query at hand? I wondered whether we shouldn't
use a 4-way rather than 3-way test case; it'd offer more assurance
that nesting of these things works.

regards, tom lane

#23

Amit Langote

amitlangote09@gmail.com

almost 6 years ago

In reply to: Tom Lane (#22)

1 attachment(s)

Re: d25ea01275 and partitionwise join

On Mon, Apr 6, 2020 at 11:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

Oops, I thought I copy-pasted 4-way full join test not this one, but
evidently didn't.

Have you got such a query at hand? I wondered whether we shouldn't
use a 4-way rather than 3-way test case; it'd offer more assurance
that nesting of these things works.

Hmm, I just did:

-SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b) FULL JOIN prt1 p4 (a,b,c) USING (a, b)

which does succeed in using partitionwise join. Please see attached
delta that applies on your v7 if that is what you'd rather have.

--
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v7-test-delta.patch (application/octet-stream)

diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index cd60b6a..9618d7f 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -751,48 +751,60 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
 (12 rows)
 
 --
--- 3-way full join
+-- 4-way full join
 --
 EXPLAIN (COSTS OFF)
-SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b) FULL JOIN prt1 p4 (a,b,c) USING (a, b)
   WHERE a BETWEEN 490 AND 510;
-                                                               QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                 QUERY PLAN                                                                                  
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate
    ->  Append
          ->  Hash Full Join
-               Hash Cond: ((COALESCE(prt1_1.a, p2_1.a) = p3_1.a) AND (COALESCE(prt1_1.b, p2_1.b) = p3_1.b))
-               Filter: ((COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a) >= 490) AND (COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a) <= 510))
+               Hash Cond: ((COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a) = p4_1.a) AND (COALESCE(COALESCE(prt1_1.b, p2_1.b), p3_1.b) = p4_1.b))
+               Filter: ((COALESCE(COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a), p4_1.a) >= 490) AND (COALESCE(COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a), p4_1.a) <= 510))
                ->  Hash Full Join
-                     Hash Cond: ((prt1_1.a = p2_1.a) AND (prt1_1.b = p2_1.b))
-                     ->  Seq Scan on prt1_p1 prt1_1
+                     Hash Cond: ((COALESCE(prt1_1.a, p2_1.a) = p3_1.a) AND (COALESCE(prt1_1.b, p2_1.b) = p3_1.b))
+                     ->  Hash Full Join
+                           Hash Cond: ((prt1_1.a = p2_1.a) AND (prt1_1.b = p2_1.b))
+                           ->  Seq Scan on prt1_p1 prt1_1
+                           ->  Hash
+                                 ->  Seq Scan on prt2_p1 p2_1
                      ->  Hash
-                           ->  Seq Scan on prt2_p1 p2_1
+                           ->  Seq Scan on prt2_p1 p3_1
                ->  Hash
-                     ->  Seq Scan on prt2_p1 p3_1
+                     ->  Seq Scan on prt1_p1 p4_1
          ->  Hash Full Join
-               Hash Cond: ((COALESCE(prt1_2.a, p2_2.a) = p3_2.a) AND (COALESCE(prt1_2.b, p2_2.b) = p3_2.b))
-               Filter: ((COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a) >= 490) AND (COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a) <= 510))
+               Hash Cond: ((COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a) = p4_2.a) AND (COALESCE(COALESCE(prt1_2.b, p2_2.b), p3_2.b) = p4_2.b))
+               Filter: ((COALESCE(COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a), p4_2.a) >= 490) AND (COALESCE(COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a), p4_2.a) <= 510))
                ->  Hash Full Join
-                     Hash Cond: ((prt1_2.a = p2_2.a) AND (prt1_2.b = p2_2.b))
-                     ->  Seq Scan on prt1_p2 prt1_2
+                     Hash Cond: ((COALESCE(prt1_2.a, p2_2.a) = p3_2.a) AND (COALESCE(prt1_2.b, p2_2.b) = p3_2.b))
+                     ->  Hash Full Join
+                           Hash Cond: ((prt1_2.a = p2_2.a) AND (prt1_2.b = p2_2.b))
+                           ->  Seq Scan on prt1_p2 prt1_2
+                           ->  Hash
+                                 ->  Seq Scan on prt2_p2 p2_2
                      ->  Hash
-                           ->  Seq Scan on prt2_p2 p2_2
+                           ->  Seq Scan on prt2_p2 p3_2
                ->  Hash
-                     ->  Seq Scan on prt2_p2 p3_2
+                     ->  Seq Scan on prt1_p2 p4_2
          ->  Hash Full Join
-               Hash Cond: ((COALESCE(prt1_3.a, p2_3.a) = p3_3.a) AND (COALESCE(prt1_3.b, p2_3.b) = p3_3.b))
-               Filter: ((COALESCE(COALESCE(prt1_3.a, p2_3.a), p3_3.a) >= 490) AND (COALESCE(COALESCE(prt1_3.a, p2_3.a), p3_3.a) <= 510))
+               Hash Cond: ((COALESCE(COALESCE(prt1_3.a, p2_3.a), p3_3.a) = p4_3.a) AND (COALESCE(COALESCE(prt1_3.b, p2_3.b), p3_3.b) = p4_3.b))
+               Filter: ((COALESCE(COALESCE(COALESCE(prt1_3.a, p2_3.a), p3_3.a), p4_3.a) >= 490) AND (COALESCE(COALESCE(COALESCE(prt1_3.a, p2_3.a), p3_3.a), p4_3.a) <= 510))
                ->  Hash Full Join
-                     Hash Cond: ((prt1_3.a = p2_3.a) AND (prt1_3.b = p2_3.b))
-                     ->  Seq Scan on prt1_p3 prt1_3
+                     Hash Cond: ((COALESCE(prt1_3.a, p2_3.a) = p3_3.a) AND (COALESCE(prt1_3.b, p2_3.b) = p3_3.b))
+                     ->  Hash Full Join
+                           Hash Cond: ((prt1_3.a = p2_3.a) AND (prt1_3.b = p2_3.b))
+                           ->  Seq Scan on prt1_p3 prt1_3
+                           ->  Hash
+                                 ->  Seq Scan on prt2_p3 p2_3
                      ->  Hash
-                           ->  Seq Scan on prt2_p3 p2_3
+                           ->  Seq Scan on prt2_p3 p3_3
                ->  Hash
-                     ->  Seq Scan on prt2_p3 p3_3
-(32 rows)
+                     ->  Seq Scan on prt1_p3 p4_3
+(44 rows)
 
-SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b) FULL JOIN prt1 p4 (a,b,c) USING (a, b)
   WHERE a BETWEEN 490 AND 510;
  count 
 -------
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index 6184bbd..cb8cb8d 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -146,12 +146,12 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
 
 --
--- 3-way full join
+-- 4-way full join
 --
 EXPLAIN (COSTS OFF)
-SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b) FULL JOIN prt1 p4 (a,b,c) USING (a, b)
   WHERE a BETWEEN 490 AND 510;
-SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b) FULL JOIN prt1 p4 (a,b,c) USING (a, b)
   WHERE a BETWEEN 490 AND 510;
 
 -- Cases with non-nullable expressions in subquery results;

#24

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Amit Langote (#23)

1 attachment(s)

Re: d25ea01275 and partitionwise join

Amit Langote <amitlangote09@gmail.com> writes:

which does succeed in using partitionwise join. Please see attached
delta that applies on your v7 if that is what you'd rather have.

I figured these queries were cheap enough that we could afford to run
both. With that and some revision of the comments (per attached),
I was feeling like we were ready to go. However, re-reading the thread,
one of Richard's comments struck me as still relevant. If you try, say,

create table p (k int, val int) partition by range(k);
create table p_1 partition of p for values from (1) to (10);
create table p_2 partition of p for values from (10) to (100);

set enable_partitionwise_join = 1;

explain
select * from (p as t1 full join p as t2 on t1.k = t2.k) as t12(k1,val1,k2,val2)
full join p as t3 on COALESCE(t12.k1, t12.k2) = t3.k;

this patch will give you a partitioned join, with a different plan
than you get without enable_partitionwise_join. This is scary,
because it's not immediately obvious that the transformation is
correct.

I *think* that it might be all right, because although what we
are matching to is a user-written COALESCE() not an actual
FULL JOIN USING column, it has to behave in somewhat the same
way. In particular, by construction it must be a coalesce of
some representation of the matching partition columns of the
full join's inputs. So, even though it might go to null in
different cases than an actual USING variable would do, it
does not break the ability to partition the join.
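
The intuition in the paragraph above can be checked on the example's range bounds with a small sketch. It is an illustration, not a proof of the transformation's correctness, which this message leaves open. The names `NullableInt`, `partition_of`, `coalesce2` and `coalesce_in_partition` are hypothetical; the assumption is that, within one child join, each of k1 and k2 is either null or drawn from that child's partition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical nullable integer, standing in for a nullable join column */
typedef struct
{
	bool		isnull;
	int			value;
} NullableInt;

/* Range bounds from the example: p_1 covers [1, 10), p_2 covers [10, 100) */
static int
partition_of(int k)
{
	if (k >= 1 && k < 10)
		return 0;
	if (k >= 10 && k < 100)
		return 1;
	return -1;					/* no partition accepts this key */
}

/* Two-argument COALESCE: the first non-null argument, else null */
static NullableInt
coalesce2(NullableInt a, NullableInt b)
{
	return a.isnull ? b : a;
}

/*
 * True if COALESCE(k1, k2) is null or falls in partition "part".  When both
 * inputs are null or come from "part", this always holds, so an equality
 * against t3.k can only match t3 rows from the same partition.
 */
static bool
coalesce_in_partition(NullableInt k1, NullableInt k2, int part)
{
	NullableInt c = coalesce2(k1, k2);

	return c.isnull || partition_of(c.value) == part;
}
```

A null result cannot satisfy the equality against t3.k at all, so in this model it never pairs a row with the wrong partition of t3.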

However, I have not spent a whole lot of time thinking about
partitionwise joins, so rather than go ahead and commit I am
going to toss that point back out for community consideration.
At the very least, what I'd written in the comment needs a
lot more defense than it has now.

Thoughts?

regards, tom lane

Attachments:

v8-0001-Fix-partitionwise-join-to-handle-FULL-JOINs-corre.patch (text/x-diff; charset=us-ascii)

diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index af1fb48..d190b4b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -17,6 +17,7 @@
 #include <limits.h>
 
 #include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
 #include "optimizer/appendinfo.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
@@ -1890,7 +1891,8 @@ set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
 								RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 								JoinType jointype)
 {
-	int			partnatts = joinrel->part_scheme->partnatts;
+	PartitionScheme part_scheme = joinrel->part_scheme;
+	int			partnatts = part_scheme->partnatts;
 
 	joinrel->partexprs = (List **) palloc0(sizeof(List *) * partnatts);
 	joinrel->nullable_partexprs =
@@ -1899,7 +1901,8 @@ set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
 	/*
 	 * The joinrel's partition expressions are the same as those of the input
 	 * rels, but we must properly classify them as nullable or not in the
-	 * joinrel's output.
+	 * joinrel's output.  (Also, we add some more partition expressions if
+	 * it's a FULL JOIN.)
 	 */
 	for (int cnt = 0; cnt < partnatts; cnt++)
 	{
@@ -1910,6 +1913,7 @@ set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
 		const List *inner_null_expr = inner_rel->nullable_partexprs[cnt];
 		List	   *partexpr = NIL;
 		List	   *nullable_partexpr = NIL;
+		ListCell   *lc;
 
 		switch (jointype)
 		{
@@ -1969,6 +1973,38 @@ set_joinrel_partition_key_exprs(RelOptInfo *joinrel,
 												outer_null_expr);
 				nullable_partexpr = list_concat(nullable_partexpr,
 												inner_null_expr);
+
+				/*
+				 * Also add CoalesceExprs corresponding to each possible
+				 * full-join output variable (that is, left side coalesced to
+				 * right side), so that we can match equijoin expressions
+				 * using those variables.  We really only need these for
+				 * columns merged by JOIN USING, and only with the pairs of
+				 * input items that correspond to the data structures that
+				 * parse analysis would build for such variables.  But it's
+				 * hard to tell which those are, so just make all the pairs.
+				 * Extra items in the nullable_partexprs list won't cause big
+				 * problems.  We assume no type coercions are needed to make
+				 * the coalesce expressions, since columns of different types
+				 * won't have gotten classified as the same PartitionScheme.
+				 */
+				foreach(lc, list_concat_copy(outer_expr, outer_null_expr))
+				{
+					Node	   *larg = (Node *) lfirst(lc);
+					ListCell   *lc2;
+
+					foreach(lc2, list_concat_copy(inner_expr, inner_null_expr))
+					{
+						Node	   *rarg = (Node *) lfirst(lc2);
+						CoalesceExpr *c = makeNode(CoalesceExpr);
+
+						c->coalescetype = exprType(larg);
+						c->coalescecollid = exprCollation(larg);
+						c->args = list_make2(larg, rarg);
+						c->location = -1;
+						nullable_partexpr = lappend(nullable_partexpr, c);
+					}
+				}
 				break;
 
 			default:
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 469c686..39c7b2f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -613,6 +613,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
  * that expression goes in the partexprs[i] list if the base relation
  * is not nullable by this join or any lower outer join, or in the
  * nullable_partexprs[i] list if the base relation is nullable.
+ * Furthermore, FULL JOINs add extra nullable_partexprs expressions
+ * corresponding to COALESCE expressions of the left and right join columns,
+ * to simplify matching join clauses to those lists.
  *----------
  */
 typedef enum RelOptKind
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index b3fbe47..a35e8e3 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -750,6 +750,116 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
  550 | 0550 |     |      |     1100 | 0
 (12 rows)
 
+--
+-- 3-way full join
+--
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510;
+                                                               QUERY PLAN                                                                
+-----------------------------------------------------------------------------------------------------------------------------------------
+ Aggregate
+   ->  Append
+         ->  Hash Full Join
+               Hash Cond: ((COALESCE(prt1_1.a, p2_1.a) = p3_1.a) AND (COALESCE(prt1_1.b, p2_1.b) = p3_1.b))
+               Filter: ((COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a) >= 490) AND (COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a) <= 510))
+               ->  Hash Full Join
+                     Hash Cond: ((prt1_1.a = p2_1.a) AND (prt1_1.b = p2_1.b))
+                     ->  Seq Scan on prt1_p1 prt1_1
+                     ->  Hash
+                           ->  Seq Scan on prt2_p1 p2_1
+               ->  Hash
+                     ->  Seq Scan on prt2_p1 p3_1
+         ->  Hash Full Join
+               Hash Cond: ((COALESCE(prt1_2.a, p2_2.a) = p3_2.a) AND (COALESCE(prt1_2.b, p2_2.b) = p3_2.b))
+               Filter: ((COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a) >= 490) AND (COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a) <= 510))
+               ->  Hash Full Join
+                     Hash Cond: ((prt1_2.a = p2_2.a) AND (prt1_2.b = p2_2.b))
+                     ->  Seq Scan on prt1_p2 prt1_2
+                     ->  Hash
+                           ->  Seq Scan on prt2_p2 p2_2
+               ->  Hash
+                     ->  Seq Scan on prt2_p2 p3_2
+         ->  Hash Full Join
+               Hash Cond: ((COALESCE(prt1_3.a, p2_3.a) = p3_3.a) AND (COALESCE(prt1_3.b, p2_3.b) = p3_3.b))
+               Filter: ((COALESCE(COALESCE(prt1_3.a, p2_3.a), p3_3.a) >= 490) AND (COALESCE(COALESCE(prt1_3.a, p2_3.a), p3_3.a) <= 510))
+               ->  Hash Full Join
+                     Hash Cond: ((prt1_3.a = p2_3.a) AND (prt1_3.b = p2_3.b))
+                     ->  Seq Scan on prt1_p3 prt1_3
+                     ->  Hash
+                           ->  Seq Scan on prt2_p3 p2_3
+               ->  Hash
+                     ->  Seq Scan on prt2_p3 p3_3
+(32 rows)
+
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510;
+ count 
+-------
+    14
+(1 row)
+
+--
+-- 4-way full join
+--
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b) FULL JOIN prt1 p4 (a,b,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510;
+                                                                                 QUERY PLAN                                                                                  
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Aggregate
+   ->  Append
+         ->  Hash Full Join
+               Hash Cond: ((COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a) = p4_1.a) AND (COALESCE(COALESCE(prt1_1.b, p2_1.b), p3_1.b) = p4_1.b))
+               Filter: ((COALESCE(COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a), p4_1.a) >= 490) AND (COALESCE(COALESCE(COALESCE(prt1_1.a, p2_1.a), p3_1.a), p4_1.a) <= 510))
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(prt1_1.a, p2_1.a) = p3_1.a) AND (COALESCE(prt1_1.b, p2_1.b) = p3_1.b))
+                     ->  Hash Full Join
+                           Hash Cond: ((prt1_1.a = p2_1.a) AND (prt1_1.b = p2_1.b))
+                           ->  Seq Scan on prt1_p1 prt1_1
+                           ->  Hash
+                                 ->  Seq Scan on prt2_p1 p2_1
+                     ->  Hash
+                           ->  Seq Scan on prt2_p1 p3_1
+               ->  Hash
+                     ->  Seq Scan on prt1_p1 p4_1
+         ->  Hash Full Join
+               Hash Cond: ((COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a) = p4_2.a) AND (COALESCE(COALESCE(prt1_2.b, p2_2.b), p3_2.b) = p4_2.b))
+               Filter: ((COALESCE(COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a), p4_2.a) >= 490) AND (COALESCE(COALESCE(COALESCE(prt1_2.a, p2_2.a), p3_2.a), p4_2.a) <= 510))
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(prt1_2.a, p2_2.a) = p3_2.a) AND (COALESCE(prt1_2.b, p2_2.b) = p3_2.b))
+                     ->  Hash Full Join
+                           Hash Cond: ((prt1_2.a = p2_2.a) AND (prt1_2.b = p2_2.b))
+                           ->  Seq Scan on prt1_p2 prt1_2
+                           ->  Hash
+                                 ->  Seq Scan on prt2_p2 p2_2
+                     ->  Hash
+                           ->  Seq Scan on prt2_p2 p3_2
+               ->  Hash
+                     ->  Seq Scan on prt1_p2 p4_2
+         ->  Hash Full Join
+               Hash Cond: ((COALESCE(COALESCE(prt1_3.a, p2_3.a), p3_3.a) = p4_3.a) AND (COALESCE(COALESCE(prt1_3.b, p2_3.b), p3_3.b) = p4_3.b))
+               Filter: ((COALESCE(COALESCE(COALESCE(prt1_3.a, p2_3.a), p3_3.a), p4_3.a) >= 490) AND (COALESCE(COALESCE(COALESCE(prt1_3.a, p2_3.a), p3_3.a), p4_3.a) <= 510))
+               ->  Hash Full Join
+                     Hash Cond: ((COALESCE(prt1_3.a, p2_3.a) = p3_3.a) AND (COALESCE(prt1_3.b, p2_3.b) = p3_3.b))
+                     ->  Hash Full Join
+                           Hash Cond: ((prt1_3.a = p2_3.a) AND (prt1_3.b = p2_3.b))
+                           ->  Seq Scan on prt1_p3 prt1_3
+                           ->  Hash
+                                 ->  Seq Scan on prt2_p3 p2_3
+                     ->  Hash
+                           ->  Seq Scan on prt2_p3 p3_3
+               ->  Hash
+                     ->  Seq Scan on prt1_p3 p4_3
+(44 rows)
+
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b) FULL JOIN prt1 p4 (a,b,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510;
+ count 
+-------
+    14
+(1 row)
+
 -- Cases with non-nullable expressions in subquery results;
 -- make sure these go to null as expected
 EXPLAIN (COSTS OFF)
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index 575ba7b..dad1e07 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -145,6 +145,24 @@ EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
 
+--
+-- 3-way full join
+--
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510;
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510;
+
+--
+-- 4-way full join
+--
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b) FULL JOIN prt1 p4 (a,b,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510;
+SELECT COUNT(*) FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b) FULL JOIN prt2 p3(b,a,c) USING (a, b) FULL JOIN prt1 p4 (a,b,c) USING (a, b)
+  WHERE a BETWEEN 490 AND 510;
+
 -- Cases with non-nullable expressions in subquery results;
 -- make sure these go to null as expected
 EXPLAIN (COSTS OFF)

#25

Amit Langote

amitlangote09@gmail.com

almost 6 years ago

In reply to: Tom Lane (#24)

Re: d25ea01275 and partitionwise join

On Tue, Apr 7, 2020 at 2:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

which does succeed in using partitionwise join. Please see attached
delta that applies on your v7 if that is what you'd rather have.

I figured these queries were cheap enough that we could afford to run
both. With that and some revision of the comments (per attached),
I was feeling like we were ready to go.

Looks good to me.

However, re-reading the thread,
one of Richard's comments struck me as still relevant. If you try, say,

create table p (k int, val int) partition by range(k);
create table p_1 partition of p for values from (1) to (10);
create table p_2 partition of p for values from (10) to (100);

set enable_partitionwise_join = 1;

explain
select * from (p as t1 full join p as t2 on t1.k = t2.k) as t12(k1,val1,k2,val2)
full join p as t3 on COALESCE(t12.k1, t12.k2) = t3.k;

this patch will give you a partitioned join, with a different plan
than you get without enable_partitionwise_join. This is scary,
because it's not immediately obvious that the transformation is
correct.

I *think* that it might be all right, because although what we
are matching to is a user-written COALESCE() not an actual
FULL JOIN USING column, it has to behave in somewhat the same
way. In particular, by construction it must be a coalesce of
some representation of the matching partition columns of the
full join's inputs. So, even though it might go to null in
different cases than an actual USING variable would do, it
does not break the ability to partition the join.

Seems fine to me too. Maybe users should avoid writing it by hand if
possible anyway, because even slight variation in the way it's written
will affect this:

set enable_partitionwise_join = 1;

-- order of coalesce() arguments reversed
explain (costs off)
select * from (p as t1 full join p as t2 on t1.k = t2.k) as t12(k1,val1,k2,val2)
full join p as t3 on COALESCE(t12.k2, t12.k1) = t3.k;
                   QUERY PLAN
----------------------------------------------
 Hash Full Join
   Hash Cond: (COALESCE(t2.k, t1.k) = t3.k)
   ->  Append
         ->  Hash Full Join
               Hash Cond: (t1_1.k = t2_1.k)
               ->  Seq Scan on p_1 t1_1
               ->  Hash
                     ->  Seq Scan on p_1 t2_1
         ->  Hash Full Join
               Hash Cond: (t1_2.k = t2_2.k)
               ->  Seq Scan on p_2 t1_2
               ->  Hash
                     ->  Seq Scan on p_2 t2_2
   ->  Hash
         ->  Append
               ->  Seq Scan on p_1 t3_1
               ->  Seq Scan on p_2 t3_2
(17 rows)
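
To illustrate why argument order matters here, the following is a minimal Python sketch and not planner code. It assumes that the join clause is matched against a recorded COALESCE expression purely by structural equality, which is consistent with the plan above but is an assumption on my part. The Var and Coalesce classes and the full_join_key and matches_partition_key helpers are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    rel: str  # relation alias, e.g. "t1"
    col: str  # column name, e.g. "k"


@dataclass(frozen=True)
class Coalesce:
    args: Tuple["Expr", ...]  # argument order is part of the identity


Expr = Union[Var, Coalesce]


def full_join_key(left: Expr, right: Expr) -> Coalesce:
    """Hypothetical: the key expression recorded for a full join of two inputs."""
    return Coalesce((left, right))


def matches_partition_key(expr: Expr, recorded: Expr) -> bool:
    """Structural equality only; no reasoning about COALESCE semantics."""
    return expr == recorded


recorded = full_join_key(Var("t1", "k"), Var("t2", "k"))
same_order = Coalesce((Var("t1", "k"), Var("t2", "k")))
reversed_order = Coalesce((Var("t2", "k"), Var("t1", "k")))

print(matches_partition_key(same_order, recorded))      # True
print(matches_partition_key(reversed_order, recorded))  # False
```

Under that assumption, COALESCE(t12.k2, t12.k1) is a different expression tree from the recorded one, so the upper join is not recognized as being on the partition key even though the two forms agree whenever both inputs are non-null.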

However, I have not spent a whole lot of time thinking about
partitionwise joins, so rather than go ahead and commit I am
going to toss that point back out for community consideration.

Agreed.

At the very least, what I'd written in the comment needs a
lot more defense than it has now.

Sorry, which comment are you referring to?

--
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com

#26

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Amit Langote (#25)

Re: d25ea01275 and partitionwise join

Amit Langote <amitlangote09@gmail.com> writes:

On Tue, Apr 7, 2020 at 2:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I *think* that it might be all right, because although what we
are matching to is a user-written COALESCE() not an actual
FULL JOIN USING column, it has to behave in somewhat the same
way. In particular, by construction it must be a coalesce of
some representation of the matching partition columns of the
full join's inputs. So, even though it might go to null in
different cases than an actual USING variable would do, it
does not break the ability to partition the join.

Seems fine to me too. Maybe users should avoid writing it by hand if
possible anyway, because even slight variation in the way it's written
will affect this:

I'm not particularly concerned about users intentionally trying to trigger
this behavior. I just want to be sure that if someone accidentally does
so, we don't produce a wrong plan.
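
One way to sanity-check that concern outside the planner is a brute-force comparison on toy data. The sketch below is not PostgreSQL code. It assumes three inputs that share the range bounds of p_1 and p_2 from the example earlier in the thread, non-null partition keys, and a naive nested-loop FULL JOIN. All function names and row values are made up for illustration, and agreement on one data set is evidence, not a proof.

```python
from collections import Counter

BOUNDS = [(1, 10), (10, 100)]  # same range bounds as p_1 and p_2


def full_join(left, right, lkey, rkey, lwidth, rwidth):
    """Naive FULL JOIN over tuples; unmatched sides are padded with None."""
    out, matched_right = [], set()
    for l in left:
        hit = False
        for i, r in enumerate(right):
            lk, rk = lkey(l), rkey(r)
            if lk is not None and lk == rk:  # SQL "=" never matches NULL
                out.append(l + r)
                matched_right.add(i)
                hit = True
        if not hit:
            out.append(l + (None,) * rwidth)
    for i, r in enumerate(right):
        if i not in matched_right:
            out.append((None,) * lwidth + r)
    return out


def coalesce(*vals):
    """First non-None argument, or None if there is none."""
    return next((v for v in vals if v is not None), None)


def three_way(t1, t2, t3):
    # (t1 FULL JOIN t2 ON t1.k = t2.k) FULL JOIN t3 ON COALESCE(k1, k2) = t3.k
    t12 = full_join(t1, t2, lambda r: r[0], lambda r: r[0], 2, 2)
    return full_join(t12, t3, lambda r: coalesce(r[0], r[2]), lambda r: r[0], 4, 2)


def partition(rows, lo, hi):
    """Rows whose key k falls in the range [lo, hi)."""
    return [r for r in rows if lo <= r[0] < hi]


# Rows are (k, val); the keys are chosen so that every match pattern occurs.
t1 = [(1, 10), (5, 50), (12, 120), (40, 400)]
t2 = [(5, 51), (9, 90), (12, 121), (99, 990)]
t3 = [(1, 11), (9, 91), (40, 401), (50, 500)]

whole = three_way(t1, t2, t3)

piecewise = []
for lo, hi in BOUNDS:
    piecewise += three_way(partition(t1, lo, hi),
                           partition(t2, lo, hi),
                           partition(t3, lo, hi))

# Compare as multisets, since row order is not significant.
print(Counter(whole) == Counter(piecewise))  # True
```

Here the join done partition by partition and then appended yields the same multiset of rows as the join over the whole tables, which is the property a partitionwise plan relies on.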

I waited till after the "advanced partitionwise join" patch went
in because that seemed more important (plus I wondered a bit if
that would subsume this). But this patch seems to still work,
and the other thing doesn't fix the problem, so pushed.

regards, tom lane

#27

Etsuro Fujita

etsuro.fujita@gmail.com

almost 6 years ago

In reply to: Tom Lane (#26)

Re: d25ea01275 and partitionwise join

On Wed, Apr 8, 2020 at 11:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

But this patch seems to still work,
and the other thing doesn't fix the problem, so pushed.

Thanks for working on this!

Best regards,
Etsuro Fujita

#28

Amit Langote

amitlangote09@gmail.com

almost 6 years ago

In reply to: Tom Lane (#26)

Re: d25ea01275 and partitionwise join

On Wed, Apr 8, 2020 at 11:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Langote <amitlangote09@gmail.com> writes:

On Tue, Apr 7, 2020 at 2:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I *think* that it might be all right, because although what we
are matching to is a user-written COALESCE() not an actual
FULL JOIN USING column, it has to behave in somewhat the same
way. In particular, by construction it must be a coalesce of
some representation of the matching partition columns of the
full join's inputs. So, even though it might go to null in
different cases than an actual USING variable would do, it
does not break the ability to partition the join.

Seems fine to me too. Maybe users should avoid writing it by hand if
possible anyway, because even slight variation in the way it's written
will affect this:

I'm not particularly concerned about users intentionally trying to trigger
this behavior. I just want to be sure that if someone accidentally does
so, we don't produce a wrong plan.

I waited till after the "advanced partitionwise join" patch went
in because that seemed more important (plus I wondered a bit if
that would subsume this). But this patch seems to still work,
and the other thing doesn't fix the problem, so pushed.

Thank you for your time on this.

Amit Langote
EnterpriseDB: http://www.enterprisedb.com