Getting sorted data from foreign server for merge join

Started by Ashutosh Bapatabout 10 years ago26 messages

ashutosh.bapat@enterprisedb.com

about 10 years ago

1 attachment(s)

Hi All,
PFA patch to get data sorted from the foreign server (postgres_fdw)
according to the pathkeys useful for merge join.

For a given base relation (extendable to join when that becomes available
in postgres_fdw), the patch tries to find merge joinable clauses. It then
adds paths with pathkeys extracted out of these merge joinable clauses. The
merge joinable clauses form equivalence classes. The patch searches in
root->eq_classes for equivalence members belonging to the given relation.
For every such expression it creates a single member pathkey list and
corresponding path. The test postgres_fdw.sql has an existing join which
uses merge join. With this patch the data is sorted on the foreign server
than locally.

While mergejoinable clauses can be obtained from rel->joininfo as well. But
rel->joininfo contains other clauses as well and we need extra efforts to
remove duplicates if the same expression appears in multiple merge joinable
clauses.

Two joining relations can have multiple merge joinable clauses, requiring
multi-member pathkeys so that merge join is efficient to the maximum
extent. The order in which the expressions appears in pathkeys can change
the costs of sorting the data according to the pathkeys, depending upon the
expressions and the presence of indexes containing those expressions. Thus
ideally we would need to club all the expressions appearing in all the
clauses for given two relations and create paths with pathkeys for every
order of these expressions.That explodes the number of possible paths. We
may restrict the number of paths created by considering only certain orders
like sort_inner_and_outer(). In any case, costing such paths increases the
planning time which may not be worth it. So, this patch uses a heuristic
approach of creating single member pathkeys path for every merge joinable
expression.

The pathkeys need to be canonicalised using make_canonical_pathkey(), which
is a static function. I have added a TODO and comments in the patch
explaining possible ways to avoid "extern"alization of this function.

Comments/suggestions are welcome.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

pg_sort_all_pd.patchtext/x-diff; charset=US-ASCII; name=pg_sort_all_pd.patchDownload

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0159bf5..ddb05f2 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3383,22 +3383,22 @@ select tableoid::regclass, * from bar order by 1,2;
  bar2     |  4 | 144
  bar2     |  7 |  77
 (6 rows)
 
 -- Check UPDATE with inherited target and an appendrel subquery
 explain (verbose, costs off)
 update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
-                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
+                                           QUERY PLAN                                           
+------------------------------------------------------------------------------------------------
  Update on public.bar
    Update on public.bar
    Foreign Update on public.bar2
      Remote SQL: UPDATE public.loct2 SET f2 = $2 WHERE ctid = $1
    ->  Hash Join
          Output: bar.f1, (bar.f2 + 100), bar.ctid, (ROW(foo.f1))
          Hash Cond: (foo.f1 = bar.f1)
          ->  Append
                ->  Seq Scan on public.foo
                      Output: ROW(foo.f1), foo.f1
@@ -3410,41 +3410,44 @@ where bar.f1 = ss.f1;
                ->  Foreign Scan on public.foo2 foo2_1
                      Output: ROW((foo2_1.f1 + 3)), (foo2_1.f1 + 3)
                      Remote SQL: SELECT f1 FROM public.loct1
          ->  Hash
                Output: bar.f1, bar.f2, bar.ctid
                ->  Seq Scan on public.bar
                      Output: bar.f1, bar.f2, bar.ctid
    ->  Merge Join
          Output: bar2.f1, (bar2.f2 + 100), bar2.f3, bar2.ctid, (ROW(foo.f1))
          Merge Cond: (bar2.f1 = foo.f1)
-         ->  Sort
+         ->  Foreign Scan on public.bar2
                Output: bar2.f1, bar2.f2, bar2.f3, bar2.ctid
-               Sort Key: bar2.f1
-               ->  Foreign Scan on public.bar2
-                     Output: bar2.f1, bar2.f2, bar2.f3, bar2.ctid
-                     Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
-         ->  Sort
+               Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 ORDER BY f1 ASC FOR UPDATE
+         ->  Materialize
                Output: (ROW(foo.f1)), foo.f1
-               Sort Key: foo.f1
-               ->  Append
-                     ->  Seq Scan on public.foo
-                           Output: ROW(foo.f1), foo.f1
+               ->  Merge Append
+                     Sort Key: foo.f1
+                     ->  Sort
+                           Output: (ROW(foo.f1)), foo.f1
+                           Sort Key: foo.f1
+                           ->  Seq Scan on public.foo
+                                 Output: ROW(foo.f1), foo.f1
                      ->  Foreign Scan on public.foo2
                            Output: ROW(foo2.f1), foo2.f1
-                           Remote SQL: SELECT f1 FROM public.loct1
-                     ->  Seq Scan on public.foo foo_1
-                           Output: ROW((foo_1.f1 + 3)), (foo_1.f1 + 3)
+                           Remote SQL: SELECT f1 FROM public.loct1 ORDER BY f1 ASC
+                     ->  Sort
+                           Output: (ROW((foo_1.f1 + 3))), ((foo_1.f1 + 3))
+                           Sort Key: ((foo_1.f1 + 3))
+                           ->  Seq Scan on public.foo foo_1
+                                 Output: ROW((foo_1.f1 + 3)), (foo_1.f1 + 3)
                      ->  Foreign Scan on public.foo2 foo2_1
                            Output: ROW((foo2_1.f1 + 3)), (foo2_1.f1 + 3)
-                           Remote SQL: SELECT f1 FROM public.loct1
-(45 rows)
+                           Remote SQL: SELECT f1 FROM public.loct1 ORDER BY (f1 + 3) ASC
+(48 rows)
 
 update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
 select tableoid::regclass, * from bar order by 1,2;
  tableoid | f1 | f2  
 ----------+----+-----
  bar      |  1 | 211
  bar      |  2 | 222
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index cd4ed0c..02d17d2 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -251,20 +251,21 @@ static void postgresExplainForeignScan(ForeignScanState *node,
 static void postgresExplainForeignModify(ModifyTableState *mtstate,
 							 ResultRelInfo *rinfo,
 							 List *fdw_private,
 							 int subplan_index,
 							 ExplainState *es);
 static bool postgresAnalyzeForeignTable(Relation relation,
 							AcquireSampleRowsFunc *func,
 							BlockNumber *totalpages);
 static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
 							Oid serverOid);
+static List *generate_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel);
 
 /*
  * Helper functions
  */
 static void estimate_path_cost_size(PlannerInfo *root,
 						RelOptInfo *baserel,
 						List *join_conds,
 						List *pathkeys,
 						double *p_rows, int *p_width,
 						Cost *p_startup_cost, Cost *p_total_cost);
@@ -501,90 +502,223 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		set_baserel_size_estimates(root, baserel);
 
 		/* Fill in basically-bogus cost estimates for use later. */
 		estimate_path_cost_size(root, baserel, NIL, NIL,
 								&fpinfo->rows, &fpinfo->width,
 								&fpinfo->startup_cost, &fpinfo->total_cost);
 	}
 }
 
 /*
- * postgresGetForeignPaths
- *		Create possible scan paths for a scan on the foreign table
+ * generate_pathkeys_for_relation
+ * Function collects "interesting" pathkeys for given relation. The caller is
+ * possibily going to create paths with these pathkeys so as to get the data
+ * sorted according to the pathkeys.
+ * Following pathkeys are of interest here.
+ * 1. query pathkeys
+ * 2. pathkeys arising out of the equivalence classes
+ * Comments in the function explain these two in details.
+ *
+ * TODO: This function requires make_canonical_pathkey() to be "extern"alized.
+ * The function is required to get pathkey corresponding to a given equivalence
+ * class if there exists one already or create one otherwise. There are two ways
+ * we can avoid "extern"alization
+ * 1. Create a wrapper exerna function returning pathkey with inputs root and
+ * 		equivalence class. Having this function pathkeys.c would avoid
+ * 		"extern"alization of the function.
+ * 2. Move generate_pathkeys_for_relation() to pathkeys.c. Considering that this
+ *    function is heuristically finding the "interesting" pathkeys, it may not
+ *    be very useful in general, and thus can not be part of the core.
  */
-static void
-postgresGetForeignPaths(PlannerInfo *root,
-						RelOptInfo *baserel,
-						Oid foreigntableid)
+static List *
+generate_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
 {
-	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
-	ForeignPath *path;
-	List	   *ppi_list;
-	ListCell   *lc;
-	List	   *usable_pathkeys = NIL;
-
-	/*
-	 * Create simplest ForeignScan path node and add it to baserel.  This path
-	 * corresponds to SeqScan path of regular tables (though depending on what
-	 * baserestrict conditions we were able to send to remote, there might
-	 * actually be an indexscan happening there).  We already did all the work
-	 * to estimate cost and size of this path.
-	 */
-	path = create_foreignscan_path(root, baserel,
-								   fpinfo->rows,
-								   fpinfo->startup_cost,
-								   fpinfo->total_cost,
-								   NIL, /* no pathkeys */
-								   NULL,		/* no outer rel either */
-								   NIL);		/* no fdw_private list */
-	add_path(baserel, (Path *) path);
+	List	*usable_pklist = NIL;	/* List of pathkeys to be returned */
+	List	*usable_pathkeys = NIL; /* List of pathkey nodes gathered */
+	ListCell	*lc;
+	bool	is_child_rel = (rel->reloptkind == RELOPT_OTHER_MEMBER_REL);
 
 	/*
 	 * Determine whether we can potentially push query pathkeys to the remote
 	 * side, avoiding a local sort.
 	 */
 	foreach(lc, root->query_pathkeys)
 	{
 		PathKey    *pathkey = (PathKey *) lfirst(lc);
 		EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
 		Expr	   *em_expr;
 
 		/*
 		 * is_foreign_expr would detect volatile expressions as well, but
 		 * ec_has_volatile saves some cycles.
 		 */
 		if (!pathkey_ec->ec_has_volatile &&
-			(em_expr = find_em_expr_for_rel(pathkey_ec, baserel)) &&
-			is_foreign_expr(root, baserel, em_expr))
+			(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) &&
+			is_foreign_expr(root, rel, em_expr))
 			usable_pathkeys = lappend(usable_pathkeys, pathkey);
 		else
 		{
 			/*
 			 * The planner and executor don't have any clever strategy for
 			 * taking data sorted by a prefix of the query's pathkeys and
 			 * getting it to be sorted by all of those pathekeys.  We'll just
 			 * end up resorting the entire data set.  So, unless we can push
 			 * down all of the query pathkeys, forget it.
 			 */
 			list_free(usable_pathkeys);
 			usable_pathkeys = NIL;
 			break;
 		}
 	}
 
-	/* Create a path with useful pathkeys, if we found one. */
-	if (usable_pathkeys != NULL)
+	if (usable_pathkeys)
+		usable_pklist = list_make1(usable_pathkeys);
+
+	/*
+	 * Check equivalence classes where this relation appears. Getting data
+	 * ordered on corresponding expressions might help merge join. A join
+	 * between two relations might have multiple merge joinable clauses and
+	 * might require the data to be sorted on all the expressions involved. But
+	 * we do not have knowledge about the indexes present on the foreign server.
+	 * The cost of sorting data on multiple expressions can vary greatly based on
+	 * the kind of expressions and their presence in an index. Hence for now, we
+	 * create paths with single pathkey.
+	 * Nothing to do if the relation does is not part of any equivalence class.
+	 */
+	if (!rel->has_eclass_joins)
+		return usable_pklist;
+
+	foreach(lc, root->eq_classes)
+	{
+		EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc);
+		Expr	*em_expr;
+		List	*pathkeys;
+
+		/*
+		 * Won't generate joinclauses if const, single-member or has
+		 * volatile expression.
+		 */
+		if (cur_ec->ec_has_const || cur_ec->ec_has_volatile ||
+				list_length(cur_ec->ec_members) <= 1)
+			continue;
+
+		/*
+		 * Equivalence classes covering relations other than the current one
+		 * are of interest here.
+		 */
+		if (!is_child_rel &&
+			bms_subset_compare(rel->relids, cur_ec->ec_relids) != BMS_SUBSET1)
+			continue;
+		
+		/* 
+		 * In case of child relation, we need to check that the
+		 * equivalence class indicates a join to a relation other than
+		 * parents, other children and itself (something similar to above).
+		 * Otherwise we will end up creating useless paths. The code below is
+		 * similar to generate_implied_equalities_for_column(), which might
+		 * give a hint.
+		 */
+		if (is_child_rel)
+		{
+			ListCell	*lc_em;
+			Relids		parent_relids = find_childrel_parents(root, rel);
+			bool		has_other_rel_joins = false;
+			foreach(lc_em, cur_ec->ec_members)
+			{
+				EquivalenceMember	*other_em = lfirst(lc_em);
+				
+				/*
+				 * Ignore equivalence members which correspond to children
+				 * or same relation or to parent relations
+				 */
+				if (other_em->em_is_child ||
+					bms_overlap(rel->relids, other_em->em_relids) ||
+					bms_overlap(parent_relids, other_em->em_relids))
+					continue;
+
+				/* Found one "other" relation */
+				has_other_rel_joins = true;
+				break;
+			}
+
+			/*
+			 * If this child relation doesn't join with relations other than
+			 * parents, other children or itself, no use getting data sorted.
+			 */
+			if (!has_other_rel_joins)
+				continue;
+		}
+
+		/*
+		 * If there exists an equivalence member which entirely belongs to
+		 * the current relation and corresponding expression is pushable to
+		 * the foreign server, create single element pathkeys for the same.
+		 * This equivalence class might be part of pathkeys derived from
+		 * root->query_pathkeys, but the respective paths might cost
+		 * different. add_path will discard any inferior paths.
+		 */
+		if ((em_expr = find_em_expr_for_rel(cur_ec, rel)) &&
+				is_foreign_expr(root, rel, em_expr))
+		{
+
+			pathkeys = list_make1(make_canonical_pathkey(root, cur_ec,
+										linitial_oid(cur_ec->ec_opfamilies),
+										BTLessStrategyNumber,
+										false));
+			usable_pklist = lappend(usable_pklist, pathkeys);
+		}
+	}
+
+	return usable_pklist;
+}
+
+/*
+ * postgresGetForeignPaths
+ *		Create possible scan paths for a scan on the foreign table
+ */
+static void
+postgresGetForeignPaths(PlannerInfo *root,
+						RelOptInfo *baserel,
+						Oid foreigntableid)
+{
+	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
+	ForeignPath *path;
+	List	   *ppi_list;
+	ListCell   *lc;
+	List		*usable_pklist = NIL;	/* List of all pathkeys */
+
+	/*
+	 * Create simplest ForeignScan path node and add it to baserel.  This path
+	 * corresponds to SeqScan path of regular tables (though depending on what
+	 * baserestrict conditions we were able to send to remote, there might
+	 * actually be an indexscan happening there).  We already did all the work
+	 * to estimate cost and size of this path.
+	 */
+	path = create_foreignscan_path(root, baserel,
+								   fpinfo->rows,
+								   fpinfo->startup_cost,
+								   fpinfo->total_cost,
+								   NIL, /* no pathkeys */
+								   NULL,		/* no outer rel either */
+								   NIL);		/* no fdw_private list */
+	add_path(baserel, (Path *) path);
+
+	usable_pklist = generate_pathkeys_for_relation(root, baserel);
+
+	/* Create one path for each set of pathkeys we found above. */
+	foreach (lc, usable_pklist)
 	{
 		double		rows;
 		int			width;
 		Cost		startup_cost;
 		Cost		total_cost;
+		List		*usable_pathkeys = lfirst(lc);
 
 		estimate_path_cost_size(root, baserel, NIL, usable_pathkeys,
 								&rows, &width, &startup_cost, &total_cost);
 
 		add_path(baserel, (Path *)
 				 create_foreignscan_path(root, baserel,
 										 rows,
 										 startup_cost,
 										 total_cost,
 										 usable_pathkeys,
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index f243de8..d13a244 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -106,17 +106,17 @@ extern void deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
 				 List *returningList,
 				 List **retrieved_attrs);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 				  List **retrieved_attrs);
 extern void deparseStringLiteral(StringInfo buf, const char *val);
 extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
 extern void appendOrderByClause(StringInfo buf, PlannerInfo *root,
-					RelOptInfo *baserel, List *pathkeys);
+								RelOptInfo *baserel, List *pathkeys);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
 extern bool is_shippable(Oid objectId, Oid classId, PgFdwRelationInfo *fpinfo);
 
 #endif   /* POSTGRES_FDW_H */
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index c6b5d78..b81cc49 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -21,43 +21,40 @@
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/plannodes.h"
 #include "optimizer/clauses.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
 
-static PathKey *make_canonical_pathkey(PlannerInfo *root,
-					   EquivalenceClass *eclass, Oid opfamily,
-					   int strategy, bool nulls_first);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
 
 
 /****************************************************************************
  *		PATHKEY CONSTRUCTION AND REDUNDANCY TESTING
  ****************************************************************************/
 
 /*
  * make_canonical_pathkey
  *	  Given the parameters for a PathKey, find any pre-existing matching
  *	  pathkey in the query's list of "canonical" pathkeys.  Make a new
  *	  entry if there's not one already.
  *
  * Note that this function must not be used until after we have completed
  * merging EquivalenceClasses.  (We don't try to enforce that here; instead,
  * equivclass.c will complain if a merge occurs after root->canon_pathkeys
  * has become nonempty.)
  */
-static PathKey *
+PathKey *
 make_canonical_pathkey(PlannerInfo *root,
 					   EquivalenceClass *eclass, Oid opfamily,
 					   int strategy, bool nulls_first)
 {
 	PathKey    *pk;
 	ListCell   *lc;
 	MemoryContext oldcontext;
 
 	/* The passed eclass might be non-canonical, so chase up to the top */
 	while (eclass->ec_merged)
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 87123a5..5ef6e1a 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -197,12 +197,15 @@ extern List *find_mergeclauses_for_pathkeys(PlannerInfo *root,
 extern List *select_outer_pathkeys_for_merge(PlannerInfo *root,
 								List *mergeclauses,
 								RelOptInfo *joinrel);
 extern List *make_inner_pathkeys_for_merge(PlannerInfo *root,
 							  List *mergeclauses,
 							  List *outer_pathkeys);
 extern List *truncate_useless_pathkeys(PlannerInfo *root,
 						  RelOptInfo *rel,
 						  List *pathkeys);
 extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel);
+extern PathKey *make_canonical_pathkey(PlannerInfo *root,
+					   EquivalenceClass *eclass, Oid opfamily,
+					   int strategy, bool nulls_first);
 
 #endif   /* PATHS_H */

Robert Haas

robertmhaas@gmail.com

about 10 years ago

In reply to: Ashutosh Bapat (#1)

Re: Getting sorted data from foreign server for merge join

On Thu, Nov 5, 2015 at 11:54 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Hi All,
PFA patch to get data sorted from the foreign server (postgres_fdw)
according to the pathkeys useful for merge join.

For a given base relation (extendable to join when that becomes available in
postgres_fdw), the patch tries to find merge joinable clauses. It then adds
paths with pathkeys extracted out of these merge joinable clauses. The merge
joinable clauses form equivalence classes. The patch searches in
root->eq_classes for equivalence members belonging to the given relation.
For every such expression it creates a single member pathkey list and
corresponding path. The test postgres_fdw.sql has an existing join which
uses merge join. With this patch the data is sorted on the foreign server
than locally.

While mergejoinable clauses can be obtained from rel->joininfo as well. But
rel->joininfo contains other clauses as well and we need extra efforts to
remove duplicates if the same expression appears in multiple merge joinable
clauses.

Two joining relations can have multiple merge joinable clauses, requiring
multi-member pathkeys so that merge join is efficient to the maximum extent.
The order in which the expressions appears in pathkeys can change the costs
of sorting the data according to the pathkeys, depending upon the
expressions and the presence of indexes containing those expressions. Thus
ideally we would need to club all the expressions appearing in all the
clauses for given two relations and create paths with pathkeys for every
order of these expressions.That explodes the number of possible paths. We
may restrict the number of paths created by considering only certain orders
like sort_inner_and_outer(). In any case, costing such paths increases the
planning time which may not be worth it. So, this patch uses a heuristic
approach of creating single member pathkeys path for every merge joinable
expression.

The pathkeys need to be canonicalised using make_canonical_pathkey(), which
is a static function. I have added a TODO and comments in the patch
explaining possible ways to avoid "extern"alization of this function.

Comments/suggestions are welcome.

I think this approach is generally reasonable, but I suggested parts
of it, so may be biased. I would be interested in hearing the
opinions of others.

Random notes:

"possibily" is a typo.

usable_pklist is confusing because it seems like it might be talking
about primary keys rather than pathkeys. Also, I realize now, looking
at this again, that you're saying "usable" when what I really think
you mean is "useful". Lots of pathkeys are usable, but only a few of
those are useful. I suggest renaming usable_pathkeys to
query_pathkeys and usable_pklist to useful_pathkeys. Similarly, let's
rename generate_pathkeys_for_relation() to
get_useful_pathkeys_for_relation().

Although I'm usually on the side of marking things as extern whenever
we find it convenient, I'm nervous about doing that to
make_canonical_pathkey(), because it has side effects. Searching the
list of canonical pathkeys for the one we need is reasonable, but is
it really right to ever think that we might create a new one at this
stage? Maybe it is, but if so I'd like to hear a good explanation as
to why.

Is the comment "Equivalence classes covering relations other than the
current one are of interest here" missing a "not"?

I don't find this comment illuminating:

+         * In case of child relation, we need to check that the
+         * equivalence class indicates a join to a relation other than
+         * parents, other children and itself (something similar to above).
+         * Otherwise we will end up creating useless paths. The code below is
+         * similar to generate_implied_equalities_for_column(), which might
+         * give a hint.

That basically just says that we have to do it this way because the
other way would be wrong. But it doesn't say WHY the other way would
be wrong. Then a few lines later, you have another comment which says
the same thing again:

+                /*
+                 * Ignore equivalence members which correspond to children
+                 * or same relation or to parent relations
+                 */

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Kevin Grittner

kgrittn@ymail.com

about 10 years ago

In reply to: Robert Haas (#2)

Re: Getting sorted data from foreign server for merge join

On Friday, November 6, 2015 10:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I think this approach is generally reasonable, but I suggested
parts of it, so may be biased. I would be interested in hearing
the opinions of others.

Has anyone taken a close look at what happens if the two sides of
the merge join have different implementations of the same collation
name? Is there anything we should do to defend against the
problem?

We already face the issue of corrupted indexes when we have
different revisions of glibc on a primary and a standby or when the
OS on a server is updated, so this wouldn't be entirely a *new*
problem:

/messages/by-id/BA6132ED-1F6B-4A0B-AC22-81278F5AB81E@tripadvisor.com

... but it would be a brand-new way to hit it, and we might be able
to spot the problem in a merge join by watching for rows being fed
to either side of the join which are not in order according to the
machine doing the join.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Corey Huinker

corey.huinker@gmail.com

about 10 years ago

In reply to: Robert Haas (#2)

Re: Getting sorted data from foreign server for merge join

On Fri, Nov 6, 2015 at 11:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Nov 5, 2015 at 11:54 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Hi All,
PFA patch to get data sorted from the foreign server (postgres_fdw)
according to the pathkeys useful for merge join.

For a given base relation (extendable to join when that becomes

available in

postgres_fdw), the patch tries to find merge joinable clauses. It then

adds

paths with pathkeys extracted out of these merge joinable clauses. The

merge

joinable clauses form equivalence classes. The patch searches in
root->eq_classes for equivalence members belonging to the given relation.
For every such expression it creates a single member pathkey list and
corresponding path. The test postgres_fdw.sql has an existing join which
uses merge join. With this patch the data is sorted on the foreign server
than locally.

While mergejoinable clauses can be obtained from rel->joininfo as well.

But

rel->joininfo contains other clauses as well and we need extra efforts to
remove duplicates if the same expression appears in multiple merge

joinable

clauses.

Two joining relations can have multiple merge joinable clauses, requiring
multi-member pathkeys so that merge join is efficient to the maximum

extent.

The order in which the expressions appears in pathkeys can change the

costs

of sorting the data according to the pathkeys, depending upon the
expressions and the presence of indexes containing those expressions.

Thus

ideally we would need to club all the expressions appearing in all the
clauses for given two relations and create paths with pathkeys for every
order of these expressions.That explodes the number of possible paths. We
may restrict the number of paths created by considering only certain

orders

like sort_inner_and_outer(). In any case, costing such paths increases

the

planning time which may not be worth it. So, this patch uses a heuristic
approach of creating single member pathkeys path for every merge joinable
expression.

The pathkeys need to be canonicalised using make_canonical_pathkey(),

which

is a static function. I have added a TODO and comments in the patch
explaining possible ways to avoid "extern"alization of this function.

Comments/suggestions are welcome.

I think this approach is generally reasonable, but I suggested parts
of it, so may be biased. I would be interested in hearing the
opinions of others.

Random notes:

"possibily" is a typo.

usable_pklist is confusing because it seems like it might be talking
about primary keys rather than pathkeys. Also, I realize now, looking
at this again, that you're saying "usable" when what I really think
you mean is "useful". Lots of pathkeys are usable, but only a few of
those are useful. I suggest renaming usable_pathkeys to
query_pathkeys and usable_pklist to useful_pathkeys. Similarly, let's
rename generate_pathkeys_for_relation() to
get_useful_pathkeys_for_relation().

Although I'm usually on the side of marking things as extern whenever
we find it convenient, I'm nervous about doing that to
make_canonical_pathkey(), because it has side effects. Searching the
list of canonical pathkeys for the one we need is reasonable, but is
it really right to ever think that we might create a new one at this
stage? Maybe it is, but if so I'd like to hear a good explanation as
to why.

Is the comment "Equivalence classes covering relations other than the
current one are of interest here" missing a "not"?

I don't find this comment illuminating:
+         * In case of child relation, we need to check that the
+         * equivalence class indicates a join to a relation other than
+         * parents, other children and itself (something similar to
above).
+         * Otherwise we will end up creating useless paths. The code
below is
+         * similar to generate_implied_equalities_for_column(), which
might
+         * give a hint.
That basically just says that we have to do it this way because the
other way would be wrong. But it doesn't say WHY the other way would
be wrong. Then a few lines later, you have another comment which says
the same thing again:
+                /*
+                 * Ignore equivalence members which correspond to children
+                 * or same relation or to parent relations
+                 */
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Sorry to barge in late, but I was wondering if what we've learned with this
patch can be applied to the case of asserting a sort order on a query
returning from dblink().

This is a common use-case for us where one machine will issue a query
request to a non-Pg database that happens to speak libpq (Vertica,
Redshift, I guess Greenplum would be one too), but for whom foreign data
wrappers aren't yet an option, and then join that data to local tables,
many of which have indexes matching the sort order that I know is coming
from the dblink(), but have no way of telling that to the query planner.

I'm guessing the answer is "no", but I wanted to make you aware of a
similar real-world need.

Greg Stark

stark@mit.edu

about 10 years ago

In reply to: Ashutosh Bapat (#1)

Re: Getting sorted data from foreign server for merge join

On Fri, Nov 6, 2015 at 4:54 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

PFA patch to get data sorted from the foreign server (postgres_fdw)
according to the pathkeys useful for merge join.

An idle thought. There are going to be a lot of cases where different
software systems actually disagree about collation rules. I wonder if
it would be valuable to have a node that just checks that each row is
in fact greater than the previous row and throws an error if not. That
can be optional or a parameter of the FDW but it's probably cheap
enough to have enabled by default. It would save a lot of difficult to
heartache since the behaviour if the inputs aren't correctly sorted
will be strangely incorrect join results. Often the results may just
be missing or duplicated rows and that can be easy to miss and lead to
corrupted databases or security problems.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

about 10 years ago

In reply to: Corey Huinker (#4)

Re: Getting sorted data from foreign server for merge join

On Sat, Nov 7, 2015 at 12:07 PM, Corey Huinker <corey.huinker@gmail.com> wrote:

Sorry to barge in late, but I was wondering if what we've learned with this
patch can be applied to the case of asserting a sort order on a query
returning from dblink().

Nope.

Sorry to the bearer of bad news, but that would be a different and
much harder development project. Set-returning functions have no way
to indicate anything about the ordering of their return values at
present. We could invent something, but the work involved wouldn't
have much to do with this patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

about 10 years ago

In reply to: Greg Stark (#5)

Re: Getting sorted data from foreign server for merge join

On Sat, Nov 7, 2015 at 5:42 PM, Greg Stark <stark@mit.edu> wrote:

On Fri, Nov 6, 2015 at 4:54 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

PFA patch to get data sorted from the foreign server (postgres_fdw)
according to the pathkeys useful for merge join.

An idle thought. There are going to be a lot of cases where different
software systems actually disagree about collation rules. I wonder if
it would be valuable to have a node that just checks that each row is
in fact greater than the previous row and throws an error if not. That
can be optional or a parameter of the FDW but it's probably cheap
enough to have enabled by default. It would save a lot of difficult to
heartache since the behaviour if the inputs aren't correctly sorted
will be strangely incorrect join results. Often the results may just
be missing or duplicated rows and that can be easy to miss and lead to
corrupted databases or security problems.

It's not a bad thought, but it could come up even locally - we've had
more than one situation where indexes have gotten corrupted by
updating glibc. The new glibc doesn't agree with the old one on what
the collation ordering is, and so the indexes are wrong with respect
to the new glibc version. If we were going to design something like
this, rather than making it a separate node, I'd be inclined to create
it as a C-callable function that could be invoked anywhere we want to
check that the ordering is valid. I suspect you're wrong about the
cost, though: I bet it wouldn't be too hard to find cases where it
imposes a really noticeable penalty.

Also, to be clear, I don't think this patch needs to solve that problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

about 10 years ago

In reply to: Kevin Grittner (#3)

Re: Getting sorted data from foreign server for merge join

On Fri, Nov 6, 2015 at 2:03 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

Has anyone taken a close look at what happens if the two sides of
the merge join have different implementations of the same collation
name? Is there anything we should do to defend against the
problem?

The issue of FDWs vs. collations has been thought about to some degree
(see FDW_COLLATE_NONE/SAFE/UNSAFE), but I don't quite understand that
stuff in detail.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

about 10 years ago

In reply to: Robert Haas (#6)

Re: Getting sorted data from foreign server for merge join

On 2015/11/10 0:56, Robert Haas wrote:

On Sat, Nov 7, 2015 at 12:07 PM, Corey Huinker <corey.huinker@gmail.com> wrote:

Sorry to barge in late, but I was wondering if what we've learned with this
patch can be applied to the case of asserting a sort order on a query
returning from dblink().

Nope.

Sorry to the bearer of bad news, but that would be a different and
much harder development project. Set-returning functions have no way
to indicate anything about the ordering of their return values at
present. We could invent something, but the work involved wouldn't
have much to do with this patch.

There was a patch (which, it seems, was rejected [1]https://commitfest.postgresql.org/4/88/) to do this sort of
thing.

Thanks,
Amit

[1]: https://commitfest.postgresql.org/4/88/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 10 years ago

In reply to: Robert Haas (#8)

Re: Getting sorted data from foreign server for merge join

On Mon, Nov 9, 2015 at 9:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Nov 6, 2015 at 2:03 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

Has anyone taken a close look at what happens if the two sides of
the merge join have different implementations of the same collation
name? Is there anything we should do to defend against the
problem?

The issue of FDWs vs. collations has been thought about to some degree
(see FDW_COLLATE_NONE/SAFE/UNSAFE), but I don't quite understand that
stuff in detail> .

collations arising from a foreign table's var are considered to be safer
(FDW_COLLATE_SAFE) to push down to the foreign server , since they are
either local default collation or are assumed to be same on foreign and
local server as per user declaration. The onus of making that sure is on
the user who declares particular collation for a foreign table var. As said
upthread, different glibc implementations can cause collation ordering
mismatch, this patch will be susceptible to the same problem as
master/standby problem.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#11

Craig Ringer

craig@2ndquadrant.com

about 10 years ago

In reply to: Ashutosh Bapat (#10)

Re: Getting sorted data from foreign server for merge join

On 16 November 2015 at 17:47, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

collations arising from a foreign table's var are considered to be safer
(FDW_COLLATE_SAFE) to push down to the foreign server , since they are
either local default collation or are assumed to be same on foreign and
local server as per user declaration. The onus of making that sure is on
the user who declares particular collation for a foreign table var. As said
upthread, different glibc implementations can cause collation ordering
mismatch, this patch will be susceptible to the same problem as
master/standby problem.

Yeah. It's less bad than the related problems we already have:

* Upgrading glibc can cause your indexes to no longer match your local
collations
* Different glibc versions on master and replica(s) can have the same effect

I don't see a problem with adding another way this same issue can be
expressed, since there's no sane way to fix it _properly_ without either
versioned collations in glibc or bringing collation onboard into Pg.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#12

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 10 years ago

In reply to: Robert Haas (#2)

1 attachment(s)

Re: Getting sorted data from foreign server for merge join

Thanks Robert for your comments. Please see my replies inlined. Updated
patch is attached.

On Fri, Nov 6, 2015 at 10:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think this approach is generally reasonable, but I suggested parts
of it, so may be biased. I would be interested in hearing the
opinions of others.

Random notes:

"possibily" is a typo.

Fixed.

usable_pklist is confusing because it seems like it might be talking
about primary keys rather than pathkeys. Also, I realize now, looking
at this again, that you're saying "usable" when what I really think
you mean is "useful". Lots of pathkeys are usable, but only a few of
those are useful. I suggest renaming usable_pathkeys to
query_pathkeys

Done.

and usable_pklist to useful_pathkeys.

pathkeys is being used to mean a list of pathkey nodes. What we want here
is list of pathkeys so I have renamed it as useful_pathkeys_list, somewhat
long but clear.

Similarly, let's
rename generate_pathkeys_for_relation() to
get_useful_pathkeys_for_relation().

Done.

Although I'm usually on the side of marking things as extern whenever
we find it convenient, I'm nervous about doing that to
make_canonical_pathkey(), because it has side effects. Searching the
list of canonical pathkeys for the one we need is reasonable, but is
it really right to ever think that we might create a new one at this
stage? Maybe it is, but if so I'd like to hear a good explanation as
to why.

For a foreign table no pathkeys are created before creating paths for
individual foreign table since there are no indexes. For a regular table,
the pathkeys get created for useful indexes. Exception to this is if the
expression appears in the ORDER BY clause, the pathkey for the same is
created while handling ORDER BY clause. So, we will always need to create
pathkey for a foreign table unless the corresponding expression does not
appear in the ORDER BY clause. This can be verified by breaking in
make_canonical_pathkey() while executing "explain verbose select * from ft1
join lt using (val);" ft1(val int, val2 int) is foreign table and lt (val
int, val2 int) is local table. You will hit the first breakpoint with stack
trace
Breakpoint 1, make_canonical_pathkey (root=0x231d858, eclass=0x231f030,
opfamily=1976, strategy=1, nulls_first=0 '\000') at pathkeys.c:60
(gdb) where
#0 make_canonical_pathkey (root=0x231d858, eclass=0x231f030,
opfamily=1976, strategy=1, nulls_first=0 '\000') at pathkeys.c:60
#1 0x00007f6740077e39 in get_useful_pathkeys_for_relation (root=0x231d858,
rel=0x231e390) at postgres_fdw.c:663
#2 0x00007f6740077f34 in postgresGetForeignPaths (root=0x231d858,
baserel=0x231e390, foreigntableid=16393) at postgres_fdw.c:705
#3 0x00000000007079cf in set_foreign_pathlist (root=0x231d858,
rel=0x231e390, rte=0x231c488) at allpaths.c:600
#4 0x0000000000707653 in set_rel_pathlist (root=0x231d858, rel=0x231e390,
rti=1, rte=0x231c488) at allpaths.c:395
#5 0x000000000070735f in set_base_rel_pathlists (root=0x231d858) at
allpaths.c:277

at this point

(gdb) p root->canon_pathkeys
$1 = (List *) 0x0

so get_useful_pathkeys_for_relation->make_canonical_pathkey() will create
the first pathkey.

I have left the corresponding TODO untouched, if anybody else wants to
review that part of the code. I will remove it in the final version of the
patch.

Is the comment "Equivalence classes covering relations other than the
current one are of interest here" missing a "not"?

The sentence is correct. We need equivalence class covering relations other
than the current one, because only such equivalence classes indicate joins
between two relations. If an equivalence class covers only a single baserel
(has only a single member in ec_relids), it indicates equivalence between
columns/expressions of the same table, which can not result in a join. I
have changed to comment to be more elaborate.

I don't find this comment illuminating:
+         * In case of child relation, we need to check that the
+         * equivalence class indicates a join to a relation other than
+         * parents, other children and itself (something similar to
above).
+         * Otherwise we will end up creating useless paths. The code
below is
+         * similar to generate_implied_equalities_for_column(), which
might
+         * give a hint.
That basically just says that we have to do it this way because the
other way would be wrong. But it doesn't say WHY the other way would
be wrong.

There's no "other way" which is wrong. What's the other way you are talking
about?

Anway, I have updated the comment to be more readable.

Then a few lines later, you have another comment which says
the same thing again:
+                /*
+                 * Ignore equivalence members which correspond to children
+                 * or same relation or to parent relations
+                 */

Updated this too.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

pg_sort_all_pd_v2.patchtext/x-diff; charset=US-ASCII; name=pg_sort_all_pd_v2.patchDownload

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 866a09b..fa071d6 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3467,22 +3467,22 @@ select tableoid::regclass, * from bar order by 1,2;
  bar2     |  4 | 144
  bar2     |  7 |  77
 (6 rows)
 
 -- Check UPDATE with inherited target and an appendrel subquery
 explain (verbose, costs off)
 update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
-                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
+                                           QUERY PLAN                                           
+------------------------------------------------------------------------------------------------
  Update on public.bar
    Update on public.bar
    Foreign Update on public.bar2
      Remote SQL: UPDATE public.loct2 SET f2 = $2 WHERE ctid = $1
    ->  Hash Join
          Output: bar.f1, (bar.f2 + 100), bar.ctid, (ROW(foo.f1))
          Hash Cond: (foo.f1 = bar.f1)
          ->  Append
                ->  Seq Scan on public.foo
                      Output: ROW(foo.f1), foo.f1
@@ -3494,41 +3494,44 @@ where bar.f1 = ss.f1;
                ->  Foreign Scan on public.foo2 foo2_1
                      Output: ROW((foo2_1.f1 + 3)), (foo2_1.f1 + 3)
                      Remote SQL: SELECT f1 FROM public.loct1
          ->  Hash
                Output: bar.f1, bar.f2, bar.ctid
                ->  Seq Scan on public.bar
                      Output: bar.f1, bar.f2, bar.ctid
    ->  Merge Join
          Output: bar2.f1, (bar2.f2 + 100), bar2.f3, bar2.ctid, (ROW(foo.f1))
          Merge Cond: (bar2.f1 = foo.f1)
-         ->  Sort
+         ->  Foreign Scan on public.bar2
                Output: bar2.f1, bar2.f2, bar2.f3, bar2.ctid
-               Sort Key: bar2.f1
-               ->  Foreign Scan on public.bar2
-                     Output: bar2.f1, bar2.f2, bar2.f3, bar2.ctid
-                     Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
-         ->  Sort
+               Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 ORDER BY f1 ASC FOR UPDATE
+         ->  Materialize
                Output: (ROW(foo.f1)), foo.f1
-               Sort Key: foo.f1
-               ->  Append
-                     ->  Seq Scan on public.foo
-                           Output: ROW(foo.f1), foo.f1
+               ->  Merge Append
+                     Sort Key: foo.f1
+                     ->  Sort
+                           Output: (ROW(foo.f1)), foo.f1
+                           Sort Key: foo.f1
+                           ->  Seq Scan on public.foo
+                                 Output: ROW(foo.f1), foo.f1
                      ->  Foreign Scan on public.foo2
                            Output: ROW(foo2.f1), foo2.f1
-                           Remote SQL: SELECT f1 FROM public.loct1
-                     ->  Seq Scan on public.foo foo_1
-                           Output: ROW((foo_1.f1 + 3)), (foo_1.f1 + 3)
+                           Remote SQL: SELECT f1 FROM public.loct1 ORDER BY f1 ASC
+                     ->  Sort
+                           Output: (ROW((foo_1.f1 + 3))), ((foo_1.f1 + 3))
+                           Sort Key: ((foo_1.f1 + 3))
+                           ->  Seq Scan on public.foo foo_1
+                                 Output: ROW((foo_1.f1 + 3)), (foo_1.f1 + 3)
                      ->  Foreign Scan on public.foo2 foo2_1
                            Output: ROW((foo2_1.f1 + 3)), (foo2_1.f1 + 3)
-                           Remote SQL: SELECT f1 FROM public.loct1
-(45 rows)
+                           Remote SQL: SELECT f1 FROM public.loct1 ORDER BY (f1 + 3) ASC
+(48 rows)
 
 update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
 select tableoid::regclass, * from bar order by 1,2;
  tableoid | f1 | f2  
 ----------+----+-----
  bar      |  1 | 211
  bar      |  2 | 222
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index cd4ed0c..0bc6375 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -251,20 +251,21 @@ static void postgresExplainForeignScan(ForeignScanState *node,
 static void postgresExplainForeignModify(ModifyTableState *mtstate,
 							 ResultRelInfo *rinfo,
 							 List *fdw_private,
 							 int subplan_index,
 							 ExplainState *es);
 static bool postgresAnalyzeForeignTable(Relation relation,
 							AcquireSampleRowsFunc *func,
 							BlockNumber *totalpages);
 static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
 							Oid serverOid);
+static List *get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel);
 
 /*
  * Helper functions
  */
 static void estimate_path_cost_size(PlannerInfo *root,
 						RelOptInfo *baserel,
 						List *join_conds,
 						List *pathkeys,
 						double *p_rows, int *p_width,
 						Cost *p_startup_cost, Cost *p_total_cost);
@@ -501,100 +502,256 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		set_baserel_size_estimates(root, baserel);
 
 		/* Fill in basically-bogus cost estimates for use later. */
 		estimate_path_cost_size(root, baserel, NIL, NIL,
 								&fpinfo->rows, &fpinfo->width,
 								&fpinfo->startup_cost, &fpinfo->total_cost);
 	}
 }
 
 /*
+ * get_useful_pathkeys_for_relation
+ * Function collects useful pathkeys for given relation. The caller is
+ * possibly going to create paths with these pathkeys so as to get the data
+ * sorted according to the pathkeys.
+ * Following pathkeys are of interest here.
+ * 1. query pathkeys
+ * 2. pathkeys arising out of the equivalence classes
+ * Comments in the function explain these two in details.
+ *
+ * TODO: This function requires make_canonical_pathkey() to be "extern"alized.
+ * The function is required to get pathkey corresponding to a given equivalence
+ * class if there exists one already or create one otherwise. There are two ways
+ * we can avoid "extern"alization
+ * 1. Create a wrapper extern function returning pathkey with inputs root and
+ * 		equivalence class. Adding the wrapper function in pathkeys.c would avoid
+ * 		"extern"alization of the make_canonical_pathkey().
+ * 2. Move get_useful_pathkeys_for_relation() to pathkeys.c. Considering that this
+ *    function is heuristically finding the "interesting" pathkeys, it may not
+ *    be very useful in general, and thus can not be part of the core.
+ */
+static List *
+get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+	List	*useful_pathkeys_list = NIL; /* List of pathkeys to be returned */
+	bool	useful_query_pathkeys;
+	ListCell	*lc;
+	bool	is_child_rel = (rel->reloptkind == RELOPT_OTHER_MEMBER_REL);
+
+	/*
+	 * Determine whether we can potentially push query pathkeys to the remote
+	 * side, avoiding a local sort.
+	 * By default assume that root->query_pathkeys is pushable.
+	 */
+	useful_query_pathkeys = true;
+	foreach(lc, root->query_pathkeys)
+	{
+		PathKey    *pathkey = (PathKey *) lfirst(lc);
+		EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
+		Expr	   *em_expr;
+
+		/*
+		 * is_foreign_expr would detect volatile expressions as well, but
+		 * ec_has_volatile saves some cycles.
+		 */
+		if (pathkey_ec->ec_has_volatile ||
+			!(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
+			!is_foreign_expr(root, rel, em_expr))
+			{
+				/*
+				 * The planner and executor don't have any clever strategy for
+				 * taking data sorted by a prefix of the query's pathkeys and
+				 * getting it to be sorted by all of those pathekeys. We'll just
+				 * end up resorting the entire data set.  So, unless we can push
+				 * down all of the query pathkeys, forget it.
+				 */
+				useful_query_pathkeys = false;
+			}
+	}
+
+	if (useful_query_pathkeys)
+		useful_pathkeys_list = list_make1(list_copy(root->query_pathkeys));
+
+	/*
+	 * Check equivalence classes where this relation appears. Getting data
+	 * ordered on corresponding expressions might help merge join. A join
+	 * between two relations might have multiple merge joinable clauses, thus
+	 * fetching foreign data sorted on all the expressions involved optimized
+	 * merge join locally. But we do not have knowledge about the indexes
+	 * present on the foreign server. The cost of sorting data on multiple
+	 * expressions can vary greatly based on the kind of expressions and their
+	 * presence in an index. Hence for now, we create paths with single pathkey.
+	 * Nothing to do if the relation is not part of any equivalence class.
+	 */
+	if (!rel->has_eclass_joins)
+		return useful_pathkeys_list;
+
+	foreach(lc, root->eq_classes)
+	{
+		EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc);
+		Expr	*em_expr;
+
+		/*
+		 * Won't generate joinclauses if const, single-member or has
+		 * volatile expression.
+		 */
+		if (cur_ec->ec_has_const || cur_ec->ec_has_volatile ||
+				list_length(cur_ec->ec_members) <= 1)
+			continue;
+
+		/*
+		 * If an equivalence class covers other relations along with the given
+		 * relation, it indicates a merge joinable clause between the given
+		 * relation and other relations. Such equivalence class would make a
+		 * useful pathkey for merge join. ec_relids contains only the parent
+		 * relation and not child relations, hence do not use this test if
+		 * given relation is a child relation.
+		 */
+		if (!is_child_rel &&
+			bms_subset_compare(rel->relids, cur_ec->ec_relids) != BMS_SUBSET1)
+			continue;
+
+		/*
+		 * In case of child relation, we examine individual equivalence class
+		 * members. If there exists a member which does not include the given relation,
+		 * its parent or other siblings, the equivalence class indicates a merge
+		 * joinable clause involving given relation with some relation other
+		 * than itself, its parent or siblings.
+		 */
+		if (is_child_rel)
+		{
+			ListCell	*lc_em;
+			Relids		parent_relids = find_childrel_parents(root, rel);
+			bool		has_other_rel_joins = false;
+			foreach(lc_em, cur_ec->ec_members)
+			{
+				EquivalenceMember	*other_em = lfirst(lc_em);
+
+				/*
+				 * It's ok to ignore all the children (not just siblings) as
+				 * equivalence members belonging to their parents will appear
+				 * in the same list and will not be ignored.
+				 */
+				if (other_em->em_is_child ||
+					bms_overlap(rel->relids, other_em->em_relids) ||
+					bms_overlap(parent_relids, other_em->em_relids))
+					continue;
+
+				/* Found one "other" relation */
+				has_other_rel_joins = true;
+				break;
+			}
+
+			if (!has_other_rel_joins)
+				continue;
+		}
+
+		/*
+		 * If there exists an equivalence member which entirely belongs to
+		 * the current relation and corresponding expression is pushable to
+		 * the foreign server, create single element pathkeys for the same.
+		 * This pathkey might be part of pathkeys derived from
+		 * root->query_pathkeys, but the respective paths might have different
+		 * costs. add_path() would reject any paths which have comparable costs
+		 * but inferior pathkeys.
+		 */
+		if ((em_expr = find_em_expr_for_rel(cur_ec, rel)) &&
+				is_foreign_expr(root, rel, em_expr))
+		{
+			Pathkey	*pathkey;
+			List	*pathkeys;
+
+			pathkey = make_canonical_pathkey(root, cur_ec,
+											linitial_oid(cur_ec->ec_opfamilies),
+											BTLessStrategyNumber,
+											false);
+			pathkeys = list_make1(pathkey);
+
+			/*
+			 * If this single member pathkeys is same as the
+			 * root->query_pathkeys, do not include it again (note: in such case
+			 * root->query_pathkeys must have been found useful above.). Since
+			 * root->eq_classes is list of distinct equivalence classes,
+			 * pathkeys generated from this list should all be distinct. Hence
+			 * we need to compare this single member pathkeys with only
+			 * root->query_pathkeys and not others that are created in this
+			 * loop.
+			 */
+			if (compare_pathkeys(pathkeys, root->query_pathkeys) != PATHKEYS_EQUAL)
+				useful_pathkeys_list = lappend(useful_pathkeys_list, pathkeys);
+			else
+			{
+				/*
+				 * For the convenience that compare_pathkeys provides, we have
+				 * to create a list above and free it here if found duplicate.
+				 * There is no function which can compare a single pathkey with
+				 * list of pathkeys.
+				 */
+				list_free(pathkeys);
+			}
+		}
+	}
+
+	return useful_pathkeys_list;
+}
+
+/*
  * postgresGetForeignPaths
  *		Create possible scan paths for a scan on the foreign table
  */
 static void
 postgresGetForeignPaths(PlannerInfo *root,
 						RelOptInfo *baserel,
 						Oid foreigntableid)
 {
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
 	ForeignPath *path;
 	List	   *ppi_list;
 	ListCell   *lc;
-	List	   *usable_pathkeys = NIL;
+	List		*useful_pathkeys_list = NIL;	/* List of all pathkeys */
 
 	/*
 	 * Create simplest ForeignScan path node and add it to baserel.  This path
 	 * corresponds to SeqScan path of regular tables (though depending on what
 	 * baserestrict conditions we were able to send to remote, there might
 	 * actually be an indexscan happening there).  We already did all the work
 	 * to estimate cost and size of this path.
 	 */
 	path = create_foreignscan_path(root, baserel,
 								   fpinfo->rows,
 								   fpinfo->startup_cost,
 								   fpinfo->total_cost,
 								   NIL, /* no pathkeys */
 								   NULL,		/* no outer rel either */
 								   NIL);		/* no fdw_private list */
 	add_path(baserel, (Path *) path);
 
-	/*
-	 * Determine whether we can potentially push query pathkeys to the remote
-	 * side, avoiding a local sort.
-	 */
-	foreach(lc, root->query_pathkeys)
-	{
-		PathKey    *pathkey = (PathKey *) lfirst(lc);
-		EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
-		Expr	   *em_expr;
-
-		/*
-		 * is_foreign_expr would detect volatile expressions as well, but
-		 * ec_has_volatile saves some cycles.
-		 */
-		if (!pathkey_ec->ec_has_volatile &&
-			(em_expr = find_em_expr_for_rel(pathkey_ec, baserel)) &&
-			is_foreign_expr(root, baserel, em_expr))
-			usable_pathkeys = lappend(usable_pathkeys, pathkey);
-		else
-		{
-			/*
-			 * The planner and executor don't have any clever strategy for
-			 * taking data sorted by a prefix of the query's pathkeys and
-			 * getting it to be sorted by all of those pathekeys.  We'll just
-			 * end up resorting the entire data set.  So, unless we can push
-			 * down all of the query pathkeys, forget it.
-			 */
-			list_free(usable_pathkeys);
-			usable_pathkeys = NIL;
-			break;
-		}
-	}
+	useful_pathkeys_list = get_useful_pathkeys_for_relation(root, baserel);
 
-	/* Create a path with useful pathkeys, if we found one. */
-	if (usable_pathkeys != NULL)
+	/* Create one path for each set of pathkeys we found above. */
+	foreach (lc, useful_pathkeys_list)
 	{
 		double		rows;
 		int			width;
 		Cost		startup_cost;
 		Cost		total_cost;
+		List		*useful_pathkeys = lfirst(lc);
 
-		estimate_path_cost_size(root, baserel, NIL, usable_pathkeys,
+		estimate_path_cost_size(root, baserel, NIL, useful_pathkeys,
 								&rows, &width, &startup_cost, &total_cost);
 
 		add_path(baserel, (Path *)
 				 create_foreignscan_path(root, baserel,
 										 rows,
 										 startup_cost,
 										 total_cost,
-										 usable_pathkeys,
+										 useful_pathkeys,
 										 NULL,
 										 NIL));
 	}
 
 	/*
 	 * If we're not using remote estimates, stop here.  We have no way to
 	 * estimate whether any join clauses would be worth sending across, so
 	 * don't bother building parameterized paths.
 	 */
 	if (!fpinfo->use_remote_estimate)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index f243de8..d13a244 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -106,17 +106,17 @@ extern void deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
 				 List *returningList,
 				 List **retrieved_attrs);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 				  List **retrieved_attrs);
 extern void deparseStringLiteral(StringInfo buf, const char *val);
 extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
 extern void appendOrderByClause(StringInfo buf, PlannerInfo *root,
-					RelOptInfo *baserel, List *pathkeys);
+								RelOptInfo *baserel, List *pathkeys);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
 extern bool is_shippable(Oid objectId, Oid classId, PgFdwRelationInfo *fpinfo);
 
 #endif   /* POSTGRES_FDW_H */
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index c6b5d78..b81cc49 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -21,43 +21,40 @@
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/plannodes.h"
 #include "optimizer/clauses.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
 
-static PathKey *make_canonical_pathkey(PlannerInfo *root,
-					   EquivalenceClass *eclass, Oid opfamily,
-					   int strategy, bool nulls_first);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
 
 
 /****************************************************************************
  *		PATHKEY CONSTRUCTION AND REDUNDANCY TESTING
  ****************************************************************************/
 
 /*
  * make_canonical_pathkey
  *	  Given the parameters for a PathKey, find any pre-existing matching
  *	  pathkey in the query's list of "canonical" pathkeys.  Make a new
  *	  entry if there's not one already.
  *
  * Note that this function must not be used until after we have completed
  * merging EquivalenceClasses.  (We don't try to enforce that here; instead,
  * equivclass.c will complain if a merge occurs after root->canon_pathkeys
  * has become nonempty.)
  */
-static PathKey *
+PathKey *
 make_canonical_pathkey(PlannerInfo *root,
 					   EquivalenceClass *eclass, Oid opfamily,
 					   int strategy, bool nulls_first)
 {
 	PathKey    *pk;
 	ListCell   *lc;
 	MemoryContext oldcontext;
 
 	/* The passed eclass might be non-canonical, so chase up to the top */
 	while (eclass->ec_merged)
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 87123a5..5ef6e1a 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -197,12 +197,15 @@ extern List *find_mergeclauses_for_pathkeys(PlannerInfo *root,
 extern List *select_outer_pathkeys_for_merge(PlannerInfo *root,
 								List *mergeclauses,
 								RelOptInfo *joinrel);
 extern List *make_inner_pathkeys_for_merge(PlannerInfo *root,
 							  List *mergeclauses,
 							  List *outer_pathkeys);
 extern List *truncate_useless_pathkeys(PlannerInfo *root,
 						  RelOptInfo *rel,
 						  List *pathkeys);
 extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel);
+extern PathKey *make_canonical_pathkey(PlannerInfo *root,
+					   EquivalenceClass *eclass, Oid opfamily,
+					   int strategy, bool nulls_first);
 
 #endif   /* PATHS_H */

#13

Robert Haas

robertmhaas@gmail.com

about 10 years ago

In reply to: Ashutosh Bapat (#12)

Re: Getting sorted data from foreign server for merge join

On Tue, Nov 17, 2015 at 8:33 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Although I'm usually on the side of marking things as extern whenever
we find it convenient, I'm nervous about doing that to
make_canonical_pathkey(), because it has side effects. Searching the
list of canonical pathkeys for the one we need is reasonable, but is
it really right to ever think that we might create a new one at this
stage? Maybe it is, but if so I'd like to hear a good explanation as
to why.

For a foreign table no pathkeys are created before creating paths for
individual foreign table since there are no indexes. For a regular table,
the pathkeys get created for useful indexes.

OK.

Could some portion of the second foreach loop inside
get_useful_pathkeys_for_relation be replaced by a call to
eclass_useful_for_merging? The logic looks very similar.

More broadly, I'm wondering about the asymmetries between this code
and pathkeys_useful_for_merging(). The latter has a
right_merge_direction() test and a case for non-EC-derivable join
clauses that aren't present in your code. I wonder if we could just
generate pathkeys speculatively and then test which ones survive
truncate_useless_pathkeys(), or something like that. This isn't an
enormously complicated patch, but it duplicates more of the logic in
pathkeys.c than I'd like.

I'm inclined to think that we shouldn't consider pathkeys that might
be useful for merge-joining unless we're using remote estimates. It
seems more speculative than pushing down a toplevel sort.

This patch seems rather light on test cases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 10 years ago

In reply to: Robert Haas (#13)

1 attachment(s)

Re: Getting sorted data from foreign server for merge join

On Thu, Nov 19, 2015 at 7:30 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Nov 17, 2015 at 8:33 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Although I'm usually on the side of marking things as extern whenever
we find it convenient, I'm nervous about doing that to
make_canonical_pathkey(), because it has side effects. Searching the
list of canonical pathkeys for the one we need is reasonable, but is
it really right to ever think that we might create a new one at this
stage? Maybe it is, but if so I'd like to hear a good explanation as
to why.

For a foreign table no pathkeys are created before creating paths for
individual foreign table since there are no indexes. For a regular table,
the pathkeys get created for useful indexes.

OK.

Could some portion of the second foreach loop inside
get_useful_pathkeys_for_relation be replaced by a call to
eclass_useful_for_merging? The logic looks very similar.

Yes. I have incorporated this change in the patch attached.

More broadly, I'm wondering about the asymmetries between this code
and pathkeys_useful_for_merging(). The latter has a
right_merge_direction() test and a case for non-EC-derivable join
clauses that aren't present in your code.

I wonder if we could just

generate pathkeys speculatively and then test which ones survive
truncate_useless_pathkeys(), or something like that. This isn't an
enormously complicated patch, but it duplicates more of the logic in
pathkeys.c than I'd like.

pathkeys_useful_for_merging(), truncate_useless_pathkeys() and host of
functions in that area are crafted to assess the usefulness of given
pathkeys which for regular tables are "speculated" from indexes on that
table. Foreign tables do not have indexes and neither we have information
about the indexes (if any) on foreign server, thus pathkeys to be assessed
are not readily available. Hence we need some other way of "speculating"
pathkeys for foreign tables. We can not just generate pathkeys because
there are infinite possible expressions on which pathkeys can be built. So,
we hunt for the expressions at various places like root_pathkeys, merge
joinable clauses etc. The seeming duplication of code is because the places
where we are hunting for useful pathkeys in case of foreign table are same
as the places used for assessing usefulness for pathkeys in case of regular
table. Thus in case of foreign tables, we always generate useful pathkeys,
which do not need any further assessment. For regular tables we have set of
pathkeys which need to be assessed. Because of this difference the code
differs in details.

right_merge_direction() compares whether a given pathkey (readily available
or derived from an index) has the same direction as required by
root->query_pathkeys or ascending direction. For foreign tables, the
pathkeys are themselves created from root->query_pathkeys, thus doesn't
require this assessment. The patch creates pathkeys other than those from
root->query_pathkeys in the ascending order (like other places in the code
eg. select_outer_pathkeys_for_merge()), which is what
right_merge_direction() assesses. So again we don't need to call
right_merge_direction().

non-EC-derivable join is interesting. Thanks for bringing it out. These are
mergejoinable clauses in an OUTER join, where columns from inner side can
be NULL, and thus not exactly equal. I will incorporate that change in the
patch. Again while doing so, we have to just pick up the right or left
equivalence class from a given RestrictInfo and don't need to assess it
further like pathkeys_useful_for_merging().

Added this change in the attached patch.

I'm inclined to think that we shouldn't consider pathkeys that might
be useful for merge-joining unless we're using remote estimates. It
seems more speculative than pushing down a toplevel sort.

I agree with you but let me elaborate why I agree. The pathkeys we are
generating are heauristic in nature and thus may not be useful in the same
sense as pathkeys for regular tables. If use_remote_estimate is ON, the
planner will spend time in explaining multiple queries, but it will be in
better position to cost the usage. If use_remote_estimate is OFF, we will
altogether loose chances of merge join, which doesn't look good to me. But
a speculative costing done in this case can result in choosing wrong plan
similar to the same case as toplevel sort. But then choosing a wrong join
has more serious implications than estimating wrong cost for top level join.

This patch seems rather light on test cases.

Added tests. Let me know if you have any specific scenario in mind.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

pg_sort_all_pd_v3.patchtext/x-diff; charset=US-ASCII; name=pg_sort_all_pd_v3.patchDownload

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 866a09b..8b05fcd 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -336,20 +336,91 @@ WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4
  10 |  0 | 00010 | Sun Jan 11 00:00:00 1970 PST
 (10 rows)
 
 -- fixed values
 SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
  ?column? | ?column? 
 ----------+----------
  fixed    | 
 (1 row)
 
+-- getting data sorted from the foreign table for merge join
+-- Since we are interested in merge join, disable other joins
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join, expressions in the clauses appear in the equivalence class list 
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2."C 1"
+   ->  Merge Join
+         Output: t1.c1, t2."C 1"
+         Merge Cond: (t1.c1 = t2."C 1")
+         ->  Foreign Scan on public.ft2 t1
+               Output: t1.c1
+               Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+         ->  Index Only Scan using t1_pkey on "S 1"."T 1" t2
+               Output: t2."C 1"
+(10 rows)
+
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+ c1  | C 1 
+-----+-----
+ 101 | 101
+ 102 | 102
+ 103 | 103
+ 104 | 104
+ 105 | 105
+ 106 | 106
+ 107 | 107
+ 108 | 108
+ 109 | 109
+ 110 | 110
+(10 rows)
+
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2."C 1"
+   ->  Merge Left Join
+         Output: t1.c1, t2."C 1"
+         Merge Cond: (t1.c1 = t2."C 1")
+         ->  Foreign Scan on public.ft2 t1
+               Output: t1.c1
+               Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+         ->  Index Only Scan using t1_pkey on "S 1"."T 1" t2
+               Output: t2."C 1"
+(10 rows)
+
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+ c1  | C 1 
+-----+-----
+ 101 | 101
+ 102 | 102
+ 103 | 103
+ 104 | 104
+ 105 | 105
+ 106 | 106
+ 107 | 107
+ 108 | 108
+ 109 | 109
+ 110 | 110
+(10 rows)
+
+SET enable_hashjoin TO true;
+SET enable_nestloop TO true;
 -- ===================================================================
 -- WHERE with remotely-executable conditions
 -- ===================================================================
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1;         -- Var, OpExpr(b), Const
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 1))
 (3 rows)
@@ -3531,20 +3602,119 @@ select tableoid::regclass, * from bar order by 1,2;
  tableoid | f1 | f2  
 ----------+----+-----
  bar      |  1 | 211
  bar      |  2 | 222
  bar      |  6 | 166
  bar2     |  3 | 233
  bar2     |  4 | 244
  bar2     |  7 | 177
 (6 rows)
 
+-- Tests for sort pushdown for merge joins for foreign children.
+-- Fill enough data in the tables, so that getting data sorted from the foreign
+-- server is cheaper than sorting it locally.
+truncate table loct1;
+truncate table only foo;
+\set num_rows_foo 2000
+insert into loct1 select generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2);
+insert into foo select generate_series(1, :num_rows_foo, 2), generate_series(1, :num_rows_foo, 2);
+set enable_hashjoin to false;
+set enable_nestloop to false;
+-- Sort pushdown for merge join is enabled only when use_remote_estimate is
+-- true.
+alter foreign table foo2 options (use_remote_estimate 'true');
+-- Create indexes so that sorted joining relations cost less.
+create index i_loct1_f1 on loct1(f1);
+create index i_foo_f1 on foo(f1);
+analyze foo;
+analyze loct1;
+-- inner join, expressions in the clauses appear in the equivalence class list 
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   Output: foo.f1, loct1.f1, foo.f2
+   ->  Sort
+         Output: foo.f1, loct1.f1, foo.f2
+         Sort Key: foo.f2
+         ->  Merge Join
+               Output: foo.f1, loct1.f1, foo.f2
+               Merge Cond: (foo.f1 = loct1.f1)
+               ->  Merge Append
+                     Sort Key: foo.f1
+                     ->  Index Scan using i_foo_f1 on public.foo
+                           Output: foo.f1, foo.f2
+                     ->  Foreign Scan on public.foo2
+                           Output: foo2.f1, foo2.f2
+                           Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f1 ASC
+               ->  Index Only Scan using i_loct1_f1 on public.loct1
+                     Output: loct1.f1
+(17 rows)
+
+select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+ f1 | f1 
+----+----
+ 20 | 20
+ 22 | 22
+ 24 | 24
+ 26 | 26
+ 28 | 28
+ 30 | 30
+ 32 | 32
+ 34 | 34
+ 36 | 36
+ 38 | 38
+(10 rows)
+
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   Output: foo.f1, loct1.f1, foo.f2
+   ->  Sort
+         Output: foo.f1, loct1.f1, foo.f2
+         Sort Key: foo.f2
+         ->  Merge Left Join
+               Output: foo.f1, loct1.f1, foo.f2
+               Merge Cond: (foo.f1 = loct1.f1)
+               ->  Merge Append
+                     Sort Key: foo.f1
+                     ->  Index Scan using i_foo_f1 on public.foo
+                           Output: foo.f1, foo.f2
+                     ->  Foreign Scan on public.foo2
+                           Output: foo2.f1, foo2.f2
+                           Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f1 ASC
+               ->  Index Only Scan using i_loct1_f1 on public.loct1
+                     Output: loct1.f1
+(17 rows)
+
+select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+ f1 | f1 
+----+----
+ 10 | 10
+ 11 |   
+ 12 | 12
+ 13 |   
+ 14 | 14
+ 15 |   
+ 16 | 16
+ 17 |   
+ 18 | 18
+ 19 |   
+(10 rows)
+
+set enable_hashjoin to true;
+set enable_nestloop to true;
 -- Test that WHERE CURRENT OF is not supported
 begin;
 declare c cursor for select * from bar where f1 = 7;
 fetch from c;
  f1 | f2  
 ----+-----
   7 | 177
 (1 row)
 
 update bar set f2 = null where current of c;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index cd4ed0c..d84ad2d 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -251,20 +251,21 @@ static void postgresExplainForeignScan(ForeignScanState *node,
 static void postgresExplainForeignModify(ModifyTableState *mtstate,
 							 ResultRelInfo *rinfo,
 							 List *fdw_private,
 							 int subplan_index,
 							 ExplainState *es);
 static bool postgresAnalyzeForeignTable(Relation relation,
 							AcquireSampleRowsFunc *func,
 							BlockNumber *totalpages);
 static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
 							Oid serverOid);
+static List *get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel);
 
 /*
  * Helper functions
  */
 static void estimate_path_cost_size(PlannerInfo *root,
 						RelOptInfo *baserel,
 						List *join_conds,
 						List *pathkeys,
 						double *p_rows, int *p_width,
 						Cost *p_startup_cost, Cost *p_total_cost);
@@ -501,100 +502,240 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		set_baserel_size_estimates(root, baserel);
 
 		/* Fill in basically-bogus cost estimates for use later. */
 		estimate_path_cost_size(root, baserel, NIL, NIL,
 								&fpinfo->rows, &fpinfo->width,
 								&fpinfo->startup_cost, &fpinfo->total_cost);
 	}
 }
 
 /*
+ * get_useful_pathkeys_for_relation
+ * Function collects useful pathkeys for given relation. The caller is
+ * possibly going to create paths with these pathkeys so as to get the data
+ * sorted according to the pathkeys.
+ * Following pathkeys are of interest here.
+ * 1. query pathkeys
+ * 2. pathkeys arising out of the equivalence classes
+ * Comments in the function explain these two in details.
+ *
+ * TODO: This function requires make_canonical_pathkey() to be "extern"alized.
+ * The function is required to get pathkey corresponding to a given equivalence
+ * class if there exists one already or create one otherwise. There are two ways
+ * we can avoid "extern"alization
+ * 1. Create a wrapper extern function returning pathkey with inputs root and
+ * 		equivalence class. Adding the wrapper function in pathkeys.c would avoid
+ * 		"extern"alization of the make_canonical_pathkey().
+ * 2. Move get_useful_pathkeys_for_relation() to pathkeys.c. Considering that this
+ *    function is heuristically finding the "interesting" pathkeys, it may not
+ *    be very useful in general, and thus can not be part of the core.
+ */
+static List *
+get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+	List	*useful_pathkeys_list = NIL; /* List of pathkeys to be returned */
+	bool	useful_query_pathkeys;
+	List	*useful_eclass_list = NIL;	/* List of unique eclasses */
+	PathKey	*first_query_pathkey = NULL;
+	PgFdwRelationInfo	*fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
+	ListCell	*lc;
+
+	/*
+	 * Determine whether we can potentially push query pathkeys to the remote
+	 * side, avoiding a local sort.
+	 * By default assume that root->query_pathkeys is pushable.
+	 */
+	if (root->query_pathkeys)
+	{
+		useful_query_pathkeys = true;
+		foreach(lc, root->query_pathkeys)
+		{
+			PathKey    *pathkey = (PathKey *) lfirst(lc);
+			EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
+			Expr	   *em_expr;
+	
+			/*
+			 * is_foreign_expr would detect volatile expressions as well, but
+			 * ec_has_volatile saves some cycles.
+			 */
+			if (pathkey_ec->ec_has_volatile ||
+				!(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
+				!is_foreign_expr(root, rel, em_expr))
+				{
+					/*
+					 * The planner and executor don't have any clever strategy
+					 * for taking data sorted by a prefix of the query's
+					 * pathkeys and getting it to be sorted by all of those
+					 * pathekeys. We'll just end up resorting the entire data
+					 * set.  So, unless we can push down all of the query
+					 * pathkeys, forget it.
+					 */
+					useful_query_pathkeys = false;
+				}
+		}
+	
+		if (useful_query_pathkeys)
+			useful_pathkeys_list = list_make1(list_copy(root->query_pathkeys));
+	}
+
+	/*
+	 * Next we create single member pathkeys useful for merge joins. Choosing
+	 * wrong join plan based on speculative costs can cause serious problems.
+	 * Create corresponding paths only when we can estimate costs with more
+	 * precision.
+	 */
+	if (!fpinfo->use_remote_estimate)
+		return useful_pathkeys_list;
+
+	/*
+	 * Check equivalence classes where this relation appears. Getting data
+	 * ordered on corresponding expressions might help merge join. A join
+	 * between two relations might have multiple merge joinable clauses, thus
+	 * fetching foreign data sorted on all the expressions involved optimized
+	 * merge join locally. But we do not have knowledge about the indexes
+	 * present on the foreign server. The cost of sorting data on multiple
+	 * expressions can vary greatly based on the kind of expressions and their
+	 * presence in an index. Hence for now, we create paths with single pathkey.
+	 * Nothing to do if the relation is not part of any equivalence class.
+	 */
+	if (rel->has_eclass_joins)
+	{
+		foreach(lc, root->eq_classes)
+		{
+			EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc);
+
+			if (eclass_useful_for_merging(root, cur_ec, rel))
+				useful_eclass_list = lappend(useful_eclass_list, cur_ec);
+		}
+	}
+	
+	foreach (lc, rel->joininfo)
+	{
+		Relids relids;
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc);
+
+		/*
+		 * Equivalence classes are marked by the parent relations and not child.
+		 * If specified rel is a child, we must consider the topmost parent rel.
+		 */
+		if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+			relids = find_childrel_top_parent(root, rel)->relids;
+		else
+			relids = rel->relids;
+
+
+		/* Consider only mergejoinable clauses */
+		if (restrictinfo->mergeopfamilies == NIL)
+			continue;
+
+		update_mergeclause_eclasses(root, restrictinfo);
+		
+		/* Pick up the equivalence class corresponding to the given relation */
+		if (bms_is_subset(restrictinfo->right_ec->ec_relids, relids))
+			useful_eclass_list = list_append_unique_ptr(useful_eclass_list,
+														restrictinfo->right_ec);
+		else
+		{
+			Assert(bms_is_subset(restrictinfo->left_ec->ec_relids, relids));
+			useful_eclass_list = list_append_unique_ptr(useful_eclass_list,
+														restrictinfo->left_ec);
+		}
+	}
+
+	/*
+	 * Scan through the distinct equivalence classes collected and create
+	 * one pathkey for each of them if the equivalence member belonging to the
+	 * given relation is pushable to the foreign server.
+	 * Also check if the equivalence class is different from the equivalence
+	 * class of single member root->query_pathkeys.
+	 */
+	if (list_length(root->query_pathkeys) == 1)
+		first_query_pathkey = linitial(root->query_pathkeys);
+
+	foreach(lc, useful_eclass_list)
+	{
+		EquivalenceClass *cur_ec = lfirst(lc);
+		Expr	*em_expr;
+
+		if (first_query_pathkey &&
+			cur_ec == first_query_pathkey->pk_eclass)
+			continue;
+
+		if ((em_expr = find_em_expr_for_rel(cur_ec, rel)) &&
+				is_foreign_expr(root, rel, em_expr))
+		{
+			PathKey	*pathkey;
+			List	*pathkeys;
+
+			pathkey = make_canonical_pathkey(root, cur_ec,
+											linitial_oid(cur_ec->ec_opfamilies),
+											BTLessStrategyNumber,
+											false);
+			pathkeys = list_make1(pathkey);
+			useful_pathkeys_list = lappend(useful_pathkeys_list, pathkeys);
+		}
+	}
+
+	return useful_pathkeys_list;
+}
+
+/*
  * postgresGetForeignPaths
  *		Create possible scan paths for a scan on the foreign table
  */
 static void
 postgresGetForeignPaths(PlannerInfo *root,
 						RelOptInfo *baserel,
 						Oid foreigntableid)
 {
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
 	ForeignPath *path;
 	List	   *ppi_list;
 	ListCell   *lc;
-	List	   *usable_pathkeys = NIL;
+	List		*useful_pathkeys_list = NIL;	/* List of all pathkeys */
 
 	/*
 	 * Create simplest ForeignScan path node and add it to baserel.  This path
 	 * corresponds to SeqScan path of regular tables (though depending on what
 	 * baserestrict conditions we were able to send to remote, there might
 	 * actually be an indexscan happening there).  We already did all the work
 	 * to estimate cost and size of this path.
 	 */
 	path = create_foreignscan_path(root, baserel,
 								   fpinfo->rows,
 								   fpinfo->startup_cost,
 								   fpinfo->total_cost,
 								   NIL, /* no pathkeys */
 								   NULL,		/* no outer rel either */
 								   NIL);		/* no fdw_private list */
 	add_path(baserel, (Path *) path);
 
-	/*
-	 * Determine whether we can potentially push query pathkeys to the remote
-	 * side, avoiding a local sort.
-	 */
-	foreach(lc, root->query_pathkeys)
-	{
-		PathKey    *pathkey = (PathKey *) lfirst(lc);
-		EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
-		Expr	   *em_expr;
-
-		/*
-		 * is_foreign_expr would detect volatile expressions as well, but
-		 * ec_has_volatile saves some cycles.
-		 */
-		if (!pathkey_ec->ec_has_volatile &&
-			(em_expr = find_em_expr_for_rel(pathkey_ec, baserel)) &&
-			is_foreign_expr(root, baserel, em_expr))
-			usable_pathkeys = lappend(usable_pathkeys, pathkey);
-		else
-		{
-			/*
-			 * The planner and executor don't have any clever strategy for
-			 * taking data sorted by a prefix of the query's pathkeys and
-			 * getting it to be sorted by all of those pathekeys.  We'll just
-			 * end up resorting the entire data set.  So, unless we can push
-			 * down all of the query pathkeys, forget it.
-			 */
-			list_free(usable_pathkeys);
-			usable_pathkeys = NIL;
-			break;
-		}
-	}
+	useful_pathkeys_list = get_useful_pathkeys_for_relation(root, baserel);
 
-	/* Create a path with useful pathkeys, if we found one. */
-	if (usable_pathkeys != NULL)
+	/* Create one path for each set of pathkeys we found above. */
+	foreach (lc, useful_pathkeys_list)
 	{
 		double		rows;
 		int			width;
 		Cost		startup_cost;
 		Cost		total_cost;
+		List		*useful_pathkeys = lfirst(lc);
 
-		estimate_path_cost_size(root, baserel, NIL, usable_pathkeys,
+		estimate_path_cost_size(root, baserel, NIL, useful_pathkeys,
 								&rows, &width, &startup_cost, &total_cost);
 
 		add_path(baserel, (Path *)
 				 create_foreignscan_path(root, baserel,
 										 rows,
 										 startup_cost,
 										 total_cost,
-										 usable_pathkeys,
+										 useful_pathkeys,
 										 NULL,
 										 NIL));
 	}
 
 	/*
 	 * If we're not using remote estimates, stop here.  We have no way to
 	 * estimate whether any join clauses would be worth sending across, so
 	 * don't bother building parameterized paths.
 	 */
 	if (!fpinfo->use_remote_estimate)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index f243de8..d13a244 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -106,17 +106,17 @@ extern void deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
 				 List *returningList,
 				 List **retrieved_attrs);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 				  List **retrieved_attrs);
 extern void deparseStringLiteral(StringInfo buf, const char *val);
 extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
 extern void appendOrderByClause(StringInfo buf, PlannerInfo *root,
-					RelOptInfo *baserel, List *pathkeys);
+								RelOptInfo *baserel, List *pathkeys);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
 extern bool is_shippable(Oid objectId, Oid classId, PgFdwRelationInfo *fpinfo);
 
 #endif   /* POSTGRES_FDW_H */
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 671e38c..17e0763 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -171,20 +171,35 @@ SELECT COUNT(*) FROM ft1 t1;
 -- join two tables
 SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
 -- subquery
 SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
 -- subquery+MAX
 SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
 -- used in CTE
 WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
 -- fixed values
 SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- getting data sorted from the foreign table for merge join
+-- Since we are interested in merge join, disable other joins
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join, expressions in the clauses appear in the equivalence class list 
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SET enable_hashjoin TO true;
+SET enable_nestloop TO true;
 
 -- ===================================================================
 -- WHERE with remotely-executable conditions
 -- ===================================================================
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1;         -- Var, OpExpr(b), Const
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL;        -- NullTest
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL;    -- NullTest
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- OpExpr(l)
@@ -805,20 +820,50 @@ update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
 update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
 
 select tableoid::regclass, * from bar order by 1,2;
 
+-- Tests for sort pushdown for merge joins for foreign children.
+-- Fill enough data in the tables, so that getting data sorted from the foreign
+-- server is cheaper than sorting it locally.
+truncate table loct1;
+truncate table only foo;
+\set num_rows_foo 2000
+insert into loct1 select generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2);
+insert into foo select generate_series(1, :num_rows_foo, 2), generate_series(1, :num_rows_foo, 2);
+set enable_hashjoin to false;
+set enable_nestloop to false;
+-- Sort pushdown for merge join is enabled only when use_remote_estimate is
+-- true.
+alter foreign table foo2 options (use_remote_estimate 'true');
+-- Create indexes so that sorted joining relations cost less.
+create index i_loct1_f1 on loct1(f1);
+create index i_foo_f1 on foo(f1);
+analyze foo;
+analyze loct1;
+-- inner join, expressions in the clauses appear in the equivalence class list 
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+set enable_hashjoin to true;
+set enable_nestloop to true;
+
 -- Test that WHERE CURRENT OF is not supported
 begin;
 declare c cursor for select * from bar where f1 = 7;
 fetch from c;
 update bar set f2 = null where current of c;
 rollback;
 
 drop table foo cascade;
 drop table bar cascade;
 drop table loct1;
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index c6b5d78..b81cc49 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -21,43 +21,40 @@
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/plannodes.h"
 #include "optimizer/clauses.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
 
-static PathKey *make_canonical_pathkey(PlannerInfo *root,
-					   EquivalenceClass *eclass, Oid opfamily,
-					   int strategy, bool nulls_first);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
 
 
 /****************************************************************************
  *		PATHKEY CONSTRUCTION AND REDUNDANCY TESTING
  ****************************************************************************/
 
 /*
  * make_canonical_pathkey
  *	  Given the parameters for a PathKey, find any pre-existing matching
  *	  pathkey in the query's list of "canonical" pathkeys.  Make a new
  *	  entry if there's not one already.
  *
  * Note that this function must not be used until after we have completed
  * merging EquivalenceClasses.  (We don't try to enforce that here; instead,
  * equivclass.c will complain if a merge occurs after root->canon_pathkeys
  * has become nonempty.)
  */
-static PathKey *
+PathKey *
 make_canonical_pathkey(PlannerInfo *root,
 					   EquivalenceClass *eclass, Oid opfamily,
 					   int strategy, bool nulls_first)
 {
 	PathKey    *pk;
 	ListCell   *lc;
 	MemoryContext oldcontext;
 
 	/* The passed eclass might be non-canonical, so chase up to the top */
 	while (eclass->ec_merged)
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 87123a5..5ef6e1a 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -197,12 +197,15 @@ extern List *find_mergeclauses_for_pathkeys(PlannerInfo *root,
 extern List *select_outer_pathkeys_for_merge(PlannerInfo *root,
 								List *mergeclauses,
 								RelOptInfo *joinrel);
 extern List *make_inner_pathkeys_for_merge(PlannerInfo *root,
 							  List *mergeclauses,
 							  List *outer_pathkeys);
 extern List *truncate_useless_pathkeys(PlannerInfo *root,
 						  RelOptInfo *rel,
 						  List *pathkeys);
 extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel);
+extern PathKey *make_canonical_pathkey(PlannerInfo *root,
+					   EquivalenceClass *eclass, Oid opfamily,
+					   int strategy, bool nulls_first);
 
 #endif   /* PATHS_H */

#15

Rushabh Lathia

rushabh.lathia@gmail.com

about 10 years ago

In reply to: Ashutosh Bapat (#14)

1 attachment(s)

Re: Getting sorted data from foreign server for merge join

Hi Ashutosh,

I reviewed your latest version of patch and over all the implementation
and other details look good to me.

Here are few cosmetic issues which I found:

1) Patch not getting applied cleanly - white space warning

-    List       *usable_pathkeys = NIL;
+    List        *useful_pathkeys_list = NIL;    /* List of all pathkeys */

Code alignment is not correct with other declared variables.

+        {
+            PathKey    *pathkey;
+            List    *pathkeys;
+
+            pathkey = make_canonical_pathkey(root, cur_ec,
+
linitial_oid(cur_ec->ec_opfamilies),
+                                            BTLessStrategyNumber,
+                                            false);
+            pathkeys = list_make1(pathkey);
+            useful_pathkeys_list = lappend(useful_pathkeys_list, pathkeys);
+        }

Code alignment need to fix at make_canonical_pathkey().

I don't understand the meaning of following added testcase into
postgres_fdw.

+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -171,20 +171,35 @@ SELECT COUNT(*) FROM ft1 t1;
 -- join two tables
 SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3,
t1.c1 OFFSET 100 LIMIT 10;
 -- subquery
 SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10)
ORDER BY c1;
 -- subquery+MAX
 SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY
c1;
 -- used in CTE
 WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3,
t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
 -- fixed values
 SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- getting data sorted from the foreign table for merge join
+-- Since we are interested in merge join, disable other joins
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join, expressions in the clauses appear in the equivalence class
list
+EXPLAIN (VERBOSE, COSTS false)
+    SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 =
t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C
1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+-- outer join, expression in the clauses do not appear in equivalence
class list
+-- but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+    SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1
= t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 =
t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SET enable_hashjoin TO true;
+SET enable_nestloop TO true;

Because, I removed the code changes of the patch and then I run the test
seem like it has nothing to do with the code changes. Above set of test
giving
same result with/without patch.

Am I missing something ?

Apart from this I debugged the patch for each scenario (query pathkeys and
pathkeys arising out of the equivalence classes) and so far patch looks good
to me.

Attaching update version of patch by fixing the cosmetic changes.

On Fri, Nov 20, 2015 at 9:14 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

On Thu, Nov 19, 2015 at 7:30 AM, Robert Haas <robertmhaas@gmail.com>
wrote:

On Tue, Nov 17, 2015 at 8:33 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Although I'm usually on the side of marking things as extern whenever
we find it convenient, I'm nervous about doing that to
make_canonical_pathkey(), because it has side effects. Searching the
list of canonical pathkeys for the one we need is reasonable, but is
it really right to ever think that we might create a new one at this
stage? Maybe it is, but if so I'd like to hear a good explanation as
to why.

For a foreign table no pathkeys are created before creating paths for
individual foreign table since there are no indexes. For a regular

table,

the pathkeys get created for useful indexes.

OK.

Could some portion of the second foreach loop inside
get_useful_pathkeys_for_relation be replaced by a call to
eclass_useful_for_merging? The logic looks very similar.

Yes. I have incorporated this change in the patch attached.

More broadly, I'm wondering about the asymmetries between this code
and pathkeys_useful_for_merging(). The latter has a
right_merge_direction() test and a case for non-EC-derivable join
clauses that aren't present in your code.

I wonder if we could just

generate pathkeys speculatively and then test which ones survive
truncate_useless_pathkeys(), or something like that. This isn't an
enormously complicated patch, but it duplicates more of the logic in
pathkeys.c than I'd like.

pathkeys_useful_for_merging(), truncate_useless_pathkeys() and host of
functions in that area are crafted to assess the usefulness of given
pathkeys which for regular tables are "speculated" from indexes on that
table. Foreign tables do not have indexes and neither we have information
about the indexes (if any) on foreign server, thus pathkeys to be assessed
are not readily available. Hence we need some other way of "speculating"
pathkeys for foreign tables. We can not just generate pathkeys because
there are infinite possible expressions on which pathkeys can be built. So,
we hunt for the expressions at various places like root_pathkeys, merge
joinable clauses etc. The seeming duplication of code is because the places
where we are hunting for useful pathkeys in case of foreign table are same
as the places used for assessing usefulness for pathkeys in case of regular
table. Thus in case of foreign tables, we always generate useful pathkeys,
which do not need any further assessment. For regular tables we have set of
pathkeys which need to be assessed. Because of this difference the code
differs in details.

right_merge_direction() compares whether a given pathkey (readily
available or derived from an index) has the same direction as required by
root->query_pathkeys or ascending direction. For foreign tables, the
pathkeys are themselves created from root->query_pathkeys, thus doesn't
require this assessment. The patch creates pathkeys other than those from
root->query_pathkeys in the ascending order (like other places in the code
eg. select_outer_pathkeys_for_merge()), which is what
right_merge_direction() assesses. So again we don't need to call
right_merge_direction().

non-EC-derivable join is interesting. Thanks for bringing it out. These
are mergejoinable clauses in an OUTER join, where columns from inner side
can be NULL, and thus not exactly equal. I will incorporate that change in
the patch. Again while doing so, we have to just pick up the right or left
equivalence class from a given RestrictInfo and don't need to assess it
further like pathkeys_useful_for_merging().

Added this change in the attached patch.

I'm inclined to think that we shouldn't consider pathkeys that might
be useful for merge-joining unless we're using remote estimates. It
seems more speculative than pushing down a toplevel sort.

I agree with you but let me elaborate why I agree. The pathkeys we are
generating are heauristic in nature and thus may not be useful in the same
sense as pathkeys for regular tables. If use_remote_estimate is ON, the
planner will spend time in explaining multiple queries, but it will be in
better position to cost the usage. If use_remote_estimate is OFF, we will
altogether loose chances of merge join, which doesn't look good to me. But
a speculative costing done in this case can result in choosing wrong plan
similar to the same case as toplevel sort. But then choosing a wrong join
has more serious implications than estimating wrong cost for top level join.

This patch seems rather light on test cases.

Added tests. Let me know if you have any specific scenario in mind.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Rushabh Lathia
www.EntepriseDB.com

Attachments:

pg_sort_all_pd_v4.patchapplication/x-download; name=pg_sort_all_pd_v4.patchDownload

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 866a09b..ec45266 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -336,20 +336,91 @@ WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4
  10 |  0 | 00010 | Sun Jan 11 00:00:00 1970 PST
 (10 rows)
 
 -- fixed values
 SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
  ?column? | ?column? 
 ----------+----------
  fixed    | 
 (1 row)
 
+-- getting data sorted from the foreign table for merge join
+-- Since we are interested in merge join, disable other joins
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join, expressions in the clauses appear in the equivalence class list
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2."C 1"
+   ->  Merge Join
+         Output: t1.c1, t2."C 1"
+         Merge Cond: (t1.c1 = t2."C 1")
+         ->  Foreign Scan on public.ft2 t1
+               Output: t1.c1
+               Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+         ->  Index Only Scan using t1_pkey on "S 1"."T 1" t2
+               Output: t2."C 1"
+(10 rows)
+
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+ c1  | C 1 
+-----+-----
+ 101 | 101
+ 102 | 102
+ 103 | 103
+ 104 | 104
+ 105 | 105
+ 106 | 106
+ 107 | 107
+ 108 | 108
+ 109 | 109
+ 110 | 110
+(10 rows)
+
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2."C 1"
+   ->  Merge Left Join
+         Output: t1.c1, t2."C 1"
+         Merge Cond: (t1.c1 = t2."C 1")
+         ->  Foreign Scan on public.ft2 t1
+               Output: t1.c1
+               Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+         ->  Index Only Scan using t1_pkey on "S 1"."T 1" t2
+               Output: t2."C 1"
+(10 rows)
+
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+ c1  | C 1 
+-----+-----
+ 101 | 101
+ 102 | 102
+ 103 | 103
+ 104 | 104
+ 105 | 105
+ 106 | 106
+ 107 | 107
+ 108 | 108
+ 109 | 109
+ 110 | 110
+(10 rows)
+
+SET enable_hashjoin TO true;
+SET enable_nestloop TO true;
 -- ===================================================================
 -- WHERE with remotely-executable conditions
 -- ===================================================================
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1;         -- Var, OpExpr(b), Const
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 1))
 (3 rows)
@@ -3531,20 +3602,119 @@ select tableoid::regclass, * from bar order by 1,2;
  tableoid | f1 | f2  
 ----------+----+-----
  bar      |  1 | 211
  bar      |  2 | 222
  bar      |  6 | 166
  bar2     |  3 | 233
  bar2     |  4 | 244
  bar2     |  7 | 177
 (6 rows)
 
+-- Tests for sort pushdown for merge joins for foreign children.
+-- Fill enough data in the tables, so that getting data sorted from the foreign
+-- server is cheaper than sorting it locally.
+truncate table loct1;
+truncate table only foo;
+\set num_rows_foo 2000
+insert into loct1 select generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2);
+insert into foo select generate_series(1, :num_rows_foo, 2), generate_series(1, :num_rows_foo, 2);
+set enable_hashjoin to false;
+set enable_nestloop to false;
+-- Sort pushdown for merge join is enabled only when use_remote_estimate is
+-- true.
+alter foreign table foo2 options (use_remote_estimate 'true');
+-- Create indexes so that sorted joining relations cost less.
+create index i_loct1_f1 on loct1(f1);
+create index i_foo_f1 on foo(f1);
+analyze foo;
+analyze loct1;
+-- inner join, expressions in the clauses appear in the equivalence class list
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   Output: foo.f1, loct1.f1, foo.f2
+   ->  Sort
+         Output: foo.f1, loct1.f1, foo.f2
+         Sort Key: foo.f2
+         ->  Merge Join
+               Output: foo.f1, loct1.f1, foo.f2
+               Merge Cond: (foo.f1 = loct1.f1)
+               ->  Merge Append
+                     Sort Key: foo.f1
+                     ->  Index Scan using i_foo_f1 on public.foo
+                           Output: foo.f1, foo.f2
+                     ->  Foreign Scan on public.foo2
+                           Output: foo2.f1, foo2.f2
+                           Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f1 ASC
+               ->  Index Only Scan using i_loct1_f1 on public.loct1
+                     Output: loct1.f1
+(17 rows)
+
+select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+ f1 | f1 
+----+----
+ 20 | 20
+ 22 | 22
+ 24 | 24
+ 26 | 26
+ 28 | 28
+ 30 | 30
+ 32 | 32
+ 34 | 34
+ 36 | 36
+ 38 | 38
+(10 rows)
+
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   Output: foo.f1, loct1.f1, foo.f2
+   ->  Sort
+         Output: foo.f1, loct1.f1, foo.f2
+         Sort Key: foo.f2
+         ->  Merge Left Join
+               Output: foo.f1, loct1.f1, foo.f2
+               Merge Cond: (foo.f1 = loct1.f1)
+               ->  Merge Append
+                     Sort Key: foo.f1
+                     ->  Index Scan using i_foo_f1 on public.foo
+                           Output: foo.f1, foo.f2
+                     ->  Foreign Scan on public.foo2
+                           Output: foo2.f1, foo2.f2
+                           Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f1 ASC
+               ->  Index Only Scan using i_loct1_f1 on public.loct1
+                     Output: loct1.f1
+(17 rows)
+
+select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+ f1 | f1 
+----+----
+ 10 | 10
+ 11 |   
+ 12 | 12
+ 13 |   
+ 14 | 14
+ 15 |   
+ 16 | 16
+ 17 |   
+ 18 | 18
+ 19 |   
+(10 rows)
+
+set enable_hashjoin to true;
+set enable_nestloop to true;
 -- Test that WHERE CURRENT OF is not supported
 begin;
 declare c cursor for select * from bar where f1 = 7;
 fetch from c;
  f1 | f2  
 ----+-----
   7 | 177
 (1 row)
 
 update bar set f2 = null where current of c;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a6ba672..adee54c 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -251,20 +251,21 @@ static void postgresExplainForeignScan(ForeignScanState *node,
 static void postgresExplainForeignModify(ModifyTableState *mtstate,
 							 ResultRelInfo *rinfo,
 							 List *fdw_private,
 							 int subplan_index,
 							 ExplainState *es);
 static bool postgresAnalyzeForeignTable(Relation relation,
 							AcquireSampleRowsFunc *func,
 							BlockNumber *totalpages);
 static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
 							Oid serverOid);
+static List *get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel);
 
 /*
  * Helper functions
  */
 static void estimate_path_cost_size(PlannerInfo *root,
 						RelOptInfo *baserel,
 						List *join_conds,
 						List *pathkeys,
 						double *p_rows, int *p_width,
 						Cost *p_startup_cost, Cost *p_total_cost);
@@ -501,100 +502,240 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		set_baserel_size_estimates(root, baserel);
 
 		/* Fill in basically-bogus cost estimates for use later. */
 		estimate_path_cost_size(root, baserel, NIL, NIL,
 								&fpinfo->rows, &fpinfo->width,
 								&fpinfo->startup_cost, &fpinfo->total_cost);
 	}
 }
 
 /*
+ * get_useful_pathkeys_for_relation
+ * Function collects useful pathkeys for given relation. The caller is
+ * possibly going to create paths with these pathkeys so as to get the data
+ * sorted according to the pathkeys.
+ * Following pathkeys are of interest here.
+ * 1. query pathkeys
+ * 2. pathkeys arising out of the equivalence classes
+ * Comments in the function explain these two in details.
+ *
+ * TODO: This function requires make_canonical_pathkey() to be "extern"alized.
+ * The function is required to get pathkey corresponding to a given equivalence
+ * class if there exists one already or create one otherwise. There are two ways
+ * we can avoid "extern"alization
+ * 1. Create a wrapper extern function returning pathkey with inputs root and
+ * 		equivalence class. Adding the wrapper function in pathkeys.c would avoid
+ * 		"extern"alization of the make_canonical_pathkey().
+ * 2. Move get_useful_pathkeys_for_relation() to pathkeys.c. Considering that this
+ *    function is heuristically finding the "interesting" pathkeys, it may not
+ *    be very useful in general, and thus can not be part of the core.
+ */
+static List *
+get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+	List	*useful_pathkeys_list = NIL; /* List of pathkeys to be returned */
+	bool	useful_query_pathkeys;
+	List	*useful_eclass_list = NIL;	/* List of unique eclasses */
+	PathKey	*first_query_pathkey = NULL;
+	PgFdwRelationInfo	*fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
+	ListCell	*lc;
+
+	/*
+	 * Determine whether we can potentially push query pathkeys to the remote
+	 * side, avoiding a local sort.
+	 * By default assume that root->query_pathkeys is pushable.
+	 */
+	if (root->query_pathkeys)
+	{
+		useful_query_pathkeys = true;
+		foreach(lc, root->query_pathkeys)
+		{
+			PathKey    *pathkey = (PathKey *) lfirst(lc);
+			EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
+			Expr	   *em_expr;
+
+			/*
+			 * is_foreign_expr would detect volatile expressions as well, but
+			 * ec_has_volatile saves some cycles.
+			 */
+			if (pathkey_ec->ec_has_volatile ||
+				!(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
+				!is_foreign_expr(root, rel, em_expr))
+				{
+					/*
+					 * The planner and executor don't have any clever strategy
+					 * for taking data sorted by a prefix of the query's
+					 * pathkeys and getting it to be sorted by all of those
+					 * pathekeys. We'll just end up resorting the entire data
+					 * set.  So, unless we can push down all of the query
+					 * pathkeys, forget it.
+					 */
+					useful_query_pathkeys = false;
+				}
+		}
+
+		if (useful_query_pathkeys)
+			useful_pathkeys_list = list_make1(list_copy(root->query_pathkeys));
+	}
+
+	/*
+	 * Next we create single member pathkeys useful for merge joins. Choosing
+	 * wrong join plan based on speculative costs can cause serious problems.
+	 * Create corresponding paths only when we can estimate costs with more
+	 * precision.
+	 */
+	if (!fpinfo->use_remote_estimate)
+		return useful_pathkeys_list;
+
+	/*
+	 * Check equivalence classes where this relation appears. Getting data
+	 * ordered on corresponding expressions might help merge join. A join
+	 * between two relations might have multiple merge joinable clauses, thus
+	 * fetching foreign data sorted on all the expressions involved optimized
+	 * merge join locally. But we do not have knowledge about the indexes
+	 * present on the foreign server. The cost of sorting data on multiple
+	 * expressions can vary greatly based on the kind of expressions and their
+	 * presence in an index. Hence for now, we create paths with single pathkey.
+	 * Nothing to do if the relation is not part of any equivalence class.
+	 */
+	if (rel->has_eclass_joins)
+	{
+		foreach(lc, root->eq_classes)
+		{
+			EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc);
+
+			if (eclass_useful_for_merging(root, cur_ec, rel))
+				useful_eclass_list = lappend(useful_eclass_list, cur_ec);
+		}
+	}
+
+	foreach (lc, rel->joininfo)
+	{
+		Relids relids;
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc);
+
+		/*
+		 * Equivalence classes are marked by the parent relations and not child.
+		 * If specified rel is a child, we must consider the topmost parent rel.
+		 */
+		if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+			relids = find_childrel_top_parent(root, rel)->relids;
+		else
+			relids = rel->relids;
+
+
+		/* Consider only mergejoinable clauses */
+		if (restrictinfo->mergeopfamilies == NIL)
+			continue;
+
+		update_mergeclause_eclasses(root, restrictinfo);
+
+		/* Pick up the equivalence class corresponding to the given relation */
+		if (bms_is_subset(restrictinfo->right_ec->ec_relids, relids))
+			useful_eclass_list = list_append_unique_ptr(useful_eclass_list,
+														restrictinfo->right_ec);
+		else
+		{
+			Assert(bms_is_subset(restrictinfo->left_ec->ec_relids, relids));
+			useful_eclass_list = list_append_unique_ptr(useful_eclass_list,
+														restrictinfo->left_ec);
+		}
+	}
+
+	/*
+	 * Scan through the distinct equivalence classes collected and create
+	 * one pathkey for each of them if the equivalence member belonging to the
+	 * given relation is pushable to the foreign server.
+	 * Also check if the equivalence class is different from the equivalence
+	 * class of single member root->query_pathkeys.
+	 */
+	if (list_length(root->query_pathkeys) == 1)
+		first_query_pathkey = linitial(root->query_pathkeys);
+
+	foreach(lc, useful_eclass_list)
+	{
+		EquivalenceClass *cur_ec = lfirst(lc);
+		Expr	*em_expr;
+
+		if (first_query_pathkey &&
+			cur_ec == first_query_pathkey->pk_eclass)
+			continue;
+
+		if ((em_expr = find_em_expr_for_rel(cur_ec, rel)) &&
+				is_foreign_expr(root, rel, em_expr))
+		{
+			PathKey	*pathkey;
+			List	*pathkeys;
+
+			pathkey = make_canonical_pathkey(root, cur_ec,
+											 linitial_oid(cur_ec->ec_opfamilies),
+											 BTLessStrategyNumber,
+											 false);
+			pathkeys = list_make1(pathkey);
+			useful_pathkeys_list = lappend(useful_pathkeys_list, pathkeys);
+		}
+	}
+
+	return useful_pathkeys_list;
+}
+
+/*
  * postgresGetForeignPaths
  *		Create possible scan paths for a scan on the foreign table
  */
 static void
 postgresGetForeignPaths(PlannerInfo *root,
 						RelOptInfo *baserel,
 						Oid foreigntableid)
 {
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
 	ForeignPath *path;
 	List	   *ppi_list;
 	ListCell   *lc;
-	List	   *usable_pathkeys = NIL;
+	List	   *useful_pathkeys_list = NIL;	/* List of all pathkeys */
 
 	/*
 	 * Create simplest ForeignScan path node and add it to baserel.  This path
 	 * corresponds to SeqScan path of regular tables (though depending on what
 	 * baserestrict conditions we were able to send to remote, there might
 	 * actually be an indexscan happening there).  We already did all the work
 	 * to estimate cost and size of this path.
 	 */
 	path = create_foreignscan_path(root, baserel,
 								   fpinfo->rows,
 								   fpinfo->startup_cost,
 								   fpinfo->total_cost,
 								   NIL, /* no pathkeys */
 								   NULL,		/* no outer rel either */
 								   NIL);		/* no fdw_private list */
 	add_path(baserel, (Path *) path);
 
-	/*
-	 * Determine whether we can potentially push query pathkeys to the remote
-	 * side, avoiding a local sort.
-	 */
-	foreach(lc, root->query_pathkeys)
-	{
-		PathKey    *pathkey = (PathKey *) lfirst(lc);
-		EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
-		Expr	   *em_expr;
-
-		/*
-		 * is_foreign_expr would detect volatile expressions as well, but
-		 * ec_has_volatile saves some cycles.
-		 */
-		if (!pathkey_ec->ec_has_volatile &&
-			(em_expr = find_em_expr_for_rel(pathkey_ec, baserel)) &&
-			is_foreign_expr(root, baserel, em_expr))
-			usable_pathkeys = lappend(usable_pathkeys, pathkey);
-		else
-		{
-			/*
-			 * The planner and executor don't have any clever strategy for
-			 * taking data sorted by a prefix of the query's pathkeys and
-			 * getting it to be sorted by all of those pathekeys.  We'll just
-			 * end up resorting the entire data set.  So, unless we can push
-			 * down all of the query pathkeys, forget it.
-			 */
-			list_free(usable_pathkeys);
-			usable_pathkeys = NIL;
-			break;
-		}
-	}
+	useful_pathkeys_list = get_useful_pathkeys_for_relation(root, baserel);
 
-	/* Create a path with useful pathkeys, if we found one. */
-	if (usable_pathkeys != NULL)
+	/* Create one path for each set of pathkeys we found above. */
+	foreach (lc, useful_pathkeys_list)
 	{
 		double		rows;
 		int			width;
 		Cost		startup_cost;
 		Cost		total_cost;
+		List		*useful_pathkeys = lfirst(lc);
 
-		estimate_path_cost_size(root, baserel, NIL, usable_pathkeys,
+		estimate_path_cost_size(root, baserel, NIL, useful_pathkeys,
 								&rows, &width, &startup_cost, &total_cost);
 
 		add_path(baserel, (Path *)
 				 create_foreignscan_path(root, baserel,
 										 rows,
 										 startup_cost,
 										 total_cost,
-										 usable_pathkeys,
+										 useful_pathkeys,
 										 NULL,
 										 NIL));
 	}
 
 	/*
 	 * If we're not using remote estimates, stop here.  We have no way to
 	 * estimate whether any join clauses would be worth sending across, so
 	 * don't bother building parameterized paths.
 	 */
 	if (!fpinfo->use_remote_estimate)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index f243de8..d13a244 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -106,17 +106,17 @@ extern void deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
 				 List *returningList,
 				 List **retrieved_attrs);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 				  List **retrieved_attrs);
 extern void deparseStringLiteral(StringInfo buf, const char *val);
 extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
 extern void appendOrderByClause(StringInfo buf, PlannerInfo *root,
-					RelOptInfo *baserel, List *pathkeys);
+								RelOptInfo *baserel, List *pathkeys);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
 extern bool is_shippable(Oid objectId, Oid classId, PgFdwRelationInfo *fpinfo);
 
 #endif   /* POSTGRES_FDW_H */
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 671e38c..a844117 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -171,20 +171,35 @@ SELECT COUNT(*) FROM ft1 t1;
 -- join two tables
 SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
 -- subquery
 SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
 -- subquery+MAX
 SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
 -- used in CTE
 WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
 -- fixed values
 SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- getting data sorted from the foreign table for merge join
+-- Since we are interested in merge join, disable other joins
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join, expressions in the clauses appear in the equivalence class list
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SET enable_hashjoin TO true;
+SET enable_nestloop TO true;
 
 -- ===================================================================
 -- WHERE with remotely-executable conditions
 -- ===================================================================
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1;         -- Var, OpExpr(b), Const
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL;        -- NullTest
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL;    -- NullTest
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- OpExpr(l)
@@ -805,20 +820,50 @@ update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
 update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
 
 select tableoid::regclass, * from bar order by 1,2;
 
+-- Tests for sort pushdown for merge joins for foreign children.
+-- Fill enough data in the tables, so that getting data sorted from the foreign
+-- server is cheaper than sorting it locally.
+truncate table loct1;
+truncate table only foo;
+\set num_rows_foo 2000
+insert into loct1 select generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2);
+insert into foo select generate_series(1, :num_rows_foo, 2), generate_series(1, :num_rows_foo, 2);
+set enable_hashjoin to false;
+set enable_nestloop to false;
+-- Sort pushdown for merge join is enabled only when use_remote_estimate is
+-- true.
+alter foreign table foo2 options (use_remote_estimate 'true');
+-- Create indexes so that sorted joining relations cost less.
+create index i_loct1_f1 on loct1(f1);
+create index i_foo_f1 on foo(f1);
+analyze foo;
+analyze loct1;
+-- inner join, expressions in the clauses appear in the equivalence class list
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+set enable_hashjoin to true;
+set enable_nestloop to true;
+
 -- Test that WHERE CURRENT OF is not supported
 begin;
 declare c cursor for select * from bar where f1 = 7;
 fetch from c;
 update bar set f2 = null where current of c;
 rollback;
 
 drop table foo cascade;
 drop table bar cascade;
 drop table loct1;
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index c6b5d78..b81cc49 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -21,43 +21,40 @@
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/plannodes.h"
 #include "optimizer/clauses.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
 
-static PathKey *make_canonical_pathkey(PlannerInfo *root,
-					   EquivalenceClass *eclass, Oid opfamily,
-					   int strategy, bool nulls_first);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
 
 
 /****************************************************************************
  *		PATHKEY CONSTRUCTION AND REDUNDANCY TESTING
  ****************************************************************************/
 
 /*
  * make_canonical_pathkey
  *	  Given the parameters for a PathKey, find any pre-existing matching
  *	  pathkey in the query's list of "canonical" pathkeys.  Make a new
  *	  entry if there's not one already.
  *
  * Note that this function must not be used until after we have completed
  * merging EquivalenceClasses.  (We don't try to enforce that here; instead,
  * equivclass.c will complain if a merge occurs after root->canon_pathkeys
  * has become nonempty.)
  */
-static PathKey *
+PathKey *
 make_canonical_pathkey(PlannerInfo *root,
 					   EquivalenceClass *eclass, Oid opfamily,
 					   int strategy, bool nulls_first)
 {
 	PathKey    *pk;
 	ListCell   *lc;
 	MemoryContext oldcontext;
 
 	/* The passed eclass might be non-canonical, so chase up to the top */
 	while (eclass->ec_merged)
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 87123a5..5ef6e1a 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -197,12 +197,15 @@ extern List *find_mergeclauses_for_pathkeys(PlannerInfo *root,
 extern List *select_outer_pathkeys_for_merge(PlannerInfo *root,
 								List *mergeclauses,
 								RelOptInfo *joinrel);
 extern List *make_inner_pathkeys_for_merge(PlannerInfo *root,
 							  List *mergeclauses,
 							  List *outer_pathkeys);
 extern List *truncate_useless_pathkeys(PlannerInfo *root,
 						  RelOptInfo *rel,
 						  List *pathkeys);
 extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel);
+extern PathKey *make_canonical_pathkey(PlannerInfo *root,
+					   EquivalenceClass *eclass, Oid opfamily,
+					   int strategy, bool nulls_first);
 
 #endif   /* PATHS_H */

#16

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 10 years ago

In reply to: Rushabh Lathia (#15)

1 attachment(s)

Re: Getting sorted data from foreign server for merge join

Thanks Rushabh for your review and comments.

On Thu, Nov 26, 2015 at 5:39 PM, Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:

Hi Ashutosh,

I reviewed your latest version of patch and over all the implementation
and other details look good to me.

Here are few cosmetic issues which I found:

1) Patch not getting applied cleanly - white space warning

Done.

2)
-    List       *usable_pathkeys = NIL;
+    List        *useful_pathkeys_list = NIL;    /* List of all pathkeys */
Code alignment is not correct with other declared variables.

Incorporated the change in the patch.

+        {
+            PathKey    *pathkey;
+            List    *pathkeys;
+
+            pathkey = make_canonical_pathkey(root, cur_ec,
+
linitial_oid(cur_ec->ec_opfamilies),
+                                            BTLessStrategyNumber,
+                                            false);
+            pathkeys = list_make1(pathkey);
+            useful_pathkeys_list = lappend(useful_pathkeys_list,
pathkeys);
+        }

Code alignment need to fix at make_canonical_pathkey().

Incorporated the change in the patch.

I have also removed the TODO item in the prologue of this function, since
none has objected to externalization of make_canonical_pathkeys till now
and it's not expected to be part of the final commit.

I don't understand the meaning of following added testcase into
postgres_fdw.

+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -171,20 +171,35 @@ SELECT COUNT(*) FROM ft1 t1;
-- join two tables
SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3,
t1.c1 OFFSET 100 LIMIT 10;
-- subquery
SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <=
10) ORDER BY c1;
-- subquery+MAX
SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY
c1;
-- used in CTE
WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3,
t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
-- fixed values
SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- getting data sorted from the foreign table for merge join
+-- Since we are interested in merge join, disable other joins
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join, expressions in the clauses appear in the equivalence class
list
+EXPLAIN (VERBOSE, COSTS false)
+    SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 =
t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C
1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+-- outer join, expression in the clauses do not appear in equivalence
class list
+-- but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+    SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1
= t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 =
t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SET enable_hashjoin TO true;
+SET enable_nestloop TO true;

Because, I removed the code changes of the patch and then I run the test
seem like it has nothing to do with the code changes. Above set of test
giving
same result with/without patch.

Am I missing something ?

Actually, the output of merge join is always ordered by the pathkeys used
for merge join. That routed through LIMIT node remains ordered. So, we
actually do not need ORDER BY t1.c1 clause in the above queries. Without
that clause, the tests will show difference output with and without patch.
I have changed the attached patch accordingly.

Apart from this I debugged the patch for each scenario (query pathkeys and
pathkeys arising out of the equivalence classes) and so far patch looks
good
to me.

Thanks.

Attaching update version of patch by fixing the cosmetic changes.

Attached version of patch contains your changes.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

pg_sort_all_pd_v5.patchtext/x-diff; charset=US-ASCII; name=pg_sort_all_pd_v5.patchDownload

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 866a09b..a9b0bd8 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -336,20 +336,93 @@ WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4
  10 |  0 | 00010 | Sun Jan 11 00:00:00 1970 PST
 (10 rows)
 
 -- fixed values
 SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
  ?column? | ?column? 
 ----------+----------
  fixed    | 
 (1 row)
 
+-- getting data sorted from the foreign table for merge join
+-- Since we are interested in merge join, disable other joins
+-- Merge join always produces the result in the sorted order, so no need for
+-- separate ORDER BY clause to get ordered results.
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join, expressions in the clauses appear in the equivalence class list
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2."C 1"
+   ->  Merge Join
+         Output: t1.c1, t2."C 1"
+         Merge Cond: (t1.c1 = t2."C 1")
+         ->  Foreign Scan on public.ft2 t1
+               Output: t1.c1
+               Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+         ->  Index Only Scan using t1_pkey on "S 1"."T 1" t2
+               Output: t2."C 1"
+(10 rows)
+
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+ c1  | C 1 
+-----+-----
+ 101 | 101
+ 102 | 102
+ 103 | 103
+ 104 | 104
+ 105 | 105
+ 106 | 106
+ 107 | 107
+ 108 | 108
+ 109 | 109
+ 110 | 110
+(10 rows)
+
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2."C 1"
+   ->  Merge Left Join
+         Output: t1.c1, t2."C 1"
+         Merge Cond: (t1.c1 = t2."C 1")
+         ->  Foreign Scan on public.ft2 t1
+               Output: t1.c1
+               Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+         ->  Index Only Scan using t1_pkey on "S 1"."T 1" t2
+               Output: t2."C 1"
+(10 rows)
+
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+ c1  | C 1 
+-----+-----
+ 101 | 101
+ 102 | 102
+ 103 | 103
+ 104 | 104
+ 105 | 105
+ 106 | 106
+ 107 | 107
+ 108 | 108
+ 109 | 109
+ 110 | 110
+(10 rows)
+
+SET enable_hashjoin TO true;
+SET enable_nestloop TO true;
 -- ===================================================================
 -- WHERE with remotely-executable conditions
 -- ===================================================================
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1;         -- Var, OpExpr(b), Const
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 1))
 (3 rows)
@@ -3531,20 +3604,119 @@ select tableoid::regclass, * from bar order by 1,2;
  tableoid | f1 | f2  
 ----------+----+-----
  bar      |  1 | 211
  bar      |  2 | 222
  bar      |  6 | 166
  bar2     |  3 | 233
  bar2     |  4 | 244
  bar2     |  7 | 177
 (6 rows)
 
+-- Tests for sort pushdown for merge joins for foreign children.
+-- Fill enough data in the tables, so that getting data sorted from the foreign
+-- server is cheaper than sorting it locally.
+truncate table loct1;
+truncate table only foo;
+\set num_rows_foo 2000
+insert into loct1 select generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2);
+insert into foo select generate_series(1, :num_rows_foo, 2), generate_series(1, :num_rows_foo, 2);
+set enable_hashjoin to false;
+set enable_nestloop to false;
+-- Sort pushdown for merge join is enabled only when use_remote_estimate is
+-- true.
+alter foreign table foo2 options (use_remote_estimate 'true');
+-- Create indexes so that sorted joining relations cost less.
+create index i_loct1_f1 on loct1(f1);
+create index i_foo_f1 on foo(f1);
+analyze foo;
+analyze loct1;
+-- inner join, expressions in the clauses appear in the equivalence class list
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   Output: foo.f1, loct1.f1, foo.f2
+   ->  Sort
+         Output: foo.f1, loct1.f1, foo.f2
+         Sort Key: foo.f2
+         ->  Merge Join
+               Output: foo.f1, loct1.f1, foo.f2
+               Merge Cond: (foo.f1 = loct1.f1)
+               ->  Merge Append
+                     Sort Key: foo.f1
+                     ->  Index Scan using i_foo_f1 on public.foo
+                           Output: foo.f1, foo.f2
+                     ->  Foreign Scan on public.foo2
+                           Output: foo2.f1, foo2.f2
+                           Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f1 ASC
+               ->  Index Only Scan using i_loct1_f1 on public.loct1
+                     Output: loct1.f1
+(17 rows)
+
+select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+ f1 | f1 
+----+----
+ 20 | 20
+ 22 | 22
+ 24 | 24
+ 26 | 26
+ 28 | 28
+ 30 | 30
+ 32 | 32
+ 34 | 34
+ 36 | 36
+ 38 | 38
+(10 rows)
+
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   Output: foo.f1, loct1.f1, foo.f2
+   ->  Sort
+         Output: foo.f1, loct1.f1, foo.f2
+         Sort Key: foo.f2
+         ->  Merge Left Join
+               Output: foo.f1, loct1.f1, foo.f2
+               Merge Cond: (foo.f1 = loct1.f1)
+               ->  Merge Append
+                     Sort Key: foo.f1
+                     ->  Index Scan using i_foo_f1 on public.foo
+                           Output: foo.f1, foo.f2
+                     ->  Foreign Scan on public.foo2
+                           Output: foo2.f1, foo2.f2
+                           Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f1 ASC
+               ->  Index Only Scan using i_loct1_f1 on public.loct1
+                     Output: loct1.f1
+(17 rows)
+
+select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+ f1 | f1 
+----+----
+ 10 | 10
+ 11 |   
+ 12 | 12
+ 13 |   
+ 14 | 14
+ 15 |   
+ 16 | 16
+ 17 |   
+ 18 | 18
+ 19 |   
+(10 rows)
+
+set enable_hashjoin to true;
+set enable_nestloop to true;
 -- Test that WHERE CURRENT OF is not supported
 begin;
 declare c cursor for select * from bar where f1 = 7;
 fetch from c;
  f1 | f2  
 ----+-----
   7 | 177
 (1 row)
 
 update bar set f2 = null where current of c;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index cd4ed0c..88dcb22 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -251,20 +251,21 @@ static void postgresExplainForeignScan(ForeignScanState *node,
 static void postgresExplainForeignModify(ModifyTableState *mtstate,
 							 ResultRelInfo *rinfo,
 							 List *fdw_private,
 							 int subplan_index,
 							 ExplainState *es);
 static bool postgresAnalyzeForeignTable(Relation relation,
 							AcquireSampleRowsFunc *func,
 							BlockNumber *totalpages);
 static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
 							Oid serverOid);
+static List *get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel);
 
 /*
  * Helper functions
  */
 static void estimate_path_cost_size(PlannerInfo *root,
 						RelOptInfo *baserel,
 						List *join_conds,
 						List *pathkeys,
 						double *p_rows, int *p_width,
 						Cost *p_startup_cost, Cost *p_total_cost);
@@ -501,100 +502,234 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		set_baserel_size_estimates(root, baserel);
 
 		/* Fill in basically-bogus cost estimates for use later. */
 		estimate_path_cost_size(root, baserel, NIL, NIL,
 								&fpinfo->rows, &fpinfo->width,
 								&fpinfo->startup_cost, &fpinfo->total_cost);
 	}
 }
 
 /*
+ * get_useful_pathkeys_for_relation
+ * Function collects useful pathkeys for given relation. The caller is
+ * possibly going to create paths with these pathkeys so as to get the data
+ * sorted according to the pathkeys.
+ * Following pathkeys are of interest here.
+ * 1. query pathkeys
+ * 2. pathkeys arising out of the equivalence classes
+ * Comments in the function explain these two in details.
+ */
+static List *
+get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+	List	*useful_pathkeys_list = NIL; /* List of pathkeys to be returned */
+	bool	useful_query_pathkeys;
+	List	*useful_eclass_list = NIL;	/* List of unique eclasses */
+	PathKey	*first_query_pathkey = NULL;
+	PgFdwRelationInfo	*fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
+	ListCell	*lc;
+
+	/*
+	 * Determine whether we can potentially push query pathkeys to the remote
+	 * side, avoiding a local sort.
+	 */
+	if (root->query_pathkeys)
+	{
+		/* By default assume that root->query_pathkeys is pushable. */
+		useful_query_pathkeys = true;
+		foreach(lc, root->query_pathkeys)
+		{
+			PathKey    *pathkey = (PathKey *) lfirst(lc);
+			EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
+			Expr	   *em_expr;
+
+			/*
+			 * is_foreign_expr would detect volatile expressions as well, but
+			 * ec_has_volatile saves some cycles.
+			 */
+			if (pathkey_ec->ec_has_volatile ||
+				!(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
+				!is_foreign_expr(root, rel, em_expr))
+				{
+					/*
+					 * The planner and executor don't have any clever strategy
+					 * for taking data sorted by a prefix of the query's
+					 * pathkeys and getting it to be sorted by all of those
+					 * pathekeys. We'll just end up resorting the entire data
+					 * set.  So, unless we can push down all of the query
+					 * pathkeys, forget it.
+					 */
+					useful_query_pathkeys = false;
+				}
+		}
+
+		if (useful_query_pathkeys)
+			useful_pathkeys_list = list_make1(list_copy(root->query_pathkeys));
+	}
+
+	/*
+	 * Next we create single member pathkeys useful for merge joins. Choosing
+	 * wrong join plan based on speculative costs can cause serious problems.
+	 * Create corresponding paths only when we can estimate costs with more
+	 * precision i.e. get those from foreign server.
+	 */
+	if (!fpinfo->use_remote_estimate)
+		return useful_pathkeys_list;
+
+	/*
+	 * Check equivalence classes where this relation appears. Getting data
+	 * ordered on corresponding expressions might help merge join. A join
+	 * between two relations might have multiple merge joinable clauses, thus
+	 * fetching foreign data sorted on all the expressions involved optimizes
+	 * merge join locally. But we do not have knowledge about the indexes
+	 * present on the foreign server. The cost of sorting data on multiple
+	 * expressions can vary greatly based on the kind of expressions and their
+	 * presence in an index. Hence for now, we create paths with single pathkey.
+	 * Nothing to do if the relation is not part of any equivalence class.
+	 */
+	if (rel->has_eclass_joins)
+	{
+		foreach(lc, root->eq_classes)
+		{
+			EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc);
+
+			if (eclass_useful_for_merging(root, cur_ec, rel))
+				useful_eclass_list = lappend(useful_eclass_list, cur_ec);
+		}
+	}
+
+	foreach (lc, rel->joininfo)
+	{
+		Relids relids;
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc);
+
+		/*
+		 * Equivalence classes are marked by the parent relations and not child.
+		 * If specified rel is a child, we must consider the topmost parent rel.
+		 */
+		if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+			relids = find_childrel_top_parent(root, rel)->relids;
+		else
+			relids = rel->relids;
+
+
+		/* Consider only mergejoinable clauses */
+		if (restrictinfo->mergeopfamilies == NIL)
+			continue;
+
+		update_mergeclause_eclasses(root, restrictinfo);
+
+		/* Pick up the equivalence class corresponding to the given relation */
+		if (bms_is_subset(restrictinfo->right_ec->ec_relids, relids))
+			useful_eclass_list = list_append_unique_ptr(useful_eclass_list,
+														restrictinfo->right_ec);
+		else
+		{
+			/*
+			 * One of left or right should belong correspond to the relation we
+			 * are looking for.
+			 */
+			Assert(bms_is_subset(restrictinfo->left_ec->ec_relids, relids));
+			useful_eclass_list = list_append_unique_ptr(useful_eclass_list,
+														restrictinfo->left_ec);
+		}
+	}
+
+	/*
+	 * Scan through the distinct equivalence classes collected and create
+	 * one pathkey for each of them if the equivalence member belonging to the
+	 * given relation is pushable to the foreign server.
+	 * Also check if the equivalence class is different from the equivalence
+	 * class of single member root->query_pathkeys, in which case, we already
+	 * have already considered that pathkey.
+	 */
+	if (list_length(root->query_pathkeys) == 1)
+		first_query_pathkey = linitial(root->query_pathkeys);
+
+	foreach(lc, useful_eclass_list)
+	{
+		EquivalenceClass *cur_ec = lfirst(lc);
+		Expr	*em_expr;
+
+		if (first_query_pathkey &&
+			cur_ec == first_query_pathkey->pk_eclass)
+			continue;
+
+		if ((em_expr = find_em_expr_for_rel(cur_ec, rel)) &&
+				is_foreign_expr(root, rel, em_expr))
+		{
+			PathKey	*pathkey;
+			List	*pathkeys;
+
+			pathkey = make_canonical_pathkey(root, cur_ec,
+											 linitial_oid(cur_ec->ec_opfamilies),
+											 BTLessStrategyNumber,
+											 false);
+			pathkeys = list_make1(pathkey);
+			useful_pathkeys_list = lappend(useful_pathkeys_list, pathkeys);
+		}
+	}
+
+	return useful_pathkeys_list;
+}
+
+/*
  * postgresGetForeignPaths
  *		Create possible scan paths for a scan on the foreign table
  */
 static void
 postgresGetForeignPaths(PlannerInfo *root,
 						RelOptInfo *baserel,
 						Oid foreigntableid)
 {
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
 	ForeignPath *path;
 	List	   *ppi_list;
 	ListCell   *lc;
-	List	   *usable_pathkeys = NIL;
+	List	   *useful_pathkeys_list = NIL;	/* List of all pathkeys */
 
 	/*
 	 * Create simplest ForeignScan path node and add it to baserel.  This path
 	 * corresponds to SeqScan path of regular tables (though depending on what
 	 * baserestrict conditions we were able to send to remote, there might
 	 * actually be an indexscan happening there).  We already did all the work
 	 * to estimate cost and size of this path.
 	 */
 	path = create_foreignscan_path(root, baserel,
 								   fpinfo->rows,
 								   fpinfo->startup_cost,
 								   fpinfo->total_cost,
 								   NIL, /* no pathkeys */
 								   NULL,		/* no outer rel either */
 								   NIL);		/* no fdw_private list */
 	add_path(baserel, (Path *) path);
 
-	/*
-	 * Determine whether we can potentially push query pathkeys to the remote
-	 * side, avoiding a local sort.
-	 */
-	foreach(lc, root->query_pathkeys)
-	{
-		PathKey    *pathkey = (PathKey *) lfirst(lc);
-		EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
-		Expr	   *em_expr;
-
-		/*
-		 * is_foreign_expr would detect volatile expressions as well, but
-		 * ec_has_volatile saves some cycles.
-		 */
-		if (!pathkey_ec->ec_has_volatile &&
-			(em_expr = find_em_expr_for_rel(pathkey_ec, baserel)) &&
-			is_foreign_expr(root, baserel, em_expr))
-			usable_pathkeys = lappend(usable_pathkeys, pathkey);
-		else
-		{
-			/*
-			 * The planner and executor don't have any clever strategy for
-			 * taking data sorted by a prefix of the query's pathkeys and
-			 * getting it to be sorted by all of those pathekeys.  We'll just
-			 * end up resorting the entire data set.  So, unless we can push
-			 * down all of the query pathkeys, forget it.
-			 */
-			list_free(usable_pathkeys);
-			usable_pathkeys = NIL;
-			break;
-		}
-	}
+	useful_pathkeys_list = get_useful_pathkeys_for_relation(root, baserel);
 
-	/* Create a path with useful pathkeys, if we found one. */
-	if (usable_pathkeys != NULL)
+	/* Create one path for each set of pathkeys we found above. */
+	foreach (lc, useful_pathkeys_list)
 	{
 		double		rows;
 		int			width;
 		Cost		startup_cost;
 		Cost		total_cost;
+		List		*useful_pathkeys = lfirst(lc);
 
-		estimate_path_cost_size(root, baserel, NIL, usable_pathkeys,
+		estimate_path_cost_size(root, baserel, NIL, useful_pathkeys,
 								&rows, &width, &startup_cost, &total_cost);
 
 		add_path(baserel, (Path *)
 				 create_foreignscan_path(root, baserel,
 										 rows,
 										 startup_cost,
 										 total_cost,
-										 usable_pathkeys,
+										 useful_pathkeys,
 										 NULL,
 										 NIL));
 	}
 
 	/*
 	 * If we're not using remote estimates, stop here.  We have no way to
 	 * estimate whether any join clauses would be worth sending across, so
 	 * don't bother building parameterized paths.
 	 */
 	if (!fpinfo->use_remote_estimate)
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index f243de8..d13a244 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -106,17 +106,17 @@ extern void deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
 				 List *returningList,
 				 List **retrieved_attrs);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 				  List **retrieved_attrs);
 extern void deparseStringLiteral(StringInfo buf, const char *val);
 extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
 extern void appendOrderByClause(StringInfo buf, PlannerInfo *root,
-					RelOptInfo *baserel, List *pathkeys);
+								RelOptInfo *baserel, List *pathkeys);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
 extern bool is_shippable(Oid objectId, Oid classId, PgFdwRelationInfo *fpinfo);
 
 #endif   /* POSTGRES_FDW_H */
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 671e38c..49e48a4 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -171,20 +171,37 @@ SELECT COUNT(*) FROM ft1 t1;
 -- join two tables
 SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
 -- subquery
 SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
 -- subquery+MAX
 SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
 -- used in CTE
 WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
 -- fixed values
 SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- getting data sorted from the foreign table for merge join
+-- Since we are interested in merge join, disable other joins
+-- Merge join always produces the result in the sorted order, so no need for
+-- separate ORDER BY clause to get ordered results.
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join, expressions in the clauses appear in the equivalence class list
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+SET enable_hashjoin TO true;
+SET enable_nestloop TO true;
 
 -- ===================================================================
 -- WHERE with remotely-executable conditions
 -- ===================================================================
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1;         -- Var, OpExpr(b), Const
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL;        -- NullTest
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL;    -- NullTest
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- OpExpr(l)
@@ -805,20 +822,50 @@ update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
 update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
 
 select tableoid::regclass, * from bar order by 1,2;
 
+-- Tests for sort pushdown for merge joins for foreign children.
+-- Fill enough data in the tables, so that getting data sorted from the foreign
+-- server is cheaper than sorting it locally.
+truncate table loct1;
+truncate table only foo;
+\set num_rows_foo 2000
+insert into loct1 select generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2);
+insert into foo select generate_series(1, :num_rows_foo, 2), generate_series(1, :num_rows_foo, 2);
+set enable_hashjoin to false;
+set enable_nestloop to false;
+-- Sort pushdown for merge join is enabled only when use_remote_estimate is
+-- true.
+alter foreign table foo2 options (use_remote_estimate 'true');
+-- Create indexes so that sorted joining relations cost less.
+create index i_loct1_f1 on loct1(f1);
+create index i_foo_f1 on foo(f1);
+analyze foo;
+analyze loct1;
+-- inner join, expressions in the clauses appear in the equivalence class list
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+set enable_hashjoin to true;
+set enable_nestloop to true;
+
 -- Test that WHERE CURRENT OF is not supported
 begin;
 declare c cursor for select * from bar where f1 = 7;
 fetch from c;
 update bar set f2 = null where current of c;
 rollback;
 
 drop table foo cascade;
 drop table bar cascade;
 drop table loct1;
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index c6b5d78..b81cc49 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -21,43 +21,40 @@
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/plannodes.h"
 #include "optimizer/clauses.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
 
-static PathKey *make_canonical_pathkey(PlannerInfo *root,
-					   EquivalenceClass *eclass, Oid opfamily,
-					   int strategy, bool nulls_first);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
 
 
 /****************************************************************************
  *		PATHKEY CONSTRUCTION AND REDUNDANCY TESTING
  ****************************************************************************/
 
 /*
  * make_canonical_pathkey
  *	  Given the parameters for a PathKey, find any pre-existing matching
  *	  pathkey in the query's list of "canonical" pathkeys.  Make a new
  *	  entry if there's not one already.
  *
  * Note that this function must not be used until after we have completed
  * merging EquivalenceClasses.  (We don't try to enforce that here; instead,
  * equivclass.c will complain if a merge occurs after root->canon_pathkeys
  * has become nonempty.)
  */
-static PathKey *
+PathKey *
 make_canonical_pathkey(PlannerInfo *root,
 					   EquivalenceClass *eclass, Oid opfamily,
 					   int strategy, bool nulls_first)
 {
 	PathKey    *pk;
 	ListCell   *lc;
 	MemoryContext oldcontext;
 
 	/* The passed eclass might be non-canonical, so chase up to the top */
 	while (eclass->ec_merged)
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 87123a5..5ef6e1a 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -197,12 +197,15 @@ extern List *find_mergeclauses_for_pathkeys(PlannerInfo *root,
 extern List *select_outer_pathkeys_for_merge(PlannerInfo *root,
 								List *mergeclauses,
 								RelOptInfo *joinrel);
 extern List *make_inner_pathkeys_for_merge(PlannerInfo *root,
 							  List *mergeclauses,
 							  List *outer_pathkeys);
 extern List *truncate_useless_pathkeys(PlannerInfo *root,
 						  RelOptInfo *rel,
 						  List *pathkeys);
 extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel);
+extern PathKey *make_canonical_pathkey(PlannerInfo *root,
+					   EquivalenceClass *eclass, Oid opfamily,
+					   int strategy, bool nulls_first);
 
 #endif   /* PATHS_H */

#17

Rushabh Lathia

rushabh.lathia@gmail.com

about 10 years ago

In reply to: Ashutosh Bapat (#16)

Re: Getting sorted data from foreign server for merge join

Thanks Ashutosh.

Re-reviewed and Re-verified the patch, pg_sort_all_pd_v5.patch
looks good to me.

On Fri, Nov 27, 2015 at 3:02 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

Thanks Rushabh for your review and comments.

On Thu, Nov 26, 2015 at 5:39 PM, Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:

Hi Ashutosh,

I reviewed your latest version of patch and over all the implementation
and other details look good to me.

Here are few cosmetic issues which I found:

1) Patch not getting applied cleanly - white space warning

Done.
2)
-    List       *usable_pathkeys = NIL;
+    List        *useful_pathkeys_list = NIL;    /* List of all pathkeys
*/
Code alignment is not correct with other declared variables.
Incorporated the change in the patch.

3)
+        {
+            PathKey    *pathkey;
+            List    *pathkeys;
+
+            pathkey = make_canonical_pathkey(root, cur_ec,
+
linitial_oid(cur_ec->ec_opfamilies),
+                                            BTLessStrategyNumber,
+                                            false);
+            pathkeys = list_make1(pathkey);
+            useful_pathkeys_list = lappend(useful_pathkeys_list,
pathkeys);
+        }
Code alignment need to fix at make_canonical_pathkey().
Incorporated the change in the patch.

I have also removed the TODO item in the prologue of this function, since
none has objected to externalization of make_canonical_pathkeys till now
and it's not expected to be part of the final commit.
4)

I don't understand the meaning of following added testcase into
postgres_fdw.
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -171,20 +171,35 @@ SELECT COUNT(*) FROM ft1 t1;
-- join two tables
SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3,
t1.c1 OFFSET 100 LIMIT 10;
-- subquery
SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <=
10) ORDER BY c1;
-- subquery+MAX
SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY
c1;
-- used in CTE
WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2,
t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
-- fixed values
SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- getting data sorted from the foreign table for merge join
+-- Since we are interested in merge join, disable other joins
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join, expressions in the clauses appear in the equivalence
class list
+EXPLAIN (VERBOSE, COSTS false)
+    SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 =
t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C
1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+-- outer join, expression in the clauses do not appear in equivalence
class list
+-- but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+    SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON
(t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 =
t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+SET enable_hashjoin TO true;
+SET enable_nestloop TO true;
Because, I removed the code changes of the patch and then I run the test
seem like it has nothing to do with the code changes. Above set of test
giving
same result with/without patch.

Am I missing something ?
Actually, the output of merge join is always ordered by the pathkeys used
for merge join. That routed through LIMIT node remains ordered. So, we
actually do not need ORDER BY t1.c1 clause in the above queries. Without
that clause, the tests will show difference output with and without patch.
I have changed the attached patch accordingly.

Apart from this I debugged the patch for each scenario (query pathkeys and
pathkeys arising out of the equivalence classes) and so far patch looks
good
to me.

Thanks.

Attaching update version of patch by fixing the cosmetic changes.

Attached version of patch contains your changes.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Rushabh Lathia

#18

Robert Haas

robertmhaas@gmail.com

about 10 years ago

In reply to: Rushabh Lathia (#17)

Re: Getting sorted data from foreign server for merge join

On Wed, Dec 2, 2015 at 6:45 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

Thanks Ashutosh.

Re-reviewed and Re-verified the patch, pg_sort_all_pd_v5.patch
looks good to me.

This patch needs a rebase.

It's not going to work to say this is a patch proposed for commit when
it's still got a TODO comment in it that obviously needs to be
changed. And the formatting of that long comment is pretty weird,
too, and not consistent with other functions in that same file (e.g.
get_remote_estimate, ec_member_matches_foreign, create_cursor).

Aside from that, I think before we commit this, somebody should do
some testing that demonstrates that this is actually a good idea. Not
as part of the test case set for this patch, but just in general.
Merge joins are typically going to be relevant for large tables, but
the examples in the regression tests are necessarily tiny. I'd like
to see some sample data and some sample queries that get appreciably
faster with this code. If we can't find any, we don't need the code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 10 years ago

In reply to: Robert Haas (#18)

3 attachment(s)

Re: Getting sorted data from foreign server for merge join

On Wed, Dec 9, 2015 at 12:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Dec 2, 2015 at 6:45 AM, Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:

Thanks Ashutosh.

Re-reviewed and Re-verified the patch, pg_sort_all_pd_v5.patch
looks good to me.

This patch needs a rebase.

Done.

It's not going to work to say this is a patch proposed for commit when
it's still got a TODO comment in it that obviously needs to be
changed. And the formatting of that long comment is pretty weird,
too, and not consistent with other functions in that same file (e.g.
get_remote_estimate, ec_member_matches_foreign, create_cursor).

The TODO was present in v4 but not in v5 and is not present in v6 attached
here.. Formatted comment according estimate_path_cost_size(),
convert_prep_stmt_params().

Aside from that, I think before we commit this, somebody should do
some testing that demonstrates that this is actually a good idea. Not
as part of the test case set for this patch, but just in general.
Merge joins are typically going to be relevant for large tables, but
the examples in the regression tests are necessarily tiny. I'd like
to see some sample data and some sample queries that get appreciably
faster with this code. If we can't find any, we don't need the code.

I tested the patch on my laptop with two types of queries, a join between
two foreign tables on different foreign servers (pointing to the same self
server) and a join between one foreign and one local table. The foreign
tables and servers are created using sort_pd_setup.sql attached. Foreign
tables pointed to table with index useful for join clause. Both the joining
tables had 10M rows. The execution time of query was measured for 100 runs
and average and standard deviation were calculated (using function
query_execution_stats() in script sort_pd.sql) and are presented below.

1. Query between foreign tables
SELECT ft1.val, ft2.val FROM ft1 join ft2 on (ft1.val = ft2.val)

Plan and timings without patch
EXPLAIN (VERBOSE, ANALYSE) :query ;
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=508510.02..1129945.94 rows=999995 width=8) (actual
time=33803.826..82416.342 rows=10000000 loops=1)
Output: ft1.val, ft2.val
Hash Cond: (ft1.val = ft2.val)
-> Foreign Scan on public.ft1 (cost=100.00..344347.31 rows=9999977
width=4) (actual time=0.624..28531.803 rows=10000000 loops=1)
Output: ft1.val
Remote SQL: SELECT val FROM public.lt
-> Hash (cost=344347.31..344347.31 rows=9999977 width=4) (actual
time=33258.025..33258.025 rows=10000000 loops=1)
Output: ft2.val
Buckets: 131072 Batches: 256 Memory Usage: 2400kB
-> Foreign Scan on public.ft2 (cost=100.00..344347.31
rows=9999977 width=4) (actual time=22.171..28134.970 rows=10000000 loops=1)
Output: ft2.val
Remote SQL: SELECT val FROM public.lt
Planning time: 33.155 ms
Execution time: 82914.607 ms
(14 rows)

avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
78750.95487 | 2911.51825687913 | 74314.886 | 89358.464

Plan and timing with patch
EXPLAIN (VERBOSE, ANALYSE) :query ;
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=200.86..1183070.86 rows=10000000 width=8) (actual
time=1.776..73140.219 rows=10000000 loops=1)
Output: ft1.val, ft2.val
Merge Cond: (ft1.val = ft2.val)
-> Foreign Scan on public.ft1 (cost=100.43..504035.43 rows=10000000
width=4) (actual time=0.937..30422.457 rows=10000000 loops=1)
Output: ft1.val, ft1.val2
Remote SQL: SELECT val FROM public.lt ORDER BY val ASC
-> Materialize (cost=100.43..529035.43 rows=10000000 width=4) (actual
time=0.826..33448.822 rows=10000000 loops=1)
Output: ft2.val, ft2.val2
-> Foreign Scan on public.ft2 (cost=100.43..504035.43
rows=10000000 width=4) (actual time=0.818..31035.362 rows=10000000 loops=1)
Output: ft2.val, ft2.val2
Remote SQL: SELECT val FROM public.lt ORDER BY val ASC
Planning time: 163.161 ms
Execution time: 73654.106 ms
(13 rows)

avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
71881.15916 | 819.091605498189 | 70197.312 | 74653.314

It can be observed that the with the patch, merge join strategy is used
instead of hash join and the execution time reduces by approx 9%. A desired
effect is that the deviation in the execution time has reduced heavily
(almost by 75%).

2. Join between local and foreign table

Without patch the plan and timings are
EXPLAIN (VERBOSE, ANALYSE) :query ;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=308410.66..1019846.69 rows=9999970 width=8) (actual
time=7674.681..47767.136 rows=10000000 loops=1)
Output: lt.val, ft1.val
Hash Cond: (ft1.val = lt.val)
-> Foreign Scan on public.ft1 (cost=100.00..344347.55 rows=9999985
width=4) (actual time=0.506..26679.980 rows=10000000 loops=1)
Output: ft1.val
Remote SQL: SELECT val FROM public.lt
-> Hash (cost=144247.85..144247.85 rows=9999985 width=4) (actual
time=7667.598..7667.598 rows=10000000 loops=1)
Output: lt.val
Buckets: 131072 Batches: 256 Memory Usage: 2400kB
-> Seq Scan on public.lt (cost=0.00..144247.85 rows=9999985
width=4) (actual time=0.018..2959.111 rows=10000000 loops=1)
Output: lt.val
Planning time: 8.668 ms
Execution time: 48209.365 ms
(13 rows)

SELECT avg_exe_time, std_dev_exe_time, min_exe_time, max_exe_time
FROM query_execution_stats(:'query', :num_samples);
avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
47246.46956 | 2579.42041949119 | 43603.411 | 56096.759

With the patch the plan and timings are
EXPLAIN (VERBOSE, ANALYSE) :query ;
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=155.01..957924.85 rows=9999970 width=8) (actual
time=0.592..45125.356 rows=10000000 loops=1)
Output: lt.val, ft1.val
Merge Cond: (ft1.val = lt.val)
-> Foreign Scan on public.ft1 (cost=100.43..504038.91 rows=9999985
width=4) (actual time=0.551..30526.048 rows=10000000 loops=1)
Output: ft1.val, ft1.val2
Remote SQL: SELECT val FROM public.lt ORDER BY val ASC
-> Index Only Scan using i_lt_val on public.lt (cost=0.43..303939.21
rows=9999985 width=4) (actual time=0.032..6192.406 rows=10000000 loops=1)
Output: lt.val
Heap Fetches: 10000000
Planning time: 9.043 ms
Execution time: 45666.023 ms
(11 rows)

avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
42803.36105 | 166.874491432755 | 42321.314 | 43316.902

Again observe that with the patch, merge join is used instead of hash join
and timing reduces by approx 9%. Again the deviation in execution reduces
heavily (almost by 75%). There is increase in planning time with the patch
owing to firing EXPLAIN on the foreign server.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

sort_pd_setup.sqltext/x-sql; charset=US-ASCII; name=sort_pd_setup.sqlDownload

pg_sort_all_pd_v6.patchbinary/octet-stream; name=pg_sort_all_pd_v6.patchDownload

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 866a09b..a9b0bd8 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -336,20 +336,93 @@ WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4
  10 |  0 | 00010 | Sun Jan 11 00:00:00 1970 PST
 (10 rows)
 
 -- fixed values
 SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
  ?column? | ?column? 
 ----------+----------
  fixed    | 
 (1 row)
 
+-- getting data sorted from the foreign table for merge join
+-- Since we are interested in merge join, disable other joins
+-- Merge join always produces the result in the sorted order, so no need for
+-- separate ORDER BY clause to get ordered results.
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join, expressions in the clauses appear in the equivalence class list
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2."C 1"
+   ->  Merge Join
+         Output: t1.c1, t2."C 1"
+         Merge Cond: (t1.c1 = t2."C 1")
+         ->  Foreign Scan on public.ft2 t1
+               Output: t1.c1
+               Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+         ->  Index Only Scan using t1_pkey on "S 1"."T 1" t2
+               Output: t2."C 1"
+(10 rows)
+
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+ c1  | C 1 
+-----+-----
+ 101 | 101
+ 102 | 102
+ 103 | 103
+ 104 | 104
+ 105 | 105
+ 106 | 106
+ 107 | 107
+ 108 | 108
+ 109 | 109
+ 110 | 110
+(10 rows)
+
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2."C 1"
+   ->  Merge Left Join
+         Output: t1.c1, t2."C 1"
+         Merge Cond: (t1.c1 = t2."C 1")
+         ->  Foreign Scan on public.ft2 t1
+               Output: t1.c1
+               Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+         ->  Index Only Scan using t1_pkey on "S 1"."T 1" t2
+               Output: t2."C 1"
+(10 rows)
+
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+ c1  | C 1 
+-----+-----
+ 101 | 101
+ 102 | 102
+ 103 | 103
+ 104 | 104
+ 105 | 105
+ 106 | 106
+ 107 | 107
+ 108 | 108
+ 109 | 109
+ 110 | 110
+(10 rows)
+
+SET enable_hashjoin TO true;
+SET enable_nestloop TO true;
 -- ===================================================================
 -- WHERE with remotely-executable conditions
 -- ===================================================================
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1;         -- Var, OpExpr(b), Const
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 1))
 (3 rows)
@@ -3531,20 +3604,119 @@ select tableoid::regclass, * from bar order by 1,2;
  tableoid | f1 | f2  
 ----------+----+-----
  bar      |  1 | 211
  bar      |  2 | 222
  bar      |  6 | 166
  bar2     |  3 | 233
  bar2     |  4 | 244
  bar2     |  7 | 177
 (6 rows)
 
+-- Tests for sort pushdown for merge joins for foreign children.
+-- Fill enough data in the tables, so that getting data sorted from the foreign
+-- server is cheaper than sorting it locally.
+truncate table loct1;
+truncate table only foo;
+\set num_rows_foo 2000
+insert into loct1 select generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2);
+insert into foo select generate_series(1, :num_rows_foo, 2), generate_series(1, :num_rows_foo, 2);
+set enable_hashjoin to false;
+set enable_nestloop to false;
+-- Sort pushdown for merge join is enabled only when use_remote_estimate is
+-- true.
+alter foreign table foo2 options (use_remote_estimate 'true');
+-- Create indexes so that sorted joining relations cost less.
+create index i_loct1_f1 on loct1(f1);
+create index i_foo_f1 on foo(f1);
+analyze foo;
+analyze loct1;
+-- inner join, expressions in the clauses appear in the equivalence class list
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   Output: foo.f1, loct1.f1, foo.f2
+   ->  Sort
+         Output: foo.f1, loct1.f1, foo.f2
+         Sort Key: foo.f2
+         ->  Merge Join
+               Output: foo.f1, loct1.f1, foo.f2
+               Merge Cond: (foo.f1 = loct1.f1)
+               ->  Merge Append
+                     Sort Key: foo.f1
+                     ->  Index Scan using i_foo_f1 on public.foo
+                           Output: foo.f1, foo.f2
+                     ->  Foreign Scan on public.foo2
+                           Output: foo2.f1, foo2.f2
+                           Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f1 ASC
+               ->  Index Only Scan using i_loct1_f1 on public.loct1
+                     Output: loct1.f1
+(17 rows)
+
+select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+ f1 | f1 
+----+----
+ 20 | 20
+ 22 | 22
+ 24 | 24
+ 26 | 26
+ 28 | 28
+ 30 | 30
+ 32 | 32
+ 34 | 34
+ 36 | 36
+ 38 | 38
+(10 rows)
+
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   Output: foo.f1, loct1.f1, foo.f2
+   ->  Sort
+         Output: foo.f1, loct1.f1, foo.f2
+         Sort Key: foo.f2
+         ->  Merge Left Join
+               Output: foo.f1, loct1.f1, foo.f2
+               Merge Cond: (foo.f1 = loct1.f1)
+               ->  Merge Append
+                     Sort Key: foo.f1
+                     ->  Index Scan using i_foo_f1 on public.foo
+                           Output: foo.f1, foo.f2
+                     ->  Foreign Scan on public.foo2
+                           Output: foo2.f1, foo2.f2
+                           Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f1 ASC
+               ->  Index Only Scan using i_loct1_f1 on public.loct1
+                     Output: loct1.f1
+(17 rows)
+
+select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+ f1 | f1 
+----+----
+ 10 | 10
+ 11 |   
+ 12 | 12
+ 13 |   
+ 14 | 14
+ 15 |   
+ 16 | 16
+ 17 |   
+ 18 | 18
+ 19 |   
+(10 rows)
+
+set enable_hashjoin to true;
+set enable_nestloop to true;
 -- Test that WHERE CURRENT OF is not supported
 begin;
 declare c cursor for select * from bar where f1 = 7;
 fetch from c;
  f1 | f2  
 ----+-----
   7 | 177
 (1 row)
 
 update bar set f2 = null where current of c;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9a014d4..78cb09f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -252,20 +252,21 @@ static void postgresExplainForeignScan(ForeignScanState *node,
 static void postgresExplainForeignModify(ModifyTableState *mtstate,
 							 ResultRelInfo *rinfo,
 							 List *fdw_private,
 							 int subplan_index,
 							 ExplainState *es);
 static bool postgresAnalyzeForeignTable(Relation relation,
 							AcquireSampleRowsFunc *func,
 							BlockNumber *totalpages);
 static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
 							Oid serverOid);
+static List *get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel);
 
 /*
  * Helper functions
  */
 static void estimate_path_cost_size(PlannerInfo *root,
 						RelOptInfo *baserel,
 						List *join_conds,
 						List *pathkeys,
 						double *p_rows, int *p_width,
 						Cost *p_startup_cost, Cost *p_total_cost);
@@ -502,101 +503,237 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		set_baserel_size_estimates(root, baserel);
 
 		/* Fill in basically-bogus cost estimates for use later. */
 		estimate_path_cost_size(root, baserel, NIL, NIL,
 								&fpinfo->rows, &fpinfo->width,
 								&fpinfo->startup_cost, &fpinfo->total_cost);
 	}
 }
 
 /*
+ * get_useful_pathkeys_for_relation
+ *		Collect useful pathkeys for given relation.
+ *
+ * The caller is possibly going to create paths with these pathkeys so as to
+ * get the data sorted according to the pathkeys from the foreign server.
+ *
+ * Following pathkeys are of interest
+ * 1. query pathkeys
+ * 2. pathkeys arising out of the equivalence classes
+ * Comments in the function explain these two in details.
+ */
+static List *
+get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+	List	*useful_pathkeys_list = NIL; /* List of pathkeys to be returned */
+	bool	useful_query_pathkeys;
+	List	*useful_eclass_list = NIL;	/* List of unique eclasses */
+	PathKey	*first_query_pathkey = NULL;
+	PgFdwRelationInfo	*fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
+	ListCell	*lc;
+
+	/*
+	 * Determine whether we can potentially push query pathkeys to the remote
+	 * side, avoiding a local sort.
+	 */
+	if (root->query_pathkeys)
+	{
+		/* By default assume that root->query_pathkeys is pushable. */
+		useful_query_pathkeys = true;
+		foreach(lc, root->query_pathkeys)
+		{
+			PathKey    *pathkey = (PathKey *) lfirst(lc);
+			EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
+			Expr	   *em_expr;
+
+			/*
+			 * is_foreign_expr would detect volatile expressions as well, but
+			 * ec_has_volatile saves some cycles.
+			 */
+			if (pathkey_ec->ec_has_volatile ||
+				!(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
+				!is_foreign_expr(root, rel, em_expr))
+				{
+					/*
+					 * The planner and executor don't have any clever strategy
+					 * for taking data sorted by a prefix of the query's
+					 * pathkeys and getting it to be sorted by all of those
+					 * pathekeys. We'll just end up resorting the entire data
+					 * set.  So, unless we can push down all of the query
+					 * pathkeys, forget it.
+					 */
+					useful_query_pathkeys = false;
+				}
+		}
+
+		if (useful_query_pathkeys)
+			useful_pathkeys_list = list_make1(list_copy(root->query_pathkeys));
+	}
+
+	/*
+	 * Next we create single member pathkeys useful for merge joins. Choosing
+	 * wrong join plan based on speculative costs can cause serious problems.
+	 * Create corresponding paths only when we can estimate costs with more
+	 * precision i.e. get those from foreign server.
+	 */
+	if (!fpinfo->use_remote_estimate)
+		return useful_pathkeys_list;
+
+	/*
+	 * Check equivalence classes where this relation appears. Getting data
+	 * ordered on corresponding expressions might help merge join. A join
+	 * between two relations might have multiple merge joinable clauses, thus
+	 * fetching foreign data sorted on all the expressions involved optimizes
+	 * merge join locally. But we do not have knowledge about the indexes
+	 * present on the foreign server. The cost of sorting data on multiple
+	 * expressions can vary greatly based on the kind of expressions and their
+	 * presence in an index. Hence for now, we create paths with single pathkey.
+	 * Nothing to do if the relation is not part of any equivalence class.
+	 */
+	if (rel->has_eclass_joins)
+	{
+		foreach(lc, root->eq_classes)
+		{
+			EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc);
+
+			if (eclass_useful_for_merging(root, cur_ec, rel))
+				useful_eclass_list = lappend(useful_eclass_list, cur_ec);
+		}
+	}
+
+	foreach (lc, rel->joininfo)
+	{
+		Relids relids;
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc);
+
+		/*
+		 * Equivalence classes are marked by the parent relations and not child.
+		 * If specified rel is a child, we must consider the topmost parent rel.
+		 */
+		if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+			relids = find_childrel_top_parent(root, rel)->relids;
+		else
+			relids = rel->relids;
+
+
+		/* Consider only mergejoinable clauses */
+		if (restrictinfo->mergeopfamilies == NIL)
+			continue;
+
+		update_mergeclause_eclasses(root, restrictinfo);
+
+		/* Pick up the equivalence class corresponding to the given relation */
+		if (bms_is_subset(restrictinfo->right_ec->ec_relids, relids))
+			useful_eclass_list = list_append_unique_ptr(useful_eclass_list,
+														restrictinfo->right_ec);
+		else
+		{
+			/*
+			 * One of left or right should belong correspond to the relation we
+			 * are looking for.
+			 */
+			Assert(bms_is_subset(restrictinfo->left_ec->ec_relids, relids));
+			useful_eclass_list = list_append_unique_ptr(useful_eclass_list,
+														restrictinfo->left_ec);
+		}
+	}
+
+	/*
+	 * Scan through the distinct equivalence classes collected and create
+	 * one pathkey for each of them if the equivalence member belonging to the
+	 * given relation is pushable to the foreign server.
+	 * Also check if the equivalence class is different from the equivalence
+	 * class of single member root->query_pathkeys, in which case, we already
+	 * have already considered that pathkey.
+	 */
+	if (list_length(root->query_pathkeys) == 1)
+		first_query_pathkey = linitial(root->query_pathkeys);
+
+	foreach(lc, useful_eclass_list)
+	{
+		EquivalenceClass *cur_ec = lfirst(lc);
+		Expr	*em_expr;
+
+		if (first_query_pathkey &&
+			cur_ec == first_query_pathkey->pk_eclass)
+			continue;
+
+		if ((em_expr = find_em_expr_for_rel(cur_ec, rel)) &&
+				is_foreign_expr(root, rel, em_expr))
+		{
+			PathKey	*pathkey;
+			List	*pathkeys;
+
+			pathkey = make_canonical_pathkey(root, cur_ec,
+											 linitial_oid(cur_ec->ec_opfamilies),
+											 BTLessStrategyNumber,
+											 false);
+			pathkeys = list_make1(pathkey);
+			useful_pathkeys_list = lappend(useful_pathkeys_list, pathkeys);
+		}
+	}
+
+	return useful_pathkeys_list;
+}
+
+/*
  * postgresGetForeignPaths
  *		Create possible scan paths for a scan on the foreign table
  */
 static void
 postgresGetForeignPaths(PlannerInfo *root,
 						RelOptInfo *baserel,
 						Oid foreigntableid)
 {
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
 	ForeignPath *path;
 	List	   *ppi_list;
 	ListCell   *lc;
-	List	   *usable_pathkeys = NIL;
+	List	   *useful_pathkeys_list = NIL;	/* List of all pathkeys */
 
 	/*
 	 * Create simplest ForeignScan path node and add it to baserel.  This path
 	 * corresponds to SeqScan path of regular tables (though depending on what
 	 * baserestrict conditions we were able to send to remote, there might
 	 * actually be an indexscan happening there).  We already did all the work
 	 * to estimate cost and size of this path.
 	 */
 	path = create_foreignscan_path(root, baserel,
 								   fpinfo->rows,
 								   fpinfo->startup_cost,
 								   fpinfo->total_cost,
 								   NIL, /* no pathkeys */
 								   NULL,		/* no outer rel either */
 								   NULL,		/* no extra plan */
 								   NIL);		/* no fdw_private list */
 	add_path(baserel, (Path *) path);
 
-	/*
-	 * Determine whether we can potentially push query pathkeys to the remote
-	 * side, avoiding a local sort.
-	 */
-	foreach(lc, root->query_pathkeys)
-	{
-		PathKey    *pathkey = (PathKey *) lfirst(lc);
-		EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
-		Expr	   *em_expr;
-
-		/*
-		 * is_foreign_expr would detect volatile expressions as well, but
-		 * ec_has_volatile saves some cycles.
-		 */
-		if (!pathkey_ec->ec_has_volatile &&
-			(em_expr = find_em_expr_for_rel(pathkey_ec, baserel)) &&
-			is_foreign_expr(root, baserel, em_expr))
-			usable_pathkeys = lappend(usable_pathkeys, pathkey);
-		else
-		{
-			/*
-			 * The planner and executor don't have any clever strategy for
-			 * taking data sorted by a prefix of the query's pathkeys and
-			 * getting it to be sorted by all of those pathekeys.  We'll just
-			 * end up resorting the entire data set.  So, unless we can push
-			 * down all of the query pathkeys, forget it.
-			 */
-			list_free(usable_pathkeys);
-			usable_pathkeys = NIL;
-			break;
-		}
-	}
+	useful_pathkeys_list = get_useful_pathkeys_for_relation(root, baserel);
 
-	/* Create a path with useful pathkeys, if we found one. */
-	if (usable_pathkeys != NULL)
+	/* Create one path for each set of pathkeys we found above. */
+	foreach (lc, useful_pathkeys_list)
 	{
 		double		rows;
 		int			width;
 		Cost		startup_cost;
 		Cost		total_cost;
+		List		*useful_pathkeys = lfirst(lc);
 
-		estimate_path_cost_size(root, baserel, NIL, usable_pathkeys,
+		estimate_path_cost_size(root, baserel, NIL, useful_pathkeys,
 								&rows, &width, &startup_cost, &total_cost);
 
 		add_path(baserel, (Path *)
 				 create_foreignscan_path(root, baserel,
 										 rows,
 										 startup_cost,
 										 total_cost,
-										 usable_pathkeys,
+										 useful_pathkeys,
 										 NULL,
 										 NULL,
 										 NIL));
 	}
 
 	/*
 	 * If we're not using remote estimates, stop here.  We have no way to
 	 * estimate whether any join clauses would be worth sending across, so
 	 * don't bother building parameterized paths.
 	 */
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index f243de8..d13a244 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -106,17 +106,17 @@ extern void deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 extern void deparseDeleteSql(StringInfo buf, PlannerInfo *root,
 				 Index rtindex, Relation rel,
 				 List *returningList,
 				 List **retrieved_attrs);
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 				  List **retrieved_attrs);
 extern void deparseStringLiteral(StringInfo buf, const char *val);
 extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
 extern void appendOrderByClause(StringInfo buf, PlannerInfo *root,
-					RelOptInfo *baserel, List *pathkeys);
+								RelOptInfo *baserel, List *pathkeys);
 
 /* in shippable.c */
 extern bool is_builtin(Oid objectId);
 extern bool is_shippable(Oid objectId, Oid classId, PgFdwRelationInfo *fpinfo);
 
 #endif   /* POSTGRES_FDW_H */
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 671e38c..49e48a4 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -171,20 +171,37 @@ SELECT COUNT(*) FROM ft1 t1;
 -- join two tables
 SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
 -- subquery
 SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
 -- subquery+MAX
 SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
 -- used in CTE
 WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
 -- fixed values
 SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- getting data sorted from the foreign table for merge join
+-- Since we are interested in merge join, disable other joins
+-- Merge join always produces the result in the sorted order, so no need for
+-- separate ORDER BY clause to get ordered results.
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join, expressions in the clauses appear in the equivalence class list
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+SET enable_hashjoin TO true;
+SET enable_nestloop TO true;
 
 -- ===================================================================
 -- WHERE with remotely-executable conditions
 -- ===================================================================
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1;         -- Var, OpExpr(b), Const
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL;        -- NullTest
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL;    -- NullTest
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- OpExpr(l)
@@ -805,20 +822,50 @@ update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
 update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
 
 select tableoid::regclass, * from bar order by 1,2;
 
+-- Tests for sort pushdown for merge joins for foreign children.
+-- Fill enough data in the tables, so that getting data sorted from the foreign
+-- server is cheaper than sorting it locally.
+truncate table loct1;
+truncate table only foo;
+\set num_rows_foo 2000
+insert into loct1 select generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2);
+insert into foo select generate_series(1, :num_rows_foo, 2), generate_series(1, :num_rows_foo, 2);
+set enable_hashjoin to false;
+set enable_nestloop to false;
+-- Sort pushdown for merge join is enabled only when use_remote_estimate is
+-- true.
+alter foreign table foo2 options (use_remote_estimate 'true');
+-- Create indexes so that sorted joining relations cost less.
+create index i_loct1_f1 on loct1(f1);
+create index i_foo_f1 on foo(f1);
+analyze foo;
+analyze loct1;
+-- inner join, expressions in the clauses appear in the equivalence class list
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+-- outer join, expression in the clauses do not appear in equivalence class list
+-- but no output change as compared to the previous query
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+set enable_hashjoin to true;
+set enable_nestloop to true;
+
 -- Test that WHERE CURRENT OF is not supported
 begin;
 declare c cursor for select * from bar where f1 = 7;
 fetch from c;
 update bar set f2 = null where current of c;
 rollback;
 
 drop table foo cascade;
 drop table bar cascade;
 drop table loct1;
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index c6b5d78..b81cc49 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -21,43 +21,40 @@
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/plannodes.h"
 #include "optimizer/clauses.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
 
-static PathKey *make_canonical_pathkey(PlannerInfo *root,
-					   EquivalenceClass *eclass, Oid opfamily,
-					   int strategy, bool nulls_first);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
 
 
 /****************************************************************************
  *		PATHKEY CONSTRUCTION AND REDUNDANCY TESTING
  ****************************************************************************/
 
 /*
  * make_canonical_pathkey
  *	  Given the parameters for a PathKey, find any pre-existing matching
  *	  pathkey in the query's list of "canonical" pathkeys.  Make a new
  *	  entry if there's not one already.
  *
  * Note that this function must not be used until after we have completed
  * merging EquivalenceClasses.  (We don't try to enforce that here; instead,
  * equivclass.c will complain if a merge occurs after root->canon_pathkeys
  * has become nonempty.)
  */
-static PathKey *
+PathKey *
 make_canonical_pathkey(PlannerInfo *root,
 					   EquivalenceClass *eclass, Oid opfamily,
 					   int strategy, bool nulls_first)
 {
 	PathKey    *pk;
 	ListCell   *lc;
 	MemoryContext oldcontext;
 
 	/* The passed eclass might be non-canonical, so chase up to the top */
 	while (eclass->ec_merged)
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7757741..4e00e9f 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -199,12 +199,15 @@ extern List *find_mergeclauses_for_pathkeys(PlannerInfo *root,
 extern List *select_outer_pathkeys_for_merge(PlannerInfo *root,
 								List *mergeclauses,
 								RelOptInfo *joinrel);
 extern List *make_inner_pathkeys_for_merge(PlannerInfo *root,
 							  List *mergeclauses,
 							  List *outer_pathkeys);
 extern List *truncate_useless_pathkeys(PlannerInfo *root,
 						  RelOptInfo *rel,
 						  List *pathkeys);
 extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel);
+extern PathKey *make_canonical_pathkey(PlannerInfo *root,
+					   EquivalenceClass *eclass, Oid opfamily,
+					   int strategy, bool nulls_first);
 
 #endif   /* PATHS_H */

sort_pd.sqltext/x-sql; charset=US-ASCII; name=sort_pd.sqlDownload

#20

Robert Haas

robertmhaas@gmail.com

about 10 years ago

In reply to: Ashutosh Bapat (#19)

1 attachment(s)

Re: Getting sorted data from foreign server for merge join

On Thu, Dec 17, 2015 at 3:32 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

On Wed, Dec 9, 2015 at 12:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Dec 2, 2015 at 6:45 AM, Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:

Thanks Ashutosh.

Re-reviewed and Re-verified the patch, pg_sort_all_pd_v5.patch
looks good to me.

This patch needs a rebase.

Done.

Thanks.

It's not going to work to say this is a patch proposed for commit when
it's still got a TODO comment in it that obviously needs to be
changed. And the formatting of that long comment is pretty weird,
too, and not consistent with other functions in that same file (e.g.
get_remote_estimate, ec_member_matches_foreign, create_cursor).

The TODO was present in v4 but not in v5 and is not present in v6 attached
here.. Formatted comment according estimate_path_cost_size(),
convert_prep_stmt_params().

Hrm, I must have been looking at the wrong version somehow. Sorry about that.

Aside from that, I think before we commit this, somebody should do
some testing that demonstrates that this is actually a good idea. Not
as part of the test case set for this patch, but just in general.
Merge joins are typically going to be relevant for large tables, but
the examples in the regression tests are necessarily tiny. I'd like
to see some sample data and some sample queries that get appreciably
faster with this code. If we can't find any, we don't need the code.

I tested the patch on my laptop with two types of queries, a join between
two foreign tables on different foreign servers (pointing to the same self
server) and a join between one foreign and one local table. The foreign
tables and servers are created using sort_pd_setup.sql attached. Foreign
tables pointed to table with index useful for join clause. Both the joining
tables had 10M rows. The execution time of query was measured for 100 runs
and average and standard deviation were calculated (using function
query_execution_stats() in script sort_pd.sql) and are presented below.

OK, cool.

I went over this patch in some detail today and did a lot of cosmetic
cleanup. The results are attached. I'm fairly happy with this
version, but let me know what you think. Of course, feedback from
others is more than welcome also.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

order-for-merge-pushdown-rmh.patchtext/x-patch; charset=US-ASCII; name=order-for-merge-pushdown-rmh.patchDownload

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 866a09b..f10752d 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -343,6 +343,76 @@ SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
  fixed    | 
 (1 row)
 
+-- Test forcing the remote server to produce sorted data for a merge join.
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join; expressions in the clauses appear in the equivalence class list
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2."C 1"
+   ->  Merge Join
+         Output: t1.c1, t2."C 1"
+         Merge Cond: (t1.c1 = t2."C 1")
+         ->  Foreign Scan on public.ft2 t1
+               Output: t1.c1
+               Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+         ->  Index Only Scan using t1_pkey on "S 1"."T 1" t2
+               Output: t2."C 1"
+(10 rows)
+
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+ c1  | C 1 
+-----+-----
+ 101 | 101
+ 102 | 102
+ 103 | 103
+ 104 | 104
+ 105 | 105
+ 106 | 106
+ 107 | 107
+ 108 | 108
+ 109 | 109
+ 110 | 110
+(10 rows)
+
+-- outer join; expressions in the clauses do not appear in equivalence class
+-- list but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2."C 1"
+   ->  Merge Left Join
+         Output: t1.c1, t2."C 1"
+         Merge Cond: (t1.c1 = t2."C 1")
+         ->  Foreign Scan on public.ft2 t1
+               Output: t1.c1
+               Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+         ->  Index Only Scan using t1_pkey on "S 1"."T 1" t2
+               Output: t2."C 1"
+(10 rows)
+
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+ c1  | C 1 
+-----+-----
+ 101 | 101
+ 102 | 102
+ 103 | 103
+ 104 | 104
+ 105 | 105
+ 106 | 106
+ 107 | 107
+ 108 | 108
+ 109 | 109
+ 110 | 110
+(10 rows)
+
+RESET enable_hashjoin;
+RESET enable_nestloop;
 -- ===================================================================
 -- WHERE with remotely-executable conditions
 -- ===================================================================
@@ -3538,6 +3608,101 @@ select tableoid::regclass, * from bar order by 1,2;
  bar2     |  7 | 177
 (6 rows)
 
+-- Test forcing the remote server to produce sorted data for a merge join,
+-- but the foreign table is an inheritance child.
+truncate table loct1;
+truncate table only foo;
+\set num_rows_foo 2000
+insert into loct1 select generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2);
+insert into foo select generate_series(1, :num_rows_foo, 2), generate_series(1, :num_rows_foo, 2);
+SET enable_hashjoin to false;
+SET enable_nestloop to false;
+alter foreign table foo2 options (use_remote_estimate 'true');
+create index i_loct1_f1 on loct1(f1);
+create index i_foo_f1 on foo(f1);
+analyze foo;
+analyze loct1;
+-- inner join, expressions in the clauses appear in the equivalence class list
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   Output: foo.f1, loct1.f1, foo.f2
+   ->  Sort
+         Output: foo.f1, loct1.f1, foo.f2
+         Sort Key: foo.f2
+         ->  Merge Join
+               Output: foo.f1, loct1.f1, foo.f2
+               Merge Cond: (foo.f1 = loct1.f1)
+               ->  Merge Append
+                     Sort Key: foo.f1
+                     ->  Index Scan using i_foo_f1 on public.foo
+                           Output: foo.f1, foo.f2
+                     ->  Foreign Scan on public.foo2
+                           Output: foo2.f1, foo2.f2
+                           Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f1 ASC
+               ->  Index Only Scan using i_loct1_f1 on public.loct1
+                     Output: loct1.f1
+(17 rows)
+
+select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+ f1 | f1 
+----+----
+ 20 | 20
+ 22 | 22
+ 24 | 24
+ 26 | 26
+ 28 | 28
+ 30 | 30
+ 32 | 32
+ 34 | 34
+ 36 | 36
+ 38 | 38
+(10 rows)
+
+-- outer join, expressions in the clauses do not appear in equivalence class
+-- list but no output change as compared to the previous query
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   Output: foo.f1, loct1.f1, foo.f2
+   ->  Sort
+         Output: foo.f1, loct1.f1, foo.f2
+         Sort Key: foo.f2
+         ->  Merge Left Join
+               Output: foo.f1, loct1.f1, foo.f2
+               Merge Cond: (foo.f1 = loct1.f1)
+               ->  Merge Append
+                     Sort Key: foo.f1
+                     ->  Index Scan using i_foo_f1 on public.foo
+                           Output: foo.f1, foo.f2
+                     ->  Foreign Scan on public.foo2
+                           Output: foo2.f1, foo2.f2
+                           Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f1 ASC
+               ->  Index Only Scan using i_loct1_f1 on public.loct1
+                     Output: loct1.f1
+(17 rows)
+
+select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+ f1 | f1 
+----+----
+ 10 | 10
+ 11 |   
+ 12 | 12
+ 13 |   
+ 14 | 14
+ 15 |   
+ 16 | 16
+ 17 |   
+ 18 | 18
+ 19 |   
+(10 rows)
+
+RESET enable_hashjoin;
+RESET enable_nestloop;
 -- Test that WHERE CURRENT OF is not supported
 begin;
 declare c cursor for select * from bar where f1 = 7;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9a014d4..d52f6ce 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -259,6 +259,8 @@ static bool postgresAnalyzeForeignTable(Relation relation,
 							BlockNumber *totalpages);
 static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
 							Oid serverOid);
+static List *get_useful_pathkeys_for_relation(PlannerInfo *root,
+								 RelOptInfo *rel);
 
 /*
  * Helper functions
@@ -509,6 +511,197 @@ postgresGetForeignRelSize(PlannerInfo *root,
 }
 
 /*
+ * get_useful_ecs_for_relation
+ *		Determine which EquivalenceClasses might be involved in useful
+ *		orderings of this relation.
+ *
+ * This function is in some respects a mirror image of the core function
+ * pathkeys_useful_for_merging: for a regular table, we know what indexes
+ * we have and want to test whether any of them are useful.  For a foreign
+ * table, we don't know what indexes are present on the remote side but
+ * want to speculate about which ones we'd like to use if they existed.
+ */
+static List *
+get_useful_ecs_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+	List	   *useful_eclass_list = NIL;
+	ListCell   *lc;
+	Relids		relids;
+
+	/*
+	 * First, consider whether any each active EC is potentially useful for a
+	 * merge join against this relation.
+	 */
+	if (rel->has_eclass_joins)
+	{
+		foreach(lc, root->eq_classes)
+		{
+			EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc);
+
+			if (eclass_useful_for_merging(root, cur_ec, rel))
+				useful_eclass_list = lappend(useful_eclass_list, cur_ec);
+		}
+	}
+
+	/*
+	 * Next, consider whether there are any non-EC derivable join clauses that
+	 * are merge-joinable.  If the joininfo list is empty, we can exit
+	 * quickly.
+	 */
+	if (rel->joininfo == NIL)
+		return useful_eclass_list;
+
+	/* If this is a child rel, we must use the topmost parent rel to search. */
+	if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+		relids = find_childrel_top_parent(root, rel)->relids;
+	else
+		relids = rel->relids;
+
+	/* Check each join clause in turn. */
+	foreach(lc, rel->joininfo)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc);
+
+		/* Consider only mergejoinable clauses */
+		if (restrictinfo->mergeopfamilies == NIL)
+			continue;
+
+		/* Make sure we've got canonical ECs. */
+		update_mergeclause_eclasses(root, restrictinfo);
+
+		/*
+		 * restrictinfo->mergeopfamilies != NIL is sufficient to guarantee
+		 * that left_ec and right_ec will be initialized, per comments in
+		 * distribute_qual_to_rels, and rel->joininfo should only contain ECs
+		 * where this relation appears on one side or the other.
+		 */
+		if (bms_is_subset(restrictinfo->right_ec->ec_relids, relids))
+			useful_eclass_list = list_append_unique_ptr(useful_eclass_list,
+													 restrictinfo->right_ec);
+		else
+		{
+			Assert(bms_is_subset(restrictinfo->left_ec->ec_relids, relids));
+			useful_eclass_list = list_append_unique_ptr(useful_eclass_list,
+													  restrictinfo->left_ec);
+		}
+	}
+
+	return useful_eclass_list;
+}
+
+/*
+ * get_useful_pathkeys_for_relation
+ *		Determine which orderings of a relation might be useful.
+ *
+ * Getting data in sorted order can be useful either because the requested
+ * order matches the final output ordering for the overall query we're
+ * planning, or because it enables an efficient merge join.  Here, we try
+ * to figure out which pathkeys to consider.
+ */
+static List *
+get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+	List	   *useful_pathkeys_list = NIL;
+	List	   *useful_eclass_list;
+	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
+	EquivalenceClass *query_ec = NULL;
+	ListCell   *lc;
+
+	/*
+	 * Pushing the query_pathkeys to the remote server is always worth
+	 * considering, because it might let us avoid a local sort.
+	 */
+	if (root->query_pathkeys)
+	{
+		bool		query_pathkeys_ok = true;
+
+		foreach(lc, root->query_pathkeys)
+		{
+			PathKey    *pathkey = (PathKey *) lfirst(lc);
+			EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
+			Expr	   *em_expr;
+
+			/*
+			 * The planner and executor don't have any clever strategy for
+			 * taking data sorted by a prefix of the query's pathkeys and
+			 * getting it to be sorted by all of those pathkeys. We'll just
+			 * end up resorting the entire data set.  So, unless we can push
+			 * down all of the query pathkeys, forget it.
+			 *
+			 * is_foreign_expr would detect volatile expressions as well, but
+			 * checking ec_has_volatile here saves some cycles.
+			 */
+			if (pathkey_ec->ec_has_volatile ||
+				!(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
+				!is_foreign_expr(root, rel, em_expr))
+			{
+				query_pathkeys_ok = false;
+				break;
+			}
+		}
+
+		if (query_pathkeys_ok)
+			useful_pathkeys_list = list_make1(list_copy(root->query_pathkeys));
+	}
+
+	/*
+	 * Even if we're not using remote estimates, having the remote side due
+	 * the sort generally won't be any worse than doing it locally, and it
+	 * might be much better if the remote side can generate data in the right
+	 * order without needing a sort at all.  However, what we're going to do
+	 * next is try to generate pathkeys that seem promising for possible merge
+	 * joins, and that's more speculative.  A wrong choice might hurt quite a
+	 * bit, so bail out if we can't use remote estimates.
+	 */
+	if (!fpinfo->use_remote_estimate)
+		return useful_pathkeys_list;
+
+	/* Get the list of interesting EquivalenceClasses. */
+	useful_eclass_list = get_useful_ecs_for_relation(root, rel);
+
+	/* Extract unique EC for query, if any, so we don't consider it again. */
+	if (list_length(root->query_pathkeys) == 1)
+	{
+		PathKey    *query_pathkey = linitial(root->query_pathkeys);
+
+		query_ec = query_pathkey->pk_eclass;
+	}
+
+	/*
+	 * As a heuristic, the only pathkeys we consider here are those of length
+	 * one.  It's surely possible to consider more, but since each one we
+	 * choose to consider will generate a round-trip to the remote side, we
+	 * need to be a bit cautious here.  It would sure be nice to have a local
+	 * cache of information about remote index definitions...
+	 */
+	foreach(lc, useful_eclass_list)
+	{
+		EquivalenceClass *cur_ec = lfirst(lc);
+		Expr	   *em_expr;
+		PathKey    *pathkey;
+
+		/* If redundant with what we did above, skip it. */
+		if (cur_ec == query_ec)
+			continue;
+
+		/* If no pushable expression for this rel, skip it. */
+		em_expr = find_em_expr_for_rel(cur_ec, rel);
+		if (em_expr == NULL || !is_foreign_expr(root, rel, em_expr))
+			continue;
+
+		/* Looks like we can generate a pathkey, so let's do it. */
+		pathkey = make_canonical_pathkey(root, cur_ec,
+										 linitial_oid(cur_ec->ec_opfamilies),
+										 BTLessStrategyNumber,
+										 false);
+		useful_pathkeys_list = lappend(useful_pathkeys_list,
+									   list_make1(pathkey));
+	}
+
+	return useful_pathkeys_list;
+}
+
+/*
  * postgresGetForeignPaths
  *		Create possible scan paths for a scan on the foreign table
  */
@@ -521,7 +714,7 @@ postgresGetForeignPaths(PlannerInfo *root,
 	ForeignPath *path;
 	List	   *ppi_list;
 	ListCell   *lc;
-	List	   *usable_pathkeys = NIL;
+	List	   *useful_pathkeys_list = NIL;		/* List of all pathkeys */
 
 	/*
 	 * Create simplest ForeignScan path node and add it to baserel.  This path
@@ -540,48 +733,18 @@ postgresGetForeignPaths(PlannerInfo *root,
 								   NIL);		/* no fdw_private list */
 	add_path(baserel, (Path *) path);
 
-	/*
-	 * Determine whether we can potentially push query pathkeys to the remote
-	 * side, avoiding a local sort.
-	 */
-	foreach(lc, root->query_pathkeys)
-	{
-		PathKey    *pathkey = (PathKey *) lfirst(lc);
-		EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
-		Expr	   *em_expr;
-
-		/*
-		 * is_foreign_expr would detect volatile expressions as well, but
-		 * ec_has_volatile saves some cycles.
-		 */
-		if (!pathkey_ec->ec_has_volatile &&
-			(em_expr = find_em_expr_for_rel(pathkey_ec, baserel)) &&
-			is_foreign_expr(root, baserel, em_expr))
-			usable_pathkeys = lappend(usable_pathkeys, pathkey);
-		else
-		{
-			/*
-			 * The planner and executor don't have any clever strategy for
-			 * taking data sorted by a prefix of the query's pathkeys and
-			 * getting it to be sorted by all of those pathekeys.  We'll just
-			 * end up resorting the entire data set.  So, unless we can push
-			 * down all of the query pathkeys, forget it.
-			 */
-			list_free(usable_pathkeys);
-			usable_pathkeys = NIL;
-			break;
-		}
-	}
+	useful_pathkeys_list = get_useful_pathkeys_for_relation(root, baserel);
 
-	/* Create a path with useful pathkeys, if we found one. */
-	if (usable_pathkeys != NULL)
+	/* Create one path for each set of pathkeys we found above. */
+	foreach(lc, useful_pathkeys_list)
 	{
 		double		rows;
 		int			width;
 		Cost		startup_cost;
 		Cost		total_cost;
+		List	   *useful_pathkeys = lfirst(lc);
 
-		estimate_path_cost_size(root, baserel, NIL, usable_pathkeys,
+		estimate_path_cost_size(root, baserel, NIL, useful_pathkeys,
 								&rows, &width, &startup_cost, &total_cost);
 
 		add_path(baserel, (Path *)
@@ -589,7 +752,7 @@ postgresGetForeignPaths(PlannerInfo *root,
 										 rows,
 										 startup_cost,
 										 total_cost,
-										 usable_pathkeys,
+										 useful_pathkeys,
 										 NULL,
 										 NULL,
 										 NIL));
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 671e38c..b903034 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -178,6 +178,20 @@ SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
 WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
 -- fixed values
 SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- Test forcing the remote server to produce sorted data for a merge join.
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join; expressions in the clauses appear in the equivalence class list
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+-- outer join; expressions in the clauses do not appear in equivalence class
+-- list but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+RESET enable_hashjoin;
+RESET enable_nestloop;
 
 -- ===================================================================
 -- WHERE with remotely-executable conditions
@@ -812,6 +826,32 @@ where bar.f1 = ss.f1;
 
 select tableoid::regclass, * from bar order by 1,2;
 
+-- Test forcing the remote server to produce sorted data for a merge join,
+-- but the foreign table is an inheritance child.
+truncate table loct1;
+truncate table only foo;
+\set num_rows_foo 2000
+insert into loct1 select generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2);
+insert into foo select generate_series(1, :num_rows_foo, 2), generate_series(1, :num_rows_foo, 2);
+SET enable_hashjoin to false;
+SET enable_nestloop to false;
+alter foreign table foo2 options (use_remote_estimate 'true');
+create index i_loct1_f1 on loct1(f1);
+create index i_foo_f1 on foo(f1);
+analyze foo;
+analyze loct1;
+-- inner join, expressions in the clauses appear in the equivalence class list
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+-- outer join, expressions in the clauses do not appear in equivalence class
+-- list but no output change as compared to the previous query
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+RESET enable_hashjoin;
+RESET enable_nestloop;
+
 -- Test that WHERE CURRENT OF is not supported
 begin;
 declare c cursor for select * from bar where f1 = 7;
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index c6b5d78..b81cc49 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -28,9 +28,6 @@
 #include "utils/lsyscache.h"
 
 
-static PathKey *make_canonical_pathkey(PlannerInfo *root,
-					   EquivalenceClass *eclass, Oid opfamily,
-					   int strategy, bool nulls_first);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
 
@@ -50,7 +47,7 @@ static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
  * equivclass.c will complain if a merge occurs after root->canon_pathkeys
  * has become nonempty.)
  */
-static PathKey *
+PathKey *
 make_canonical_pathkey(PlannerInfo *root,
 					   EquivalenceClass *eclass, Oid opfamily,
 					   int strategy, bool nulls_first)
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7757741..4e00e9f 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -206,5 +206,8 @@ extern List *truncate_useless_pathkeys(PlannerInfo *root,
 						  RelOptInfo *rel,
 						  List *pathkeys);
 extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel);
+extern PathKey *make_canonical_pathkey(PlannerInfo *root,
+					   EquivalenceClass *eclass, Oid opfamily,
+					   int strategy, bool nulls_first);
 
 #endif   /* PATHS_H */

#21

Rushabh Lathia

rushabh.lathia@gmail.com

about 10 years ago

In reply to: Robert Haas (#20)

Re: Getting sorted data from foreign server for merge join

On Sat, Dec 19, 2015 at 2:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Dec 17, 2015 at 3:32 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

On Wed, Dec 9, 2015 at 12:14 AM, Robert Haas <robertmhaas@gmail.com>

wrote:

On Wed, Dec 2, 2015 at 6:45 AM, Rushabh Lathia <

rushabh.lathia@gmail.com>

wrote:

Thanks Ashutosh.

Re-reviewed and Re-verified the patch, pg_sort_all_pd_v5.patch
looks good to me.

This patch needs a rebase.

Done.

Thanks.

It's not going to work to say this is a patch proposed for commit when
it's still got a TODO comment in it that obviously needs to be
changed. And the formatting of that long comment is pretty weird,
too, and not consistent with other functions in that same file (e.g.
get_remote_estimate, ec_member_matches_foreign, create_cursor).

The TODO was present in v4 but not in v5 and is not present in v6

attached

here.. Formatted comment according estimate_path_cost_size(),
convert_prep_stmt_params().

Hrm, I must have been looking at the wrong version somehow. Sorry about
that.

Aside from that, I think before we commit this, somebody should do
some testing that demonstrates that this is actually a good idea. Not
as part of the test case set for this patch, but just in general.
Merge joins are typically going to be relevant for large tables, but
the examples in the regression tests are necessarily tiny. I'd like
to see some sample data and some sample queries that get appreciably
faster with this code. If we can't find any, we don't need the code.

I tested the patch on my laptop with two types of queries, a join between
two foreign tables on different foreign servers (pointing to the same

self

server) and a join between one foreign and one local table. The foreign
tables and servers are created using sort_pd_setup.sql attached. Foreign
tables pointed to table with index useful for join clause. Both the

joining

tables had 10M rows. The execution time of query was measured for 100

runs

and average and standard deviation were calculated (using function
query_execution_stats() in script sort_pd.sql) and are presented below.

OK, cool.

I went over this patch in some detail today and did a lot of cosmetic
cleanup. The results are attached. I'm fairly happy with this
version, but let me know what you think. Of course, feedback from
others is more than welcome also.

Had a quick look at the patch changes and also performed basic
sanity check. Patch looks good to me.

Thanks.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Rushabh Lathia
www.EnterpriseDB.com

#22

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 10 years ago

In reply to: Robert Haas (#20)

1 attachment(s)

Re: Getting sorted data from foreign server for merge join

I went over this patch in some detail today and did a lot of cosmetic
cleanup. The results are attached. I'm fairly happy with this
version, but let me know what you think. Of course, feedback from
others is more than welcome also.

Attached patch with some cosmetic changes (listed here for your quick
reference)
1. , was replaced with ; in comment "inner join, expressions in the " at
one place, which is correct, but missed other place.
2. The comment "First, consider whether any each active EC is potentially"
should use either "any" or "each". I have reworded it as "First, consider
whether any of the active ECs is potentially ...". Or we can use "First,
find all of the active ECs which are potentially ....".
3. "having the remote side due the sort generally won't be any worse ..." -
instead of "due" we should use "do"?
4. Added static prototype of function get_useful_ecs_for_relation().
5. The comment "Extract unique EC for query, if any, so we don't consider
it again." is too crisp. Phrase "Unique EC for query" is confusing; EC can
not be associated with a query per say and EC's are always unique because
of canonicalisation. May be we should reword it as "Extract single EC for
ordering of query, if any, so we don't consider it again." Is that cryptic
as well?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

order-for-merge-pushdown-rmh_v2.patchbinary/octet-stream; name=order-for-merge-pushdown-rmh_v2.patchDownload

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 866a09b..b471c67 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -336,20 +336,90 @@ WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4
  10 |  0 | 00010 | Sun Jan 11 00:00:00 1970 PST
 (10 rows)
 
 -- fixed values
 SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
  ?column? | ?column? 
 ----------+----------
  fixed    | 
 (1 row)
 
+-- Test forcing the remote server to produce sorted data for a merge join.
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join; expressions in the clauses appear in the equivalence class list
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2."C 1"
+   ->  Merge Join
+         Output: t1.c1, t2."C 1"
+         Merge Cond: (t1.c1 = t2."C 1")
+         ->  Foreign Scan on public.ft2 t1
+               Output: t1.c1
+               Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+         ->  Index Only Scan using t1_pkey on "S 1"."T 1" t2
+               Output: t2."C 1"
+(10 rows)
+
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+ c1  | C 1 
+-----+-----
+ 101 | 101
+ 102 | 102
+ 103 | 103
+ 104 | 104
+ 105 | 105
+ 106 | 106
+ 107 | 107
+ 108 | 108
+ 109 | 109
+ 110 | 110
+(10 rows)
+
+-- outer join; expressions in the clauses do not appear in equivalence class
+-- list but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Limit
+   Output: t1.c1, t2."C 1"
+   ->  Merge Left Join
+         Output: t1.c1, t2."C 1"
+         Merge Cond: (t1.c1 = t2."C 1")
+         ->  Foreign Scan on public.ft2 t1
+               Output: t1.c1
+               Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+         ->  Index Only Scan using t1_pkey on "S 1"."T 1" t2
+               Output: t2."C 1"
+(10 rows)
+
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+ c1  | C 1 
+-----+-----
+ 101 | 101
+ 102 | 102
+ 103 | 103
+ 104 | 104
+ 105 | 105
+ 106 | 106
+ 107 | 107
+ 108 | 108
+ 109 | 109
+ 110 | 110
+(10 rows)
+
+RESET enable_hashjoin;
+RESET enable_nestloop;
 -- ===================================================================
 -- WHERE with remotely-executable conditions
 -- ===================================================================
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1;         -- Var, OpExpr(b), Const
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
    Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 1))
 (3 rows)
@@ -3531,20 +3601,115 @@ select tableoid::regclass, * from bar order by 1,2;
  tableoid | f1 | f2  
 ----------+----+-----
  bar      |  1 | 211
  bar      |  2 | 222
  bar      |  6 | 166
  bar2     |  3 | 233
  bar2     |  4 | 244
  bar2     |  7 | 177
 (6 rows)
 
+-- Test forcing the remote server to produce sorted data for a merge join,
+-- but the foreign table is an inheritance child.
+truncate table loct1;
+truncate table only foo;
+\set num_rows_foo 2000
+insert into loct1 select generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2);
+insert into foo select generate_series(1, :num_rows_foo, 2), generate_series(1, :num_rows_foo, 2);
+SET enable_hashjoin to false;
+SET enable_nestloop to false;
+alter foreign table foo2 options (use_remote_estimate 'true');
+create index i_loct1_f1 on loct1(f1);
+create index i_foo_f1 on foo(f1);
+analyze foo;
+analyze loct1;
+-- inner join; expressions in the clauses appear in the equivalence class list
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   Output: foo.f1, loct1.f1, foo.f2
+   ->  Sort
+         Output: foo.f1, loct1.f1, foo.f2
+         Sort Key: foo.f2
+         ->  Merge Join
+               Output: foo.f1, loct1.f1, foo.f2
+               Merge Cond: (foo.f1 = loct1.f1)
+               ->  Merge Append
+                     Sort Key: foo.f1
+                     ->  Index Scan using i_foo_f1 on public.foo
+                           Output: foo.f1, foo.f2
+                     ->  Foreign Scan on public.foo2
+                           Output: foo2.f1, foo2.f2
+                           Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f1 ASC
+               ->  Index Only Scan using i_loct1_f1 on public.loct1
+                     Output: loct1.f1
+(17 rows)
+
+select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+ f1 | f1 
+----+----
+ 20 | 20
+ 22 | 22
+ 24 | 24
+ 26 | 26
+ 28 | 28
+ 30 | 30
+ 32 | 32
+ 34 | 34
+ 36 | 36
+ 38 | 38
+(10 rows)
+
+-- outer join; expressions in the clauses do not appear in equivalence class
+-- list but no output change as compared to the previous query
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Limit
+   Output: foo.f1, loct1.f1, foo.f2
+   ->  Sort
+         Output: foo.f1, loct1.f1, foo.f2
+         Sort Key: foo.f2
+         ->  Merge Left Join
+               Output: foo.f1, loct1.f1, foo.f2
+               Merge Cond: (foo.f1 = loct1.f1)
+               ->  Merge Append
+                     Sort Key: foo.f1
+                     ->  Index Scan using i_foo_f1 on public.foo
+                           Output: foo.f1, foo.f2
+                     ->  Foreign Scan on public.foo2
+                           Output: foo2.f1, foo2.f2
+                           Remote SQL: SELECT f1, f2 FROM public.loct1 ORDER BY f1 ASC
+               ->  Index Only Scan using i_loct1_f1 on public.loct1
+                     Output: loct1.f1
+(17 rows)
+
+select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+ f1 | f1 
+----+----
+ 10 | 10
+ 11 |   
+ 12 | 12
+ 13 |   
+ 14 | 14
+ 15 |   
+ 16 | 16
+ 17 |   
+ 18 | 18
+ 19 |   
+(10 rows)
+
+RESET enable_hashjoin;
+RESET enable_nestloop;
 -- Test that WHERE CURRENT OF is not supported
 begin;
 declare c cursor for select * from bar where f1 = 7;
 fetch from c;
  f1 | f2  
 ----+-----
   7 | 177
 (1 row)
 
 update bar set f2 = null where current of c;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9a014d4..121c225 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -252,20 +252,23 @@ static void postgresExplainForeignScan(ForeignScanState *node,
 static void postgresExplainForeignModify(ModifyTableState *mtstate,
 							 ResultRelInfo *rinfo,
 							 List *fdw_private,
 							 int subplan_index,
 							 ExplainState *es);
 static bool postgresAnalyzeForeignTable(Relation relation,
 							AcquireSampleRowsFunc *func,
 							BlockNumber *totalpages);
 static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
 							Oid serverOid);
+static List *get_useful_pathkeys_for_relation(PlannerInfo *root,
+								 RelOptInfo *rel);
+static List *get_useful_ecs_for_relation(PlannerInfo *root, RelOptInfo *rel);
 
 /*
  * Helper functions
  */
 static void estimate_path_cost_size(PlannerInfo *root,
 						RelOptInfo *baserel,
 						List *join_conds,
 						List *pathkeys,
 						double *p_rows, int *p_width,
 						Cost *p_startup_cost, Cost *p_total_cost);
@@ -502,101 +505,262 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		set_baserel_size_estimates(root, baserel);
 
 		/* Fill in basically-bogus cost estimates for use later. */
 		estimate_path_cost_size(root, baserel, NIL, NIL,
 								&fpinfo->rows, &fpinfo->width,
 								&fpinfo->startup_cost, &fpinfo->total_cost);
 	}
 }
 
 /*
+ * get_useful_ecs_for_relation
+ *		Determine which EquivalenceClasses might be involved in useful
+ *		orderings of this relation.
+ *
+ * This function is in some respects a mirror image of the core function
+ * pathkeys_useful_for_merging: for a regular table, we know what indexes
+ * we have and want to test whether any of them are useful.  For a foreign
+ * table, we don't know what indexes are present on the remote side but
+ * want to speculate about which ones we'd like to use if they existed.
+ */
+static List *
+get_useful_ecs_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+	List	   *useful_eclass_list = NIL;
+	ListCell   *lc;
+	Relids		relids;
+
+	/*
+	 * First, consider whether any of the active ECs is potentially useful for a
+	 * merge join against this relation.
+	 */
+	if (rel->has_eclass_joins)
+	{
+		foreach(lc, root->eq_classes)
+		{
+			EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc);
+
+			if (eclass_useful_for_merging(root, cur_ec, rel))
+				useful_eclass_list = lappend(useful_eclass_list, cur_ec);
+		}
+	}
+
+	/*
+	 * Next, consider whether there are any non-EC derivable join clauses that
+	 * are merge-joinable.  If the joininfo list is empty, we can exit
+	 * quickly.
+	 */
+	if (rel->joininfo == NIL)
+		return useful_eclass_list;
+
+	/* If this is a child rel, we must use the topmost parent rel to search. */
+	if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+		relids = find_childrel_top_parent(root, rel)->relids;
+	else
+		relids = rel->relids;
+
+	/* Check each join clause in turn. */
+	foreach(lc, rel->joininfo)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc);
+
+		/* Consider only mergejoinable clauses */
+		if (restrictinfo->mergeopfamilies == NIL)
+			continue;
+
+		/* Make sure we've got canonical ECs. */
+		update_mergeclause_eclasses(root, restrictinfo);
+
+		/*
+		 * restrictinfo->mergeopfamilies != NIL is sufficient to guarantee
+		 * that left_ec and right_ec will be initialized, per comments in
+		 * distribute_qual_to_rels, and rel->joininfo should only contain ECs
+		 * where this relation appears on one side or the other.
+		 */
+		if (bms_is_subset(restrictinfo->right_ec->ec_relids, relids))
+			useful_eclass_list = list_append_unique_ptr(useful_eclass_list,
+													 restrictinfo->right_ec);
+		else
+		{
+			Assert(bms_is_subset(restrictinfo->left_ec->ec_relids, relids));
+			useful_eclass_list = list_append_unique_ptr(useful_eclass_list,
+													  restrictinfo->left_ec);
+		}
+	}
+
+	return useful_eclass_list;
+}
+
+/*
+ * get_useful_pathkeys_for_relation
+ *		Determine which orderings of a relation might be useful.
+ *
+ * Getting data in sorted order can be useful either because the requested
+ * order matches the final output ordering for the overall query we're
+ * planning, or because it enables an efficient merge join.  Here, we try
+ * to figure out which pathkeys to consider.
+ */
+static List *
+get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+	List	   *useful_pathkeys_list = NIL;
+	List	   *useful_eclass_list;
+	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
+	EquivalenceClass *query_ec = NULL;
+	ListCell   *lc;
+
+	/*
+	 * Pushing the query_pathkeys to the remote server is always worth
+	 * considering, because it might let us avoid a local sort.
+	 */
+	if (root->query_pathkeys)
+	{
+		bool		query_pathkeys_ok = true;
+
+		foreach(lc, root->query_pathkeys)
+		{
+			PathKey    *pathkey = (PathKey *) lfirst(lc);
+			EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
+			Expr	   *em_expr;
+
+			/*
+			 * The planner and executor don't have any clever strategy for
+			 * taking data sorted by a prefix of the query's pathkeys and
+			 * getting it to be sorted by all of those pathkeys. We'll just
+			 * end up resorting the entire data set.  So, unless we can push
+			 * down all of the query pathkeys, forget it.
+			 *
+			 * is_foreign_expr would detect volatile expressions as well, but
+			 * checking ec_has_volatile here saves some cycles.
+			 */
+			if (pathkey_ec->ec_has_volatile ||
+				!(em_expr = find_em_expr_for_rel(pathkey_ec, rel)) ||
+				!is_foreign_expr(root, rel, em_expr))
+			{
+				query_pathkeys_ok = false;
+				break;
+			}
+		}
+
+		if (query_pathkeys_ok)
+			useful_pathkeys_list = list_make1(list_copy(root->query_pathkeys));
+	}
+
+	/*
+	 * Even if we're not using remote estimates, having the remote side do
+	 * the sort generally won't be any worse than doing it locally, and it
+	 * might be much better if the remote side can generate data in the right
+	 * order without needing a sort at all.  However, what we're going to do
+	 * next is try to generate pathkeys that seem promising for possible merge
+	 * joins, and that's more speculative.  A wrong choice might hurt quite a
+	 * bit, so bail out if we can't use remote estimates.
+	 */
+	if (!fpinfo->use_remote_estimate)
+		return useful_pathkeys_list;
+
+	/* Get the list of interesting EquivalenceClasses. */
+	useful_eclass_list = get_useful_ecs_for_relation(root, rel);
+
+	/* Extract unique EC for query, if any, so we don't consider it again. */
+	if (list_length(root->query_pathkeys) == 1)
+	{
+		PathKey    *query_pathkey = linitial(root->query_pathkeys);
+
+		query_ec = query_pathkey->pk_eclass;
+	}
+
+	/*
+	 * As a heuristic, the only pathkeys we consider here are those of length
+	 * one.  It's surely possible to consider more, but since each one we
+	 * choose to consider will generate a round-trip to the remote side, we
+	 * need to be a bit cautious here.  It would sure be nice to have a local
+	 * cache of information about remote index definitions...
+	 */
+	foreach(lc, useful_eclass_list)
+	{
+		EquivalenceClass *cur_ec = lfirst(lc);
+		Expr	   *em_expr;
+		PathKey    *pathkey;
+
+		/* If redundant with what we did above, skip it. */
+		if (cur_ec == query_ec)
+			continue;
+
+		/* If no pushable expression for this rel, skip it. */
+		em_expr = find_em_expr_for_rel(cur_ec, rel);
+		if (em_expr == NULL || !is_foreign_expr(root, rel, em_expr))
+			continue;
+
+		/* Looks like we can generate a pathkey, so let's do it. */
+		pathkey = make_canonical_pathkey(root, cur_ec,
+										 linitial_oid(cur_ec->ec_opfamilies),
+										 BTLessStrategyNumber,
+										 false);
+		useful_pathkeys_list = lappend(useful_pathkeys_list,
+									   list_make1(pathkey));
+	}
+
+	return useful_pathkeys_list;
+}
+
+/*
  * postgresGetForeignPaths
  *		Create possible scan paths for a scan on the foreign table
  */
 static void
 postgresGetForeignPaths(PlannerInfo *root,
 						RelOptInfo *baserel,
 						Oid foreigntableid)
 {
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
 	ForeignPath *path;
 	List	   *ppi_list;
 	ListCell   *lc;
-	List	   *usable_pathkeys = NIL;
+	List	   *useful_pathkeys_list = NIL;		/* List of all pathkeys */
 
 	/*
 	 * Create simplest ForeignScan path node and add it to baserel.  This path
 	 * corresponds to SeqScan path of regular tables (though depending on what
 	 * baserestrict conditions we were able to send to remote, there might
 	 * actually be an indexscan happening there).  We already did all the work
 	 * to estimate cost and size of this path.
 	 */
 	path = create_foreignscan_path(root, baserel,
 								   fpinfo->rows,
 								   fpinfo->startup_cost,
 								   fpinfo->total_cost,
 								   NIL, /* no pathkeys */
 								   NULL,		/* no outer rel either */
 								   NULL,		/* no extra plan */
 								   NIL);		/* no fdw_private list */
 	add_path(baserel, (Path *) path);
 
-	/*
-	 * Determine whether we can potentially push query pathkeys to the remote
-	 * side, avoiding a local sort.
-	 */
-	foreach(lc, root->query_pathkeys)
-	{
-		PathKey    *pathkey = (PathKey *) lfirst(lc);
-		EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
-		Expr	   *em_expr;
-
-		/*
-		 * is_foreign_expr would detect volatile expressions as well, but
-		 * ec_has_volatile saves some cycles.
-		 */
-		if (!pathkey_ec->ec_has_volatile &&
-			(em_expr = find_em_expr_for_rel(pathkey_ec, baserel)) &&
-			is_foreign_expr(root, baserel, em_expr))
-			usable_pathkeys = lappend(usable_pathkeys, pathkey);
-		else
-		{
-			/*
-			 * The planner and executor don't have any clever strategy for
-			 * taking data sorted by a prefix of the query's pathkeys and
-			 * getting it to be sorted by all of those pathekeys.  We'll just
-			 * end up resorting the entire data set.  So, unless we can push
-			 * down all of the query pathkeys, forget it.
-			 */
-			list_free(usable_pathkeys);
-			usable_pathkeys = NIL;
-			break;
-		}
-	}
+	useful_pathkeys_list = get_useful_pathkeys_for_relation(root, baserel);
 
-	/* Create a path with useful pathkeys, if we found one. */
-	if (usable_pathkeys != NULL)
+	/* Create one path for each set of pathkeys we found above. */
+	foreach(lc, useful_pathkeys_list)
 	{
 		double		rows;
 		int			width;
 		Cost		startup_cost;
 		Cost		total_cost;
+		List	   *useful_pathkeys = lfirst(lc);
 
-		estimate_path_cost_size(root, baserel, NIL, usable_pathkeys,
+		estimate_path_cost_size(root, baserel, NIL, useful_pathkeys,
 								&rows, &width, &startup_cost, &total_cost);
 
 		add_path(baserel, (Path *)
 				 create_foreignscan_path(root, baserel,
 										 rows,
 										 startup_cost,
 										 total_cost,
-										 usable_pathkeys,
+										 useful_pathkeys,
 										 NULL,
 										 NULL,
 										 NIL));
 	}
 
 	/*
 	 * If we're not using remote estimates, stop here.  We have no way to
 	 * estimate whether any join clauses would be worth sending across, so
 	 * don't bother building parameterized paths.
 	 */
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 671e38c..73fa9f6 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -171,20 +171,34 @@ SELECT COUNT(*) FROM ft1 t1;
 -- join two tables
 SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
 -- subquery
 SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
 -- subquery+MAX
 SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY c1;
 -- used in CTE
 WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2, t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
 -- fixed values
 SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
+-- Test forcing the remote server to produce sorted data for a merge join.
+SET enable_hashjoin TO false;
+SET enable_nestloop TO false;
+-- inner join; expressions in the clauses appear in the equivalence class list
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+-- outer join; expressions in the clauses do not appear in equivalence class
+-- list but no output change as compared to the previous query
+EXPLAIN (VERBOSE, COSTS false)
+	SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+RESET enable_hashjoin;
+RESET enable_nestloop;
 
 -- ===================================================================
 -- WHERE with remotely-executable conditions
 -- ===================================================================
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1;         -- Var, OpExpr(b), Const
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL;        -- NullTest
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL;    -- NullTest
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = -c1;          -- OpExpr(l)
@@ -805,20 +819,46 @@ update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
 update bar set f2 = f2 + 100
 from
   ( select f1 from foo union all select f1+3 from foo ) ss
 where bar.f1 = ss.f1;
 
 select tableoid::regclass, * from bar order by 1,2;
 
+-- Test forcing the remote server to produce sorted data for a merge join,
+-- but the foreign table is an inheritance child.
+truncate table loct1;
+truncate table only foo;
+\set num_rows_foo 2000
+insert into loct1 select generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2), generate_series(0, :num_rows_foo, 2);
+insert into foo select generate_series(1, :num_rows_foo, 2), generate_series(1, :num_rows_foo, 2);
+SET enable_hashjoin to false;
+SET enable_nestloop to false;
+alter foreign table foo2 options (use_remote_estimate 'true');
+create index i_loct1_f1 on loct1(f1);
+create index i_foo_f1 on foo(f1);
+analyze foo;
+analyze loct1;
+-- inner join; expressions in the clauses appear in the equivalence class list
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+select foo.f1, loct1.f1 from foo join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+-- outer join; expressions in the clauses do not appear in equivalence class
+-- list but no output change as compared to the previous query
+explain (verbose, costs off)
+	select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+select foo.f1, loct1.f1 from foo left join loct1 on (foo.f1 = loct1.f1) order by foo.f2 offset 10 limit 10;
+RESET enable_hashjoin;
+RESET enable_nestloop;
+
 -- Test that WHERE CURRENT OF is not supported
 begin;
 declare c cursor for select * from bar where f1 = 7;
 fetch from c;
 update bar set f2 = null where current of c;
 rollback;
 
 drop table foo cascade;
 drop table bar cascade;
 drop table loct1;
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index c6b5d78..b81cc49 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -21,43 +21,40 @@
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/plannodes.h"
 #include "optimizer/clauses.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
 
-static PathKey *make_canonical_pathkey(PlannerInfo *root,
-					   EquivalenceClass *eclass, Oid opfamily,
-					   int strategy, bool nulls_first);
 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
 
 
 /****************************************************************************
  *		PATHKEY CONSTRUCTION AND REDUNDANCY TESTING
  ****************************************************************************/
 
 /*
  * make_canonical_pathkey
  *	  Given the parameters for a PathKey, find any pre-existing matching
  *	  pathkey in the query's list of "canonical" pathkeys.  Make a new
  *	  entry if there's not one already.
  *
  * Note that this function must not be used until after we have completed
  * merging EquivalenceClasses.  (We don't try to enforce that here; instead,
  * equivclass.c will complain if a merge occurs after root->canon_pathkeys
  * has become nonempty.)
  */
-static PathKey *
+PathKey *
 make_canonical_pathkey(PlannerInfo *root,
 					   EquivalenceClass *eclass, Oid opfamily,
 					   int strategy, bool nulls_first)
 {
 	PathKey    *pk;
 	ListCell   *lc;
 	MemoryContext oldcontext;
 
 	/* The passed eclass might be non-canonical, so chase up to the top */
 	while (eclass->ec_merged)
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7757741..4e00e9f 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -199,12 +199,15 @@ extern List *find_mergeclauses_for_pathkeys(PlannerInfo *root,
 extern List *select_outer_pathkeys_for_merge(PlannerInfo *root,
 								List *mergeclauses,
 								RelOptInfo *joinrel);
 extern List *make_inner_pathkeys_for_merge(PlannerInfo *root,
 							  List *mergeclauses,
 							  List *outer_pathkeys);
 extern List *truncate_useless_pathkeys(PlannerInfo *root,
 						  RelOptInfo *rel,
 						  List *pathkeys);
 extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel);
+extern PathKey *make_canonical_pathkey(PlannerInfo *root,
+					   EquivalenceClass *eclass, Oid opfamily,
+					   int strategy, bool nulls_first);
 
 #endif   /* PATHS_H */

#23

Robert Haas

robertmhaas@gmail.com

about 10 years ago

In reply to: Ashutosh Bapat (#22)

Re: Getting sorted data from foreign server for merge join

On Mon, Dec 21, 2015 at 6:34 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

I went over this patch in some detail today and did a lot of cosmetic
cleanup. The results are attached. I'm fairly happy with this
version, but let me know what you think. Of course, feedback from
others is more than welcome also.

Attached patch with some cosmetic changes (listed here for your quick
reference)
1. , was replaced with ; in comment "inner join, expressions in the " at one
place, which is correct, but missed other place.
2. The comment "First, consider whether any each active EC is potentially"
should use either "any" or "each". I have reworded it as "First, consider
whether any of the active ECs is potentially ...". Or we can use "First,
find all of the active ECs which are potentially ....".
3. "having the remote side due the sort generally won't be any worse ..." -
instead of "due" we should use "do"?
4. Added static prototype of function get_useful_ecs_for_relation().
5. The comment "Extract unique EC for query, if any, so we don't consider it
again." is too crisp. Phrase "Unique EC for query" is confusing; EC can not
be associated with a query per say and EC's are always unique because of
canonicalisation. May be we should reword it as "Extract single EC for
ordering of query, if any, so we don't consider it again." Is that cryptic
as well?

Thanks. I committed this version with one small tweak.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 10 years ago

In reply to: Robert Haas (#23)

Re: Getting sorted data from foreign server for merge join

Thanks.

On Wed, Dec 23, 2015 at 12:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Dec 21, 2015 at 6:34 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

I went over this patch in some detail today and did a lot of cosmetic
cleanup. The results are attached. I'm fairly happy with this
version, but let me know what you think. Of course, feedback from
others is more than welcome also.

Attached patch with some cosmetic changes (listed here for your quick
reference)
1. , was replaced with ; in comment "inner join, expressions in the " at

one

place, which is correct, but missed other place.
2. The comment "First, consider whether any each active EC is

potentially"

should use either "any" or "each". I have reworded it as "First, consider
whether any of the active ECs is potentially ...". Or we can use "First,
find all of the active ECs which are potentially ....".
3. "having the remote side due the sort generally won't be any worse

..." -

instead of "due" we should use "do"?
4. Added static prototype of function get_useful_ecs_for_relation().
5. The comment "Extract unique EC for query, if any, so we don't

consider it

again." is too crisp. Phrase "Unique EC for query" is confusing; EC can

not

be associated with a query per say and EC's are always unique because of
canonicalisation. May be we should reword it as "Extract single EC for
ordering of query, if any, so we don't consider it again." Is that

cryptic

as well?

Thanks. I committed this version with one small tweak.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#25

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 10 years ago

In reply to: Robert Haas (#23)

1 attachment(s)

Re: Getting sorted data from foreign server for merge join

Hi,
In get_useful_ecs_for_relation(), while checking whether to use left or
right argument of a mergejoinable operator, the arguments to
bms_is_subset() are passed in reverse order. bms_is_subset() checks whether
the first argument in subset of the second, but in this function the subset
to be checked is passed as the second argument. Because of this following
query when run in contrib_regression database after "make installcheck" in
contrib/postgres_fdw trips assertion Assert(bms_is_subset(relids,
restrictinfo->left_ec->ec_relids));

EXPLAIN (COSTS false, VERBOSE)
SELECT t1."C 1" FROM "S 1"."T 1" t1 left join ft1 t2 join ft2 t3 on
(t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;

PFA patch to fix it.

Reason why it was not caught earlier: this code is excercised when
expressions in left/right join clauses are considered. For mergejoinable
clauses, the left and right side are not merged into a single EC but appear
as separate ECs. All tests in the postgres_fdw.sql that excercised this
code involved only two relations, thus ec_relids and relids had only a
single member and bms_is_subset() returned true. But in the above query the
left/right EC has relids of t2 and t3, which caused bms_is_subset() to
return false and thus trip the assertion.

I have added above query and the output to the tests. The output of EXPLAIN
shows that ORDER BY clause is pushed down for ft2 but not ft1. This is
because ft2 has use_remote_estimate true and ft1 has that false. So, we
push down ORDER BY corresponding merge join for ft2 but not for ft1.

On Wed, Dec 23, 2015 at 12:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Dec 21, 2015 at 6:34 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

I went over this patch in some detail today and did a lot of cosmetic
cleanup. The results are attached. I'm fairly happy with this
version, but let me know what you think. Of course, feedback from
others is more than welcome also.

Attached patch with some cosmetic changes (listed here for your quick
reference)
1. , was replaced with ; in comment "inner join, expressions in the " at

one

place, which is correct, but missed other place.
2. The comment "First, consider whether any each active EC is

potentially"

should use either "any" or "each". I have reworded it as "First, consider
whether any of the active ECs is potentially ...". Or we can use "First,
find all of the active ECs which are potentially ....".
3. "having the remote side due the sort generally won't be any worse

..." -

instead of "due" we should use "do"?
4. Added static prototype of function get_useful_ecs_for_relation().
5. The comment "Extract unique EC for query, if any, so we don't

consider it

again." is too crisp. Phrase "Unique EC for query" is confusing; EC can

not

be associated with a query per say and EC's are always unique because of
canonicalisation. May be we should reword it as "Extract single EC for
ordering of query, if any, so we don't consider it again." Is that

cryptic

as well?

Thanks. I committed this version with one small tweak.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

pg_fdw_mjsort_assert.patchtext/x-patch; charset=US-ASCII; name=pg_fdw_mjsort_assert.patchDownload

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index b471c67..f50cc62 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -404,20 +404,65 @@ SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1"
  103 | 103
  104 | 104
  105 | 105
  106 | 106
  107 | 107
  108 | 108
  109 | 109
  110 | 110
 (10 rows)
 
+-- outer join and inner join combination; expressions in the inner join clause appear
+-- in same equivalence class and become part of equivalence class list.
+-- Expressions in outer join clause appear in two separate equivalence class
+-- structures and EC corresponding to the outer relation doesn't appear in the
+-- equivalence class list.
+EXPLAIN (COSTS false, VERBOSE)
+	SELECT t1."C 1" FROM "S 1"."T 1" t1 left join ft1 t2 join ft2 t3 on (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
+                                    QUERY PLAN                                    
+----------------------------------------------------------------------------------
+ Limit
+   Output: t1."C 1"
+   ->  Merge Right Join
+         Output: t1."C 1"
+         Merge Cond: (t3.c1 = t1."C 1")
+         ->  Merge Join
+               Output: t3.c1
+               Merge Cond: (t3.c1 = t2.c1)
+               ->  Foreign Scan on public.ft2 t3
+                     Output: t3.c1
+                     Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+               ->  Sort
+                     Output: t2.c1
+                     Sort Key: t2.c1
+                     ->  Foreign Scan on public.ft1 t2
+                           Output: t2.c1
+                           Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
+         ->  Index Only Scan using t1_pkey on "S 1"."T 1" t1
+               Output: t1."C 1"
+(19 rows)
+
+SELECT t1."C 1" FROM "S 1"."T 1" t1 left join ft1 t2 join ft2 t3 on (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
+ C 1 
+-----
+ 101
+ 102
+ 103
+ 104
+ 105
+ 106
+ 107
+ 108
+ 109
+ 110
+(10 rows)
+
 RESET enable_hashjoin;
 RESET enable_nestloop;
 -- ===================================================================
 -- WHERE with remotely-executable conditions
 -- ===================================================================
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1;         -- Var, OpExpr(b), Const
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
  Foreign Scan on public.ft1 t1
    Output: c1, c2, c3, c4, c5, c6, c7, c8
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 374faf5..e9872b3 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -569,26 +569,26 @@ get_useful_ecs_for_relation(PlannerInfo *root, RelOptInfo *rel)
 
 		/* Make sure we've got canonical ECs. */
 		update_mergeclause_eclasses(root, restrictinfo);
 
 		/*
 		 * restrictinfo->mergeopfamilies != NIL is sufficient to guarantee
 		 * that left_ec and right_ec will be initialized, per comments in
 		 * distribute_qual_to_rels, and rel->joininfo should only contain ECs
 		 * where this relation appears on one side or the other.
 		 */
-		if (bms_is_subset(restrictinfo->right_ec->ec_relids, relids))
+		if (bms_is_subset(relids, restrictinfo->right_ec->ec_relids))
 			useful_eclass_list = list_append_unique_ptr(useful_eclass_list,
 													 restrictinfo->right_ec);
 		else
 		{
-			Assert(bms_is_subset(restrictinfo->left_ec->ec_relids, relids));
+			Assert(bms_is_subset(relids, restrictinfo->left_ec->ec_relids));
 			useful_eclass_list = list_append_unique_ptr(useful_eclass_list,
 													  restrictinfo->left_ec);
 		}
 	}
 
 	return useful_eclass_list;
 }
 
 /*
  * get_useful_pathkeys_for_relation
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 73fa9f6..7e80045 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -183,20 +183,28 @@ SET enable_hashjoin TO false;
 SET enable_nestloop TO false;
 -- inner join; expressions in the clauses appear in the equivalence class list
 EXPLAIN (VERBOSE, COSTS false)
 	SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
 SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
 -- outer join; expressions in the clauses do not appear in equivalence class
 -- list but no output change as compared to the previous query
 EXPLAIN (VERBOSE, COSTS false)
 	SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
 SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C 1") OFFSET 100 LIMIT 10;
+-- outer join and inner join combination; expressions in the inner join clause appear
+-- in same equivalence class and become part of equivalence class list.
+-- Expressions in outer join clause appear in two separate equivalence class
+-- structures and EC corresponding to the outer relation doesn't appear in the
+-- equivalence class list.
+EXPLAIN (COSTS false, VERBOSE)
+	SELECT t1."C 1" FROM "S 1"."T 1" t1 left join ft1 t2 join ft2 t3 on (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
+SELECT t1."C 1" FROM "S 1"."T 1" t1 left join ft1 t2 join ft2 t3 on (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
 RESET enable_hashjoin;
 RESET enable_nestloop;
 
 -- ===================================================================
 -- WHERE with remotely-executable conditions
 -- ===================================================================
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 1;         -- Var, OpExpr(b), Const
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NULL;        -- NullTest
 EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL;    -- NullTest

#26

Robert Haas

robertmhaas@gmail.com

almost 10 years ago

In reply to: Ashutosh Bapat (#25)

Re: Getting sorted data from foreign server for merge join

On Thu, Jan 7, 2016 at 4:05 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

In get_useful_ecs_for_relation(), while checking whether to use left or
right argument of a mergejoinable operator, the arguments to bms_is_subset()
are passed in reverse order. bms_is_subset() checks whether the first
argument in subset of the second, but in this function the subset to be
checked is passed as the second argument. Because of this following query
when run in contrib_regression database after "make installcheck" in
contrib/postgres_fdw trips assertion Assert(bms_is_subset(relids,
restrictinfo->left_ec->ec_relids));

EXPLAIN (COSTS false, VERBOSE)
SELECT t1."C 1" FROM "S 1"."T 1" t1 left join ft1 t2 join ft2 t3 on
(t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;

PFA patch to fix it.

The test case failed for me, possibly because of Tom's upper planner
pathification, but the substantive part of the fix looks right to me,
so committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers