Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

Started by David Rowley over 1 year ago, 7 messages

dgrowleyml@gmail.com

over 1 year ago

2 attachment(s)

Earlier today in [1], a bug was reported in the code added in
66c0185a3, where I'd failed to correctly handle the case where the
UNION's targetlist has columns which are not sortable. For pg_class,
those are relfrozenxid, relminmxid and relacl.

The most minimal reproducer prior to the revert is:

set enable_hashagg=0;
explain (costs off) select '123'::xid union select '123'::xid;
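
For anyone skimming, here is a rough standalone sketch of why sortability
matters for this optimization. It is not PostgreSQL code, and merge_unique
is a hypothetical helper made up for illustration. It assumes two inputs
that are already sorted ascending, merges them (the "Merge Append" step)
and drops adjacent duplicates (the "Unique" step). The ordering comparison
in the loop is the part that has no equivalent for a type like xid, which
has no sort operator.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not PostgreSQL code: merge two ascending int arrays
 * (the "Merge Append" step) and drop adjacent duplicates (the "Unique"
 * step).  'out' must have room for na + nb elements.  Returns the number of
 * distinct values written to 'out'.
 */
static size_t
merge_unique(const int *a, size_t na, const int *b, size_t nb, int *out)
{
	size_t		i = 0;
	size_t		j = 0;
	size_t		n = 0;

	assert(out != NULL);

	while (i < na || j < nb)
	{
		int			next;

		/* take the smaller head; this comparison needs a sort ordering */
		if (j >= nb || (i < na && a[i] <= b[j]))
			next = a[i++];
		else
			next = b[j++];

		/* both inputs are sorted, so duplicates are always adjacent */
		if (n == 0 || out[n - 1] != next)
			out[n++] = next;
	}

	return n;
}
```

Without an ordering, duplicates are no longer guaranteed to be adjacent, so
in this sketch the only options left would be hashing or comparing every
pair.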

There is still some ongoing discussion about this on the release
mailing list, as mentioned by Tom in the commit message of
7204f3591.

At some point that discussion is going to need to circle back onto
-hackers again, and since I've already written a patch to fix the
issue and un-revert Tom's revert, I just wanted a place on -hackers to
allow that code to be viewed and discussed. I did also post a patch
on [2], but that no longer applies to master due to the revert.

I'll allow the RMT to choose where the outcome of the RMT decision
goes. Let this thread be for at least the coding portion of this or
be my thread for this patch for the v18 cycle if the RMT rules in
favour of keeping that code reverted for v17.

I've attached 2 patches.

0001 is a simple revert of Tom's revert (7204f3591).
0002 fixes the issue reported by Hubert.

If anyone wants to have a look, I'd be grateful for that. Tom did
call for further review after this became the 4th issue reported for
66c0185a3.

David

[1]: https://postgr.es/message-id/Zktzf926vslR35Fv%40depesz.com
[2]: /messages/by-id/CAApHDvpDQh1NcL7nAsd3YAKj4vgORwesB3GYuNPnEXXRfA2g4w@mail.gmail.com

Attachments:

v2-0001-Revert-Revert-commit-66c0185a3-and-follow-on-patc.patch (application/octet-stream)

From 079be6ffdf0ca69e8affdc72951bf258666b8451 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Tue, 21 May 2024 12:24:11 +1200
Subject: [PATCH v2 1/2] Revert "Revert commit 66c0185a3 and follow-on
 patches."

This reverts commit 7204f35919b7e021e8d1bc9f2d76fd6bfcdd2070.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   7 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   9 +
 src/backend/optimizer/path/allpaths.c         |   5 +-
 src/backend/optimizer/path/equivclass.c       |  61 ++
 src/backend/optimizer/path/pathkeys.c         |  19 +
 src/backend/optimizer/plan/planner.c          | 116 ++-
 src/backend/optimizer/plan/subselect.c        |  15 +-
 src/backend/optimizer/prep/prepunion.c        | 720 ++++++++++++------
 src/backend/parser/analyze.c                  |   3 +-
 src/include/nodes/pathnodes.h                 |   2 +
 src/include/optimizer/paths.h                 |   4 +
 src/include/optimizer/planner.h               |   3 +-
 src/include/optimizer/prep.h                  |   2 +-
 .../regress/expected/collate.icu.utf8.out     |   2 +
 .../regress/expected/incremental_sort.out     |  13 +-
 src/test/regress/expected/union.out           |  46 +-
 src/test/regress/sql/collate.icu.utf8.sql     |   2 +
 src/test/regress/sql/union.sql                |  19 +-
 18 files changed, 761 insertions(+), 287 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 9ae36d3059..078b8a966f 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11511,6 +11511,10 @@ DROP INDEX base_tbl1_idx;
 DROP INDEX base_tbl2_idx;
 DROP INDEX async_p3_idx;
 -- UNION queries
+SET enable_sort TO off;
+SET enable_incremental_sort TO off;
+-- Adjust fdw_startup_cost so that we get an unordered path in the Append.
+ALTER SERVER loopback2 OPTIONS (ADD fdw_startup_cost '0.00');
 EXPLAIN (VERBOSE, COSTS OFF)
 INSERT INTO result_tbl
 (SELECT a, b, 'AAA' || c FROM async_p1 ORDER BY a LIMIT 10)
@@ -11592,6 +11596,9 @@ SELECT * FROM result_tbl ORDER BY a;
 (12 rows)
 
 DELETE FROM result_tbl;
+RESET enable_incremental_sort;
+RESET enable_sort;
+ALTER SERVER loopback2 OPTIONS (DROP fdw_startup_cost);
 -- Disable async execution if we use gating Result nodes for pseudoconstant
 -- quals
 EXPLAIN (VERBOSE, COSTS OFF)
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 1f31ac14df..09ba234e43 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3885,6 +3885,11 @@ DROP INDEX base_tbl2_idx;
 DROP INDEX async_p3_idx;
 
 -- UNION queries
+SET enable_sort TO off;
+SET enable_incremental_sort TO off;
+-- Adjust fdw_startup_cost so that we get an unordered path in the Append.
+ALTER SERVER loopback2 OPTIONS (ADD fdw_startup_cost '0.00');
+
 EXPLAIN (VERBOSE, COSTS OFF)
 INSERT INTO result_tbl
 (SELECT a, b, 'AAA' || c FROM async_p1 ORDER BY a LIMIT 10)
@@ -3911,6 +3916,10 @@ UNION ALL
 SELECT * FROM result_tbl ORDER BY a;
 DELETE FROM result_tbl;
 
+RESET enable_incremental_sort;
+RESET enable_sort;
+ALTER SERVER loopback2 OPTIONS (DROP fdw_startup_cost);
+
 -- Disable async execution if we use gating Result nodes for pseudoconstant
 -- quals
 EXPLAIN (VERBOSE, COSTS OFF)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 10aeebc2c1..4895cee994 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2633,9 +2633,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	Assert(root->plan_params == NIL);
 
 	/* Generate a subroot and Paths for the subquery */
-	rel->subroot = subquery_planner(root->glob, subquery,
-									root,
-									false, tuple_fraction);
+	rel->subroot = subquery_planner(root->glob, subquery, root, false,
+									tuple_fraction, NULL);
 
 	/* Isolate the params needed by this specific subplan */
 	rel->subplan_params = root->plan_params;
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 13400af3ef..21ce1ae2e1 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2882,6 +2882,67 @@ add_child_join_rel_equivalences(PlannerInfo *root,
 	MemoryContextSwitchTo(oldcontext);
 }
 
+/*
+ * add_setop_child_rel_equivalences
+ *		Add equivalence members for each non-resjunk target in 'child_tlist'
+ *		to the EquivalenceClass in the corresponding setop_pathkey's pk_eclass.
+ *
+ * 'root' is the PlannerInfo belonging to the top-level set operation.
+ * 'child_rel' is the RelOptInfo of the child relation we're adding
+ * EquivalenceMembers for.
+ * 'child_tlist' is the target list for the setop child relation.  The target
+ * list expressions are what we add as EquivalenceMembers.
+ * 'setop_pathkeys' is a list of PathKeys which must contain an entry for each
+ * non-resjunk target in 'child_tlist'.
+ */
+void
+add_setop_child_rel_equivalences(PlannerInfo *root, RelOptInfo *child_rel,
+								 List *child_tlist, List *setop_pathkeys)
+{
+	ListCell   *lc;
+	ListCell   *lc2 = list_head(setop_pathkeys);
+
+	foreach(lc, child_tlist)
+	{
+		TargetEntry *tle = lfirst_node(TargetEntry, lc);
+		EquivalenceMember *parent_em;
+		PathKey    *pk;
+
+		if (tle->resjunk)
+			continue;
+
+		if (lc2 == NULL)
+			elog(ERROR, "too few pathkeys for set operation");
+
+		pk = lfirst_node(PathKey, lc2);
+		parent_em = linitial(pk->pk_eclass->ec_members);
+
+		/*
+		 * We can safely pass the parent member as the first member in the
+		 * ec_members list as this is added first in generate_union_paths,
+		 * likewise, the JoinDomain can be that of the initial member of the
+		 * Pathkey's EquivalenceClass.
+		 */
+		add_eq_member(pk->pk_eclass,
+					  tle->expr,
+					  child_rel->relids,
+					  parent_em->em_jdomain,
+					  parent_em,
+					  exprType((Node *) tle->expr));
+
+		lc2 = lnext(setop_pathkeys, lc2);
+	}
+
+	/*
+	 * transformSetOperationStmt() ensures that the targetlist never contains
+	 * any resjunk columns, so all eclasses that exist in 'root' must have
+	 * received a new member in the loop above.  Add them to the child_rel's
+	 * eclass_indexes.
+	 */
+	child_rel->eclass_indexes = bms_add_range(child_rel->eclass_indexes, 0,
+											  list_length(root->eq_classes) - 1);
+}
+
 
 /*
  * generate_implied_equalities_for_column
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 157bc6a36d..8b258cbef9 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -2191,6 +2191,22 @@ pathkeys_useful_for_grouping(PlannerInfo *root, List *pathkeys)
 	return n;
 }
 
+/*
+ * pathkeys_useful_for_setop
+ *		Count the number of leading pathkeys that root's 'setop_pathkeys' has
+ *		in common with 'pathkeys'.
+ */
+static int
+pathkeys_useful_for_setop(PlannerInfo *root, List *pathkeys)
+{
+	int			n_common_pathkeys;
+
+	(void) pathkeys_count_contained_in(root->setop_pathkeys, pathkeys,
+									   &n_common_pathkeys);
+
+	return n_common_pathkeys;
+}
+
 /*
  * truncate_useless_pathkeys
  *		Shorten the given pathkey list to just the useful pathkeys.
@@ -2208,6 +2224,9 @@ truncate_useless_pathkeys(PlannerInfo *root,
 	if (nuseful2 > nuseful)
 		nuseful = nuseful2;
 	nuseful2 = pathkeys_useful_for_grouping(root, pathkeys);
+	if (nuseful2 > nuseful)
+		nuseful = nuseful2;
+	nuseful2 = pathkeys_useful_for_setop(root, pathkeys);
 	if (nuseful2 > nuseful)
 		nuseful = nuseful2;
 
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ddd2387840..032818423f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -54,6 +54,7 @@
 #include "optimizer/tlist.h"
 #include "parser/analyze.h"
 #include "parser/parse_agg.h"
+#include "parser/parse_clause.h"
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "partitioning/partdesc.h"
@@ -119,12 +120,15 @@ typedef struct
 {
 	List	   *activeWindows;	/* active windows, if any */
 	grouping_sets_data *gset_data;	/* grouping sets data, if any */
+	SetOperationStmt *setop;	/* parent set operation or NULL if not a
+								 * subquery belonging to a set operation */
 } standard_qp_extra;
 
 /* Local functions */
 static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
 static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
-static void grouping_planner(PlannerInfo *root, double tuple_fraction);
+static void grouping_planner(PlannerInfo *root, double tuple_fraction,
+							 SetOperationStmt *setops);
 static grouping_sets_data *preprocess_grouping_sets(PlannerInfo *root);
 static List *remap_to_groupclause_idx(List *groupClause, List *gsets,
 									  int *tleref_to_colnum_map);
@@ -249,6 +253,8 @@ static bool group_by_has_partkey(RelOptInfo *input_rel,
 								 List *targetList,
 								 List *groupClause);
 static int	common_prefix_cmp(const void *a, const void *b);
+static List *generate_setop_child_grouplist(SetOperationStmt *op,
+											List *targetlist);
 
 
 /*****************************************************************************
@@ -406,8 +412,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	}
 
 	/* primary planning entry point (may recurse for subqueries) */
-	root = subquery_planner(glob, parse, NULL,
-							false, tuple_fraction);
+	root = subquery_planner(glob, parse, NULL, false, tuple_fraction, NULL);
 
 	/* Select best Path and turn it into a Plan */
 	final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
@@ -598,6 +603,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
  * hasRecursion is true if this is a recursive WITH query.
  * tuple_fraction is the fraction of tuples we expect will be retrieved.
  * tuple_fraction is interpreted as explained for grouping_planner, below.
+ * setops is used for set operation subqueries to provide the subquery with
+ * the context in which it's being used so that Paths correctly sorted for the
+ * set operation can be generated.  NULL when not planning a set operation
+ * child.
  *
  * Basically, this routine does the stuff that should only be done once
  * per Query object.  It then calls grouping_planner.  At one time,
@@ -616,9 +625,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
  *--------------------
  */
 PlannerInfo *
-subquery_planner(PlannerGlobal *glob, Query *parse,
-				 PlannerInfo *parent_root,
-				 bool hasRecursion, double tuple_fraction)
+subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
+				 bool hasRecursion, double tuple_fraction,
+				 SetOperationStmt *setops)
 {
 	PlannerInfo *root;
 	List	   *newWithCheckOptions;
@@ -1077,7 +1086,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
 	/*
 	 * Do the main planning.
 	 */
-	grouping_planner(root, tuple_fraction);
+	grouping_planner(root, tuple_fraction, setops);
 
 	/*
 	 * Capture the set of outer-level param IDs we have access to, for use in
@@ -1275,7 +1284,11 @@ preprocess_phv_expression(PlannerInfo *root, Expr *expr)
  *	  0 < tuple_fraction < 1: expect the given fraction of tuples available
  *		from the plan to be retrieved
  *	  tuple_fraction >= 1: tuple_fraction is the absolute number of tuples
- *		expected to be retrieved (ie, a LIMIT specification)
+ *		expected to be retrieved (ie, a LIMIT specification).
+ * setops is used for set operation subqueries to provide the subquery with
+ * the context in which it's being used so that Paths correctly sorted for the
+ * set operation can be generated.  NULL when not planning a set operation
+ * child.
  *
  * Returns nothing; the useful output is in the Paths we attach to the
  * (UPPERREL_FINAL, NULL) upperrel in *root.  In addition,
@@ -1286,7 +1299,8 @@ preprocess_phv_expression(PlannerInfo *root, Expr *expr)
  *--------------------
  */
 static void
-grouping_planner(PlannerInfo *root, double tuple_fraction)
+grouping_planner(PlannerInfo *root, double tuple_fraction,
+				 SetOperationStmt *setops)
 {
 	Query	   *parse = root->parse;
 	int64		offset_est = 0;
@@ -1321,17 +1335,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 
 	if (parse->setOperations)
 	{
-		/*
-		 * If there's a top-level ORDER BY, assume we have to fetch all the
-		 * tuples.  This might be too simplistic given all the hackery below
-		 * to possibly avoid the sort; but the odds of accurate estimates here
-		 * are pretty low anyway.  XXX try to get rid of this in favor of
-		 * letting plan_set_operations generate both fast-start and
-		 * cheapest-total paths.
-		 */
-		if (parse->sortClause)
-			root->tuple_fraction = 0.0;
-
 		/*
 		 * Construct Paths for set operations.  The results will not need any
 		 * work except perhaps a top-level sort and/or LIMIT.  Note that any
@@ -1501,6 +1504,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		qp_extra.activeWindows = activeWindows;
 		qp_extra.gset_data = gset_data;
 
+		/*
+		 * If we're a subquery for a set operation, store the SetOperationStmt
+		 * in qp_extra.
+		 */
+		qp_extra.setop = setops;
+
 		/*
 		 * Generate the best unsorted and presorted paths for the scan/join
 		 * portion of this Query, ie the processing represented by the
@@ -3453,6 +3462,27 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 									  parse->sortClause,
 									  tlist);
 
+	/* setting setop_pathkeys might be useful to the union planner */
+	if (qp_extra->setop != NULL &&
+		set_operation_ordered_results_useful(qp_extra->setop))
+	{
+		List	   *groupClauses;
+		bool		sortable;
+
+		groupClauses = generate_setop_child_grouplist(qp_extra->setop, tlist);
+
+		root->setop_pathkeys =
+			make_pathkeys_for_sortclauses_extended(root,
+												   &groupClauses,
+												   tlist,
+												   false,
+												   &sortable);
+		if (!sortable)
+			root->setop_pathkeys = NIL;
+	}
+	else
+		root->setop_pathkeys = NIL;
+
 	/*
 	 * Figure out whether we want a sorted result from query_planner.
 	 *
@@ -3462,7 +3492,9 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 	 * sortable DISTINCT clause that's more rigorous than the ORDER BY clause,
 	 * we try to produce output that's sufficiently well sorted for the
 	 * DISTINCT.  Otherwise, if there is an ORDER BY clause, we want to sort
-	 * by the ORDER BY clause.
+	 * by the ORDER BY clause.  Otherwise, if we're a subquery being planned
+	 * for a set operation which can benefit from presorted results and have a
+	 * sortable targetlist, we want to sort by the target list.
 	 *
 	 * Note: if we have both ORDER BY and GROUP BY, and ORDER BY is a superset
 	 * of GROUP BY, it would be tempting to request sort by ORDER BY --- but
@@ -3480,6 +3512,8 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 		root->query_pathkeys = root->distinct_pathkeys;
 	else if (root->sort_pathkeys)
 		root->query_pathkeys = root->sort_pathkeys;
+	else if (root->setop_pathkeys != NIL)
+		root->query_pathkeys = root->setop_pathkeys;
 	else
 		root->query_pathkeys = NIL;
 }
@@ -7923,3 +7957,43 @@ group_by_has_partkey(RelOptInfo *input_rel,
 
 	return true;
 }
+
+/*
+ * generate_setop_child_grouplist
+ *		Build a SortGroupClause list defining the sort/grouping properties
+ *		of the child of a set operation.
+ *
+ * This is similar to generate_setop_grouplist() but differs as the setop
+ * child query's targetlist entries may already have a tleSortGroupRef
+ * assigned for other purposes, such as GROUP BYs.  Here we keep the
+ * SortGroupClause list in the same order as 'op' groupClauses and just adjust
+ * the tleSortGroupRef to reference the TargetEntry's 'ressortgroupref'.
+ */
+static List *
+generate_setop_child_grouplist(SetOperationStmt *op, List *targetlist)
+{
+	List	   *grouplist = copyObject(op->groupClauses);
+	ListCell   *lg;
+	ListCell   *lt;
+
+	lg = list_head(grouplist);
+	foreach(lt, targetlist)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(lt);
+		SortGroupClause *sgc;
+
+		/* resjunk columns could have sortgrouprefs.  Leave these alone */
+		if (tle->resjunk)
+			continue;
+
+		/* we expect every non-resjunk target to have a SortGroupClause */
+		Assert(lg != NULL);
+		sgc = (SortGroupClause *) lfirst(lg);
+		lg = lnext(grouplist, lg);
+
+		/* assign a tleSortGroupRef, or reuse the existing one */
+		sgc->tleSortGroupRef = assignSortGroupRef(tle, targetlist);
+	}
+	Assert(lg == NULL);
+	return grouplist;
+}
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index d5fa281b10..e35ebea8b4 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -218,9 +218,8 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
 	Assert(root->plan_params == NIL);
 
 	/* Generate Paths for the subquery */
-	subroot = subquery_planner(root->glob, subquery,
-							   root,
-							   false, tuple_fraction);
+	subroot = subquery_planner(root->glob, subquery, root, false,
+							   tuple_fraction, NULL);
 
 	/* Isolate the params needed by this specific subplan */
 	plan_params = root->plan_params;
@@ -266,9 +265,8 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
 		if (subquery)
 		{
 			/* Generate Paths for the ANY subquery; we'll need all rows */
-			subroot = subquery_planner(root->glob, subquery,
-									   root,
-									   false, 0.0);
+			subroot = subquery_planner(root->glob, subquery, root, false, 0.0,
+									   NULL);
 
 			/* Isolate the params needed by this specific subplan */
 			plan_params = root->plan_params;
@@ -967,9 +965,8 @@ SS_process_ctes(PlannerInfo *root)
 		 * Generate Paths for the CTE query.  Always plan for full retrieval
 		 * --- we don't have enough info to predict otherwise.
 		 */
-		subroot = subquery_planner(root->glob, subquery,
-								   root,
-								   cte->cterecursive, 0.0);
+		subroot = subquery_planner(root->glob, subquery, root,
+								   cte->cterecursive, 0.0, NULL);
 
 		/*
 		 * Since the current query level doesn't yet contain any RTEs, it
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 296f866677..30068c27a1 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -43,11 +43,15 @@ static RelOptInfo *recurse_set_operations(Node *setOp, PlannerInfo *root,
 										  bool junkOK,
 										  int flag, List *refnames_tlist,
 										  List **pTargetList,
-										  double *pNumGroups);
+										  bool *istrivial_tlist);
 static RelOptInfo *generate_recursion_path(SetOperationStmt *setOp,
 										   PlannerInfo *root,
 										   List *refnames_tlist,
 										   List **pTargetList);
+static void build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
+									bool trivial_tlist, List *child_tlist,
+									List *interesting_pathkeys,
+									double *pNumGroups);
 static RelOptInfo *generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 										List *refnames_tlist,
 										List **pTargetList);
@@ -57,9 +61,8 @@ static RelOptInfo *generate_nonunion_paths(SetOperationStmt *op, PlannerInfo *ro
 static List *plan_union_children(PlannerInfo *root,
 								 SetOperationStmt *top_union,
 								 List *refnames_tlist,
-								 List **tlist_list);
-static Path *make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
-							   PlannerInfo *root);
+								 List **tlist_list,
+								 List **istrivial_tlist);
 static void postprocess_setop_rel(PlannerInfo *root, RelOptInfo *rel);
 static bool choose_hashed_setop(PlannerInfo *root, List *groupClauses,
 								Path *input_path,
@@ -114,10 +117,10 @@ plan_set_operations(PlannerInfo *root)
 	Assert(parse->distinctClause == NIL);
 
 	/*
-	 * In the outer query level, we won't have any true equivalences to deal
-	 * with; but we do want to be able to make pathkeys, which will require
-	 * single-member EquivalenceClasses.  Indicate that EC merging is complete
-	 * so that pathkeys.c won't complain.
+	 * In the outer query level, equivalence classes are limited to classes
+	 * which define that the top-level target entry is equivalent to the
+	 * corresponding child target entry.  There won't be any equivalence class
+	 * merging.  Mark that merging is complete to allow us to make pathkeys.
 	 */
 	Assert(root->eq_classes == NIL);
 	root->ec_merging_done = true;
@@ -152,6 +155,8 @@ plan_set_operations(PlannerInfo *root)
 	}
 	else
 	{
+		bool		trivial_tlist;
+
 		/*
 		 * Recurse on setOperations tree to generate paths for set ops. The
 		 * final output paths should have just the column types shown as the
@@ -163,7 +168,7 @@ plan_set_operations(PlannerInfo *root)
 										   true, -1,
 										   leftmostQuery->targetList,
 										   &top_tlist,
-										   NULL);
+										   &trivial_tlist);
 	}
 
 	/* Must return the built tlist into root->processed_tlist. */
@@ -172,6 +177,31 @@ plan_set_operations(PlannerInfo *root)
 	return setop_rel;
 }
 
+/*
+ * set_operation_ordered_results_useful
+ *		Return true if the given SetOperationStmt can be executed by utilizing
+ *		paths that provide sorted input according to the setop's targetlist.
+ *		Returns false when sorted paths are not any more useful than unsorted
+ *		ones.
+ */
+bool
+set_operation_ordered_results_useful(SetOperationStmt *setop)
+{
+	/*
+	 * Paths sorted by the targetlist are useful for UNION as we can opt to
+	 * MergeAppend the sorted paths then Unique them.  Ordered paths are no
+	 * more useful than unordered ones for UNION ALL.
+	 */
+	if (!setop->all && setop->op == SETOP_UNION)
+		return true;
+
+	/*
+	 * EXCEPT / EXCEPT ALL / INTERSECT / INTERSECT ALL cannot yet utilize
+	 * correctly sorted input paths.
+	 */
+	return false;
+}
+
 /*
  * recurse_set_operations
  *	  Recursively handle one step in a tree of set operations
@@ -184,8 +214,8 @@ plan_set_operations(PlannerInfo *root)
  *
  * Returns a RelOptInfo for the subtree, as well as these output parameters:
  * *pTargetList: receives the fully-fledged tlist for the subtree's top plan
- * *pNumGroups: if not NULL, we estimate the number of distinct groups
- *		in the result, and store it there
+ * *istrivial_tlist: true if, and only if, datatypes between parent and child
+ * match.
  *
  * The pTargetList output parameter is mostly redundant with the pathtarget
  * of the returned RelOptInfo, but for the moment we need it because much of
@@ -202,9 +232,11 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 					   bool junkOK,
 					   int flag, List *refnames_tlist,
 					   List **pTargetList,
-					   double *pNumGroups)
+					   bool *istrivial_tlist)
 {
-	RelOptInfo *rel = NULL;		/* keep compiler quiet */
+	RelOptInfo *rel;
+
+	*istrivial_tlist = true;	/* for now */
 
 	/* Guard against stack overflow due to overly complex setop nests */
 	check_stack_depth();
@@ -213,11 +245,9 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 	{
 		RangeTblRef *rtr = (RangeTblRef *) setOp;
 		RangeTblEntry *rte = root->simple_rte_array[rtr->rtindex];
+		SetOperationStmt *setops;
 		Query	   *subquery = rte->subquery;
 		PlannerInfo *subroot;
-		RelOptInfo *final_rel;
-		Path	   *subpath;
-		Path	   *path;
 		List	   *tlist;
 		bool		trivial_tlist;
 
@@ -229,11 +259,16 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 		/* plan_params should not be in use in current query level */
 		Assert(root->plan_params == NIL);
 
+		/*
+		 * Pass the set operation details to the subquery_planner to have it
+		 * consider generating Paths correctly ordered for the set operation.
+		 */
+		setops = castNode(SetOperationStmt, root->parse->setOperations);
+
 		/* Generate a subroot and Paths for the subquery */
-		subroot = rel->subroot = subquery_planner(root->glob, subquery,
-												  root,
-												  false,
-												  root->tuple_fraction);
+		subroot = rel->subroot = subquery_planner(root->glob, subquery, root,
+												  false, root->tuple_fraction,
+												  setops);
 
 		/*
 		 * It should not be possible for the primitive query to contain any
@@ -254,90 +289,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 
 		/* Return the fully-fledged tlist to caller, too */
 		*pTargetList = tlist;
-
-		/*
-		 * Mark rel with estimated output rows, width, etc.  Note that we have
-		 * to do this before generating outer-query paths, else
-		 * cost_subqueryscan is not happy.
-		 */
-		set_subquery_size_estimates(root, rel);
-
-		/*
-		 * Since we may want to add a partial path to this relation, we must
-		 * set its consider_parallel flag correctly.
-		 */
-		final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
-		rel->consider_parallel = final_rel->consider_parallel;
-
-		/*
-		 * For the moment, we consider only a single Path for the subquery.
-		 * This should change soon (make it look more like
-		 * set_subquery_pathlist).
-		 */
-		subpath = get_cheapest_fractional_path(final_rel,
-											   root->tuple_fraction);
-
-		/*
-		 * Stick a SubqueryScanPath atop that.
-		 *
-		 * We don't bother to determine the subquery's output ordering since
-		 * it won't be reflected in the set-op result anyhow; so just label
-		 * the SubqueryScanPath with nil pathkeys.  (XXX that should change
-		 * soon too, likely.)
-		 */
-		path = (Path *) create_subqueryscan_path(root, rel, subpath,
-												 trivial_tlist,
-												 NIL, NULL);
-
-		add_path(rel, path);
-
-		/*
-		 * If we have a partial path for the child relation, we can use that
-		 * to build a partial path for this relation.  But there's no point in
-		 * considering any path but the cheapest.
-		 */
-		if (rel->consider_parallel && bms_is_empty(rel->lateral_relids) &&
-			final_rel->partial_pathlist != NIL)
-		{
-			Path	   *partial_subpath;
-			Path	   *partial_path;
-
-			partial_subpath = linitial(final_rel->partial_pathlist);
-			partial_path = (Path *)
-				create_subqueryscan_path(root, rel, partial_subpath,
-										 trivial_tlist,
-										 NIL, NULL);
-			add_partial_path(rel, partial_path);
-		}
-
-		/*
-		 * Estimate number of groups if caller wants it.  If the subquery used
-		 * grouping or aggregation, its output is probably mostly unique
-		 * anyway; otherwise do statistical estimation.
-		 *
-		 * XXX you don't really want to know about this: we do the estimation
-		 * using the subquery's original targetlist expressions, not the
-		 * subroot->processed_tlist which might seem more appropriate.  The
-		 * reason is that if the subquery is itself a setop, it may return a
-		 * processed_tlist containing "varno 0" Vars generated by
-		 * generate_append_tlist, and those would confuse estimate_num_groups
-		 * mightily.  We ought to get rid of the "varno 0" hack, but that
-		 * requires a redesign of the parsetree representation of setops, so
-		 * that there can be an RTE corresponding to each setop's output.
-		 */
-		if (pNumGroups)
-		{
-			if (subquery->groupClause || subquery->groupingSets ||
-				subquery->distinctClause ||
-				subroot->hasHavingQual || subquery->hasAggs)
-				*pNumGroups = subpath->rows;
-			else
-				*pNumGroups = estimate_num_groups(subroot,
-												  get_tlist_exprs(subquery->targetList, false),
-												  subpath->rows,
-												  NULL,
-												  NULL);
-		}
+		*istrivial_tlist = trivial_tlist;
 	}
 	else if (IsA(setOp, SetOperationStmt))
 	{
@@ -352,8 +304,6 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 			rel = generate_nonunion_paths(op, root,
 										  refnames_tlist,
 										  pTargetList);
-		if (pNumGroups)
-			*pNumGroups = rel->rows;
 
 		/*
 		 * If necessary, add a Result node to project the caller-requested
@@ -383,6 +333,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 												*pTargetList,
 												refnames_tlist,
 												&trivial_tlist);
+			*istrivial_tlist = trivial_tlist;
 			target = create_pathtarget(root, *pTargetList);
 
 			/* Apply projection to each path */
@@ -413,16 +364,16 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				lfirst(lc) = path;
 			}
 		}
+		postprocess_setop_rel(root, rel);
 	}
 	else
 	{
 		elog(ERROR, "unrecognized node type: %d",
 			 (int) nodeTag(setOp));
 		*pTargetList = NIL;
+		rel = NULL;				/* keep compiler quiet */
 	}
 
-	postprocess_setop_rel(root, rel);
-
 	return rel;
 }
 
@@ -441,7 +392,9 @@ generate_recursion_path(SetOperationStmt *setOp, PlannerInfo *root,
 	Path	   *lpath;
 	Path	   *rpath;
 	List	   *lpath_tlist;
+	bool		lpath_trivial_tlist;
 	List	   *rpath_tlist;
+	bool		rpath_trivial_tlist;
 	List	   *tlist;
 	List	   *groupList;
 	double		dNumGroups;
@@ -461,7 +414,10 @@ generate_recursion_path(SetOperationStmt *setOp, PlannerInfo *root,
 								  false, -1,
 								  refnames_tlist,
 								  &lpath_tlist,
-								  NULL);
+								  &lpath_trivial_tlist);
+	if (lrel->rtekind == RTE_SUBQUERY)
+		build_setop_child_paths(root, lrel, lpath_trivial_tlist, lpath_tlist,
+								NIL, NULL);
 	lpath = lrel->cheapest_total_path;
 	/* The right path will want to look at the left one ... */
 	root->non_recursive_path = lpath;
@@ -470,7 +426,10 @@ generate_recursion_path(SetOperationStmt *setOp, PlannerInfo *root,
 								  false, -1,
 								  refnames_tlist,
 								  &rpath_tlist,
-								  NULL);
+								  &rpath_trivial_tlist);
+	if (rrel->rtekind == RTE_SUBQUERY)
+		build_setop_child_paths(root, rrel, rpath_trivial_tlist, rpath_tlist,
+								NIL, NULL);
 	rpath = rrel->cheapest_total_path;
 	root->non_recursive_path = NULL;
 
@@ -532,6 +491,204 @@ generate_recursion_path(SetOperationStmt *setOp, PlannerInfo *root,
 	return result_rel;
 }
 
+/*
+ * build_setop_child_paths
+ *		Build paths for the set op child relation denoted by 'rel'.
+ *
+ * interesting_pathkeys: if not NIL, also include paths that suit these
+ * pathkeys, sorting any unsorted paths as required.
+ * *pNumGroups: if not NULL, we estimate the number of distinct groups
+ *		in the result, and store it there
+ */
+static void
+build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
+						bool trivial_tlist, List *child_tlist,
+						List *interesting_pathkeys, double *pNumGroups)
+{
+	RelOptInfo *final_rel;
+	List	   *setop_pathkeys = rel->subroot->setop_pathkeys;
+	ListCell   *lc;
+
+	/* it can't be a set op child rel if it's not a subquery */
+	Assert(rel->rtekind == RTE_SUBQUERY);
+
+	/* when sorting is needed, add child rel equivalences */
+	if (interesting_pathkeys != NIL)
+		add_setop_child_rel_equivalences(root,
+										 rel,
+										 child_tlist,
+										 interesting_pathkeys);
+
+	/*
+	 * Mark rel with estimated output rows, width, etc.  Note that we have to
+	 * do this before generating outer-query paths, else cost_subqueryscan is
+	 * not happy.
+	 */
+	set_subquery_size_estimates(root, rel);
+
+	/*
+	 * Since we may want to add a partial path to this relation, we must set
+	 * its consider_parallel flag correctly.
+	 */
+	final_rel = fetch_upper_rel(rel->subroot, UPPERREL_FINAL, NULL);
+	rel->consider_parallel = final_rel->consider_parallel;
+
+	/* Generate subquery scan paths for any interesting path in final_rel */
+	foreach(lc, final_rel->pathlist)
+	{
+		Path	   *subpath = (Path *) lfirst(lc);
+		List	   *pathkeys;
+		Path	   *cheapest_input_path = final_rel->cheapest_total_path;
+		bool		is_sorted;
+		int			presorted_keys;
+
+		/*
+		 * Include the cheapest path as-is so that the set operation can be
+		 * cheaply implemented using a method which does not require the input
+		 * to be sorted.
+		 */
+		if (subpath == cheapest_input_path)
+		{
+			/* Convert subpath's pathkeys to outer representation */
+			pathkeys = convert_subquery_pathkeys(root, rel, subpath->pathkeys,
+												 make_tlist_from_pathtarget(subpath->pathtarget));
+
+			/* Generate outer path using this subpath */
+			add_path(rel, (Path *) create_subqueryscan_path(root,
+															rel,
+															subpath,
+															trivial_tlist,
+															pathkeys,
+															NULL));
+		}
+
+		/* skip dealing with sorted paths if the setop doesn't need them */
+		if (interesting_pathkeys == NIL)
+			continue;
+
+		/*
+		 * Create paths to suit final sort order required for setop_pathkeys.
+		 * Here we'll sort the cheapest input path (if not sorted already) and
+		 * incremental sort any paths which are partially sorted.
+		 */
+		is_sorted = pathkeys_count_contained_in(setop_pathkeys,
+												subpath->pathkeys,
+												&presorted_keys);
+
+		if (!is_sorted)
+		{
+			double		limittuples = rel->subroot->limit_tuples;
+
+			/*
+			 * Try at least sorting the cheapest path and also try
+			 * incrementally sorting any path which is partially sorted
+			 * already (no need to deal with paths which have presorted keys
+			 * when incremental sort is disabled unless it's the cheapest
+			 * input path).
+			 */
+			if (subpath != cheapest_input_path &&
+				(presorted_keys == 0 || !enable_incremental_sort))
+				continue;
+
+			/*
+			 * We've no need to consider both a sort and incremental sort.
+			 * We'll just do a sort if there are no presorted keys and an
+			 * incremental sort when there are presorted keys.
+			 */
+			if (presorted_keys == 0 || !enable_incremental_sort)
+				subpath = (Path *) create_sort_path(rel->subroot,
+													final_rel,
+													subpath,
+													setop_pathkeys,
+													limittuples);
+			else
+				subpath = (Path *) create_incremental_sort_path(rel->subroot,
+																final_rel,
+																subpath,
+																setop_pathkeys,
+																presorted_keys,
+																limittuples);
+		}
+
+		/*
+		 * subpath is now sorted, so add it to the pathlist.  We already added
+		 * the cheapest_input_path above, so don't add it again unless we just
+		 * sorted it.
+		 */
+		if (subpath != cheapest_input_path)
+		{
+			/* Convert subpath's pathkeys to outer representation */
+			pathkeys = convert_subquery_pathkeys(root, rel, subpath->pathkeys,
+												 make_tlist_from_pathtarget(subpath->pathtarget));
+
+			/* Generate outer path using this subpath */
+			add_path(rel, (Path *) create_subqueryscan_path(root,
+															rel,
+															subpath,
+															trivial_tlist,
+															pathkeys,
+															NULL));
+		}
+	}
+
+	/* if consider_parallel is false, there should be no partial paths */
+	Assert(final_rel->consider_parallel ||
+		   final_rel->partial_pathlist == NIL);
+
+	/*
+	 * If we have a partial path for the child relation, we can use that to
+	 * build a partial path for this relation.  But there's no point in
+	 * considering any path but the cheapest.
+	 */
+	if (rel->consider_parallel && bms_is_empty(rel->lateral_relids) &&
+		final_rel->partial_pathlist != NIL)
+	{
+		Path	   *partial_subpath;
+		Path	   *partial_path;
+
+		partial_subpath = linitial(final_rel->partial_pathlist);
+		partial_path = (Path *)
+			create_subqueryscan_path(root, rel, partial_subpath,
+									 trivial_tlist,
+									 NIL, NULL);
+		add_partial_path(rel, partial_path);
+	}
+
+	postprocess_setop_rel(root, rel);
+
+	/*
+	 * Estimate number of groups if caller wants it.  If the subquery used
+	 * grouping or aggregation, its output is probably mostly unique anyway;
+	 * otherwise do statistical estimation.
+	 *
+	 * XXX you don't really want to know about this: we do the estimation
+	 * using the subquery's original targetlist expressions, not the
+	 * subroot->processed_tlist which might seem more appropriate.  The reason
+	 * is that if the subquery is itself a setop, it may return a
+	 * processed_tlist containing "varno 0" Vars generated by
+	 * generate_append_tlist, and those would confuse estimate_num_groups
+	 * mightily.  We ought to get rid of the "varno 0" hack, but that requires
+	 * a redesign of the parsetree representation of setops, so that there can
+	 * be an RTE corresponding to each setop's output.
+	 */
+	if (pNumGroups)
+	{
+		PlannerInfo *subroot = rel->subroot;
+		Query	   *subquery = subroot->parse;
+
+		if (subquery->groupClause || subquery->groupingSets ||
+			subquery->distinctClause || subroot->hasHavingQual ||
+			subquery->hasAggs)
+			*pNumGroups = rel->cheapest_total_path->rows;
+		else
+			*pNumGroups = estimate_num_groups(subroot,
+											  get_tlist_exprs(subquery->targetList, false),
+											  rel->cheapest_total_path->rows,
+											  NULL,
+											  NULL);
+	}
+}
+
 /*
  * Generate paths for a UNION or UNION ALL node
  */
@@ -542,41 +699,38 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 {
 	Relids		relids = NULL;
 	RelOptInfo *result_rel;
-	double		save_fraction = root->tuple_fraction;
 	ListCell   *lc;
-	List	   *pathlist = NIL;
+	ListCell   *lc2;
+	ListCell   *lc3;
+	List	   *cheapest_pathlist = NIL;
+	List	   *ordered_pathlist = NIL;
 	List	   *partial_pathlist = NIL;
 	bool		partial_paths_valid = true;
 	bool		consider_parallel = true;
 	List	   *rellist;
 	List	   *tlist_list;
+	List	   *trivial_tlist_list;
 	List	   *tlist;
-	Path	   *path;
-
-	/*
-	 * If plain UNION, tell children to fetch all tuples.
-	 *
-	 * Note: in UNION ALL, we pass the top-level tuple_fraction unmodified to
-	 * each arm of the UNION ALL.  One could make a case for reducing the
-	 * tuple fraction for later arms (discounting by the expected size of the
-	 * earlier arms' results) but it seems not worth the trouble. The normal
-	 * case where tuple_fraction isn't already zero is a LIMIT at top level,
-	 * and passing it down as-is is usually enough to get the desired result
-	 * of preferring fast-start plans.
-	 */
-	if (!op->all)
-		root->tuple_fraction = 0.0;
+	List	   *groupList = NIL;
+	Path	   *apath;
+	Path	   *gpath = NULL;
+	bool		try_sorted;
+	List	   *union_pathkeys = NIL;
 
 	/*
 	 * If any of my children are identical UNION nodes (same op, all-flag, and
 	 * colTypes) then they can be merged into this node so that we generate
-	 * only one Append and unique-ification for the lot.  Recurse to find such
-	 * nodes and compute their children's paths.
+	 * only one Append/MergeAppend and unique-ification for the lot.  Recurse
+	 * to find such nodes.
 	 */
-	rellist = plan_union_children(root, op, refnames_tlist, &tlist_list);
+	rellist = plan_union_children(root,
+								  op,
+								  refnames_tlist,
+								  &tlist_list,
+								  &trivial_tlist_list);
 
 	/*
-	 * Generate tlist for Append plan node.
+	 * Generate tlist for Append/MergeAppend plan node.
 	 *
 	 * The tlist for an Append plan isn't important as far as the Append is
 	 * concerned, but we must make it look real anyway for the benefit of the
@@ -584,15 +738,68 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	 */
 	tlist = generate_append_tlist(op->colTypes, op->colCollations, false,
 								  tlist_list, refnames_tlist);
-
 	*pTargetList = tlist;
 
+	/* For for UNIONs (not UNION ALL), try sorting, if sorting is possible */
+	try_sorted = !op->all && grouping_is_sortable(op->groupClauses);
+
+	if (try_sorted)
+	{
+		/* Identify the grouping semantics */
+		groupList = generate_setop_grouplist(op, tlist);
+
+		/* Determine the pathkeys for sorting by the whole target list */
+		union_pathkeys = make_pathkeys_for_sortclauses(root, groupList, tlist);
+
+		root->query_pathkeys = union_pathkeys;
+	}
+
+	/*
+	 * Now that we've got the append target list, we can build the union child
+	 * paths.
+	 */
+	forthree(lc, rellist, lc2, trivial_tlist_list, lc3, tlist_list)
+	{
+		RelOptInfo *rel = lfirst(lc);
+		bool		trivial_tlist = lfirst_int(lc2);
+		List	   *child_tlist = lfirst_node(List, lc3);
+
+		/* only build paths for the union children */
+		if (rel->rtekind == RTE_SUBQUERY)
+			build_setop_child_paths(root, rel, trivial_tlist, child_tlist,
+									union_pathkeys, NULL);
+	}
+
 	/* Build path lists and relid set. */
 	foreach(lc, rellist)
 	{
 		RelOptInfo *rel = lfirst(lc);
+		Path	   *ordered_path;
 
-		pathlist = lappend(pathlist, rel->cheapest_total_path);
+		cheapest_pathlist = lappend(cheapest_pathlist,
+									rel->cheapest_total_path);
+
+		if (try_sorted)
+		{
+			ordered_path = get_cheapest_path_for_pathkeys(rel->pathlist,
+														  union_pathkeys,
+														  NULL,
+														  TOTAL_COST,
+														  false);
+
+			if (ordered_path != NULL)
+				ordered_pathlist = lappend(ordered_pathlist, ordered_path);
+			else
+			{
+				/*
+				 * If we can't find a sorted path, just give up trying to
+				 * generate a list of correctly sorted child paths.  This can
+				 * happen when type coercion was added to the targetlist due
+				 * to mismatching types from the union children.
+				 */
+				try_sorted = false;
+			}
+		}
 
 		if (consider_parallel)
 		{
@@ -615,28 +822,21 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	result_rel = fetch_upper_rel(root, UPPERREL_SETOP, relids);
 	result_rel->reltarget = create_pathtarget(root, tlist);
 	result_rel->consider_parallel = consider_parallel;
+	result_rel->consider_startup = (root->tuple_fraction > 0);
 
 	/*
-	 * Append the child results together.
-	 */
-	path = (Path *) create_append_path(root, result_rel, pathlist, NIL,
-									   NIL, NULL, 0, false, -1);
-
-	/*
-	 * For UNION ALL, we just need the Append path.  For UNION, need to add
-	 * node(s) to remove duplicates.
+	 * Append the child results together using the cheapest paths from each
+	 * union child.
 	 */
-	if (!op->all)
-		path = make_union_unique(op, path, tlist, root);
-
-	add_path(result_rel, path);
+	apath = (Path *) create_append_path(root, result_rel, cheapest_pathlist,
+										NIL, NIL, NULL, 0, false, -1);
 
 	/*
 	 * Estimate number of groups.  For now we just assume the output is unique
 	 * --- this is certainly true for the UNION case, and we want worst-case
 	 * estimates anyway.
 	 */
-	result_rel->rows = path->rows;
+	result_rel->rows = apath->rows;
 
 	/*
 	 * Now consider doing the same thing using the partial paths plus Append
@@ -644,7 +844,7 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	 */
 	if (partial_paths_valid)
 	{
-		Path	   *ppath;
+		Path	   *papath;
 		int			parallel_workers = 0;
 
 		/* Find the highest number of workers requested for any subpath. */
@@ -673,21 +873,137 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 		}
 		Assert(parallel_workers > 0);
 
-		ppath = (Path *)
+		papath = (Path *)
 			create_append_path(root, result_rel, NIL, partial_pathlist,
-							   NIL, NULL,
-							   parallel_workers, enable_parallel_append,
-							   -1);
-		ppath = (Path *)
-			create_gather_path(root, result_rel, ppath,
+							   NIL, NULL, parallel_workers,
+							   enable_parallel_append, -1);
+		gpath = (Path *)
+			create_gather_path(root, result_rel, papath,
 							   result_rel->reltarget, NULL, NULL);
-		if (!op->all)
-			ppath = make_union_unique(op, ppath, tlist, root);
-		add_path(result_rel, ppath);
 	}
 
-	/* Undo effects of possibly forcing tuple_fraction to 0 */
-	root->tuple_fraction = save_fraction;
+	if (!op->all)
+	{
+		double		dNumGroups;
+		bool		can_sort = grouping_is_sortable(groupList);
+		bool		can_hash = grouping_is_hashable(groupList);
+
+		/*
+		 * XXX for the moment, take the number of distinct groups as equal to
+		 * the total input size, i.e., the worst case.  This is too
+		 * conservative, but it's not clear how to get a decent estimate of
+		 * the true size.  One should note as well the propensity of novices
+		 * to write UNION rather than UNION ALL even when they don't expect
+		 * any duplicates...
+		 */
+		dNumGroups = apath->rows;
+
+		if (can_hash)
+		{
+			Path	   *path;
+
+			/*
+			 * Try a hash aggregate plan on 'apath'.  This is the cheapest
+			 * available path containing each append child.
+			 */
+			path = (Path *) create_agg_path(root,
+											result_rel,
+											apath,
+											create_pathtarget(root, tlist),
+											AGG_HASHED,
+											AGGSPLIT_SIMPLE,
+											groupList,
+											NIL,
+											NULL,
+											dNumGroups);
+			add_path(result_rel, path);
+
+			/* Try hash aggregate on the Gather path, if valid */
+			if (gpath != NULL)
+			{
+				/* Hashed aggregate plan --- no sort needed */
+				path = (Path *) create_agg_path(root,
+												result_rel,
+												gpath,
+												create_pathtarget(root, tlist),
+												AGG_HASHED,
+												AGGSPLIT_SIMPLE,
+												groupList,
+												NIL,
+												NULL,
+												dNumGroups);
+				add_path(result_rel, path);
+			}
+		}
+
+		if (can_sort)
+		{
+			Path	   *path = apath;
+
+			/* Try Sort -> Unique on the Append path */
+			if (groupList != NIL)
+				path = (Path *) create_sort_path(root, result_rel, path,
+												 make_pathkeys_for_sortclauses(root, groupList, tlist),
+												 -1.0);
+
+			path = (Path *) create_upper_unique_path(root,
+													 result_rel,
+													 path,
+													 list_length(path->pathkeys),
+													 dNumGroups);
+
+			add_path(result_rel, path);
+
+			/* Try Sort -> Unique on the Gather path, if set */
+			if (gpath != NULL)
+			{
+				path = gpath;
+
+				path = (Path *) create_sort_path(root, result_rel, path,
+												 make_pathkeys_for_sortclauses(root, groupList, tlist),
+												 -1.0);
+
+				path = (Path *) create_upper_unique_path(root,
+														 result_rel,
+														 path,
+														 list_length(path->pathkeys),
+														 dNumGroups);
+				add_path(result_rel, path);
+			}
+		}
+
+		/*
+		 * Try making a MergeAppend path if we managed to find a path with the
+		 * correct pathkeys in each union child query.
+		 */
+		if (try_sorted && groupList != NIL)
+		{
+			Path	   *path;
+
+			path = (Path *) create_merge_append_path(root,
+													 result_rel,
+													 ordered_pathlist,
+													 union_pathkeys,
+													 NULL);
+
+			/* and make the MergeAppend unique */
+			path = (Path *) create_upper_unique_path(root,
+													 result_rel,
+													 path,
+													 list_length(tlist),
+													 dNumGroups);
+
+			add_path(result_rel, path);
+		}
+	}
+	else
+	{
+		/* UNION ALL */
+		add_path(result_rel, apath);
+
+		if (gpath != NULL)
+			add_path(result_rel, gpath);
+	}
 
 	return result_rel;
 }
@@ -713,6 +1029,8 @@ generate_nonunion_paths(SetOperationStmt *op, PlannerInfo *root,
 			   *tlist,
 			   *groupList,
 			   *pathlist;
+	bool		lpath_trivial_tlist,
+				rpath_trivial_tlist;
 	double		dLeftGroups,
 				dRightGroups,
 				dNumGroups,
@@ -732,14 +1050,26 @@ generate_nonunion_paths(SetOperationStmt *op, PlannerInfo *root,
 								  false, 0,
 								  refnames_tlist,
 								  &lpath_tlist,
-								  &dLeftGroups);
+								  &lpath_trivial_tlist);
+	if (lrel->rtekind == RTE_SUBQUERY)
+		build_setop_child_paths(root, lrel, lpath_trivial_tlist, lpath_tlist,
+								NIL, &dLeftGroups);
+	else
+		dLeftGroups = lrel->rows;
+
 	lpath = lrel->cheapest_total_path;
 	rrel = recurse_set_operations(op->rarg, root,
 								  op->colTypes, op->colCollations,
 								  false, 1,
 								  refnames_tlist,
 								  &rpath_tlist,
-								  &dRightGroups);
+								  &rpath_trivial_tlist);
+	if (rrel->rtekind == RTE_SUBQUERY)
+		build_setop_child_paths(root, rrel, rpath_trivial_tlist, rpath_tlist,
+								NIL, &dRightGroups);
+	else
+		dRightGroups = rrel->rows;
+
 	rpath = rrel->cheapest_total_path;
 
 	/* Undo effects of forcing tuple_fraction to 0 */
@@ -876,13 +1206,16 @@ static List *
 plan_union_children(PlannerInfo *root,
 					SetOperationStmt *top_union,
 					List *refnames_tlist,
-					List **tlist_list)
+					List **tlist_list,
+					List **istrivial_tlist)
 {
 	List	   *pending_rels = list_make1(top_union);
 	List	   *result = NIL;
 	List	   *child_tlist;
+	bool		trivial_tlist;
 
 	*tlist_list = NIL;
+	*istrivial_tlist = NIL;
 
 	while (pending_rels != NIL)
 	{
@@ -921,75 +1254,14 @@ plan_union_children(PlannerInfo *root,
 														false, -1,
 														refnames_tlist,
 														&child_tlist,
-														NULL));
+														&trivial_tlist));
 		*tlist_list = lappend(*tlist_list, child_tlist);
+		*istrivial_tlist = lappend_int(*istrivial_tlist, trivial_tlist);
 	}
 
 	return result;
 }
 
-/*
- * Add nodes to the given path tree to unique-ify the result of a UNION.
- */
-static Path *
-make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
-				  PlannerInfo *root)
-{
-	RelOptInfo *result_rel = fetch_upper_rel(root, UPPERREL_SETOP, NULL);
-	List	   *groupList;
-	double		dNumGroups;
-
-	/* Identify the grouping semantics */
-	groupList = generate_setop_grouplist(op, tlist);
-
-	/*
-	 * XXX for the moment, take the number of distinct groups as equal to the
-	 * total input size, ie, the worst case.  This is too conservative, but
-	 * it's not clear how to get a decent estimate of the true size.  One
-	 * should note as well the propensity of novices to write UNION rather
-	 * than UNION ALL even when they don't expect any duplicates...
-	 */
-	dNumGroups = path->rows;
-
-	/* Decide whether to hash or sort */
-	if (choose_hashed_setop(root, groupList, path,
-							dNumGroups, dNumGroups,
-							"UNION"))
-	{
-		/* Hashed aggregate plan --- no sort needed */
-		path = (Path *) create_agg_path(root,
-										result_rel,
-										path,
-										create_pathtarget(root, tlist),
-										AGG_HASHED,
-										AGGSPLIT_SIMPLE,
-										groupList,
-										NIL,
-										NULL,
-										dNumGroups);
-	}
-	else
-	{
-		/* Sort and Unique */
-		if (groupList)
-			path = (Path *)
-				create_sort_path(root,
-								 result_rel,
-								 path,
-								 make_pathkeys_for_sortclauses(root,
-															   groupList,
-															   tlist),
-								 -1.0);
-		path = (Path *) create_upper_unique_path(root,
-												 result_rel,
-												 path,
-												 list_length(path->pathkeys),
-												 dNumGroups);
-	}
-
-	return path;
-}
-
 /*
  * postprocess_setop_rel - perform steps required after adding paths
  */
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 40ea19e6f1..28fed9d87f 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1890,7 +1890,8 @@ transformSetOperationStmt(ParseState *pstate, SelectStmt *stmt)
 	 * For now, we don't support resjunk sort clauses on the output of a
 	 * setOperation tree --- you can only use the SQL92-spec options of
 	 * selecting an output column by name or number.  Enforce by checking that
-	 * transformSortClause doesn't add any items to tlist.
+	 * transformSortClause doesn't add any items to tlist.  Note, if changing
+	 * this, add_setop_child_rel_equivalences() will need to be updated.
 	 */
 	tllen = list_length(qry->targetList);
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6ec81637c1..14ef296ab7 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -400,6 +400,8 @@ struct PlannerInfo
 	List	   *distinct_pathkeys;
 	/* sortClause pathkeys, if any */
 	List	   *sort_pathkeys;
+	/* set operator pathkeys, if any */
+	List	   *setop_pathkeys;
 
 	/* Canonicalised partition schemes used in the query. */
 	List	   *part_schemes pg_node_attr(read_write_ignore);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 5f500a1c69..914d9bdef5 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -173,6 +173,10 @@ extern void add_child_join_rel_equivalences(PlannerInfo *root,
 											AppendRelInfo **appinfos,
 											RelOptInfo *parent_joinrel,
 											RelOptInfo *child_joinrel);
+extern void add_setop_child_rel_equivalences(PlannerInfo *root,
+											 RelOptInfo *child_rel,
+											 List *child_tlist,
+											 List *setop_pathkeys);
 extern List *generate_implied_equalities_for_column(PlannerInfo *root,
 													RelOptInfo *rel,
 													ec_matches_callback_type callback,
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index e1d79ffdf3..5aeff21b96 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -44,7 +44,8 @@ extern PlannedStmt *standard_planner(Query *parse, const char *query_string,
 
 extern PlannerInfo *subquery_planner(PlannerGlobal *glob, Query *parse,
 									 PlannerInfo *parent_root,
-									 bool hasRecursion, double tuple_fraction);
+									 bool hasRecursion, double tuple_fraction,
+									 SetOperationStmt *setops);
 
 extern RowMarkType select_rowmark_type(RangeTblEntry *rte,
 									   LockClauseStrength strength);
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 8e00716dc8..a52dec285d 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -53,6 +53,6 @@ extern void preprocess_aggrefs(PlannerInfo *root, Node *clause);
  * prototypes for prepunion.c
  */
 extern RelOptInfo *plan_set_operations(PlannerInfo *root);
-
+extern bool set_operation_ordered_results_useful(SetOperationStmt *setop);
 
 #endif							/* PREP_H */
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 2de8924b52..7d59fb4431 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1396,6 +1396,7 @@ SELECT x FROM test3cs WHERE x ~ 'a';
  abc
 (1 row)
 
+SET enable_hashagg TO off;
 SELECT x FROM test1cs UNION SELECT x FROM test2cs ORDER BY x;
   x  
 -----
@@ -1448,6 +1449,7 @@ SELECT DISTINCT x FROM test3cs ORDER BY x;
  ghi
 (4 rows)
 
+RESET enable_hashagg;
 SELECT count(DISTINCT x) FROM test3cs;
  count 
 -------
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index 7fdb685313..5fd54a10b1 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1472,14 +1472,19 @@ explain (costs off) select * from t union select * from t order by 1,3;
    Sort Key: t.a, t.c
    Presorted Key: t.a
    ->  Unique
-         ->  Sort
+         ->  Merge Append
                Sort Key: t.a, t.b, t.c
-               ->  Gather
+               ->  Gather Merge
                      Workers Planned: 2
-                     ->  Parallel Append
+                     ->  Sort
+                           Sort Key: t.a, t.b, t.c
                            ->  Parallel Seq Scan on t
+               ->  Gather Merge
+                     Workers Planned: 2
+                     ->  Sort
+                           Sort Key: t_1.a, t_1.b, t_1.c
                            ->  Parallel Seq Scan on t t_1
-(11 rows)
+(16 rows)
 
 -- Full sort, not just incremental sort can be pushed below a gather merge path
 -- by generate_useful_gather_paths.
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 882017afc9..26b718e903 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -412,16 +412,17 @@ set enable_hashagg to off;
 explain (costs off)
 select count(*) from
   ( select unique1 from tenk1 union select fivethous from tenk1 ) ss;
-                              QUERY PLAN                              
-----------------------------------------------------------------------
+                           QUERY PLAN                           
+----------------------------------------------------------------
  Aggregate
    ->  Unique
-         ->  Sort
+         ->  Merge Append
                Sort Key: tenk1.unique1
-               ->  Append
-                     ->  Index Only Scan using tenk1_unique1 on tenk1
+               ->  Index Only Scan using tenk1_unique1 on tenk1
+               ->  Sort
+                     Sort Key: tenk1_1.fivethous
                      ->  Seq Scan on tenk1 tenk1_1
-(7 rows)
+(8 rows)
 
 select count(*) from
   ( select unique1 from tenk1 union select fivethous from tenk1 ) ss;
@@ -950,16 +951,9 @@ select except select;
 -- check hashed implementation
 set enable_hashagg = true;
 set enable_sort = false;
-explain (costs off)
-select from generate_series(1,5) union select from generate_series(1,3);
-                           QUERY PLAN                           
-----------------------------------------------------------------
- HashAggregate
-   ->  Append
-         ->  Function Scan on generate_series
-         ->  Function Scan on generate_series generate_series_1
-(4 rows)
-
+-- We've no way to check hashed UNION as the empty pathkeys in the Append are
+-- fine to make use of Unique, which is cheaper than HashAggregate and we've
+-- no means to disable Unique.
 explain (costs off)
 select from generate_series(1,5) intersect select from generate_series(1,3);
                               QUERY PLAN                              
@@ -972,10 +966,6 @@ select from generate_series(1,5) intersect select from generate_series(1,3);
                ->  Function Scan on generate_series generate_series_1
 (6 rows)
 
-select from generate_series(1,5) union select from generate_series(1,3);
---
-(1 row)
-
 select from generate_series(1,5) union all select from generate_series(1,3);
 --
 (8 rows)
@@ -1045,6 +1035,20 @@ select from generate_series(1,5) except all select from generate_series(1,3);
 --
 (2 rows)
 
+-- Try a variation of the above but with a CTE which contains a column, again
+-- with an empty final select list.
+-- Ensure we get the expected 1 row with 0 columns
+with cte as materialized (select s from generate_series(1,5) s)
+select from cte union select from cte;
+--
+(1 row)
+
+-- Ensure we get the same result as the above.
+with cte as not materialized (select s from generate_series(1,5) s)
+select from cte union select from cte;
+--
+(1 row)
+
 reset enable_hashagg;
 reset enable_sort;
 --
@@ -1081,6 +1085,7 @@ INSERT INTO t2 VALUES ('ab'), ('xy');
 set enable_seqscan = off;
 set enable_indexscan = on;
 set enable_bitmapscan = off;
+set enable_sort = off;
 explain (costs off)
  SELECT * FROM
  (SELECT a || b AS ab FROM t1
@@ -1162,6 +1167,7 @@ explain (costs off)
 reset enable_seqscan;
 reset enable_indexscan;
 reset enable_bitmapscan;
+reset enable_sort;
 -- This simpler variant of the above test has been observed to fail differently
 create table events (event_id int primary key);
 create table other_events (event_id int primary key);
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 03837de846..80f28a97d7 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -555,6 +555,7 @@ SELECT x FROM test3cs WHERE x LIKE 'a%';
 SELECT x FROM test3cs WHERE x ILIKE 'a%';
 SELECT x FROM test3cs WHERE x SIMILAR TO 'a%';
 SELECT x FROM test3cs WHERE x ~ 'a';
+SET enable_hashagg TO off;
 SELECT x FROM test1cs UNION SELECT x FROM test2cs ORDER BY x;
 SELECT x FROM test2cs UNION SELECT x FROM test1cs ORDER BY x;
 SELECT x FROM test1cs INTERSECT SELECT x FROM test2cs;
@@ -562,6 +563,7 @@ SELECT x FROM test2cs INTERSECT SELECT x FROM test1cs;
 SELECT x FROM test1cs EXCEPT SELECT x FROM test2cs;
 SELECT x FROM test2cs EXCEPT SELECT x FROM test1cs;
 SELECT DISTINCT x FROM test3cs ORDER BY x;
+RESET enable_hashagg;
 SELECT count(DISTINCT x) FROM test3cs;
 SELECT x, count(*) FROM test3cs GROUP BY x ORDER BY x;
 SELECT x, row_number() OVER (ORDER BY x), rank() OVER (ORDER BY x) FROM test3cs ORDER BY x;
diff --git a/src/test/regress/sql/union.sql b/src/test/regress/sql/union.sql
index d160db5458..8afc580c63 100644
--- a/src/test/regress/sql/union.sql
+++ b/src/test/regress/sql/union.sql
@@ -302,12 +302,12 @@ select except select;
 set enable_hashagg = true;
 set enable_sort = false;
 
-explain (costs off)
-select from generate_series(1,5) union select from generate_series(1,3);
+-- We've no way to check hashed UNION as the empty pathkeys in the Append are
+-- fine to make use of Unique, which is cheaper than HashAggregate and we've
+-- no means to disable Unique.
 explain (costs off)
 select from generate_series(1,5) intersect select from generate_series(1,3);
 
-select from generate_series(1,5) union select from generate_series(1,3);
 select from generate_series(1,5) union all select from generate_series(1,3);
 select from generate_series(1,5) intersect select from generate_series(1,3);
 select from generate_series(1,5) intersect all select from generate_series(1,3);
@@ -330,6 +330,17 @@ select from generate_series(1,5) intersect all select from generate_series(1,3);
 select from generate_series(1,5) except select from generate_series(1,3);
 select from generate_series(1,5) except all select from generate_series(1,3);
 
+-- Try a variation of the above but with a CTE which contains a column, again
+-- with an empty final select list.
+
+-- Ensure we get the expected 1 row with 0 columns
+with cte as materialized (select s from generate_series(1,5) s)
+select from cte union select from cte;
+
+-- Ensure we get the same result as the above.
+with cte as not materialized (select s from generate_series(1,5) s)
+select from cte union select from cte;
+
 reset enable_hashagg;
 reset enable_sort;
 
@@ -361,6 +372,7 @@ INSERT INTO t2 VALUES ('ab'), ('xy');
 set enable_seqscan = off;
 set enable_indexscan = on;
 set enable_bitmapscan = off;
+set enable_sort = off;
 
 explain (costs off)
  SELECT * FROM
@@ -407,6 +419,7 @@ explain (costs off)
 reset enable_seqscan;
 reset enable_indexscan;
 reset enable_bitmapscan;
+reset enable_sort;
 
 -- This simpler variant of the above test has been observed to fail differently
 
-- 
2.34.1
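
For anyone reviewing who wants the idea behind the new Unique -> Merge Append plans in the expected output above, here is a minimal standalone sketch. It is not PostgreSQL code: merge_unique() is a hypothetical helper, and plain int arrays stand in for the sorted child paths. It assumes each input is already sorted on the full target list, which is the property the patch requests from each union child via union_pathkeys.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not PostgreSQL code: merge two ascending int arrays
 * and drop duplicates in a single pass, mimicking a Unique node on top of
 * a Merge Append.  'out' must have room for na + nb elements.  Returns the
 * number of elements written to 'out'.
 */
size_t
merge_unique(const int *a, size_t na, const int *b, size_t nb, int *out)
{
	size_t		i = 0;
	size_t		j = 0;
	size_t		n = 0;

	assert(out != NULL);

	while (i < na || j < nb)
	{
		int			next;

		/* "Merge Append": take the smaller head of the two sorted inputs */
		if (j >= nb || (i < na && a[i] <= b[j]))
			next = a[i++];
		else
			next = b[j++];

		/* "Unique": emit only if it differs from the last emitted value */
		if (n == 0 || out[n - 1] != next)
			out[n++] = next;
	}

	return n;
}
```

Because the merged stream is ordered, duplicates are always adjacent, so a comparison against the previously emitted value is enough. The Sort -> Unique plan over a plain Append has to read and sort every child row before it can emit the first result.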

v2-0002-Fix-UNION-planner-bug-and-add-regression-test.patch (application/octet-stream)

From 6a078b3fdd5c52030ddc679d02a29b7d05ac5d0b Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Tue, 21 May 2024 12:32:05 +1200
Subject: [PATCH v2 2/2] Fix UNION planner bug and add regression test

---
 src/backend/optimizer/prep/prepunion.c | 17 ++++++++++-------
 src/test/regress/expected/union.out    | 13 +++++++++++++
 src/test/regress/sql/union.sql         |  6 ++++++
 3 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 30068c27a1..e3ba0d17cf 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -714,7 +714,7 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	List	   *groupList = NIL;
 	Path	   *apath;
 	Path	   *gpath = NULL;
-	bool		try_sorted;
+	bool		try_sorted = false;
 	List	   *union_pathkeys = NIL;
 
 	/*
@@ -741,17 +741,20 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	*pTargetList = tlist;
 
 	/* For for UNIONs (not UNION ALL), try sorting, if sorting is possible */
-	try_sorted = !op->all && grouping_is_sortable(op->groupClauses);
-
-	if (try_sorted)
+	if (!op->all)
 	{
 		/* Identify the grouping semantics */
 		groupList = generate_setop_grouplist(op, tlist);
 
-		/* Determine the pathkeys for sorting by the whole target list */
-		union_pathkeys = make_pathkeys_for_sortclauses(root, groupList, tlist);
+		if (grouping_is_sortable(op->groupClauses))
+		{
+			try_sorted = true;
+			/* Determine the pathkeys for sorting by the whole target list */
+			union_pathkeys = make_pathkeys_for_sortclauses(root, groupList,
+														   tlist);
 
-		root->query_pathkeys = union_pathkeys;
+			root->query_pathkeys = union_pathkeys;
+		}
 	}
 
 	/*
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 26b718e903..0fd0e1c38b 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -815,6 +815,19 @@ select x from (values (row(1, 2)), (row(1, 3))) _(x) except select x from (value
  (1,3)
 (1 row)
 
+-- non-sortable type
+-- Ensure we get a HashAggregate plan.  Keep enable_hashagg=off to ensure
+-- there's no chance of a sort.
+explain (costs off) select '123'::xid union select '123'::xid;
+        QUERY PLAN         
+---------------------------
+ HashAggregate
+   Group Key: ('123'::xid)
+   ->  Append
+         ->  Result
+         ->  Result
+(5 rows)
+
 reset enable_hashagg;
 --
 -- Mixed types
diff --git a/src/test/regress/sql/union.sql b/src/test/regress/sql/union.sql
index 8afc580c63..f8826514e4 100644
--- a/src/test/regress/sql/union.sql
+++ b/src/test/regress/sql/union.sql
@@ -244,6 +244,12 @@ explain (costs off)
 select x from (values (row(1, 2)), (row(1, 3))) _(x) except select x from (values (row(1, 2)), (row(1, 4))) _(x);
 select x from (values (row(1, 2)), (row(1, 3))) _(x) except select x from (values (row(1, 2)), (row(1, 4))) _(x);
 
+-- non-sortable type
+
+-- Ensure we get a HashAggregate plan.  Keep enable_hashagg=off to ensure
+-- there's no chance of a sort.
+explain (costs off) select '123'::xid union select '123'::xid;
+
 reset enable_hashagg;
 
 --
-- 
2.34.1

Heikki Linnakangas

hlinnaka@iki.fi

over 1 year ago

In reply to: David Rowley (#1)

Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

On 21/05/2024 05:58, David Rowley wrote:

Let this thread be for at least the coding portion of this or be my
thread for this patch for the v18 cycle if the RMT rules in favour of
keeping that code reverted for v17.

I've attached 2 patches.

0001 is a simple revert of Tom's revert (7204f3591).
0002 fixes the issue reported by Hubert.

If anyone wants to have a look, I'd be grateful for that. Tom did
call for further review after this being the 4th issue reported for
66c0185a3.

My planner experience is a bit rusty, but I took a quick look. Looks
generally OK to me. Some comments below:

+ /* For for UNIONs (not UNION ALL), try sorting, if sorting is possible */

Duplicated word: "For for"

/*
* build_setop_child_paths
* Build paths for the set op child relation denoted by 'rel'.
*
* interesting_pathkeys: if not NIL, also include paths that suit these
* pathkeys, sorting any unsorted paths as required.
* *pNumGroups: if not NULL, we estimate the number of distinct groups
* in the result, and store it there

The indentation on 'interesting_pathkeys' and '*pNumGroups' is inconsistent.

I have a vague feeling that this comment deserves to be longer. The
function does a lot of things. How is 'child_tlist' different from
rel->reltarget for example?

'interesting_pathkeys' is modified by the call to
add_setop_child_rel_equivalences(): it adds members to the
EquivalenceClasses of the pathkeys. Is that worth mentioning here, or is
that obvious to someone who knows more about the planner?

	/*
	 * Create paths to suit final sort order required for setop_pathkeys.
	 * Here we'll sort the cheapest input path (if not sorted already) and
	 * incremental sort any paths which are partially sorted.
	 */
	is_sorted = pathkeys_count_contained_in(setop_pathkeys,
											subpath->pathkeys,
											&presorted_keys);

	if (!is_sorted)
	{

Maybe also mention that if it's already sorted, it's used as is.
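
To make the three outcomes concrete, here is a minimal standalone sketch of the decision described in that comment. It is not the code from the patch: plain integers stand in for PathKeys, and every `sketch_`/`Sketch` name is hypothetical. It assumes the check behaves as the quoted comment suggests, counting the leading keys the input already satisfies and reporting whether all required keys are covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PathKey: here just an integer column id. */
typedef int SketchKey;

typedef enum
{
	USE_AS_IS,			/* input already satisfies the required order */
	INCREMENTAL_SORT,	/* a leading prefix is sorted; sort the rest */
	FULL_SORT			/* nothing useful is presorted */
} SketchSortChoice;

/*
 * Count how many leading keys of 'required' also lead 'actual'.  Returns
 * true when every required key is matched, i.e. the input is fully sorted.
 */
static bool
sketch_count_contained_in(const SketchKey *required, size_t nrequired,
						  const SketchKey *actual, size_t nactual,
						  int *n_common)
{
	size_t		i = 0;

	assert(n_common != NULL);

	while (i < nrequired && i < nactual && required[i] == actual[i])
		i++;

	*n_common = (int) i;
	return i == nrequired;
}

/* Pick what to do with one input path, given the order the setop wants. */
static SketchSortChoice
sketch_choose_sort(const SketchKey *required, size_t nrequired,
				   const SketchKey *actual, size_t nactual)
{
	int			presorted_keys;

	if (sketch_count_contained_in(required, nrequired, actual, nactual,
								  &presorted_keys))
		return USE_AS_IS;

	return presorted_keys > 0 ? INCREMENTAL_SORT : FULL_SORT;
}
```

With required keys {1, 2}, an input ordered by {1, 2, 3} would be used as is, one ordered by {1, 5} would get an incremental sort, and one ordered by {7} would need a full sort.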

BTW, could the same machinery be used for INTERSECT as well? There was a
brief mention of that in the original thread, but I didn't understand
the details. Not for v17, but I'm curious. I was wondering if
build_setop_child_paths() should be named build_union_child_paths(),
since it's only used with UNIONs, but I guess it could be used as is for
INTERSECT too.

# Testing

postgres=# begin; create table foo as select i from generate_series(1, 1000000) i; create index on foo (i); commit;
BEGIN
SELECT 1000000
CREATE INDEX
COMMIT
postgres=# set enable_seqscan=off;
SET
postgres=# explain (select 1 as i union select i from foo) order by i;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Unique  (cost=144370.89..149370.89 rows=1000001 width=4)
   ->  Sort  (cost=144370.89..146870.89 rows=1000001 width=4)
         Sort Key: (1)
         ->  Append  (cost=0.00..31038.44 rows=1000001 width=4)
               ->  Result  (cost=0.00..0.01 rows=1 width=4)
               ->  Index Only Scan using foo_i_idx on foo  (cost=0.42..26038.42 rows=1000000 width=4)
(6 rows)

I'm disappointed it couldn't produce a MergeAppend plan. If you replace
the "union" with "union all" you do get a MergeAppend.

Some more cases where I hoped for a MergeAppend:

postgres=# explain (select i, 'foo' from foo union select i, 'foo' from foo) order by 1;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Unique  (cost=380767.54..395767.54 rows=2000000 width=36)
   ->  Sort  (cost=380767.54..385767.54 rows=2000000 width=36)
         Sort Key: foo.i, ('foo'::text)
         ->  Append  (cost=0.42..62076.85 rows=2000000 width=36)
               ->  Index Only Scan using foo_i_idx on foo  (cost=0.42..26038.42 rows=1000000 width=36)
               ->  Index Only Scan using foo_i_idx on foo foo_1  (cost=0.42..26038.42 rows=1000000 width=36)
(6 rows)

postgres=# explain (select 'foo', i from foo union select 'bar', i from foo) order by 1;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Unique  (cost=380767.54..395767.54 rows=2000000 width=36)
   ->  Sort  (cost=380767.54..385767.54 rows=2000000 width=36)
         Sort Key: ('foo'::text), foo.i
         ->  Append  (cost=0.42..62076.85 rows=2000000 width=36)
               ->  Index Only Scan using foo_i_idx on foo  (cost=0.42..26038.42 rows=1000000 width=36)
               ->  Index Only Scan using foo_i_idx on foo foo_1  (cost=0.42..26038.42 rows=1000000 width=36)
(6 rows)

The following two queries are the same from the user's point of view,
but one is written using WITH:

postgres=# explain (select i from foo union (select 1::int order by 1) union select i from foo) order by 1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Unique  (cost=326083.66..336083.67 rows=2000001 width=4)
   ->  Sort  (cost=326083.66..331083.67 rows=2000001 width=4)
         Sort Key: foo.i
         ->  Append  (cost=0.42..62076.87 rows=2000001 width=4)
               ->  Index Only Scan using foo_i_idx on foo  (cost=0.42..26038.42 rows=1000000 width=4)
               ->  Result  (cost=0.00..0.01 rows=1 width=4)
               ->  Index Only Scan using foo_i_idx on foo foo_1  (cost=0.42..26038.42 rows=1000000 width=4)
(7 rows)

postgres=# explain with x (i) as (select 1::int order by 1) (select i from foo union select i from x union select i from foo) order by 1;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Unique  (cost=0.89..82926.54 rows=2000001 width=4)
   ->  Merge Append  (cost=0.89..77926.54 rows=2000001 width=4)
         Sort Key: foo.i
         ->  Index Only Scan using foo_i_idx on foo  (cost=0.42..26038.42 rows=1000000 width=4)
         ->  Sort  (cost=0.02..0.03 rows=1 width=4)
               Sort Key: (1)
               ->  Result  (cost=0.00..0.01 rows=1 width=4)
         ->  Index Only Scan using foo_i_idx on foo foo_1  (cost=0.42..26038.42 rows=1000000 width=4)
(8 rows)

I would've expected a MergeAppend in both cases.

None of these test cases are broken as such, you just don't get the
benefit of the optimization. I suspect they might all have the same root
cause, as they all involve constants in the target list. I think that's
a pretty common use case of UNION though.
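
For anyone following along who has not looked at this plan shape, a rough standalone illustration of what Unique over Merge Append does with presorted inputs may help. This is not PostgreSQL code: `sketch_merge_unique` and its parameters are hypothetical, two int arrays stand in for the sorted child paths, and `limit` only mimics a consumer that stops reading early.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Merge two ascending int arrays, dropping duplicates, and stop once
 * 'limit' distinct values have been produced.  Returns the number of
 * values written to 'out'.  Neither input is sorted or hashed as a whole.
 */
static size_t
sketch_merge_unique(const int *a, size_t na,
					const int *b, size_t nb,
					int *out, size_t limit)
{
	size_t		ia = 0;
	size_t		ib = 0;
	size_t		nout = 0;

	assert(out != NULL || limit == 0);

	while (nout < limit && (ia < na || ib < nb))
	{
		int			next;

		/* Merge Append step: take the smaller head of the two inputs. */
		if (ib >= nb || (ia < na && a[ia] <= b[ib]))
			next = a[ia++];
		else
			next = b[ib++];

		/* Unique step: emit only when it differs from the last output. */
		if (nout == 0 || out[nout - 1] != next)
			out[nout++] = next;
	}

	return nout;
}
```

Because duplicates end up adjacent in the merged stream, a single comparison with the previous output row is enough, and the loop can stop as soon as the caller has what it needs. The Sort over Append plans above have to read every input row before producing the first one.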

--
Heikki Linnakangas
Neon (https://neon.tech)

Alvaro Herrera

alvherre@alvh.no-ip.org

over 1 year ago

In reply to: David Rowley (#1)

Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

On 2024-May-21, David Rowley wrote:

I've attached 2 patches.

0001 is a simple revert of Tom's revert (7204f3591).
0002 fixes the issue reported by Hubert.

I would like to request that you don't keep 0001's message as you have
it here. It'd be more readable to take 66c0185a3d14's whole commit
message with a small suffix like "try 2" in the commit title, and add an
additional second paragraph stating it was transiently reverted by
7204f35919b7. Otherwise it's harder to make sense of the commit on its
own later.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

David Rowley

dgrowleyml@gmail.com

over 1 year ago

In reply to: Alvaro Herrera (#3)

1 attachment(s)

Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

On Wed, 22 May 2024 at 00:35, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-May-21, David Rowley wrote:

I've attached 2 patches.

0001 is a simple revert of Tom's revert (7204f3591).
0002 fixes the issue reported by Hubert.

I would like to request that you don't keep 0001's message as you have
it here. It'd be more readable to take 66c0185a3d14's whole commit
message with a small suffix like "try 2" in the commit title, and add an
additional second paragraph stating it was transiently reverted by
7204f35919b7. Otherwise it's harder to make sense of the commit on its
own later.

Thanks for having a look. I was planning to have the commit message
as per attached. I'd only split the patch for ease of review per
request of Tom. I should have mentioned that here.

I would adjust the exact wording in the final paragraph as required
depending on what plan materialises.

This also fixes up the comment stuff that Heikki mentioned.

David

Attachments:

v2-0001-Allow-planner-to-use-Merge-Append-to-efficiently-.patch (application/octet-stream)

From 2d8d97f75c4678966a8bb079018d9769c9f09358 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Tue, 21 May 2024 11:24:28 +1200
Subject: [PATCH v2] Allow planner to use Merge Append to efficiently implement
 UNION, take 2

Until now, UNION queries have often been suboptimal as the planner has
only ever considered using an Append node and making the results unique
by either using a Hash Aggregate, or by Sorting the entire Append result
and running it through the Unique operator.  Both of these methods
always require reading all rows from the union subqueries.

Here we adjust the union planner so that it can request that each subquery
produce results in target list order so that these can be Merge Appended
together and made unique with a Unique node.  This can improve performance
significantly as the union child can make use of the likes of btree
indexes and/or Merge Joins to provide the top-level UNION with presorted
input.  This is especially good if the top-level UNION contains a LIMIT
node that limits the output rows to a small subset of the unioned rows as
cheap startup plans can be used.

This was originally committed in 66c0185a3 but reverted by 7204f3591 due
to an unfortunately timed bug report shortly before beta1.

Author: David Rowley
Reviewed-by: Richard Guo, Andy Fan
Discussion: https://postgr.es/m/CAApHDvpb_63XQodmxKUF8vb9M7CxyUyT4sWvEgqeQU-GB7QFoQ@mail.gmail.com
---
 .../postgres_fdw/expected/postgres_fdw.out    |   7 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   9 +
 src/backend/optimizer/path/allpaths.c         |   5 +-
 src/backend/optimizer/path/equivclass.c       |  61 ++
 src/backend/optimizer/path/pathkeys.c         |  19 +
 src/backend/optimizer/plan/planner.c          | 116 ++-
 src/backend/optimizer/plan/subselect.c        |  15 +-
 src/backend/optimizer/prep/prepunion.c        | 723 ++++++++++++------
 src/backend/parser/analyze.c                  |   3 +-
 src/include/nodes/pathnodes.h                 |   2 +
 src/include/optimizer/paths.h                 |   4 +
 src/include/optimizer/planner.h               |   3 +-
 src/include/optimizer/prep.h                  |   2 +-
 .../regress/expected/collate.icu.utf8.out     |   2 +
 .../regress/expected/incremental_sort.out     |  13 +-
 src/test/regress/expected/union.out           |  59 +-
 src/test/regress/sql/collate.icu.utf8.sql     |   2 +
 src/test/regress/sql/union.sql                |  25 +-
 18 files changed, 783 insertions(+), 287 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 9ae36d3059..078b8a966f 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11511,6 +11511,10 @@ DROP INDEX base_tbl1_idx;
 DROP INDEX base_tbl2_idx;
 DROP INDEX async_p3_idx;
 -- UNION queries
+SET enable_sort TO off;
+SET enable_incremental_sort TO off;
+-- Adjust fdw_startup_cost so that we get an unordered path in the Append.
+ALTER SERVER loopback2 OPTIONS (ADD fdw_startup_cost '0.00');
 EXPLAIN (VERBOSE, COSTS OFF)
 INSERT INTO result_tbl
 (SELECT a, b, 'AAA' || c FROM async_p1 ORDER BY a LIMIT 10)
@@ -11592,6 +11596,9 @@ SELECT * FROM result_tbl ORDER BY a;
 (12 rows)
 
 DELETE FROM result_tbl;
+RESET enable_incremental_sort;
+RESET enable_sort;
+ALTER SERVER loopback2 OPTIONS (DROP fdw_startup_cost);
 -- Disable async execution if we use gating Result nodes for pseudoconstant
 -- quals
 EXPLAIN (VERBOSE, COSTS OFF)
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 1f31ac14df..09ba234e43 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3885,6 +3885,11 @@ DROP INDEX base_tbl2_idx;
 DROP INDEX async_p3_idx;
 
 -- UNION queries
+SET enable_sort TO off;
+SET enable_incremental_sort TO off;
+-- Adjust fdw_startup_cost so that we get an unordered path in the Append.
+ALTER SERVER loopback2 OPTIONS (ADD fdw_startup_cost '0.00');
+
 EXPLAIN (VERBOSE, COSTS OFF)
 INSERT INTO result_tbl
 (SELECT a, b, 'AAA' || c FROM async_p1 ORDER BY a LIMIT 10)
@@ -3911,6 +3916,10 @@ UNION ALL
 SELECT * FROM result_tbl ORDER BY a;
 DELETE FROM result_tbl;
 
+RESET enable_incremental_sort;
+RESET enable_sort;
+ALTER SERVER loopback2 OPTIONS (DROP fdw_startup_cost);
+
 -- Disable async execution if we use gating Result nodes for pseudoconstant
 -- quals
 EXPLAIN (VERBOSE, COSTS OFF)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 10aeebc2c1..4895cee994 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2633,9 +2633,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	Assert(root->plan_params == NIL);
 
 	/* Generate a subroot and Paths for the subquery */
-	rel->subroot = subquery_planner(root->glob, subquery,
-									root,
-									false, tuple_fraction);
+	rel->subroot = subquery_planner(root->glob, subquery, root, false,
+									tuple_fraction, NULL);
 
 	/* Isolate the params needed by this specific subplan */
 	rel->subplan_params = root->plan_params;
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 13400af3ef..21ce1ae2e1 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2882,6 +2882,67 @@ add_child_join_rel_equivalences(PlannerInfo *root,
 	MemoryContextSwitchTo(oldcontext);
 }
 
+/*
+ * add_setop_child_rel_equivalences
+ *		Add equivalence members for each non-resjunk target in 'child_tlist'
+ *		to the EquivalenceClass in the corresponding setop_pathkey's pk_eclass.
+ *
+ * 'root' is the PlannerInfo belonging to the top-level set operation.
+ * 'child_rel' is the RelOptInfo of the child relation we're adding
+ * EquivalenceMembers for.
+ * 'child_tlist' is the target list for the setop child relation.  The target
+ * list expressions are what we add as EquivalenceMembers.
+ * 'setop_pathkeys' is a list of PathKeys which must contain an entry for each
+ * non-resjunk target in 'child_tlist'.
+ */
+void
+add_setop_child_rel_equivalences(PlannerInfo *root, RelOptInfo *child_rel,
+								 List *child_tlist, List *setop_pathkeys)
+{
+	ListCell   *lc;
+	ListCell   *lc2 = list_head(setop_pathkeys);
+
+	foreach(lc, child_tlist)
+	{
+		TargetEntry *tle = lfirst_node(TargetEntry, lc);
+		EquivalenceMember *parent_em;
+		PathKey    *pk;
+
+		if (tle->resjunk)
+			continue;
+
+		if (lc2 == NULL)
+			elog(ERROR, "too few pathkeys for set operation");
+
+		pk = lfirst_node(PathKey, lc2);
+		parent_em = linitial(pk->pk_eclass->ec_members);
+
+		/*
+		 * We can safely pass the parent member as the first member in the
+		 * ec_members list as this is added first in generate_union_paths,
+		 * likewise, the JoinDomain can be that of the initial member of the
+		 * Pathkey's EquivalenceClass.
+		 */
+		add_eq_member(pk->pk_eclass,
+					  tle->expr,
+					  child_rel->relids,
+					  parent_em->em_jdomain,
+					  parent_em,
+					  exprType((Node *) tle->expr));
+
+		lc2 = lnext(setop_pathkeys, lc2);
+	}
+
+	/*
+	 * transformSetOperationStmt() ensures that the targetlist never contains
+	 * any resjunk columns, so all eclasses that exist in 'root' must have
+	 * received a new member in the loop above.  Add them to the child_rel's
+	 * eclass_indexes.
+	 */
+	child_rel->eclass_indexes = bms_add_range(child_rel->eclass_indexes, 0,
+											  list_length(root->eq_classes) - 1);
+}
+
 
 /*
  * generate_implied_equalities_for_column
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 157bc6a36d..8b258cbef9 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -2191,6 +2191,22 @@ pathkeys_useful_for_grouping(PlannerInfo *root, List *pathkeys)
 	return n;
 }
 
+/*
+ * pathkeys_useful_for_setop
+ *		Count the number of leading common pathkeys root's 'setop_pathkeys' in
+ *		'pathkeys'.
+ */
+static int
+pathkeys_useful_for_setop(PlannerInfo *root, List *pathkeys)
+{
+	int			n_common_pathkeys;
+
+	(void) pathkeys_count_contained_in(root->setop_pathkeys, pathkeys,
+									   &n_common_pathkeys);
+
+	return n_common_pathkeys;
+}
+
 /*
  * truncate_useless_pathkeys
  *		Shorten the given pathkey list to just the useful pathkeys.
@@ -2208,6 +2224,9 @@ truncate_useless_pathkeys(PlannerInfo *root,
 	if (nuseful2 > nuseful)
 		nuseful = nuseful2;
 	nuseful2 = pathkeys_useful_for_grouping(root, pathkeys);
+	if (nuseful2 > nuseful)
+		nuseful = nuseful2;
+	nuseful2 = pathkeys_useful_for_setop(root, pathkeys);
 	if (nuseful2 > nuseful)
 		nuseful = nuseful2;
 
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ddd2387840..032818423f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -54,6 +54,7 @@
 #include "optimizer/tlist.h"
 #include "parser/analyze.h"
 #include "parser/parse_agg.h"
+#include "parser/parse_clause.h"
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "partitioning/partdesc.h"
@@ -119,12 +120,15 @@ typedef struct
 {
 	List	   *activeWindows;	/* active windows, if any */
 	grouping_sets_data *gset_data;	/* grouping sets data, if any */
+	SetOperationStmt *setop;	/* parent set operation or NULL if not a
+								 * subquery belonging to a set operation */
 } standard_qp_extra;
 
 /* Local functions */
 static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
 static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
-static void grouping_planner(PlannerInfo *root, double tuple_fraction);
+static void grouping_planner(PlannerInfo *root, double tuple_fraction,
+							 SetOperationStmt *setops);
 static grouping_sets_data *preprocess_grouping_sets(PlannerInfo *root);
 static List *remap_to_groupclause_idx(List *groupClause, List *gsets,
 									  int *tleref_to_colnum_map);
@@ -249,6 +253,8 @@ static bool group_by_has_partkey(RelOptInfo *input_rel,
 								 List *targetList,
 								 List *groupClause);
 static int	common_prefix_cmp(const void *a, const void *b);
+static List *generate_setop_child_grouplist(SetOperationStmt *op,
+											List *targetlist);
 
 
 /*****************************************************************************
@@ -406,8 +412,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	}
 
 	/* primary planning entry point (may recurse for subqueries) */
-	root = subquery_planner(glob, parse, NULL,
-							false, tuple_fraction);
+	root = subquery_planner(glob, parse, NULL, false, tuple_fraction, NULL);
 
 	/* Select best Path and turn it into a Plan */
 	final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
@@ -598,6 +603,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
  * hasRecursion is true if this is a recursive WITH query.
  * tuple_fraction is the fraction of tuples we expect will be retrieved.
  * tuple_fraction is interpreted as explained for grouping_planner, below.
+ * setops is used for set operation subqueries to provide the subquery with
+ * the context in which it's being used so that Paths correctly sorted for the
+ * set operation can be generated.  NULL when not planning a set operation
+ * child.
  *
  * Basically, this routine does the stuff that should only be done once
  * per Query object.  It then calls grouping_planner.  At one time,
@@ -616,9 +625,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
  *--------------------
  */
 PlannerInfo *
-subquery_planner(PlannerGlobal *glob, Query *parse,
-				 PlannerInfo *parent_root,
-				 bool hasRecursion, double tuple_fraction)
+subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
+				 bool hasRecursion, double tuple_fraction,
+				 SetOperationStmt *setops)
 {
 	PlannerInfo *root;
 	List	   *newWithCheckOptions;
@@ -1077,7 +1086,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
 	/*
 	 * Do the main planning.
 	 */
-	grouping_planner(root, tuple_fraction);
+	grouping_planner(root, tuple_fraction, setops);
 
 	/*
 	 * Capture the set of outer-level param IDs we have access to, for use in
@@ -1275,7 +1284,11 @@ preprocess_phv_expression(PlannerInfo *root, Expr *expr)
  *	  0 < tuple_fraction < 1: expect the given fraction of tuples available
  *		from the plan to be retrieved
  *	  tuple_fraction >= 1: tuple_fraction is the absolute number of tuples
- *		expected to be retrieved (ie, a LIMIT specification)
+ *		expected to be retrieved (ie, a LIMIT specification).
+ * setops is used for set operation subqueries to provide the subquery with
+ * the context in which it's being used so that Paths correctly sorted for the
+ * set operation can be generated.  NULL when not planning a set operation
+ * child.
  *
  * Returns nothing; the useful output is in the Paths we attach to the
  * (UPPERREL_FINAL, NULL) upperrel in *root.  In addition,
@@ -1286,7 +1299,8 @@ preprocess_phv_expression(PlannerInfo *root, Expr *expr)
  *--------------------
  */
 static void
-grouping_planner(PlannerInfo *root, double tuple_fraction)
+grouping_planner(PlannerInfo *root, double tuple_fraction,
+				 SetOperationStmt *setops)
 {
 	Query	   *parse = root->parse;
 	int64		offset_est = 0;
@@ -1321,17 +1335,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 
 	if (parse->setOperations)
 	{
-		/*
-		 * If there's a top-level ORDER BY, assume we have to fetch all the
-		 * tuples.  This might be too simplistic given all the hackery below
-		 * to possibly avoid the sort; but the odds of accurate estimates here
-		 * are pretty low anyway.  XXX try to get rid of this in favor of
-		 * letting plan_set_operations generate both fast-start and
-		 * cheapest-total paths.
-		 */
-		if (parse->sortClause)
-			root->tuple_fraction = 0.0;
-
 		/*
 		 * Construct Paths for set operations.  The results will not need any
 		 * work except perhaps a top-level sort and/or LIMIT.  Note that any
@@ -1501,6 +1504,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		qp_extra.activeWindows = activeWindows;
 		qp_extra.gset_data = gset_data;
 
+		/*
+		 * If we're a subquery for a set operation, store the SetOperationStmt
+		 * in qp_extra.
+		 */
+		qp_extra.setop = setops;
+
 		/*
 		 * Generate the best unsorted and presorted paths for the scan/join
 		 * portion of this Query, ie the processing represented by the
@@ -3453,6 +3462,27 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 									  parse->sortClause,
 									  tlist);
 
+	/* setting setop_pathkeys might be useful to the union planner */
+	if (qp_extra->setop != NULL &&
+		set_operation_ordered_results_useful(qp_extra->setop))
+	{
+		List	   *groupClauses;
+		bool		sortable;
+
+		groupClauses = generate_setop_child_grouplist(qp_extra->setop, tlist);
+
+		root->setop_pathkeys =
+			make_pathkeys_for_sortclauses_extended(root,
+												   &groupClauses,
+												   tlist,
+												   false,
+												   &sortable);
+		if (!sortable)
+			root->setop_pathkeys = NIL;
+	}
+	else
+		root->setop_pathkeys = NIL;
+
 	/*
 	 * Figure out whether we want a sorted result from query_planner.
 	 *
@@ -3462,7 +3492,9 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 	 * sortable DISTINCT clause that's more rigorous than the ORDER BY clause,
 	 * we try to produce output that's sufficiently well sorted for the
 	 * DISTINCT.  Otherwise, if there is an ORDER BY clause, we want to sort
-	 * by the ORDER BY clause.
+	 * by the ORDER BY clause.  Otherwise, if we're a subquery being planned
+	 * for a set operation which can benefit from presorted results and have a
+	 * sortable targetlist, we want to sort by the target list.
 	 *
 	 * Note: if we have both ORDER BY and GROUP BY, and ORDER BY is a superset
 	 * of GROUP BY, it would be tempting to request sort by ORDER BY --- but
@@ -3480,6 +3512,8 @@ standard_qp_callback(PlannerInfo *root, void *extra)
 		root->query_pathkeys = root->distinct_pathkeys;
 	else if (root->sort_pathkeys)
 		root->query_pathkeys = root->sort_pathkeys;
+	else if (root->setop_pathkeys != NIL)
+		root->query_pathkeys = root->setop_pathkeys;
 	else
 		root->query_pathkeys = NIL;
 }
@@ -7923,3 +7957,43 @@ group_by_has_partkey(RelOptInfo *input_rel,
 
 	return true;
 }
+
+/*
+ * generate_setop_child_grouplist
+ *		Build a SortGroupClause list defining the sort/grouping properties
+ *		of the child of a set operation.
+ *
+ * This is similar to generate_setop_grouplist() but differs as the setop
+ * child query's targetlist entries may already have a tleSortGroupRef
+ * assigned for other purposes, such as GROUP BYs.  Here we keep the
+ * SortGroupClause list in the same order as 'op' groupClauses and just adjust
+ * the tleSortGroupRef to reference the TargetEntry's 'ressortgroupref'.
+ */
+static List *
+generate_setop_child_grouplist(SetOperationStmt *op, List *targetlist)
+{
+	List	   *grouplist = copyObject(op->groupClauses);
+	ListCell   *lg;
+	ListCell   *lt;
+
+	lg = list_head(grouplist);
+	foreach(lt, targetlist)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(lt);
+		SortGroupClause *sgc;
+
+		/* resjunk columns could have sortgrouprefs.  Leave these alone */
+		if (tle->resjunk)
+			continue;
+
+		/* we expect every non-resjunk target to have a SortGroupClause */
+		Assert(lg != NULL);
+		sgc = (SortGroupClause *) lfirst(lg);
+		lg = lnext(grouplist, lg);
+
+		/* assign a tleSortGroupRef, or reuse the existing one */
+		sgc->tleSortGroupRef = assignSortGroupRef(tle, targetlist);
+	}
+	Assert(lg == NULL);
+	return grouplist;
+}
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index d5fa281b10..e35ebea8b4 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -218,9 +218,8 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
 	Assert(root->plan_params == NIL);
 
 	/* Generate Paths for the subquery */
-	subroot = subquery_planner(root->glob, subquery,
-							   root,
-							   false, tuple_fraction);
+	subroot = subquery_planner(root->glob, subquery, root, false,
+							   tuple_fraction, NULL);
 
 	/* Isolate the params needed by this specific subplan */
 	plan_params = root->plan_params;
@@ -266,9 +265,8 @@ make_subplan(PlannerInfo *root, Query *orig_subquery,
 		if (subquery)
 		{
 			/* Generate Paths for the ANY subquery; we'll need all rows */
-			subroot = subquery_planner(root->glob, subquery,
-									   root,
-									   false, 0.0);
+			subroot = subquery_planner(root->glob, subquery, root, false, 0.0,
+									   NULL);
 
 			/* Isolate the params needed by this specific subplan */
 			plan_params = root->plan_params;
@@ -967,9 +965,8 @@ SS_process_ctes(PlannerInfo *root)
 		 * Generate Paths for the CTE query.  Always plan for full retrieval
 		 * --- we don't have enough info to predict otherwise.
 		 */
-		subroot = subquery_planner(root->glob, subquery,
-								   root,
-								   cte->cterecursive, 0.0);
+		subroot = subquery_planner(root->glob, subquery, root,
+								   cte->cterecursive, 0.0, NULL);
 
 		/*
 		 * Since the current query level doesn't yet contain any RTEs, it
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 296f866677..1c69c6e97e 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -43,11 +43,15 @@ static RelOptInfo *recurse_set_operations(Node *setOp, PlannerInfo *root,
 										  bool junkOK,
 										  int flag, List *refnames_tlist,
 										  List **pTargetList,
-										  double *pNumGroups);
+										  bool *istrivial_tlist);
 static RelOptInfo *generate_recursion_path(SetOperationStmt *setOp,
 										   PlannerInfo *root,
 										   List *refnames_tlist,
 										   List **pTargetList);
+static void build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
+									bool trivial_tlist, List *child_tlist,
+									List *interesting_pathkeys,
+									double *pNumGroups);
 static RelOptInfo *generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 										List *refnames_tlist,
 										List **pTargetList);
@@ -57,9 +61,8 @@ static RelOptInfo *generate_nonunion_paths(SetOperationStmt *op, PlannerInfo *ro
 static List *plan_union_children(PlannerInfo *root,
 								 SetOperationStmt *top_union,
 								 List *refnames_tlist,
-								 List **tlist_list);
-static Path *make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
-							   PlannerInfo *root);
+								 List **tlist_list,
+								 List **istrivial_tlist);
 static void postprocess_setop_rel(PlannerInfo *root, RelOptInfo *rel);
 static bool choose_hashed_setop(PlannerInfo *root, List *groupClauses,
 								Path *input_path,
@@ -114,10 +117,10 @@ plan_set_operations(PlannerInfo *root)
 	Assert(parse->distinctClause == NIL);
 
 	/*
-	 * In the outer query level, we won't have any true equivalences to deal
-	 * with; but we do want to be able to make pathkeys, which will require
-	 * single-member EquivalenceClasses.  Indicate that EC merging is complete
-	 * so that pathkeys.c won't complain.
+	 * In the outer query level, equivalence classes are limited to classes
+	 * which define that the top-level target entry is equivalent to the
+	 * corresponding child target entry.  There won't be any equivalence class
+	 * merging.  Mark that merging is complete to allow us to make pathkeys.
 	 */
 	Assert(root->eq_classes == NIL);
 	root->ec_merging_done = true;
@@ -152,6 +155,8 @@ plan_set_operations(PlannerInfo *root)
 	}
 	else
 	{
+		bool		trivial_tlist;
+
 		/*
 		 * Recurse on setOperations tree to generate paths for set ops. The
 		 * final output paths should have just the column types shown as the
@@ -163,7 +168,7 @@ plan_set_operations(PlannerInfo *root)
 										   true, -1,
 										   leftmostQuery->targetList,
 										   &top_tlist,
-										   NULL);
+										   &trivial_tlist);
 	}
 
 	/* Must return the built tlist into root->processed_tlist. */
@@ -172,6 +177,31 @@ plan_set_operations(PlannerInfo *root)
 	return setop_rel;
 }
 
+/*
+ * set_operation_ordered_results_useful
+ *		Return true if the given SetOperationStmt can be executed by utilizing
+ *		paths that provide sorted input according to the setop's targetlist.
+ *		Returns false when sorted paths are not any more useful than unsorted
+ *		ones.
+ */
+bool
+set_operation_ordered_results_useful(SetOperationStmt *setop)
+{
+	/*
+	 * Paths sorted by the targetlist are useful for UNION as we can opt to
+	 * MergeAppend the sorted paths then Unique them.  Ordered paths are no
+	 * more useful than unordered ones for UNION ALL.
+	 */
+	if (!setop->all && setop->op == SETOP_UNION)
+		return true;
+
+	/*
+	 * EXCEPT / EXCEPT ALL / INTERSECT / INTERSECT ALL cannot yet utilize
+	 * correctly sorted input paths.
+	 */
+	return false;
+}
+
 /*
  * recurse_set_operations
  *	  Recursively handle one step in a tree of set operations
@@ -184,8 +214,8 @@ plan_set_operations(PlannerInfo *root)
  *
  * Returns a RelOptInfo for the subtree, as well as these output parameters:
  * *pTargetList: receives the fully-fledged tlist for the subtree's top plan
- * *pNumGroups: if not NULL, we estimate the number of distinct groups
- *		in the result, and store it there
+ * *istrivial_tlist: true if, and only if, datatypes between parent and child
+ * match.
  *
  * The pTargetList output parameter is mostly redundant with the pathtarget
  * of the returned RelOptInfo, but for the moment we need it because much of
@@ -202,9 +232,11 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 					   bool junkOK,
 					   int flag, List *refnames_tlist,
 					   List **pTargetList,
-					   double *pNumGroups)
+					   bool *istrivial_tlist)
 {
-	RelOptInfo *rel = NULL;		/* keep compiler quiet */
+	RelOptInfo *rel;
+
+	*istrivial_tlist = true;	/* for now */
 
 	/* Guard against stack overflow due to overly complex setop nests */
 	check_stack_depth();
@@ -213,11 +245,9 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 	{
 		RangeTblRef *rtr = (RangeTblRef *) setOp;
 		RangeTblEntry *rte = root->simple_rte_array[rtr->rtindex];
+		SetOperationStmt *setops;
 		Query	   *subquery = rte->subquery;
 		PlannerInfo *subroot;
-		RelOptInfo *final_rel;
-		Path	   *subpath;
-		Path	   *path;
 		List	   *tlist;
 		bool		trivial_tlist;
 
@@ -229,11 +259,16 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 		/* plan_params should not be in use in current query level */
 		Assert(root->plan_params == NIL);
 
+		/*
+		 * Pass the set operation details to the subquery_planner to have it
+		 * consider generating Paths correctly ordered for the set operation.
+		 */
+		setops = castNode(SetOperationStmt, root->parse->setOperations);
+
 		/* Generate a subroot and Paths for the subquery */
-		subroot = rel->subroot = subquery_planner(root->glob, subquery,
-												  root,
-												  false,
-												  root->tuple_fraction);
+		subroot = rel->subroot = subquery_planner(root->glob, subquery, root,
+												  false, root->tuple_fraction,
+												  setops);
 
 		/*
 		 * It should not be possible for the primitive query to contain any
@@ -254,90 +289,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 
 		/* Return the fully-fledged tlist to caller, too */
 		*pTargetList = tlist;
-
-		/*
-		 * Mark rel with estimated output rows, width, etc.  Note that we have
-		 * to do this before generating outer-query paths, else
-		 * cost_subqueryscan is not happy.
-		 */
-		set_subquery_size_estimates(root, rel);
-
-		/*
-		 * Since we may want to add a partial path to this relation, we must
-		 * set its consider_parallel flag correctly.
-		 */
-		final_rel = fetch_upper_rel(subroot, UPPERREL_FINAL, NULL);
-		rel->consider_parallel = final_rel->consider_parallel;
-
-		/*
-		 * For the moment, we consider only a single Path for the subquery.
-		 * This should change soon (make it look more like
-		 * set_subquery_pathlist).
-		 */
-		subpath = get_cheapest_fractional_path(final_rel,
-											   root->tuple_fraction);
-
-		/*
-		 * Stick a SubqueryScanPath atop that.
-		 *
-		 * We don't bother to determine the subquery's output ordering since
-		 * it won't be reflected in the set-op result anyhow; so just label
-		 * the SubqueryScanPath with nil pathkeys.  (XXX that should change
-		 * soon too, likely.)
-		 */
-		path = (Path *) create_subqueryscan_path(root, rel, subpath,
-												 trivial_tlist,
-												 NIL, NULL);
-
-		add_path(rel, path);
-
-		/*
-		 * If we have a partial path for the child relation, we can use that
-		 * to build a partial path for this relation.  But there's no point in
-		 * considering any path but the cheapest.
-		 */
-		if (rel->consider_parallel && bms_is_empty(rel->lateral_relids) &&
-			final_rel->partial_pathlist != NIL)
-		{
-			Path	   *partial_subpath;
-			Path	   *partial_path;
-
-			partial_subpath = linitial(final_rel->partial_pathlist);
-			partial_path = (Path *)
-				create_subqueryscan_path(root, rel, partial_subpath,
-										 trivial_tlist,
-										 NIL, NULL);
-			add_partial_path(rel, partial_path);
-		}
-
-		/*
-		 * Estimate number of groups if caller wants it.  If the subquery used
-		 * grouping or aggregation, its output is probably mostly unique
-		 * anyway; otherwise do statistical estimation.
-		 *
-		 * XXX you don't really want to know about this: we do the estimation
-		 * using the subquery's original targetlist expressions, not the
-		 * subroot->processed_tlist which might seem more appropriate.  The
-		 * reason is that if the subquery is itself a setop, it may return a
-		 * processed_tlist containing "varno 0" Vars generated by
-		 * generate_append_tlist, and those would confuse estimate_num_groups
-		 * mightily.  We ought to get rid of the "varno 0" hack, but that
-		 * requires a redesign of the parsetree representation of setops, so
-		 * that there can be an RTE corresponding to each setop's output.
-		 */
-		if (pNumGroups)
-		{
-			if (subquery->groupClause || subquery->groupingSets ||
-				subquery->distinctClause ||
-				subroot->hasHavingQual || subquery->hasAggs)
-				*pNumGroups = subpath->rows;
-			else
-				*pNumGroups = estimate_num_groups(subroot,
-												  get_tlist_exprs(subquery->targetList, false),
-												  subpath->rows,
-												  NULL,
-												  NULL);
-		}
+		*istrivial_tlist = trivial_tlist;
 	}
 	else if (IsA(setOp, SetOperationStmt))
 	{
@@ -352,8 +304,6 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 			rel = generate_nonunion_paths(op, root,
 										  refnames_tlist,
 										  pTargetList);
-		if (pNumGroups)
-			*pNumGroups = rel->rows;
 
 		/*
 		 * If necessary, add a Result node to project the caller-requested
@@ -383,6 +333,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 												*pTargetList,
 												refnames_tlist,
 												&trivial_tlist);
+			*istrivial_tlist = trivial_tlist;
 			target = create_pathtarget(root, *pTargetList);
 
 			/* Apply projection to each path */
@@ -413,16 +364,16 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
 				lfirst(lc) = path;
 			}
 		}
+		postprocess_setop_rel(root, rel);
 	}
 	else
 	{
 		elog(ERROR, "unrecognized node type: %d",
 			 (int) nodeTag(setOp));
 		*pTargetList = NIL;
+		rel = NULL;				/* keep compiler quiet */
 	}
 
-	postprocess_setop_rel(root, rel);
-
 	return rel;
 }
 
@@ -441,7 +392,9 @@ generate_recursion_path(SetOperationStmt *setOp, PlannerInfo *root,
 	Path	   *lpath;
 	Path	   *rpath;
 	List	   *lpath_tlist;
+	bool		lpath_trivial_tlist;
 	List	   *rpath_tlist;
+	bool		rpath_trivial_tlist;
 	List	   *tlist;
 	List	   *groupList;
 	double		dNumGroups;
@@ -461,7 +414,10 @@ generate_recursion_path(SetOperationStmt *setOp, PlannerInfo *root,
 								  false, -1,
 								  refnames_tlist,
 								  &lpath_tlist,
-								  NULL);
+								  &lpath_trivial_tlist);
+	if (lrel->rtekind == RTE_SUBQUERY)
+		build_setop_child_paths(root, lrel, lpath_trivial_tlist, lpath_tlist,
+								NIL, NULL);
 	lpath = lrel->cheapest_total_path;
 	/* The right path will want to look at the left one ... */
 	root->non_recursive_path = lpath;
@@ -470,7 +426,10 @@ generate_recursion_path(SetOperationStmt *setOp, PlannerInfo *root,
 								  false, -1,
 								  refnames_tlist,
 								  &rpath_tlist,
-								  NULL);
+								  &rpath_trivial_tlist);
+	if (rrel->rtekind == RTE_SUBQUERY)
+		build_setop_child_paths(root, rrel, rpath_trivial_tlist, rpath_tlist,
+								NIL, NULL);
 	rpath = rrel->cheapest_total_path;
 	root->non_recursive_path = NULL;
 
@@ -532,6 +491,204 @@ generate_recursion_path(SetOperationStmt *setOp, PlannerInfo *root,
 	return result_rel;
 }
 
+/*
+ * build_setop_child_paths
+ *		Build paths for the set op child relation denoted by 'rel'.
+ *
+ * interesting_pathkeys: if not NIL, also include paths that suit these
+ * pathkeys, sorting any unsorted paths as required.
+ * *pNumGroups: if not NULL, we estimate the number of distinct groups
+ * in the result, and store it there.
+ */
+static void
+build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
+						bool trivial_tlist, List *child_tlist,
+						List *interesting_pathkeys, double *pNumGroups)
+{
+	RelOptInfo *final_rel;
+	List	   *setop_pathkeys = rel->subroot->setop_pathkeys;
+	ListCell   *lc;
+
+	/* it can't be a set op child rel if it's not a subquery */
+	Assert(rel->rtekind == RTE_SUBQUERY);
+
+	/* when sorting is needed, add child rel equivalences */
+	if (interesting_pathkeys != NIL)
+		add_setop_child_rel_equivalences(root,
+										 rel,
+										 child_tlist,
+										 interesting_pathkeys);
+
+	/*
+	 * Mark rel with estimated output rows, width, etc.  Note that we have to
+	 * do this before generating outer-query paths, else cost_subqueryscan is
+	 * not happy.
+	 */
+	set_subquery_size_estimates(root, rel);
+
+	/*
+	 * Since we may want to add a partial path to this relation, we must set
+	 * its consider_parallel flag correctly.
+	 */
+	final_rel = fetch_upper_rel(rel->subroot, UPPERREL_FINAL, NULL);
+	rel->consider_parallel = final_rel->consider_parallel;
+
+	/* Generate subquery scan paths for any interesting path in final_rel */
+	foreach(lc, final_rel->pathlist)
+	{
+		Path	   *subpath = (Path *) lfirst(lc);
+		List	   *pathkeys;
+		Path	   *cheapest_input_path = final_rel->cheapest_total_path;
+		bool		is_sorted;
+		int			presorted_keys;
+
+		/*
+		 * Include the cheapest path as-is so that the set operation can be
+		 * cheaply implemented using a method which does not require the input
+		 * to be sorted.
+		 */
+		if (subpath == cheapest_input_path)
+		{
+			/* Convert subpath's pathkeys to outer representation */
+			pathkeys = convert_subquery_pathkeys(root, rel, subpath->pathkeys,
+												 make_tlist_from_pathtarget(subpath->pathtarget));
+
+			/* Generate outer path using this subpath */
+			add_path(rel, (Path *) create_subqueryscan_path(root,
+															rel,
+															subpath,
+															trivial_tlist,
+															pathkeys,
+															NULL));
+		}
+
+		/* skip dealing with sorted paths if the setop doesn't need them */
+		if (interesting_pathkeys == NIL)
+			continue;
+
+		/*
+		 * Create paths to suit final sort order required for setop_pathkeys.
+		 * Here we'll sort the cheapest input path (if not sorted already) and
+		 * incremental sort any paths which are partially sorted.
+		 */
+		is_sorted = pathkeys_count_contained_in(setop_pathkeys,
+												subpath->pathkeys,
+												&presorted_keys);
+
+		if (!is_sorted)
+		{
+			double		limittuples = rel->subroot->limit_tuples;
+
+			/*
+			 * Try at least sorting the cheapest path and also try
+			 * incrementally sorting any path which is partially sorted
+			 * already (no need to deal with paths which have presorted keys
+			 * when incremental sort is disabled unless it's the cheapest
+			 * input path).
+			 */
+			if (subpath != cheapest_input_path &&
+				(presorted_keys == 0 || !enable_incremental_sort))
+				continue;
+
+			/*
+			 * We've no need to consider both a sort and incremental sort.
+			 * We'll just do a sort if there are no presorted keys and an
+			 * incremental sort when there are presorted keys.
+			 */
+			if (presorted_keys == 0 || !enable_incremental_sort)
+				subpath = (Path *) create_sort_path(rel->subroot,
+													final_rel,
+													subpath,
+													setop_pathkeys,
+													limittuples);
+			else
+				subpath = (Path *) create_incremental_sort_path(rel->subroot,
+																final_rel,
+																subpath,
+																setop_pathkeys,
+																presorted_keys,
+																limittuples);
+		}
+
+		/*
+		 * subpath is now sorted, so add it to the pathlist.  We already added
+		 * the cheapest_input_path above, so don't add it again unless we just
+		 * sorted it.
+		 */
+		if (subpath != cheapest_input_path)
+		{
+			/* Convert subpath's pathkeys to outer representation */
+			pathkeys = convert_subquery_pathkeys(root, rel, subpath->pathkeys,
+												 make_tlist_from_pathtarget(subpath->pathtarget));
+
+			/* Generate outer path using this subpath */
+			add_path(rel, (Path *) create_subqueryscan_path(root,
+															rel,
+															subpath,
+															trivial_tlist,
+															pathkeys,
+															NULL));
+		}
+	}
+
+	/* if consider_parallel is false, there should be no partial paths */
+	Assert(final_rel->consider_parallel ||
+		   final_rel->partial_pathlist == NIL);
+
+	/*
+	 * If we have a partial path for the child relation, we can use that to
+	 * build a partial path for this relation.  But there's no point in
+	 * considering any path but the cheapest.
+	 */
+	if (rel->consider_parallel && bms_is_empty(rel->lateral_relids) &&
+		final_rel->partial_pathlist != NIL)
+	{
+		Path	   *partial_subpath;
+		Path	   *partial_path;
+
+		partial_subpath = linitial(final_rel->partial_pathlist);
+		partial_path = (Path *)
+			create_subqueryscan_path(root, rel, partial_subpath,
+									 trivial_tlist,
+									 NIL, NULL);
+		add_partial_path(rel, partial_path);
+	}
+
+	postprocess_setop_rel(root, rel);
+
+	/*
+	 * Estimate number of groups if caller wants it.  If the subquery used
+	 * grouping or aggregation, its output is probably mostly unique anyway;
+	 * otherwise do statistical estimation.
+	 *
+	 * XXX you don't really want to know about this: we do the estimation
+	 * using the subquery's original targetlist expressions, not the
+	 * subroot->processed_tlist which might seem more appropriate.  The reason
+	 * is that if the subquery is itself a setop, it may return a
+	 * processed_tlist containing "varno 0" Vars generated by
+	 * generate_append_tlist, and those would confuse estimate_num_groups
+	 * mightily.  We ought to get rid of the "varno 0" hack, but that requires
+	 * a redesign of the parsetree representation of setops, so that there can
+	 * be an RTE corresponding to each setop's output.
+	 */
+	if (pNumGroups)
+	{
+		PlannerInfo *subroot = rel->subroot;
+		Query	   *subquery = subroot->parse;
+
+		if (subquery->groupClause || subquery->groupingSets ||
+			subquery->distinctClause || subroot->hasHavingQual ||
+			subquery->hasAggs)
+			*pNumGroups = rel->cheapest_total_path->rows;
+		else
+			*pNumGroups = estimate_num_groups(subroot,
+											  get_tlist_exprs(subquery->targetList, false),
+											  rel->cheapest_total_path->rows,
+											  NULL,
+											  NULL);
+	}
+}
+
 /*
  * Generate paths for a UNION or UNION ALL node
  */
@@ -542,41 +699,38 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 {
 	Relids		relids = NULL;
 	RelOptInfo *result_rel;
-	double		save_fraction = root->tuple_fraction;
 	ListCell   *lc;
-	List	   *pathlist = NIL;
+	ListCell   *lc2;
+	ListCell   *lc3;
+	List	   *cheapest_pathlist = NIL;
+	List	   *ordered_pathlist = NIL;
 	List	   *partial_pathlist = NIL;
 	bool		partial_paths_valid = true;
 	bool		consider_parallel = true;
 	List	   *rellist;
 	List	   *tlist_list;
+	List	   *trivial_tlist_list;
 	List	   *tlist;
-	Path	   *path;
-
-	/*
-	 * If plain UNION, tell children to fetch all tuples.
-	 *
-	 * Note: in UNION ALL, we pass the top-level tuple_fraction unmodified to
-	 * each arm of the UNION ALL.  One could make a case for reducing the
-	 * tuple fraction for later arms (discounting by the expected size of the
-	 * earlier arms' results) but it seems not worth the trouble. The normal
-	 * case where tuple_fraction isn't already zero is a LIMIT at top level,
-	 * and passing it down as-is is usually enough to get the desired result
-	 * of preferring fast-start plans.
-	 */
-	if (!op->all)
-		root->tuple_fraction = 0.0;
+	List	   *groupList = NIL;
+	Path	   *apath;
+	Path	   *gpath = NULL;
+	bool		try_sorted = false;
+	List	   *union_pathkeys = NIL;
 
 	/*
 	 * If any of my children are identical UNION nodes (same op, all-flag, and
 	 * colTypes) then they can be merged into this node so that we generate
-	 * only one Append and unique-ification for the lot.  Recurse to find such
-	 * nodes and compute their children's paths.
+	 * only one Append/MergeAppend and unique-ification for the lot.  Recurse
+	 * to find such nodes.
 	 */
-	rellist = plan_union_children(root, op, refnames_tlist, &tlist_list);
+	rellist = plan_union_children(root,
+								  op,
+								  refnames_tlist,
+								  &tlist_list,
+								  &trivial_tlist_list);
 
 	/*
-	 * Generate tlist for Append plan node.
+	 * Generate tlist for Append/MergeAppend plan node.
 	 *
 	 * The tlist for an Append plan isn't important as far as the Append is
 	 * concerned, but we must make it look real anyway for the benefit of the
@@ -584,15 +738,71 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	 */
 	tlist = generate_append_tlist(op->colTypes, op->colCollations, false,
 								  tlist_list, refnames_tlist);
-
 	*pTargetList = tlist;
 
+	/* For UNIONs (not UNION ALL), try sorting, if sorting is possible */
+	if (!op->all)
+	{
+		/* Identify the grouping semantics */
+		groupList = generate_setop_grouplist(op, tlist);
+
+		if (grouping_is_sortable(op->groupClauses))
+		{
+			try_sorted = true;
+			/* Determine the pathkeys for sorting by the whole target list */
+			union_pathkeys = make_pathkeys_for_sortclauses(root, groupList,
+														   tlist);
+
+			root->query_pathkeys = union_pathkeys;
+		}
+	}
+
+	/*
+	 * Now that we've got the append target list, we can build the union child
+	 * paths.
+	 */
+	forthree(lc, rellist, lc2, trivial_tlist_list, lc3, tlist_list)
+	{
+		RelOptInfo *rel = lfirst(lc);
+		bool		trivial_tlist = lfirst_int(lc2);
+		List	   *child_tlist = lfirst_node(List, lc3);
+
+		/* only build paths for the union children */
+		if (rel->rtekind == RTE_SUBQUERY)
+			build_setop_child_paths(root, rel, trivial_tlist, child_tlist,
+									union_pathkeys, NULL);
+	}
+
 	/* Build path lists and relid set. */
 	foreach(lc, rellist)
 	{
 		RelOptInfo *rel = lfirst(lc);
+		Path	   *ordered_path;
 
-		pathlist = lappend(pathlist, rel->cheapest_total_path);
+		cheapest_pathlist = lappend(cheapest_pathlist,
+									rel->cheapest_total_path);
+
+		if (try_sorted)
+		{
+			ordered_path = get_cheapest_path_for_pathkeys(rel->pathlist,
+														  union_pathkeys,
+														  NULL,
+														  TOTAL_COST,
+														  false);
+
+			if (ordered_path != NULL)
+				ordered_pathlist = lappend(ordered_pathlist, ordered_path);
+			else
+			{
+				/*
+				 * If we can't find a sorted path, just give up trying to
+				 * generate a list of correctly sorted child paths.  This can
+				 * happen when type coercion was added to the targetlist due
+				 * to mismatching types from the union children.
+				 */
+				try_sorted = false;
+			}
+		}
 
 		if (consider_parallel)
 		{
@@ -615,28 +825,21 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	result_rel = fetch_upper_rel(root, UPPERREL_SETOP, relids);
 	result_rel->reltarget = create_pathtarget(root, tlist);
 	result_rel->consider_parallel = consider_parallel;
+	result_rel->consider_startup = (root->tuple_fraction > 0);
 
 	/*
-	 * Append the child results together.
-	 */
-	path = (Path *) create_append_path(root, result_rel, pathlist, NIL,
-									   NIL, NULL, 0, false, -1);
-
-	/*
-	 * For UNION ALL, we just need the Append path.  For UNION, need to add
-	 * node(s) to remove duplicates.
+	 * Append the child results together using the cheapest paths from each
+	 * union child.
 	 */
-	if (!op->all)
-		path = make_union_unique(op, path, tlist, root);
-
-	add_path(result_rel, path);
+	apath = (Path *) create_append_path(root, result_rel, cheapest_pathlist,
+										NIL, NIL, NULL, 0, false, -1);
 
 	/*
 	 * Estimate number of groups.  For now we just assume the output is unique
 	 * --- this is certainly true for the UNION case, and we want worst-case
 	 * estimates anyway.
 	 */
-	result_rel->rows = path->rows;
+	result_rel->rows = apath->rows;
 
 	/*
 	 * Now consider doing the same thing using the partial paths plus Append
@@ -644,7 +847,7 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	 */
 	if (partial_paths_valid)
 	{
-		Path	   *ppath;
+		Path	   *papath;
 		int			parallel_workers = 0;
 
 		/* Find the highest number of workers requested for any subpath. */
@@ -673,21 +876,137 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 		}
 		Assert(parallel_workers > 0);
 
-		ppath = (Path *)
+		papath = (Path *)
 			create_append_path(root, result_rel, NIL, partial_pathlist,
-							   NIL, NULL,
-							   parallel_workers, enable_parallel_append,
-							   -1);
-		ppath = (Path *)
-			create_gather_path(root, result_rel, ppath,
+							   NIL, NULL, parallel_workers,
+							   enable_parallel_append, -1);
+		gpath = (Path *)
+			create_gather_path(root, result_rel, papath,
 							   result_rel->reltarget, NULL, NULL);
-		if (!op->all)
-			ppath = make_union_unique(op, ppath, tlist, root);
-		add_path(result_rel, ppath);
 	}
 
-	/* Undo effects of possibly forcing tuple_fraction to 0 */
-	root->tuple_fraction = save_fraction;
+	if (!op->all)
+	{
+		double		dNumGroups;
+		bool		can_sort = grouping_is_sortable(groupList);
+		bool		can_hash = grouping_is_hashable(groupList);
+
+		/*
+		 * XXX for the moment, take the number of distinct groups as equal to
+		 * the total input size, i.e., the worst case.  This is too
+		 * conservative, but it's not clear how to get a decent estimate of
+		 * the true size.  One should note as well the propensity of novices
+		 * to write UNION rather than UNION ALL even when they don't expect
+		 * any duplicates...
+		 */
+		dNumGroups = apath->rows;
+
+		if (can_hash)
+		{
+			Path	   *path;
+
+			/*
+			 * Try a hash aggregate plan on 'apath'.  This is the cheapest
+			 * available path containing each append child.
+			 */
+			path = (Path *) create_agg_path(root,
+											result_rel,
+											apath,
+											create_pathtarget(root, tlist),
+											AGG_HASHED,
+											AGGSPLIT_SIMPLE,
+											groupList,
+											NIL,
+											NULL,
+											dNumGroups);
+			add_path(result_rel, path);
+
+			/* Try hash aggregate on the Gather path, if valid */
+			if (gpath != NULL)
+			{
+				/* Hashed aggregate plan --- no sort needed */
+				path = (Path *) create_agg_path(root,
+												result_rel,
+												gpath,
+												create_pathtarget(root, tlist),
+												AGG_HASHED,
+												AGGSPLIT_SIMPLE,
+												groupList,
+												NIL,
+												NULL,
+												dNumGroups);
+				add_path(result_rel, path);
+			}
+		}
+
+		if (can_sort)
+		{
+			Path	   *path = apath;
+
+			/* Try Sort -> Unique on the Append path */
+			if (groupList != NIL)
+				path = (Path *) create_sort_path(root, result_rel, path,
+												 make_pathkeys_for_sortclauses(root, groupList, tlist),
+												 -1.0);
+
+			path = (Path *) create_upper_unique_path(root,
+													 result_rel,
+													 path,
+													 list_length(path->pathkeys),
+													 dNumGroups);
+
+			add_path(result_rel, path);
+
+			/* Try Sort -> Unique on the Gather path, if set */
+			if (gpath != NULL)
+			{
+				path = gpath;
+
+				path = (Path *) create_sort_path(root, result_rel, path,
+												 make_pathkeys_for_sortclauses(root, groupList, tlist),
+												 -1.0);
+
+				path = (Path *) create_upper_unique_path(root,
+														 result_rel,
+														 path,
+														 list_length(path->pathkeys),
+														 dNumGroups);
+				add_path(result_rel, path);
+			}
+		}
+
+		/*
+		 * Try making a MergeAppend path if we managed to find a path with the
+		 * correct pathkeys in each union child query.
+		 */
+		if (try_sorted && groupList != NIL)
+		{
+			Path	   *path;
+
+			path = (Path *) create_merge_append_path(root,
+													 result_rel,
+													 ordered_pathlist,
+													 union_pathkeys,
+													 NULL);
+
+			/* and make the MergeAppend unique */
+			path = (Path *) create_upper_unique_path(root,
+													 result_rel,
+													 path,
+													 list_length(tlist),
+													 dNumGroups);
+
+			add_path(result_rel, path);
+		}
+	}
+	else
+	{
+		/* UNION ALL */
+		add_path(result_rel, apath);
+
+		if (gpath != NULL)
+			add_path(result_rel, gpath);
+	}
 
 	return result_rel;
 }
@@ -713,6 +1032,8 @@ generate_nonunion_paths(SetOperationStmt *op, PlannerInfo *root,
 			   *tlist,
 			   *groupList,
 			   *pathlist;
+	bool		lpath_trivial_tlist,
+				rpath_trivial_tlist;
 	double		dLeftGroups,
 				dRightGroups,
 				dNumGroups,
@@ -732,14 +1053,26 @@ generate_nonunion_paths(SetOperationStmt *op, PlannerInfo *root,
 								  false, 0,
 								  refnames_tlist,
 								  &lpath_tlist,
-								  &dLeftGroups);
+								  &lpath_trivial_tlist);
+	if (lrel->rtekind == RTE_SUBQUERY)
+		build_setop_child_paths(root, lrel, lpath_trivial_tlist, lpath_tlist,
+								NIL, &dLeftGroups);
+	else
+		dLeftGroups = lrel->rows;
+
 	lpath = lrel->cheapest_total_path;
 	rrel = recurse_set_operations(op->rarg, root,
 								  op->colTypes, op->colCollations,
 								  false, 1,
 								  refnames_tlist,
 								  &rpath_tlist,
-								  &dRightGroups);
+								  &rpath_trivial_tlist);
+	if (rrel->rtekind == RTE_SUBQUERY)
+		build_setop_child_paths(root, rrel, rpath_trivial_tlist, rpath_tlist,
+								NIL, &dRightGroups);
+	else
+		dRightGroups = rrel->rows;
+
 	rpath = rrel->cheapest_total_path;
 
 	/* Undo effects of forcing tuple_fraction to 0 */
@@ -876,13 +1209,16 @@ static List *
 plan_union_children(PlannerInfo *root,
 					SetOperationStmt *top_union,
 					List *refnames_tlist,
-					List **tlist_list)
+					List **tlist_list,
+					List **istrivial_tlist)
 {
 	List	   *pending_rels = list_make1(top_union);
 	List	   *result = NIL;
 	List	   *child_tlist;
+	bool		trivial_tlist;
 
 	*tlist_list = NIL;
+	*istrivial_tlist = NIL;
 
 	while (pending_rels != NIL)
 	{
@@ -921,75 +1257,14 @@ plan_union_children(PlannerInfo *root,
 														false, -1,
 														refnames_tlist,
 														&child_tlist,
-														NULL));
+														&trivial_tlist));
 		*tlist_list = lappend(*tlist_list, child_tlist);
+		*istrivial_tlist = lappend_int(*istrivial_tlist, trivial_tlist);
 	}
 
 	return result;
 }
 
-/*
- * Add nodes to the given path tree to unique-ify the result of a UNION.
- */
-static Path *
-make_union_unique(SetOperationStmt *op, Path *path, List *tlist,
-				  PlannerInfo *root)
-{
-	RelOptInfo *result_rel = fetch_upper_rel(root, UPPERREL_SETOP, NULL);
-	List	   *groupList;
-	double		dNumGroups;
-
-	/* Identify the grouping semantics */
-	groupList = generate_setop_grouplist(op, tlist);
-
-	/*
-	 * XXX for the moment, take the number of distinct groups as equal to the
-	 * total input size, ie, the worst case.  This is too conservative, but
-	 * it's not clear how to get a decent estimate of the true size.  One
-	 * should note as well the propensity of novices to write UNION rather
-	 * than UNION ALL even when they don't expect any duplicates...
-	 */
-	dNumGroups = path->rows;
-
-	/* Decide whether to hash or sort */
-	if (choose_hashed_setop(root, groupList, path,
-							dNumGroups, dNumGroups,
-							"UNION"))
-	{
-		/* Hashed aggregate plan --- no sort needed */
-		path = (Path *) create_agg_path(root,
-										result_rel,
-										path,
-										create_pathtarget(root, tlist),
-										AGG_HASHED,
-										AGGSPLIT_SIMPLE,
-										groupList,
-										NIL,
-										NULL,
-										dNumGroups);
-	}
-	else
-	{
-		/* Sort and Unique */
-		if (groupList)
-			path = (Path *)
-				create_sort_path(root,
-								 result_rel,
-								 path,
-								 make_pathkeys_for_sortclauses(root,
-															   groupList,
-															   tlist),
-								 -1.0);
-		path = (Path *) create_upper_unique_path(root,
-												 result_rel,
-												 path,
-												 list_length(path->pathkeys),
-												 dNumGroups);
-	}
-
-	return path;
-}
-
 /*
  * postprocess_setop_rel - perform steps required after adding paths
  */
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 40ea19e6f1..28fed9d87f 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1890,7 +1890,8 @@ transformSetOperationStmt(ParseState *pstate, SelectStmt *stmt)
 	 * For now, we don't support resjunk sort clauses on the output of a
 	 * setOperation tree --- you can only use the SQL92-spec options of
 	 * selecting an output column by name or number.  Enforce by checking that
-	 * transformSortClause doesn't add any items to tlist.
+	 * transformSortClause doesn't add any items to tlist.  Note, if changing
+	 * this, add_setop_child_rel_equivalences() will need to be updated.
 	 */
 	tllen = list_length(qry->targetList);
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6ec81637c1..14ef296ab7 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -400,6 +400,8 @@ struct PlannerInfo
 	List	   *distinct_pathkeys;
 	/* sortClause pathkeys, if any */
 	List	   *sort_pathkeys;
+	/* set operator pathkeys, if any */
+	List	   *setop_pathkeys;
 
 	/* Canonicalised partition schemes used in the query. */
 	List	   *part_schemes pg_node_attr(read_write_ignore);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 5f500a1c69..914d9bdef5 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -173,6 +173,10 @@ extern void add_child_join_rel_equivalences(PlannerInfo *root,
 											AppendRelInfo **appinfos,
 											RelOptInfo *parent_joinrel,
 											RelOptInfo *child_joinrel);
+extern void add_setop_child_rel_equivalences(PlannerInfo *root,
+											 RelOptInfo *child_rel,
+											 List *child_tlist,
+											 List *setop_pathkeys);
 extern List *generate_implied_equalities_for_column(PlannerInfo *root,
 													RelOptInfo *rel,
 													ec_matches_callback_type callback,
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index e1d79ffdf3..5aeff21b96 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -44,7 +44,8 @@ extern PlannedStmt *standard_planner(Query *parse, const char *query_string,
 
 extern PlannerInfo *subquery_planner(PlannerGlobal *glob, Query *parse,
 									 PlannerInfo *parent_root,
-									 bool hasRecursion, double tuple_fraction);
+									 bool hasRecursion, double tuple_fraction,
+									 SetOperationStmt *setops);
 
 extern RowMarkType select_rowmark_type(RangeTblEntry *rte,
 									   LockClauseStrength strength);
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 8e00716dc8..a52dec285d 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -53,6 +53,6 @@ extern void preprocess_aggrefs(PlannerInfo *root, Node *clause);
  * prototypes for prepunion.c
  */
 extern RelOptInfo *plan_set_operations(PlannerInfo *root);
-
+extern bool set_operation_ordered_results_useful(SetOperationStmt *setop);
 
 #endif							/* PREP_H */
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 2de8924b52..7d59fb4431 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1396,6 +1396,7 @@ SELECT x FROM test3cs WHERE x ~ 'a';
  abc
 (1 row)
 
+SET enable_hashagg TO off;
 SELECT x FROM test1cs UNION SELECT x FROM test2cs ORDER BY x;
   x  
 -----
@@ -1448,6 +1449,7 @@ SELECT DISTINCT x FROM test3cs ORDER BY x;
  ghi
 (4 rows)
 
+RESET enable_hashagg;
 SELECT count(DISTINCT x) FROM test3cs;
  count 
 -------
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index 7fdb685313..5fd54a10b1 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1472,14 +1472,19 @@ explain (costs off) select * from t union select * from t order by 1,3;
    Sort Key: t.a, t.c
    Presorted Key: t.a
    ->  Unique
-         ->  Sort
+         ->  Merge Append
                Sort Key: t.a, t.b, t.c
-               ->  Gather
+               ->  Gather Merge
                      Workers Planned: 2
-                     ->  Parallel Append
+                     ->  Sort
+                           Sort Key: t.a, t.b, t.c
                            ->  Parallel Seq Scan on t
+               ->  Gather Merge
+                     Workers Planned: 2
+                     ->  Sort
+                           Sort Key: t_1.a, t_1.b, t_1.c
                            ->  Parallel Seq Scan on t t_1
-(11 rows)
+(16 rows)
 
 -- Full sort, not just incremental sort can be pushed below a gather merge path
 -- by generate_useful_gather_paths.
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 882017afc9..0fd0e1c38b 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -412,16 +412,17 @@ set enable_hashagg to off;
 explain (costs off)
 select count(*) from
   ( select unique1 from tenk1 union select fivethous from tenk1 ) ss;
-                              QUERY PLAN                              
-----------------------------------------------------------------------
+                           QUERY PLAN                           
+----------------------------------------------------------------
  Aggregate
    ->  Unique
-         ->  Sort
+         ->  Merge Append
                Sort Key: tenk1.unique1
-               ->  Append
-                     ->  Index Only Scan using tenk1_unique1 on tenk1
+               ->  Index Only Scan using tenk1_unique1 on tenk1
+               ->  Sort
+                     Sort Key: tenk1_1.fivethous
                      ->  Seq Scan on tenk1 tenk1_1
-(7 rows)
+(8 rows)
 
 select count(*) from
   ( select unique1 from tenk1 union select fivethous from tenk1 ) ss;
@@ -814,6 +815,19 @@ select x from (values (row(1, 2)), (row(1, 3))) _(x) except select x from (value
  (1,3)
 (1 row)
 
+-- non-sortable type
+-- Ensure we get a HashAggregate plan.  Keep enable_hashagg=off to ensure
+-- there's no chance of a sort.
+explain (costs off) select '123'::xid union select '123'::xid;
+        QUERY PLAN         
+---------------------------
+ HashAggregate
+   Group Key: ('123'::xid)
+   ->  Append
+         ->  Result
+         ->  Result
+(5 rows)
+
 reset enable_hashagg;
 --
 -- Mixed types
@@ -950,16 +964,9 @@ select except select;
 -- check hashed implementation
 set enable_hashagg = true;
 set enable_sort = false;
-explain (costs off)
-select from generate_series(1,5) union select from generate_series(1,3);
-                           QUERY PLAN                           
-----------------------------------------------------------------
- HashAggregate
-   ->  Append
-         ->  Function Scan on generate_series
-         ->  Function Scan on generate_series generate_series_1
-(4 rows)
-
+-- We've no way to check hashed UNION as the empty pathkeys in the Append are
+-- fine to make use of Unique, which is cheaper than HashAggregate and we've
+-- no means to disable Unique.
 explain (costs off)
 select from generate_series(1,5) intersect select from generate_series(1,3);
                               QUERY PLAN                              
@@ -972,10 +979,6 @@ select from generate_series(1,5) intersect select from generate_series(1,3);
                ->  Function Scan on generate_series generate_series_1
 (6 rows)
 
-select from generate_series(1,5) union select from generate_series(1,3);
---
-(1 row)
-
 select from generate_series(1,5) union all select from generate_series(1,3);
 --
 (8 rows)
@@ -1045,6 +1048,20 @@ select from generate_series(1,5) except all select from generate_series(1,3);
 --
 (2 rows)
 
+-- Try a variation of the above but with a CTE which contains a column, again
+-- with an empty final select list.
+-- Ensure we get the expected 1 row with 0 columns
+with cte as materialized (select s from generate_series(1,5) s)
+select from cte union select from cte;
+--
+(1 row)
+
+-- Ensure we get the same result as the above.
+with cte as not materialized (select s from generate_series(1,5) s)
+select from cte union select from cte;
+--
+(1 row)
+
 reset enable_hashagg;
 reset enable_sort;
 --
@@ -1081,6 +1098,7 @@ INSERT INTO t2 VALUES ('ab'), ('xy');
 set enable_seqscan = off;
 set enable_indexscan = on;
 set enable_bitmapscan = off;
+set enable_sort = off;
 explain (costs off)
  SELECT * FROM
  (SELECT a || b AS ab FROM t1
@@ -1162,6 +1180,7 @@ explain (costs off)
 reset enable_seqscan;
 reset enable_indexscan;
 reset enable_bitmapscan;
+reset enable_sort;
 -- This simpler variant of the above test has been observed to fail differently
 create table events (event_id int primary key);
 create table other_events (event_id int primary key);
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 03837de846..80f28a97d7 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -555,6 +555,7 @@ SELECT x FROM test3cs WHERE x LIKE 'a%';
 SELECT x FROM test3cs WHERE x ILIKE 'a%';
 SELECT x FROM test3cs WHERE x SIMILAR TO 'a%';
 SELECT x FROM test3cs WHERE x ~ 'a';
+SET enable_hashagg TO off;
 SELECT x FROM test1cs UNION SELECT x FROM test2cs ORDER BY x;
 SELECT x FROM test2cs UNION SELECT x FROM test1cs ORDER BY x;
 SELECT x FROM test1cs INTERSECT SELECT x FROM test2cs;
@@ -562,6 +563,7 @@ SELECT x FROM test2cs INTERSECT SELECT x FROM test1cs;
 SELECT x FROM test1cs EXCEPT SELECT x FROM test2cs;
 SELECT x FROM test2cs EXCEPT SELECT x FROM test1cs;
 SELECT DISTINCT x FROM test3cs ORDER BY x;
+RESET enable_hashagg;
 SELECT count(DISTINCT x) FROM test3cs;
 SELECT x, count(*) FROM test3cs GROUP BY x ORDER BY x;
 SELECT x, row_number() OVER (ORDER BY x), rank() OVER (ORDER BY x) FROM test3cs ORDER BY x;
diff --git a/src/test/regress/sql/union.sql b/src/test/regress/sql/union.sql
index d160db5458..f8826514e4 100644
--- a/src/test/regress/sql/union.sql
+++ b/src/test/regress/sql/union.sql
@@ -244,6 +244,12 @@ explain (costs off)
 select x from (values (row(1, 2)), (row(1, 3))) _(x) except select x from (values (row(1, 2)), (row(1, 4))) _(x);
 select x from (values (row(1, 2)), (row(1, 3))) _(x) except select x from (values (row(1, 2)), (row(1, 4))) _(x);
 
+-- non-sortable type
+
+-- Ensure we get a HashAggregate plan.  Keep enable_hashagg=off to ensure
+-- there's no chance of a sort.
+explain (costs off) select '123'::xid union select '123'::xid;
+
 reset enable_hashagg;
 
 --
@@ -302,12 +308,12 @@ select except select;
 set enable_hashagg = true;
 set enable_sort = false;
 
-explain (costs off)
-select from generate_series(1,5) union select from generate_series(1,3);
+-- We've no way to check hashed UNION as the empty pathkeys in the Append are
+-- fine to make use of Unique, which is cheaper than HashAggregate and we've
+-- no means to disable Unique.
 explain (costs off)
 select from generate_series(1,5) intersect select from generate_series(1,3);
 
-select from generate_series(1,5) union select from generate_series(1,3);
 select from generate_series(1,5) union all select from generate_series(1,3);
 select from generate_series(1,5) intersect select from generate_series(1,3);
 select from generate_series(1,5) intersect all select from generate_series(1,3);
@@ -330,6 +336,17 @@ select from generate_series(1,5) intersect all select from generate_series(1,3);
 select from generate_series(1,5) except select from generate_series(1,3);
 select from generate_series(1,5) except all select from generate_series(1,3);
 
+-- Try a variation of the above but with a CTE which contains a column, again
+-- with an empty final select list.
+
+-- Ensure we get the expected 1 row with 0 columns
+with cte as materialized (select s from generate_series(1,5) s)
+select from cte union select from cte;
+
+-- Ensure we get the same result as the above.
+with cte as not materialized (select s from generate_series(1,5) s)
+select from cte union select from cte;
+
 reset enable_hashagg;
 reset enable_sort;
 
@@ -361,6 +378,7 @@ INSERT INTO t2 VALUES ('ab'), ('xy');
 set enable_seqscan = off;
 set enable_indexscan = on;
 set enable_bitmapscan = off;
+set enable_sort = off;
 
 explain (costs off)
  SELECT * FROM
@@ -407,6 +425,7 @@ explain (costs off)
 reset enable_seqscan;
 reset enable_indexscan;
 reset enable_bitmapscan;
+reset enable_sort;
 
 -- This simpler variant of the above test has been observed to fail differently
 
-- 
2.34.1

Robert Haas

robertmhaas@gmail.com

over 1 year ago

In reply to: David Rowley (#4)

Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

On Tue, May 21, 2024 at 8:44 AM David Rowley <dgrowleyml@gmail.com> wrote:

> Thanks for having a look. I was planning to have the commit message
> as per attached. I'd only split the patch for ease of review per
> request of Tom. I should have mentioned that here.
>
> I would adjust the exact wording in the final paragraph as required
> depending on what plan materialises.
>
> This also fixes up the comment stuff that Heikki mentioned.

The consensus on pgsql-release was to unrevert this patch and commit
the fix now, rather than waiting for the next beta. However, the
consensus was also to push the un-revert as a separate commit from the
bug fix, rather than together as suggested by Álvaro. Since time is
short due to the impending release and it's very late where you are,
I've taken care of this. Hope that's OK.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

David Rowley

dgrowleyml@gmail.com

over 1 year ago

In reply to: Robert Haas (#5)

Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

On Wed, 22 May 2024 at 05:36, Robert Haas <robertmhaas@gmail.com> wrote:

> The consensus on pgsql-release was to unrevert this patch and commit
> the fix now, rather than waiting for the next beta. However, the
> consensus was also to push the un-revert as a separate commit from the
> bug fix, rather than together as suggested by Álvaro. Since time is
> short due to the impending release and it's very late where you are,
> I've taken care of this. Hope that's OK.

Thanks for handling that. It's much appreciated.

David

David Rowley

dgrowleyml@gmail.com

over 1 year ago

In reply to: Heikki Linnakangas (#2)

Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

(Thanks for your review. I'm sorry I didn't have time and energy to
respond properly until now)

On Tue, 21 May 2024 at 23:48, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> BTW, could the same machinery be used for INTERSECT as well? There was a
> brief mention of that in the original thread, but I didn't understand
> the details. Not for v17, but I'm curious. I was wondering if
> build_setop_child_paths() should be named build_union_child_paths(),
> since it's only used with UNIONs, but I guess it could be used as is for
> INTERSECT too.

I'd previously thought about that, but when I thought about it I'd
considered getting rid of the SetOp Intersect and replacing with a
join. To do that my conclusion was that we'd first need to improve
joins using IS NOT DISTINCT FROM, as that's the behaviour we need for
correct setop NULL handling. However, on relooking, I see that we
could still use SetOp Intersect with the flags injected into the
targetlist and get sorted results to it via Merge Append rather than
Append. That might require better Const handling than what's in the
patch today due to the 1/0 flag that gets added to the subquery tlist.
I was unsure how much trouble to go to for INTERSECT. I spent about 7
years in a job writing queries and don't recall ever feeling the need
to use INTERSECT. I did use EXCEPT, however... like at least once.
I'll probably circle back to it one day. People maybe don't use it
because it's so terribly optimised.
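
As a rough illustration of the NULL-handling point above (a toy model in Python, not planner code): set operations treat two NULLs as the same value, which is what IS NOT DISTINCT FROM gives you, whereas a join on plain "=" never matches them. None stands in for SQL NULL, and the helper names sql_equal, not_distinct and intersect_via_join are made up for this sketch.

```python
def sql_equal(a, b):
    """SQL '=': any comparison with NULL yields NULL (unknown), modelled as None."""
    if a is None or b is None:
        return None
    return a == b


def not_distinct(a, b):
    """SQL 'IS NOT DISTINCT FROM': two NULLs compare as equal, never unknown."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def intersect_via_join(left, right, match):
    """Emulate INTERSECT as a semi-join on `match` followed by duplicate removal."""
    out = []
    for l in left:
        # Only a definite True counts as a join match; unknown (None) does not.
        if any(match(l, r) is True for r in right):
            # De-duplicate with set-operation semantics (NULLs are equal).
            if not any(not_distinct(l, o) for o in out):
                out.append(l)
    return out


left = [1, 2, None, 2]
right = [2, None, 3]

print(intersect_via_join(left, right, sql_equal))     # -> [2]
print(intersect_via_join(left, right, not_distinct))  # -> [2, None]
```

The first result loses the NULL row that INTERSECT is expected to return, which is why a join-based replacement would need the IS NOT DISTINCT FROM behaviour.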

> # Testing
>
> postgres=# begin; create table foo as select i from generate_series(1,
> 1000000) i; create index on foo (i); commit;
> BEGIN
> SELECT 1000000
> CREATE INDEX
> COMMIT
> postgres=# set enable_seqscan=off;
> SET
> postgres=# explain (select 1 as i union select i from foo) order by i;
>                                               QUERY PLAN
> ------------------------------------------------------------------------------------------------------
>  Unique  (cost=144370.89..149370.89 rows=1000001 width=4)
>    ->  Sort  (cost=144370.89..146870.89 rows=1000001 width=4)
>          Sort Key: (1)
>          ->  Append  (cost=0.00..31038.44 rows=1000001 width=4)
>                ->  Result  (cost=0.00..0.01 rows=1 width=4)
>                ->  Index Only Scan using foo_i_idx on foo  (cost=0.42..26038.42 rows=1000000 width=4)
> (6 rows)
>
> I'm disappointed it couldn't produce a MergeAppend plan. If you replace
> the "union" with "union all" you do get a MergeAppend.
>
> Some more cases where I hoped for a MergeAppend:

I've not looked again in detail, but there was some discussion along
these lines in [1]. I think the problem is down to how we remove
redundant PathKeys when the EquivalenceClass has a Const. There can
only be 1 value, so no need for a PathKey to represent that. The
problem with that comes with lack of equivalence visibility through
subqueries. The following demonstrates:

create table ab(a int, b int, primary key(a,b));
set enable_seqscan=0;
set enable_bitmapscan=0;

explain (costs off) select * from (select * from ab where a=1 order by
b) order by a,b;
                QUERY PLAN
-------------------------------------------
 Sort
   Sort Key: ab.a, ab.b
   ->  Index Only Scan using ab_pkey on ab
         Index Cond: (a = 1)
(4 rows)

explain (costs off) select * from (select * from ab where a=1 order by
b) order by b;
             QUERY PLAN
-------------------------------------
 Index Only Scan using ab_pkey on ab
   Index Cond: (a = 1)
(2 rows)

Because the subquery only publishes that it's ordered by "b", the
outer query thinks it needs to sort on "a,b". That's a wasted effort
since the subquery has an equivalence class for "a" with a constant.
The outer query doesn't know that.
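
A toy model of the ab example above, purely as a sketch and not how the planner is written: a sort key on a column known to be constant is redundant and gets dropped, but that knowledge only exists at the query level where the constant is visible. The functions useful_pathkeys and needs_sort are hypothetical names for this illustration, and column names stand in for PathKeys.

```python
def useful_pathkeys(order_by, const_columns):
    """Drop sort columns known to be constant; they cannot affect row order."""
    return [col for col in order_by if col not in const_columns]


def needs_sort(required, provided):
    """A sort is needed unless `required` is a prefix of the `provided` ordering."""
    return provided[:len(required)] != required


# Inside the subquery, "WHERE a = 1" makes column a constant, so the
# subquery only publishes that its output is ordered by b.
subquery_consts = {"a"}
published = useful_pathkeys(["b"], subquery_consts)

# The outer query cannot see that a is constant, so "ORDER BY a, b"
# keeps both keys and does not match the published ordering.
outer_required = useful_pathkeys(["a", "b"], set())
print(needs_sort(outer_required, published))        # -> True

# "ORDER BY b" in the outer query matches what the subquery published.
print(needs_sort(useful_pathkeys(["b"], set()), published))  # -> False

# If the outer level could see the subquery's constant, the key on a
# would be dropped there too and the sort would go away.
outer_informed = useful_pathkeys(["a", "b"], subquery_consts)
print(needs_sort(outer_informed, published))        # -> False
```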

> postgres=# explain (select i, 'foo' from foo union select i, 'foo' from
> foo) order by 1;
>                                                  QUERY PLAN
> -------------------------------------------------------------------------------------------------------------
>  Unique  (cost=380767.54..395767.54 rows=2000000 width=36)
>    ->  Sort  (cost=380767.54..385767.54 rows=2000000 width=36)
>          Sort Key: foo.i, ('foo'::text)
>          ->  Append  (cost=0.42..62076.85 rows=2000000 width=36)
>                ->  Index Only Scan using foo_i_idx on foo  (cost=0.42..26038.42 rows=1000000 width=36)
>                ->  Index Only Scan using foo_i_idx on foo foo_1  (cost=0.42..26038.42 rows=1000000 width=36)
> (6 rows)
>
> postgres=# explain (select 'foo', i from foo union select 'bar', i from
> foo) order by 1;
>                                                  QUERY PLAN
> -------------------------------------------------------------------------------------------------------------
>  Unique  (cost=380767.54..395767.54 rows=2000000 width=36)
>    ->  Sort  (cost=380767.54..385767.54 rows=2000000 width=36)
>          Sort Key: ('foo'::text), foo.i
>          ->  Append  (cost=0.42..62076.85 rows=2000000 width=36)
>                ->  Index Only Scan using foo_i_idx on foo  (cost=0.42..26038.42 rows=1000000 width=36)
>                ->  Index Only Scan using foo_i_idx on foo foo_1  (cost=0.42..26038.42 rows=1000000 width=36)
> (6 rows)

This isn't great. I think it's for the same reason as mentioned above.
I didn't test, but I think the patch in [1] should fix it. I need to
spend more time on it before proposing it for v18. It adds some
possibly expensive lookups and requires recursively searching
PathKeys. It's quite complex and needs more study.

> The following two queries are the same from the user's point of view,
> but one is written using WITH:
>
> postgres=# explain (select i from foo union (select 1::int order by 1)
> union select i from foo) order by 1;
>                                                  QUERY PLAN
> ------------------------------------------------------------------------------------------------------------
>  Unique  (cost=326083.66..336083.67 rows=2000001 width=4)
>    ->  Sort  (cost=326083.66..331083.67 rows=2000001 width=4)
>          Sort Key: foo.i
>          ->  Append  (cost=0.42..62076.87 rows=2000001 width=4)
>                ->  Index Only Scan using foo_i_idx on foo  (cost=0.42..26038.42 rows=1000000 width=4)
>                ->  Result  (cost=0.00..0.01 rows=1 width=4)
>                ->  Index Only Scan using foo_i_idx on foo foo_1  (cost=0.42..26038.42 rows=1000000 width=4)
> (7 rows)
>
> postgres=# explain with x (i) as (select 1::int order by 1) (select i
> from foo union select i from x union select i from foo) order by 1;
>                                               QUERY PLAN
> ------------------------------------------------------------------------------------------------------
>  Unique  (cost=0.89..82926.54 rows=2000001 width=4)
>    ->  Merge Append  (cost=0.89..77926.54 rows=2000001 width=4)
>          Sort Key: foo.i
>          ->  Index Only Scan using foo_i_idx on foo  (cost=0.42..26038.42 rows=1000000 width=4)
>          ->  Sort  (cost=0.02..0.03 rows=1 width=4)
>                Sort Key: (1)
>                ->  Result  (cost=0.00..0.01 rows=1 width=4)
>          ->  Index Only Scan using foo_i_idx on foo foo_1  (cost=0.42..26038.42 rows=1000000 width=4)
> (8 rows)
>
> I would've expected a MergeAppend in both cases.

That's surprising. I don't have an answer without debugging and I
can't quite motivate myself to do that right now for this patch.

> None of these test cases are broken as such, you just don't get the
> benefit of the optimization. I suspect they might all have the same root
> cause, as they all involve constants in the target list. I think that's
> a pretty common use case of UNION though.
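
To make the "not broken, just missing the optimization" point concrete, here is a small Python sketch of the two plan shapes seen in this thread. It is only an analogy under stated assumptions: sorted Python lists stand in for index-ordered scans, heapq.merge stands in for Merge Append, and itertools.groupby over sorted input stands in for Unique. The function names are invented for the example.

```python
import heapq
from itertools import groupby


def union_sort_unique(*inputs):
    """Append -> Sort -> Unique: concatenate everything, sort it all, de-duplicate."""
    appended = [row for rows in inputs for row in rows]
    return [key for key, _ in groupby(sorted(appended))]


def union_merge_unique(*sorted_inputs):
    """Merge Append -> Unique: merge presorted inputs lazily, then de-duplicate."""
    return [key for key, _ in groupby(heapq.merge(*sorted_inputs))]


foo = list(range(1, 11))  # stands in for an index scan that returns sorted rows
const = [1]               # stands in for the single-row "select 1" branch

print(union_sort_unique(foo, const, foo))
print(union_merge_unique(foo, const, foo))
```

Both functions return the same rows. The difference is only in the work done: the first sorts the whole concatenated input, while the second relies on each input already being ordered.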

It's true that there are quite a few things left on the table here. I
think the refactoring work that has been done moves some of the
barriers away for future improvements. There just wasn't enough time
to get those done for v17. I hope to get some time and energy for it
in v18. I'm just thankful that you found no bugs. If you do happen to
find any, I can tell you a good time not to report them! :)

David

[1]: /messages/by-id/CAApHDvqo1rV8O4pMU2-22iTASBXgnm4kbHF6A8_VMqiDR3hG8A@mail.gmail.com